JP6278918B2

JP6278918B2 - Data analysis apparatus, method, and program

Info

Publication number: JP6278918B2
Application number: JP2015056581A
Authority: JP
Inventors: 匡宏幸島; 達史松林; 澤田　宏
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-03-19
Filing date: 2015-03-19
Publication date: 2018-02-14
Anticipated expiration: 2035-03-19
Also published as: JP2016177485A

Description

The present invention relates to a data analysis apparatus, method, and program.

It is known that not only structured data such as purchase histories, typified by POS (point of sale) data, but also much unstructured data such as text and image data can be expressed in matrix form through preprocessing. Non-negative matrix factorization (NMF) has been shown to be a useful technique for discovering clusters in such matrix-represented data (see, for example, Non-Patent Document 1).

Applying NMF decomposes the input matrix into a product of matrices of lower rank. Each of these low-rank matrices represents the degree to which the item corresponding to each row or each column contributes to each cluster, which makes cluster discovery possible. Applied to purchase history data, for example, the extracted clusters can be used to create a list of recommended products for a user; applied to a collection of news articles, the result can be used to classify the articles automatically.

FIG. 15 shows an example of applying NMF to purchase data. The user product matrix X = {x_ij} representing the purchase data is a matrix of I rows and J columns in which the element x_ij is the number of units of the product in the j-th column purchased by the user in the i-th row.

Applying NMF to this user product matrix yields a user feature matrix A = {a_ir} of I rows and R columns and a product feature matrix B = {b_jr} of J rows and R columns such that

$$X \simeq A B^{\mathsf{T}}$$

Here, the symbol $\simeq$ indicates that the two sides are similar, and the superscript T denotes the matrix transpose. The value of a_ir represents the degree of contribution (degree of membership) of user i to cluster r, and the value of b_jr represents the degree of contribution of product j to cluster r.
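As an illustration of the decomposition of X into A and B^T, the following is a minimal sketch in Python with NumPy. It assumes the Euclidean distance and the widely used multiplicative update rules; the function name `nmf`, the toy matrix, and all parameter values are hypothetical and are not taken from this publication.

```python
import numpy as np

def nmf(X, R, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative I x J matrix X by A @ B.T.

    A is I x R (row/user features) and B is J x R (column/product features).
    Uses multiplicative updates for the squared Euclidean distance
    (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    I, J = X.shape
    A = rng.random((I, R)) + eps
    B = rng.random((J, R)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        A *= (X @ B) / (A @ (B.T @ B) + eps)
        B *= (X.T @ A) / (B @ (A.T @ A) + eps)
    return A, B

# Hypothetical purchase counts: users 0-2 buy products 0-2, users 3-4 buy products 3-4.
X = np.array([
    [2, 1, 3, 0, 0],
    [1, 2, 1, 0, 0],
    [3, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 3],
], dtype=float)

A, B = nmf(X, R=2)
rel_error = np.linalg.norm(X - A @ B.T) / np.linalg.norm(X)
```

The number of clusters R is passed in by the caller, matching the statement that the rank is fixed before the analysis.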

Looking at the column of the user feature matrix A in FIG. 15 that corresponds to cluster 1, the values in the first, second, and third rows, which correspond to user 1, user 2, and user 3, are greater than 0. This indicates that user 1, user 2, and user 3 belong to cluster 1. Looking at the row of the product feature matrix B that corresponds to cluster 1, the values in the columns for the products beer 1 (first column), beer 2 (second column), and beer 3 (third column) are greater than 0. This represents a feature of cluster 1, namely that the three products beer 1, beer 2, and beer 3 tend to be purchased by the same users. The products beer 1, beer 2, and beer 3 are therefore collectively called the product features of cluster 1. Similarly, the users belonging to cluster 1 are called the user features of cluster 1, and the product features and user features of cluster 1 together are called the features of cluster 1. In this way, clusters can be extracted as shown in FIG. 16 from the user feature matrix A and the product feature matrix B obtained by applying NMF. Note that the rank R of the feature matrices, which corresponds to the total number of clusters, is determined in advance, before the analysis.
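The reading of cluster membership described above can be sketched as a simple thresholding step. The matrices below are hypothetical values chosen to mirror the beer example (they are not the values of FIG. 15), the names "user 4" and "tea 1" are invented for illustration, and B is stored as J rows by R columns so that each cluster is a column.

```python
import numpy as np

def cluster_members(F, names, threshold=0.0):
    """For each cluster (column of F), list the names whose contribution exceeds threshold."""
    return [
        [names[i] for i in np.flatnonzero(F[:, r] > threshold)]
        for r in range(F.shape[1])
    ]

# Hypothetical feature matrices with R = 2 clusters.
A = np.array([[0.9, 0.0], [0.5, 0.0], [0.7, 0.0], [0.0, 0.8]])  # users x clusters
B = np.array([[1.2, 0.0], [0.4, 0.0], [0.8, 0.0], [0.0, 1.0]])  # products x clusters

users = ["user 1", "user 2", "user 3", "user 4"]
products = ["beer 1", "beer 2", "beer 3", "tea 1"]

user_features = cluster_members(A, users)
product_features = cluster_members(B, products)
```

In practice the entries produced by an iterative solver are rarely exactly zero, so a small positive `threshold` is likely to be needed instead of 0.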

Non-Patent Document 1: Hiroshi Sawada, "Basics of Non-negative Matrix Factorization (NMF) and Its Application to Data/Signal Analysis," Journal of the IEICE, Vol. 95, No. 9, pp. 829-833, 2012.
Non-Patent Document 2: K. Takeuchi, K. Ishiguro, A. Kimura, and H. Sawada, "Non-negative Multiple Matrix Factorization," Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI2013), pp. 1713-1720, 2013.

However, the purchase data used as input by the technique of Non-Patent Document 1 is assumed to consist only of purchase histories linked to user IDs, which indicate who purchased which product. The technique is not designed to perform an analysis that also incorporates purchase histories not linked to user IDs.

In recent data analysis, there are many situations in which data linked to user IDs and data not linked to user IDs both exist. Two examples are given below.

The first example is a situation in which there is a limit on the period during which data may be used in a form that can identify individuals. This limit exists, from the viewpoint of protecting personal information, to avoid keeping data that can identify individuals for an unnecessarily long time. FIG. 17 shows an example of such data. In this example, the data from 2013.4.1 to 2013.9.30 has passed its usage period, so the user ID that identifies the individual has been removed and the data is no longer linked to a user ID. For the data from 2013.10.1 onward, all columns including the user ID are available, so the data is linked to a user ID. Consequently, for the period from 2013.4.1 to 2013.9.30, only per-attribute statistics are available, such as the number of purchases of each product by all male users or all female users.

The second example is data sharing between companies after the data has been anonymized. FIG. 18 shows an example of such data. The company itself, data holder 1, and data holder 2 each store their own data linked to user IDs on a common data platform. The company cannot retrieve the raw data of data holder 1 or data holder 2 from the common data platform, but it can use data that is computed from the data of all the companies and is not linked to user IDs, for example statistics by age group. The data available to the company is therefore the data linked to user IDs that it has held from the start, plus the statistical information not linked to user IDs that can be retrieved from the common data platform.

Even when data linked to user IDs and purchase data not linked to user IDs both exist in this way, the method of Non-Patent Document 1 cannot incorporate the purchase data not linked to user IDs, so the analysis has to be performed without it.

This approach is problematic when the data linked to user IDs makes up only a small fraction of all the data. For example, if the best-selling product in the combined data differs from the best-selling product in the ID-linked data alone, an analysis of the ID-linked data cannot capture the trend of the data as a whole.
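The kind of discrepancy described here can be made concrete with a small numeric sketch. All product names and counts below are hypothetical and chosen only to show that the top-selling product can change once the unlinked data is included.

```python
import numpy as np

products = ["beer", "tea", "juice"]

# Hypothetical purchase counts linked to user IDs (2 users x 3 products).
X = np.array([[3, 0, 1],
              [2, 1, 0]])

# Hypothetical per-attribute totals not linked to user IDs (2 attributes x 3 products).
Y = np.array([[1, 6, 2],
              [0, 5, 1]])

linked_totals = X.sum(axis=0)             # totals from ID-linked data only
all_totals = linked_totals + Y.sum(axis=0)  # totals from both data sources

top_linked = products[int(np.argmax(linked_totals))]
top_all = products[int(np.argmax(all_totals))]
```

With these counts the ID-linked data alone points to "beer", while the combined data points to "tea".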

The present invention has been made to solve the above problem, and its object is to provide a data analysis apparatus, method, and program capable of analyzing the features of an entire population, including the features of a group whose members cannot be individually identified.

To achieve the above object, the data analysis apparatus according to the first invention decomposes an I×J individual object matrix X and a K×J attribute object matrix Y into an I×R first feature matrix A, a J×R second feature matrix B, and a K×R third feature matrix C. The matrix X has elements x_ij representing the degree of association between an individual i (1 ≤ i ≤ I, where I is an integer of 1 or more) included in a first individual group whose members can be individually identified and an object j (1 ≤ j ≤ J, where J is an integer of 1 or more). The matrix Y has elements y_kj representing, for a second individual group whose members cannot be individually identified, the degree of association between an attribute k of the individuals (1 ≤ k ≤ K, where K is an integer of 1 or more) and the object j. The matrix A has non-negative elements a_ir indicating that the individual i belongs to a cluster r (1 ≤ r ≤ R, where R is an integer of 1 or more), the matrix B has non-negative elements b_jr indicating that the object j belongs to the cluster r, and the matrix C has non-negative elements c_kr indicating that the attribute k belongs to the cluster r. The apparatus includes: an attribute ratio information calculation unit that calculates an attribute ratio matrix P, whose elements p_kk represent the ratio between the number of individuals having the attribute k in the second individual group and the number of individuals having the attribute k in the first individual group, based on an I×K individual attribute matrix V having elements v_ik representing the degree of association between the individual i included in the first individual group and the attribute k, and on a K-dimensional attribute vector W having elements w_k representing the number of individuals having the attribute k in the second individual group; a feature matrix estimation unit that estimates the first feature matrix A, the second feature matrix B, and the third feature matrix C, based on the individual object matrix X, the attribute object matrix Y, and the matrices A, B, and C, under a linear constraint in which the third feature matrix C is expressed by the attribute ratio matrix P, the individual attribute matrix V, and the first feature matrix A; and an iteration determination unit that repeats the estimation by the feature matrix estimation unit until a predetermined iteration end condition is satisfied.

In the data analysis apparatus according to the first invention, the elements x_ij of the individual object matrix X, the elements y_kj of the attribute object matrix Y, the elements a_ir of the first feature matrix A, the elements b_jr of the second feature matrix B, and the elements c_kr of the third feature matrix C may all be non-negative, and the feature matrix estimation unit may estimate the first feature matrix A, the second feature matrix B, and the third feature matrix C by non-negative factorization.

The data analysis apparatus according to the second invention decomposes an I×J user product matrix X and a K×J attribute product matrix Y into an I×R user feature matrix A, a J×R product feature matrix B, and a K×R attribute feature matrix C. The matrix X has elements x_ij representing the number of purchases of a product j (1 ≤ j ≤ J, where J is an integer of 1 or more) by a user i (1 ≤ i ≤ I, where I is an integer of 1 or more) included in a first user group whose members can be individually identified. The matrix Y has elements y_kj representing, for a second user group whose members cannot be individually identified, the number of purchases of the product j by users having a user attribute k (1 ≤ k ≤ K, where K is an integer of 1 or more). The matrix A has non-negative elements a_ir indicating that the user i belongs to a cluster r (1 ≤ r ≤ R, where R is an integer of 1 or more), the matrix B has non-negative elements b_jr indicating that the product j belongs to the cluster r, and the matrix C has non-negative elements c_kr indicating that the attribute k belongs to the cluster r. The apparatus includes: an attribute ratio information calculation unit that calculates an attribute ratio matrix P, whose elements p_kk represent the ratio between the number of users having the attribute k in the second user group and the number of users having the attribute k in the first user group, based on an I×K user attribute matrix V having elements v_ik representing the degree of association between the user i included in the first user group and the attribute k, and on a K-dimensional attribute headcount vector W having elements w_k representing the number of users having the attribute k in the second user group; a feature matrix estimation unit that estimates the user feature matrix A, the product feature matrix B, and the attribute feature matrix C, based on the user product matrix X, the attribute product matrix Y, and the matrices A, B, and C, under a linear constraint in which the attribute feature matrix C is expressed by the attribute ratio matrix P, the user attribute matrix V, and the user feature matrix A; and an iteration determination unit that repeats the estimation by the feature matrix estimation unit until a predetermined iteration end condition is satisfied.

The data analysis method according to the third invention is a data analysis method in a data analysis apparatus that decomposes an I×J individual object matrix X and a K×J attribute object matrix Y into an I×R first feature matrix A, a J×R second feature matrix B, and a K×R third feature matrix C. The matrix X has elements x_ij representing the degree of association between an individual i (1 ≤ i ≤ I, where I is an integer of 1 or more) included in a first individual group whose members can be individually identified and an object j (1 ≤ j ≤ J, where J is an integer of 1 or more). The matrix Y has elements y_kj representing, for a second individual group whose members cannot be individually identified, the degree of association between an attribute k of the individuals (1 ≤ k ≤ K, where K is an integer of 1 or more) and the object j. The matrix A has non-negative elements a_ir indicating that the individual i belongs to a cluster r (1 ≤ r ≤ R, where R is an integer of 1 or more), the matrix B has non-negative elements b_jr indicating that the object j belongs to the cluster r, and the matrix C has non-negative elements c_kr indicating that the attribute k belongs to the cluster r. The method includes: a step in which an attribute ratio information calculation unit calculates an attribute ratio matrix P, whose elements p_kk represent the ratio between the number of individuals having the attribute k in the second individual group and the number of individuals having the attribute k in the first individual group, based on an I×K individual attribute matrix V having elements v_ik representing the degree of association between the individual i included in the first individual group and the attribute k, and on a K-dimensional attribute vector W having elements w_k representing the number of individuals having the attribute k in the second individual group; a step in which a feature matrix estimation unit estimates the first feature matrix A, the second feature matrix B, and the third feature matrix C, based on the individual object matrix X, the attribute object matrix Y, and the matrices A, B, and C, under a linear constraint in which the third feature matrix C is expressed by the attribute ratio matrix P, the individual attribute matrix V, and the first feature matrix A; and a step in which an iteration determination unit repeats the estimation by the feature matrix estimation unit until a predetermined iteration end condition is satisfied.

The program according to the fourth invention is a program for causing a computer to function as each unit of the data analysis apparatus described above.

According to the first and third inventions, the attribute ratio matrix P having elements p_kk representing the ratio is calculated based on the individual attribute matrix V and the attribute vector W. The first feature matrix A, the second feature matrix B, and the third feature matrix C are then estimated, based on the individual object matrix X, the attribute object matrix Y, and the matrices A, B, and C, under a linear constraint in which the third feature matrix C is expressed by the attribute ratio matrix P, the individual attribute matrix V, and the first feature matrix A. This makes it possible to analyze the features of the entire population, including the features of the group whose members cannot be individually identified.

According to the second invention, the attribute ratio matrix P having elements p_kk representing the ratio is calculated based on the user attribute matrix V and the attribute headcount vector W. The user feature matrix A, the product feature matrix B, and the attribute feature matrix C are then estimated, based on the user product matrix X, the attribute product matrix Y, and the matrices A, B, and C, under a linear constraint in which the attribute feature matrix C is expressed by the attribute ratio matrix P, the user attribute matrix V, and the user feature matrix A. This makes it possible to analyze the features of the entire user population, including the features of the user group whose members cannot be individually identified.

The data analysis apparatus, method, and program of the present invention provide the effect that the features of an entire population can be analyzed, including the features of a group whose members cannot be individually identified.

FIG. 1 is a conceptual diagram showing an example of the data available from the company's own data and other companies' data, and of the matrices obtained from that data.
FIG. 2 is a conceptual diagram showing an example of an analysis using the user product matrix X.
FIG. 3 is a conceptual diagram showing an example of a matrix decomposition that takes into account the relationship between the user product matrix X and the attribute product matrix Y.
FIG. 4 is a conceptual diagram showing an example of the relationship between the user product matrix X and the attribute product matrix Y on the one hand and the user feature matrix A, the product feature matrix B, and the attribute feature matrix C on the other.
FIG. 5 is a block diagram showing the configuration of the data analysis apparatus according to the embodiment of the present invention.
FIG. 6 is a conceptual diagram showing an example of the user product information table stored in the storage unit 2.
FIG. 7 is a conceptual diagram showing an example of the attribute product information table stored in the storage unit 2.
FIG. 8 is a conceptual diagram showing an example of the user attribute information table stored in the storage unit 2.
FIG. 9 is a conceptual diagram showing an example of the attribute headcount information table stored in the storage unit 2.
FIG. 10 is a conceptual diagram showing an example of the attribute ratio information table stored in the storage unit 2.
FIG. 11 is a conceptual diagram showing an example of the user feature table stored in the storage unit 2.
FIG. 12 is a conceptual diagram showing an example of the product feature table stored in the storage unit 2.
FIG. 13 is a conceptual diagram showing an example of the attribute feature table stored in the storage unit 2.
FIG. 14 is a flowchart showing the feature matrix estimation processing routine in the data analysis apparatus according to the embodiment of the present invention.
FIG. 15 is a conceptual diagram showing an example of a user product matrix, a user feature matrix, and a product feature matrix.
FIG. 16 is a conceptual diagram showing an example of cluster extraction.
FIG. 17 is a conceptual diagram showing an example in which per-user statistics cannot be used beyond a certain period while per-attribute statistics remain available.
FIG. 18 is a conceptual diagram showing an example of data sharing between companies.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

<Overview of the Embodiment of the Present Invention>

First, an overview of the embodiment of the present invention will be given.

The embodiment of the present invention uses a technique that extracts clusters from three matrices and one vector: an I×J user product matrix X created from purchase histories linked to user IDs; a K×J attribute product matrix Y = {y_kj} created from purchase histories not linked to user IDs; a K-dimensional attribute headcount vector W = {w_k} representing the number of users for each attribute in the attribute product matrix Y; and an I×K user attribute matrix V = {v_ik} representing the correspondence between the users in X and their attributes. Here, the user product matrix X is an I×J matrix whose elements x_ij represent the number of purchases of a product j (1 ≤ j ≤ J, where J is an integer of 1 or more) by a user i (1 ≤ i ≤ I, where I is an integer of 1 or more) included in a first user group whose members are given user IDs and can be individually identified. The attribute product matrix Y is a K×J matrix whose elements y_kj represent, for a second user group whose members cannot be individually identified, the number of purchases of the product j by users having a user attribute k (1 ≤ k ≤ K, where K is an integer of 1 or more). The user attribute matrix V is an I×K matrix whose elements v_ik represent the degree of association between the user i included in the first user group and the attribute k. In other words, the element y_kj of Y is the total number of purchases of product j by the users having attribute k (for example, all men in their thirties). The element v_ik of V takes the value 1 when user i belongs to attribute k and 0 otherwise. The value of v_ik is not limited to 0 or 1, however, and may be 0 or any positive integer; negative values are not used.
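The construction of X, Y, V, and W from raw records can be sketched as follows. The record layouts, identifiers, and counts are hypothetical; the publication does not specify an input format, so this is only one way the four inputs might be assembled.

```python
import numpy as np

users = ["u1", "u2", "u3"]
products = ["beer", "tea"]
attributes = ["male", "female"]

# Hypothetical ID-linked purchase records: (user ID, product, quantity).
linked = [("u1", "beer", 2), ("u1", "tea", 1), ("u2", "beer", 1), ("u3", "tea", 3)]
# Hypothetical attribute of each identified user.
user_attr = {"u1": "male", "u2": "male", "u3": "female"}
# Hypothetical unlinked per-attribute totals: (attribute, product, quantity).
unlinked = [("male", "beer", 10), ("male", "tea", 2), ("female", "beer", 1), ("female", "tea", 8)]
# Hypothetical number of users per attribute behind the unlinked totals.
attr_counts = {"male": 6, "female": 4}

u_idx = {u: i for i, u in enumerate(users)}
p_idx = {p: j for j, p in enumerate(products)}
a_idx = {a: k for k, a in enumerate(attributes)}

X = np.zeros((len(users), len(products)))       # I x J user product matrix
for u, p, n in linked:
    X[u_idx[u], p_idx[p]] += n

Y = np.zeros((len(attributes), len(products)))  # K x J attribute product matrix
for a, p, n in unlinked:
    Y[a_idx[a], p_idx[p]] += n

V = np.zeros((len(users), len(attributes)))     # I x K user attribute matrix (0/1)
for u, a in user_attr.items():
    V[u_idx[u], a_idx[a]] = 1

W = np.array([attr_counts[a] for a in attributes], dtype=float)  # K-dimensional headcounts
```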

FIG. 1 shows an example of how the above data is created. As described earlier, the method of Non-Patent Document 1 cannot incorporate purchase data not linked to user IDs, so it performs an analysis using only the user product matrix X, as shown in FIG. 2.

In contrast, the embodiment of the present invention uses all of the available data described above in a matrix decomposition method that takes into account the relationship between the user product matrix X and the attribute product matrix Y, as shown in FIG. 3. In FIG. 3, the ratio between the number of users for each attribute in the user product matrix X and the number of users for each attribute in the attribute product matrix Y is first defined as the attribute ratio matrix P = {p_kk}_{k=1}^{K}. As shown in the figure, the values of P are calculated from the values of the user attribute matrix V and the attribute headcount vector W, and P is a matrix whose off-diagonal elements are all 0. The per-attribute statistics (for example, for men or for women) calculated from the user product matrix X and the user attribute matrix V are called partial statistics, and the per-attribute statistics given by the matrix Y are called overall statistics. The matrix decomposition model considered in this embodiment assumes that the partial statistics and the overall statistics are in an "approximately" proportional relationship whose proportionality constants are given by the attribute ratio matrix P. The meaning of "approximately" is explained later.
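The attribute ratio matrix and the two kinds of statistics can be sketched numerically. The exact formula for p_kk appears only in FIG. 3, which is not reproduced in this text; the sketch assumes p_kk = w_k / Σ_i v_ik (second-group headcount divided by first-group headcount), which matches the description of P as a ratio of user counts but should be treated as an assumption. All numbers are hypothetical.

```python
import numpy as np

# Hypothetical inputs: 3 identified users, 2 products, 2 attributes.
X = np.array([[2.0, 1.0], [1.0, 0.0], [0.0, 3.0]])   # user product matrix
V = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # user attribute matrix
W = np.array([6.0, 4.0])                             # headcounts behind Y
Y = np.array([[10.0, 2.0], [1.0, 8.0]])              # attribute product matrix

# Assumed definition: diagonal matrix of headcount ratios (second group / first group).
first_group_counts = V.sum(axis=0)   # requires every attribute to have at least one user
P = np.diag(W / first_group_counts)

partial_stats = V.T @ X              # per-attribute totals from ID-linked data
scaled_partial = P @ partial_stats   # rescaled to the size of the second group
gap = Y - scaled_partial             # deviation from exact proportionality
```

A non-zero `gap` is expected: the model only assumes proportionality approximately, not exactly.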

The embodiment of the present invention considers the following matrix decomposition of the user product matrix X and the attribute product matrix Y:

$$X \simeq A B^{\mathsf{T}}, \qquad Y \simeq C B^{\mathsf{T}}$$

Here C is called the attribute feature matrix C = {c_kr} of K rows and R columns, and the value of c_kr represents the degree of contribution of attribute k to cluster r. The embodiment introduces a linear constraint between the matrices A and C, namely that PV^T A = C holds. Defining this linear constraint with the attribute ratio matrix P is the characteristic feature of the embodiment. With this constraint, the feature matrices A, B, and C are obtained as outputs that reflect the assumption, described above, that the partial statistics and the overall statistics are "approximately" proportional. Because the proportionality constants correspond to the elements of the attribute ratio matrix P, they can be estimated independently of the decomposition of the user product matrix X and the attribute product matrix Y, which is an advantage. FIG. 4 shows an application example of the embodiment. The user feature matrix A represents the relationship between users and clusters, the product feature matrix B represents the relationship between products and clusters, and the attribute feature matrix C represents the relationship between attributes and clusters. These can be used, for example, to display the histograms in FIG. 4, from which the users, products, and attributes characteristic of each cluster can be identified.
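One way to see how the linear constraint PV^T A = C shapes the estimation is to substitute it into the decomposition, so that only A and B remain as free variables. The following sketch does this with the squared Euclidean distance and multiplicative updates. It is not the estimation procedure of this publication, whose update equations are not shown in this section; the stacking trick, the function name `constrained_nmf`, the assumed form p_kk = w_k / Σ_i v_ik, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def constrained_nmf(X, Y, V, W, R, n_iter=300, eps=1e-9, seed=0):
    """Fit X ~ A B^T and Y ~ C B^T with C = P V^T A enforced by substitution.

    Stacking Z = [X; Y] and M = [I; P V^T] turns the problem into
    Z ~ M A B^T with M fixed and non-negative (illustrative sketch only).
    """
    rng = np.random.default_rng(seed)
    I, J = X.shape
    P = np.diag(W / V.sum(axis=0))          # assumed definition of the ratio matrix
    M = np.vstack([np.eye(I), P @ V.T])     # (I + K) x I
    Z = np.vstack([X, Y])                   # (I + K) x J
    A = rng.random((I, R)) + eps
    B = rng.random((J, R)) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates for ||Z - M A B^T||^2; all factors stay non-negative.
        A *= (M.T @ Z @ B) / (M.T @ M @ A @ (B.T @ B) + eps)
        B *= (Z.T @ M @ A) / (B @ (A.T @ M.T @ M @ A) + eps)
        losses.append(float(np.sum((Z - M @ A @ B.T) ** 2)))
    C = P @ V.T @ A                         # attribute features follow from the constraint
    return A, B, C, losses

# Hypothetical data: 5 identified users, 4 products, 2 attributes.
X = np.array([[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 1, 3], [0, 1, 2, 2], [4, 0, 0, 1]], dtype=float)
V = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
W = np.array([30.0, 10.0])
Y = np.array([[80, 35, 5, 10], [5, 10, 20, 40]], dtype=float)

A, B, C, losses = constrained_nmf(X, Y, V, W, R=2)
```

Because C is computed from A rather than estimated freely, the constraint holds exactly at every iteration, and the unlinked totals in Y influence A and B through the stacked matrix Z.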

Next, we explain what is meant by the similarity measure expressed by the symbol ≃ and by the "approximately" proportional relationship mentioned above. As also described in Non-Patent Document 1, the similarity between matrices is measured either by a distance based on the Euclidean distance or by the generalized Kullback-Leibler divergence (KL distance); the smaller the value, the more similar the two matrices are.

Which distance to use is determined in consideration of the properties of the data. For example, as also described in Non-Patent Document 2, using the KL distance as the distance measure corresponds to assuming a probabilistic model in which each element x_ij of the user product matrix X is drawn from a Poisson distribution with parameter Σ_r a_ir b_jr. By the properties of the Poisson distribution, the expected value of x_ij is therefore Σ_r a_ir b_jr, while the model also allows the observed value of x_ij to deviate from this mean. On this basis, the "approximately" proportional relationship can be stated precisely: in the embodiment, for each attribute k, the expected value of the partial statistic and the expected value of the overall statistic are modeled as proportional, with proportionality constant p_kk. Accordingly, by computing the feature matrices A, B, and C under the restriction that this proportionality between expected values is preserved, an analysis that also takes the overall statistics into account becomes possible.
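As an illustrative aside, the generalized KL divergence mentioned above can be computed as follows. This is a minimal sketch in plain Python; the function name `generalized_kl` and the list-of-lists matrix representation are assumptions made for the example, and the estimate is assumed to be strictly positive.

```python
import math


def generalized_kl(X, X_hat):
    """Generalized KL divergence D(X || X_hat), summed over all elements.

    Each element contributes x * log(x / x_hat) - x + x_hat, with the
    convention 0 * log(0) = 0. X_hat must be strictly positive.
    """
    total = 0.0
    for row, row_hat in zip(X, X_hat):
        for x, x_hat in zip(row, row_hat):
            if x > 0:
                total += x * math.log(x / x_hat)
            total += x_hat - x
    return total
```

The value is 0 when the estimate reproduces the data exactly and grows as the two matrices diverge, which matches the statement that smaller values indicate greater similarity.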

The overall operation of the embodiment of the present invention is as follows.

Step 1) Input the user product matrix, the attribute product matrix, the user attribute matrix, and the attribute number vector.

Step 2) Calculate the attribute ratio matrix.

Step 3) Estimate each feature matrix.

Step 4) Output each feature matrix.

<Configuration of the Data Analysis Apparatus according to the Embodiment of the Present Invention>

Next, the configuration of the data analysis apparatus according to the embodiment of the present invention will be described. As shown in FIG. 5, the data analysis apparatus 100 according to the embodiment can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the data analysis processing routine described later. Functionally, as shown in FIG. 5, the data analysis apparatus 100 includes an input unit 1, a storage unit 2, a calculation unit 3, and an output unit 4.

The input unit 1 receives the following data output from the external device 200: an I × J user product matrix X whose element x_ij represents the number of purchases of product j (1 ≤ j ≤ J, where J is an integer of 1 or more) by user i (1 ≤ i ≤ I, where I is an integer of 1 or more) belonging to a first user group whose members can be individually identified; a K × J attribute product matrix Y whose element y_kj represents, for a second user group whose members cannot be individually identified, the number of purchases of product j by users having attribute k (1 ≤ k ≤ K, where K is an integer of 1 or more); and an I × K user attribute matrix V whose element v_ik represents the degree of association between user i of the first user group and attribute k.

The storage unit 2 stores a user product information table 81, an attribute product information table 82, a user attribute information table 83, an attribute number information table 84, an attribute ratio information table 85, a user feature table 86, a product feature table 87, and an attribute feature table 88. These tables are created by the units described later on the basis of the user product matrix X, the attribute product matrix Y, and the user attribute matrix V received by the input unit 1. Because data in table form can be expressed in matrix form, the following description identifies each table with its corresponding matrix and uses the two without distinction.
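The correspondence between a table and a matrix can be sketched as follows. This is an illustrative example only: the function name `table_to_matrix` and the representation of table rows as (row ID, column ID, value) tuples are assumptions, not part of the original disclosure.

```python
def table_to_matrix(rows, row_ids, col_ids):
    """Build a dense matrix from (row_id, col_id, value) table records.

    Cells with no record are left at 0. Negative values are rejected,
    mirroring the rule that a purchase count cannot be negative.
    """
    row_index = {rid: n for n, rid in enumerate(row_ids)}
    col_index = {cid: n for n, cid in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for rid, cid, value in rows:
        if value < 0:
            raise ValueError("values must be non-negative")
        matrix[row_index[rid]][col_index[cid]] = value
    return matrix
```

For instance, the records `("u1", "p1", 2)` and `("u2", "p3", 1)` over users `u1, u2` and products `p1, p2, p3` yield a 2 × 3 matrix with those two non-zero entries.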

<User Product Information Table 81>
As shown in FIG. 6, the user product information table 81 has a user ID field, a product ID field, and a purchase count field. In the user ID field, an identifier specifying a user added by the user product information processing unit 30 (described later) is set. In the product ID field, an identifier specifying a product purchased by that user, as added by the user product information processing unit 30, is set. In the purchase count field, the user product information processing unit 30 sets either 1 or the number of units of the product purchased by the user. The purchase count may be 0 or a positive integer; a negative number cannot be set.

<Attribute Product Information Table 82>
As shown in FIG. 7, the attribute product information table 82 has an attribute ID field, a product ID field, and a purchase count field. In the attribute ID field, an identifier specifying an attribute added by the attribute product information processing unit 32 (described later) is set. In the product ID field, an identifier specifying a product added by the attribute product information processing unit 32 is set. In the purchase count field, the attribute product information processing unit 32 sets either 1 or the number of units of the product purchased by users having the attribute. The purchase count may be 0 or a positive integer; a negative number cannot be set.

<User Attribute Information Table 83>
As shown in FIG. 8, the user attribute information table 83 has a user ID field, an attribute ID field, and an affiliation value field. In the user ID field, an identifier specifying a user is set by the user attribute information processing unit 34 (described later). In the attribute ID field, an identifier specifying an attribute is set by the user attribute information processing unit 34. In the affiliation value field, the user attribute information processing unit 34 sets 1 if the user belongs to the attribute and 0 otherwise.

<Attribute Number Information Table 84>
As shown in FIG. 9, the attribute number information table 84 has an attribute ID field and a head-count field. In the attribute ID field, an identifier specifying an attribute is set by the attribute number information processing unit 36 (described later). In the head-count field, the attribute number information processing unit 36 sets the number of users belonging to the attribute.

<Attribute Ratio Information Table 85>
As shown in FIG. 10, the attribute ratio information table 85 has an attribute ID field and a ratio value field. In the attribute ID field, an identifier specifying an attribute is set by the attribute ratio information calculation unit 38 (described later). In the ratio value field, the value calculated by the attribute ratio information calculation unit 38 is set.

<User Feature Table 86>
As shown in FIG. 11, the user feature table 86 has a user ID field, a cluster ID field, and a user feature value field. In the user ID field, an identifier specifying a user is set by the feature matrix estimation unit 40 (described later). In the cluster ID field, an identifier specifying a cluster is set by the feature matrix estimation unit 40. In the user feature value field, the feature value of the user for the cluster, as calculated by the feature matrix estimation unit 40, is set.

<Product Feature Table 87>
As shown in FIG. 12, the product feature table 87 has a product ID field, a cluster ID field, and a product feature value field. In the product ID field, an identifier specifying a product is set by the feature matrix estimation unit 40. In the cluster ID field, an identifier specifying a cluster is set by the feature matrix estimation unit 40. In the product feature value field, the feature value of the product for the cluster, as calculated by the feature matrix estimation unit 40, is set.

<Attribute Feature Table 88>
As shown in FIG. 13, the attribute feature table 88 has an attribute ID field, a cluster ID field, and an attribute feature value field. In the attribute ID field, an identifier specifying an attribute is set by the feature matrix estimation unit 40. In the cluster ID field, an identifier specifying a cluster is set by the feature matrix estimation unit 40. In the attribute feature value field, the feature value of the attribute for the cluster, as calculated by the feature matrix estimation unit 40, is set.

The calculation unit 3 includes a user product information processing unit 30, an attribute product information processing unit 32, a user attribute information processing unit 34, an attribute number information processing unit 36, an attribute ratio information calculation unit 38, a feature matrix estimation unit 40, an iteration determination unit 42, and a feature matrix processing unit 44.

<User Product Information Processing Unit 30>
The user product information processing unit 30 stores the purchase count for each user ID and product ID in the user product information table 81 of the storage unit 2, based on the user product matrix X received by the input unit 1. When updating the user product information table 81, the unit inserts into the table a row whose user ID field, product ID field, and purchase count field are set according to the added user, product, and number of purchases. The timing of the update may, for example, be managed manually by a system administrator based on data supplied from the external device 200, or the external device 200 may start the processing automatically when a new purchase occurs.

<Attribute Product Information Processing Unit 32>
The attribute product information processing unit 32 stores the purchase count for each attribute ID and product ID in the attribute product information table 82 of the storage unit 2, based on the attribute product matrix Y received by the input unit 1. When updating the attribute product information table 82, the unit inserts into the table a row whose attribute ID field, product ID field, and purchase count field are set according to the added attribute, product, and number of purchases. The timing of the update may, for example, be managed manually by a system administrator based on POS data supplied from the external device 200, or the processing may be started automatically from the external device 200 when a new purchase occurs.

<User Attribute Information Processing Unit 34>
The user attribute information processing unit 34 stores the affiliation value for each user ID and attribute ID in the user attribute information table 83 of the storage unit 2, based on the user attribute matrix V received by the input unit 1.

<Attribute Number Information Processing Unit 36>
The attribute number information processing unit 36 stores the head count for each attribute ID in the attribute number information table 84 of the storage unit 2, based on a K-dimensional attribute number vector W whose element w_k represents the number of users having attribute k in the second user group, whose members cannot be individually identified. When updating the attribute number information table 84, the unit inserts into the table a row whose attribute ID field and head-count field are set according to the added attribute and its head count. The timing of the update may, for example, be managed manually by a system administrator based on POS data supplied from the external device 200, or the processing may be started automatically from the external device 200 when a new product appears.

<Attribute Ratio Information Calculation Unit 38>
Based on the user attribute matrix V and the attribute number vector W, the attribute ratio information calculation unit 38 calculates an attribute ratio matrix P whose element p_kk represents the ratio between the number of users having attribute k in the second user group and the number of users having attribute k in the first user group.

Here, the I × K user attribute matrix V = {v_ik} (i = 1, ..., I; k = 1, ..., K) is obtained from all the data in the user attribute information table 83 stored in the storage unit 2. Likewise, the attribute number vector W = {w_k} (k = 1, ..., K) is obtained from all the data in the attribute number information table 84 stored in the storage unit 2.

Specifically, the attribute ratio information calculation unit 38 defines the attribute ratio matrix as P = {p_kk} (k = 1, ..., K), calculates each element p_kk by equation (1), and stores the ratio value for each attribute ID in the attribute ratio information table 85 of the storage unit 2.

For example, the attribute ratio information calculation unit 38 calculates each element of the attribute ratio matrix P according to equation (1) and inserts into the attribute ratio information table 85 stored in the storage unit 2 a row whose attribute ID field and ratio value field are set according to the attribute and the value of the corresponding element. The timing of this update may, for example, be managed manually by a system administrator based on POS data supplied from the external device 200, or the processing may be started automatically from the external device 200 when a new product appears.
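Equation (1) is not reproduced in this text. As a hedged illustration only, one form consistent with the description of p_kk (the ratio of the head count w_k in the second user group to the number of first-group users having attribute k) is p_kk = w_k / Σ_i v_ik. The sketch below assumes that form, a 0/1 user attribute matrix V, and a diagonal P; the function name `attribute_ratio_matrix` is hypothetical.

```python
def attribute_ratio_matrix(V, W):
    """Return a K x K diagonal matrix P with p_kk = w_k / sum_i v_ik.

    ASSUMPTION: this ratio is inferred from the prose description of
    p_kk; the original equation (1) is not available in the text.
    """
    K = len(W)
    P = [[0.0] * K for _ in range(K)]
    for k in range(K):
        identified = sum(row[k] for row in V)  # first-group users with attribute k
        if identified == 0:
            raise ValueError("no identifiable user has attribute %d" % k)
        P[k][k] = W[k] / identified
    return P
```

With two identifiable users in attribute 0 and ten unidentifiable users in the same attribute, this gives p_00 = 5, that is, each identifiable user stands in for five users of the overall population.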

<Feature Matrix Estimation Unit 40, Iteration Determination Unit 42>
As described below, the feature matrix estimation unit 40 estimates the user feature matrix A, the product feature matrix B, and the attribute feature matrix C under the linear constraint PV^T A = C, in which the attribute feature matrix C is expressed by the attribute ratio matrix P, the user attribute matrix V, and the user feature matrix A. The estimation is based on the user product matrix X, the attribute product matrix Y, the user attribute matrix V, the attribute ratio matrix P, the user feature matrix A stored in the user feature table 86, the product feature matrix B stored in the product feature table 87, and the attribute feature matrix C stored in the attribute feature table 88. The processing of the feature matrix estimation unit 40 may be executed at any timing, for example when a feature output request is input from the external device 200 or as part of predetermined periodic processing.

Here, the I × J user product matrix X = {x_ij} (i = 1, ..., I; j = 1, ..., J) is obtained from all the data in the user product information table 81 stored in the storage unit 2. The K × J attribute product matrix Y = {y_kj} (k = 1, ..., K; j = 1, ..., J) is obtained from all the data in the attribute product information table 82. The I × K user attribute matrix V = {v_ik} (i = 1, ..., I; k = 1, ..., K) is obtained from all the data in the user attribute information table 83. The K × K attribute ratio matrix P = {p_kk} (k = 1, ..., K) is obtained from all the data in the attribute ratio information table 85. The user feature matrix A, the product feature matrix B, and the attribute feature matrix C are written as A = {a_ir} (i = 1, ..., I; r = 1, ..., R), B = {b_jr} (j = 1, ..., J; r = 1, ..., R), and C = {c_kr} (k = 1, ..., K; r = 1, ..., R), respectively. The user feature matrix A represents the relationship between users and clusters, the product feature matrix B the relationship between products and clusters, and the attribute feature matrix C the relationship between attributes and clusters. I is the total number of users, J the total number of products, and K the total number of attributes. The index i corresponds to an identifier specifying a user, j to an identifier specifying a product, k to an identifier specifying an attribute, and r to an identifier specifying a cluster.

Specifically, the feature matrix estimation unit 40 performs the following first to third processes.

First, as the first process, the feature matrix estimation unit 40 updates each element a_ir of the user feature matrix A according to equation (2), based on the user product matrix X, the attribute product matrix Y, the user attribute matrix V, the attribute ratio matrix P, the user feature matrix A, the product feature matrix B, and the attribute feature matrix C.

Here,

x̂_ij = Σ_r a_ir b_jr, ŷ_kj = Σ_r c_kr b_jr.

That is, x̂_ij can be regarded as the estimate of x_ij given by the user feature matrix A and the product feature matrix B, and ŷ_kj as the estimate of y_kj given by the attribute feature matrix C and the product feature matrix B.
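The two estimates are both products of a feature matrix with the transpose of B, so a single helper covers them. The following is a minimal sketch; the function name `reconstruct` and the list-of-lists representation are assumptions for illustration.

```python
def reconstruct(F, B):
    """Return F B^T: entry (n, j) equals sum_r F[n][r] * B[j][r].

    With F = A this gives the estimates of x_ij; with F = C it gives
    the estimates of y_kj.
    """
    return [
        [sum(f * b for f, b in zip(f_row, b_row)) for b_row in B]
        for f_row in F
    ]
```

Calling `reconstruct(A, B)` yields an I × J matrix and `reconstruct(C, B)` a K × J matrix, matching the shapes of X and Y.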

The feature matrix estimation unit 40 then stores the values of the elements a_ir of the user feature matrix A updated by equation (2) in the user feature table 86.
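Equation (2) itself is not reproduced in this text. The sketch below is therefore not the patent's update rule; it is the standard multiplicative update for the generalized KL objective, derived under the assumption that C = PV^T A with diagonal P. It is offered only to illustrate the kind of update described, and the function name `update_user_features` is hypothetical. All estimates are assumed to be strictly positive.

```python
def update_user_features(X, Y, V, P, A, B, C):
    """One multiplicative update of A (ASSUMED form, not the original eq. (2))."""
    I, J, K, R = len(A), len(B), len(C), len(B[0])
    # Current estimates of x_ij (from A, B) and y_kj (from C, B).
    X_hat = [[sum(A[i][r] * B[j][r] for r in range(R)) for j in range(J)]
             for i in range(I)]
    Y_hat = [[sum(C[k][r] * B[j][r] for r in range(R)) for j in range(J)]
             for k in range(K)]
    A_new = [row[:] for row in A]
    for i in range(I):
        for r in range(R):
            # Contribution of the user product matrix X.
            num = sum(X[i][j] / X_hat[i][j] * B[j][r] for j in range(J))
            den = sum(B[j][r] for j in range(J))
            # Contribution of Y, reached through the constraint C = P V^T A.
            for k in range(K):
                weight = P[k][k] * V[i][k]
                num += weight * sum(Y[k][j] / Y_hat[k][j] * B[j][r]
                                    for j in range(J))
                den += weight * sum(B[j][r] for j in range(J))
            A_new[i][r] = A[i][r] * num / den
    return A_new
```

In this assumed form the numerator equals the denominator when every estimate matches the data, so A is left unchanged, and the numerator exceeds the denominator when every estimate is too small, so a_ir increases.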

Next, as the second process, the feature matrix estimation unit 40 updates each element b_jr of the product feature matrix B according to equation (3), based on the user product matrix X, the attribute product matrix Y, the user feature matrix A, the product feature matrix B, and the attribute feature matrix C.

The feature matrix estimation unit 40 then stores the values of the elements b_jr of the product feature matrix B updated by equation (3) in the product feature table 87.
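Equation (3) is likewise not reproduced here. As a hedged illustration, the standard multiplicative update of a factor shared by two KL-based decompositions (X ≃ AB^T and Y ≃ CB^T) looks as follows. This is an assumed form, not necessarily the original equation (3); the function name `update_product_features` is hypothetical, and all estimates are assumed strictly positive.

```python
def update_product_features(X, Y, A, B, C):
    """One multiplicative update of B (ASSUMED form, not the original eq. (3))."""
    I, J, K, R = len(A), len(B), len(C), len(B[0])
    X_hat = [[sum(A[i][r] * B[j][r] for r in range(R)) for j in range(J)]
             for i in range(I)]
    Y_hat = [[sum(C[k][r] * B[j][r] for r in range(R)) for j in range(J)]
             for k in range(K)]
    B_new = [row[:] for row in B]
    for j in range(J):
        for r in range(R):
            # B is shared, so both X and Y contribute to its update.
            num = sum(X[i][j] / X_hat[i][j] * A[i][r] for i in range(I))
            num += sum(Y[k][j] / Y_hat[k][j] * C[k][r] for k in range(K))
            den = sum(A[i][r] for i in range(I)) + sum(C[k][r] for k in range(K))
            B_new[j][r] = B[j][r] * num / den
    return B_new
```

Because B appears in both decompositions, the purchase data of identifiable users and the attribute-level totals jointly shape the product clusters.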

Next, as the third process, the feature matrix estimation unit 40 updates each element c_kr of the attribute feature matrix C according to equation (4), based on the user attribute matrix V, the attribute ratio matrix P, and the user feature matrix A.

The feature matrix estimation unit 40 then stores the values of the elements c_kr of the attribute feature matrix C updated by equation (4) in the attribute feature table 88.
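Since the third process uses only V, P, and A, it is natural to read it as re-imposing the linear constraint C = PV^T A. The sketch below assumes that reading and a diagonal P; equation (4) is not reproduced in this text, and the function name `update_attribute_features` is hypothetical.

```python
def update_attribute_features(V, P, A):
    """Recompute C from the constraint C = P V^T A (P assumed diagonal).

    c_kr = p_kk * sum_i v_ik * a_ir
    """
    I, K, R = len(A), len(P), len(A[0])
    return [
        [P[k][k] * sum(V[i][k] * A[i][r] for i in range(I)) for r in range(R)]
        for k in range(K)
    ]
```

Each row of C is thus the scaled sum of the feature rows of the identifiable users who have the corresponding attribute.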

The iteration determination unit 42 causes the feature matrix estimation unit 40 to repeat the estimation until the iteration end condition is satisfied. Specifically, if the maximum absolute difference between the values of the elements before and after an update is larger than the variable δ, which holds the maximum change caused by the updates, that maximum is assigned to δ. The variable δ is initialized before each series of updates of the matrices. The threshold ε of the end condition and the maximum number of iterations N are set in advance.

In more detail, let a_ir^old be the value of an element of the user feature matrix A stored in the user feature table 86 before the update, and a_ir^new the value after the update by the feature matrix estimation unit 40 according to equation (2). If the maximum absolute difference max_ir |a_ir^old − a_ir^new| is larger than δ, the iteration determination unit 42 updates δ ← max_ir |a_ir^old − a_ir^new|. The symbol "←" denotes assigning the result of the right-hand side to the variable on the left-hand side.

Similarly, if the maximum absolute difference max_jr |b_jr^old − b_jr^new| between the values of the elements of the product feature matrix B stored in the product feature table 87 before the update and the values of the same elements after the update is larger than δ, the iteration determination unit 42 updates δ ← max_jr |b_jr^old − b_jr^new|. Here b_jr^old denotes the value of an element of the product feature matrix B before the assignment and b_jr^new the value after it.

Likewise, if the maximum absolute difference max_kr |c_kr^old − c_kr^new| between the values of the elements of the attribute feature matrix C stored in the attribute feature table 88 before the update and the values after the update is larger than δ, the iteration determination unit 42 updates δ ← max_kr |c_kr^old − c_kr^new|. Here c_kr^old denotes the value of an element of the attribute feature matrix C before the assignment and c_kr^new the value after it.

The iteration determination unit 42 terminates the update processing by the feature matrix estimation unit 40 if the iteration count n exceeds the predetermined maximum number of iterations N, or if δ, the maximum change caused by the updates, is smaller than the predetermined threshold ε.
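The bookkeeping of δ and the end condition can be sketched as two small helpers. This is illustrative only; the function names `max_abs_change` and `should_stop` are assumptions, while the symbols n, N, δ, and ε follow the text.

```python
def max_abs_change(old, new):
    """Largest absolute element-wise difference between two matrices."""
    return max(
        abs(o - n)
        for old_row, new_row in zip(old, new)
        for o, n in zip(old_row, new_row)
    )


def should_stop(n, delta, eps, max_iter):
    """End condition from the text: n exceeds N, or delta is smaller than eps."""
    return n > max_iter or delta < eps
```

After each of the three updates, δ would be refreshed as `delta = max(delta, max_abs_change(old, new))`, so that at the end of a round δ holds the largest change over A, B, and C.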

For each of the update equations (2) to (4), when x̂_ij = x_ij and ŷ_kj = y_kj hold for all users i, products j, and attributes k, the left-hand side and the right-hand side coincide. The variable δ, which holds the maximum change caused by the updates, then becomes no greater than the threshold ε, so the updates stop. Furthermore, if for a given user i we have x̂_ij < x_ij and ŷ_kj < y_kj for all j and k, then in the update of equation (2) the numerator on the right-hand side is larger than the denominator. The element a_ir is therefore updated to a value larger than its current one, that is, in the direction that increases x̂_ij and ŷ_kj.

FIG. 14 shows a flowchart of the feature matrix estimation processing performed by the feature matrix estimation unit 40 and the iteration determination unit 42.

First, in step S100, the user feature matrix A stored in the user feature table 86, the product feature matrix B stored in the product feature table 87, and the attribute feature matrix C stored in the attribute feature table 88 are each initialized. The threshold ε of the end condition and the maximum number of iterations are also set.

Next, in step S102, the variable δ, which holds the maximum change caused by the updates and is used to judge the end condition, is initialized.

ステップＳ１０４では、ユーザ商品情報テーブル８１に格納されているユーザ商品行列Ｘ、属性商品情報テーブル８２に格納されている属性商品行列Ｙ、ユーザ属性情報テーブル８３に格納されているユーザ属性行列Ｖ、属性比率情報テーブル８５に格納されている属性比率行列Ｐ、ユーザ特徴テーブル８６に格納されているユーザ特徴行列Ａ、商品特徴テーブル８７に格納されている商品特徴行列Ｂ、及び属性特徴テーブル８８に格納されている属性特徴行列Ｃに基づいて、上記（２）式に従って、ユーザ特徴行列Ａを更新し、ユーザ特徴テーブル８６に格納する。そして、ユーザ特徴テーブル８６に格納されていた更新前のユーザ特徴行列Ａの要素の値ａ_ｉｒ ^ｏｌｄと、上記式（２）に従って特徴行列推定部４０によって更新された更新後のユーザ特徴行列Ａの要素の値ａ_ｉｒ ^ｎｅｗの差の絶対値の最大値ｍａｘ_ｉｒ｜ａ_ｉｒ ^ｏｌｄ−ａ_ｉｒ ^ｎｅｗ｜がδより大きければ、δ←｜ａ_ｉｒ ^ｏｌｄ−ａ_ｉｒ ^ｎｅｗ｜と更新する。 In step S104, the user product matrix X stored in the user product information table 81, the attribute product matrix Y stored in the attribute product information table 82, the user attribute matrix V stored in the user attribute information table 83, the attribute The attribute ratio matrix P stored in the ratio information table 85, the user feature matrix A stored in the user feature table 86, the product feature matrix B stored in the product feature table 87, and the attribute feature table 88 are stored. Based on the attribute feature matrix C, the user feature matrix A is updated according to the above equation (2) and stored in the user feature table 86. Then, the element value a _ir ^old of the user feature matrix A before update stored in the user feature table 86 and the updated user feature matrix A updated by the feature matrix estimation unit 40 according to the above equation (2). If the maximum value max _ir | a _ir ^old −a _ir ^new | of the absolute value of the difference between the element values a _ir ^new is larger than δ, δ ← | a _ir ^old −a _ir ^new | is updated.

In step S106, the product feature matrix B is updated according to equation (3) above, based on the user product matrix X, the attribute product matrix Y, the user feature matrix A, the product feature matrix B, and the attribute feature matrix C, and is stored in the product feature table 87. Then, if the maximum absolute difference max_{j,r} |b_jr^old - b_jr^new| between the element values of the pre-update product feature matrix B that was stored in the product feature table 87 and the corresponding element values of the updated product feature matrix B is larger than δ, δ is updated as δ ← max_{j,r} |b_jr^old - b_jr^new|.

In step S108, the attribute feature matrix C is updated according to equation (4) above, based on the user attribute matrix V, the attribute ratio matrix P, and the user feature matrix A, and is stored in the attribute feature table 88. Then, if the maximum absolute difference max_{k,r} |c_kr^old - c_kr^new| between the element values of the pre-update attribute feature matrix C that was stored in the attribute feature table 88 and the corresponding element values of the updated attribute feature matrix C is larger than δ, δ is updated as δ ← max_{k,r} |c_kr^old - c_kr^new|.

In step S110, the iteration count is updated.

In step S112, the process ends if the iteration count exceeds the maximum number of iterations set in step S100, or if the variable δ representing the maximum change caused by the updates is smaller than the threshold ε set in step S100.
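The control flow of steps S100 to S112 can be sketched as follows. This is only an illustration under stated assumptions: equations (2) to (4) are not reproduced in this passage, so the sketch substitutes the standard multiplicative updates of plain two-factor NMF (X approximated by A B^T) and omits the attribute feature matrix C and step S108. The function names (`factorize`, `multiplicative_step`, and so on) are hypothetical. What it is meant to show is how δ is reset each pass, raised to the largest element change, and compared with ε and the iteration limit.

```python
import random


def matmul(P, Q):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def max_abs_change(old, new):
    """Largest |old - new| over all elements, the quantity compared with delta."""
    return max(abs(o - n) for ro, rn in zip(old, new) for o, n in zip(ro, rn))


def multiplicative_step(M, numer, denom, tiny=1e-12):
    """Element-wise M * numer / denom; non-negative inputs stay non-negative."""
    return [[m * n / (d + tiny) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, numer, denom)]


def factorize(X, R, eps=1e-6, max_iter=500, seed=0):
    rng = random.Random(seed)
    I, J = len(X), len(X[0])
    # S100: initialize the factors; eps and max_iter are the end-condition settings.
    A = [[rng.random() + 0.1 for _ in range(R)] for _ in range(I)]
    B = [[rng.random() + 0.1 for _ in range(R)] for _ in range(J)]
    count = 0
    while True:
        delta = 0.0  # S102: reset the maximum change for this pass
        # S104 stand-in (NOT equation (2)): A <- A * (X B) / (A B^T B)
        A_new = multiplicative_step(A, matmul(X, B),
                                    matmul(A, matmul(transpose(B), B)))
        delta = max(delta, max_abs_change(A, A_new))
        A = A_new
        # S106 stand-in (NOT equation (3)): B <- B * (X^T A) / (B A^T A)
        B_new = multiplicative_step(B, matmul(transpose(X), A),
                                    matmul(B, matmul(transpose(A), A)))
        delta = max(delta, max_abs_change(B, B_new))
        B = B_new
        count += 1  # S110: update the iteration count
        # S112: stop when the count exceeds the limit or delta falls below eps
        if count > max_iter or delta < eps:
            return A, B, count, delta
```

Tracking a single δ across all factor updates means that the loop stops only when no element of any factor moved by ε or more in the last pass.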

<Feature matrix processing unit 44>
The feature matrix processing unit 44 outputs the values of the updated tables stored in the storage unit 2 once the update process of the feature matrix estimation unit 40 has finished. This may be executed, for example, when a feature output request is input from the external device 200. To output all features, all rows of the user feature table 86, the product feature table 87, and the attribute feature table 88 are output. To use only the product features of one cluster, the request can, for example, take a cluster ID as its argument. The product ID field and the product feature value field of the rows having that cluster ID are output from the product feature table 87, and the 10 product IDs with the largest product feature values are then displayed in descending order, which yields the products that characterize the cluster.
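The cluster lookup described above can be sketched as follows. The row layout (dictionaries with `product_id`, `cluster_id`, and `feature_value` keys) and the function name `top_products` are assumptions made for illustration; the actual layout of the product feature table 87 is not specified in this passage.

```python
def top_products(product_feature_rows, cluster_id, n=10):
    """Return the n product IDs with the largest feature values in one cluster."""
    # Keep only the rows that belong to the requested cluster.
    rows = [r for r in product_feature_rows if r["cluster_id"] == cluster_id]
    # Sort by feature value, largest first, and keep the first n product IDs.
    rows.sort(key=lambda r: r["feature_value"], reverse=True)
    return [r["product_id"] for r in rows[:n]]


# Hypothetical table contents, for illustration only.
table = [
    {"product_id": "p1", "cluster_id": 1, "feature_value": 0.2},
    {"product_id": "p2", "cluster_id": 1, "feature_value": 0.9},
    {"product_id": "p3", "cluster_id": 2, "feature_value": 0.7},
    {"product_id": "p4", "cluster_id": 1, "feature_value": 0.5},
]
best = top_products(table, cluster_id=1, n=2)
```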

The output unit 4 outputs each feature output by the feature matrix processing unit 44 to the external device 200.

As described above, the data analysis apparatus according to the embodiment of the present invention calculates the attribute ratio matrix P, whose elements p_kk represent ratios, based on the user attribute matrix V and the attribute number vector W. It then estimates the user feature matrix A, the product feature matrix B, and the attribute feature matrix C based on the user product matrix X, the attribute product matrix Y, the user feature matrix A, the product feature matrix B, and the attribute feature matrix C, under the linear constraint in which the attribute feature matrix C is expressed by the attribute ratio matrix P, the user attribute matrix V, and the user feature matrix A. This provides the effect that the features of the entire user group, including the features of the user group whose members cannot be individually identified, can be analyzed.
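The role of the attribute ratio matrix P can be illustrated with a small sketch. The exact definitions are given by the equations earlier in the description and are not reproduced in this passage, so two assumptions are made here: that the diagonal element is p_kk = w_k / Σ_i v_ik (the count of attribute k in the non-identifiable group divided by the count in the identifiable group), and that the linear constraint has the dimensionally consistent form C = P V^T A. Both function names are hypothetical.

```python
def attribute_ratio_diagonal(V, W):
    """Assumed form: p_kk = w_k / (number of identifiable users with attribute k)."""
    K = len(W)
    counts = [sum(row[k] for row in V) for k in range(K)]
    # An attribute nobody in the identifiable group has gets ratio 0 (assumption).
    return [W[k] / counts[k] if counts[k] > 0 else 0.0 for k in range(K)]


def constrained_attribute_features(p_diag, V, A):
    """Assumed constraint C = P V^T A, with P diagonal; returns a K x R matrix."""
    K, R = len(p_diag), len(A[0])
    return [[p_diag[k] * sum(V[i][k] * A[i][r] for i in range(len(V)))
             for r in range(R)]
            for k in range(K)]


# Three identifiable users, two attributes, two clusters (made-up numbers).
V = [[1, 0], [1, 0], [0, 1]]               # user i has attribute k
W = [4, 3]                                 # non-identifiable users per attribute
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]   # user-to-cluster weights
p = attribute_ratio_diagonal(V, W)
C = constrained_attribute_features(p, V, A)
```

Under these assumptions, the cluster weights of each attribute are the summed weights of the identifiable users with that attribute, rescaled to the size of the non-identifiable group.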

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the embodiment described above shows an example in which clusters are extracted from the user product matrix and the attribute product matrix, but the invention is not limited to this example.

For example, the first individual group, whose members can be individually identified, may be something other than the first user group; the second individual group, whose members cannot be individually identified, may be something other than the second user group; and the objects may be something other than products. In that case, the inputs are an I × J individual object matrix X having elements x_ij that represent the degree of association between an individual i (1 ≦ i ≦ I, where I is an integer of 1 or more) in the first individual group and an object j (1 ≦ j ≦ J, where J is an integer of 1 or more), and a K × J attribute object matrix Y having elements y_kj that represent, for the second individual group, the degree of association between an attribute k of the individuals (1 ≦ k ≦ K, where K is an integer of 1 or more) and the object j. These may be decomposed into an I × R first feature matrix A having non-negative elements a_ir that represent the membership of the individual i in a cluster r (1 ≦ r ≦ R, where R is an integer of 1 or more), a J × R second feature matrix B having non-negative elements b_jr that represent the membership of the object j in the cluster r, and a K × R third feature matrix C having non-negative elements c_kr that represent the membership of the attribute k in the cluster r.
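The matrix sizes above fit together as follows. This is a sketch that assumes the usual NMF product form, in which each input matrix is approximated by the product of two feature matrices sharing the cluster dimension R; the precise objective function is defined by the equations earlier in the description.

```latex
\underbrace{X}_{I \times J} \approx \underbrace{A}_{I \times R}\,\underbrace{B^{\top}}_{R \times J},
\qquad
\underbrace{Y}_{K \times J} \approx \underbrace{C}_{K \times R}\,\underbrace{B^{\top}}_{R \times J},
\qquad
a_{ir} \ge 0,\; b_{jr} \ge 0,\; c_{kr} \ge 0 .
```

Because both approximations share the second feature matrix B, the objects are clustered consistently across the identifiable and non-identifiable groups.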

For example, the data may be a pair consisting of a matrix of the number of visits by each user to each visited place and a matrix of the number of visits to each visited place for each attribute. In general, the data analysis apparatus can extract clusters from any things that, like visited places, users, and categories, can each be identified by assigning an ID number and can be expressed as data in matrix form, provided that a correspondence exists between them, such as that between a user and the attribute to which the user belongs.
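As an illustration of putting such data into matrix form, the following sketch counts visit records into a matrix. The record format (pairs of a row ID and a place ID) and the function name `count_matrix` are assumptions for illustration. The column order is passed in explicitly so that the user-by-place matrix and the attribute-by-place matrix share the same columns.

```python
def count_matrix(records, col_ids):
    """Build a count matrix from (row_id, col_id) records.

    Returns the sorted row IDs and a matrix whose entry [r][c] is the number
    of records linking row r to column c, with columns ordered as in col_ids.
    """
    row_ids = sorted({row for row, _ in records})
    row_index = {rid: n for n, rid in enumerate(row_ids)}
    col_index = {cid: n for n, cid in enumerate(col_ids)}
    M = [[0] * len(col_ids) for _ in row_ids]
    for row, col in records:
        M[row_index[row]][col_index[col]] += 1
    return row_ids, M


places = ["museum", "park"]  # shared column order (hypothetical IDs)
user_visits = [("u1", "park"), ("u1", "park"), ("u2", "museum")]
attr_visits = [("age20s", "park"), ("age30s", "museum"), ("age30s", "park")]
users, X = count_matrix(user_visits, places)
attrs, Y = count_matrix(attr_visits, places)
```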

The values also need not be integers such as occurrence counts or purchase counts; in general, any real number of 0 or more may be used. The method according to the embodiment of the present invention is also applicable when there are three or more input matrices.

The operation of each component of the data analysis apparatus shown in FIG. 5 of the embodiment described above can also be built as a program, which can be installed and executed on a computer used as the data analysis apparatus, or distributed via a network.

DESCRIPTION OF SYMBOLS
1 Input unit
2 Storage unit
3 Calculation unit
4 Output unit
30 User product information processing unit
32 Attribute product information processing unit
34 User attribute information processing unit
36 Attribute number information processing unit
38 Attribute ratio information calculation unit
40 Feature matrix estimation unit
42 Iterative determination unit
44 Feature matrix processing unit
81 User product information table
82 Attribute product information table
83 User attribute information table
84 Attribute number information table
85 Attribute ratio information table
86 User feature table
87 Product feature table
88 Attribute feature table
100 Data analysis apparatus
200 External device

Claims

A data analysis apparatus that decomposes an I × J individual object matrix X having elements x_ij representing the degree of association between an individual i (1 ≦ i ≦ I, where I is an integer of 1 or more) included in a first individual group whose members can be individually identified and an object j (1 ≦ j ≦ J, where J is an integer of 1 or more), and a K × J attribute object matrix Y having elements y_kj representing, for a second individual group whose members cannot be individually identified, the degree of association between an attribute k of the individuals (1 ≦ k ≦ K, where K is an integer of 1 or more) and the object j,
into an I × R first feature matrix A having non-negative elements a_ir representing that the individual i belongs to a cluster r (1 ≦ r ≦ R, where R is an integer of 1 or more), a J × R second feature matrix B having non-negative elements b_jr representing that the object j belongs to the cluster r, and a K × R third feature matrix C having non-negative elements c_kr representing that the attribute k belongs to the cluster r, the data analysis apparatus comprising:
an attribute ratio information calculation unit that calculates, based on an I × K individual attribute matrix V having elements v_ik representing the degree of association between the individual i included in the first individual group and the attribute k, and a K-dimensional attribute vector W having elements w_k representing the number of individuals having the attribute k in the second individual group, an attribute ratio matrix P having elements p_kk representing the ratio between the number of individuals having the attribute k in the second individual group and the number of individuals having the attribute k in the first individual group;
a feature matrix estimation unit that estimates the first feature matrix A, the second feature matrix B, and the third feature matrix C based on the individual object matrix X, the attribute object matrix Y, the first feature matrix A, the second feature matrix B, and the third feature matrix C, under a linear constraint in which the third feature matrix C is expressed by the attribute ratio matrix P, the individual attribute matrix V, and the first feature matrix A; and
an iterative determination unit that repeats the estimation by the feature matrix estimation unit until a predetermined iteration end condition is satisfied.

The data analysis apparatus according to claim 1, wherein the elements x_ij of the individual object matrix X are non-negative, the elements y_kj of the attribute object matrix Y are non-negative, the elements a_ir of the first feature matrix A are non-negative, the elements b_jr of the second feature matrix B are non-negative, and the elements c_kr of the third feature matrix C are non-negative, and
wherein the feature matrix estimation unit estimates the first feature matrix A, the second feature matrix B, and the third feature matrix C by non-negative decomposition.

A data analysis apparatus that decomposes an I × J user product matrix X having elements x_ij representing the number of purchases of a product j (1 ≦ j ≦ J, where J is an integer of 1 or more) by a user i (1 ≦ i ≦ I, where I is an integer of 1 or more) included in a first user group whose members can be individually identified, and a K × J attribute product matrix Y having elements y_kj representing, for a second user group whose members cannot be individually identified, the number of purchases of the product j by users having an attribute k (1 ≦ k ≦ K, where K is an integer of 1 or more),
into an I × R user feature matrix A having non-negative elements a_ir representing that the user i belongs to a cluster r (1 ≦ r ≦ R, where R is an integer of 1 or more), a J × R product feature matrix B having non-negative elements b_jr representing that the product j belongs to the cluster r, and a K × R attribute feature matrix C having non-negative elements c_kr representing that the attribute k belongs to the cluster r, the data analysis apparatus comprising:
an attribute ratio information calculation unit that calculates, based on an I × K user attribute matrix V having elements v_ik representing the degree of association between the user i included in the first user group and the attribute k, and a K-dimensional attribute number vector W having elements w_k representing the number of users having the attribute k in the second user group, an attribute ratio matrix P having elements p_kk representing the ratio between the number of users having the attribute k in the second user group and the number of users having the attribute k in the first user group;
a feature matrix estimation unit that estimates the user feature matrix A, the product feature matrix B, and the attribute feature matrix C based on the user product matrix X, the attribute product matrix Y, the user feature matrix A, the product feature matrix B, and the attribute feature matrix C, under a linear constraint in which the attribute feature matrix C is expressed by the attribute ratio matrix P, the user attribute matrix V, and the user feature matrix A; and
an iterative determination unit that repeats the estimation by the feature matrix estimation unit until a predetermined iteration end condition is satisfied.

A data analysis method in a data analysis apparatus that decomposes an I × J individual object matrix X having elements x_ij representing the degree of association between an individual i (1 ≦ i ≦ I, where I is an integer of 1 or more) included in a first individual group whose members can be individually identified and an object j (1 ≦ j ≦ J, where J is an integer of 1 or more), and a K × J attribute object matrix Y having elements y_kj representing, for a second individual group whose members cannot be individually identified, the degree of association between an attribute k of the individuals (1 ≦ k ≦ K, where K is an integer of 1 or more) and the object j,
into an I × R first feature matrix A having non-negative elements a_ir representing that the individual i belongs to a cluster r (1 ≦ r ≦ R, where R is an integer of 1 or more), a J × R second feature matrix B having non-negative elements b_jr representing that the object j belongs to the cluster r, and a K × R third feature matrix C having non-negative elements c_kr representing that the attribute k belongs to the cluster r, the data analysis method comprising:
a step in which an attribute ratio information calculation unit calculates, based on an I × K individual attribute matrix V having elements v_ik representing the degree of association between the individual i included in the first individual group and the attribute k, and a K-dimensional attribute vector W having elements w_k representing the number of individuals having the attribute k in the second individual group, an attribute ratio matrix P having elements p_kk representing the ratio between the number of individuals having the attribute k in the second individual group and the number of individuals having the attribute k in the first individual group;
a step in which a feature matrix estimation unit estimates the first feature matrix A, the second feature matrix B, and the third feature matrix C based on the individual object matrix X, the attribute object matrix Y, the first feature matrix A, the second feature matrix B, and the third feature matrix C, under a linear constraint in which the third feature matrix C is expressed by the attribute ratio matrix P, the individual attribute matrix V, and the first feature matrix A; and
a step in which an iterative determination unit repeats the estimation by the feature matrix estimation unit until a predetermined iteration end condition is satisfied.

A program for causing a computer to function as each unit of the data analysis apparatus according to any one of claims 1 to 3.