JP5965011B2

JP5965011B2 - Method and apparatus for relational model determination

Info

Publication number: JP5965011B2
Application number: JP2015051733A
Authority: JP
Inventors: ロフン; シュンチェンリュウ; 遼平藤巻; 慎二中台
Original assignee: NEC China Co Ltd
Current assignee: NEC China Co Ltd
Priority date: 2014-03-28
Filing date: 2015-03-16
Publication date: 2016-08-03
Anticipated expiration: 2035-03-16
Also published as: CN104951642A; JP2015191667A

Description

The present disclosure relates to the technical field of statistics, and in particular to a method and an apparatus for model selection of a relational model.

With the continuous development of statistical methods, modeling relational information between objects has become a topic of considerable interest. Various kinds of relational information exist between objects, for example contact information between people in a surveyed population, or link information between pages on the Internet. Such relational information expresses correlations between objects within one category or across multiple categories, and analyzing it can yield more valuable information. Applications based on relational information are therefore increasingly encouraged; one of them is relation clustering, in which different sample data are clustered according to relational information. Relation clustering usually relies on a relational model. For example, if a movie company wishes to form an overall evaluation of the movie series currently being shown on the basis of customer feedback, it collects the scores that a group of users has given to those movies and uses a relational model to group the users and the movies into different sample categories. In this way, users, movies, and scores are clustered simultaneously, and analysis of the clustering result yields useful information about the movies, such as the characteristics of their audiences. In actual clustering, non-relational information such as attribute information is generally used together with relational information. Combining relational information with non-relational information in clustering (for example, determining a relational model according to both relational and non-relational information) has therefore become an important problem in the study of relation clustering.

In practice, a relational model is determined by latent variables and model parameters, or by the variational distributions of the latent variables and model parameters. A latent variable is a variable that cannot be observed directly but can be derived from sample data. The variational distribution of a latent variable describes the probability that sample data are clustered into the corresponding category, whereas the model parameters describe the parameters of the sub-model under each category. The paper "Regression-based latent factor models" [Deepak Agarwal et al., Proceedings of KDD '09, 2009] proposes a method for determining latent variables and model parameters. In that method, a log likelihood is first obtained according to sample data, two groups of sample attributes, two groups of latent variables, and model parameters. An objective function is then given according to the log likelihood, and the latent variables and model parameters that make the objective function converge are determined by sampling. These latent variables and model parameters may serve as the latent variables and model parameters for determining the relational model.

In the course of implementing the present disclosure, the inventors have found that the prior art has at least the following problems.

Because the objective function is determined from the log likelihood alone, a relational model determined by this objective function cannot automatically obtain the optimal model structure and parameters, and model selection is very complicated. In addition, determining by sampling the latent variables and model parameters that make the objective function converge is inefficient, and the accuracy of the resulting relational model is low.

To solve the problems in the prior art, embodiments of the present disclosure provide a method and an apparatus for relational model determination. The technical solutions are as follows.

According to a first aspect, a method for relational model determination is provided, the method comprising:
acquiring a log likelihood, a normalization term, and the logarithm of the variational distribution of each of the latent variables, each determined according to sample data, at least two groups of sample attributes, at least two groups of latent variables, and model parameters;
determining an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables; and
determining the variational distribution of each of the latent variables and the model parameters that make the objective function converge, and determining a relational model according to the variational distribution of each of the latent variables and the model parameters that make the objective function converge.

With reference to the first aspect, in a first possible implementation of the first aspect, the log likelihood determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters is

[formula]

where log p(·) denotes the log likelihood; p denotes a joint probability density function; [formula] denotes the sample data; N_r denotes the number of row samples; N_c denotes the number of column samples; A^R denotes the set of row sample attributes; A^C denotes the set of column sample attributes; Z^R denotes the row latent variable matrix; Z^c denotes the column latent variable matrix; and θ denotes the set of model parameters. The model parameters include α, β, φ, η, and ξ, where α and β are the mixing ratios of the rows and the columns, respectively, φ denotes the sub-model parameters in each sample category, η denotes the model parameters of the row sample attributes in each sample category, and ξ denotes the model parameters of the column sample attributes in each sample category.

With reference to the first aspect, in a second possible implementation of the first aspect, the normalization term determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters is

[formula]

where N_r denotes the number of row samples; N_c denotes the number of column samples; K_r denotes the number of row sample categories; K_c denotes the number of column sample categories; [formula] denotes an approximation of the variational distribution of the latent variables; [formula] denotes the row latent variable describing the membership of the i-th row sample data in the p-th row sample category; [formula] denotes the column latent variable describing the membership of the j-th column sample data in the q-th column sample category; α and β are the mixing ratios of the rows and the columns, respectively; D_α denotes the dimension of α; D_β denotes the dimension of β; D_pq denotes the dimension of the sub-model parameters in the sample category of the p-th row and the q-th column; η_p denotes the model parameters of the row sample attributes in the p-th row sample category; [formula] denotes the dimension of η_p; ξ_q denotes the model parameters of the column sample attributes in the q-th column sample category; [formula] denotes the dimension of ξ_q; and L(a, b) = log b + (a - b)/b, where in [formula] a denotes [formula] and b denotes [formula], in [formula] a denotes [formula] and b denotes [formula], and in [formula] a denotes [formula] and b denotes [formula].
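
The function L(a, b) = log b + (a - b)/b used in the normalization term is the tangent line of the logarithm at b, evaluated at a. Because the logarithm is concave, L(a, b) is never smaller than log a and equals it when a = b. The following Python sketch only illustrates this property; the name `linear_log_bound` and the sample values are assumptions made for illustration, and the quantities actually substituted for a and b are those given by the formulas of the specification.

```python
import math


def linear_log_bound(a: float, b: float) -> float:
    """Return L(a, b) = log b + (a - b) / b (hypothetical helper name)."""
    if b <= 0:
        raise ValueError("b must be positive")
    return math.log(b) + (a - b) / b


# The tangent line of a concave function lies on or above the function,
# so L(a, b) >= log a for every positive a and b.
for a in (0.5, 1.0, 2.0, 10.0):
    for b in (0.5, 1.0, 3.0):
        assert linear_log_bound(a, b) >= math.log(a) - 1e-12
```

Replacing a logarithm by such a linear function keeps the expression linear in a, which is a common reason for using this kind of bound.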

With reference to the first aspect, in a third possible implementation of the first aspect, the logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters, are log q(Z^R) and log q(Z^c), where q(Z^R) denotes the variational distribution of the row latent variable matrix Z^R and q(Z^c) denotes the variational distribution of the column latent variable matrix Z^c.
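
The terms log q(Z^R) and log q(Z^c) enter the objective function through their expected values. As a minimal sketch, assume that q factorizes over samples and that each sample has a categorical distribution over the sample categories; this factorization is an assumption made here for illustration and is not stated in the text above. Under that assumption the expected value of log q is a sum of q·log q terms. The function name `expected_log_q` is hypothetical.

```python
import math


def expected_log_q(q_rows):
    """Return sum_i sum_k q[i][k] * log q[i][k] for a factorized categorical q.

    q_rows[i][k] is the assumed probability that sample i belongs to category k.
    """
    total = 0.0
    for row in q_rows:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of q must sum to 1")
        for prob in row:
            if prob > 0.0:  # the term 0 * log 0 is taken as 0
                total += prob * math.log(prob)
    return total


# Two samples, two categories: one undecided sample and one certain sample.
q_row_demo = [[0.5, 0.5], [1.0, 0.0]]
value = expected_log_q(q_row_demo)
```

The same helper could be applied separately to the row distribution and to the column distribution.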

With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, determining the objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables comprises determining the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the objective function [formula] determined according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables is

[formula]

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, determining the variational distribution of each of the latent variables and the model parameters that make the objective function converge comprises:
acquiring an updated variational distribution of each of the latent variables and updated model parameters; and
determining, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges, and, if the objective function does not converge, re-acquiring the updated variational distribution of each of the latent variables and the updated model parameters until the variational distribution of each of the latent variables and the model parameters that make the objective function converge are obtained.
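
The repeat-until-convergence procedure of the sixth implementation can be sketched as a generic loop. Everything named here is hypothetical: `update` stands in for acquiring the updated variational distributions and model parameters, `objective` for evaluating the objective function, and `converged` for the convergence test. The `max_iter` guard is an added safety assumption and is not part of the text above. The demo uses a toy scalar problem, not a relational model.

```python
def fit_until_converged(state, update, objective, converged, max_iter=1000):
    """Repeat `update` until `converged(current, previous)` reports convergence."""
    previous = objective(state)
    for _ in range(max_iter):
        state = update(state)          # re-acquire updated quantities
        current = objective(state)     # objective for the updated quantities
        if converged(current, previous):
            return state, current
        previous = current
    raise RuntimeError("objective did not converge within max_iter updates")


# Toy demo: each update moves x halfway toward 3, which maximizes -(x - 3)^2.
final_state, final_value = fit_until_converged(
    state=0.0,
    update=lambda x: x + 0.5 * (3.0 - x),
    objective=lambda x: -(x - 3.0) ** 2,
    converged=lambda cur, prev: abs(cur - prev) < 1e-6,
)
```

Passing the convergence test as a parameter keeps the loop independent of how the distance between successive objective values is measured.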

With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, acquiring the updated variational distribution of each of the latent variables and the updated model parameters comprises:
alternately updating the variational distribution of each of the latent variables by using the following formula

[formula]

until an updated variational distribution of each of the latent variables that makes the objective function converge is obtained; and
updating the model parameters, according to the updated variational distribution of each of the latent variables that makes the objective function converge, by using the following formula

[formula]

where [formula], t denotes the current update, and t - 1 denotes the previous update or the initial setting.

With reference to the sixth possible implementation of the first aspect, in an eighth possible implementation of the first aspect, acquiring the updated variational distribution of each of the latent variables and the updated model parameters comprises:
updating the model parameters by using the following formula

[formula]

where [formula], so as to obtain the updated model parameters; and
alternately updating the variational distribution of each of the latent variables, according to the updated model parameters, by using the following formula

[formula]

where t denotes the current update and t - 1 denotes the previous update or the initial setting, so as to obtain an updated variational distribution of each of the latent variables that makes the objective function converge.
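
The seventh and eighth implementations differ only in the order of the two kinds of update. The sketch below makes that ordering explicit. The callbacks `alternate_q` (alternating update of the row and column variational distributions with the parameters held fixed) and `update_theta` (update of the model parameters) are hypothetical placeholders; the update formulas themselves are not reproduced, and the demo callbacks only record the order in which they are called.

```python
def step_q_then_theta(q, theta, alternate_q, update_theta):
    """Seventh implementation: refresh the variational distributions, then the parameters."""
    q = alternate_q(q, theta)
    theta = update_theta(q)
    return q, theta


def step_theta_then_q(q, theta, alternate_q, update_theta):
    """Eighth implementation: refresh the parameters, then the variational distributions."""
    theta = update_theta(q)
    q = alternate_q(q, theta)
    return q, theta


# Demo callbacks that only log which update ran.
calls = []


def demo_alternate_q(q, theta):
    calls.append("q")
    return q


def demo_update_theta(q):
    calls.append("theta")
    return 0.0


step_q_then_theta(None, 0.0, demo_alternate_q, demo_update_theta)
step_theta_then_q(None, 0.0, demo_alternate_q, demo_update_theta)
```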

With reference to any one of the sixth to eighth possible implementations of the first aspect, in a ninth possible implementation of the first aspect, determining, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges comprises:
comparing whether the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters and the previously obtained objective function, which was determined according to the previously updated variational distribution of each of the latent variables and the previously updated model parameters, is smaller than a threshold; and
determining that the objective function converges if that distance is smaller than the threshold.
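
A minimal sketch of this convergence test follows. It assumes that the "distance" between two objective values is their absolute difference, which the text does not specify; the function name and the default threshold are likewise illustrative assumptions.

```python
def objective_converged(current: float, previous: float, threshold: float = 1e-6) -> bool:
    """Return True when |current - previous| is smaller than the threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return abs(current - previous) < threshold
```

For example, `objective_converged(-10.05, -10.0, threshold=0.1)` is true, while the same pair with the default threshold is not.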

According to a second aspect, an apparatus for relational model determination is provided, the apparatus comprising:
an acquisition module configured to acquire a log likelihood, a normalization term, and the logarithm of the variational distribution of each of the latent variables, each determined according to sample data, at least two groups of sample attributes, at least two groups of latent variables, and model parameters;
a first determination module configured to determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
a second determination module configured to determine the variational distribution of each of the latent variables and the model parameters that make the objective function converge; and
a third determination module configured to determine a relational model according to the variational distribution of each of the latent variables and the model parameters that make the objective function converge.
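
The division of the apparatus into four modules can be pictured as a small pipeline. The sketch below is only an illustration of that structure: the class name, the field names, and the toy callables in the demo are all assumptions, and none of them implements the actual formulas.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RelationalModelApparatus:
    acquire: Callable[[Any], Any]                              # acquisition module
    build_objective: Callable[[Any], Callable[[Any], float]]   # first determination module
    optimize: Callable[[Callable[[Any], float]], Any]          # second determination module
    build_model: Callable[[Any], Any]                          # third determination module

    def run(self, data: Any) -> Any:
        terms = self.acquire(data)               # terms that make up the objective
        objective = self.build_objective(terms)  # objective function built from the terms
        solution = self.optimize(objective)      # quantities at which the search stops
        return self.build_model(solution)        # resulting model


# Toy demo: the "objective" is -(x - target)^2, searched over the integers 0..9.
apparatus = RelationalModelApparatus(
    acquire=lambda data: data,
    build_objective=lambda target: (lambda x: -(x - target) ** 2),
    optimize=lambda objective: max(range(10), key=objective),
    build_model=lambda solution: {"solution": solution},
)
result = apparatus.run(4)
```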

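With reference to the second aspect, in a first possible implementation of the second aspect, the log likelihood acquired by the acquisition module is

[formula]

where [formula] denotes the sample data; N_r denotes the number of row samples; N_c denotes the number of column samples; A^R denotes the set of row sample attributes; A^C denotes the set of column sample attributes; Z^R denotes the row latent variable matrix; Z^c denotes the column latent variable matrix; and θ denotes the set of model parameters. The model parameters include α, β, φ, η, and ξ, where α and β are the mixing ratios of the rows and the columns, respectively, φ denotes the sub-model parameters in each sample category, η denotes the model parameters of the row sample attributes in each sample category, and ξ denotes the model parameters of the column sample attributes in each sample category.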

With reference to the second aspect, in a second possible implementation of the second aspect, the normalization term acquired by the acquisition module is

[formula]

where [formula] denotes an approximation of the variational distribution of the latent variables; [formula] denotes the dimension of ξ_q; and L(a, b) = log b + (a - b)/b, where in [formula] a denotes [formula] and b denotes [formula], in [formula] a denotes [formula] and b denotes [formula], and in [formula] a denotes [formula] and b denotes [formula].

With reference to the second aspect, in a third possible implementation of the second aspect, the logarithms of the variational distributions of the latent variables acquired by the acquisition module are log q(Z^R) and log q(Z^c), where q(Z^R) denotes the variational distribution of the row latent variables Z^R and q(Z^c) denotes the variational distribution of the column latent variables Z^c.

With reference to the second aspect or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the first determination module is configured to determine the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.

With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the objective function [formula] determined by the first determination module is

[formula]

With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the second determination module comprises:
an acquisition unit configured to acquire an updated variational distribution of each of the latent variables and updated model parameters; and
a discrimination unit configured to determine, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges;
wherein the acquisition unit is configured to, if the objective function does not converge, re-acquire the updated variational distribution of each of the latent variables and the updated model parameters until the variational distribution of each of the latent variables and the model parameters that make the objective function converge are obtained.

With reference to the sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the acquisition unit comprises:
a first update subunit configured to alternately update the variational distribution of each of the latent variables by using the following formula

[formula]

until an updated variational distribution of each of the latent variables that makes the objective function converge is obtained; and
a second update subunit configured to update the model parameters, according to the updated variational distribution of each of the latent variables that makes the objective function converge, by using the following formula

[formula]

where [formula], t denotes the current update, and t - 1 denotes the previous update or the initial setting, so as to obtain the updated model parameters.

With reference to the sixth possible implementation of the second aspect, in an eighth possible implementation of the second aspect, the acquisition unit comprises:
a third update subunit configured to update the model parameters by using the following formula

[formula]

where [formula], so as to obtain the updated model parameters; and
a fourth update subunit configured to alternately update the variational distribution of each of the latent variables, according to the updated model parameters, by using the following formula

[formula]

where t denotes the current update and t - 1 denotes the previous update or the initial setting, so as to obtain an updated variational distribution of each of the latent variables that makes the objective function converge.

With reference to any one of the sixth to eighth possible implementations of the second aspect, in a ninth possible implementation of the second aspect, the discrimination unit comprises:
a comparison subunit configured to compare whether the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters and the previously obtained objective function, which was determined according to the previously updated variational distribution of each of the latent variables and the previously updated model parameters, is smaller than a threshold; and
a discrimination subunit configured to determine that the objective function converges if that distance is smaller than the threshold.

The technical solutions provided by the embodiments of the present disclosure have the following advantages.

The objective function is determined according to a log likelihood, a normalization term, and the logarithm of the variational distribution of each of the latent variables, each determined according to sample data, at least two groups of sample attributes, at least two groups of latent variables, and model parameters. The relational model is then determined according to the variational distribution of each of the latent variables and the model parameters that make the objective function converge. In this way, the efficiency and accuracy of relational model determination are improved. Furthermore, using the normalization term allows the complexity of the model to be controlled automatically, which improves the efficiency of model determination.

To describe the technical solutions in the embodiments of the present disclosure more clearly, the accompanying drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and a person skilled in the art may derive other drawings from them without creative effort.

A flowchart of a method for relational model determination according to Embodiment 1 of the present disclosure.
A flowchart of a method for relational model determination according to Embodiment 2 of the present disclosure.
A schematic structural diagram of an apparatus for relational model determination according to Embodiment 3 of the present disclosure.
A schematic structural diagram of the second determination module according to Embodiment 3 of the present disclosure.
A schematic structural diagram of an acquisition unit according to Embodiment 3 of the present disclosure.
A schematic structural diagram of another acquisition unit according to Embodiment 3 of the present disclosure.
A schematic structural diagram of the discrimination unit according to Embodiment 3 of the present disclosure.

To make the objectives, technical solutions, and advantages of the present disclosure clearer, embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.

(Embodiment 1)
This embodiment of the present disclosure provides a method for relational model determination. Referring to FIG. 1, the method comprises the following steps.

Step 101: Acquire a log likelihood, a normalization term, and the logarithm of the variational distribution of each of the latent variables, each determined according to sample data, at least two groups of sample attributes, at least two groups of latent variables, and model parameters.

As an optional embodiment, the log likelihood determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters is

[formula]

where log p(·) denotes the log likelihood; p denotes a joint probability density function; [formula] denotes the sample data; N_r denotes the number of row samples; N_c denotes the number of column samples; A^R denotes the set of row sample attributes; A^C denotes the set of column sample attributes; Z^R denotes the row latent variable matrix; Z^c denotes the column latent variable matrix; and θ denotes the set of model parameters. The model parameters include α, β, φ, η, and ξ, where α and β are the mixing ratios of the rows and the columns, respectively, φ denotes the sub-model parameters in each sample category, η denotes the model parameters of the row sample attributes in each sample category, and ξ denotes the model parameters of the column sample attributes in each sample category.

As an optional embodiment, the normalization term determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters is

[formula]

where [formula] denotes an approximation of the variational distribution of the latent variables; [formula] denotes the column latent variable describing the membership of the j-th column sample data in the q-th column sample category; α and β are the mixing ratios of the rows and the columns, respectively; D_α denotes the dimension of α; D_β denotes the dimension of β; D_pq denotes the dimension of the sub-model parameters in the sample category of the p-th row and the q-th column; η_p denotes the model parameters of the row sample attributes in the p-th row sample category; [formula] denotes the dimension of η_p; ξ_q denotes the model parameters of the column sample attributes in the q-th column sample category; [formula] denotes the dimension of ξ_q; and L(a, b) = log b + (a - b)/b, where in [formula] a denotes [formula] and b denotes [formula], in [formula] a denotes [formula] and b denotes [formula], and in [formula] a denotes [formula] and b denotes [formula].

In an optional embodiment, the logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters, are log q(Z^R) and log q(Z^c), where q(Z^R) denotes the variational distribution of the row latent variable Z^R and q(Z^c) denotes the variational distribution of the column latent variable Z^c.

Step 102: Determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable.

In an optional embodiment, determining the objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable includes determining the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each latent variable.

In an optional embodiment, the objective function [symbol not reproduced] determined according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each latent variable is:

[equation not reproduced]

Step 103: Determine the variational distribution of each latent variable and the model parameters at which the objective function converges, and determine the relational model according to those variational distributions and model parameters.

In an optional embodiment, determining the variational distribution of each latent variable and the model parameters at which the objective function converges includes:
obtaining an updated variational distribution of each latent variable and updated model parameters; and
determining, according to the updated variational distributions and the updated model parameters, whether the objective function has converged and, if it has not, obtaining the updated variational distributions and updated model parameters again until the variational distributions and model parameters at which the objective function converges are obtained.

In an optional embodiment, obtaining an updated variational distribution of each latent variable and updated model parameters includes:
alternately updating the variational distribution of each latent variable by using the following formulas until the updated variational distributions at which the objective function converges are obtained:

[equation not reproduced]

and updating the model parameters according to those updated variational distributions by using the following formulas to obtain the updated model parameters:

[equation not reproduced]

where [equation not reproduced], t denotes the current update, and t-1 denotes the previous update or the initial setting.

In an optional embodiment, obtaining an updated variational distribution of each latent variable and updated model parameters includes:
updating the model parameters by using the following formulas to obtain the updated model parameters:

[equation not reproduced]

where [equation not reproduced]; and
alternately updating the variational distribution of each latent variable according to the updated model parameters by using the following formulas to obtain the updated variational distributions at which the objective function converges:

[equation not reproduced]

where t denotes the current update and t-1 denotes the previous update or the initial setting.

In an optional embodiment, determining whether the objective function has converged according to the updated variational distribution of each latent variable and the updated model parameters includes:
comparing the distance between the objective function determined according to the updated variational distributions and updated model parameters and the previously obtained objective function, which was determined according to the previously updated variational distributions and model parameters, against a threshold; and
determining that the objective function has converged if that distance is less than the threshold.
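The iterate-until-converged procedure of Step 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the update rules of the disclosure: `update_step` is a hypothetical callable standing in for the variational-distribution and model-parameter updates, the "distance" between successive objective values is assumed to be their absolute difference, and `threshold` and `max_updates` are illustrative names.

```python
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def has_converged(current: float, previous: float, threshold: float) -> bool:
    """True when the distance between two objective values is below the threshold."""
    return abs(current - previous) < threshold


def run_until_converged(
    state: State,
    update_step: Callable[[State], Tuple[State, float]],
    threshold: float = 1e-6,
    max_updates: int = 1000,
) -> Tuple[State, float, int]:
    """Repeat update_step until the objective changes by less than threshold.

    update_step (hypothetical) returns the updated state and its objective value.
    max_updates is a safety cap added for this sketch.
    """
    previous = float("-inf")  # no previous objective before the first update
    objective = previous
    for n_updates in range(1, max_updates + 1):
        state, objective = update_step(state)
        if has_converged(objective, previous, threshold):
            return state, objective, n_updates
        previous = objective
    return state, objective, max_updates
```

The first pass can never report convergence because there is no earlier objective value to compare against.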

In the method according to this embodiment of the present disclosure, the objective function is determined according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable, all of which are determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters. The relational model is then determined according to the variational distributions and model parameters at which the objective function converges. This improves the efficiency and accuracy of relational model determination. Furthermore, because the normalization term allows the complexity of the model to be controlled automatically, the efficiency of model determination is improved.

(Embodiment 2)
An embodiment of the present disclosure provides a method for relational model determination. Building on the method embodiment above, the method according to this embodiment is described in detail. Referring to FIG. 2, the method includes the following steps.

Step 201: Obtain the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable, each determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters.

The content and dimensionality of the sample data are not limited in this embodiment. In a specific implementation, the sample data may be scores given by multiple users to multiple movies. In this case the sample data is two-dimensional; that is, the scores are indexed along two dimensions (users and movies). The sample data may nevertheless have other content and dimensionality.

For ease of understanding, the sample data is described using the following example. The sample data is represented as a 5 × 5 matrix. The rows of the matrix correspond to user 1 through user 5, and the columns correspond to movie 1 through movie 5. Any element X_ij of the matrix is the score given by user i to movie j, where 1 ≤ i ≤ 5, 1 ≤ j ≤ 5, and i and j are integers.
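The 5 × 5 example can be written down concretely as below. The score values are invented purely for illustration (the text gives none), and the helper `score` is a hypothetical name; the only facts taken from the text are the shape, the row/column meaning, and the 1-based indices i and j.

```python
import numpy as np

# Hypothetical scores: rows are user 1..5, columns are movie 1..5.
X = np.array([
    [5, 4, 1, 1, 2],
    [4, 5, 2, 1, 1],
    [1, 2, 5, 4, 4],
    [2, 1, 4, 5, 5],
    [1, 1, 5, 4, 5],
])


def score(X: np.ndarray, i: int, j: int) -> int:
    """Return X_ij, the score user i gave movie j (1-indexed, as in the text)."""
    n_rows, n_cols = X.shape
    if not (1 <= i <= n_rows and 1 <= j <= n_cols):
        raise IndexError("i and j must satisfy 1 <= i <= N_r and 1 <= j <= N_c")
    return int(X[i - 1, j - 1])
```

Here `score(X, 3, 4)` reads the entry in the third row and fourth column, that is, the score user 3 gave movie 4.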

Sample attributes include, but are not limited to, row sample attributes and column sample attributes. Their specific content is not limited in this embodiment. In a specific implementation, a sample attribute may be an attribute of the object to which the sample data corresponds. Continuing the example above, the row sample attributes may be attributes of the user corresponding to a row of sample data (e.g., age, gender), while the column sample attributes may be attributes of the movie corresponding to a column of sample data (e.g., movie type, start time). By determining the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable according to the sample attributes, relational information is combined with non-relational information for clustering. That is, the relational model is determined according to both relational and non-relational information.

Latent variables include, but are not limited to, row latent variables and column latent variables. Their specific content is not limited in this embodiment. In a specific implementation, the latent variables may be a latent variable matrix Z, in which any element Z_ip indicates that the i-th row of sample data belongs to the p-th row sample category.

Model parameters include, but are not limited to, the row mixing ratio, the column mixing ratio, and the sub-model parameters in each sample category. Their specific content is not limited in this embodiment. Taking sample data in matrix form as an example, the row mixing ratio is the ratio of the number of matrix rows in each sample category of the determined relational model to the total number of matrix rows. The column mixing ratio is the ratio of the number of matrix columns in each sample category of the determined relational model to the total number of matrix columns. The sub-model parameters in each sample category are the parameters of the data distribution in that sample category of the determined relational model.

It should be noted that the latent variables may be independent of, or dependent on, the model parameters. Because a dependency between the latent variables and the model parameters exists in practice, the embodiments of the present disclosure are described using the scenario in which such a dependency exists, so that the determined relational model is more accurate.

In the method according to this embodiment, to obtain the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable, each determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters, the following joint probability density function is first derived:

[equation not reproduced]

where p denotes the joint probability density function; X denotes the sample data; N_r denotes the number of row samples; N_c denotes the number of column samples; A^R denotes the set of row sample attributes; A^C denotes the set of column sample attributes; Z^R denotes the row latent variable matrix; Z^c denotes the column latent variable matrix; and θ denotes the set of model parameters. The model parameters include α, β, φ, η, and ξ: α and β are the row and column mixing ratios, respectively; φ denotes the sub-model parameters in each sample category; η denotes the model parameters of the row sample attributes in each sample category; and ξ denotes the model parameters of the column sample attributes in each sample category. In addition, K_r denotes the number of row sample categories; K_c denotes the number of column sample categories; X_ij denotes the sample data in row i, column j; φ_pq denotes the sub-model parameters in the sample category at row p, column q; Z^c_jq denotes the column latent variable describing the membership of the j-th column of sample data in the q-th column sample category; α_p denotes the row mixing ratio of the p-th row sample category; β_q denotes the column mixing ratio of the q-th column sample category; [symbol not reproduced] denotes the row sample attributes of the i-th row; [symbol not reproduced] denotes the column sample attributes of the j-th column; η_p denotes the model parameters of the row sample attributes in the p-th row sample category; and ξ_q denotes the model parameters of the column sample attributes in the q-th column sample category.

The joint probability density function above determines the probability density distribution of the relational model. That distribution may be determined by determining the model parameters α, β, φ, η, and ξ together with the latent variables Z^R and Z^c in the joint probability density function, and the relational model is determined in this way. To make the joint probability density function tractable, the logarithm of both sides of the following probability density equation is taken to obtain the log likelihood:

[equation not reproduced]

In particular, when the sample data X is expressed in matrix form, N_r denotes the number of row samples, N_c denotes the number of column samples, u denotes the dimension of the row sample attributes, v denotes the dimension of the column sample attributes, A^R is an N_r × u matrix, A^C is an N_c × v matrix, K_r denotes the number of row sample categories, and K_c denotes the number of column sample categories. Z^R is an N_r × K_r block variable matrix with elements Z^R_ip; Z^R_ip = 1 indicates that the i-th row of sample data belongs to the p-th row sample category. Z^c is an N_c × K_c block variable matrix with elements Z^c_jq; Z^c_jq = 1 indicates that the j-th column of sample data belongs to the q-th column sample category.

The row mixing ratio α is the ratio of the number of rows in each sample category of the relational data model to the total number of rows in the sample data matrix. The column mixing ratio β is the ratio of the number of columns in each sample category of the relational data model to the total number of columns in the sample data matrix. The model parameter φ of each sub-model is the parameter of the distribution that the sample data in the corresponding sample category of the relational model follows. For example, if the sample data in each sample category follows a Gaussian distribution, φ denotes the expected value μ and the variance δ of that Gaussian distribution. As another example, if the sample data in each sample category follows a Poisson distribution, φ denotes λ, which is both the expected value and the variance of the Poisson distribution. Note that the sample data in each sample category may follow a distribution other than those above; the distribution is not limited in this embodiment. The model parameter η of the row sample attributes in each sample category describes the attributes of the row objects corresponding to the samples in that category, and the model parameter ξ of the column sample attributes in each sample category describes the attributes of the column objects corresponding to the samples in that category.
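To make the block variable matrix and the mixing ratio concrete, the sketch below builds an indicator matrix from category labels and computes the mixing ratio as "members per category divided by total samples", as the text defines it. It assumes hard (one-hot) assignments with 0-based category labels; the function names and the example labels are hypothetical.

```python
import numpy as np


def one_hot_memberships(labels, n_categories: int) -> np.ndarray:
    """Build an N x K indicator matrix Z with Z[i, p] = 1 iff sample i is in category p."""
    labels = np.asarray(labels)
    Z = np.zeros((labels.size, n_categories), dtype=int)
    Z[np.arange(labels.size), labels] = 1
    return Z


def mixing_ratio(Z: np.ndarray) -> np.ndarray:
    """Fraction of samples in each category: column sums of Z over the sample count."""
    return Z.sum(axis=0) / Z.shape[0]


# Hypothetical assignment: users 1-2 in row category 0, users 3-5 in row category 1.
row_labels = [0, 0, 1, 1, 1]
Z_R = one_hot_memberships(row_labels, n_categories=2)  # shape (N_r, K_r) = (5, 2)
alpha = mixing_ratio(Z_R)                              # row mixing ratio
```

The same two helpers apply to the columns: passing column labels yields Z^c and the column mixing ratio β.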

In an optional embodiment, the normalization term determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters is:

[equation not reproduced]

where [symbol not reproduced] denotes the approximation of the variational distribution of the latent variables, D_{ξ_q} denotes the dimension of ξ_q, and L(a, b) = log b + (a - b)/b. The arguments a and b are specified separately for three expressions; neither those expressions nor their arguments are reproduced in this text. Thus, the normalization term may be expanded as follows:

[equation not reproduced]

In particular, when K_r denotes the number of row sample categories of the block variable matrix Z^R, D_α = D(α) = K_r - 1. When K_c denotes the number of column sample categories of the block variable matrix Z^c, D_β = D(β) = K_c - 1. If the sample data in each sample category follows a Gaussian distribution with expected value μ and variance δ, that is, a distribution with two parameters, then D_pq = D(φ_pq) = 2. If the sample data in each sample category follows a Poisson distribution whose expected value and variance are both λ, that is, a distribution with the single parameter λ, then D_pq = D(φ_pq) = 1.
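The dimension counts just stated can be tabulated with a few lines of code. This is only a bookkeeping sketch: the function and dictionary names are hypothetical, and only the two distribution families named in the text are covered.

```python
# Number of parameters per sample category for the distributions named in the text.
SUBMODEL_DIMENSION = {"gaussian": 2, "poisson": 1}


def mixing_ratio_dimension(n_categories: int) -> int:
    """D(alpha) = K_r - 1 or D(beta) = K_c - 1."""
    if n_categories < 1:
        raise ValueError("the number of sample categories must be at least 1")
    return n_categories - 1


def parameter_dimensions(k_r: int, k_c: int, family: str) -> dict:
    """Collect D_alpha, D_beta, and the per-category sub-model dimension D_pq."""
    if family not in SUBMODEL_DIMENSION:
        raise ValueError(f"unsupported distribution family: {family!r}")
    return {
        "D_alpha": mixing_ratio_dimension(k_r),
        "D_beta": mixing_ratio_dimension(k_c),
        "D_pq": SUBMODEL_DIMENSION[family],
    }
```

For example, `parameter_dimensions(3, 4, "gaussian")` gives D_alpha = 2, D_beta = 3, and D_pq = 2.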

Furthermore, the approximation [symbol not reproduced] of the variational distribution of the latent variables is not specifically limited in this embodiment. It includes, but is not limited to, the value of the variational distribution of the latent variables from the previous update or from the initial setting. For ease of understanding, this embodiment is described taking that value as the approximation. When the normalization term is determined for the first time, the approximation may be the value of the variational distribution of the latent variables obtained from the initial setting. When it is not the first time, the approximation may be the value of the variational distribution of the latent variables from the previous update.

Note that determining the normalization term allows the complexity of the determined relational model to be controlled automatically and improves the efficiency of determining the relational model.

In particular, the variational distribution of the latent variable Z^R may be expressed as follows:

[equation not reproduced]

The variational distribution of the latent variable Z^c may be expressed as follows:

[equation not reproduced]

Step 202: Determine an objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each latent variable.

The log likelihood is written in Step 201 in a factorized form. To make the log likelihood tractable, a Laplace approximation is applied to each factor to obtain a tight lower bound, for example the factorized information criterion (FIC) shown below:

[equation not reproduced]

where [equation not reproduced], and [symbol not reproduced] denotes the value of θ at which the FIC is maximal.

Furthermore, because the FIC involves the sample data X and the latent variables Z^R and Z^c, it would normally be solved with the expectation-maximization (EM) algorithm. However, because the relational model is determined by dependent latent variables, the conventional EM algorithm cannot be applied to solve the FIC. To make the FIC tractable, this embodiment scales the FIC as shown below to obtain an asymptotically uniform lower bound of the FIC:

[equation not reproduced]

where [equation not reproduced], and the function L is L(a, b) = log b + (a - b)/b.
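The function L(a, b) = log b + (a - b)/b is the tangent line of the logarithm at b, evaluated at a. Because the logarithm is concave, log a ≤ L(a, b) for all positive a and b, with equality when a = b. The short sketch below evaluates L and checks that inequality numerically; it illustrates only this property of L, not how the disclosure combines L with its other terms.

```python
import math


def L(a: float, b: float) -> float:
    """L(a, b) = log b + (a - b) / b, defined for b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    return math.log(b) + (a - b) / b


def gap(a: float, b: float) -> float:
    """L(a, b) - log a: non-negative for a, b > 0, and zero when a == b."""
    return L(a, b) - math.log(a)
```

The gap shrinks as b approaches a, so the linearization is tightest when the reference point b is close to the quantity a being bounded.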

The objective function [symbol not reproduced] is:

[equation not reproduced]

The objective function is thus determined by the steps above. To determine the relational model through the objective function, the method according to this embodiment further includes the following steps.

Step 203: Obtain an updated variational distribution of each latent variable and updated model parameters.

The initialization method is not limited in this embodiment. In a specific implementation, initialization may be performed stochastically; that is, the values of α, β, φ, η, and ξ are initialized at random. Other methods are also possible.

Furthermore, the variational distributions of the latent variables must be updated alternately, that is, the variational distribution of Z^c and that of Z^R are updated in turn, so that the previous variational distribution of Z^c is used when the current variational distribution of Z^R is computed, until the updated variational distributions at which the objective function converges are obtained. The condition for convergence of each variational distribution is not limited in this embodiment. In a specific implementation, for the row latent variable Z^R, the Euclidean distance between the current variational distribution and the previous variational distribution of the row latent variable may be computed. If the computed Euclidean distance is less than a distance threshold, the current variational distribution of the row latent variable is determined to have converged. The distance threshold may be set according to the actual situation and is not limited in this embodiment.

Besides determining whether each updated variational distribution has converged, the converged variational distributions may also be obtained by setting a number of updates. In this approach, in a specific implementation, when the number of updates reaches an update-count threshold, the variational distribution of each latent variable is deemed to have converged, and the updated variational distributions at which the objective function converges are obtained. The setting of the update-count threshold is not limited in this embodiment.
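The alternating scheme and its two stopping rules (a Euclidean-distance threshold and an update-count threshold) can be sketched as follows. The update formulas themselves are not available in this text, so `update_q_r` and `update_q_c` are hypothetical callables supplied by the caller; representing each variational distribution as a NumPy array, and requiring both distributions to pass the distance test, are also assumptions of this sketch.

```python
import numpy as np


def alternate_updates(q_r, q_c, update_q_r, update_q_c,
                      distance_threshold=1e-4, max_updates=100):
    """Alternately refresh q(Z^R) and q(Z^c) until both stop moving or the cap is hit."""
    if max_updates < 1:
        raise ValueError("max_updates must be at least 1")
    for n_updates in range(1, max_updates + 1):
        new_q_r = update_q_r(q_r, q_c)        # uses the previous q(Z^c)
        new_q_c = update_q_c(new_q_r, q_c)    # uses the q(Z^R) just computed
        # For arrays, the default norm is the Euclidean norm of all entries.
        moved_r = np.linalg.norm(new_q_r - q_r)
        moved_c = np.linalg.norm(new_q_c - q_c)
        q_r, q_c = new_q_r, new_q_c
        if moved_r < distance_threshold and moved_c < distance_threshold:
            break
    return q_r, q_c, n_updates
```

Whichever rule fires first ends the loop: the distance test when the distributions have stabilized, or the cap when the update count reaches `max_updates`.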

In an optional embodiment, in addition to the above method of obtaining the updated variational distribution of each latent variable and the updated model parameters, the method provided in this embodiment further includes the following way of obtaining them (without being limited to it):
obtaining the updated model parameters by using the following formulas:

[equation not reproduced]

and alternately updating the variational distribution of each latent variable according to the updated model parameters by using the following formulas to obtain the updated variational distributions at which the objective function converges:

[equation not reproduced]

where t denotes the current update and t-1 denotes the previous update or the initial setting.

When the updated variational distribution of each latent variable and the updated model parameters are obtained for the first time using the above formulas in Step 203, t-1 denotes the initial setting, so the parameters corresponding to t-1 are initial values. For example, in that case α_p^(t-1) in the above formulas denotes the initial value of α_p, and β_q^(t-1) denotes the initial value of β_q. The initialization method is not limited in this embodiment. In a specific implementation, a stochastic initialization method may be used to initialize q(Z^R) and q(Z^c). Other methods are also possible.
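One possible form of stochastic initialization is sketched below. It assumes that each variational distribution is stored as an N × K matrix whose i-th row holds the probabilities that sample i belongs to each of the K categories; that representation, the use of a flat Dirichlet draw, and the function name are assumptions of this sketch rather than details given in the text.

```python
import numpy as np


def random_variational_init(n_samples: int, n_categories: int, seed=None) -> np.ndarray:
    """Return an N x K matrix whose rows are random probability vectors."""
    rng = np.random.default_rng(seed)
    # A flat Dirichlet draw gives non-negative rows that each sum to one.
    return rng.dirichlet(np.ones(n_categories), size=n_samples)


# Example sizes only: 5 row samples in 2 categories, 5 column samples in 3 categories.
q_z_r = random_variational_init(n_samples=5, n_categories=2, seed=0)
q_z_c = random_variational_init(n_samples=5, n_categories=3, seed=1)
```

Fixing `seed` makes a run repeatable, which helps when comparing results across different numbers of categories.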

When it is not the first time that the updated variational distribution of each latent variable and the updated model parameters are obtained using the above formulas in Step 203, t-1 denotes the previous update, so the parameters corresponding to t-1 are the values from the previous update. For example, when they are obtained for the third time in Step 203, α_p^(t-1) in the above formulas denotes the value of α_p obtained the second time, and β_q^(t-1) denotes the value of β_q obtained the second time.

Furthermore, when the variational distribution of each latent variable is updated according to the updated model parameters, the variational distributions of Z^R and Z^c must likewise be updated alternately until the updated variational distributions at which the objective function converges are obtained.

加えて、特定の実施例において、行のサンプル成分の異なる数Ｋ_ｒ及び列のサンプル成分の異なる数Ｋ_ｃが設定されてよい。例えば、Ｋ_ｒの最小値がＫ_ｒｍｉｎとして設定され、一方でＫ_ｒの最大値がＫ_ｒｍａｘとして設定される。また、Ｋ_ｃの最小値がＫ_ｃｍｉｎとして設定され、一方でＫ_ｃの最大値がＫ_ｃｍａｘとして設定される。Ｋ_ｒとＫ_ｃの範囲内で、Ｋ_ｒとＫ_ｃの各値の組み合わせについて、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが取得される。 In addition, in certain embodiments, different numbers K_r of row sample components and different numbers K_c of column sample components may be set. For example, the minimum value of K_r is set to K_rmin and its maximum value to K_rmax; likewise, the minimum value of K_c is set to K_cmin and its maximum value to K_cmax. For each combination of values of K_r and K_c within these ranges, the updated variational distribution of each latent variable and the updated model parameters are obtained.

潜在変数の各々の更新された変分分布及び更新されたモデルパラメータの取得時において、目的関数の収束を可能にする潜在変数の各々の更新された変分分布が取得されるまで、潜在変数の各々の変分分布が最初に交互に更新されてよいこと、及び、更新されたモデルパラメータを取得するために、モデルパラメータが潜在変数の各々の更新された変分分布に応じて更新されることに留意すべきである。又は、更新されたモデルパラメータを取得するためにモデルパラメータが最初に更新されてよいこと、及び、目的関数の収束を可能にする潜在変数の各々の更新された変分分布を取得するために、潜在変数の各々の変分分布が更新されたモデルパラメータに応じて交互に更新されることに留意すべきである。つまり、潜在変数の各々の変分分布とモデルパラメータの更新順序はこの実施形態において限定されない。 Note that, when obtaining the updated variational distributions and the updated model parameters, the variational distributions of the latent variables may first be updated alternately until updated variational distributions that allow the objective function to converge are obtained, after which the model parameters are updated according to those distributions. Alternatively, the model parameters may be updated first, after which the variational distributions of the latent variables are updated alternately according to the updated model parameters. That is, the order in which the variational distributions and the model parameters are updated is not limited in this embodiment.

ステップ２０４：潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かを判別し、目的関数が収束しない場合には目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータを取得するまで潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを再取得すること。 Step 204: It is determined whether or not the objective function converges according to each updated variation distribution of the latent variables and the updated model parameters. If the objective function does not converge, the objective function can be converged. Reacquiring each updated variation distribution and updated model parameter of the latent variable until each variation distribution and model parameter of the latent variable is acquired.

特に、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かを判別することは、これに限定されないが、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と、潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを比較することと、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と前回取得された目的関数との間の距離が閾値より短い場合には目的関数が収束すると判別することと、
を含む。 Specifically, determining whether the objective function converges according to the updated variational distribution of each latent variable and the updated model parameters includes, but is not limited to:
comparing whether the distance between the objective function determined according to the updated variational distributions and updated model parameters, and the previously obtained objective function determined according to the previously updated variational distributions and model parameters, is less than a threshold; and
determining that the objective function has converged if that distance is less than the threshold.
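The convergence test above amounts to a single comparison. A minimal Python sketch follows, assuming the "distance" between two objective values is their absolute difference; the source does not fix the distance measure, and the function name is illustrative.

```python
def has_converged(current_objective, previous_objective, threshold):
    """Return True when two successive objective values are closer than threshold.

    Assumption: the distance is the absolute difference of the two values.
    """
    return abs(current_objective - previous_objective) < threshold
```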

閾値の大きさはこの実施形態において限定されない。特定の実施例では、異なる閾値は、サンプルデータの容量に応じて設定されて構わない。目的関数は潜在変数の各々の更新された変分分布及び更新されたモデルパラメータによって決定されるので、目的関数は対数尤度に連続的に近似されてよい。目的関数が収束する場合、対数尤度の値は目的関数の値に近似されるので、解けない対数尤度は解ける対数尤度に変換されてよく、リレーショナルモデルの決定はこのように実現される。 The magnitude of the threshold is not limited in this embodiment. In specific embodiments, different thresholds may be set according to the amount of sample data. Since the objective function is determined by the updated variational distribution of each latent variable and the updated model parameters, the objective function can approximate the log likelihood ever more closely. When the objective function converges, its value approximates the value of the log likelihood, so a log likelihood that cannot be solved directly may be converted into one that can be solved, and the relational model is determined in this way.

目的関数が収束しないと判別された場合、及び潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが再取得される場合、処理はステップ２０３に戻り、ステップ２０３に記述された方法において潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが再取得される。潜在変数の更新された変分分布及び更新されたモデルパラメータが初めて取得された場合、ステップ２０３の式においてｔ−１は初期値を示す。しかしながら、潜在変数の更新された変分分布及び更新されたモデルパラメータを再取得するためにステップ２０３に戻った場合、ステップ２０３の式においてｔ−１は前回の更新を示す。例えば、ステップ２０３の式を使用することにより潜在変数の更新された変分分布及び更新されたモデルパラメータが初めて取得された場合、上記の式におけるｔ−１に対応するパラメータは初期値であり、初めて取得された潜在変数の更新された変分分布及び更新されたモデルパラメータが取得される。初めて取得された潜在変数の更新された変分分布及び更新されたモデルパラメータが目的関数を収束させられない場合、初めて取得された潜在変数の更新された変分分布及び更新されたモデルパラメータは、ステップ２０３においてｔ−１に対応するパラメータの値として使用される。それから、潜在変数の更新された変分分布及び更新されたモデルパラメータが再取得され、再取得された潜在変数の更新された変分分布及び更新されたモデルパラメータが目的関数を収束させられるか否かが判断される。更新は、収束を可能にする潜在変数の各々の変分分布及びモデルパラメータが取得されるまでこのような方法で繰り返される。 If it is determined that the objective function has not converged, the process returns to step 203, and the updated variational distribution of each latent variable and the updated model parameters are reacquired by the method described in step 203. When they are obtained for the first time, t-1 in the formulas of step 203 denotes the initial values; when the process has returned to step 203 to reacquire them, t-1 denotes the previous update. For example, in the first execution of step 203 the parameters indexed by t-1 are the initial values, and the first updated variational distributions and model parameters are obtained from them. If these first results do not bring the objective function to convergence, they are used in step 203 as the values of the parameters indexed by t-1. The updated variational distributions and model parameters are then reacquired, and it is determined whether the reacquired results bring the objective function to convergence. The update is repeated in this manner until variational distributions and model parameters that allow the objective function to converge are obtained.
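The way the t-1 values are carried from one pass of step 203 to the next can be sketched as a generic loop. The Python sketch below is an illustration under stated assumptions, not the patented procedure: `update_step` and `objective` are hypothetical callables supplied by the caller, the state may be any object holding the variational distributions and model parameters, and the objective is also evaluated at the initial state so that the first pass has a value to compare against.

```python
def run_updates(initial_state, update_step, objective,
                threshold=1e-8, max_iter=1000):
    """Repeat update_step until the objective changes by less than threshold."""
    previous_state = initial_state  # plays the role of index t-1 on the first pass
    previous_value = objective(previous_state)
    for t in range(1, max_iter + 1):
        current_state = update_step(previous_state)  # uses only the t-1 values
        current_value = objective(current_state)
        if abs(current_value - previous_value) < threshold:
            return current_state, current_value, t
        # Not converged: the current results become the t-1 values of the next pass.
        previous_state, previous_value = current_state, current_value
    raise RuntimeError("objective did not converge within max_iter updates")
```

As a usage example, calling `run_updates(0.0, lambda x: 0.5 * (x + 3.0), lambda x: -(x - 3.0) ** 2)` drives the toy state toward 3.0, the maximizer of that toy objective.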

ステップ２０５：目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じてリレーショナルモデルを決定すること。 Step 205: Determine a relational model according to the variational distribution and model parameters of each of the latent variables that enable convergence of the objective function.

このステップにおいて、収束時の目的関数の値が対数尤度に近似すると、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じてリレーショナルモデルが決定されてよい。 In this step, when the value of the objective function at the time of convergence approximates the logarithmic likelihood, a relational model may be determined according to each variation distribution and model parameter of the latent variable that enables the objective function to converge.

更に、行のサンプルカテゴリの異なる数Ｋ_ｒ及び列のサンプルカテゴリの異なる数Ｋ_ｃが設定されてよい。また、Ｋ_ｒとＫ_ｃの各値の組み合わせについて、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが取得されると、目的関数の最大値を可能にするＫ_ｒとＫ_ｃは、目的関数の収束を可能にすることに基づいて選択されてよい。更に、リレーショナルモデルは、Ｋ_ｒとＫ_ｃの使用により計算された潜在変数の各々の変分分布及びモデルパラメータに応じて決定される。 Furthermore, different numbers K_r of row sample categories and different numbers K_c of column sample categories may be set. Once the updated variational distributions and updated model parameters have been obtained for each combination of values of K_r and K_c, the values of K_r and K_c that maximize the objective function may be selected from among the results for which the objective function converges. The relational model is then determined according to the variational distributions of the latent variables and the model parameters computed with the selected K_r and K_c.
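The selection of K_r and K_c described above can be sketched as a search over all combinations. In the Python sketch below, `fit` is a hypothetical callable that runs the updates to convergence for one (K_r, K_c) pair and returns the converged objective value; it is not defined in the source and stands in for steps 203 and 204.

```python
from itertools import product


def select_component_counts(k_r_values, k_c_values, fit):
    """Return (k_r, k_c, value) with the largest converged objective value."""
    best = None
    for k_r, k_c in product(k_r_values, k_c_values):
        value = fit(k_r, k_c)  # converged objective for this combination
        if best is None or value > best[2]:
            best = (k_r, k_c, value)
    return best
```

With a toy stand-in such as `fit = lambda k_r, k_c: -((k_r - 3) ** 2) - (k_c - 2) ** 2`, searching `range(1, 6)` and `range(1, 5)` selects K_r = 3 and K_c = 2.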

行のサンプルカテゴリの値Ｋ_ｒ及び列のサンプルカテゴリの値Ｋ_ｃは、決定されたリレーショナルモデルにおける行のサンプルカテゴリの値及び列のサンプルカテゴリの値と同じであってもよいし、異なっていてもよいことに留意すべきである。つまり、リレーショナルモデルの構造は、リレーショナルモデルを決定する処理の間に自動的に調整されてよい。 The row sample category value K _r and the column sample category value K _c may be the same as or different from the row sample category value and the column sample category value in the determined relational model. It should be noted that That is, the structure of the relational model may be automatically adjusted during the process of determining the relational model.

決定されたリレーショナルモデルは、データのクラスタリング及びデータのカテゴライゼーションに適用されてよい。決定されたリレーショナルモデルがデータのクラスタリングに適用された場合、リレーショナルモデルを決定する処理はクラスタリングデータの処理である。決定されたリレーショナルモデルがデータのカテゴライゼーションに適用された場合、リレーショナルモデルを決定するために更なる処理を行うことも必要とされる。データクラスタリング及びカテゴライゼーションの結果は顧客分析、生物分析、土地分析（geoanalysis）等に対して使用されてよいので、多くの社会的価値及び経済的価値が発生する。 The determined relational model may be applied to data clustering and data categorization. When the determined relational model is applied to data clustering, the process of determining the relational model is a process of clustering data. If the determined relational model is applied to data categorization, further processing is also required to determine the relational model. Since the results of data clustering and categorization may be used for customer analysis, biological analysis, geoanalysis, etc., many social and economic values are generated.

本開示のこの実施形態に応じた方法において、目的関数は、サンプルデータ、少なくとも２つのサンプル属性のグループ、少なくとも２つの潜在変数のグループ、及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、に応じて決定される。また、リレーショナルモデルは、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じて決定される。この方法において、リレーショナルモデル決定の効率及び精度は改善される。更に、正規化項を使用することによってモデルの複雑性は自動的に制御されてよいので、モデル決定の効率は改善される。 In the method according to this embodiment of the present disclosure, the objective function is determined according to a log likelihood (itself determined according to the sample data, at least two groups of sample attributes, at least two groups of latent variables, and the model parameters), a normalization term, and the logarithm of the variational distribution of each latent variable. The relational model is then determined according to the variational distributions of the latent variables and the model parameters that allow the objective function to converge. In this way, the efficiency and accuracy of relational model determination are improved. Furthermore, because the complexity of the model can be controlled automatically by using the normalization term, the efficiency of model determination is improved.

（実施形態３）
本開示の実施形態は、リレーショナルモデル決定用の装置を提供する。図３を参照すると、装置は、
サンプルデータ、少なくとも２つのサンプル属性のグループ、少なくとも２つの潜在変数のグループ、及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、を取得するように構成された取得モジュール３０１と、
対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、に応じて目的関数を決定するように構成された第１の決定モジュール３０２と、
目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータを決定するように構成された第２の決定モジュール３０３と、
目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じてリレーショナルモデルを決定するように構成された第３の決定モジュール３０４と、
を備える。 (Embodiment 3)
An embodiment of the present disclosure provides an apparatus for relational model determination. Referring to FIG. 3, the apparatus comprises:
an acquisition module 301 configured to acquire a log likelihood determined according to sample data, at least two groups of sample attributes, at least two groups of latent variables, and model parameters, together with a normalization term and the logarithm of the variational distribution of each latent variable;
a first determination module 302 configured to determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable;
a second determination module 303 configured to determine the variational distribution of each latent variable and the model parameters that allow the objective function to converge; and
a third determination module 304 configured to determine a relational model according to the variational distributions and the model parameters that allow the objective function to converge.

任意の実施形態のように、取得モジュール３０１によって取得される対数尤度は、

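はサンプルデータを示し、Ｎ_ｒは行のサンプルの数を示し、Ｎ_ｃは列のサンプルの数を示し、Ａ^Ｒは行のサンプル属性のセットを示し、Ａ^Ｃは列のサンプル属性のセットを示し、Ｚ^Ｒは行の潜在変数行列を示し、Ｚ^ｃは列の潜在変数行列を示し、θはモデルパラメータのセットを示し、モデルパラメータはα、β、φ、η、ξを含み、α、βはそれぞれ行と列の混合比であり、φは各サンプルカテゴリにおけるサブモデルパラメータを示し、ηは各サンプルカテゴリにおける行のサンプル属性のモデルパラメータを示し、ξは各サンプルカテゴリにおける列のサンプル属性のモデルパラメータを示す。 As an optional embodiment, the log likelihood obtained by the acquisition module 301 is [formula], where [symbol] denotes the sample data, N_r denotes the number of row samples, N_c denotes the number of column samples, A^R denotes the set of row sample attributes, A^C denotes the set of column sample attributes, Z^R denotes the row latent variable matrix, Z^C denotes the column latent variable matrix, and θ denotes the set of model parameters. The model parameters include α, β, φ, η, and ξ, where α and β are the row and column mixing ratios, respectively, φ denotes the sub-model parameters in each sample category, η denotes the model parameters of the row sample attributes in each sample category, and ξ denotes the model parameters of the column sample attributes in each sample category.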

任意の実施形態のように、取得モジュール３０１によって取得される正規化項は、

は潜在変数の変分分布の近似値を示し、

においてａは

を示し、ｂは

を示し、

においてａは

を示し、ｂは

を示し、

においてａは

を示し、ｂは

を示す。 As an optional embodiment, the normalization term obtained by the acquisition module 301 is [formula], where [symbol] is an approximation of the variational distribution of the latent variables, [symbol] indicates the dimension of ξ_q, and L(a, b) = log b + (a - b)/b. For each of the three L(a, b) terms in the formula, a denotes [expression] and b denotes [expression].
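Reading the definition of L(a, b) as log b + (a - b)/b, the function is the tangent line of the logarithm at the point b, evaluated at a. Because the logarithm is concave, log a ≤ L(a, b) holds for all positive a and b, with equality at a = b. The following Python sketch checks that property numerically; the function name `tangent_bound` is an illustrative choice, not taken from the patent.

```python
import math


def tangent_bound(a, b):
    """L(a, b) = log b + (a - b) / b, defined for a > 0 and b > 0."""
    return math.log(b) + (a - b) / b


def is_upper_bound_of_log(a, b, tol=1e-12):
    """Check log a <= L(a, b), which follows from the concavity of log."""
    return math.log(a) <= tangent_bound(a, b) + tol
```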

任意の実施形態のように、取得モジュール３０１によって取得される潜在変数の各々の変分分布の対数はｌｏｇｑ（Ｚ^Ｒ）及びｌｏｇｑ（Ｚ^ｃ）であり、ｑ（Ｚ^Ｒ）は行の潜在変数Ｚ^Ｒの変分分布を示し、ｑ（Ｚ^ｃ）は列の潜在変数Ｚ^ｃの変分分布を示す。 As an optional embodiment, the logarithms of the variational distributions of the latent variables acquired by the acquisition module 301 are log q(Z^R) and log q(Z^C), where q(Z^R) denotes the variational distribution of the row latent variable Z^R and q(Z^C) denotes the variational distribution of the column latent variable Z^C.

任意の実施形態のように、第１の決定モジュール３０２によって決定された目的関数

は、

である。 As an optional embodiment, the objective function [symbol] determined by the first determination module 302 is [formula].

任意の実施形態のように、図４を参照すると、第２の決定モジュール３０３は、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するように構成された取得ユニット３０３１と、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かを判別するように構成された判別ユニット３０３２と、
を含み、
取得ユニット３０３１は、目的関数が収束しない場合には目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータを取得するまで潜在変数の各々の変分分布及び更新されたモデルパラメータを再取得するように構成されている。 As an optional embodiment, referring to FIG. 4, the second determination module 303 includes:
an acquisition unit 3031 configured to acquire the updated variational distribution of each latent variable and the updated model parameters; and
a discrimination unit 3032 configured to determine whether the objective function converges according to the updated variational distributions and the updated model parameters,
wherein the acquisition unit 3031 is configured to, if the objective function does not converge, reacquire the updated variational distributions and the updated model parameters until variational distributions and model parameters that allow the objective function to converge are obtained.

任意の実施形態のように、図５を参照すると、取得ユニット３０３１は、
目的関数の収束を可能にする潜在変数の各々の更新された変分分布を取得するまで、下記の式

を使用することによって潜在変数の各々の変分分布を交互に更新するように構成された第１の更新サブユニット３０３１１と、
更新されたモデルパラメータを取得するために下記の式

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって目的関数の収束を可能にする潜在変数の各々の更新された変分分布に応じてモデルパラメータを更新するように構成された第２の更新サブユニット３０３１２と、
を含む。 As an optional embodiment, referring to FIG. 5, the acquisition unit 3031 includes:
a first update subunit 30311 configured to alternately update the variational distribution of each latent variable by using the following formulas [formula] until updated variational distributions that allow the objective function to converge are obtained; and
a second update subunit 30312 configured to update the model parameters according to those updated variational distributions by using the following formulas [formula] to obtain the updated model parameters,
where t denotes the current update and t-1 denotes the previous update or the initial setting.

任意の実施形態のように、図６を参照すると、取得ユニット３０３１は、
更新されたモデルパラメータを取得するために下記の式

を使用することによってモデルパラメータを更新するように構成された第３の更新サブユニット３０３１３と、
目的関数の収束を可能にする潜在変数の各々の更新された変分分布を取得するために下記の式

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって更新されたモデルパラメータに応じて潜在変数の各々の変分分布を交互に更新するように構成された第４の更新サブユニット３０３１４と、
を含む。 As an optional embodiment, referring to FIG. 6, the acquisition unit 3031 includes:
a third update subunit 30313 configured to update the model parameters by using the following formulas [formula] to obtain the updated model parameters; and
a fourth update subunit 30314 configured to alternately update the variational distribution of each latent variable according to the updated model parameters by using the following formulas [formula] to obtain updated variational distributions that allow the objective function to converge,
where t denotes the current update and t-1 denotes the previous update or the initial setting.

任意の実施形態のように、図７を参照すると、判別ユニット３０３２は、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と、潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを比較するように構成された比較サブユニット３０３２１と、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と前回取得された目的関数との間の距離が閾値より短い場合には目的関数が収束すると判別するように構成された判別サブユニット３０３２２と、
を含む。 As an optional embodiment, referring to FIG. 7, the discrimination unit 3032 includes:
a comparison subunit 30321 configured to compare whether the distance between the objective function determined according to the updated variational distributions and updated model parameters, and the previously obtained objective function determined according to the previously updated variational distributions and model parameters, is less than a threshold; and
a discrimination subunit 30322 configured to determine that the objective function has converged if that distance is less than the threshold.

結論として、本開示の実施形態に応じた装置において、目的関数は、サンプルデータ、少なくとも２つのサンプル属性のグループ、少なくとも２つの潜在変数のグループ、及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、に応じて決定される。また、リレーショナルモデルは、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じて決定される。この方法において、リレーショナルモデル決定の効率及び精度は改善される。更に、正規化項を使用することによってモデルの複雑性は自動的に制御されてよいので、モデル決定の効率は改善される。 In conclusion, in the apparatus according to the embodiment of the present disclosure, the objective function is determined according to a log likelihood (itself determined according to the sample data, at least two groups of sample attributes, at least two groups of latent variables, and the model parameters), a normalization term, and the logarithm of the variational distribution of each latent variable. The relational model is then determined according to the variational distributions of the latent variables and the model parameters that allow the objective function to converge. In this way, the efficiency and accuracy of relational model determination are improved. Furthermore, because the complexity of the model can be controlled automatically by using the normalization term, the efficiency of model determination is improved.

上記実施形態によって提供されるリレーショナルモデル決定用の装置がリレーショナルモデルを決定する場合における上記全ての機能モジュールの分割は例にすぎないことに留意すべきである。実際には、上記機能は必要に応じて異なる機能モジュールに分散されていてよい、即ち、装置の内部構造は上述の全て又は一部の機能を実施するために異なる機能モジュールに分割されていてよい。加えて、上記実施形態において提供されるリレーショナルモデル決定用の装置及びリレーショナルモデル決定用の方法は同一の思想であり、装置の具体的な実施処理は方法の実施形態を参照するが、ここでは繰り返さない。 It should be noted that the division into functional modules described above, for the case where the apparatus for relational model determination provided by the above embodiment determines a relational model, is only an example. In practice, the above functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to implement all or some of the functions described above. In addition, the apparatus and the method for relational model determination provided in the above embodiments are based on the same concept; the specific implementation of the apparatus is described in the method embodiments and is not repeated here.

本開示の上記実施形態の連番は単に説明の目的で提供されたものにすぎず、実施形態の優先度を示すものではない。 The serial numbers of the above-described embodiments of the present disclosure are provided merely for the purpose of explanation, and do not indicate the priority of the embodiments.

当業者は、前述の方法のステップの全部又は一部がハードウェア又は関連付けられたハードウェアに指示するプログラムによって実行されてよいことを理解すべきである。プログラムは、コンピュータ読み取り可能な記録媒体に格納されてよい。記録媒体は、読み出し専用メモリ、磁気ディスク、光ディスク等であってよい。 One skilled in the art should understand that all or part of the method steps described above may be performed by a program that instructs the hardware or associated hardware. The program may be stored in a computer-readable recording medium. The recording medium may be a read only memory, a magnetic disk, an optical disk or the like.

上記は、単に本開示の好ましい実施形態であって、本発明を限定するものではない。本開示の精神及び原理から逸脱することなく行われる種々の修正、同等の置換、又は改良は、本発明の保護範囲内に含まれるべきである。 The above are merely preferred embodiments of the present disclosure and are not intended to limit the present invention. Various modifications, equivalent substitutions, or improvements made without departing from the spirit and principle of the present disclosure should be included in the protection scope of the present invention.

（付記１）
サンプルデータ、少なくとも２つのサンプル属性のグループ、少なくとも２つの潜在変数のグループ、及びモデルパラメータに応じて決定される対数尤度と、正規化項と、前記潜在変数の各々の変分分布の対数と、を取得することと、
前記対数尤度と、前記正規化項と、前記潜在変数の各々の前記変分分布の対数と、に応じて目的関数を決定することと、
前記目的関数の収束を可能にする前記潜在変数の各々の変分分布及びモデルパラメータを決定し、前記目的関数の収束を可能にする前記潜在変数の各々の前記変分分布及び前記モデルパラメータに応じてリレーショナルモデルを決定することと、
を備えるリレーショナルモデル決定用の方法。 (Appendix 1)
A method for relational model determination, comprising:
acquiring a log likelihood determined according to sample data, at least two groups of sample attributes, at least two groups of latent variables, and model parameters, together with a normalization term and the logarithm of the variational distribution of each of the latent variables;
determining an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables; and
determining the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge, and determining a relational model according to the variational distributions and the model parameters that allow the objective function to converge.

（付記２）
前記サンプルデータ、前記少なくとも２つのサンプル属性のグループ、前記少なくとも２つの潜在変数のグループ、及び前記モデルパラメータに応じて決定される前記対数尤度は、

は前記サンプルデータを示し、Ｎ_ｒは行のサンプルの数を示し、Ｎ_ｃは列のサンプルの数を示し、Ａ^Ｒは行のサンプル属性のセットを示し、Ａ^Ｃは列のサンプル属性のセットを示し、Ｚ^Ｒは行の潜在変数行列を示し、Ｚ^ｃは列の潜在変数行列を示し、θはモデルパラメータのセットを示し、前記モデルパラメータはα、β、φ、η、ξを含み、α、βはそれぞれ行と列の混合比であり、φは各サンプルカテゴリにおけるサブモデルパラメータを示し、ηは各サンプルカテゴリにおける行のサンプル属性の前記モデルパラメータを示し、ξは各サンプルカテゴリにおける列のサンプル属性の前記モデルパラメータを示す、
付記１に記載の方法。 (Appendix 2)
The method according to appendix 1, wherein the log likelihood determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters is [formula], where [symbol] denotes the sample data, N_r denotes the number of row samples, N_c denotes the number of column samples, A^R denotes the set of row sample attributes, A^C denotes the set of column sample attributes, Z^R denotes the row latent variable matrix, Z^C denotes the column latent variable matrix, and θ denotes the set of model parameters; the model parameters include α, β, φ, η, and ξ, where α and β are the row and column mixing ratios, respectively, φ denotes the sub-model parameters in each sample category, η denotes the model parameters of the row sample attributes in each sample category, and ξ denotes the model parameters of the column sample attributes in each sample category.

（付記３）
前記サンプルデータ、前記少なくとも２つのサンプル属性のグループ、前記少なくとも２つの潜在変数のグループ、及び前記モデルパラメータに応じて決定される前記正規化項は、

は前記潜在変数の前記変分分布の近似値を示し、

においてａは

を示し、ｂは

を示し、

においてａは

を示し、ｂは

を示し、

においてａは

を示し、ｂは

を示す、
付記１に記載の方法。 (Appendix 3)
The method according to appendix 1, wherein the normalization term, determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters, is [formula], where [symbol] is an approximation of the variational distribution of the latent variables, [symbol] indicates the dimension of ξ_q, and L(a, b) = log b + (a - b)/b. For each of the three L(a, b) terms in the formula, a denotes [expression] and b denotes [expression].

（付記４）
前記サンプルデータ、前記少なくとも２つのサンプル属性のグループ、前記少なくとも２つの潜在変数のグループ、及び前記モデルパラメータに応じて決定される前記潜在変数の各々の前記変分分布の対数はｌｏｇｑ（Ｚ^Ｒ）及びｌｏｇｑ（Ｚ^ｃ）であり、ｑ（Ｚ^Ｒ）は行の潜在変数Ｚ^Ｒの前記変分分布を示し、ｑ（Ｚ^ｃ）は列の潜在変数Ｚ^ｃの前記変分分布を示す付記１に記載の方法。 (Appendix 4)
The method according to appendix 1, wherein the logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two groups of sample attributes, the at least two groups of latent variables, and the model parameters, are log q(Z^R) and log q(Z^C), where q(Z^R) denotes the variational distribution of the row latent variable Z^R and q(Z^C) denotes the variational distribution of the column latent variable Z^C.

（付記５）
前記対数尤度と、前記正規化項と、前記潜在変数の各々の前記変分分布の対数と、に応じて目的関数を決定するステップは、前記対数尤度の期待値、前記正規化項の期待値及び前記潜在変数の各々の前記変分分布の対数の期待値に応じて前記目的関数を決定することを含む付記１乃至４の何れかに記載の方法。 (Appendix 5)
The method according to any one of appendices 1 to 4, wherein the step of determining an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables includes determining the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.

（付記６）
前記対数尤度の期待値、前記正規化項の期待値、及び前記潜在変数の各々の前記変分分布の対数の期待値に応じて決定された前記目的関数

は、

である付記５に記載の方法。 (Appendix 6)
The method according to appendix 5, wherein the objective function [symbol], determined according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables, is [formula].

（付記７）
前記目的関数の収束を可能にする前記潜在変数の各々の変分分布及びモデルパラメータを決定するステップは、
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて前記目的関数が収束するか否かを判別し、前記目的関数が収束しない場合には前記目的関数の収束を可能にする前記潜在変数の各々の前記変分分布及び前記更新されたモデルパラメータを取得するまで前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータを再取得することと、
を含む付記６に記載の方法。 (Appendix 7)
The method according to appendix 6, wherein the step of determining the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge includes:
obtaining an updated variational distribution of each of the latent variables and updated model parameters; and
determining whether the objective function converges according to the updated variational distributions and the updated model parameters and, if the objective function does not converge, reacquiring the updated variational distributions and the updated model parameters until variational distributions and model parameters that allow the objective function to converge are obtained.

（付記８）
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するステップは、
前記目的関数の収束を可能にする前記潜在変数の各々の更新された変分分布を取得するまで、下記の式

を使用することによって前記潜在変数の各々の前記変分分布を交互に更新することと、
更新されたモデルパラメータを取得するために下記の式

ここで、

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記目的関数の収束を可能にする前記潜在変数の各々の前記更新された変分分布に応じて前記モデルパラメータを更新することと、
を含む付記７に記載の方法。 (Appendix 8)
The method according to appendix 7, wherein the step of obtaining the updated variational distribution of each of the latent variables and the updated model parameters includes:
alternately updating the variational distribution of each of the latent variables by using the following formulas [formula] until updated variational distributions that allow the objective function to converge are obtained; and
updating the model parameters according to those updated variational distributions by using the following formulas [formula] to obtain the updated model parameters,
where t denotes the current update and t-1 denotes the previous update or the initial setting.

（付記９）
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するステップは、
前記更新されたモデルパラメータを取得するために下記の式

ここで、

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記更新されたモデルパラメータに応じて前記潜在変数の各々の前記変分分布を交互に更新することと、
を含む付記７に記載の方法。 (Appendix 9)
The method according to appendix 7, wherein the step of obtaining the updated variational distribution of each of the latent variables and the updated model parameters includes:
updating the model parameters by using the following formulas [formula] to obtain the updated model parameters; and
alternately updating the variational distribution of each of the latent variables according to the updated model parameters,
where t denotes the current update and t-1 denotes the previous update or the initial setting.

（付記１０）
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて前記目的関数が収束するか否かを判別するステップは、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と、前記潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを比較することと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と前記前回取得された目的関数との間の距離が前記閾値より短い場合には前記目的関数が収束すると判別することと、
を含む付記７乃至９の何れかに記載の方法。 (Appendix 10)
The method according to any one of appendices 7 to 9, wherein the step of determining whether the objective function converges according to the updated variational distribution of each of the latent variables and the updated model parameters includes:
comparing whether the distance between the objective function determined according to the updated variational distributions and the updated model parameters, and the previously obtained objective function determined according to the previously updated variational distributions and model parameters, is less than a threshold; and
determining that the objective function has converged if that distance is less than the threshold.

（付記１１）
サンプルデータ、少なくとも２つのサンプル属性のグループ、少なくとも２つの潜在変数のグループ、及びモデルパラメータに応じて決定される対数尤度と、正規化項と、前記潜在変数の各々の変分分布の対数と、を取得するように構成された取得モジュールと、
前記対数尤度と、前記正規化項と、前記潜在変数の各々の前記変分分布の対数と、に応じて目的関数を決定するように構成された第１の決定モジュールと、
前記目的関数の収束を可能にする前記潜在変数の各々の変分分布及びモデルパラメータを決定するように構成された第２の決定モジュールと、
前記目的関数の収束を可能にする前記潜在変数の各々の前記変分分布及び前記モデルパラメータに応じてリレーショナルモデルを決定するように構成された第３の決定モジュールと、
を備えるリレーショナルモデル決定用の装置。 (Appendix 11)
An apparatus for relational model determination, comprising:
an acquisition module configured to acquire a log likelihood determined according to sample data, at least two groups of sample attributes, at least two groups of latent variables, and model parameters, together with a normalization term and the logarithm of the variational distribution of each of the latent variables;
a first determination module configured to determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
a second determination module configured to determine the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge; and
a third determination module configured to determine a relational model according to the variational distributions and the model parameters that allow the objective function to converge.

（付記１２）
前記取得モジュールによって取得される前記対数尤度は、

は前記サンプルデータを示し、Ｎ_ｒは行のサンプルの数を示し、Ｎ_ｃは列のサンプルの数を示し、Ａ^Ｒは行のサンプル属性のセットを示し、Ａ^Ｃは列のサンプル属性のセットを示し、Ｚ^Ｒは行の潜在変数行列を示し、Ｚ^ｃは列の潜在変数行列を示し、θはモデルパラメータのセットを示し、前記モデルパラメータはα、β、φ、η、ξを含み、α、βはそれぞれ行と列の混合比であり、φは各サンプルカテゴリにおけるサブモデルパラメータを示し、ηは各サンプルカテゴリにおける行のサンプル属性の前記モデルパラメータを示し、ξは各サンプルカテゴリにおける列のサンプル属性の前記モデルパラメータを示す、
付記１１に記載の装置。 (Appendix 12)
The apparatus according to appendix 11, wherein the log likelihood acquired by the acquisition module is [formula], where [symbol] denotes the sample data, N_r denotes the number of row samples, N_c denotes the number of column samples, A^R denotes the set of row sample attributes, A^C denotes the set of column sample attributes, Z^R denotes the row latent variable matrix, Z^C denotes the column latent variable matrix, and θ denotes the set of model parameters; the model parameters include α, β, φ, η, and ξ, where α and β are the row and column mixing ratios, respectively, φ denotes the sub-model parameters in each sample category, η denotes the model parameters of the row sample attributes in each sample category, and ξ denotes the model parameters of the column sample attributes in each sample category.

（付記１３）
前記取得モジュールによって取得される前記正規化項は、

は前記潜在変数の前記変分分布の近似値を示し、

においてａは

を示し、ｂは

を示し、

においてａは

を示し、ｂは

を示し、

においてａは

を示し、ｂは

を示す、
付記１１に記載の装置。 (Appendix 13)
The apparatus according to appendix 11, wherein the normalization term acquired by the acquisition module is:

denotes the dimension of ξ_q, and L(a, b) = log b + (a - b) / b,

in which a denotes

and b denotes


in which a denotes

and b denotes


in which a denotes

and b denotes

.
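The function L(a, b) that appears in the normalization term, read here as L(a, b) = log b + (a - b) / b, is the tangent line of the logarithm at b evaluated at a, so it bounds log a from above and equals it when a = b. The following is a minimal sketch of that property only. The function names `linear_log_bound` and `bound_gap` are hypothetical and do not come from the specification, and the actual arguments a and b used in the normalization term are not reproduced here.

```python
import math


def linear_log_bound(a: float, b: float) -> float:
    """Return L(a, b) = log b + (a - b) / b.

    This is the first-order expansion of log around b, evaluated at a.
    Because log is concave, L(a, b) >= log a for all a, b > 0.
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return math.log(b) + (a - b) / b


def bound_gap(a: float, b: float) -> float:
    """Return L(a, b) - log a, which is zero exactly when a == b."""
    return linear_log_bound(a, b) - math.log(a)


# Hypothetical sample points, chosen only to illustrate the inequality.
sample_points = [0.1, 0.5, 1.0, 2.0, 10.0]
gaps = [bound_gap(a, b) for a in sample_points for b in sample_points]
```

The gap shrinks as b approaches a, which is why an iterative procedure that sets b from the previous update can keep such a bound tight.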

（付記１４）
前記取得モジュールによって取得される前記潜在変数の各々の前記変分分布の対数はｌｏｇｑ（Ｚ^Ｒ）及びｌｏｇｑ（Ｚ^ｃ）であり、ｑ（Ｚ^Ｒ）は行の潜在変数Ｚ^Ｒの前記変分分布を示し、ｑ（Ｚ^ｃ）は列の潜在変数Ｚ^ｃの前記変分分布を示す付記１１に記載の装置。 (Appendix 14)
The apparatus according to appendix 11, wherein the logarithms of the variational distributions of the latent variables acquired by the acquisition module are log q(Z^R) and log q(Z^c), where q(Z^R) denotes the variational distribution of the row latent variable Z^R and q(Z^c) denotes the variational distribution of the column latent variable Z^c.

（付記１５）
前記第１の決定モジュールは、前記対数尤度の期待値、前記正規化項の期待値、及び前記潜在変数の各々の前記変分分布の対数の期待値に応じて目的関数を決定するように構成されている付記１１乃至１４の何れかに記載の装置。 (Appendix 15)
The apparatus according to any one of appendices 11 to 14, wherein the first determination module is configured to determine the objective function according to an expected value of the log likelihood, an expected value of the normalization term, and an expected value of the logarithm of the variational distribution of each of the latent variables.

（付記１６）
前記第１の決定モジュールによって決定された前記目的関数

は、

である付記１５に記載の装置。 (Appendix 16)
The objective function determined by the first determination module,

is

in the apparatus according to appendix 15.

（付記１７）
前記第２の決定モジュールは、
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するように構成された取得ユニットと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて前記目的関数が収束するか否かを判別するように構成された判別ユニットと、
を含み、
前記取得ユニットは、前記目的関数が収束しない場合には前記目的関数の収束を可能にする前記潜在変数の各々の変分分布及びモデルパラメータを取得するまで前記潜在変数の各々の前記変分分布及び前記更新されたモデルパラメータを再取得するように構成されている、
付記１６に記載の装置。 (Appendix 17)
The apparatus according to appendix 16, wherein the second determination module includes:
an acquisition unit configured to acquire an updated variational distribution of each of the latent variables and updated model parameters; and
a discrimination unit configured to determine whether the objective function converges according to the updated variational distribution of each of the latent variables and the updated model parameters,
wherein the acquisition unit is configured, if the objective function does not converge, to re-acquire the updated variational distribution of each of the latent variables and the updated model parameters until the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge are obtained.

（付記１８）
前記取得ユニットは、
前記目的関数の収束を可能にする前記潜在変数の各々の更新された変分分布を取得するまで、下記の式

ここで、

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記目的関数の収束を可能にする前記潜在変数の各々の前記更新された変分分布に応じて前記モデルパラメータを更新するように構成された第２の更新サブユニットと、
を含む付記１７に記載の装置。 (Appendix 18)
The apparatus according to appendix 17, wherein the acquisition unit includes
a second update subunit configured to update the model parameters according to the updated variational distribution of each of the latent variables that allows the objective function to converge, by using the following formula until the updated variational distribution of each of the latent variables that allows the objective function to converge is obtained:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

（付記１９）
前記取得ユニットは、
前記更新されたモデルパラメータを取得するために下記の式

ここで、

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記更新されたモデルパラメータに応じて前記潜在変数の各々の前記変分分布を交互に更新するように構成された第４の更新サブユニットと、
を含む付記１７に記載の装置。 (Appendix 19)
The apparatus according to appendix 17, wherein the acquisition unit includes
a fourth update subunit configured to alternately update the variational distribution of each of the latent variables according to the updated model parameters, by using the following formula to obtain the updated model parameters:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

（付記２０）
前記判別ユニットは、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と、前記潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを比較するように構成された比較サブユニットと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と前記前回取得された目的関数との間の距離が前記閾値より短い場合には前記目的関数が収束すると判別するように構成された判別サブユニットと、
を含む付記１７乃至１９の何れかに記載の装置。 (Appendix 20)
The apparatus according to any one of appendices 17 to 19, wherein the discrimination unit includes:
a comparison subunit configured to compare whether the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters, and the previously acquired objective function determined according to the previously updated variational distribution of each of the latent variables and the previously updated model parameters, is shorter than a threshold; and
a discrimination subunit configured to determine that the objective function converges when the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters, and the previously acquired objective function, is shorter than the threshold.
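The cooperation between the acquisition unit and the discrimination unit described in appendices 17 to 20 can be pictured as an alternating update loop that stops once two successive objective values are closer than a threshold. The sketch below shows only that control flow under stated assumptions: the update formulas of the specification are not reproduced, `fit_alternating` and the toy callbacks are hypothetical names, and the scalars `q` and `theta` merely stand in for the variational distributions and the model parameters.

```python
from typing import Callable, Tuple


def fit_alternating(
    update_q: Callable[[float, float], float],
    update_theta: Callable[[float, float], float],
    objective: Callable[[float, float], float],
    q0: float,
    theta0: float,
    threshold: float = 1e-12,
    max_iters: int = 1000,
) -> Tuple[float, float, float, int]:
    """Alternate the two updates until the objective changes by less than threshold."""
    q, theta = q0, theta0
    previous = objective(q, theta)  # value at t-1 (initial setting on the first pass)
    for t in range(1, max_iters + 1):
        q = update_q(q, theta)          # stand-in for the variational-distribution update
        theta = update_theta(q, theta)  # stand-in for the model-parameter update
        current = objective(q, theta)   # value at the current update t
        # Discrimination step: distance between successive objective values.
        if abs(current - previous) < threshold:
            return q, theta, current, t
        previous = current
    return q, theta, previous, max_iters


# Toy concave objective (an assumption for illustration, not the patent's objective).
def toy_objective(q: float, theta: float) -> float:
    return -(q - 1.0) ** 2 - (theta - 2.0) ** 2 - 0.5 * q * theta


# Exact coordinate-wise maximizers of the toy objective.
def toy_update_q(q: float, theta: float) -> float:
    return 1.0 - 0.25 * theta


def toy_update_theta(q: float, theta: float) -> float:
    return 2.0 - 0.25 * q


q_hat, theta_hat, value, iters = fit_alternating(
    toy_update_q, toy_update_theta, toy_objective, q0=0.0, theta0=0.0
)
```

Using the absolute difference of successive objective values as the distance is one simple choice; the text only requires that some distance be compared against a threshold.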

Claims

A device for determining a relational model, comprising:
an acquisition module configured to acquire a log likelihood determined according to sample data, a group of at least two sample attributes, a group of at least two latent variables, and model parameters; a normalization term; and a logarithm of the variational distribution of each of the latent variables;
a first determination module configured to determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
a second determination module configured to determine the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge; and
a third determination module configured to determine a relational model according to the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge.

The apparatus of claim 1, wherein the log likelihood acquired by the acquisition module is:

denotes the sample data; N_r denotes the number of row samples; N_c denotes the number of column samples; A^R denotes the set of row sample attributes; A^C denotes the set of column sample attributes; Z^R denotes the row latent variable matrix; Z^c denotes the column latent variable matrix; and θ denotes the set of model parameters.
The model parameters include α, β, φ, η, and ξ, where α and β are the row and column mixing ratios, respectively; φ denotes the sub-model parameters in each sample category; η denotes the model parameters of the row sample attributes in each sample category; and ξ denotes the model parameters of the column sample attributes in each sample category.

The apparatus of claim 1, wherein the normalization term acquired by the acquisition module is:

denotes the dimension of ξ_q, and L(a, b) = log b + (a - b) / b,

in which a denotes

and b denotes


in which a denotes

and b denotes


in which a denotes

and b denotes

.

The apparatus according to claim 1, wherein the logarithms of the variational distributions of the latent variables acquired by the acquisition module are log q(Z^R) and log q(Z^c), where q(Z^R) denotes the variational distribution of the row latent variable Z^R and q(Z^c) denotes the variational distribution of the column latent variable Z^c.

The device according to claim 1, wherein the first determination module is configured to determine the objective function according to an expected value of the log likelihood, an expected value of the normalization term, and an expected value of the logarithm of the variational distribution of each of the latent variables.

The objective function determined by the first determination module,

is

in the apparatus of claim 5.

The apparatus according to claim 6, wherein the second determination module includes:
an acquisition unit configured to acquire an updated variational distribution of each of the latent variables and updated model parameters; and
a discrimination unit configured to determine whether the objective function converges according to the updated variational distribution of each of the latent variables and the updated model parameters,
wherein the acquisition unit is configured, if the objective function does not converge, to re-acquire the updated variational distribution of each of the latent variables and the updated model parameters until the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge are obtained.

The apparatus of claim 7, wherein the acquisition unit includes
a second update subunit configured to update the model parameters according to the updated variational distribution of each of the latent variables that allows the objective function to converge, by using the following formula until the updated variational distribution of each of the latent variables that allows the objective function to converge is obtained:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

The apparatus of claim 7, wherein the acquisition unit includes
a fourth update subunit configured to alternately update the variational distribution of each of the latent variables according to the updated model parameters, by using the following formula to obtain the updated model parameters:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

A method for determining a relational model, comprising:
acquiring a log likelihood determined according to sample data, a group of at least two sample attributes, a group of at least two latent variables, and model parameters; a normalization term; and a logarithm of the variational distribution of each of the latent variables;
determining an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
determining the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge; and
determining the relational model according to the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge.