JP2015181001A

JP2015181001A - Method and device for determining relational model

Info

Publication number: JP2015181001A
Application number: JP2015051729A
Authority: JP
Inventors: Lu Fung; Shun Cheng Liu; Ryohei Fujimaki
Original assignee: NEC China Co Ltd
Current assignee: NEC China Co Ltd
Priority date: 2014-03-18
Filing date: 2015-03-16
Publication date: 2015-10-15
Also published as: CN104933015A

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency and accuracy of determining a relational model and to control the complexity of the relational model automatically. SOLUTION: A method for determining a relational model determines an objective function according to a log-likelihood (itself determined according to sample data, model parameters, and at least two latent variables that describe the sample categories of the sample data), a normalization term, and the logarithm of the variational distribution of each of the latent variables (102); and determines the relational model according to the model parameters and the variational distribution of each of the latent variables that cause the objective function to converge (103).

Description

The present disclosure relates to the technical field of statistics, and in particular to a method and an apparatus for determining a relational model.

With the continued development of statistical methods, modeling relational information between objects has become a topic of considerable interest. Many kinds of relational information exist between objects, for example contact information between people in a social network and link information between pages on the Internet. Such relational information expresses correlations between objects within one category or between objects in several categories, and analyzing it can yield further valuable information. Practical applications based on relational information have therefore continued to grow in recent years. Model-driven methods are widely used to analyze relational information. One important motivation for relational modeling is to reveal the latent group structure underlying a relational network, that is, entities divided into groups based on their connectivity. For example, if a movie company wants to obtain user comments on the movies it is showing, it collects user ratings for various movies and uses a relational model to divide the users and the movies into different groups. In this way, users, movies, and ratings are clustered simultaneously, and the clustering results are used for further analysis. The most important step in this process is determining the relational model from the collected relational information.

In practice, a relational model is determined by the variational distributions of latent variables and by model parameters. A latent variable is a variable that cannot be observed directly but can be obtained by inference from sample data. The variational distribution of a latent variable expresses the probability that the sample data are assigned to a particular category. The model parameters are the parameters of the sub-model of each category. The paper "Variational Bayesian inference and complexity control for stochastic block models" [Latouche et al., Statistical Modelling, 2012] proposes a method for determining the variational distribution of a latent variable and the model parameters based on network data. In this method, the log-likelihood and the logarithm of the variational distribution of the latent variable are first calculated from the sample data, the latent variable, and the model parameters. Second, an objective function is determined according to the log-likelihood and the logarithm of the variational distribution of the latent variable. Finally, the variational distribution of the latent variable and the model parameters that cause the objective function to converge are determined and used to construct the relational model.

In implementing the present disclosure, the inventors found that the prior art has the following problems.

Because the above method is limited to network data describing relationships between objects of the same category, it involves only one latent variable. Consequently, the relational model determined by the variational distribution of the latent variable and the model parameters obtained with that method has limited applicability. Moreover, because the objective function is determined according to the expected value of the log-likelihood and the expected value of the logarithm of the variational distribution of the latent variable, the relational model determined by that objective function has high complexity.

To solve the technical problems of the prior art, embodiments of the present disclosure provide a method and an apparatus for determining a relational model. The technical solutions are as follows.

According to a first aspect, a method for determining a relational model is provided, the method comprising:
obtaining a log-likelihood, a normalization term, and the logarithm of the variational distribution of each of at least two latent variables, each determined according to sample data, the at least two latent variables, and model parameters;
determining an objective function according to the log-likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables; and
determining the variational distribution of each of the latent variables and the model parameters that cause the objective function to converge, and determining a relational model according to that variational distribution of each of the latent variables and those model parameters;
wherein each of the latent variables is used to describe a sample category of the sample data.

With respect to the first aspect, in a first possible implementation of the first aspect, the log-likelihood determined according to the sample data, the at least two latent variables, and the model parameters is

[formula omitted in source]

where log p(·) denotes the log-likelihood; p denotes the joint probability density function; [symbol omitted in source] denotes the sample data; d denotes the dimensionality of the sample data; N_1 denotes the number of sample data items in the first dimension; N_d denotes the number of sample data items in the d-th dimension; Z^1 denotes the latent variable of the sample data in the first dimension; Z^d denotes the latent variable of the sample data in the d-th dimension; and θ denotes the set of model parameters, which comprises α^1, ..., α^d, and φ, where α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.
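As a rough illustration of how these quantities might be laid out for d = 2 (for example, users and movies), the following sketch stores each variational distribution q(Z^i) as an N_i × K_i table whose rows sum to one. The class and function names are hypothetical, and estimating a mixing ratio as the column mean of q(Z^i) is a common convention assumed here, not a formula taken from this document.

```python
from dataclasses import dataclass
from typing import List

Table = List[List[float]]


@dataclass
class TwoWayData:
    """Hypothetical container for d = 2 relational data."""
    x: Table   # N_1 x N_2 observations; x[i][j] relates object i to object j
    k1: int    # K_1: number of sample categories in the first dimension
    k2: int    # K_2: number of sample categories in the second dimension

    @property
    def n1(self) -> int:
        return len(self.x)

    @property
    def n2(self) -> int:
        return len(self.x[0])


def uniform_q(n: int, k: int) -> Table:
    """Initial q(Z^i): every object is equally likely to be in each category."""
    return [[1.0 / k] * k for _ in range(n)]


def mixing_ratio(q: Table) -> List[float]:
    """Assumed estimate of alpha^i: the average membership per category."""
    n = len(q)
    return [sum(row[c] for row in q) / n for c in range(len(q[0]))]


data = TwoWayData(x=[[5, 1, 4], [1, 5, 2]], k1=2, k2=2)
q1 = uniform_q(data.n1, data.k1)   # 2 x 2 table for the first dimension
q2 = uniform_q(data.n2, data.k2)   # 3 x 2 table for the second dimension
```

Keeping one table per dimension mirrors the requirement of at least two latent variables: each dimension has its own categories and its own mixing ratio.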

With respect to the first aspect, in a second possible implementation of the first aspect, the normalization term determined according to the sample data, the at least two latent variables, and the model parameters is

[formula omitted in source]

where d denotes the dimensionality of the sample data; N_1 denotes the number of sample data items in the first dimension; N_d denotes the number of sample data items in the d-th dimension; N_i denotes the number of sample data items in the i-th dimension; K_1 denotes the number of sample categories in the first dimension; K_d denotes the number of sample categories in the d-th dimension; [symbol omitted in source] denotes an approximation of the variational distribution of the latent variables; [symbol omitted in source] denotes the latent variable of the j_1-th sample data item of the first dimension for the p_1-th sample category; [symbol omitted in source] denotes the latent variable of the j_d-th sample data item of the d-th dimension for the p_d-th sample category; α^i denotes the mixing ratio of the i-th dimension; [symbol omitted in source] denotes the dimensionality of α^i; [symbol omitted in source] denotes the dimensionality of the sub-model parameters for the p_1-th sample category of the first dimension, ..., and the p_d-th sample category of the d-th dimension; and L(a, b) = log b + (a - b)/b, where a denotes [symbol omitted in source] and b denotes [symbol omitted in source].
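The auxiliary function L(a, b) defined above is the tangent line of log a taken at the point b, so it equals log a when a = b and, because the logarithm is concave, is never below log a otherwise. The short sketch below checks this numerically. The function name is hypothetical, and a and b are treated as generic positive numbers, since the expressions they stand for are not reproduced in this text.

```python
import math


def tangent_log_bound(a: float, b: float) -> float:
    """L(a, b) = log b + (a - b) / b, for a > 0 and b > 0."""
    if a <= 0.0 or b <= 0.0:
        raise ValueError("a and b must be positive")
    return math.log(b) + (a - b) / b


# The gap L(a, b) - log a is zero at a == b and non-negative elsewhere.
for a, b in [(1.0, 1.0), (0.5, 2.0), (3.0, 2.0), (10.0, 0.1)]:
    gap = tangent_log_bound(a, b) - math.log(a)
    print(f"a={a}, b={b}, gap={gap:.4f}")
```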

With respect to the first aspect, in a third possible implementation of the first aspect, the logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two latent variables, and the model parameters, comprise log q(Z^1), ..., log q(Z^d), where q(Z^1) denotes the variational distribution of the latent variable Z^1 and q(Z^d) denotes the variational distribution of the latent variable Z^d.

With respect to any of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, determining the objective function according to the log-likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables comprises determining the objective function according to the expected value of the log-likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.
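Of the three expectations, the one involving log q(Z^i) has a simple closed form when q(Z^i) is stored as a table of per-object category probabilities that factorizes over objects. That factorization is an assumption of this sketch, not something the text states. Under it, the expectation is the sum of q log q over all entries, which is the negative entropy of q(Z^i). The function name is hypothetical.

```python
import math
from typing import List


def expected_log_q(q: List[List[float]]) -> float:
    """E_q[log q(Z)] for a row-wise categorical q; 0 * log 0 is treated as 0."""
    return sum(p * math.log(p) for row in q for p in row if p > 0.0)


uncertain = [[0.5, 0.5], [0.5, 0.5]]   # every object split evenly between two categories
certain = [[1.0, 0.0], [0.0, 1.0]]     # every object assigned to exactly one category
print(expected_log_q(uncertain))
print(expected_log_q(certain))
```

The value is most negative when assignments are most uncertain and reaches zero when every object is assigned to a single category.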

With respect to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the objective function [symbol omitted in source], determined according to the expected value of the log-likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables, is

[formula omitted in source]

With respect to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, determining the variational distribution of each of the latent variables and the model parameters that cause the objective function to converge comprises:
obtaining an updated variational distribution of each of the latent variables and updated model parameters; and
determining, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges, and, if the objective function does not converge, re-obtaining the updated variational distribution of each of the latent variables and the updated model parameters until the variational distribution of each of the latent variables and the model parameters that cause the objective function to converge are obtained.
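The control flow of this implementation can be sketched as a loop that keeps re-obtaining the updated quantities until a convergence test passes. In the sketch below, `update`, `objective`, and `is_converged` are hypothetical callables supplied by the caller, and `max_iter` is a safety cap added here; none of these names come from the document.

```python
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")  # bundles the variational distributions and model parameters


def iterate_until_converged(
    state: State,
    update: Callable[[State], State],
    objective: Callable[[State], float],
    is_converged: Callable[[float, float], bool],
    max_iter: int = 1000,
) -> Tuple[State, float, int]:
    """Repeat update -> evaluate -> test until the objective stops changing."""
    previous = objective(state)
    for iteration in range(1, max_iter + 1):
        state = update(state)            # re-obtain the updated quantities
        current = objective(state)       # objective for the updated quantities
        if is_converged(current, previous):
            return state, current, iteration
        previous = current
    return state, previous, max_iter


# Toy usage: the "state" is a single number refined towards the square root of 2.
result, value, steps = iterate_until_converged(
    1.0,
    update=lambda x: 0.5 * (x + 2.0 / x),
    objective=lambda x: -(x * x - 2.0) ** 2,
    is_converged=lambda cur, prev: abs(cur - prev) < 1e-9,
)
```

Passing the convergence test in as a callable keeps the loop independent of how the distance between successive objective values is measured.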

With respect to the sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, obtaining the updated variational distribution of each of the latent variables and the updated model parameters comprises:
alternately updating the variational distribution of each of the latent variables by using the following formula until a converged, updated variational distribution of each of the latent variables is obtained:

[formula omitted in source]

and obtaining the updated model parameters according to the updated variational distribution of each of the latent variables by using the following formula:

[formula omitted in source]

where t denotes the current update and t-1 denotes the previous update or the initial setting.
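Because the update formulas themselves are not reproduced in this text, the following sketch only illustrates the order of operations described here: the variational distributions are updated in turn until they stop changing, and only then are the model parameters re-estimated. As a stand-in it uses ordinary mean-field updates for a two-way Bernoulli block model (d = 2, binary observations) and leaves out the normalization term entirely, so it is not the patent's update rule. All function and variable names are hypothetical.

```python
import math
from typing import List

Table = List[List[float]]


def transpose(m: Table) -> Table:
    return [list(col) for col in zip(*m)]


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def update_q(x: Table, q_other: Table, alpha: List[float], phi: Table) -> Table:
    """Stand-in mean-field update of one q(Z^i) with the other q held fixed."""
    new_q = []
    for x_row in x:
        logits = []
        for k, a in enumerate(alpha):
            score = math.log(a)
            for x_ij, q_j in zip(x_row, q_other):
                for l, weight in enumerate(q_j):
                    p = phi[k][l]
                    score += weight * (x_ij * math.log(p) + (1 - x_ij) * math.log(1 - p))
            logits.append(score)
        new_q.append(softmax(logits))
    return new_q


def alternate_q_updates(x, q1, q2, alpha1, alpha2, phi, tol=1e-8, max_sweeps=200):
    """Update q(Z^1) and q(Z^2) in turn until neither changes by more than tol."""
    x_t, phi_t = transpose(x), transpose(phi)
    for _ in range(max_sweeps):
        new_q1 = update_q(x, q2, alpha1, phi)
        new_q2 = update_q(x_t, new_q1, alpha2, phi_t)
        change = max(
            abs(new - old)
            for new_t, old_t in ((new_q1, q1), (new_q2, q2))
            for new_row, old_row in zip(new_t, old_t)
            for new, old in zip(new_row, old_row)
        )
        q1, q2 = new_q1, new_q2
        if change < tol:
            break
    return q1, q2


def update_parameters(x, q1, q2, eps=1e-6):
    """Re-estimate alpha^1, alpha^2 and phi from the converged q tables."""
    alpha1 = [max(sum(r[k] for r in q1) / len(q1), eps) for k in range(len(q1[0]))]
    alpha2 = [max(sum(r[l] for r in q2) / len(q2), eps) for l in range(len(q2[0]))]
    cells = [(i, j) for i in range(len(x)) for j in range(len(x[0]))]
    phi = []
    for k in range(len(alpha1)):
        phi_row = []
        for l in range(len(alpha2)):
            num = sum(q1[i][k] * q2[j][l] * x[i][j] for i, j in cells)
            den = sum(q1[i][k] * q2[j][l] for i, j in cells)
            phi_row.append(min(max(num / max(den, eps), eps), 1 - eps))
        phi.append(phi_row)
    return alpha1, alpha2, phi


# Toy data with two row groups and two column groups.
x = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
q1 = [[0.5, 0.5] for _ in range(4)]
q2 = [[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.4, 0.6]]
q1, q2 = alternate_q_updates(x, q1, q2, [0.5, 0.5], [0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]])
alpha1, alpha2, phi = update_parameters(x, q1, q2)
```

The eighth possible implementation reverses the order, updating the model parameters before the variational distributions; with these helpers that would amount to calling `update_parameters` first.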

With respect to the sixth possible implementation of the first aspect, in an eighth possible implementation of the first aspect, obtaining the updated variational distribution of each of the latent variables and the updated model parameters comprises:
updating the model parameters by using the following formula to obtain the updated model parameters:

[formula omitted in source]

and alternately updating the variational distribution of each of the latent variables according to the updated model parameters by using the following formula, to obtain a converged, updated variational distribution of each of the latent variables:

[formula omitted in source]

where t denotes the current update and t-1 denotes the previous update or the initial setting.

With respect to any of the sixth to eighth possible implementations of the first aspect, in a ninth possible implementation of the first aspect, determining, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges comprises:
determining whether the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters and the previously obtained objective function, which was determined according to the previously updated variational distribution of each of the latent variables and the previously updated model parameters, is less than a threshold; and
determining that the objective function converges if that distance is less than the threshold.
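A minimal sketch of this convergence test follows. The text does not say how the distance between two objective values is measured, so the absolute difference is assumed here; the function and parameter names are likewise hypothetical.

```python
def has_converged(current: float, previous: float, threshold: float) -> bool:
    """True when successive objective values are closer than the threshold."""
    return abs(current - previous) < threshold


# Two successive objective values that differ only slightly, then a pair that differs a lot.
print(has_converged(-1234.5601, -1234.5612, threshold=1e-2))
print(has_converged(-1234.56, -1250.00, threshold=1e-2))
```

A relative difference, scaled by the magnitude of the previous value, would be another reasonable choice when the objective can span several orders of magnitude.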

According to a second aspect, an apparatus for determining a relational model is provided, the apparatus comprising:
an acquisition module configured to obtain a log-likelihood, a normalization term, and the logarithm of the variational distribution of each of at least two latent variables, each determined according to sample data, the at least two latent variables, and model parameters;
a first determination module configured to determine an objective function according to the log-likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
a second determination module configured to determine the variational distribution of each of the latent variables and the model parameters that cause the objective function to converge; and
a third determination module configured to determine a relational model according to the variational distribution of each of the latent variables and the model parameters that cause the objective function to converge;
wherein each of the latent variables is used to describe a sample category of the sample data.

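With respect to the second aspect, in a first possible implementation of the second aspect, the log-likelihood obtained by the acquisition module is

[formula omitted in source]

where [symbol omitted in source] denotes the sample data; d denotes the dimensionality of the sample data; N_1 denotes the number of sample data items in the first dimension; N_d denotes the number of sample data items in the d-th dimension; Z^1 denotes the latent variable of the sample data in the first dimension; Z^d denotes the latent variable of the sample data in the d-th dimension; and θ denotes the set of model parameters, which comprises α^1, ..., α^d, and φ, where α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.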

With respect to the second aspect, in a second possible implementation of the second aspect, the normalization term obtained by the acquisition module is

[formula omitted in source]

where [symbol omitted in source] denotes an approximation of the variational distribution of the latent variables; [symbol omitted in source] denotes the dimensionality of α^i; a denotes [symbol omitted in source]; and b denotes [symbol omitted in source].

With respect to the second aspect, in a third possible implementation of the second aspect, the logarithms of the variational distributions of the latent variables obtained by the acquisition module comprise log q(Z^1), ..., log q(Z^d), where q(Z^1) denotes the variational distribution of the latent variable Z^1 and q(Z^d) denotes the variational distribution of the latent variable Z^d.

With respect to the second aspect or any of the first to third possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the first determination module is configured to determine the objective function according to the expected value of the log-likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.

With respect to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the objective function [symbol omitted in source] determined by the first determination module is

[formula omitted in source]

With respect to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the second determination module comprises:
an acquisition unit configured to obtain an updated variational distribution of each of the latent variables and updated model parameters; and
a discrimination unit configured to determine, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges;
wherein the acquisition unit is configured, if the objective function does not converge, to re-obtain the updated variational distribution of each of the latent variables and the updated model parameters until the variational distribution of each of the latent variables and the model parameters that cause the objective function to converge are obtained.

With respect to the sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the acquisition unit comprises:
a first update subunit configured to alternately update the variational distribution of each of the latent variables by using the following formula until a converged, updated variational distribution of each of the latent variables is obtained:

[formula omitted in source]

and a second update subunit configured to obtain the updated model parameters according to the updated variational distribution of each of the latent variables by using the following formula:

[formula omitted in source]

where t denotes the current update and t-1 denotes the previous update or the initial setting.

With respect to the sixth possible implementation of the second aspect, in an eighth possible implementation of the second aspect, the acquisition unit comprises:
a third update subunit configured to update the model parameters by using the following formula to obtain the updated model parameters:

[formula omitted in source]

and a fourth update subunit configured to alternately update the variational distribution of each of the latent variables according to the updated model parameters by using the following formula, to obtain a converged, updated variational distribution of each of the latent variables:

[formula omitted in source]

where t denotes the current update and t-1 denotes the previous update or the initial setting.

With respect to any of the sixth to eighth possible implementations of the second aspect, in a ninth possible implementation of the second aspect, the discrimination unit comprises:
a comparison subunit configured to determine whether the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters and the previously obtained objective function, which was determined according to the previously updated variational distribution of each of the latent variables and the previously updated model parameters, is less than a threshold; and
a discrimination subunit configured to determine that the objective function converges if that distance is less than the threshold.

The technical solutions provided in the embodiments of the present disclosure have the following beneficial effects.

A log-likelihood, a normalization term, and the logarithm of the variational distribution of each latent variable are obtained, each determined according to sample data, model parameters, and at least two latent variables that describe the sample categories of the sample data, and an objective function is determined according to these three quantities. The relational model determined according to the variational distribution of each latent variable and the model parameters that cause the objective function to converge can therefore be applied to analyzing relationships among two or more kinds of entities, which extends the range of application of the relational model. In addition, introducing the normalization term into the objective function means that the complexity of the determined relational model is controlled automatically.

To describe the technical solutions in the embodiments of the present disclosure more clearly, the accompanying drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below illustrate only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

A flowchart of a method for determining a relational model according to Embodiment 1 of the present disclosure.
A flowchart of a method for determining a relational model according to Embodiment 2 of the present disclosure.
A schematic structural diagram of an apparatus for determining a relational model according to Embodiment 3 of the present disclosure.
A schematic structural diagram of the second determination module according to Embodiment 3 of the present disclosure.
A schematic structural diagram of the first acquisition unit according to Embodiment 3 of the present disclosure.
A schematic structural diagram of the second acquisition unit according to Embodiment 3 of the present disclosure.
A schematic structural diagram of the discrimination unit according to Embodiment 3 of the present disclosure.

To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings.

(Embodiment 1)
This embodiment of the present disclosure provides a method for determining a relational model. Referring to FIG. 1, the method includes the following steps.

Step 101: A log-likelihood, a normalization term, and the logarithm of the variational distribution of each of at least two latent variables are obtained, each determined according to sample data, the at least two latent variables, and model parameters. Each of the latent variables is used to describe a sample category of the sample data.

In an optional implementation, the log-likelihood determined according to the sample data, the at least two latent variables, and the model parameters is

[formula omitted in source]

where log p(·) denotes the log-likelihood; p denotes the joint probability density function; [symbol omitted in source] denotes the sample data; d denotes the dimensionality of the sample data; N_1 denotes the number of sample data items in the first dimension; N_d denotes the number of sample data items in the d-th dimension; Z^1 denotes the latent variable of the sample data in the first dimension; Z^d denotes the latent variable of the sample data in the d-th dimension; and θ denotes the set of model parameters, which comprises α^1, ..., α^d, and φ, where α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.

In an optional implementation, the normalization term determined according to the sample data, the at least two latent variables, and the model parameters is

[formula omitted in source]

where d denotes the dimensionality of the sample data; N_1 denotes the number of sample data items in the first dimension; N_d denotes the number of sample data items in the d-th dimension; N_i denotes the number of sample data items in the i-th dimension; K_1 denotes the number of sample categories in the first dimension; K_d denotes the number of sample categories in the d-th dimension; [symbol omitted in source] denotes an approximation of the variational distribution of the latent variables; [symbol omitted in source] denotes the dimensionality of α^i; a denotes [symbol omitted in source]; and b denotes [symbol omitted in source].

In an optional implementation, the logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two latent variables, and the model parameters, comprise log q(Z^1), ..., log q(Z^d), where q(Z^1) denotes the variational distribution of the latent variable Z^1 and q(Z^d) denotes the variational distribution of the latent variable Z^d.

Step 102: An objective function is determined according to the log-likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables.

In an optional implementation, determining the objective function according to the log-likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables comprises determining the objective function according to the expected value of the log-likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.

In an optional implementation, the objective function [symbol omitted in source], determined according to the expected value of the log-likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables, is expressed by the following formula:

[formula omitted in source]

ステップ１０３：目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータが決定され、リレーショナルモデルが目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じて決定される。 Step 103: The variational distribution of each latent variable and the model parameters that enable the objective function to converge are determined, and the relational model is determined according to those variational distributions and model parameters.

任意の実施形態のように、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータを決定することは、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することと、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かを判別し、目的関数が収束しない場合には目的関数の収束を可能にする潜在変数の各々の変分分布及び更新されたモデルパラメータを取得するまで潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを再取得することと、
を含む。 As in any embodiment, determining the variational distribution of each latent variable and the model parameters that enable the objective function to converge includes:
obtaining an updated variational distribution of each latent variable and updated model parameters; and
determining, according to the updated variational distribution of each latent variable and the updated model parameters, whether the objective function converges, and, if it does not converge, re-obtaining the updated variational distributions and updated model parameters until the variational distributions and model parameters that enable the objective function to converge are obtained.

任意の実施形態のように、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することは、
潜在変数の各々の収束され更新された変分分布を取得するまで、下記の式

を使用することによって潜在変数の各々の変分分布を交互に更新することと、
下記の式

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって潜在変数の各々の更新された変分分布に応じて更新されたモデルパラメータを取得することと、
を含む。 As in any embodiment, obtaining an updated variation distribution and updated model parameters for each of the latent variables
Until we get the converged and updated variational distribution of each of the latent variables

t indicates the current update, t-1 indicates the previous update or initial setting,
Obtaining updated model parameters in response to each updated variational distribution of latent variables by using
including.

任意の実施形態のように、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することは、
更新されたモデルパラメータを取得するために下記の式

を使用することによってモデルパラメータを更新することと、
潜在変数の各々の収束され更新された変分分布を取得するために下記の式

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって更新されたモデルパラメータに応じて潜在変数の各々の変分分布を交互に更新することと、
を含む。 As in any embodiment, obtaining an updated variation distribution and updated model parameters for each of the latent variables
To get the updated model parameters:

Updating model parameters by using
To get the converged and updated variational distribution for each of the latent variables:

Here, t indicates the current update, t-1 indicates the previous update or initial setting,
Alternately updating the variational distribution of each of the latent variables in response to the model parameters updated by using
including.

任意の実施形態のように、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かを判別することは、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と、前記潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを判別することと、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と前回取得された目的関数との間の距離が閾値より短い場合には目的関数が収束すると判別することと、
を含む。 As in any embodiment, determining whether the objective function converges according to the updated variational distribution of each latent variable and the updated model parameters includes:
determining whether the distance between the objective function determined according to the updated variational distributions and updated model parameters and the previously obtained objective function, which was determined according to the previously updated variational distributions and model parameters, is less than a threshold; and
determining that the objective function converges if that distance is less than the threshold.

本開示の実施形態において提供される方法によれば、サンプルデータ、サンプルデータのサンプルカテゴリを記述するための少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、を取得し、取得された対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、に応じて目的関数を決定することによって、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じて決定されたリレーショナルモデルは、２つ以上のエンティティ間における相互関係の分析に適用可能になり、その結果、リレーショナルモデルの適用範囲を拡張する。加えて、目的関数に正規化項を導入することによって、決定されたリレーショナルモデルの複雑性は自動的に制御され、リレーショナルモデルを決定する効率は改善される。更に、潜在変数とモデルパラメータとが相互に依存しているので、潜在変数の各々の変分分布及びモデルパラメータの決定はより正確である。 According to the method provided in this embodiment of the present disclosure, a log likelihood determined according to the sample data, at least two latent variables describing the sample categories of the sample data, and the model parameters is obtained, together with a normalization term and the logarithm of the variational distribution of each latent variable, and an objective function is determined from them. The relational model determined according to the variational distributions and model parameters that enable the objective function to converge can therefore be applied to analyzing the relationships among two or more entities, which extends the range of application of the relational model. In addition, introducing the normalization term into the objective function automatically controls the complexity of the determined relational model and improves the efficiency of determining it. Furthermore, because the latent variables and the model parameters depend on each other, the variational distribution of each latent variable and the model parameters are determined more accurately.

（実施形態２）
本開示の実施形態はリレーショナルモデル決定用の方法を提供する。実施形態１の内容に関し、２であるサンプルデータの次元を取得した場合を例に、本開示のこの実施形態において提供される方法を詳細に説明する。図２を参照すると、方法は下記のステップを含む。 (Embodiment 2)
This embodiment of the present disclosure provides a method for determining a relational model. Building on Embodiment 1, the method is described in detail below using, as an example, the case where the sample data has two dimensions. Referring to FIG. 2, the method includes the following steps.

ステップ２０１：サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、が取得される。なお、潜在変数の各々は、サンプルデータのサンプルカテゴリを記述するために使用される。 Step 201: A log likelihood determined according to sample data, at least two latent variables and model parameters, a normalization term, and a logarithm of each variation distribution of the latent variables are obtained. Note that each latent variable is used to describe a sample category of sample data.

この実施形態は、サンプルデータの内容及び次元に何の限定も設けない。特定の実施例中、サンプルデータは、映画についての複数のユーザによってなされた評価であってよい。この場合には、サンプルデータの次元は２であってよい。即ち、統計収集は、ユーザ及び映画の次元における評価のために行われる。それにもかかわらず、上記内容及び次元に加えて、サンプルデータが他の内容及び次元を採用してもよい。 This embodiment provides no limitation on the content and dimensions of the sample data. In certain embodiments, the sample data may be ratings made by multiple users about the movie. In this case, the dimension of the sample data may be 2. That is, statistics collection is done for evaluation in the user and movie dimensions. Nevertheless, in addition to the above contents and dimensions, the sample data may adopt other contents and dimensions.

理解を容易にするために、説明は、一例としての下記のサンプルデータを使用してなされる。サンプルデータは、５×５の行列形式で表される。なお、行列の行はユーザ１から５までを示し、行列の列は映画１から５を示す。行列の任意の要素Ｘ_ｉｊはユーザｉによりなされた映画ｊに対する評価を示す。なお、１≦ｉ≦５、１≦ｊ≦５、ｉとｊはいずれも整数である。

For ease of understanding, the description uses the following sample data as an example. The sample data is expressed as a 5 × 5 matrix. The rows of the matrix represent users 1 to 5, and the columns represent movies 1 to 5. Any element X_ij of the matrix represents the rating given by user i to movie j, where 1 ≤ i ≤ 5, 1 ≤ j ≤ 5, and i and j are integers.

モデルパラメータは、これらに限定されないが、各サンプルカテゴリにおいて、行の混合比、列の混合比、及びサブモデルパラメータを含む。この実施形態は、具体的なモデルパラメータの内容に何の限定も設けない。一例としての行列形式のサンプルデータを使用する場合、行の混合比は、決定されたリレーショナルモデル内の行の総数に対する決定されたリレーショナルモデル内の各サンプルカテゴリにおける行列の行の数の比であり、列の混合比は、決定されたリレーショナルモデル内の列の総数に対する決定されたリレーショナルモデル内の各サンプルカテゴリにおける行列の列の数の比であり、各サンプルカテゴリにおけるサブモデルパラメータは、決定されたリレーショナルモデル内の各サンプルカテゴリにおけるデータ分布のパラメータである。 The model parameters include, but are not limited to, the row mixing ratio, the column mixing ratio, and the sub-model parameters of each sample category. This embodiment does not limit the specific model parameters. Taking sample data in matrix form as an example, the row mixing ratio is the ratio of the number of matrix rows in each sample category of the determined relational model to the total number of rows; the column mixing ratio is the ratio of the number of matrix columns in each sample category to the total number of columns; and the sub-model parameters of each sample category are the parameters of the data distribution in that sample category of the determined relational model.
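The definition of the mixing ratios can be illustrated with a minimal Python sketch. It assumes hard category assignments; the assignment lists and the helper name `mixing_ratios` are hypothetical and do not come from the source, which works with variational distributions rather than hard labels.

```python
from collections import Counter


def mixing_ratios(assignments, num_categories):
    """Return, for each category 0..num_categories-1, the fraction of
    items (rows or columns) assigned to that category."""
    if not assignments:
        raise ValueError("assignments must be non-empty")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(k, 0) / total for k in range(num_categories)]


# Hypothetical hard assignments for the 5 users (rows) and 5 movies (columns).
row_categories = [0, 0, 1, 1, 1]
col_categories = [0, 1, 1, 1, 1]

alpha = mixing_ratios(row_categories, 2)  # row mixing ratio per row category
beta = mixing_ratios(col_categories, 2)   # column mixing ratio per column category
```

Each list of ratios sums to 1, since every row (or column) belongs to exactly one category under this assumption.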

ここで、潜在変数とモデルパラメータは互いに独立であってよい又は依存関係の対象であってよいことに留意すべきである。実際には、決定されたリレーショナルモデルをより正確なものにするために潜在変数とモデルパラメータは相互依存しているので、この実施形態は、一例として、潜在変数とモデルパラメータが依存関係の対象になっている場合を用いて説明する。 Note that the latent variables and the model parameters may be independent of each other or may depend on each other. In practice, the latent variables and the model parameters are made interdependent so that the determined relational model is more accurate; this embodiment is therefore described using, as an example, the case where the latent variables and the model parameters depend on each other.

更に、サンプルデータ、少なくとも２つの潜在変数、及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、を取得するために、この実施形態で提供される方法は、最初に、下記の同時確率密度関数を導入する。

ここで、ｐは同時確率密度関数を示し、

はサンプルデータを示し、Ｎ_ｒは行のサンプルの数を示し、Ｎ_ｃは列のサンプルの数を示し、Ｚ^Ｒは行の潜在変数を示し、Ｚ^ｃは列の潜在変数を示し、θはモデルパラメータのセットを示し、モデルパラメータはα、β、φを含み、αとβはそれぞれ行と列の混合比であり、φは各サンプルカテゴリにおけるサブモデルパラメータを示し、Ｋ_ｒは行のサンプルカテゴリの数を示し、Ｋ_ｃは列のサンプルカテゴリの数を示し、Ｘ_ｉｊは第ｉ行第ｊ列のサンプルデータを示し、

はサンプルデータＸ_ｉｊが第ｋ行のサンプルカテゴリに属するか否かを示し、

はサンプルデータＸ_ｉｊが第ｌ列のサンプルカテゴリに属するか否かを示し、α_ｋは第ｋ行のサンプルカテゴリの行の混合比を示し、β_ｌは第ｌ列のサンプルカテゴリの列の混合比を示す。 Furthermore, to obtain the log likelihood determined according to the sample data, the at least two latent variables, and the model parameters, together with the normalization term and the logarithm of the variational distribution of each latent variable, the method provided in this embodiment first introduces the following joint probability density function:

Where p is the joint probability density function,

denotes the sample data, N_r denotes the number of row samples, N_c denotes the number of column samples, Z^R denotes the row latent variable, Z^c denotes the column latent variable, and θ denotes the set of model parameters. The model parameters include α, β, and φ, where α and β are the row and column mixing ratios, respectively, and φ denotes the sub-model parameters of each sample category. K_r denotes the number of row sample categories, K_c denotes the number of column sample categories, X_ij denotes the sample data in row i and column j,

Indicates whether the sample data X _ij belongs to the sample category in the k-th row,

indicates whether the sample data X_ij belongs to the l-th column sample category, α_k denotes the row mixing ratio of the k-th row sample category, and β_l denotes the column mixing ratio of the l-th column sample category.
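The joint probability density function itself is not reproduced in this text. As a reading aid only, the block below gives one standard latent block model form that is consistent with the symbol definitions above. It is an assumption, not the formula of the source: the indicator symbols z^R_{ik} and z^C_{jl} (membership of row i in row category k and of column j in column category l) and the per-block density p(X_ij | φ_kl) are notation introduced here for illustration.

```latex
p(X, Z^R, Z^C \mid \theta)
  = \prod_{i=1}^{N_r} \prod_{k=1}^{K_r} \alpha_k^{\, z^R_{ik}}
    \;\prod_{j=1}^{N_c} \prod_{l=1}^{K_c} \beta_l^{\, z^C_{jl}}
    \;\prod_{i=1}^{N_r} \prod_{j=1}^{N_c} \prod_{k=1}^{K_r} \prod_{l=1}^{K_c}
      p\!\left(X_{ij} \mid \phi_{kl}\right)^{z^R_{ik}\, z^C_{jl}}
```

Under this assumed form, taking the logarithm of both sides turns the products into sums over rows, columns, and category pairs.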

上記同時確率密度関数は、リレーショナルモデルの確率密度分布を決定する。リレーショナルモデルの確率密度分布は、同時確率密度関数における潜在変数Ｚ^Ｒ、Ｚ^ｃ及びリレーショナルモデルがこのようにして決定されるのと同様に、モデルパラメータα、β、φを決定することによって決定されてよい。同時確率密度関数を解けるようにするにあたり、下記の同時確率密度分布（joint probability density distribution）の等式の両辺の各々についての対数は、対数尤度を取得するために計算される。

The joint probability density function determines the probability density distribution of the relational model. That distribution, and hence the relational model, can be determined by determining the latent variables Z^R and Z^c and the model parameters α, β, and φ in the joint probability density function. To make the joint probability density function solvable, the logarithm of both sides of the joint probability density equation below is taken to obtain the log likelihood.

はサンプルデータを示し、Ｎ_ｒは行のサンプルの数を示し、Ｎ_ｃは列のサンプルの数を示し、Ｚ^Rは行の潜在変数を示し、Ｚ^ｃは列の潜在変数を示し、θはモデルパラメータのセットを示し、モデルパラメータはα、β、φを含み、αとβはそれぞれ行と列の混合比を示し、φは各サンプルカテゴリにおけるモデルパラメータを示す。 As in any embodiment, the log likelihood determined as a function of sample data, at least two latent variables and model parameters is

denotes the sample data, N_r denotes the number of row samples, N_c denotes the number of column samples, Z^R denotes the row latent variable, Z^c denotes the column latent variable, and θ denotes the set of model parameters, which includes α, β, and φ, where α and β denote the row and column mixing ratios, respectively, and φ denotes the model parameters of each sample category.

特に、サンプルデータ

が行列形式で表される場合、Ｎ_ｒは行列における行のサンプルの数を示し、Ｎ_ｃは行列における列のサンプルの数を示し、Ｋ_ｒは行のサンプルカテゴリの数を示し、Ｋ_ｃは列のサンプルカテゴリの数を示す。Ｚ^ＲはＮ_ｒ*Ｋ_ｒのブロック変数行列である。Ｚ^Ｒの各要素は

である。

の場合、それはサンプルデータＸ_ｉｊが第ｋ行のサンプルカテゴリに属することを示す。Ｚ^ｃはＮ_ｃ*Ｋ_ｃのブロック変数行列である。Ｚ^ｃの各要素は

である。

の場合、それはサンプルデータＸ_ｉｊが第ｌ列のサンプルカテゴリに属することを示す。行の混合比αは、サンプルデータ行列の行の総数に対するリレーショナルデータモデルの各サンプルカテゴリにおける行の数の比を示す。列の混合比βは、サンプルデータ行列の列の総数に対するリレーショナルデータモデルの各サンプルカテゴリにおける列の数の比を示す。各要素におけるモデルパラメータφは、リレーショナルモデルにおける各サンプルカテゴリのサンプルデータがサンプルカテゴリにおいて従う分布のパラメータを示す。例えば、各サンプルカテゴリにおけるサンプルデータがガウス分布に従う場合、φはガウス分布における期待値μ及び分散δを示す。他の例として、各サンプルカテゴリのサンプルデータがポアソン分布に従う場合、φは、ポアソン分布における期待値及び分散λを示す。ここで、各サンプルカテゴリにおけるサンプルデータは上記の分布以外の他の分布に従っていてよく、この実施形態において分布は限定されないことに留意すべきである。 In particular, sample data

is represented in matrix form, N_r denotes the number of row samples in the matrix, N_c denotes the number of column samples in the matrix, K_r denotes the number of row sample categories, and K_c denotes the number of column sample categories. Z^R is an N_r × K_r block variable matrix. Each element of Z^R is

It is.

In this case, the sample data X_ij belongs to the k-th row sample category. Z^c is an N_c × K_c block variable matrix. Each element of Z^c is

It is.

In this case, the sample data X_ij belongs to the l-th column sample category. The row mixing ratio α is the ratio of the number of rows in each sample category of the relational model to the total number of rows in the sample data matrix. The column mixing ratio β is the ratio of the number of columns in each sample category of the relational model to the total number of columns in the sample data matrix. The model parameter φ of each sample category denotes the parameters of the distribution that the sample data in that category follows. For example, when the sample data in each sample category follows a Gaussian distribution, φ denotes the expected value μ and the variance δ of the Gaussian distribution. As another example, when the sample data in each sample category follows a Poisson distribution, φ denotes λ, which is both the expected value and the variance of the Poisson distribution. Note that the sample data in each sample category may follow a distribution other than those above; this embodiment does not limit the distribution.

任意の実施形態において、サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される正規化項は、

であって、Ｎ_ｒは行のサンプルの数を示し、Ｎ_ｃは列のサンプルの数を示し、Ｋ_ｒは行のサンプルカテゴリの数を示し、Ｋ_ｃは列のサンプルカテゴリの数を示し、

は潜在変数の変分分布の近似値を示し、

は第ｋサンプルカテゴリにおける第ｉ行のサンプルデータの行の潜在変数を示し、

は、第ｌサンプルカテゴリにおける第ｊ列のサンプルデータの列の潜在変数を示し、αとβはそれぞれ行と列の混合比を示し、Ｄ_αはαの次元を示し、Ｄ_βはβの次元を示し、Ｄ_ｋｌは第ｋ行第ｌ列のサンプルカテゴリにおけるサブモデルパラメータの次元を示す。Ｌ（ａ，ｂ）＝ｌｏｇｂ＋(ａ−ｂ)／ｂであり、ａは

を示し、ｂは

を示す。従って、正規化項は、下記の式のように拡張されてもよい。

In any embodiment, the normalization term determined in response to the sample data, the at least two latent variables and the model parameter is:

Where N _r indicates the number of samples in the row, N _c indicates the number of samples in the column, K _r indicates the number of sample categories in the row, K _c indicates the number of sample categories in the column,

Is an approximation of the variational distribution of latent variables,

Indicates the latent variable of the i-th sample data row in the k-th sample category,

denotes the latent variable of the j-th column of sample data for the l-th sample category, α and β denote the row and column mixing ratios, respectively, D_α denotes the dimension of α, D_β denotes the dimension of β, and D_kl denotes the dimension of the sub-model parameters of the sample category in the k-th row and l-th column. L(a, b) = log b + (a − b)/b, where a is

And b is

Indicates. Therefore, the normalization term may be extended as in the following equation.

特に、Ｋ_ｒがブロック変数行列Ｚ^Ｒの行のサンプルカテゴリの数を示す場合にはＤ_α＝Ｄ（α）＝Ｋ_ｒ−１であり、Ｋ_ｃがブロック変数行列Ｚ^ｃの列のサンプルカテゴリの数を示す場合にはＤ_β＝Ｄ（β）＝Ｋ_ｃ−１であり、各サンプルカテゴリにおけるサンプルデータが、期待値μ及び分散δを有するガウス分布に従う場合にはＤ_ｋｌ＝Ｄ（φ_ｋｌ）＝２であり、各サンプルカテゴリにおけるサンプルデータが、期待値及び分散λを有するポアソン分布に従う場合にはＤ_ｋｌ＝Ｄ（φ_ｋｌ）＝１である。 In particular, when K _r indicates the number of sample categories in the row of the block variable matrix Z ^R , D _α = D (α) = K _r −1, and K _c is the sample category of the column of the block variable matrix Z ^c. D _β = D (β) = K _c −1, and if the sample data in each sample category follows a Gaussian distribution with expected value μ and variance δ, D _kl = D (φ _kl ) = 2 and D _kl = D (φ _kl ) = 1 if the sample data in each sample category follows a Poisson distribution with expected value and variance λ.
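The dimension counts stated above can be collected in a small Python helper. This is only a sketch of the arithmetic in the preceding paragraph; the function name and the string keys `"gaussian"` and `"poisson"` are hypothetical labels chosen for illustration.

```python
def parameter_dimensions(k_r, k_c, submodel):
    """Return D_alpha, D_beta and D_kl for K_r row categories, K_c column
    categories and the given per-category distribution."""
    # Dimension of the sub-model parameters per (row category, column category):
    # Gaussian has an expected value and a variance, Poisson has a single lambda.
    dims_per_block = {"gaussian": 2, "poisson": 1}
    if submodel not in dims_per_block:
        raise ValueError(f"unsupported submodel: {submodel!r}")
    if k_r < 1 or k_c < 1:
        raise ValueError("category counts must be at least 1")
    return {
        "D_alpha": k_r - 1,  # mixing ratios sum to 1, so one is determined by the rest
        "D_beta": k_c - 1,
        "D_kl": dims_per_block[submodel],
    }


dims = parameter_dimensions(3, 2, "gaussian")
```

For three row categories and two column categories with Gaussian sub-models, this gives D_α = 2, D_β = 1, and D_kl = 2.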

更に、潜在変数の変分分布の近似値

は、この実施形態に限定されない。

の値は、前回の更新又は初期設定によって取得された更新された潜在変数の変分分布の値を取るがこれに限定されるわけではない。理解を容易にするために、この実施形態では、前回更新された潜在変数の変分分布の値又は潜在変数の変分分布の近似値

として、初期設定から取得された、更新された潜在変数を採用して説明する。正規化項を初めて決定する場合、潜在変数の変数分布の近似値は、初期設定から取得された更新された潜在変数の変数分布の値であってよい。正規化項を決定することが初めてではない場合、潜在変数の変数分布の近似値は、前回更新された潜在変数の変分分布の値であってよい。 Furthermore, the approximate value of the variation distribution of the latent variable

Is not limited to this embodiment.

Its value may be, but is not limited to, the value of the variational distribution of the latent variable obtained by the previous update or by initialization. For ease of understanding, this embodiment is described taking, as the approximation of the variational distribution of the latent variable

, the value of the variational distribution of the latent variable obtained by the previous update or by initialization. When the normalization term is determined for the first time, the approximation may be the value of the variational distribution obtained by initialization. Otherwise, it may be the value of the variational distribution obtained in the previous update.

ここで、正規化項を決定することによって、決定されたリレーショナルモデルの複雑性が自動的に制御されてよいし、リレーショナルモデルを決定することの効率が改善されることに留意すべきである。 It should be noted here that by determining the normalization term, the complexity of the determined relational model may be automatically controlled and the efficiency of determining the relational model is improved.

任意の実施形態において、サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される潜在変数の各々の変分分布の対数は、ｌｏｇｑ（Ｚ^Ｒ）及びｌｏｇｑ（Ｚ^ｃ）を含み、ｑ（Ｚ^Ｒ）は行の潜在変数Ｚ^Ｒの変分分布を示し、ｑ（Ｚ^ｃ）は列の潜在変数Ｚ^ｃの変分分布を示す。 In any embodiment, the logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two latent variables, and the model parameters, include log q(Z^R) and log q(Z^c), where q(Z^R) denotes the variational distribution of the row latent variable Z^R and q(Z^c) denotes the variational distribution of the column latent variable Z^c.

特に、行の潜在変数Ｚ^Ｒの変分分布は、下記のように表現されてよい。

In particular, the variational distribution of the row latent variable Z^R may be expressed as follows.

特に、列の潜在変数Ｚ^ｃの変分分布は、下記のように表現されてよい。

In particular, the variational distribution of the column latent variable Z^c may be expressed as follows.

ステップ２０２：目的関数は、対数尤度の期待値、正規化項の期待値及び潜在変数の各々の変分分布の対数の期待値に応じて決定される。 Step 202: The objective function is determined according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of each variation distribution of the latent variable.

対数尤度は、ステップ２０１において因数分解表現として記載されている。対数尤度を解けるようにするにあたり、下記に示す因子化情報量基準（Factorized Information Criterion）（ＦＩＣ）を一例とするタイトな下界（tight lower bound）を取得するために、各因数にラプラス近似が行われる。

ここで、

ＦＩＣが最大の場合、

はθの値を示す。 The log likelihood is expressed in factorized form in step 201. To make the log likelihood solvable, a Laplace approximation is applied to each factor to obtain a tight lower bound, for example the Factorized Information Criterion (FIC) shown below.

here,

If FIC is the largest,

Indicates the value of θ.

更に、ＦＩＣがサンプルデータ

と潜在変数Ｚ^Ｒ及びＺ^ｃとを含むため、解答は、通常、期待値最大化（ＥＭ）アルゴリズムを介して行われる。しかしながら、リレーショナルモデルは、従属潜在変数によって決定されるので、従来のＥＭアルゴリズムは、ＦＩＣの解法に適用できない。ＦＩＣを解けるようにするにあたり、この実施形態は、漸近的に一様な下界を取得するために、下記の収束ＦＩＣ方法を採用する。

ここで、

関数はＬ（ａ，ｂ）＝ｌｏｇｂ＋(ａ−ｂ)／ｂである。 Furthermore, because the FIC contains the sample data

and the latent variables Z^R and Z^c, it is usually solved with the expectation-maximization (EM) algorithm. However, because the relational model is determined by mutually dependent latent variables, the conventional EM algorithm cannot be applied to solve the FIC. To make the FIC solvable, this embodiment adopts the following convergent FIC method to obtain an asymptotically uniform lower bound.

here,

The function L is L(a, b) = log b + (a − b)/b.
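One property of L(a, b) worth making explicit: it is the tangent line of the logarithm at the point b, evaluated at a. Because the logarithm is concave, log a ≤ L(a, b) for all positive a and b, with equality when a = b. The Python sketch below, with a hypothetical function name, simply evaluates the function so that this can be checked numerically; how the source uses the bound inside the objective is not shown here.

```python
import math


def linear_log_bound(a, b):
    """Evaluate L(a, b) = log b + (a - b) / b for positive a and b."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return math.log(b) + (a - b) / b


# Equality holds at a == b; elsewhere L(a, b) lies above log a.
at_equal = linear_log_bound(3.0, 3.0)
gap = linear_log_bound(2.0, 5.0) - math.log(2.0)
```

The gap shrinks as b approaches a, which matches the description of b as an approximation taken from the previous update.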

は、下記の式で表される。

Is represented by the following formula.

更に、目的関数は、上記ステップによって決定される。目的関数を介してリレーショナル関数を決定するために、この実施形態において提供される方法は、下記の後続のステップを更に含む。 Furthermore, the objective function is determined by the above steps. In order to determine the relational function via the objective function, the method provided in this embodiment further includes the following subsequent steps.

ステップ２０３：潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが取得される。 Step 203: An updated variation distribution and updated model parameters for each of the latent variables are obtained.

ここで、

here,

初期設定の方法は、この実施形態に限定されない。特定の実施例において、初期設定は、確率論的な方法によって行われてもよい。即ち、α、β、φの値は、確率論的に初期設定される。しかし、上記の方法に加えて、他の方法も可能である。 The initial setting method is not limited to this embodiment. In certain embodiments, initialization may be done by a probabilistic method. That is, the values of α, β, and φ are initialized stochastically. However, in addition to the above methods, other methods are possible.

更に、現在の潜在変数Ｚ^Ｒの変分分布が計算されるときに前回の潜在変数Ｚ^ｃの変分分布の計算結果が使用されるように、潜在変数の各々の収束され更新された変分分布が取得されるまで、潜在変数の各々の変分分布を交互に更新すること、即ち、潜在変数Ｚ^ｃの変分分布と潜在変数Ｚ^Ｒの変分分布とを交互に更新することが必要とされる。収束条件は、この実施形態において限定されない。特定の実施例において、行の潜在変数Ｚ^Ｒについて、行の潜在変数の現在の変分分布と行の潜在変数の前回の変分分布との間のユークリッド距離が計算されてもよい。計算されたユークリッド距離が距離の閾値より短い場合には、行の潜在変数の現在の変分分布が収束していると判別される。距離の閾値の値は、実際の要件に応じて設定されてよいが、この実施形態において限定されない。 Furthermore, the variational distributions of the latent variables must be updated alternately, that is, the variational distribution of Z^c and the variational distribution of Z^R are updated in turn, so that the previously computed variational distribution of Z^c is used when the current variational distribution of Z^R is computed, until the converged updated variational distribution of each latent variable is obtained. This embodiment does not limit the convergence condition. In a specific implementation, for the row latent variable Z^R, the Euclidean distance between the current and the previous variational distribution of the row latent variable may be computed. If the computed Euclidean distance is less than a distance threshold, the current variational distribution of the row latent variable is determined to have converged. The distance threshold may be set according to actual requirements and is not limited in this embodiment.

それにもかかわらず、潜在変数の各々の更新された変分分布が収束しているか否かを判別する方法に加え、更新の固定回数を設定する方法において潜在変数の各々の収束された変分分布が取得されてもよい。更新回数の設定方法において、特定の実施例中、更新回数が所定の更新回数の閾値に達した場合には、反復が停止し、少なくとも２つの潜在変数の各々の更新された変分分布が取得される。なお、この実施形態は、所定の更新回数の閾値の値に限定を設けない。 Besides determining whether the updated variational distribution of each latent variable has converged, the converged variational distributions may also be obtained by setting a fixed number of updates. In a specific implementation of this approach, when the number of updates reaches a predetermined update-count threshold, the iteration stops and the updated variational distribution of each of the at least two latent variables is obtained. This embodiment does not limit the value of the update-count threshold.
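The alternating update and its two stopping rules (Euclidean distance below a threshold, or a fixed number of updates) can be sketched in Python as follows. The update formulas are not reproduced in this text, so `update_r` and `update_c` are hypothetical callables supplied by the caller. Checking both distributions against the threshold is also an assumption of this sketch, since the source only illustrates the check for Z^R.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two equally shaped lists of lists."""
    return math.sqrt(sum((a - b) ** 2
                         for row_p, row_q in zip(p, q)
                         for a, b in zip(row_p, row_q)))


def alternate_updates(q_r, q_c, update_r, update_c,
                      distance_threshold=1e-6, max_updates=100):
    """Alternately update q(Z^R) and q(Z^c) until both move less than
    distance_threshold, or until max_updates rounds have been run."""
    if max_updates < 1:
        raise ValueError("max_updates must be at least 1")
    for n_updates in range(1, max_updates + 1):
        new_q_r = update_r(q_r, q_c)      # uses the previous q(Z^c)
        new_q_c = update_c(new_q_r, q_c)  # uses the freshly updated q(Z^R)
        moved = max(euclidean_distance(new_q_r, q_r),
                    euclidean_distance(new_q_c, q_c))
        q_r, q_c = new_q_r, new_q_c
        if moved < distance_threshold:
            break
    return q_r, q_c, n_updates


# Toy updates that jump straight to fixed targets, for illustration only.
target_r = [[0.9, 0.1], [0.2, 0.8]]
target_c = [[0.5, 0.5]]
final_r, final_c, rounds = alternate_updates(
    [[0.5, 0.5], [0.5, 0.5]], [[1.0, 0.0]],
    lambda q_r, q_c: target_r, lambda q_r, q_c: target_c)
```

With these toy updates the first round moves both distributions to their targets and the second round detects that nothing changed, so the loop stops after two rounds.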

任意の実施形態のように、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得する上記方法に加え、この実施形態において提供される方法は（潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得する下記の方法に限定されないが）、
下記の式

を使用してモデルパラメータを更新することによって更新されたモデルパラメータを取得することと、
潜在変数の各々の収束され更新された変分分布を取得するために下記の式

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって更新されたモデルパラメータに応じて潜在変数の各々の変分分布を交互に更新することと、
を含む。 As in any embodiment, besides the above method, the method provided in this embodiment may obtain the updated variational distribution of each latent variable and the updated model parameters in the following way, without being limited to it:
obtaining updated model parameters by updating the model parameters using the following formula

and
alternately updating the variational distribution of each latent variable according to the updated model parameters, using the following formula, to obtain the converged updated variational distribution of each latent variable, where t denotes the current update and t−1 denotes the previous update or the initialization:

ステップ２０３における式が潜在変数の更新された変分分布及び更新されたモデルパラメータを取得するために初めて使用された場合、ｔ−１が初期設定を示すので、ｔ−１に対応するパラメータは初期値である。例えば、ステップ２０３において、潜在変数の更新された変分分布及び更新されたモデルパラメータが初めて取得された場合、例えば、上記の式における

はα_ｋの初期値を示し、

はβ_ｌの初期値を示す。なお、この実施形態は、初期設定の方法に限定を設けない。特定の実施例において、確率論的な初期設定の方法は、ｑ（Ｚ^Ｒ）及びｑ（Ｚ^Ｃ）を初期化するために使用されてよい。それにもかかわらず、確率論的な初期設定の方法に加え、他の方法も可能であってよい。 When the formulas in step 203 are used for the first time to obtain the updated variational distributions of the latent variables and the updated model parameters, t−1 denotes the initialization, so the parameters corresponding to t−1 take their initial values. For example, when the updated variational distributions and model parameters are obtained for the first time in step 203, in the above formulas

Indicates the initial value of α _k ,

denotes the initial value of β_l. This embodiment does not limit the initialization method. In a specific implementation, a stochastic initialization method may be used to initialize q(Z^R) and q(Z^C). Other initialization methods are also possible.

ステップ２０３において上記の式を使用することによって潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することが初めてではない場合、ｔ−１は前回の更新を示すので、ｔ−１に対応するパラメータは前回更新された値である。例えば、ステップ２０３において潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを３回取得した場合、上記の式における

は潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを２回取得した場合に取得されたα_ｋの値を示し、

は潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを２回取得した場合に取得されたβ_ｌの値を示す。 When the above formulas in step 203 are not being used for the first time to obtain the updated variational distribution of each latent variable and the updated model parameters, t−1 denotes the previous update, so the parameters corresponding to t−1 take the values from the previous update. For example, when the updated variational distributions and model parameters are obtained for the third time in step 203, in the above formulas

denotes the value of α_k obtained when the updated variational distributions and updated model parameters were obtained the second time,

denotes the value of β_l obtained when the updated variational distributions and updated model parameters were obtained the second time.

更に、潜在変数の各々の変分分布が更新されたモデルパラメータに応じて更新された場合、潜在変数の各々の収束され更新された変分分布が取得されるまで、潜在変数Ｚ^Ｒの変分分布と潜在変数Ｚ^ｃの変分分布とを交互に更新することも必要とされる。 Furthermore, when the variational distribution of each latent variable is updated according to the updated model parameters, the variational distributions of Z^R and Z^c must likewise be updated alternately until the converged updated variational distribution of each latent variable is obtained.

加えて、特定の実施例において、行のサンプルカテゴリの異なる数Ｋ_ｒ及び列のサンプルカテゴリの異なる数Ｋ_ｃが設定されてもよい。例えば、Ｋ_ｒの最小値がＫ_ｒｍｉｎとして設定され、一方でＫ_ｒの最大値がＫ_ｒｍａｘとして設定される。また、Ｋ_ｃの最小値がＫ_ｃｍｉｎとして設定され、一方でＫ_ｃの最大値がＫ_ｃｍａｘとして設定される。Ｋ_ｒとＫ_ｃの範囲内で、Ｋ_ｒとＫ_ｃの各値の組み合わせについて、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが取得される。 In addition, in a specific implementation, different numbers K_r of row sample categories and different numbers K_c of column sample categories may be set. For example, the minimum value of K_r is set to K_rmin and its maximum to K_rmax, and the minimum value of K_c is set to K_cmin and its maximum to K_cmax. For each combination of values of K_r and K_c within these ranges, the updated variational distribution of each latent variable and the updated model parameters are obtained.
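Enumerating every combination of K_r and K_c within the configured ranges can be sketched in Python with `itertools.product`. The function name is hypothetical, and the ranges are treated as inclusive at both ends, which is an assumption about how the minimum and maximum are meant.

```python
from itertools import product


def category_count_grid(k_r_min, k_r_max, k_c_min, k_c_max):
    """List every (K_r, K_c) pair with K_r in [k_r_min, k_r_max] and
    K_c in [k_c_min, k_c_max], both ranges inclusive."""
    if k_r_min > k_r_max or k_c_min > k_c_max:
        raise ValueError("minimum must not exceed maximum")
    return list(product(range(k_r_min, k_r_max + 1),
                        range(k_c_min, k_c_max + 1)))


grid = category_count_grid(1, 2, 1, 3)
```

Each pair in the resulting list would then be used for one run of the update procedure.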

潜在変数の各々の更新された変分分布及び更新されたモデルパラメータの取得時において、潜在変数の各々の収束され更新された変分分布が取得されるまで、潜在変数の各々の変分分布が最初に交互に更新されてよいこと、及び、更新されたモデルパラメータを取得するために、モデルパラメータが潜在変数の各々の更新された変分分布に応じて更新されることに留意すべきである。又は、更新されたモデルパラメータを取得するためにモデルパラメータが最初に更新されてよいこと、及び、潜在変数の各々の収束され更新された変分分布を取得するために、潜在変数の各々の変分分布が更新されたモデルパラメータに応じて交互に更新されることに留意すべきである。つまり、潜在変数の各々の変分分布とモデルパラメータの更新順序はこの実施形態において限定されない。 Note that, when obtaining the updated variational distribution of each latent variable and the updated model parameters, the variational distributions of the latent variables may first be updated alternately until the converged updated variational distribution of each latent variable is obtained, after which the model parameters are updated according to those updated variational distributions. Alternatively, the model parameters may be updated first, after which the variational distributions of the latent variables are updated alternately according to the updated model parameters until the converged updated variational distributions are obtained. In other words, this embodiment does not limit the order in which the variational distributions of the latent variables and the model parameters are updated.

ステップ２０４：潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かが判別され、目的関数が収束しない場合には潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが再取得される。 Step 204: It is determined whether or not the objective function converges according to the updated variation distribution of each latent variable and the updated model parameters. If the objective function does not converge, each latent variable is updated. The variation distribution and updated model parameters are reacquired.

特に、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かを判別することは、これに限定されないが、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と、潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを判別することと、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と前回取得された目的関数との間の距離が閾値より短い場合には目的関数が収束すると判別することと、
を含む。 In particular, determining whether the objective function converges according to the updated variational distribution of each latent variable and the updated model parameters includes, but is not limited to:
determining whether the distance between the objective function determined according to the updated variational distributions and updated model parameters and the previously obtained objective function, which was determined according to the previously updated variational distributions and model parameters, is less than a threshold; and
determining that the objective function converges if that distance is less than the threshold.

この実施形態は、閾値を特定の値に限定しない。特定の実施例中、異なる閾値が、サンプルデータのデータ量のような要因に応じて設定されてよい。目的関数は、少なくとも２つの潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定され、その結果、対数尤度に常に漸近する。目的関数が収束すると、対数尤度の値は、目的関数の値に近似される。このように、解けない対数は解ける目的関数に変換されるので、リレーショナルモデルの決定が達成される。 This embodiment does not limit the threshold to a specific value. In a specific implementation, different thresholds may be set depending on factors such as the amount of sample data. The objective function is determined according to the updated variational distribution of each of the at least two latent variables and the updated model parameters, and as a result it always approaches the log likelihood asymptotically. When the objective function converges, the value of the log likelihood is approximated by the value of the objective function. In this way, the intractable log likelihood is converted into a tractable objective function, so that the relational model can be determined.
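The convergence test of step 204 can be sketched as follows. This is a minimal illustration, assuming that the objective function evaluates to a scalar and that the "distance" is the absolute difference between two consecutive values. The function names and the per-sample tolerance rule are hypothetical and are not taken from the disclosure.

```python
def has_converged(current: float, previous: float, threshold: float) -> bool:
    """Step 204 test: two consecutive objective values are closer than the threshold."""
    # Assumption: "distance" means the absolute difference of two scalar values.
    return abs(current - previous) < threshold


def threshold_for(num_samples: int, per_sample_tolerance: float = 1e-6) -> float:
    """Hypothetical rule that scales the threshold with the amount of sample data."""
    return per_sample_tolerance * num_samples
```

A data-dependent rule such as `threshold_for` is only one possible reading of "different thresholds may be set depending on the amount of sample data"; a fixed constant would satisfy the text equally well.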

少なくとも２つの潜在変数の各々の更新された変分分布及び更新されたモデルパラメータの他の取得中に目的関数が収束していないと判別された場合、処理はステップ２０３に戻り、ステップ２０３において少なくとも２つの潜在変数の各々の更新された変分分布及び更新されたモデルパラメータは再度取得されることに留意すべきである。少なくとも２つの潜在変数の各々の更新された変分分布の最初の取得中、ステップ２０３の式におけるｔ−１は初期値を示す。一方で、少なくとも２つの潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを再度取得するために、処理がステップ２０３に戻った場合、ステップ２０３の式におけるｔ−１は前回の取得を示す。例えば、ステップ２０３における式の使用による少なくとも２つの潜在変数の各々の更新された変分分布及び更新されたモデルパラメータの初回取得中、上記の式におけるｔ−１に対応するパラメータは初期値を使用し、初回に取得された潜在変数の更新された変分分布及び更新されたモデルパラメータが取得される。初回に取得された潜在変数の更新された変分分布及び更新されたモデルパラメータが目的関数を収束させることに失敗した場合には、初回に取得された潜在変数の更新された変分分布及び更新されたモデルパラメータは上記ステップ２０３におけるｔ−１に対応するパラメータの値として使用され、潜在変数の更新された変分分布及び更新されたモデルパラメータは再度取得される。そして、再度取得された潜在変数の更新された変分分布及び更新されたモデルパラメータが、決定された目的関数を収束させられるか否かが判別される。更新は、目的関数を収束可能な潜在変数の更新された変分分布及び更新されたモデルパラメータが取得されるまで、このような方法で行われる。 It should be noted that if the objective function is determined not to have converged after the updated variational distribution of each of the at least two latent variables and the updated model parameters have been obtained, the process returns to step 203, where they are obtained again. When the updated variational distributions are obtained for the first time, t-1 in the formulas of step 203 denotes the initial values. When the process has returned to step 203 to obtain the updated variational distributions and the updated model parameters again, t-1 denotes the values obtained in the previous pass. For example, in the first pass through step 203, the parameters corresponding to t-1 take their initial values, which yields the first updated variational distributions and the first updated model parameters. If these fail to make the objective function converge, they are used as the values of the parameters corresponding to t-1 in step 203, and the updated variational distributions and the updated model parameters are obtained again. It is then determined whether the newly obtained variational distributions and model parameters make the objective function converge. The update is repeated in this way until updated variational distributions and updated model parameters that make the objective function converge are obtained.
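The loop between steps 203 and 204 can be sketched as below. This is an illustrative outline only: `update_q`, `update_theta`, and `objective` are hypothetical callables standing in for the update formulas and the objective function of the disclosure, the first comparison against the objective at the initial values is an assumption, and `max_passes` is an added safeguard that the text does not mention.

```python
from typing import Any, Callable, Tuple


def fit_relational_model(
    q0: Any,
    theta0: Any,
    update_q: Callable[[Any, Any], Any],
    update_theta: Callable[[Any, Any], Any],
    objective: Callable[[Any, Any], float],
    threshold: float,
    max_passes: int = 1000,
) -> Tuple[Any, Any, float]:
    """Repeat step 203 until the step 204 test reports convergence."""
    # In the first pass, the initial values play the role of "t-1".
    q, theta = q0, theta0
    previous = objective(q, theta)
    for _ in range(max_passes):
        # Step 203: new variational distributions, then new model parameters,
        # both computed from the values of the previous pass.
        q = update_q(q, theta)
        theta = update_theta(q, theta)
        # Step 204: compare consecutive objective values.
        current = objective(q, theta)
        if abs(current - previous) < threshold:
            return q, theta, current
        # Not converged: the current values become "t-1" for the next pass.
        previous = current
    return q, theta, previous
```

The sketch updates the variational distributions before the model parameters; swapping the two update calls gives the alternative order that the embodiment also allows.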

ステップ２０５：目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じてリレーショナルモデルが決定される。 Step 205: A relational model is determined according to the variational distribution and model parameters of each of the latent variables that enable convergence of the objective function.

このステップにおいて、収束時の目的関数の値が対数尤度に近似すると、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じてリレーショナルモデルが決定されてよい。 In this step, when the value of the objective function at the time of convergence approximates the logarithmic likelihood, a relational model may be determined according to each variation distribution and model parameter of the latent variable that enables the objective function to converge.

更に、行のサンプルカテゴリの異なる数Ｋ_ｒ及び列のサンプルカテゴリの異なる数Ｋ_ｃが設定されてよい。また、Ｋ_ｒとＫ_ｃの各値の組み合わせについて、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータが取得されると、目的関数の最大値を可能にするＫ_ｒとＫ_ｃは、目的関数の収束を可能にすることに基づいて選択されてよい。更に、リレーショナルモデルは、Ｋ_ｒとＫ_ｃの使用により計算された潜在変数の各々の変分分布及びモデルパラメータに応じて決定される。 Furthermore, different numbers K_r of row sample categories and different numbers K_c of column sample categories may be set. For each combination of values of K_r and K_c, the updated variational distribution of each latent variable and the updated model parameters are obtained, and, among the combinations for which the objective function converges, the K_r and K_c that yield the maximum value of the objective function may be selected. The relational model is then determined according to the variational distribution of each latent variable and the model parameters computed with the selected K_r and K_c.
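The selection of K_r and K_c described above amounts to a search over candidate pairs. A minimal sketch follows, assuming a hypothetical callable `fit_for(k_r, k_c)` that runs the updates to convergence for one pair and returns the variational distributions, the model parameters, and the converged objective value. Neither the name nor the return layout comes from the disclosure.

```python
from itertools import product
from typing import Any, Callable, Iterable, Optional, Tuple

Result = Tuple[float, int, int, Any, Any]


def select_category_counts(
    row_candidates: Iterable[int],
    col_candidates: Iterable[int],
    fit_for: Callable[[int, int], Tuple[Any, Any, float]],
) -> Optional[Result]:
    """Return (objective, K_r, K_c, q, theta) for the pair with the largest objective."""
    best: Optional[Result] = None
    for k_r, k_c in product(row_candidates, col_candidates):
        # fit_for is assumed to return only after the objective has converged.
        q, theta, value = fit_for(k_r, k_c)
        if best is None or value > best[0]:
            best = (value, k_r, k_c, q, theta)
    return best
```

The function returns `None` when either candidate list is empty, so a caller should check the result before unpacking it.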

決定されたリレーショナルモデルにおいて、行のサンプルカテゴリの初期値Ｋ_ｒ及び列のサンプルカテゴリの初期値Ｋ_ｃは、行のサンプルカテゴリの最終値及び列のサンプルカテゴリの最終値と同じであってもよいし、異なっていてもよいことに留意すべきである。つまり、リレーショナルモデルの構造は、リレーショナルモデルを決定する処理の間に自動的に調整されてよい。 In the determined relational model, the initial value K _r of the row sample category and the initial value K _c of the column sample category may be the same as the final value of the row sample category and the final value of the column sample category. It should be noted that they may be different. That is, the structure of the relational model may be automatically adjusted during the process of determining the relational model.

２次元のサンプルデータのリレーショナルモデルは、ステップ２０１からステップ２０５において決定される。この実施形態において提供される方法は、２以上の次元を有するサンプルデータのリレーショナルモデルに対して更に使用されてもよい。例えば、サンプルデータの次元は、３、４、５等であってよい。 A relational model of two-dimensional sample data is determined in steps 201 to 205. The method provided in this embodiment may be further used for a relational model of sample data having two or more dimensions. For example, the dimension of the sample data may be 3, 4, 5, etc.

本開示のこの実施形態において提供される方法が２以上の次元を有するサンプルデータのリレーショナルモデルの決定に対して使用された場合、ステップ２０１において、サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度は下記の式で表される。

ここで、ｌｏｇｐ（）は対数尤度を示し、ｐは同時確率密度関数を示し、

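はサンプルデータを示し、ｄはサンプルデータの次元を示し、Ｎ_１は第１次元のサンプルデータの数を示し、Ｎ_ｄは第ｄ次元のサンプルデータの数を示し、Ｚ^１は第１次元のサンプルデータの潜在変数を示し、Ｚ^ｄは第ｄ次元のサンプルデータの潜在変数を示し、θはモデルパラメータのセットを示し、モデルパラメータはα^１，．．．，α^ｄ，φを含み、α^１は第１次元の混合比であり、α^ｄは第ｄ次元の混合比であり、φはモデルパラメータを示す。 When the method provided in this embodiment of the present disclosure is used to determine a relational model of sample data having two or more dimensions, the log likelihood determined in step 201 according to the sample data, the at least two latent variables, and the model parameters is expressed by the following formula, where log p() denotes the log likelihood, p denotes the joint probability density function, [formula] denotes the sample data, d denotes the dimension of the sample data, N_1 denotes the number of sample data in the first dimension, N_d denotes the number of sample data in the d-th dimension, Z^1 denotes the latent variable of the first-dimension sample data, Z^d denotes the latent variable of the d-th-dimension sample data, and θ denotes the set of model parameters, which includes α^1, ..., α^d, and φ; α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.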

サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される正規化項は下記の式で表される。

ここで、ｄはサンプルデータの次元を示し、Ｎ_１は第１次元のサンプルデータの数を示し、Ｎ_ｄは第ｄ次元のサンプルデータの数を示し、Ｎ_ｉは第ｉ次元のサンプルデータの数を示し、Ｋ_ｌは第１次元のサンプルカテゴリの数を示し、Ｋ_ｄは第ｄ次元のサンプルカテゴリの数を示し、

は潜在変数の変分分布の近似値を示し、

はα^ｉの次元を示し、

を示し、ｂは

を示す。 The normalization term determined according to the sample data, the at least two latent variables, and the model parameters is expressed by the following formula.

Here, d denotes the dimension of the sample data, N_1 denotes the number of sample data in the first dimension, N_d denotes the number of sample data in the d-th dimension, N_i denotes the number of sample data in the i-th dimension, K_1 denotes the number of sample categories in the first dimension, K_d denotes the number of sample categories in the d-th dimension, [formula] denotes an approximation of the variational distribution of the latent variables, [formula] denotes the dimension of α^i, [formula] denotes [formula], and b denotes [formula].

サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される潜在変数の各々の変分分布の対数は、ｌｏｇｑ（Ｚ^１），．．．ｌｏｇｑ（Ｚ^ｄ）を含む。ここで、ｑ（Ｚ^１）は潜在変数Ｚ^１の変分分布を示し、ｑ（Ｚ^ｄ）は潜在変数Ｚ^ｄの変分分布を示す。 The logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two latent variables, and the model parameters, include log q(Z^1), ..., log q(Z^d). Here, q(Z^1) denotes the variational distribution of the latent variable Z^1, and q(Z^d) denotes the variational distribution of the latent variable Z^d.
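To make the role of the log q(Z^i) terms concrete, the sketch below computes the expected value of log q(Z) under q itself for one dimension. It assumes, purely for illustration, that q(Z) factorizes over samples and is stored as a list of per-sample category probabilities; the disclosure does not state this representation, and the function name is hypothetical.

```python
import math
from typing import Sequence


def expected_log_q(responsibilities: Sequence[Sequence[float]]) -> float:
    """E_q[log q(Z)] for a factorized categorical q, i.e. sum_n sum_k q_nk * log(q_nk)."""
    total = 0.0
    for row in responsibilities:
        for prob in row:
            # Convention 0 * log(0) = 0, so zero-probability categories are skipped.
            if prob > 0.0:
                total += prob * math.log(prob)
    return total
```

Under the same assumption, the contribution of all d dimensions is obtained by summing `expected_log_q` over the d responsibility matrices.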

ステップ２０２において、対数尤度の期待値、正規化項の期待値及び潜在変数の各々の変分分布の対数の期待値に応じて決定された目的関数

は、下記の式で表される。

In step 202, the objective function determined according to the expected value of the log likelihood, the expected value of the normalization term, and the expected values of the logarithms of the variational distributions of the latent variables

is expressed by the following formula.

任意の実施形態のように、ステップ２０３において、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することは、これに限定されないが、
潜在変数の各々の収束され更新された変分分布を取得するまで、下記の式

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって潜在変数の各々の更新された変分分布に応じて更新されたモデルパラメータを取得することと、
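を含む。 As an optional embodiment, in step 203, obtaining the updated variational distribution of each latent variable and the updated model parameters includes, but is not limited to:
alternately updating the variational distribution of each latent variable by using the following formula: [formula], until a converged updated variational distribution of each latent variable is obtained; and
obtaining the updated model parameters according to the updated variational distribution of each latent variable by using the following formula: [formula], where t denotes the current update and t-1 denotes the previous update or the initial setting.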

任意の実施形態のように、ステップ２０３において、潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することは、これに限定されないが、
更新されたモデルパラメータを取得するために下記の式

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって更新されたモデルパラメータに応じて潜在変数の各々の変分分布を交互に更新することと、
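を含む。 As an optional embodiment, in step 203, obtaining the updated variational distribution of each latent variable and the updated model parameters includes, but is not limited to:
updating the model parameters by using the following formula: [formula], to obtain the updated model parameters; and
alternately updating the variational distribution of each latent variable according to the updated model parameters by using the following formula: [formula], to obtain a converged updated variational distribution of each latent variable, where t denotes the current update and t-1 denotes the previous update or the initial setting.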

ステップ２０４及びステップ２０５の本実施方法は、２以上の次元を有するサンプルデータに直接的に適用されてよいので、リレーショナルモデルの決定に直接的に適用されてもよい。 Since this implementation method of step 204 and step 205 may be directly applied to sample data having two or more dimensions, it may be directly applied to the determination of the relational model.

決定されたリレーショナルモデルは、データのクラスタリング及びデータのカテゴライジングに適用されてよい。クラスタリングデータの場合、リレーショナルモデルを決定する処理はクラスタリングデータの処理である。カテゴライジングデータの場合、リレーショナルモデルを決定するために更なる後処理をまだ行う必要がある。クラスタリング及びカテゴライジングの結果は顧客分析、生物分析、土地分析（geoanalysis）等に対して使用されるので、多くの社会的価値及び経済的価値が発生する。 The determined relational model may be applied to data clustering and data categorization. In the case of clustering data, the process for determining the relational model is the clustering data process. In the case of categorizing data, further post-processing still needs to be done to determine the relational model. Since the results of clustering and categorizing are used for customer analysis, bioanalysis, land analysis, etc., many social and economic values are generated.
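For the clustering use case, one simple way to read cluster labels off the converged variational distributions is to assign each object to its most probable sample category. The sketch below assumes that q(Z) for one dimension is available as a list of per-object category probabilities; this representation and the function name are illustrative assumptions and do not come from the disclosure.

```python
from typing import List, Sequence


def hard_assignments(responsibilities: Sequence[Sequence[float]]) -> List[int]:
    """Assign each object to the category with the largest variational probability."""
    # Ties are broken in favor of the lowest category index.
    return [max(range(len(row)), key=row.__getitem__) for row in responsibilities]
```

For two-dimensional data such as user-by-movie ratings, applying the function once to the row distribution and once to the column distribution yields a simultaneous grouping of users and movies.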

本開示の実施形態において提供される方法によれば、サンプルデータ、サンプルデータのサンプルカテゴリを記述するための少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、を取得し、対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、に応じて目的関数を決定することによって、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じて決定されたリレーショナルモデルは、２つ以上のエンティティ間における相互関係の分析に適用可能になり、その結果、リレーショナルモデルの適用範囲を拡張する。加えて、目的関数に正規化項を導入することによって、決定されたリレーショナルモデルの複雑性は自動的に制御され、リレーショナルモデルを決定する効率は改善される。更に、潜在変数とモデルパラメータとが相互に依存しているので、潜在変数の各々の変分分布及びモデルパラメータの決定はより正確である。 According to the method provided in this embodiment of the present disclosure, a log likelihood determined according to sample data, at least two latent variables describing the sample categories of the sample data, and model parameters is obtained together with a normalization term and the logarithm of the variational distribution of each latent variable, and an objective function is determined according to the log likelihood, the normalization term, and the logarithms of the variational distributions. The relational model determined according to the variational distributions and model parameters that enable the objective function to converge is therefore applicable to analyzing the interrelationships among two or more entities, which extends the scope of application of the relational model. In addition, introducing the normalization term into the objective function automatically controls the complexity of the determined relational model and improves the efficiency of determining it. Furthermore, because the latent variables and the model parameters depend on each other, the variational distribution of each latent variable and the model parameters are determined more accurately.

（実施形態３）
図３を参照すると、本開示のこの実施形態は、リレーショナルモデル決定用の装置を提供する。なお、この装置は、
サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、を取得するように構成された取得モジュール３０１と、
対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、に応じて目的関数を決定するように構成された第１の決定モジュール３０２と、
目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータを決定するように構成された第２の決定モジュール３０３と、
目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じてリレーショナルモデルを決定するように構成された第３の決定モジュール３０４と、
を備え、
潜在変数の各々は、サンプルデータのサンプルカテゴリを記述するために使用される。 (Embodiment 3)
Referring to FIG. 3, this embodiment of the present disclosure provides an apparatus for relational model determination. The apparatus includes:
an acquisition module 301 configured to acquire a log likelihood determined according to sample data, at least two latent variables, and model parameters, a normalization term, and the logarithm of the variational distribution of each latent variable;
a first determination module 302 configured to determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each latent variable;
a second determination module 303 configured to determine the variational distribution of each latent variable and the model parameters that enable the objective function to converge; and
a third determination module 304 configured to determine a relational model according to the variational distribution of each latent variable and the model parameters that enable the objective function to converge,
wherein each of the latent variables is used to describe a sample category of the sample data.

任意の実施形態のように、取得モジュール３０１によって取得された対数尤度は、

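はサンプルデータを示し、ｄはサンプルデータの次元を示し、Ｎ_１は第１次元のサンプルデータの数を示し、Ｎ_ｄは第ｄ次元のサンプルデータの数を示し、Ｚ^１は第１次元のサンプルデータの潜在変数を示し、Ｚ^ｄは第ｄ次元のサンプルデータの潜在変数を示し、θはモデルパラメータのセットを示し、モデルパラメータはα^１，．．．，α^ｄ，φを含み、α^１は第１次元の混合比であり、α^ｄは第ｄ次元の混合比であり、φはモデルパラメータを示す。 As an optional embodiment, the log likelihood acquired by the acquisition module 301 is given by [formula], where [formula] denotes the sample data, d denotes the dimension of the sample data, N_1 denotes the number of sample data in the first dimension, N_d denotes the number of sample data in the d-th dimension, Z^1 denotes the latent variable of the first-dimension sample data, Z^d denotes the latent variable of the d-th-dimension sample data, and θ denotes the set of model parameters, which includes α^1, ..., α^d, and φ; α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.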

任意の実施形態のように、取得モジュール３０１によって取得された正規化項は、

は潜在変数の変分分布の近似値を示し、

はα^ｉの次元を示し、

を示し、ｂは

を示す。 As an optional embodiment, the normalization term acquired by the acquisition module 301 is given by [formula], where [formula] denotes an approximation of the variational distribution of the latent variables, [formula] denotes the dimension of α^i, [formula] denotes [formula], and b denotes [formula].

任意の実施形態のように、取得モジュール３０１によって取得された潜在変数の各々の変分分布の対数はｌｏｇｑ（Ｚ^１），．．．ｌｏｇｑ（Ｚ^ｄ）を含み、ｑ（Ｚ^１）は潜在変数Ｚ^１の変分分布を示し、ｑ（Ｚ^ｄ）は潜在変数Ｚ^ｄの変分分布を示す。 As an optional embodiment, the logarithms of the variational distributions of the latent variables acquired by the acquisition module 301 include log q(Z^1), ..., log q(Z^d), where q(Z^1) denotes the variational distribution of the latent variable Z^1 and q(Z^d) denotes the variational distribution of the latent variable Z^d.

任意の実施形態のように、第１の決定モジュール３０２は、対数尤度の期待値、正規化項の期待値及び潜在変数の各々の変分分布の対数の期待値に応じて目的関数を決定するように構成されている。 As an optional embodiment, the first determination module 302 is configured to determine the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each latent variable.

任意の実施形態のように、第１の決定モジュール３０２によって決定された目的関数

は、

である。 As an optional embodiment, the objective function [formula] determined by the first determination module 302 is given by [formula].

任意の実施形態のように、図４を参照すると、第２の決定モジュール３０３は、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するように構成された取得ユニット３０３１と、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて目的関数が収束するか否かを判別するように構成された判別ユニット３０３２と、
を含み、
取得ユニット３０３１は、目的関数が収束するまで潜在変数の各々の変分分布及び更新されたモデルパラメータを再取得するように構成されている。 As an optional embodiment, referring to FIG. 4, the second determination module 303 includes:
an acquisition unit 3031 configured to acquire an updated variational distribution of each latent variable and updated model parameters; and
a discrimination unit 3032 configured to determine whether the objective function converges according to the updated variational distribution of each latent variable and the updated model parameters,
wherein the acquisition unit 3031 is configured to re-acquire the variational distribution of each latent variable and the updated model parameters until the objective function converges.

任意の実施形態のように、図５を参照すると、取得ユニット３０３１は、
潜在変数の各々の収束され更新された変分分布を取得するまで、下記の式

を使用することによって潜在変数の各々の変分分布を交互に更新するように構成された第１の更新サブユニット３０３１１と、
下記の式

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって潜在変数の各々の更新された変分分布に応じて更新されたモデルパラメータを取得するように構成された第２の更新サブユニット３０３１２と、
を含む。 As an optional embodiment, referring to FIG. 5, the acquisition unit 3031 includes:
a first update subunit 30311 configured to alternately update the variational distribution of each latent variable by using the following formula: [formula], until a converged updated variational distribution of each latent variable is obtained; and
a second update subunit 30312 configured to obtain updated model parameters according to the updated variational distribution of each latent variable by using the following formula: [formula], where t denotes the current update and t-1 denotes the previous update or the initial setting.

任意の実施形態のように、図６を参照すると、取得ユニット３０３１は、
更新されたモデルパラメータを取得するために下記の式

を使用することによってモデルパラメータを更新するように構成された第３の更新サブユニット３０３１３と、
潜在変数の各々の収束され更新された変分分布を取得するために下記の式

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって更新されたモデルパラメータに応じて潜在変数の各々の変分分布を交互に更新するように構成された第４の更新サブユニット３０３１４と、
を含む。 As an optional embodiment, referring to FIG. 6, the acquisition unit 3031 includes:
a third update subunit 30313 configured to update the model parameters by using the following formula: [formula], to obtain updated model parameters; and
a fourth update subunit 30314 configured to alternately update the variational distribution of each latent variable according to the updated model parameters by using the following formula: [formula], to obtain a converged updated variational distribution of each latent variable, where t denotes the current update and t-1 denotes the previous update or the initial setting.

任意の実施形態のように、図７を参照すると、判別ユニット３０３２は、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と前回取得された目的関数との間の距離が閾値より短いか否かを判別するように構成された比較サブユニット３０３２１と、
潜在変数の各々の更新された変分分布及び更新されたモデルパラメータに応じて決定された目的関数と前回取得された目的関数との間の距離が閾値より短い場合には目的関数が収束すると判別するように構成された判別サブユニット３０３２２と、
を含む。 As an optional embodiment, referring to FIG. 7, the discrimination unit 3032 includes:
a comparison subunit 30321 configured to determine whether the distance between the objective function determined according to the updated variational distribution of each latent variable and the updated model parameters and the previously obtained objective function is shorter than a threshold; and
a discrimination subunit 30322 configured to determine that the objective function converges if that distance is shorter than the threshold.

結論として、本開示の実施形態において提供される装置によれば、サンプルデータ、サンプルデータのサンプルカテゴリを記述するための少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、を取得し、対数尤度と、正規化項と、潜在変数の各々の変分分布の対数と、に応じて目的関数を決定することによって、目的関数の収束を可能にする潜在変数の各々の変分分布及びモデルパラメータに応じて決定されたリレーショナルモデルは、２つ以上のエンティティ間における相互関係の分析に適用可能になり、その結果、リレーショナルモデルの適用範囲を拡張する。加えて、目的関数に正規化項を導入することによって、決定されたリレーショナルモデルの複雑性は自動的に制御され、リレーショナルモデルを決定する効率は改善される。更に、潜在変数とモデルパラメータとが相互に依存しているので、潜在変数の各々の変分分布及びモデルパラメータの概算はより正確である。 In conclusion, according to the apparatus provided in the embodiments of the present disclosure, the log likelihood determined according to the sample data, at least two latent variables for describing the sample category of the sample data and the model parameters, and the normal Obtain the logarithm term and the logarithm of each variation distribution of the latent variable, and determine the objective function according to the log likelihood, the normalization term, and the logarithm of each variation distribution of the latent variable The relational model determined according to the variational distribution and model parameters of each of the latent variables enabling the convergence of the objective function can be applied to the analysis of the correlation between two or more entities, As a result, the scope of application of the relational model is expanded. In addition, by introducing a normalization term into the objective function, the complexity of the determined relational model is automatically controlled and the efficiency of determining the relational model is improved. Furthermore, because the latent variables and model parameters are interdependent, the variational distribution and model parameter estimates for each of the latent variables are more accurate.

上記実施形態において提供されるリレーショナルモデル決定用の装置によって行われるリレーショナルモデルの決定中、装置は、一例として、上記機能モジュールの分割を用いて記述されているにすぎないことに留意すべきである。実際には、機能は、必要に応じて実施のために別の機能モジュールに割り当てられてよい。具体的には、装置の内部構造は、上記機能の全部又は一部を実施するために別の機能モジュールに分割される。加えて、リレーショナルモデル決定用の装置及びリレーショナルモデル決定用の方法は、同一の発明思想に関連する。なお、特定の実施例は、方法の実施形態において説明されるが、ここではこれ以上詳述しない。 It should be noted that during the relational model determination performed by the relational model determination apparatus provided in the above embodiment, the apparatus is only described using the functional module partitioning as an example. . In practice, functions may be assigned to different functional modules for implementation as needed. Specifically, the internal structure of the device is divided into separate functional modules in order to implement all or part of the above functions. In addition, the relational model determination apparatus and the relational model determination method relate to the same inventive idea. Although specific examples are described in the method embodiments, they are not described in further detail here.

本発明の前述の実施形態の連番は、説明を容易にするためのものにすぎず、実施形態の優先度を示すものではない。 The serial numbers of the above-described embodiments of the present invention are merely for ease of explanation, and do not indicate the priorities of the embodiments.

当業者は、前述の方法のステップの全部又は一部は、ハードウェア又はプログラムの指示に従うハードウェアよって実行されてよいことを理解すべきである。プログラムは、非一時的なコンピュータ読み取り可能な記録媒体に格納されてよいし、少なくとも１つのプロセッサによって実行されてよい。記録媒体は、読み出し専用メモリ、磁気ディスク、又はコンパクトディスク読み出し専用メモリであってよい。 Those skilled in the art should understand that all or part of the method steps described above may be performed by hardware or hardware according to program instructions. The program may be stored in a non-transitory computer-readable recording medium and executed by at least one processor. The recording medium may be a read only memory, a magnetic disk, or a compact disk read only memory.

上記は、単に本発明の好ましい実施形態であって、本発明を限定するものではない。本発明の精神及び原理から逸脱することなく行われる種々の修正、同等の置換、又は改良は、本発明の保護範囲内に含まれるべきである。 The above are only preferred embodiments of the present invention, and do not limit the present invention. Various modifications, equivalent substitutions, or improvements made without departing from the spirit and principle of the present invention should be included in the protection scope of the present invention.

（付記１）
サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度と、正規化項と、前記潜在変数の各々の変分分布の対数と、を取得することと、
前記対数尤度と、前記正規化項と、前記潜在変数の各々の前記変分分布の対数と、に応じて目的関数を決定することと、
前記目的関数の収束を可能にする前記潜在変数の各々の変分分布及びモデルパラメータを決定し、前記目的関数の収束を可能にする前記潜在変数の各々の前記変分分布及び前記モデルパラメータに応じてリレーショナルモデルを決定することと、
を備え、
前記潜在変数の各々は、前記サンプルデータのサンプルカテゴリを記述するために使用される、
リレーショナルモデル決定用の方法。 (Appendix 1)
A method for relational model determination, comprising:
acquiring a log likelihood determined according to sample data, at least two latent variables, and model parameters, a normalization term, and the logarithm of the variational distribution of each of the latent variables;
determining an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables; and
determining the variational distribution of each of the latent variables and the model parameters that enable the objective function to converge, and determining a relational model according to that variational distribution and those model parameters,
wherein each of the latent variables is used to describe a sample category of the sample data.

（付記２）
前記サンプルデータ、前記少なくとも２つの潜在変数及び前記モデルパラメータに応じて決定される前記対数尤度は、

は前記サンプルデータを示し、ｄは前記サンプルデータの次元を示し、Ｎ_１は第１次元のサンプルデータの数を示し、Ｎ_ｄは第ｄ次元のサンプルデータの数を示し、Ｚ^１は第１次元のサンプルデータの潜在変数を示し、Ｚ^ｄは第ｄ次元のサンプルデータの潜在変数を示し、θは前記モデルパラメータのセットを示し、前記モデルパラメータはα^１，．．．，α^ｄ，φを含み、α^１は第１次元の混合比であり、α^ｄは第ｄ次元の混合比であり、φはモデルパラメータを示す、
付記１に記載の方法。 (Appendix 2)
The method according to appendix 1, wherein the log likelihood determined according to the sample data, the at least two latent variables, and the model parameters is given by [formula], where [formula] denotes the sample data, d denotes the dimension of the sample data, N_1 denotes the number of sample data in the first dimension, N_d denotes the number of sample data in the d-th dimension, Z^1 denotes the latent variable of the first-dimension sample data, Z^d denotes the latent variable of the d-th-dimension sample data, and θ denotes the set of model parameters, which includes α^1, ..., α^d, and φ; α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.

（付記３）
前記サンプルデータ、前記少なくとも２つの潜在変数及び前記モデルパラメータに応じて決定される前記正規化項は、

は前記潜在変数の前記変分分布の近似値を示し、

はα^ｉの次元を示し、

を示し、ｂは

を示す、
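付記１に記載の方法。 (Appendix 3)
The method according to appendix 1, wherein the normalization term determined according to the sample data, the at least two latent variables, and the model parameters is given by [formula], where [formula] denotes an approximation of the variational distribution of the latent variables, [formula] denotes the dimension of α^i, [formula] denotes [formula], and b denotes [formula].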

（付記４）
前記サンプルデータ、前記少なくとも２つの潜在変数及び前記モデルパラメータに応じて決定される前記潜在変数の各々の前記変分分布の対数はｌｏｇｑ（Ｚ^１），．．．ｌｏｇｑ（Ｚ^ｄ）を含み、ｑ（Ｚ^１）は潜在変数Ｚ^１の前記変分分布を示し、ｑ（Ｚ^ｄ）は潜在変数Ｚ^ｄの前記変分分布を示す付記１に記載の方法。 (Appendix 4)
The method according to appendix 1, wherein the logarithms of the variational distributions of the latent variables, determined according to the sample data, the at least two latent variables, and the model parameters, include log q(Z^1), ..., log q(Z^d), where q(Z^1) denotes the variational distribution of the latent variable Z^1 and q(Z^d) denotes the variational distribution of the latent variable Z^d.

（付記５）
前記対数尤度と、前記正規化項と、前記潜在変数の各々の前記変分分布の対数と、に応じて目的関数を決定するステップは、前記対数尤度の期待値、前記正規化項の期待値及び前記潜在変数の各々の前記変分分布の対数の期待値に応じて前記目的関数を決定することを含む付記１乃至４の何れかに記載の方法。 (Appendix 5)
The method according to any one of appendices 1 to 4, wherein the step of determining the objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables includes determining the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.

（付記６）
前記対数尤度の期待値、前記正規化項の期待値及び前記潜在変数の各々の前記変分分布の対数の期待値に応じて決定された前記目的関数

は、

である付記５に記載の方法。 (Appendix 6)
The method according to appendix 5, wherein the objective function [formula], determined according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables, is given by [formula].

（付記７）
前記目的関数の収束を可能にする前記潜在変数の各々の変分分布及び前記モデルパラメータを決定するステップは、
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得することと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて前記目的関数が収束するか否かを判別し、前記目的関数が収束しない場合には前記目的関数の収束を可能にする前記潜在変数の各々の前記変分分布及び前記更新されたモデルパラメータを取得するまで前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータを再取得することと、
を含む付記６に記載の方法。 (Appendix 7)
The method according to appendix 6, wherein the step of determining the variational distribution of each of the latent variables and the model parameters that enable the objective function to converge includes:
acquiring an updated variational distribution of each of the latent variables and updated model parameters; and
determining whether the objective function converges according to the updated variational distribution of each of the latent variables and the updated model parameters and, if the objective function does not converge, re-acquiring the updated variational distribution of each of the latent variables and the updated model parameters until the variational distribution of each of the latent variables and the updated model parameters that enable the objective function to converge are obtained.

（付記８）
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するステップは、
前記潜在変数の各々の収束され更新された変分分布を取得するまで、下記の式

ここで、

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記潜在変数の各々の前記更新された変分分布に応じて前記更新されたモデルパラメータを取得することと、
を含む付記７に記載の方法。 (Appendix 8)
The method according to appendix 7, wherein the step of acquiring the updated variational distribution of each of the latent variables and the updated model parameters includes:
alternately updating the variational distribution of each of the latent variables by using the following formula: [formula], until a converged updated variational distribution of each of the latent variables is obtained; and
obtaining the updated model parameters according to the updated variational distribution of each of the latent variables by using the following formula: [formula], where [formula], t denotes the current update, and t-1 denotes the previous update or the initial setting.

（付記９）
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するステップは、
前記更新されたモデルパラメータを取得するために下記の式

ここで、

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記更新されたモデルパラメータに応じて前記潜在変数の各々の前記変分分布を交互に更新することと、
を含む付記７に記載の方法。 (Appendix 9)
The method according to appendix 7, wherein the step of acquiring the updated variational distribution of each of the latent variables and the updated model parameters includes:
updating the model parameters by using the following formula: [formula], where [formula], to obtain the updated model parameters; and
alternately updating the variational distribution of each of the latent variables according to the updated model parameters by using the following formula: [formula], to obtain a converged updated variational distribution of each of the latent variables, where t denotes the current update and t-1 denotes the previous update or the initial setting.

（付記１０）
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて前記目的関数が収束するか否かを判別するステップは、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と、前記潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを判別することと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と前記前回取得された目的関数との間の距離が前記閾値より短い場合には前記目的関数が収束すると判別することと、
を含む付記７乃至９の何れかに記載の方法。 (Appendix 10)
The method according to any one of appendices 7 to 9, wherein the step of determining whether the objective function converges according to the updated variational distribution of each of the latent variables and the updated model parameters includes:
determining whether the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters, and the previously obtained objective function determined according to the previously updated variational distribution of each of the latent variables and the previously updated model parameters, is shorter than a threshold; and
determining that the objective function converges if that distance is shorter than the threshold.

（付記１１）
サンプルデータ、少なくとも２つの潜在変数及びモデルパラメータに応じて決定される対数尤度と、正規化項と、前記潜在変数の各々の変分分布の対数と、を取得するように構成された取得モジュールと、
前記対数尤度と、前記正規化項と、前記潜在変数の各々の前記変分分布の対数と、に応じて目的関数を決定するように構成された第１の決定モジュールと、
前記目的関数の収束を可能にする前記潜在変数の各々の変分分布及びモデルパラメータを決定するように構成された第２の決定モジュールと、
前記目的関数の収束を可能にする前記潜在変数の各々の前記変分分布及び前記モデルパラメータに応じてリレーショナルモデルを決定するように構成された第３の決定モジュールと、
を備え、
前記潜在変数の各々は、前記サンプルデータのサンプルカテゴリを記述するために使用される、
リレーショナルモデル決定用の装置。 (Appendix 11)
An apparatus for relational model determination, comprising:
an acquisition module configured to acquire a log likelihood determined according to sample data, at least two latent variables, and model parameters, a normalization term, and the logarithm of the variational distribution of each of the latent variables;
a first determination module configured to determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
a second determination module configured to determine the variational distribution of each of the latent variables and the model parameters that enable the objective function to converge; and
a third determination module configured to determine a relational model according to the variational distribution of each of the latent variables and the model parameters that enable the objective function to converge,
wherein each of the latent variables is used to describe a sample category of the sample data.

（付記１２）
前記取得モジュールによって取得された前記対数尤度は、

は前記サンプルデータを示し、ｄは前記サンプルデータの次元を示し、Ｎ_１は第１次元のサンプルデータの数を示し、Ｎ_ｄは第ｄ次元のサンプルデータの数を示し、Ｚ^１は第１次元のサンプルデータの潜在変数を示し、Ｚ^ｄは第ｄ次元のサンプルデータの潜在変数を示し、θは前記モデルパラメータのセットを示し、前記モデルパラメータはα^１，．．．，α^ｄ，φを含み、α^１は第１次元の混合比であり、α^ｄは第ｄ次元の混合比であり、φはモデルパラメータを示す、
付記１１に記載の装置。
(Appendix 12)
The apparatus according to appendix 11, wherein the log likelihood acquired by the acquisition module is

denotes the sample data, d denotes the dimension of the sample data, N_1 denotes the number of sample data in the first dimension, N_d denotes the number of sample data in the d-th dimension, Z^1 denotes the latent variable of the first-dimension sample data, Z^d denotes the latent variable of the d-th-dimension sample data, θ denotes the set of model parameters, the model parameters include α^1, ..., α^d, φ, α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.
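
As an illustration of how a log likelihood can depend on sample data, latent variables, and the parameters α^1, ..., α^d, φ, the following Python sketch computes a complete-data log likelihood for one simple case. It is not the formula of this appendix, which is not reproduced in the text. It assumes d = 2, a binary relation matrix x of size N_1 by N_2, latent variables z1 and z2 given as group indices, and φ as a table of Bernoulli probabilities per pair of groups; all names are hypothetical.

```python
import math


def complete_log_likelihood(x, z1, z2, alpha1, alpha2, phi):
    """log p(x, Z^1, Z^2 | theta) for an assumed two-dimensional Bernoulli model.

    x      : N_1 x N_2 nested list of 0/1 observations (sample data)
    z1, z2 : group index of each first- and second-dimension sample
    alpha1 : mixing ratios of the first dimension
    alpha2 : mixing ratios of the second dimension
    phi    : phi[k][l] = assumed probability of a 1 between groups k and l
    """
    # Contribution of the latent group assignments.
    total = sum(math.log(alpha1[k]) for k in z1)
    total += sum(math.log(alpha2[l]) for l in z2)
    # Contribution of each observed relation given its pair of groups.
    for n, k in enumerate(z1):
        for m, l in enumerate(z2):
            p = phi[k][l]
            total += math.log(p) if x[n][m] else math.log(1.0 - p)
    return total


x = [[1, 0], [0, 1]]
value = complete_log_likelihood(
    x, z1=[0, 1], z2=[0, 1],
    alpha1=[0.5, 0.5], alpha2=[0.5, 0.5],
    phi=[[0.9, 0.1], [0.1, 0.9]],
)
```

In this toy case every assignment contributes log 0.5 and every observation contributes log 0.9, so `value` equals 4 log 0.45.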

（付記１３）
前記取得モジュールによって取得された前記正規化項は、

は前記潜在変数の前記変分分布の近似値を示し、

はα^ｉの次元を示し、

を示し、ｂは

を示す、
付記１１に記載の装置。
(Appendix 13)
The apparatus according to appendix 11, wherein the normalization term acquired by the acquisition module is

denotes an approximation of the variational distribution of the latent variables,

denotes the dimension of α^i,

and b denotes


（付記１４）
前記取得モジュールによって取得された前記潜在変数の各々の前記変分分布の対数はｌｏｇｑ（Ｚ^１），．．．ｌｏｇｑ（Ｚ^ｄ）を含み、ｑ（Ｚ^１）は潜在変数Ｚ^１の前記変分分布を示し、ｑ（Ｚ^ｄ）は潜在変数Ｚ^ｄの前記変分分布を示す付記１１に記載の装置。
(Appendix 14)
The apparatus according to appendix 11, wherein the logarithms of the variational distributions of the latent variables acquired by the acquisition module include log q(Z^1), ..., log q(Z^d), where q(Z^1) denotes the variational distribution of latent variable Z^1 and q(Z^d) denotes the variational distribution of latent variable Z^d.

（付記１５）
前記第１の決定モジュールは、前記対数尤度の期待値、前記正規化項の期待値及び前記潜在変数の各々の前記変分分布の対数の期待値に応じて目的関数を決定するように構成されている付記１１乃至１４の何れかに記載の装置。
(Appendix 15)
The apparatus according to any one of appendices 11 to 14, wherein the first determination module is configured to determine the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.
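
One of the three expected values named above, the expected value of log q(Z^i), has a simple closed form in a common special case. The Python sketch below assumes that q(Z^i) factorizes over samples into categorical distributions, so that the expectation is the sum of q log q over samples and groups. This factorization, the function name, and the nested-list layout are assumptions for illustration; the text does not state the form of q.

```python
import math


def expected_log_q(q):
    """E_q[log q(Z)] for an assumed factorized categorical q.

    q[n][k] is the probability that sample n belongs to group k.
    Zero-probability entries contribute nothing (0 * log 0 is taken as 0).
    """
    return sum(p * math.log(p) for row in q for p in row if p > 0.0)


# First sample is split evenly between two groups, second is certain.
q_example = [[0.5, 0.5], [1.0, 0.0]]
entropy_term = expected_log_q(q_example)
```

Here `entropy_term` equals log 0.5: the uncertain sample contributes log 0.5 and the certain one contributes 0.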

（付記１６）
前記第１の決定モジュールによって決定された前記目的関数

は、

である付記１５に記載の装置。
(Appendix 16)
The apparatus according to appendix 15, wherein the objective function determined by the first determination module,

is


（付記１７）
前記第２の決定モジュールは、
前記潜在変数の各々の更新された変分分布及び更新されたモデルパラメータを取得するように構成された取得ユニットと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて前記目的関数が収束するか否かを判別するように構成された判別ユニットと、
を含み、
前記取得ユニットは、前記目的関数が収束するまで前記潜在変数の各々の前記変分分布及び前記更新されたモデルパラメータを再取得するように構成されている、
付記１６に記載の装置。
(Appendix 17)
The apparatus according to appendix 16, wherein the second determination module includes:
an acquisition unit configured to acquire an updated variational distribution of each of the latent variables and updated model parameters; and
a discrimination unit configured to determine, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges,
wherein the acquisition unit is configured to reacquire the variational distribution of each of the latent variables and the updated model parameters until the objective function converges.
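
The interplay of the acquisition unit and the discrimination unit can be sketched as a loop in Python. The update rules and the objective of this document are not reproduced in the text, so they are passed in as caller-supplied functions; `update_q`, `update_theta`, `objective`, and the `max_iters` safeguard are hypothetical names, and the toy functions in the usage part only exercise the loop.

```python
def fit(q, theta, update_q, update_theta, objective,
        threshold=1e-6, max_iters=1000):
    """Re-acquire q and theta until the objective stops changing.

    Assumptions: convergence means that two successive objective values
    differ by less than `threshold`; `max_iters` is an added safeguard
    that the source does not mention.
    """
    previous = objective(q, theta)
    for _ in range(max_iters):
        q = update_q(q, theta)          # acquisition unit: updated q
        theta = update_theta(q, theta)  # acquisition unit: updated parameters
        current = objective(q, theta)
        if abs(current - previous) < threshold:  # discrimination unit
            break
        previous = current
    return q, theta


# Toy stand-ins: coordinate ascent on -(q - theta)^2 - (theta - 3)^2.
q_hat, theta_hat = fit(
    q=0.0,
    theta=0.0,
    update_q=lambda q, theta: theta,
    update_theta=lambda q, theta: (q + 3.0) / 2.0,
    objective=lambda q, theta: -(q - theta) ** 2 - (theta - 3.0) ** 2,
    threshold=1e-12,
)
```

With these toy functions both values approach 3, the maximizer of the toy objective.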

（付記１８）
前記取得ユニットは、
前記潜在変数の各々の収束され更新された変分分布を取得するまで、下記の式

ここで、

ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記潜在変数の各々の前記更新された変分分布に応じて前記更新されたモデルパラメータを取得するように構成された第２の更新サブユニットと、
を含む付記１７に記載の装置。
(Appendix 18)
The apparatus according to appendix 17, wherein the acquisition unit includes a second update subunit configured to obtain the updated model parameters according to the updated variational distribution of each of the latent variables by using the following formula until a converged, updated variational distribution of each of the latent variables is obtained:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

（付記１９）
前記取得ユニットは、
前記更新されたモデルパラメータを取得するために下記の式

ここで、

ここで、ｔは現在の更新を示し、ｔ−１は前回の更新又は初期設定を示す、
を使用することによって前記更新されたモデルパラメータに応じて前記潜在変数の各々の前記変分分布を交互に更新するように構成された第４の更新サブユニットと、
を含む付記１７に記載の装置。
(Appendix 19)
The apparatus according to appendix 17, wherein the acquisition unit includes a fourth update subunit configured to alternately update the variational distribution of each of the latent variables according to the updated model parameters by using the following formula in order to obtain the updated model parameters:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

（付記２０）
前記判別ユニットは、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と、前記潜在変数の各々の前回更新された変分分布及び前回更新されたモデルパラメータに応じて決定された前回取得された目的関数と、の間の距離が閾値より短いか否かを判別するように構成された比較サブユニットと、
前記潜在変数の各々の前記更新された変分分布及び前記更新されたモデルパラメータに応じて決定された前記目的関数と前記前回取得された目的関数との間の距離が前記閾値より短い場合には前記目的関数が収束すると判別するように構成された判別サブユニットと、
を含む付記１７乃至１９の何れかに記載の装置。
(Appendix 20)
The apparatus according to any one of appendices 17 to 19, wherein the discrimination unit includes:
a comparison subunit configured to determine whether the distance between the objective function determined according to the updated variational distribution of each of the latent variables and the updated model parameters, and the previously obtained objective function determined according to the previously updated variational distribution of each of the latent variables and the previously updated model parameters, is shorter than a threshold; and
a discrimination subunit configured to determine that the objective function converges when that distance is shorter than the threshold.

Claims

An apparatus for relational model determination, comprising:
an acquisition module configured to acquire a log likelihood determined according to sample data, at least two latent variables, and model parameters; a normalization term; and the logarithm of the variational distribution of each of the latent variables;
a first determination module configured to determine an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
a second determination module configured to determine the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge; and
a third determination module configured to determine a relational model according to the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge,
wherein each of the latent variables is used to describe a sample category of the sample data.

The apparatus of claim 1, wherein the log likelihood acquired by the acquisition module is

denotes the sample data, d denotes the dimension of the sample data, N_1 denotes the number of sample data in the first dimension, N_d denotes the number of sample data in the d-th dimension, Z^1 denotes the latent variable of the first-dimension sample data, Z^d denotes the latent variable of the d-th-dimension sample data, θ denotes the set of model parameters, the model parameters include α^1, ..., α^d, φ, α^1 is the mixing ratio of the first dimension, α^d is the mixing ratio of the d-th dimension, and φ denotes a model parameter.

The apparatus of claim 1, wherein the normalization term acquired by the acquisition module is

denotes an approximation of the variational distribution of the latent variables,

denotes the dimension of α^i,

and b denotes


The apparatus of claim 1, wherein the logarithms of the variational distributions of the latent variables acquired by the acquisition module include log q(Z^1), ..., log q(Z^d), where q(Z^1) denotes the variational distribution of latent variable Z^1 and q(Z^d) denotes the variational distribution of latent variable Z^d.

The apparatus according to any one of claims 1 to 4, wherein the first determination module is configured to determine the objective function according to the expected value of the log likelihood, the expected value of the normalization term, and the expected value of the logarithm of the variational distribution of each of the latent variables.

The apparatus of claim 5, wherein the objective function determined by the first determination module,

is


The apparatus according to claim 6, wherein the second determination module includes:
an acquisition unit configured to acquire an updated variational distribution of each of the latent variables and updated model parameters; and
a discrimination unit configured to determine, according to the updated variational distribution of each of the latent variables and the updated model parameters, whether the objective function converges,
wherein the acquisition unit is configured to reacquire the variational distribution of each of the latent variables and the updated model parameters until the objective function converges.

The apparatus of claim 7, wherein the acquisition unit includes a second update subunit configured to obtain the updated model parameters according to the updated variational distribution of each of the latent variables by using the following formula until a converged, updated variational distribution of each of the latent variables is obtained:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

The apparatus of claim 7, wherein the acquisition unit includes a fourth update subunit configured to alternately update the variational distribution of each of the latent variables according to the updated model parameters by using the following formula in order to obtain the updated model parameters:

where

t denotes the current update and t-1 denotes the previous update or the initial setting.

A method for relational model determination, comprising:
obtaining a log likelihood determined according to sample data, at least two latent variables, and model parameters; a normalization term; and the logarithm of the variational distribution of each of the latent variables;
determining an objective function according to the log likelihood, the normalization term, and the logarithm of the variational distribution of each of the latent variables;
determining the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge; and
determining a relational model according to the variational distribution of each of the latent variables and the model parameters that allow the objective function to converge,
wherein each of the latent variables is used to describe a sample category of the sample data.