JP5281990B2

JP5281990B2 - Clustering apparatus, clustering method, and program

Info

Publication number: JP5281990B2
Application number: JP2009195882A
Authority: JP
Inventors: Katsuhiko Ishiguro (石黒勝彦); Tomoharu Iwata (岩田具治); Naonori Ueda (上田修功)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-08-26
Filing date: 2009-08-26
Publication date: 2013-09-04
Anticipated expiration: 2029-08-26
Also published as: JP2011048583A

Abstract

PROBLEM TO BE SOLVED: To estimate the existence of clusters of objects from a time series of relational data representing the presence or absence of relationships among a plurality of objects, and thereby estimate how those clusters change over time. SOLUTION: The mathematical model of the known IRM (Infinite Relational Model) is expressed using an assigned cluster z, an inter-cluster relevance η, a mixing ratio β, and relational data x. The IRM is extended by introducing a parameter π that represents the temporal change of the clusters, and the method for estimating the assigned cluster z is modified to take time into account. The following calculations, shown in FIG. 1, are executed repeatedly until a termination condition is met, yielding an estimate of each parameter: the assigned-cluster z estimation (step S104), the time change information π estimation (step S105), the inter-cluster relevance η estimation (step S106), and the mixing ratio β estimation (step S107). COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a technique for estimating the existence of clusters of objects from relational data indicating the presence or absence of relationships among a plurality of objects. It further relates to estimating temporal changes of those clusters, such as their emergence, growth, and disappearance.

Relational data can indicate the presence or absence of relationships among multiple objects, for example relationships between people or link relationships on the Internet. Known techniques use such data to express the relationships between clusters (groups) of objects and to find the clustering of objects that best represents those inter-cluster relationships. They include the Stochastic Block Model (SBM) described in Non-Patent Document 1 and its extension, the Infinite Relational Model (IRM) described in Non-Patent Document 2. The SBM performs clustering after the number of clusters has been fixed in advance. The IRM, by contrast, automatically estimates the optimal number of clusters from the relational data without fixing it beforehand.

Based on the relational data $X = \{x_{i,j}\}$ between objects i and j, the IRM classifies the relationships between objects into multiple clusters. The IRM can estimate the number of clusters because it applies the Dirichlet Process Mixture (DPM) described in Non-Patent Document 3, one of the nonparametric Bayesian models.

Non-Patent Document 1: K. Nowicki and T. A. B. Snijders, "Estimation and Prediction for Stochastic Blockstructures", Journal of the American Statistical Association, Vol. 96, No. 455, pp. 1077-1087, 2001.
Non-Patent Document 2: C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada and N. Ueda, "Learning Systems of Concepts with an Infinite Relational Model", Proceedings of the 21st National Conference on Artificial Intelligence, 2006.
Non-Patent Document 3: T. S. Ferguson, "A Bayesian Analysis of Some Nonparametric Problems", The Annals of Statistics, Vol. 1, No. 2, pp. 353-355, 1973.

However, because of how these models are constructed, the SBM and the IRM cannot handle temporal changes in clusters. They can only use relational data from a snapshot at a single point in time, or data representing the average relationship over a fixed period. Consequently, some data contain relationships between objects that change over time, such as relationships between people or link relationships on the Internet. For such data, information along the time axis could not be analyzed.

Accordingly, an object of the present invention is to provide a technique that estimates the existence of clusters of objects from a time series of relational data. The relational data indicate the presence or absence of relationships among a plurality of objects, and the technique also estimates the temporal changes of those clusters.

The present invention is a clustering apparatus that clusters objects using relational data indicating the presence or absence of relationships among a plurality of objects. The apparatus comprises a storage unit that stores the following:

- the mixing ratio β and the inter-cluster relevance $\eta_{k,l}$, both computed in the Infinite Relational Model (IRM), a model for clustering relational data; $\eta_{k,l}$ indicates the strength of the relationship between cluster k and cluster l;
- relational data $x_{t,k,l}$, obtained by observing at a predetermined time interval whether a relationship exists between cluster k and cluster l;
- time change information $\pi_{t,k}$, which indicates which cluster an object that belonged to cluster k at time t − 1 is likely to belong to at the next time t;
- the assigned cluster $z_{t,i}$, which indicates the cluster to which object i belongs at time t;
- hyperparameters $\alpha_0$ and κ;
- the number of clusters K.

The apparatus further comprises a time change information estimation unit. This unit acquires $z_{t,i}$, $\pi_{t,k}$, β, $\alpha_0$, κ, and K from the storage unit. Let $m_{t,k,l}$ denote the number of objects for which $z_{t-1,i} = k$ and $z_{t,i} = l$ over the period t = 1, …, T. The unit computes $\pi_{t,k}$ by sampling from the Dirichlet distribution $\mathrm{Dirichlet}(\alpha_0\beta_1 + m_{t,k,1}, \ldots, \alpha_0\beta_k + m_{t,k,k} + \kappa, \ldots, \alpha_0\beta_K + m_{t,k,K}, \alpha_0(1 - \sum_{k=1}^{K}\beta_k))$. It then overwrites the stored $\pi_{t,k}$ with the result.

The apparatus also comprises an assigned-cluster estimation unit. This unit acquires $z_{t,i}$, $\pi_{t,k}$, $\eta_{k,l}$, β, the relational data $x_{t,k,l}$ for t = 1, …, T, and K from the storage unit, and applies them to Equations (1) to (4):

1. It samples $u_{t,j}$ from Equation (1).
2. With the message variable $p_{t,i,k}$ defined by Equation (3), it computes $p_{t,i,k}$ in order from t = 1 to t = T using Equation (2).
3. Starting from the computed values, it computes $p_{t,i,k}$ in order from t = T back to t = 1 using Equation (4).
4. It computes the assigned cluster $z_{t,i}$ from among the cases in which $p(z_{t,i} = k \mid z_{t-1,i} = l)$ is nonzero.
5. It overwrites the stored $z_{t,i}$ with the result.

Finally, the apparatus comprises three further units:

- an inter-cluster relevance estimation unit that computes $\eta_{k,l}$ with the IRM and stores it in the storage unit;
- a mixing ratio estimation unit that computes β with the IRM and stores it in the storage unit;
- a termination determination unit that repeats the computations of the mixing ratio estimation unit, the inter-cluster relevance estimation unit, the time change information estimation unit, and the assigned-cluster estimation unit, in any order, until a predetermined termination condition is satisfied.
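The Dirichlet update performed by the time change information estimation unit can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the patent's implementation. The following are all assumed for illustration:

- `z` is a T × N array of 0-based cluster indices;
- `beta` is a length-K vector whose entries sum to less than 1;
- the last component of each draw is reserved for a not-yet-used cluster;
- the function names `transition_counts` and `sample_pi` are hypothetical.

```python
import numpy as np


def transition_counts(z, K):
    """m[t, k, l] = number of objects i with z[t-1, i] == k and z[t, i] == l."""
    T = z.shape[0]
    m = np.zeros((T, K, K), dtype=int)
    for t in range(1, T):
        # add one count per object at (previous cluster, current cluster)
        np.add.at(m[t], (z[t - 1], z[t]), 1)
    return m


def sample_pi(z, beta, alpha0, kappa, rng):
    """Draw each pi[t, k] from
    Dirichlet(alpha0*beta_1 + m_{t,k,1}, ..., alpha0*beta_k + m_{t,k,k} + kappa,
              ..., alpha0*beta_K + m_{t,k,K}, alpha0*(1 - sum_k beta_k)).
    Assumes beta.sum() < 1 so the last parameter is positive.
    pi[0] stays zero because time 0 has no predecessor."""
    K = beta.shape[0]
    T = z.shape[0]
    m = transition_counts(z, K)
    pi = np.zeros((T, K, K + 1))  # last column: mass for a currently unused cluster
    for t in range(1, T):
        for k in range(K):
            a = np.append(alpha0 * beta + m[t, k], alpha0 * (1.0 - beta.sum()))
            a[k] += kappa  # extra weight for staying in cluster k
            pi[t, k] = rng.dirichlet(a)
    return pi
```

Take `z = np.array([[0, 0, 1], [0, 1, 1]])` with K = 2. Then `transition_counts(z, 2)[1]` is `[[1, 1], [0, 1]]`. Of the two objects in cluster 0, one stays and one moves to cluster 1, and the object in cluster 1 stays. The κ term favors objects remaining in their previous cluster.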

The present invention is also a clustering method used in a clustering apparatus. The apparatus clusters objects using relational data indicating the presence or absence of relationships among a plurality of objects. It comprises a processing unit and a storage unit that stores the following:

- the mixing ratio β and the inter-cluster relevance $\eta_{k,l}$, both computed in the Infinite Relational Model (IRM), a model for clustering relational data; $\eta_{k,l}$ indicates the strength of the relationship between cluster k and cluster l;
- relational data $x_{t,k,l}$, obtained by observing at a predetermined time interval whether a relationship exists between cluster k and cluster l;
- time change information $\pi_{t,k}$, which indicates which cluster an object that belonged to cluster k at time t − 1 is likely to belong to at the next time t;
- the assigned cluster $z_{t,i}$, which indicates the cluster to which object i belongs at time t;
- hyperparameters $\alpha_0$ and κ;
- the number of clusters K.

The processing unit executes the following steps.

In a time change information estimation step, the processing unit acquires $z_{t,i}$, $\pi_{t,k}$, β, $\alpha_0$, κ, and K from the storage unit. Let $m_{t,k,l}$ denote the number of objects for which $z_{t-1,i} = k$ and $z_{t,i} = l$ over the period t = 1, …, T. The processing unit computes $\pi_{t,k}$ by sampling from the Dirichlet distribution $\mathrm{Dirichlet}(\alpha_0\beta_1 + m_{t,k,1}, \ldots, \alpha_0\beta_k + m_{t,k,k} + \kappa, \ldots, \alpha_0\beta_K + m_{t,k,K}, \alpha_0(1 - \sum_{k=1}^{K}\beta_k))$. It then overwrites the stored $\pi_{t,k}$ with the result.

In an assigned-cluster estimation step, the processing unit acquires $z_{t,i}$, $\pi_{t,k}$, $\eta_{k,l}$, β, the relational data $x_{t,k,l}$ for t = 1, …, T, and K from the storage unit, and applies them to Equations (1) to (4):

1. It samples $u_{t,j}$ from Equation (1).
2. With the message variable $p_{t,i,k}$ defined by Equation (3), it computes $p_{t,i,k}$ in order from t = 1 to t = T using Equation (2).
3. Starting from the computed values, it computes $p_{t,i,k}$ in order from t = T back to t = 1 using Equation (4).
4. It computes the assigned cluster $z_{t,i}$ from among the cases in which $p(z_{t,i} = k \mid z_{t-1,i} = l)$ is nonzero.
5. It overwrites the stored $z_{t,i}$ with the result.

The processing unit also executes three further steps:

- an inter-cluster relevance estimation step, in which it computes $\eta_{k,l}$ with the IRM and stores it in the storage unit;
- a mixing ratio estimation step, in which it computes β with the IRM and stores it in the storage unit;
- a termination determination step, in which it repeats the computations of the mixing ratio estimation step, the inter-cluster relevance estimation step, the time change information estimation step, and the assigned-cluster estimation step, in any order, until a predetermined termination condition is satisfied.
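Equations (1) to (4) are not reproduced in this section. The sketch below therefore only illustrates the general technique that the assigned-cluster estimation step describes. It samples auxiliary (slice) variables, then runs a forward pass from t = 1 to T over only those transitions whose probability exceeds the slice. It then samples backward from t = T to 1, allowing only transitions with nonzero probability. It follows the generic beam-sampling scheme for HMMs and is an assumption, not the patent's exact equations. The following elements are hypothetical:

- the name `beam_sample_chain`;
- the use of one slice variable per time step for a single object's cluster sequence;
- the precomputed log-likelihood `loglik[t, k]`, which in the dIRM would come from the relational data and $\eta_{k,l}$.

```python
import numpy as np


def beam_sample_chain(z, init, trans, loglik, rng):
    """Resample one object's cluster sequence by slice-truncated forward
    filtering and backward sampling (generic beam-sampler sketch).

    z      : current cluster indices, shape (T,)
    init   : initial cluster probabilities, shape (K,)
    trans  : trans[t, l, k] = P(z_t = k | z_{t-1} = l), shape (T, K, K);
             trans[0] is unused because time 0 uses `init`
    loglik : loglik[t, k] = log-likelihood of the data at time t if z_t = k
    """
    T, K = loglik.shape
    # 1. Slice variables: u_t ~ Uniform(0, probability of the current transition).
    u = np.empty(T)
    u[0] = rng.uniform(0.0, init[z[0]])
    for t in range(1, T):
        u[t] = rng.uniform(0.0, trans[t, z[t - 1], z[t]])
    # 2. Forward pass over transitions whose probability exceeds u_t.
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    p = np.zeros((T, K))
    p[0] = (init > u[0]) * lik[0]
    p[0] /= p[0].sum()
    for t in range(1, T):
        allowed = (trans[t] > u[t]).astype(float)   # (K, K) 0/1 mask
        p[t] = lik[t] * (p[t - 1] @ allowed)
        p[t] /= p[t].sum()
    # 3. Backward sampling from the last time step to the first.
    new_z = np.empty(T, dtype=int)
    new_z[-1] = rng.choice(K, p=p[-1])
    for t in range(T - 2, -1, -1):
        w = p[t] * (trans[t + 1][:, new_z[t + 1]] > u[t + 1])
        new_z[t] = rng.choice(K, p=w / w.sum())
    return new_z
```

The current path always survives the slice, so the forward messages never vanish. As a sanity check, take an identity transition matrix with all initial mass on one cluster: only the constant path remains, and the function returns it unchanged.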

With this configuration, the known IRM is extended by introducing, as a parameter, the time change information $\pi_{t,k}$ representing the temporal change of the clusters. The model is also constructed so that the estimation of the assigned cluster $z_{t,i}$ takes time into account. Together, these make it possible to estimate how clusters change over time.

In the present invention, the storage unit of the clustering apparatus further stores hyperparameters γ, ξ, and Ψ.

The inter-cluster relevance estimation unit acquires the following from the storage unit: the assigned cluster $z_{t,i}$, the inter-cluster relevance $\eta_{k,l}$, the hyperparameters ξ and Ψ, the number of clusters K, and the relational data $x_{t,k,l}$ for the period t = 1, …, T. Let $N_{k,l}$ be the number of triples (t, i, j) with $z_{t,i} = k$ and $z_{t,j} = l$, and let $n_{k,l}$ be the number of those for which the relational data $x_{t,k,l}$ indicate a relationship. The unit computes $\eta_{k,l}$ by sampling from the beta distribution $\mathrm{Beta}(\xi + n_{k,l}, \Psi + N_{k,l} - n_{k,l})$ and overwrites the stored $\eta_{k,l}$ with the result.

The mixing ratio estimation unit acquires $z_{t,i}$, β, and the hyperparameters γ, $\alpha_0$, and κ from the storage unit. It then proceeds as follows:

1. It applies $z_{t,i}$, β, $\alpha_0$, and κ to Equations (5) and (6) to compute the auxiliary variables $R_{t,k,l}$ and $O_{t,k}$, respectively.
2. It applies these to Equation (7) to compute the auxiliary variable $\hat{R}_{t,k,l}$.
3. It computes β by sampling from the Dirichlet distribution $\mathrm{Dirichlet}(\sum_{t,k}\hat{R}_{t,k,1}, \sum_{t,k}\hat{R}_{t,k,2}, \ldots, \sum_{t,k}\hat{R}_{t,k,K}, \gamma)$.
4. It overwrites the stored β with the result.

Further, in the present invention, the storage unit of the clustering apparatus further stores hyperparameters γ, ξ, and Ψ.

In the inter-cluster relevance estimation step, the processing unit acquires the following from the storage unit: the assigned cluster $z_{t,i}$, the inter-cluster relevance $\eta_{k,l}$, the hyperparameters ξ and Ψ, the number of clusters K, and the relational data $x_{t,k,l}$ for the period t = 1, …, T. Let $N_{k,l}$ be the number of triples (t, i, j) with $z_{t,i} = k$ and $z_{t,j} = l$, and let $n_{k,l}$ be the number of those for which the relational data $x_{t,k,l}$ indicate a relationship. The processing unit computes $\eta_{k,l}$ by sampling from the beta distribution $\mathrm{Beta}(\xi + n_{k,l}, \Psi + N_{k,l} - n_{k,l})$ and overwrites the stored $\eta_{k,l}$ with the result.

In the mixing ratio estimation step, the processing unit acquires $z_{t,i}$, β, and the hyperparameters γ, $\alpha_0$, and κ from the storage unit. It then proceeds as follows:

1. It applies $z_{t,i}$, β, $\alpha_0$, and κ to Equations (5) and (6) to compute the auxiliary variables $R_{t,k,l}$ and $O_{t,k}$, respectively.
2. It applies these to Equation (7) to compute the auxiliary variable $\hat{R}_{t,k,l}$.
3. It computes β by sampling from the Dirichlet distribution $\mathrm{Dirichlet}(\sum_{t,k}\hat{R}_{t,k,1}, \sum_{t,k}\hat{R}_{t,k,2}, \ldots, \sum_{t,k}\hat{R}_{t,k,K}, \gamma)$.
4. It overwrites the stored β with the result.
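The beta-distribution update for the inter-cluster relevance $\eta_{k,l}$ can be sketched as below. It assumes `x` is a T × N × N array of 0/1 relations and `z` a T × N array of 0-based cluster indices. Whether self-pairs (i = j) count toward $N_{k,l}$ is not stated in this section, so the hypothetical `block_counts` helper simply includes them. The form Beta(ξ + n, Ψ + N − n) is the standard conjugate beta-Bernoulli update.

```python
import numpy as np


def block_counts(x, z, K):
    """N[k, l]: number of (t, i, j) with z[t, i] == k and z[t, j] == l.
    n[k, l]: how many of those have x[t, i, j] == 1."""
    N = np.zeros((K, K), dtype=int)
    n = np.zeros((K, K), dtype=int)
    for t in range(x.shape[0]):
        onehot = np.eye(K, dtype=int)[z[t]]          # (num_objects, K) indicator matrix
        N += onehot.T @ np.ones_like(x[t]) @ onehot  # all pairs, self-pairs included
        n += onehot.T @ x[t] @ onehot                # pairs with an observed relationship
    return N, n


def sample_eta(x, z, K, xi, psi, rng):
    """Draw eta[k, l] ~ Beta(xi + n_{k,l}, psi + N_{k,l} - n_{k,l})."""
    N, n = block_counts(x, z, K)
    return rng.beta(xi + n, psi + N - n)
```

As an example, take a single time step with `z = [0, 0, 1]`, where objects 0 and 1 are linked to each other and object 2 has a self-link. This gives `N = [[4, 2], [2, 1]]` and `n = [[2, 0], [0, 1]]`.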

With this configuration, the model can estimate the time change information $\pi_{t,k}$ and the assigned cluster $z_{t,i}$ using the mixing ratio β and the inter-cluster relevance $\eta_{k,l}$ computed over the time period. This makes it possible to estimate how clusters change over time.

In the present invention, the clustering apparatus is connected to a display device that displays computation results. When the termination condition is satisfied, the termination determination unit outputs any one, or any combination, of the following to the display device: the mixing ratio β, the inter-cluster relevance $\eta_{k,l}$, the time change information $\pi_{t,k}$, and the assigned cluster $z_{t,i}$.

Further, in the present invention, the clustering apparatus is connected to a display device that displays computation results. When the termination condition is satisfied, the processing unit outputs any one, or any combination, of the following to the display device: the mixing ratio β, the inter-cluster relevance $\eta_{k,l}$, the time change information $\pi_{t,k}$, and the assigned cluster $z_{t,i}$.

With this configuration, the time change information $\pi_{t,k}$ can be output to the display device together with the other parameters.

The present invention also provides a program that causes a computer serving as the clustering apparatus to execute the clustering method.

A computer on which such a program is installed can implement the functions defined by the program.

ADVANTAGE OF THE INVENTION: The present invention provides a technique for estimating the existence of clusters of objects from a time series of relational data indicating the presence or absence of relationships among a plurality of objects. The technique also estimates the temporal changes of those clusters.

FIG. 1 is a diagram showing the flow of dIRM processing in the present embodiment.
FIG. 2 is a diagram showing an example configuration of the clustering apparatus.
FIG. 3 is a diagram showing the processing flow of the assigned-cluster estimation unit.
FIG. 4 is a diagram showing the processing flow of the time change information estimation unit.
FIG. 5 is a diagram showing the processing flow of the inter-cluster relevance estimation unit.
FIG. 6 is a diagram showing the processing flow of the mixing ratio estimation unit.
FIG. 7(a) shows the Rand index results for the dIRM, and FIG. 7(b) shows the Rand index results for the IRM.
FIG. 8(a) is a diagram showing an example of relationships between objects, and FIG. 8(b) is a diagram showing an example of relationships between objects after the IRM is applied.

Next, a mode for carrying out the present invention (hereinafter, "the present embodiment") is described in detail, with reference to the drawings as appropriate.

≪Overview of the Known IRM≫
First, an outline of the IRM is given with reference to FIG. 8. FIG. 8(a) shows a binary-valued binary relation (a relation between two objects) on the domain D = {1, 2, …, N} consisting of N objects. The vertical axis represents the index i of one object and the horizontal axis represents the index j of the other (here N = 9). In FIG. 8(a), a cell is filled black when the two objects are related. A computer usually represents the presence or absence of this relation numerically, because that makes it easy to compute with. It writes $x_{i,j} = 0$ when objects i and j are unrelated and $x_{i,j} = 1$ when they are related. The data $x_{i,j}$ are called relational data.

When the relational data in FIG. 8(a) are given to the IRM, the object indices are grouped so that related and unrelated index pairs are separated. This produces the clusters delimited by thick solid lines in FIG. 8(b). In FIG. 8(b), the number of clusters is automatically determined to be three. The variable k identifying a cluster (hereinafter also called the cluster number) takes the values 1 to 3. The cluster to which object i belongs is written as $z_i = k$.

The mathematical model of the IRM is given by Equations (8) to (11).
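The formula images for Equations (8) to (11) are not reproduced in this text. The descriptions in the following paragraphs suggest that they take the standard IRM form shown below. This is a reconstruction from those descriptions, not a copy of the original figures.

```latex
\begin{align}
\beta \mid \gamma &\sim \mathrm{Stick}(\gamma) \tag{8}\\
z_i \mid \beta &\sim \mathrm{Multinomial}(\beta) \tag{9}\\
\eta_{k,l} \mid \xi, \Psi &\sim \mathrm{Beta}(\xi, \Psi) \tag{10}\\
x_{i,j} \mid \mathbf{z}, \boldsymbol{\eta} &\sim \mathrm{Bernoulli}(\eta_{z_i, z_j}) \tag{11}
\end{align}
```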

Here, β, $z_i$, $\eta_{k,l}$, and $x_{i,j}$ are described below. The symbol "~" denotes sampling from a probability distribution. Stick() is the stick-breaking distribution used in the DPM. Multinomial(), Beta(), and Bernoulli() denote the multinomial, beta, and Bernoulli distributions, respectively. γ, ξ, and Ψ are hyperparameters set in advance.

First, Equation (8) generates an infinite-dimensional cluster mixing ratio vector β (hereinafter simply "the mixing ratio"). Concretely, Equation (8) is computed as in Equation (12).

Here, the arrow ⇔ in Equation (12) indicates that the k-th cluster on the left-hand side is represented by the right-hand side. The mixing ratio $\beta_k$ on the right-hand side of Equation (12) is the probability that a data item belongs to the k-th cluster. Because $\sum_k \beta_k = 1$ by construction, β can serve as the mixing ratio over an infinite number of clusters.
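Equation (12) itself is not shown in this text. The sketch below assumes Stick(γ) is the standard GEM stick-breaking construction commonly used for the DPM. That construction draws $v_k \sim \mathrm{Beta}(1, \gamma)$ and sets $\beta_k = v_k \prod_{l<k}(1 - v_l)$. The truncation level `K_max` is an added assumption, needed to compute finitely many weights. The mass not covered by the first `K_max` weights belongs to the remaining, infinitely many clusters.

```python
import numpy as np


def stick_breaking(gamma, K_max, rng):
    """First K_max weights of beta ~ Stick(gamma), assuming the GEM form:
    v_k ~ Beta(1, gamma), beta_k = v_k * prod_{l<k} (1 - v_l)."""
    v = rng.beta(1.0, gamma, size=K_max)                            # fraction broken off at step k
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))   # stick length before break k
    return v * remaining


rng = np.random.default_rng(0)
beta = stick_breaking(gamma=1.0, K_max=50, rng=rng)
print(beta[:5], beta.sum())  # the sum approaches 1 as K_max grows
```

A larger γ breaks off smaller pieces on average, which spreads the mass over more clusters.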

In Equation (9), the cluster $z_i = k$ to which object i belongs is sampled from the multinomial distribution Multinomial() using the mixing ratio β. Equations (10) and (11) describe how the actually observed relational data $x_{i,j}$ are generated once the cluster assignments $z_i = k$ are given.

In Equation (10), the inter-cluster relevance $\eta_{k,l}$, which indicates the strength of the relationship between clusters k and l, is sampled from the beta distribution Beta(). This sampled value represents the probability that a cell in block (k, l) of FIG. 8(b) is black (= 1). For example, the inter-cluster relevance $\eta_{1,3}$ of block (k = 1, l = 3) is 0.17 (= 1/6).

Each individual relational datum $x_{i,j}$ is then sampled from the Bernoulli distribution Bernoulli() with parameter $\eta_{z_i,z_j}$. This parameter is the inter-cluster relevance of the block defined by the clusters $z_i$ and $z_j$ to which objects i and j belong.
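The generative process of Equations (9) to (11) can be sketched as follows. The sketch assumes a truncated β that is renormalized to sum to 1, and the function names are hypothetical. The helper `block_density` computes the fraction of black cells in a block, the empirical quantity behind the 1/6 example above. The toy data are illustrative and are not the data of FIG. 8.

```python
import numpy as np


def sample_irm(beta, xi, psi, num_objects, rng):
    """Sample (z, eta, x) following Eqs. (9)-(11) with a truncated beta."""
    K = len(beta)
    p = beta / beta.sum()                                # renormalize the truncated weights
    z = rng.choice(K, size=num_objects, p=p)             # Eq. (9):  z_i ~ Multinomial(beta)
    eta = rng.beta(xi, psi, size=(K, K))                 # Eq. (10): eta_{k,l} ~ Beta(xi, psi)
    probs = eta[z][:, z]                                 # probs[i, j] = eta_{z_i, z_j}
    x = (rng.random((num_objects, num_objects)) < probs).astype(int)  # Eq. (11): Bernoulli
    return z, eta, x


def block_density(x, z, k, l):
    """Fraction of 1s among cells (i, j) with z_i == k and z_j == l."""
    return x[np.ix_(z == k, z == l)].mean()


# Toy data: 0-based clusters 0 and 2 play the role of clusters 1 and 3.
z_toy = np.array([0, 0, 0, 2, 2])
x_toy = np.zeros((5, 5), dtype=int)
x_toy[0, 3] = 1                              # one black cell in block (1, 3)
print(block_density(x_toy, z_toy, 0, 2))     # 6 cells, one link -> 1/6
```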

≪Problems in Applying Time-Series Relational Data to the IRM≫
Next, an example of the problems that arise when time-series relational data are applied to the IRM model described above is explained. Suppose a relational data time series $X = \{x_{t,i,j} \in \{0,1\},\ 1 \le t \le T\}$ is given. Here, $x_{t,i,j} = 1$ indicates that objects i and j are related at time t. Note that $x_{t,i,j}$ says nothing about the relationship between the objects at any time other than t.

First, Equations (8) to (11) of the IRM contain no time index t, so relational data that include time cannot be applied to the model directly. Several workarounds are possible. The simplest is to derive relational data $\tilde{X}$ that contain no time information from the time series X, and apply them to the IRM model above. For example, the relational data $\tilde{x}_{i,j}$ can be defined as in Equation (13).

Here, σ is a preset threshold, for example σ = 0.5. Because $\tilde{x}_{i,j}$ discards the time information, any temporal change in the clustering is ignored entirely. If relational data that retain the time index are instead applied to the IRM, the model takes, for example, the form of Equations (14) to (17).
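Equation (13) is not reproduced in this text. This sketch assumes one plausible reading of it: $\tilde{x}_{i,j} = 1$ when the fraction of time steps with $x_{t,i,j} = 1$ exceeds σ. The strict inequality is part of that assumption. The example shows why the collapsed data lose dynamics. A link present only in the first half of the period and one present only in the second half yield the same $\tilde{x}_{i,j}$.

```python
import numpy as np


def collapse_time(x_ts, sigma=0.5):
    """Hypothetical form of Eq. (13): threshold the time-averaged relation.
    x_ts has shape (T, N, N) with entries in {0, 1}."""
    return (x_ts.mean(axis=0) > sigma).astype(int)


early = np.zeros((4, 2, 2), dtype=int)
early[:2, 0, 1] = 1   # objects 0 and 1 linked only at t = 1, 2
late = np.zeros((4, 2, 2), dtype=int)
late[2:, 0, 1] = 1    # objects 0 and 1 linked only at t = 3, 4
print(np.array_equal(collapse_time(early, 0.4), collapse_time(late, 0.4)))  # True
```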

Here, $z_{t,i} = k$ denotes the cluster to which object i belongs at time t. This notation merely adds a time index t to the IRM's cluster assignment, so it makes no use of temporal information. In the model of Equations (14) to (17), the cluster index of every object at every time is sampled independently from the same distribution. Consequently, the model cannot express any temporal dynamics of the clustering, such as a correlation between the clustering results at time t − 1 and time t.

≪Clustering model in this embodiment (dIRM)≫
Next, a new clustering model, the dynamic Infinite Relational Model (dIRM), which extends the IRM so that time-varying relational data can be taken into account, is described. First, to represent temporal change, a structure similar to a Hidden Markov Model (HMM) is introduced into the clustering of objects, so that the clustering at time t depends on the clustering at time t−1. This makes it possible to model various aspects of a relational data time series, for example, that an object belonging to cluster 1 at time t−1 is likely to belong to cluster 1 or cluster 2 at the next time t. Details of the HMM are described in, for example, L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

The HMM is a time series model used in a very wide range of fields. At each time t, an HMM defines an unobservable hidden state s_t and observed data y_t. The HMM is characterized by a transition probability between hidden states, p(s_t | s_{t−1}), and an observation model from the hidden state, p(y_t | s_t). The transition probability p(s_t | s_{t−1}) expresses that the value of the hidden state s_t at time t is determined stochastically depending on the value s_{t−1} of the hidden state at time t−1. The observation model p(y_t | s_t) expresses that the observed data at time t are determined depending on the hidden state at the same time.
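To make the two HMM ingredients concrete, the following minimal sketch draws a hidden-state path from a transition matrix p(s_t | s_{t−1}) and binary observations from a per-state observation model p(y_t | s_t). The Bernoulli observation model, the initial distribution `init`, and the function name are illustrative assumptions, not part of the described apparatus.

```python
import random
from typing import List, Tuple


def sample_hmm(init: List[float], trans: List[List[float]], emit_p: List[float],
               T: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw hidden states s_1..s_T and binary observations y_1..y_T (0-based states).

    init[k]     : p(s_1 = k)
    trans[k][l] : p(s_t = l | s_{t-1} = k), the transition probability
    emit_p[k]   : p(y_t = 1 | s_t = k), an assumed Bernoulli observation model
    """
    states = range(len(init))
    s = [rng.choices(states, weights=init)[0]]
    for _ in range(1, T):
        # The next state depends only on the previous state.
        s.append(rng.choices(states, weights=trans[s[-1]])[0])
    # Each observation depends only on the hidden state at the same time.
    y = [1 if rng.random() < emit_p[k] else 0 for k in s]
    return s, y


s, y = sample_hmm([1.0, 0.0], [[0.9, 0.1], [0.2, 0.8]], [0.1, 0.9], 10, random.Random(0))
```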

Thus, the dIRM applies the HMM and models the temporal dynamics of clustering by defining the cluster z_{t,i} to which object i belongs at time t as the hidden state.

First, as in a conventional HMM, a parameter is prepared that represents the transition probability, that is, which cluster an object that belonged to a given cluster at time t−1 is likely to belong to at the next time t. Because the transition probabilities between clusters may themselves change over time, a separate transition probability parameter is used for each time. In the dIRM, the number of clusters, that is, the number of values the HMM hidden state can take, is not fixed (it is unbounded), so an ordinary HMM cannot be used. The dIRM therefore extends the known infinite HMM, which has a single hidden state at each time (s_t only), so that there are N hidden states z_{t,i} at each time. The specific model is as follows.

Here, γ, α_0, κ, ξ, Ψ are hyperparameters set in advance. Equations (18), (21), and (22) are the same as Equations (14), (16), and (17) of the IRM. Equation (19) defines a newly introduced parameter, and Equation (20) differs from Equation (15) of the IRM. The π_{t,k} in Equation (19) is a parameter representing which cluster an object that belonged to cluster k at time t−1 is likely to belong to at time t; it is called time change information. The time change information π_{t,k} corresponds to extending the state transition probabilities of a conventional HMM so that they vary with time. Equation (20) also borrows the idea of the conventional HMM in that it expresses that the state z_{t,i} of object i at time t depends on its state at time t−1. DP() in Equation (19) denotes a Dirichlet process, which can intuitively be regarded as an infinite-dimensional Dirichlet distribution. Equation (19) therefore generates an infinite-dimensional vector π from the infinite-dimensional parameter α_0 β. To simplify this computation, approximating it with a finite dimension L yields the Dirichlet distribution of Equation (23).

Here, β^(L) is an L-dimensional vector, and the arrow → in Equation (23) denotes approximation. A more detailed mathematical explanation is given in N. Ueda and T. Yamada, "Nonparametric Bayesian Models" (in Japanese), Applied Mathematics (応用数理), Vol. 8, No. 3, pp. 16-214, 2007.
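Equations (18) to (22) are not reproduced in this text, so the generative sketch below fills them in with assumed forms under the finite L-dimensional approximation: β ~ Dirichlet(γ/L, ..., γ/L), π_{t,k} ~ Dirichlet(α_0 β + κ δ_k), z_{1,i} ~ β at the first time step (an assumption), z_{t,i} ~ π_{t, z_{t−1,i}} afterwards, η_{k,l} ~ Beta(ξ, Ψ), and x_{t,i,j} ~ Bernoulli(η_{z_{t,i}, z_{t,j}}). All function names are hypothetical and cluster labels are 0-based.

```python
import random
from typing import List


def dirichlet(alphas: List[float], rng: random.Random) -> List[float]:
    """Dirichlet draw via normalised Gamma variates."""
    floored = [max(a, 1e-10) for a in alphas]      # gammavariate requires a > 0
    draws = [rng.gammavariate(a, 1.0) for a in floored]
    total = sum(draws)
    if total == 0.0:                               # every draw underflowed: use the mean
        total_a = sum(floored)
        return [a / total_a for a in floored]
    return [d / total for d in draws]


def sample_dirm(N: int, T: int, L: int, gamma: float, alpha0: float, kappa: float,
                xi: float, psi: float, rng: random.Random):
    """Draw (beta, pi, z, eta, x) from the assumed finite-L dIRM generative model."""
    clusters = range(L)
    beta = dirichlet([gamma / L] * L, rng)                         # global mixing ratio
    eta = [[rng.betavariate(xi, psi) for _ in clusters] for _ in clusters]
    pi: List[List[List[float]]] = []
    z: List[List[int]] = []
    for t in range(T):
        # Time-dependent transition rows; kappa is added to the k-th (self) element.
        pi_t = [dirichlet([alpha0 * beta[l] + (kappa if l == k else 0.0) for l in clusters], rng)
                for k in clusters]
        pi.append(pi_t)
        if t == 0:
            z_t = rng.choices(clusters, weights=beta, k=N)        # assumed first step
        else:
            # Each object's cluster depends on its own cluster at t-1 (HMM-like).
            z_t = [rng.choices(clusters, weights=pi_t[z[t - 1][i]])[0] for i in range(N)]
        z.append(z_t)
    x = [[[1 if rng.random() < eta[z[t][i]][z[t][j]] else 0 for j in range(N)]
          for i in range(N)] for t in range(T)]
    return beta, pi, z, eta, x


beta, pi, z, eta, x = sample_dirm(N=4, T=3, L=5, gamma=1.0, alpha0=1.0, kappa=2.0,
                                  xi=1.0, psi=1.0, rng=random.Random(0))
```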

In addition, κ (κ > 0) in Equation (19) adds κ to the k-th element of the global mixing ratio α_0 × β, which makes the k-th element of the sampled π_{t,k} more likely to be large. Details of κ are described in E. B. Fox, E. B. Sudderth, M. I. Jordan, and A. S. Willsky, "An HDP-HMM for Systems with State Persistence," Proceedings of the International Conference on Machine Learning (ICML), 2008. δ_k in Equation (19) is a delta function that equals 1 at cluster number k and 0 otherwise. In other words, κ in Equation (19) serves the following purpose: because the HMM determines the hidden state s_t (here, z_{t,i}) stochastically, even when a sequence such as s_1 = s_2 = ... = s_t = k is correct, some of the states could by chance take values other than k, and κ keeps this from becoming likely.
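Under the finite approximation, adding κ to the k-th element of α_0 β gives the Dirichlet parameter α_0 β + κ δ_k, so when β sums to 1 the prior mean of π_{t,k} is (α_0 β + κ δ_k)/(α_0 + κ). The short computation below, with illustrative numbers and a hypothetical function name, shows how κ raises the expected self-transition probability.

```python
from typing import List


def expected_transition_row(beta: List[float], alpha0: float, kappa: float,
                            k: int) -> List[float]:
    """Mean of pi_{t,k} under the finite sticky prior Dirichlet(alpha0*beta + kappa*delta_k)."""
    params = [alpha0 * b + (kappa if l == k else 0.0) for l, b in enumerate(beta)]
    total = sum(params)
    return [p / total for p in params]


beta = [0.5, 0.3, 0.2]
print(expected_transition_row(beta, alpha0=1.0, kappa=0.0, k=2))  # approx. [0.5, 0.3, 0.2]
print(expected_transition_row(beta, alpha0=1.0, kappa=4.0, k=2))  # approx. [0.1, 0.06, 0.84]
```

With κ = 0 every row simply follows β; with κ = 4 the expected probability of staying in cluster 2 rises from 0.2 to 0.84.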

≪dIRM processing flow≫
First, the overall flow of the dIRM processing is outlined with reference to FIG. 1; the details of each process are described later. The processing shown in FIG. 1 is executed by the clustering apparatus 80 (see FIG. 2).

In step S101, the hyperparameters γ, α_0, κ, ξ, Ψ used in the processing and the observed values (relational data time series) X = {x_{t,i,j}} are acquired. In step S102, the affiliation clusters z_{t,i}, the time change information π_{t,k}, the inter-cluster relevance η_{k,l}, the mixing ratio β, and the number of clusters K are initialized. The number of clusters K is the number of values that the cluster variable k can take. In step S103, the iteration variable itr is set to 1. Steps S104 to S108 are then executed repeatedly.

In step S104, the affiliation clusters z_{t,i} are sampled by the processing of the affiliation cluster (z) estimation unit 23 (see FIG. 3), and the z_{t,i} stored in the storage unit 30 (see FIG. 2) are updated with the sampled values and saved. In step S105, the time change information π_{t,k} is sampled by the processing of the time change information (π) estimation unit 24 (see FIG. 4), and the π_{t,k} stored in the storage unit 30 is updated with the sampled values and saved. In step S106, the inter-cluster relevance η_{k,l} is sampled by the processing of the inter-cluster relevance (η) estimation unit 25 (see FIG. 5), and the η_{k,l} stored in the storage unit 30 is updated with the sampled values and saved. In step S107, the mixing ratio β is sampled by the processing of the mixing ratio (β) estimation unit 26 (see FIG. 6), and the β stored in the storage unit 30 is updated with the sampled value and saved. In step S108, the iteration variable itr is incremented by 1.

In step S109, it is determined whether an end condition is satisfied. The end condition is, for example, that a predetermined number of iterations has been reached. If the end condition is not satisfied (No in step S109), the processing returns to step S104. If it is satisfied (Yes in step S109), the calculation results are output in step S110 and the processing ends.

Instead of the end condition above, the end condition may be that the absolute value of the difference from the parameter values of the previous sampling pass is equal to or less than a preset threshold. When the calculation results are output, only the data of the required parameters among the affiliation clusters z_{t,i}, time change information π_{t,k}, inter-cluster relevance η_{k,l}, and mixing ratio β are output. Also, instead of giving the hyperparameters γ, κ, α_0, ξ, Ψ in advance in step S101, they may be estimated together with the other parameters inside the iterative loop (see, for example, Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, Vol. 101, No. 476, pp. 1566-1581, 2006).
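The control flow of steps S103 to S110, with both end conditions, can be sketched as a generic driver. The dictionary-of-lists state, the step functions, and the names `run_sampler` and `halve` are hypothetical placeholders; `halve` stands in for a real sampling step only to exercise the difference-based end condition. A real sampler would also have to cope with K changing between sweeps.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, List[float]]
Step = Callable[[State], None]


def run_sampler(state: State, steps: List[Step], max_iter: int,
                tol: Optional[float] = None) -> int:
    """Repeat the update steps until an end condition holds; return the sweeps done.

    End conditions: a fixed number of sweeps (max_iter), or, when tol is given,
    every monitored value changing by at most tol since the previous sweep.
    The steps must update existing keys of `state` in place.
    """
    for itr in range(1, max_iter + 1):              # itr = 1 (S103), itr += 1 (S108)
        previous = {name: list(vals) for name, vals in state.items()}
        for step in steps:                          # S104-S107, in order
            step(state)
        if tol is not None:                         # alternative end condition (S109)
            diff = max((abs(a - b) for name in state
                        for a, b in zip(state[name], previous[name])), default=0.0)
            if diff <= tol:
                return itr
    return max_iter                                 # fixed-iteration end condition


def halve(state: State) -> None:
    """Toy update used only to demonstrate the driver."""
    state["beta"] = [v / 2 for v in state["beta"]]


print(run_sampler({"beta": [1.0]}, [halve], max_iter=100, tol=0.01))   # 7
```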

Next, the configuration and processing details of the clustering apparatus 80 (see FIG. 2, described later), which executes each step in FIG. 1, are described. First, the premise of the processing is explained. As an example, the case where beam sampling is used to estimate the affiliation clusters z_{t,i}, the time change information π_{t,k}, the inter-cluster relevance η_{k,l}, and the mixing ratio β is described. The reason is that beam sampling truncates the infinitely many clusters to a finite number during estimation, so β and π_{t,k}, which would otherwise have to be treated as infinite-dimensional vectors, can be handled as finite-dimensional vectors. Moreover, in beam sampling, estimation with a finite truncation becomes, through repeated sampling, theoretically equivalent to treating the infinite-dimensional vectors exactly. Gibbs sampling can be used instead of beam sampling, but because beam sampling truncates the computation to a finite number of clusters as described above, it has the advantage of running faster than Gibbs sampling.

In the following description, the numbers of objects in the two domains are assumed to be the same (indices i and j both range over 1, 2, ..., N), but they need not be. Also, in this embodiment the cluster variable k is assumed to be a positive integer; however, since k serves only to identify each cluster, its value has no meaning in itself, and symbols such as a, b, c or character strings may be used instead of positive integers.

≪Configuration of the clustering apparatus≫
Next, the configuration of the clustering apparatus 80 is described with reference to FIG. 2. As shown in FIG. 2, the clustering apparatus 80 includes a processing unit 20 and a storage unit 30. The processing unit 20 corresponds to the CPU (Central Processing Unit) of a computer, and the storage unit 30 corresponds to a main memory, an HDD (Hard Disk Drive), a USB (Universal Serial Bus) memory, or the like. An input device 10 for inputting data to the clustering apparatus 80 and a display device 40 for displaying the calculation results of the clustering apparatus 80 can be connected to the clustering apparatus 80. The input device 10 is, for example, a keyboard or a mouse, and is used by the user to input data and to instruct processing operations. The display device 40 is, for example, a display, and is used by the user to check the calculation results. The input device 10 and the display device 40 are not essential: data stored in a storage unit (not shown) such as a USB memory externally connected to the clustering apparatus 80 may be used as the input, and the calculation results may be output to that storage unit. In this embodiment, however, the case where the input device 10 and the display device 40 are connected to the clustering apparatus 80 is described.

The processing unit 20 of the clustering apparatus 80 includes an initial setting unit 21 that initializes the input data and an estimation calculation unit 22 that performs the iterative calculation. The estimation calculation unit 22 includes, as its functions, an affiliation cluster (z) estimation unit 23, a time change information (π) estimation unit 24, an inter-cluster relevance (η) estimation unit 25, a mixing ratio (β) estimation unit 26, and an end determination unit 28. The initial setting unit 21 executes steps S101 and S102 in FIG. 1. The affiliation cluster (z) estimation unit 23, the time change information (π) estimation unit 24, the inter-cluster relevance (η) estimation unit 25, and the mixing ratio (β) estimation unit 26 execute steps S104, S105, S106, and S107 in FIG. 1, respectively. The end determination unit 28 executes steps S109 and S110 in FIG. 1. The hyperparameter calculation unit 27, shown with a broken line in FIG. 2, is used when the hyperparameters are estimated inside the iterative loop instead of being given in advance. The order in which the units 23 to 27 perform their calculations is not limited to the order shown.

The storage unit 30 stores, as the variables used in the calculations of the processing unit 20, estimated values 31, auxiliary variables 32, message variables 33, count variables 34, hyperparameters 35, the number of clusters 36, and observed values 37. The storage unit 30 also stores the application program executed by the processing unit 20.

Next, the processing of each of the units 21 to 28 of the processing unit 20 of the clustering apparatus 80 shown in FIG. 2 is described in detail.

(Initial setting unit)
The initial setting unit 21 acquires, from the input device 10, the relational data time series X = {x_{t,i,j}} (t = 1, 2, ..., T; i = 1, 2, ..., N; j = 1, 2, ..., N) and the hyperparameters γ, α_0, κ, ξ, Ψ (step S101 in FIG. 1), and stores them in the storage unit 30 as the observed values 37 and the hyperparameters 35, respectively. The initial setting unit 21 also sets initial values for the parameters β, z_{t,i}, π_{t,k}, η_{k,l} defined in Equations (18) to (22) and for the number of clusters K, and stores them in the storage unit 30 as the estimated values 31 and the number of clusters 36 (S102 in FIG. 1). The initial values are set as follows. The number of clusters K is initialized to a random positive integer. Each z_{t,i} is set to an integer from 1 to K, which may be chosen completely at random. Each π_{t,k}, an L-dimensional vector, is set to random values such that all its elements are non-negative and sum to 1. Each η_{k,l} is assigned a random real value satisfying 0 ≤ η_{k,l} ≤ 1. β, an L-dimensional vector, is likewise set to random values such that all its elements are non-negative and sum to 1.
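A minimal sketch of the initialization in step S102, using the Python standard library. The upper bound `k_max` for the random initial K (assumed to satisfy k_max ≤ L), the L×L shape chosen for η, and the function names are hypothetical; labels are 1-based as in the text.

```python
import random
from typing import Dict, List


def random_simplex(L: int, rng: random.Random) -> List[float]:
    """Random vector of length L with non-negative elements summing to 1."""
    w = [rng.random() + 1e-12 for _ in range(L)]   # tiny offset avoids an all-zero draw
    total = sum(w)
    return [v / total for v in w]


def initialise(T: int, N: int, L: int, k_max: int, rng: random.Random) -> Dict[str, object]:
    """Random initial K, z, pi, eta and beta (k_max <= L is an assumed bound on K)."""
    K = rng.randint(1, k_max)                                        # random positive integer
    z = [[rng.randint(1, K) for _ in range(N)] for _ in range(T)]     # uniform over 1..K
    pi = [[random_simplex(L, rng) for _ in range(L)] for _ in range(T)]  # pi[t][k] is L-dim
    eta = [[rng.random() for _ in range(L)] for _ in range(L)]       # 0 <= eta < 1
    beta = random_simplex(L, rng)
    return {"K": K, "z": z, "pi": pi, "eta": eta, "beta": beta}


params = initialise(T=3, N=5, L=8, k_max=4, rng=random.Random(1))
```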

(Affiliation cluster (z) estimation unit)
The affiliation cluster (z) estimation unit 23 samples z_{t,i} and updates the z_{t,i} stored in the storage unit 30 with the sampled values. The processing flow of the affiliation cluster (z) estimation unit 23 is described with reference to FIG. 3. First, in step S301, the affiliation cluster (z) estimation unit 23 acquires the z_{t,i}, π_{t,k}, η_{k,l}, β, K, and X = {x_{1,1,1}, x_{1,1,2}, ..., x_{T,N,N}} currently stored in the storage unit 30. Next, in step S302, the auxiliary variables u_{t,i} are computed for t = 1 to T and i = 1 to N using Equation (1).

Here, Uniform() denotes a uniform distribution; that is, Equation (1) expresses that each u_{t,i} (t = 1 to T, i = 1 to N) is sampled from a uniform distribution.
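Equation (1) is not reproduced in this text. In the beam sampler for the infinite HMM, each slice variable is typically drawn as u_{t,i} ~ Uniform(0, π_{t, z_{t−1,i}, z_{t,i}}), which is consistent with the condition u_{t,i} > π_{t, z_{t−1,i}, k} used below to exclude clusters. The sketch assumes that form, uses β_{z_{1,i}} as the upper bound at the first time step (also an assumption), and uses 0-based labels; the function names are hypothetical.

```python
import random
from typing import List


def sample_slice_variables(z: List[List[int]], pi: List[List[List[float]]],
                           beta: List[float], rng: random.Random) -> List[List[float]]:
    """u[t][i] ~ Uniform(0, pi[t][z[t-1][i]][z[t][i]]); at t = 0 the bound is beta[z[0][i]]."""
    u = []
    for t, z_t in enumerate(z):
        row = []
        for i, k in enumerate(z_t):
            upper = beta[k] if t == 0 else pi[t][z[t - 1][i]][k]
            row.append(rng.uniform(0.0, upper))
        u.append(row)
    return u


def allowed_clusters(pi_row: List[float], u_ti: float) -> List[int]:
    """Clusters k with pi_row[k] > u_ti: a finite set, which is what truncates the model."""
    return [k for k, p in enumerate(pi_row) if p > u_ti]


print(allowed_clusters([0.5, 0.1, 0.4], 0.2))   # [0, 2]
```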

Subsequently, in steps S303 to S309, the message variables p_{t,i,k} are calculated for all i = 1 to N and the required k, from t = 1 to t = T. Because the message variables are computed in order from t = 1 to t = T, this is called forward filtering. Specifically, t = 1 is set in step S303 and i = 1 in step S304; in step S305, the message variable p_{t,i,k} is calculated using Equation (2); in step S306, i is incremented by 1; if i > N in step S307, t is incremented by 1 in step S308; and if t > T in step S309, the processing moves to step S310. If i ≤ N in step S307, the processing returns to step S305, and if t ≤ T in step S309, it returns to step S304.

In step S305, the message variable p_{t,i,k} is calculated by Equation (2). The message variable is defined so as to satisfy Equation (3).

Here, Equation (2) expresses that the existing message variable is multiplied by the likelihood of all the time-series relational data involving z_{t,i}. The computation is performed only for those k satisfying u_{t+1,i} < π_{t+1,k,z_{t+1,i}}, that is, for a finite number of k; for all other k, p_{t,i,k} = 0. In other words, when z_{t,i} = k is sampled, object i cannot be assigned to any cluster number k for which u_{t,i} > π_{t,z_{t−1,i},k}, which reduces the number of cluster numbers that z_{t,i} can take from infinitely many to finitely many. When t = 1, the last term on the right-hand side is ignored.

In Equation (3), x_{1:t,·,·} denotes the set of all x_{t′,i,j} satisfying 1 ≤ t′ ≤ t and 1 ≤ i, j ≤ N, that is, all relational data from time 1 through time t. Similarly, u_{1:t,·} denotes the set of all u_{t′,i} satisfying 1 ≤ t′ ≤ t and 1 ≤ i ≤ N. The symbol "·" stands for all values of the corresponding index.
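Equations (2) and (3) are not reproduced in this text. The sketch below assumes the usual beam-sampling form for one object's chain: p_{t,i,k} ∝ lik_t(k) · Σ_{k′} I(u_{t,i} < π_{t,k′,k}) p_{t−1,i,k′}, with the sum dropped at the first time step as stated above, where lik_t(k) is the likelihood of all relations at time t involving object i if z_{t,i} = k. It further assumes a single object domain and a Bernoulli relation model with parameters η; the names are hypothetical and labels are 0-based.

```python
from typing import List


def relation_likelihood(x_t: List[List[int]], z_t: List[int],
                        eta: List[List[float]], i: int, k: int) -> float:
    """Likelihood of every relation at one time step that involves object i,
    if z_t[i] were k and all other labels stayed fixed (Bernoulli model assumed)."""
    like = 1.0
    for j in range(len(z_t)):
        l = k if j == i else z_t[j]
        p = eta[k][l]                          # relation i -> j
        like *= p if x_t[i][j] else 1.0 - p
        if j != i:                             # relation j -> i (self-relation counted once)
            p = eta[z_t[j]][k]
            like *= p if x_t[j][i] else 1.0 - p
    return like


def forward_filter(lik: List[List[float]], pi: List[List[List[float]]],
                   u: List[float]) -> List[List[float]]:
    """Forward filtering for one object's chain z_1..z_T.

    lik[t][k]     likelihood of the relations at time t if z_t = k
    pi[t][kp][k]  transition probability from cluster kp at t-1 to k at t
    u[t]          the object's slice variable at time t
    Returns messages[t][k], each row normalised to sum to 1.
    """
    K = len(lik[0])
    messages: List[List[float]] = []
    for t, lik_t in enumerate(lik):
        if t == 0:
            row = list(lik_t)                  # the last term is ignored at the first step
        else:
            prev = messages[-1]
            # Only predecessors kp with u[t] < pi[t][kp][k] contribute (a finite set).
            row = [lik_t[k] * sum(prev[kp] for kp in range(K) if u[t] < pi[t][kp][k])
                   for k in range(K)]
        total = sum(row)
        if total == 0.0:
            raise ValueError(f"no admissible cluster at t={t}")
        messages.append([v / total for v in row])
    return messages
```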

Next, in steps S310 to S316, the values z_{t,i} = k are sampled by backward sampling for t = T, T−1, ..., 1 and i = 1 to N. Specifically, t = T is set in step S310 and i = 1 in step S311; in step S312, z_{t,i} is sampled using Equation (4) and the stored value is updated; in step S313, i is incremented by 1; if i > N in step S314, t is decremented by 1 in step S315; and if t < 1 in step S316, the processing moves to step S317. If i ≤ N in step S314, the processing returns to step S312, and if t ≥ 1 in step S316, it returns to step S311.

In step S312, the sampling distribution of z_{t,i} is computed from the message variables p_{t,i,k} using Equation (4).

Here, I() on the right-hand side is an indicator function that equals 1 if the condition in the parentheses holds and 0 otherwise. Therefore, here too, the computation need only be performed for those k satisfying u_{t+1,i} < π_{t+1,k,z_{t+1,i}}, that is, for a finite number of k. Equation (4) is evaluated for all i and k from t = T down to t = 1, and z_{t,i} = k is sampled. Any z_{t,i} = k for which the right-hand side of Equation (4) is 0 is ignored (never sampled). The z_{t,i} stored in the storage unit 30 are then updated with the sampled values and saved. As a result, only K distinct values are selected as the cluster numbers of z_{t,i}. Thus, although in the dIRM the cluster numbers of z_{t,i} selected by sampling are limited to K values, no generality is lost.
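A sketch of the backward pass for one object, assuming Equation (4) takes the beam-sampling form p(z_{t,i} = k | z_{t+1,i}, ...) ∝ p_{t,i,k} · I(u_{t+1,i} < π_{t+1,k,z_{t+1,i}}), which matches the indicator condition stated above. The function name is hypothetical and labels are 0-based.

```python
import random
from typing import List


def backward_sample(messages: List[List[float]], pi: List[List[List[float]]],
                    u: List[float], rng: random.Random) -> List[int]:
    """Sample z_T, ..., z_1 for one object from its forward messages.

    p(z_t = k | z_{t+1}) is proportional to messages[t][k] * I(u[t+1] < pi[t+1][k][z_{t+1}]).
    """
    T, K = len(messages), len(messages[0])
    z = [0] * T
    z[T - 1] = rng.choices(range(K), weights=messages[T - 1])[0]
    for t in range(T - 2, -1, -1):
        weights = [messages[t][k] if u[t + 1] < pi[t + 1][k][z[t + 1]] else 0.0
                   for k in range(K)]
        # Clusters with weight 0 are never drawn, so only finitely many k matter.
        z[t] = rng.choices(range(K), weights=weights)[0]
    return z
```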

In step S317, the number of distinct cluster numbers taken by all the z_{t,i} sampled and stored by the above processing is set as K, and K in the storage unit 30 is updated and saved. Next, in step S318, the cluster numbers of all the variables stored in the storage unit 30 are renumbered to 1 to K and saved.

A concrete example of the processing in step S318 follows. Suppose that there is a single time step (t = 1), the number of objects is N = 5, and the stored z_{1,i} are as follows:
z_{1,1} = 1
z_{1,2} = 3
z_{1,3} = 4
z_{1,4} = 6
z_{1,5} = 6

In this case, four cluster numbers (1, 3, 4, and 6) are in use, so K = 4. Within the range 1 to 6, cluster numbers 2 and 5 are unused, so cluster number 3 is replaced with 2, 4 with 3, and 6 with 4, leaving no wasted (unused) cluster numbers:
z_{1,1} = 1 (not updated)
z_{1,2} = 3 → 2
z_{1,3} = 4 → 3
z_{1,4} = 6 → 4
z_{1,5} = 6 → 4

As a result, all the variables z_{t,i} are updated so that each takes a cluster number from 1 to 4. Because rewriting z_{t,i} in this way also affects the cluster indices k and l of the other parameters π, η, β and of the auxiliary variables, the same index-rewriting process (step S318) is applied to all of these variables as well.
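Step S318 amounts to building a mapping from the labels in use to 1..K and applying it to every cluster-indexed quantity. The sketch below reproduces the example above. The function names are hypothetical, and parameters indexed by cluster pairs (such as η) are represented, for illustration only, as dictionaries keyed by (k, l).

```python
from typing import Dict, List, Tuple


def compact_labels(z: List[List[int]]) -> Tuple[List[List[int]], Dict[int, int]]:
    """Renumber the cluster labels used in z to 1..K, preserving their order."""
    used = sorted({k for row in z for k in row})
    mapping = {old: new for new, old in enumerate(used, start=1)}
    return [[mapping[k] for k in row] for row in z], mapping


def remap_pairs(param: Dict[Tuple[int, int], float],
                mapping: Dict[int, int]) -> Dict[Tuple[int, int], float]:
    """Apply the same renumbering to a (k, l)-indexed parameter such as eta;
    entries for clusters that are no longer used are dropped."""
    return {(mapping[k], mapping[l]): v for (k, l), v in param.items()
            if k in mapping and l in mapping}


new_z, mapping = compact_labels([[1, 3, 4, 6, 6]])   # T = 1, N = 5 as in the text
print(new_z, len(mapping))                            # [[1, 2, 3, 4, 4]] 4
```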

(Time change information (π) estimation unit)
The time change information (π) estimation unit 24 shown in FIG. 2 samples π_{t,k} and updates the π_{t,k} stored in the storage unit 30 with the sampled values. The processing flow of the time change information (π) estimation unit 24 is described with reference to FIG. 4.

First, in step S401, the z_{t,i}, π_{t,k}, β, K currently stored in the storage unit 30 and the hyperparameters α_0 and κ are acquired. Here, for a fixed t, the number of objects with z_{t−1,i} = k and z_{t,i} = l is denoted m_{t,k,l}, and m_{t,k,l} is used as a count variable. In step S402, m_{t,k,l} is initialized to 0 for all t = 1 to T, k = 1 to K, and l = 1 to K.

Next, in steps S403 to S411, m_{t,k,l} is computed over t = 1 to T and i = 1 to N. Specifically, t = 1 is set in step S403 and i = 1 in step S404; k = z_{t−1,i} is set in step S405 and l = z_{t,i} in step S406; m_{t,k,l} is incremented by 1 in step S407; i is incremented by 1 in step S408; if i > N in step S409, t is incremented by 1 in step S410; and if t > T in step S411, the processing moves to step S412. If i ≤ N in step S409, the processing returns to step S405, and if t ≤ T in step S411, it returns to step S404.
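The counting loop of steps S402 to S411 can be sketched as follows, with 0-based labels and a hypothetical function name. The text does not say how z_{t−1,i} is defined at the first time step, so in this sketch the first step simply has no predecessor and its counts stay zero (an assumption).

```python
from typing import List


def transition_counts(z: List[List[int]], K: int) -> List[List[List[int]]]:
    """m[t][k][l] = number of objects i with z[t-1][i] = k and z[t][i] = l."""
    T, N = len(z), len(z[0])
    m = [[[0] * K for _ in range(K)] for _ in range(T)]   # S402: initialise to 0
    for t in range(1, T):                                  # m[0] stays zero (assumption)
        for i in range(N):
            k, l = z[t - 1][i], z[t][i]                    # S405, S406
            m[t][k][l] += 1                                # S407
    return m


print(transition_counts([[0, 0, 1], [0, 1, 1]], K=2)[1])   # [[1, 1], [0, 1]]
```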

Next, in step S412, π_{t,k} is sampled using Equation (24) for t = 1 to T and k = 1 to K, and the π_{t,k} in the storage unit 30 is updated and saved.

Here, κ is the same κ used in Equation (19), and β_u = 1 − Σ_{k=1}^{K} β_k.
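Equation (24) is not reproduced in this text. Following the sticky HDP-HMM of Fox et al. cited earlier, a plausible form, assumed here, is π_{t,k} ~ Dirichlet(α_0 β_1 + m_{t,k,1}, ..., α_0 β_k + κ + m_{t,k,k}, ..., α_0 β_K + m_{t,k,K}, α_0 β_u), where the last element carries the mass β_u of the clusters not currently in use. The function name is hypothetical and labels are 0-based.

```python
import random
from typing import List


def sample_pi_row(m_row: List[int], beta: List[float], alpha0: float, kappa: float,
                  k: int, rng: random.Random) -> List[float]:
    """Draw pi_{t,k} over the K represented clusters plus one leftover element (assumed form)."""
    K = len(m_row)
    beta_u = max(1.0 - sum(beta), 0.0)             # mass of the unrepresented clusters
    params = [alpha0 * beta[l] + m_row[l] + (kappa if l == k else 0.0) for l in range(K)]
    params.append(alpha0 * beta_u)
    floored = [max(a, 1e-10) for a in params]      # gammavariate requires a > 0
    draws = [rng.gammavariate(a, 1.0) for a in floored]
    total = sum(draws)
    if total == 0.0:                               # every draw underflowed: use the mean
        total_a = sum(floored)
        return [a / total_a for a in floored]
    return [d / total for d in draws]


row = sample_pi_row([3, 0], [0.6, 0.3], alpha0=1.0, kappa=2.0, k=0, rng=random.Random(0))
```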

(Inter-cluster relevance (η) estimation unit)
The inter-cluster relevance (η) estimation unit 25 shown in FIG. 2 samples η_{k,l} and updates the η_{k,l} stored in the storage unit 30 with the sampled values. The processing flow of the inter-cluster relevance (η) estimation unit 25 is described with reference to FIG. 5. First, in step S501, the z_{t,i}, η_{k,l}, K currently stored in the storage unit 30, the hyperparameters ξ and Ψ, and the relational data time series X are acquired.

Here, the number of triples (t, i, j) with z_{t,i} = k and z_{t,j} = l is denoted N_{k,l}, the number of those observations with x_{t,i,j} = 1 is denoted n_{k,l}, and N_{k,l} and n_{k,l} are used as count variables. In step S502, every N_{k,l} and n_{k,l} is initialized to 0.

Next, in steps S503 to S514, N_{k,l} and n_{k,l} are computed over t = 1 to T, i = 1 to N, and j = 1 to N. Specifically, t = 1 is set in step S503 and i = 1 in step S504; k = z_{t,i} is set in step S505; j = 1 is set in step S506; l = z_{t,j} is set in step S507; in step S508, N_{k,l} is incremented by 1 and x_{t,i,j} is added to n_{k,l}; j is incremented by 1 in step S509; if j > N in step S510, i is incremented by 1 in step S511; if i > N in step S512, t is incremented by 1 in step S513; and if t > T in step S514, the processing moves to step S515. If j ≤ N in step S510, the processing returns to step S507; if i ≤ N in step S512, it returns to step S505; and if t ≤ T in step S514, it returns to step S504.

Next, in step S515, η_{k,l} is sampled for k = 1 to K and l = 1 to K using equation (25). The η_{k,l} in the storage unit 30 is then updated with the sampled values and stored.
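Equation (25) is not reproduced in this section. The sketch below assumes it is the beta posterior described in the claims, η_{k,l} ~ Beta(ξ + n_{k,l}, Ψ + N_{k,l} − n_{k,l}), which is the usual conjugate update for a Bernoulli link probability in the IRM. The function name `sample_eta` is hypothetical, and the sketch uses only Python's standard `random` module.

```python
import random


def sample_eta(N_kl, n_kl, xi, psi, rng=random):
    """Sample eta[k][l] ~ Beta(xi + n_kl, psi + N_kl - n_kl) for every cluster pair (step S515).

    N_kl[k][l] : number of (t, i, j) triples with z[t][i] = k and z[t][j] = l
    n_kl[k][l] : how many of those triples had x[t][i][j] = 1
    xi, psi    : positive beta prior hyperparameters
    """
    K = len(N_kl)
    return [
        [rng.betavariate(xi + n_kl[k][l], psi + N_kl[k][l] - n_kl[k][l]) for l in range(K)]
        for k in range(K)
    ]
```

Under this assumption, pairs with many observed links (n_{k,l} close to N_{k,l}) receive η_{k,l} near 1. Pairs with few links receive η_{k,l} near 0.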

(Mixing ratio (β) estimation unit)
The mixing ratio (β) estimation unit 26 shown in FIG. 2 samples β and updates the β stored in the storage unit 30 with the sampled value. The processing flow of the mixing ratio (β) estimation unit 26 will be described with reference to FIG. 6. First, in step S601, the current z_{t,i}, β, and K, the hyperparameters γ, α₀, and κ, and the count variables m_{t,k,l} stored in the storage unit 30 are acquired.

Sampling β requires three auxiliary variables. First, in step S602, for all t = 1 to T, k = 1 to K, and l = 1 to K, the auxiliary variable R_{t,k,l} = r, with r ∈ {1, 2, ..., m_{t,k,l}}, is sampled using equation (5).

Here, s(n, a) denotes the unsigned Stirling number of the first kind. For n ≥ a ≥ 0, s(n, a) is the coefficient of x^a in the rising factorial x(x + 1)(x + 2)···(x + n − 1). It satisfies s(0, 0) = 1, s(n, 0) = 0 for n ≥ 1, and the following recurrence:
s(n, k) = s(n − 1, k − 1) + (n − 1) s(n − 1, k)
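The recurrence gives a simple way to tabulate s(n, a). The sketch below also shows one possible way step S602 could draw R_{t,k,l}. Equation (5) is not reproduced in this section, so the sampler assumes the table-count form used in the sticky HDP-HMM: P(R = r) ∝ s(m, r) a^r for r = 1, ..., m. Here m = m_{t,k,l}, and a = α₀β_l + κ when k = l, or a = α₀β_l otherwise. Treat that form, and the names `stirling_first_unsigned` and `sample_R`, as assumptions.

```python
import random


def stirling_first_unsigned(n_max):
    """Table s[n][a] of unsigned Stirling numbers of the first kind for 0 <= a <= n <= n_max."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = 1  # base case; s(n, 0) = 0 for n >= 1 is already in the zero-filled table
    for n in range(1, n_max + 1):
        for a in range(1, n + 1):
            # s(n, a) = s(n - 1, a - 1) + (n - 1) * s(n - 1, a)
            s[n][a] = s[n - 1][a - 1] + (n - 1) * s[n - 1][a]
    return s


def sample_R(m, a, s, rng=random):
    """Draw R in {1, ..., m} with P(R = r) proportional to s(m, r) * a**r (assumed form of eq. (5)).

    Returns 0 when m == 0, since the set {1, ..., m} is then empty. Requires m <= len(s) - 1.
    """
    if m == 0:
        return 0
    weights = [s[m][r] * a ** r for r in range(1, m + 1)]
    return rng.choices(range(1, m + 1), weights=weights)[0]
```

For example, s(3, 1) = 2, s(3, 2) = 3, and s(4, 2) = 11. Because the weights are plain floats, this direct approach overflows for large counts (roughly m > 170). Log-space weights would be a safer choice there.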

Next, in step S603, the auxiliary variable O_{t,k} is sampled using equation (6) for t = 1 to T and k = 1 to K.

Here, Binomial(·) denotes the binomial distribution.
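Equation (6) is not reproduced in this section. A plausible reading, assumed here, is the "override" variable of the sticky HDP-HMM: O_{t,k} ~ Binomial(R_{t,k,k}, κ / (κ + α₀β_k)). This counts how many of the self-transition tables are explained by the self-transition bias κ. The function name `sample_O` is hypothetical. The binomial draw is built from Bernoulli trials because `random.binomialvariate` exists only from Python 3.12.

```python
import random


def sample_O(R_kk, alpha0, beta_k, kappa, rng=random):
    """Sample O ~ Binomial(R_kk, p), p = kappa / (kappa + alpha0 * beta_k), as a sum of Bernoulli draws.

    R_kk : the diagonal auxiliary count R[t][k][k]
    The success probability p is an assumed form of equation (6).
    """
    p = kappa / (kappa + alpha0 * beta_k)
    return sum(1 for _ in range(R_kk) if rng.random() < p)
```

With κ = 0 (no self-transition bias), every draw is 0. As α₀β_k goes to 0, every diagonal table is attributed to κ.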

Next, in step S604, the auxiliary variable ^R_{t,k,l} is sampled using equation (7) for t = 1 to T, k = 1 to K, and l = 1 to K.
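Equation (7) is not reproduced in this section either. The sketch assumes the sticky HDP-HMM form, ^R_{t,k,l} = R_{t,k,l} − O_{t,k} when l = k and ^R_{t,k,l} = R_{t,k,l} otherwise. Under that assumption, this step is a deterministic adjustment once R and O have been drawn. The function name `adjust_R` is hypothetical.

```python
def adjust_R(R_t, O_t):
    """Return ^R for one time step t under the assumed form of equation (7).

    R_t[k][l] : auxiliary counts R[t][k][l]
    O_t[k]    : override counts O[t][k]
    Diagonal entries lose the tables attributed to the self-transition bias; off-diagonal ones are unchanged.
    """
    K = len(R_t)
    return [[R_t[k][l] - O_t[k] if k == l else R_t[k][l] for l in range(K)] for k in range(K)]
```

For example, `adjust_R([[3, 1], [0, 2]], [1, 2])` gives `[[2, 1], [0, 0]]`.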

Finally, in step S605, β is sampled using equation (26), and the β in the storage unit 30 is updated and stored.
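The claims describe this step as a draw from Dirichlet(Σ_{t,k} ^R_{t,k,1}, ..., Σ_{t,k} ^R_{t,k,K}, γ). The sketch below assumes equation (26) has that form and samples it with the standard gamma-normalization construction. The function name `sample_beta` is hypothetical. The extra (K + 1)-th component is the mass left for clusters not yet in use.

```python
import random


def sample_beta(R_hat, gamma, rng=random):
    """Sample beta ~ Dirichlet(sum_{t,k} R_hat[t][k][1], ..., sum_{t,k} R_hat[t][k][K], gamma).

    R_hat[t][k][l] : adjusted auxiliary counts from step S604
    gamma          : positive concentration hyperparameter
    Returns K + 1 weights summing to 1; the last one is reserved for new clusters.
    """
    T = len(R_hat)
    K = len(R_hat[0])
    params = [sum(R_hat[t][k][l] for t in range(T) for k in range(K)) for l in range(K)]
    params.append(gamma)
    # A Dirichlet draw is a vector of independent Gamma(a_i, 1) draws normalised to sum to 1.
    # A zero parameter is treated as a degenerate component that is always 0.
    g = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in params]
    total = sum(g)
    return [v / total for v in g]
```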

(Hyperparameter calculation unit)
The hyperparameter calculation unit 27 shown in FIG. 2 is used when the hyperparameters are not given in advance. The hyperparameter estimation in the hyperparameter calculation unit 27 can be performed using the technique described in J. Van Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani, "Beam Sampling for the Infinite Hidden Markov Model," Proceedings of the 25th International Conference on Machine Learning (ICML), 2008.

(End determination unit)
The end determination unit 28 shown in FIG. 2 determines whether a predetermined end condition is satisfied and, if it is, outputs the calculation results. One example end condition is that a predetermined number of iterations has been reached. Another is that the absolute difference between each parameter value and its value in the previous sampling pass has fallen to or below a preset threshold. When outputting the results, the end determination unit 28 selects among the belonging cluster z_{t,i}, the cluster time change π_{t,k}, the inter-cluster relevance η_{k,l}, and the mixing ratio β. It outputs to the display device 40 only the results that the user has requested via the input device 10.
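The two example end conditions can be combined in a check like the one below. This is a sketch under stated assumptions. The function name `end_condition_met` is hypothetical. The parameters are assumed to be flattened into a list of floats. The threshold test is assumed to apply to every parameter, since the description does not say which parameters are compared.

```python
def end_condition_met(iteration, max_iterations, prev_params, curr_params, threshold):
    """Return True if either example end condition holds.

    1. The iteration count has reached max_iterations.
    2. Every parameter changed by at most `threshold` (absolute difference)
       since the previous sampling pass. No previous pass means no check yet.
    """
    if iteration >= max_iterations:
        return True
    if prev_params is None:
        return False
    return all(abs(c - p) <= threshold for p, c in zip(prev_params, curr_params))
```

Because sampled values keep fluctuating even after convergence, the fixed iteration count is usually the more predictable of the two conditions.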

(Example performance evaluation of the dIRM)
The performance of the dIRM of this embodiment was checked by simulation, with the following results.

Artificial data with a known correct clustering were created and used for a quantitative evaluation of the dIRM. The artificial data had T = 5 time steps, N = 16 objects, and K = 4 clusters. Objects i = 1 to 4 were set to belong almost always to cluster 1, objects i = 5 to 8 to cluster 2, objects i = 9 to 12 to cluster 3, and objects i = 13 to 16 to cluster 4. However, some objects were made to move between clusters over time. Two values of the inter-cluster relevance η_{k,l} were used: η = 0.9 between positively related clusters and η = 0.1 between negatively related clusters. Whether each η_{k,l} was positive or negative was set in advance, and the relational data x_{t,i,j} at each time were generated according to the given η.
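A generator in the spirit of this setup might look like the sketch below. The source does not say which cluster pairs were positive or which objects moved. The sketch therefore assumes a block-diagonal η, with 0.9 within a cluster and 0.1 between clusters. It also takes the transitions as a caller-supplied `moves` mapping. Both choices, and the name `make_synthetic`, are hypothetical. Indices are 0-based.

```python
import random


def make_synthetic(T=5, N=16, K=4, eta_pos=0.9, eta_neg=0.1, moves=None, seed=0):
    """Generate (z, eta, x) resembling the evaluation data.

    Assumptions not given in the source:
      - eta is eta_pos on the diagonal (positive pairs) and eta_neg elsewhere;
      - moves maps (t, i) -> cluster for objects that transition at time t.
    x[t][i][j] is drawn as Bernoulli(eta[z[t][i]][z[t][j]]).
    """
    rng = random.Random(seed)
    base = [i // (N // K) for i in range(N)]  # objects 0-3 -> cluster 0, 4-7 -> cluster 1, ...
    z = [list(base) for _ in range(T)]
    for (t, i), k in (moves or {}).items():
        z[t][i] = k
    eta = [[eta_pos if k == l else eta_neg for l in range(K)] for k in range(K)]
    x = [[[1 if rng.random() < eta[z[t][i]][z[t][j]] else 0 for j in range(N)]
          for i in range(N)] for t in range(T)]
    return z, eta, x
```

For example, `make_synthetic(moves={(3, 0): 1, (4, 0): 1})` moves object 0 from cluster 0 to cluster 1 for the last two time steps.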

In the simulation, objects were clustered with both the dIRM and the known IRM, using the relational data time series X generated by the above procedure. For the IRM, clustering was performed with σ = 0.5 in equation (13). The IRM's clustering result z_i for object i was then taken as the clustering z_{t,i} of object i at every time t.

The simulation used the Rand index, a standard evaluation measure for clustering, for quantitative evaluation. Given two clusterings of the same data, the Rand index measures their similarity. It is the fraction of object pairs on which the two clusterings agree, meaning both place the pair in the same cluster or both place it in different clusters. Its maximum value is 1, which indicates that the two clusterings agree completely. In the simulation, the Rand index was computed between the true clustering used to generate the observed data and the clustering obtained from each model's estimates.
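The pair-counting definition translates directly into code. The sketch below is the textbook Rand index, and the name `rand_index` is ours. The section does not state how the (t, i) assignments were pooled across time. One natural assumption is to flatten z_{t,i} over all t and i into a single label list before comparing. Under that assumption, the IRM's z_i would be repeated at each time step.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree (same/same or different/different)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)
```

Cluster labels themselves do not matter, only co-membership. For example, `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, while `rand_index([0, 0, 1, 1], [0, 1, 0, 1])` is 1/3.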

FIG. 7(a) and FIG. 7(b) show the computed Rand index values: FIG. 7(a) for the dIRM of this embodiment and FIG. 7(b) for the known IRM. In the comparison, the dIRM's Rand index became approximately 1 as the number of iterations increased. The IRM's Rand index never reached 1, however many iterations were run. These results confirm that the dIRM models time-varying relational data better than the IRM.

As described above, the clustering apparatus 80 of this embodiment extends the known IRM in two ways. It introduces the parameter π, which represents the temporal change of clusters, and it modifies the estimation of the belonging cluster z so that time is taken into account. Together, these changes make it possible to estimate how clusters change over time.

This embodiment is not limited to the above and can be modified without departing from its gist. For example, the operations of steps S104 to S107 in FIG. 1 may be performed in any order. Likewise, the units in FIG. 2 may run in any order: the belonging cluster (z) estimation unit 23, the time change information (π) estimation unit 24, the inter-cluster relevance (η) estimation unit 25, and the mixing ratio (β) estimation unit 26.
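The repeat-until-done structure of steps S104 to S107, with an arbitrary update order, can be sketched as a small driver loop. This is a hypothetical skeleton, not the patent's code. The function name `run_sampler`, the dict-based `state`, and the callable interface are all assumptions. Any permutation of the four updates may be passed in `updates`.

```python
def run_sampler(state, updates, end_condition):
    """Apply each update in `updates` once per sweep until end_condition(state, iteration) is True.

    state         : mutable container holding z, pi, eta, beta, ... (hypothetical layout)
    updates       : callables, e.g. the z, pi, eta and beta updates, in any chosen order
    end_condition : callable deciding when to stop, e.g. an iteration limit
    """
    iteration = 0
    while not end_condition(state, iteration):
        for update in updates:
            update(state)
        iteration += 1
    return state
```

For example, passing four updates in the order β, η, π, z with `lambda s, it: it >= 2` runs two full sweeps in that order.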

In this embodiment, the processing of units 23 to 26 of the clustering apparatus 80 (see FIG. 2) may be implemented by a program installed on the computer that realizes the clustering apparatus 80. The program can be provided via a communication line or distributed on a computer-readable recording medium such as a CD-ROM.

DESCRIPTION OF SYMBOLS
10 Input device
20 Processing unit
21 Initial setting unit
22 Estimation calculation unit
23 Belonging cluster (z) estimation unit
24 Time change information (π) estimation unit
25 Inter-cluster relevance (η) estimation unit
26 Mixing ratio (β) estimation unit
28 End determination unit
30 Storage unit
40 Display device
80 Clustering apparatus
90 Clustering system

Claims

A clustering device that performs clustering of objects using relational data representing the presence or absence of relationships between a plurality of objects,
The clustering apparatus includes:
a storage unit that stores: a mixing ratio β calculated in an Infinite Relational Model (IRM), which is one technique for clustering the relational data; an inter-cluster relevance η_{k,l} indicating the strength of the relationship between cluster k and cluster l; relational data x_{t,k,l} obtained by observing, at predetermined time intervals, relational data indicating whether a relationship exists between cluster k and cluster l; time change information π_{t,k} indicating which cluster an object that belonged to cluster k at time t−1 is likely to belong to at the next time t; a belonging cluster z_{t,i} indicating the cluster to which object i belongs at time t; hyperparameters α₀ and κ; and the number of clusters K;
a time change information estimation unit that acquires the belonging cluster z_{t,i}, the time change information π_{t,k}, the mixing ratio β, the hyperparameters α₀ and κ, and the number of clusters K from the storage unit; letting m_{t,k,l} be the number of objects with z_{t−1,i} = k and z_{t,i} = l in a predetermined period t = 1 to T, samples the time change information π_{t,k} from the Dirichlet distribution $\mathrm{Dirichlet}(\alpha_0\beta_1 + m_{t,k,1}, \ldots, \alpha_0\beta_k + m_{t,k,k} + \kappa, \ldots, \alpha_0\beta_K + m_{t,k,K}, \alpha_0(1 - \sum_{k'=1}^{K}\beta_{k'}))$; and updates the time change information π_{t,k} stored in the storage unit with the sampled π_{t,k};
a belonging cluster estimation unit that acquires, from the storage unit, the belonging cluster z_{t,i}, the time change information π_{t,k}, the inter-cluster relevance η_{k,l}, the mixing ratio β, the relational data x_{t,k,l} for a predetermined period t = 1 to T, and the number of clusters K; applies the acquired mixing ratio β, time change information π_{t,k}, inter-cluster relevance η_{k,l}, relational data x_{t,k,l}, and number of clusters K to equations (1), (2), (3), and (4), sampling u_{t,j} from equation (1) and, with the message variable p_{t,i,k} defined by equation (3), calculating p_{t,i,k} with equation (2) in order from t = 1 to t = T; then, from the calculated message variables p_{t,i,k}, calculates the belonging cluster z_{t,i} with equation (4) in order from t = T to t = 1, for the cases in which P(z_{t,i} = k | z_{t−1,i} = l) is not 0; and updates the belonging cluster z_{t,i} stored in the storage unit with the calculated z_{t,i};
an inter-cluster relevance estimation unit that calculates the inter-cluster relevance η_{k,l} using the infinite relational model and stores it in the storage unit;
a mixing ratio estimation unit that calculates the mixing ratio β using the infinite relational model and stores it in the storage unit; and
an end determination unit that repeats the process of executing the operations of the mixing ratio estimation unit, the inter-cluster relevance estimation unit, the time change information estimation unit, and the belonging cluster estimation unit in an arbitrary order until a predetermined end condition is satisfied,
wherein the clustering apparatus comprises the above units.

The clustering apparatus according to claim 1, wherein
the storage unit further stores hyperparameters γ, ξ, and Ψ;
the inter-cluster relevance estimation unit acquires, from the storage unit, the belonging cluster z_{t,i}, the inter-cluster relevance η_{k,l}, the hyperparameters ξ and Ψ, the number of clusters K, and the relational data x_{t,k,l} for a predetermined period t = 1 to T; letting N_{k,l} be the number of triples (t, i, j) with z_{t,i} = k and z_{t,j} = l, and n_{k,l} the number of those for which the relational data indicate that a relationship exists, samples the inter-cluster relevance η_{k,l} from the beta distribution $\mathrm{Beta}(\xi + n_{k,l}, \Psi + N_{k,l} - n_{k,l})$; and updates the inter-cluster relevance η_{k,l} stored in the storage unit with the sampled η_{k,l}; and
the mixing ratio estimation unit acquires the belonging cluster z_{t,i}, the mixing ratio β, and the hyperparameters γ, α₀, and κ from the storage unit; applies the acquired belonging cluster z_{t,i}, mixing ratio β, and hyperparameters α₀ and κ to equations (5) and (6) to calculate the auxiliary variables R_{t,k,l} and O_{t,k}, respectively; applies R_{t,k,l} and O_{t,k} to equation (7) to calculate the auxiliary variable ^R_{t,k,l}; calculates the mixing ratio β from the Dirichlet distribution $\mathrm{Dirichlet}(\sum_{t,k}\hat{R}_{t,k,1}, \sum_{t,k}\hat{R}_{t,k,2}, \ldots, \sum_{t,k}\hat{R}_{t,k,K}, \gamma)$; and updates the mixing ratio β stored in the storage unit with the calculated β.

The clustering apparatus according to claim 1, wherein the clustering apparatus is connected to a display device that displays calculation results, and, when the end condition is satisfied, the end determination unit outputs any one or any combination of the mixing ratio β, the inter-cluster relevance η_{k,l}, the time change information π_{t,k}, and the belonging cluster z_{t,i} to the display device.

A clustering method used in a clustering apparatus that performs clustering of objects using relational data representing the presence or absence of a relationship between a plurality of objects,
The clustering apparatus includes:
a storage unit that stores: a mixing ratio β calculated in an Infinite Relational Model (IRM), which is one technique for clustering the relational data; an inter-cluster relevance η_{k,l} indicating the strength of the relationship between cluster k and cluster l; relational data x_{t,k,l} obtained by observing, at predetermined time intervals, relational data indicating whether a relationship exists between cluster k and cluster l; time change information π_{t,k} indicating which cluster an object that belonged to cluster k at time t−1 is likely to belong to at the next time t; a belonging cluster z_{t,i} indicating the cluster to which object i belongs at time t; hyperparameters α₀ and κ; and the number of clusters K; and a processing unit,
wherein the processing unit performs:
a time change information estimation step of acquiring the belonging cluster z_{t,i}, the time change information π_{t,k}, the mixing ratio β, the hyperparameters α₀ and κ, and the number of clusters K from the storage unit; letting m_{t,k,l} be the number of objects with z_{t−1,i} = k and z_{t,i} = l in a predetermined period t = 1 to T, sampling the time change information π_{t,k} from the Dirichlet distribution $\mathrm{Dirichlet}(\alpha_0\beta_1 + m_{t,k,1}, \ldots, \alpha_0\beta_k + m_{t,k,k} + \kappa, \ldots, \alpha_0\beta_K + m_{t,k,K}, \alpha_0(1 - \sum_{k'=1}^{K}\beta_{k'}))$; and updating the time change information π_{t,k} stored in the storage unit with the sampled π_{t,k};
a belonging cluster estimation step of acquiring, from the storage unit, the belonging cluster z_{t,i}, the time change information π_{t,k}, the inter-cluster relevance η_{k,l}, the mixing ratio β, the relational data x_{t,k,l} for a predetermined period t = 1 to T, and the number of clusters K; applying the acquired mixing ratio β, time change information π_{t,k}, inter-cluster relevance η_{k,l}, relational data x_{t,k,l}, and number of clusters K to equations (1), (2), (3), and (4), sampling u_{t,j} from equation (1) and, with the message variable p_{t,i,k} defined by equation (3), calculating p_{t,i,k} with equation (2) in order from t = 1 to t = T; then, from the calculated message variables p_{t,i,k}, calculating the belonging cluster z_{t,i} with equation (4) in order from t = T to t = 1, for the cases in which P(z_{t,i} = k | z_{t−1,i} = l) is not 0; and updating the belonging cluster z_{t,i} stored in the storage unit with the calculated z_{t,i};
an inter-cluster relevance estimation step of calculating the inter-cluster relevance η_{k,l} using the infinite relational model and storing it in the storage unit;
a mixing ratio estimation step of calculating the mixing ratio β using the infinite relational model and storing it in the storage unit; and
an end determination step of repeating the process of executing the operations of the mixing ratio estimation step, the inter-cluster relevance estimation step, the time change information estimation step, and the belonging cluster estimation step in an arbitrary order until a predetermined end condition is satisfied,
the clustering method being characterized in that the processing unit performs the above steps.

The clustering method according to claim 4, wherein
the storage unit further stores hyperparameters γ, ξ, and Ψ, and
the processing unit performs the following:
in the inter-cluster relevance estimation step, it acquires, from the storage unit, the belonging cluster z_{t,i}, the inter-cluster relevance η_{k,l}, the hyperparameters ξ and Ψ, the number of clusters K, and the relational data x_{t,k,l} for a predetermined period t = 1 to T; letting N_{k,l} be the number of triples (t, i, j) with z_{t,i} = k and z_{t,j} = l, and n_{k,l} the number of those for which the relational data indicate that a relationship exists, samples the inter-cluster relevance η_{k,l} from the beta distribution $\mathrm{Beta}(\xi + n_{k,l}, \Psi + N_{k,l} - n_{k,l})$; and updates the inter-cluster relevance η_{k,l} stored in the storage unit with the sampled η_{k,l}; and
in the mixing ratio estimation step, it acquires the belonging cluster z_{t,i}, the mixing ratio β, and the hyperparameters γ, α₀, and κ from the storage unit; applies the acquired belonging cluster z_{t,i}, mixing ratio β, and hyperparameters α₀ and κ to equations (5) and (6) to calculate the auxiliary variables R_{t,k,l} and O_{t,k}, respectively; applies R_{t,k,l} and O_{t,k} to equation (7) to calculate the auxiliary variable ^R_{t,k,l}; calculates the mixing ratio β from the Dirichlet distribution $\mathrm{Dirichlet}(\sum_{t,k}\hat{R}_{t,k,1}, \sum_{t,k}\hat{R}_{t,k,2}, \ldots, \sum_{t,k}\hat{R}_{t,k,K}, \gamma)$; and updates the mixing ratio β stored in the storage unit with the calculated β.

The clustering method according to claim 4, wherein the clustering apparatus is connected to a display device that displays calculation results, and, when the end condition is satisfied, the processing unit outputs any one or any combination of the mixing ratio β, the inter-cluster relevance η_{k,l}, the time change information π_{t,k}, and the belonging cluster z_{t,i} to the display device.

A program for causing a computer serving as a clustering apparatus to execute the clustering method according to any one of claims 4 to 6.