JP6343591B2

JP6343591B2 - Submatrix region extraction apparatus, method, and program

Info

Publication number: JP6343591B2
Application number: JP2015124687A
Authority: JP
Inventors: 勝彦石黒; 允裕中野; 上田　修功; 昭悟木村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-06-22
Filing date: 2015-06-22
Publication date: 2018-06-13
Anticipated expiration: 2035-06-22
Also published as: JP2017010250A

Description

The present invention relates to a submatrix region extraction apparatus, method, and program, and more particularly to a submatrix region extraction apparatus, method, and program for extracting characteristic submatrices.

Much of the data handled in practice can be represented as two-dimensional tables, and such tables can be expressed directly in matrix form. For this reason, many statistical machine learning methods for matrix data have been proposed. This document addresses one analysis task in particular, submatrix extraction, which extracts from the whole matrix only those parts that have distinctive features.

A submatrix is the Cartesian product of a subset of the rows and a subset of the columns of the given matrix data, that is, a small rectangular region within the matrix (once the indices are permuted appropriately). The purpose of submatrix extraction is not to classify the entire data set into patterns but to extract a small number of submatrices that have distinctive observed values. Because it extracts only the regions of observed values that appear potentially interesting, and does so as easily recognizable rectangles, submatrix extraction is well suited to the task of extracting only the distinctive patterns in the data. A number of methods have been devised, beginning with the Plaid method (see Non-Patent Document 1; see also Non-Patent Documents 2 and 3). FIG. 7 shows an example of a conventional extraction method.
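The definition above can be illustrated with a short sketch. It assumes NumPy and uses a made-up 4 x 5 matrix; the names `X`, `rows`, `cols`, and `sub` are hypothetical and do not come from the patent.

```python
import numpy as np

# Hypothetical 4 x 5 observation matrix (values are illustrative only).
X = np.arange(20).reshape(4, 5)

# A submatrix is defined by a subset of rows and a subset of columns.
rows = [0, 2]       # selected row indices
cols = [1, 3, 4]    # selected column indices

# np.ix_ builds the Cartesian product of the two index sets, so the
# result contains every (row, column) pair from the two subsets.
sub = X[np.ix_(rows, cols)]
```

The selected rows and columns need not be adjacent in the original matrix; they form a contiguous rectangle only after the indices are permuted.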

[Non-Patent Document 1]: Lazzeroni and Owen, “Plaid Models for Gene Expression Data”, Statistica Sinica, vol. 12, pp. 61-86, 2002.

[Non-Patent Document 2]: Caldas and Kaski, “Bayesian Biclustering with the Plaid Model”, in Proceedings of the IEEE International Workshop on Machine Learning and Signal Processing (MLSP), 2008.

[Non-Patent Document 3]: Shabalin et al., “Finding Large Average Submatrices in High Dimensional Data”, The Annals of Applied Statistics, vol. 3, issue 3, pp. 995-1012, 2009.

However, the methods typified by Non-Patent Documents 1 and 2 have the following drawback: the number of distinctive submatrix regions presumed to exist in the matrix data must be decided before the analysis. The original reason for applying a submatrix region extraction method is that the nature of the given matrix data is unknown, yet the number of submatrix regions must be specified in advance, which is a contradiction. Non-Patent Document 3 does not require the number of submatrix regions itself to be specified, but it has a similar problem in that a threshold parameter for deciding what counts as a submatrix must be set in advance.

Moreover, the number of submatrix regions strongly affects the accuracy of the region extraction task. Because it is also generally difficult to evaluate the optimal number of submatrix regions after the fact, this problem has not been addressed.

The present invention has been made to solve the above problems, and its object is to provide a submatrix region extraction apparatus, method, and program capable of extracting an optimal number of submatrix regions.

To achieve the above object, a submatrix region extraction apparatus according to a first invention extracts characteristic submatrix regions from an observation matrix consisting of observed values for the relationship between each pair of an object in a first domain and an object in a second domain. The apparatus includes an initialization unit, a submatrix region allocation estimation unit, and an iteration determination unit. The initialization unit initializes the number of submatrix regions and a submatrix region allocation estimate that indicates, for each object in the first domain and each object in the second domain, whether the object is allocated to each of the submatrix regions. For each object in the first domain and each object in the second domain, and for each of the existing submatrix regions, the submatrix region allocation estimation unit estimates whether the object should belong to the submatrix region, based on the observation matrix, the submatrix region allocation estimate, and a submatrix region hyperparameter relating to the allocation of submatrix regions to objects, and allocates the submatrix region to the object when it estimates that the object should belong to it. The unit also estimates, based on the observation matrix and the submatrix region allocation estimate, whether a new submatrix region should be generated to represent the object; when it estimates that one should be, it generates the new submatrix region, allocates it to the object, and updates the number of submatrix regions. In addition, the unit deletes any submatrix region for which the number of allocated first-domain objects or the number of allocated second-domain objects is equal to or less than a predetermined value, and updates the number of submatrix regions. The iteration determination unit repeats the estimation and allocation by the submatrix region allocation estimation unit until a predetermined iteration end condition is satisfied.

In the submatrix region extraction apparatus according to the first invention, the submatrix region allocation estimation unit may estimate whether an object should belong to a submatrix region by calculating, for each object in the first domain and each object in the second domain and for each of the existing submatrix regions, the possibility that the object belongs or does not belong to the submatrix region. This calculation is based on a prior fitness and a data fitness. The prior fitness is estimated from the submatrix region allocation estimate and represents the degree to which the object is, or is not, allocated to the submatrix region. The data fitness is estimated from the observation matrix, the submatrix region allocation estimate, and the submatrix region hyperparameter, and represents how plausible it is that the object is, or is not, allocated to the submatrix region.

In the submatrix region extraction apparatus according to the first invention, the submatrix region allocation estimation unit may estimate, for each object in the first domain, whether a new submatrix region should be generated to represent the object by calculating the possibility that the number of submatrix regions is the current number plus the new submatrix region. This calculation is based on four quantities: a prior fitness for the required number of submatrix regions, estimated from the submatrix region allocation estimate; a prior fitness representing the degree to which objects in the second domain are allocated to the newly generated submatrix region; a prior fitness for the observation parameters of the observed values within the new submatrix region; and a data fitness representing the degree to which increasing the number of submatrix regions allows the observation matrix to be explained better. The unit may make the same estimate for each object in the second domain, in which case the second prior fitness represents the degree to which objects in the first domain are allocated to the newly generated submatrix region.

The submatrix region extraction apparatus according to the first invention may further include a submatrix region hyperparameter estimation unit that estimates the submatrix region hyperparameter based on the submatrix region allocation estimate. In that case, the initialization unit also initializes the submatrix region hyperparameter, and the iteration determination unit repeats the estimation and allocation by the submatrix region allocation estimation unit and the estimation by the submatrix region hyperparameter estimation unit until the predetermined iteration end condition is satisfied.

A program according to a second invention is a program for causing a computer to function as each unit of the submatrix region extraction apparatus according to the first invention.

According to the submatrix region extraction apparatus, method, and program of the present invention, the following steps are repeated until a predetermined iteration end condition is satisfied. For each object in the first domain and each object in the second domain, and for each submatrix region, whether the object should belong to the submatrix region is estimated based on the observation matrix, the submatrix region allocation estimate, and the submatrix region hyperparameter, and the submatrix region is allocated to the object when it is estimated that the object should belong to it. Whether a new submatrix region should be generated to represent the object is estimated based on the observation matrix and the submatrix region allocation estimate; when one should be, it is generated and allocated to the object, and the number of submatrix regions is updated. Any submatrix region for which the number of allocated first-domain objects or second-domain objects is equal to or less than a predetermined value is deleted. As a result, an optimal number of characteristic submatrix regions can be extracted.

FIG. 1 is an abstract diagram illustrating an example of automatically optimizing the number of submatrix regions.

FIG. 2 is a block diagram showing the configuration of the submatrix region extraction apparatus according to the embodiment of the present invention.

FIG. 3 is a flowchart showing the submatrix region extraction processing routine in the submatrix region extraction apparatus according to the embodiment of the present invention.

FIG. 4 is a flowchart showing the estimation and allocation processing routine in the submatrix region extraction apparatus according to the embodiment of the present invention.

FIG. 5 is a flowchart showing the estimation and allocation processing routine in the submatrix region extraction apparatus according to the embodiment of the present invention.

FIG. 6 is a diagram showing an example visualization of regions that were actually extracted.

FIG. 7 is an abstract diagram showing an example of a conventional extraction method.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

<Overview of the Embodiment of the Present Invention>

First, an overview of the embodiment of the present invention is given. The submatrix region extraction apparatus according to the embodiment extracts characteristic submatrix regions from an observation matrix consisting of observed values for the relationship between each pair of an object in a first domain and an object in a second domain. The apparatus solves the problem described above by using an automatic submatrix region extraction method that can automatically optimize the number of submatrix regions to be extracted. As shown in FIG. 1, the method proposed in this embodiment automatically optimizes the number of submatrix regions so that it is best suited to capturing the properties of the given matrix data. This makes it possible to avoid problems associated with presetting the number of submatrix regions, such as degraded accuracy and the need for a criterion for choosing that number.

In this embodiment, the algorithm is assumed to be described completely by a probabilistic generative model. Taking advantage of this, the other parameters that would otherwise be preset (such as the observation parameter estimate and the observation hyperparameter estimate) are also optimized so that they fit the data automatically. As a result, highly accurate submatrix region extraction can be carried out almost fully automatically even on unknown data, without the effort of searching for parameters. The algorithm used in this embodiment is applicable to matrix data of various forms, for example continuous numerical values, discrete numerical values, or symbols, and the concrete configuration of each unit may change depending on the data format. In particular, the implementation of the observation parameter estimate and the observation hyperparameter estimate, and of the observation parameter estimation unit and observation hyperparameter estimation unit that estimate them, varies greatly with the configuration. The configuration of these parts is therefore not an essential element of the present invention. For how to construct these estimates and units and for the actual calculation algorithms, see Non-Patent Documents 1 and 2 and Non-Patent Document 4, among others.

[Non-Patent Document 4]: Bishop, “Pattern Recognition and Machine Learning”, Springer Japan, 2007.

<Configuration of the Submatrix Region Extraction Apparatus According to the Embodiment of the Present Invention>

Next, the configuration of the submatrix region extraction apparatus according to the embodiment of the present invention is described. As shown in FIG. 2, the submatrix region extraction apparatus 100 can be implemented as a computer that includes a CPU, a RAM, and a ROM storing a program for executing the submatrix region extraction processing routine described later, together with various data. Functionally, as shown in FIG. 2, the submatrix region extraction apparatus 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 receives the observation matrix X. The observation matrix X is normally a matrix with N1 rows and N2 columns, and its elements are generally assumed to be real-valued. The row direction is called the first domain and is indexed by i = 1, ..., N1. The column direction is called the second domain and is indexed by j = 1, ..., N2. The structure of the data stored in the input data storage unit 28 depends on the user's purpose and task. It therefore holds at least the data described above but is not limited to it, and various data can be stored according to the purpose and task.

The calculation unit 20 includes an initialization unit 26, an input data storage unit 28, a variable estimation unit 30, and a variable storage unit 42.

The initialization unit 26 initializes the number of submatrix regions; the submatrix region allocation estimate Z, which indicates for each object in the first domain and each object in the second domain whether the object is allocated to each of the submatrix regions; the submatrix region hyperparameter α; and the observation hyperparameter β. It stores them in the input data storage unit 28 as, respectively, the initial value K(0) of the estimated number of submatrix regions, the initial value Z(0) of the submatrix region allocation estimate, the initial value α(0) of the submatrix region hyperparameter, and the initial value β(0) of the observation hyperparameter. The initial value K(0) is the number of submatrix regions assumed at initialization. The number of submatrix regions K is optimized through the estimation performed by the submatrix region allocation estimation unit 32 described later.
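The initialization step might be sketched as follows. This is only an illustration: the source does not prescribe concrete initial values, so the function name `initialize`, the random binary start for Z(0), and the scalar defaults for α(0) and β(0) are all assumptions (in the source, α and β are parameter sets whose form depends on the chosen model).

```python
import numpy as np

def initialize(n1, n2, k0=1, alpha0=1.0, beta0=1.0, seed=0):
    """Return illustrative initial values K(0), Z(0), alpha(0), beta(0).

    n1, n2: numbers of first-domain and second-domain objects.
    k0, alpha0, beta0, seed: hypothetical defaults, not from the source.
    """
    rng = np.random.default_rng(seed)
    # Z(1, i, k) and Z(2, j, k) as binary membership indicators.
    z1 = rng.integers(0, 2, size=(n1, k0))
    z2 = rng.integers(0, 2, size=(n2, k0))
    return {"K": k0, "Z1": z1, "Z2": z2, "alpha": alpha0, "beta": beta0}

state = initialize(n1=3, n2=4, k0=2)
```

Because K is re-estimated during the iterations, the choice of `k0` here is only a starting point.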

Here, the submatrix region allocation estimate Z is described in detail. In this embodiment, the row and column indices of the matrix each represent objects. For example, suppose that in the observation matrix X to be analyzed the rows correspond to customer IDs and the columns to purchase-quantity data for products. Then each row (first domain) index i corresponds to an individual customer and each column (second domain) index j to an individual product. Each submatrix represents a subset of customers and a subset of products, and thus realizes simultaneous clustering of row objects and column objects, such as a “specific group of customers” who tend to buy a “specific group of products”. Submatrices are indexed by k, and the total number of them is expressed by the estimate K of the number of submatrix regions. The result of this clustering is represented by the submatrix region allocation estimate Z, which consists of an estimate Z1 for the N1 objects in the first domain and an estimate Z2 for the N2 objects in the second domain.

Z1 is expressed as

Z1 = {Z(1,1), ..., Z(1,N1)}

and Z2 is expressed as

Z2 = {Z(2,1), ..., Z(2,N2)}.

Each estimate Z(1,i) and Z(2,j) holds, for each of the K submatrix regions, either a binary variable indicating whether the object is included in the k-th submatrix region or a real number between 0 and 1. That is,

Z(1,i) = {Z(1,i,1), Z(1,i,2), ..., Z(1,i,k), ..., Z(1,i,K)},

Z(2,j) = {Z(2,j,1), Z(2,j,2), ..., Z(2,j,k), ..., Z(2,j,K)},

where Z(1,i,k) = 1 means that the i-th object of the first domain belongs to the k-th submatrix region and Z(1,i,k) = 0 means that it does not. When real values are used, intermediate values express the degree of membership. The same applies to Z(2,j,k). Note that in this embodiment some mathematical model must be assumed for the submatrix region allocation estimate Z.
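To make the binary representation concrete, the sketch below stores Z1 and Z2 as N1 x K and N2 x K arrays and derives the cells covered by one region. The array layout, the toy values, and the helper `region_mask` are assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical example: N1 = 4 customers, N2 = 3 products, K = 2 regions.
# Row i of Z1 is Z(1, i); row j of Z2 is Z(2, j).
Z1 = np.array([[1, 0],
               [1, 1],
               [0, 0],
               [0, 1]])
Z2 = np.array([[1, 0],
               [0, 1],
               [1, 1]])

def region_mask(Z1, Z2, k):
    """Boolean N1 x N2 mask of the cells covered by region k."""
    # Cell (i, j) is in region k when both Z(1,i,k) and Z(2,j,k) are 1.
    return np.outer(Z1[:, k], Z2[:, k]).astype(bool)

mask0 = region_mask(Z1, Z2, 0)
mask1 = region_mask(Z1, Z2, 1)
```

In this toy data the second customer belongs to both regions and the third to none, which the per-region indicator representation permits.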

Next, the hyperparameters initialized by the initialization unit 26 are described. The initial value α(0) of the submatrix region hyperparameter is the initial value of the parameters of the mathematical model used to estimate the submatrix regions. The set of parameters of the model adopted in this embodiment is referred to as the submatrix region hyperparameter α. The main purpose of the model is to estimate Z and K when the observation matrix X is given. The initial value β(0) of the observation hyperparameter is the initial value of the parameters of the mathematical model that expresses the characteristics of the observed values in the submatrix regions. The form of the observation hyperparameter β varies depending on the purpose and the nature of the data.

The input data storage unit 28 stores the initial value K(0) of the number of submatrix regions and the initial value α(0) of the submatrix region hyperparameter, both initialized by the initialization unit 26. It also stores the observation matrix X received from the input unit 10 and an end condition constant. The end condition constant is used by the iteration determination unit 40, described later, to decide whether the end condition is met. Because the estimation is normally carried out by iterative calculation, the end condition constant is set to, for example, a maximum number of iterations or a threshold on the change in an evaluation value.
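The two kinds of end condition constant mentioned above could be checked as in the following sketch. The function name `should_stop`, the default values, and the idea of keeping a list of per-iteration evaluation values are assumptions; the source does not specify which evaluation value is used.

```python
def should_stop(iteration, scores, max_iters=100, tol=1e-4):
    """Return True when the iterative estimation should end.

    iteration: number of completed iterations.
    scores: evaluation values recorded so far, one per iteration.
    max_iters, tol: hypothetical end condition constants.
    """
    # Condition 1: the maximum number of iterations has been reached.
    if iteration >= max_iters:
        return True
    # Condition 2: the evaluation value changed by less than the threshold.
    if len(scores) >= 2 and abs(scores[-1] - scores[-2]) < tol:
        return True
    return False
```

Either condition alone is sufficient to stop; an implementation might use only one of them.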

The variable storage unit 42 stores the number of submatrix regions K, the submatrix region allocation estimate Z, the observation parameter estimate θ, and the observation hyperparameter estimate β estimated by the variable estimation unit 30 described later.

The variable estimation unit 30 includes a submatrix region allocation estimation unit 32, a submatrix region hyperparameter estimation unit 34, an observation parameter estimation unit 36, an observation hyperparameter estimation unit 38, and an iteration determination unit 40.

The submatrix region allocation estimation unit 32 performs the first to third processes described below. The first process only estimates, for each of the existing K submatrices, whether the object of interest should belong to it; in this respect it is the same as existing methods. The second process decides whether to generate a new submatrix region, that is, it increases the number of submatrices. The third process deletes unused submatrix regions, that is, it decreases the number of submatrices. The most distinctive feature of this embodiment is that implementing the second and third processes appropriately makes it possible to determine the optimal number of submatrix regions K automatically to suit the matrix data.
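The third process, deleting regions to which too few objects are allocated, can be sketched as below. The sketch assumes binary Z1 and Z2 stored as N1 x K and N2 x K arrays and follows the deletion rule stated earlier in the document (a region is deleted when its first-domain count or its second-domain count is at or below a predetermined value); the name `prune_regions` and the default `min_count=0` are hypothetical.

```python
import numpy as np

def prune_regions(Z1, Z2, min_count=0):
    """Drop regions whose row count or column count is <= min_count.

    Returns the reduced Z1, the reduced Z2, and the updated K.
    """
    # A region is kept only if both of its counts exceed the threshold.
    keep = (Z1.sum(axis=0) > min_count) & (Z2.sum(axis=0) > min_count)
    return Z1[:, keep], Z2[:, keep], int(keep.sum())

# Toy input: region 0 has rows and columns, region 1 has no rows,
# region 2 has no columns.
Z1 = np.array([[1, 0, 1],
               [1, 0, 0]])
Z2 = np.array([[1, 1, 0],
               [0, 1, 0]])
Z1_new, Z2_new, K_new = prune_regions(Z1, Z2)
```

With the default threshold only region 0 survives, so K drops from 3 to 1.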

In the first process, the submatrix region allocation estimation unit 32 estimates, for each object in the first domain and each object in the second domain and for each of the existing submatrix regions, whether the object should belong to the submatrix region. It does so by calculating the possibility that the object belongs to the submatrix region from a prior fitness and a data fitness. The prior fitness represents the degree to which the object is allocated to the submatrix region and is estimated from the submatrix region allocation estimate Z stored in the input data storage unit 28 or the variable storage unit 42. The data fitness represents how plausible it is that the object is allocated to the submatrix region and is estimated from the observation matrix X stored in the input data storage unit 28 together with the submatrix region allocation estimate Z and the submatrix region hyperparameter α stored in the input data storage unit 28 or the variable storage unit 42. When the unit estimates that the object should belong to the submatrix region, it stores in the variable storage unit 42 the submatrix region allocation estimate Z updated so that the submatrix region is allocated to the object.

The first process of the submatrix region allocation estimation unit 32 is described in detail below. In the first process, the submatrix region allocation is recalculated for all objects in the first domain and the second domain. The order in which the objects are updated is arbitrary. For convenience of explanation, suppose that object i of the first domain has been selected, and that the estimation is performed and the submatrix region allocation estimate Z is updated by the allocation.

In the first process, whether an object is likely to belong to submatrix region k is generally determined by two factors: (a) the membership status of the other objects, that is, the degree to which the object is or is not allocated to the submatrix region as estimated from the submatrix region allocation estimate Z (the degree of membership); and (b) how well the observation matrix X can be explained once it is decided that the object does or does not belong to submatrix region k. Calling the former the prior fitness and the latter the data fitness, the possibility that an object belongs to submatrix region k can be calculated, for example, by equation (1) or equation (2) below. The possibility that it does not belong can be calculated in the same way.

(possibility of belonging to submatrix region k)
= (prior fitness) × (data fitness when the object is assumed to belong) ... (1)

or

(possibility of belonging to submatrix region k)
= (prior fitness) + (data fitness when the object is assumed to belong) ... (2)

In the first process, the submatrix region allocation estimation unit 32 recalculates and updates Z(1,i,k), the allocation of object i to submatrix region k in the allocation estimate Z, based on equation (1) (or equation (2)). Possible calculation methods include: treating Z(1,i,k) as a binary value of 0 or 1 and choosing whichever of the possibility of belonging and the possibility of not belonging is larger; treating Z(1,i,k) as a real number between 0 and 1 and apportioning it in proportion to the two possibilities; and treating Z(1,i,k) as a binary value and selecting it probabilistically according to the apportioned result. The prior fitness and the data fitness may be calculated in any manner. The implementation regarded as statistically optimal for the first process formulates the multiplication-based possibility by Bayesian estimation; a specific example of this formulation is given in the Example. If there is no particular guideline, either or both of the fitness values may be constants. However, if they are not calculated in a mathematically appropriate manner, they will not be compatible with the update of the number of submatrix regions in the second process and the calculation may break down, so the calculation methods of the two processes need to be consistent.
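The three update rules described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the prior fitness and data fitness values are assumed to be supplied as non-negative numbers by whatever model the user designs.

```python
import random

def membership_scores(prior_in, prior_out, data_fit_in, data_fit_out):
    """Product form of equation (1), for belonging and for not belonging."""
    return prior_in * data_fit_in, prior_out * data_fit_out

def update_hard(score_in, score_out):
    """Binary Z(1,i,k): pick the larger of the two possibilities."""
    return 1 if score_in > score_out else 0

def update_soft(score_in, score_out):
    """Real-valued Z(1,i,k) in [0, 1]: apportion by the two possibilities."""
    total = score_in + score_out
    return score_in / total if total > 0 else 0.5

def update_stochastic(score_in, score_out, rng=random):
    """Binary Z(1,i,k): draw 1 with the apportioned probability."""
    return 1 if rng.random() < update_soft(score_in, score_out) else 0
```

The stochastic rule is the one that corresponds to sampling-based Bayesian estimation; the hard and soft rules are deterministic alternatives.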

In the second process, the submatrix region allocation estimation unit 32 estimates, for each object in the first domain, whether new submatrix regions should be generated to represent the object. It does so by calculating the possibility that the number of submatrix regions is K+L, that is, the current number plus the new regions, from four quantities: a prior fitness for the required number K of submatrix regions, estimated from the allocation estimate Z; a prior fitness representing the degree to which objects in the second domain are assigned to the newly generated regions; a prior fitness for the observation parameter θ of the observed values in the new regions; and a data fitness representing how much better the observation matrix X is explained by increasing the number of regions. The second domain is handled in the same way with the roles of the domains exchanged: for each object in the second domain, the unit calculates the possibility that the number of regions is K+L from the prior fitness for the required number K of regions, the prior fitness representing the degree to which objects in the first domain are assigned to the newly generated regions, the prior fitness for the observation parameter θ of the observed values in the new regions, and the data fitness, and thereby estimates whether new submatrix regions should be generated to represent the object.

In the second process, once the first process has finished estimating the membership of an object in the first or second domain in the K existing submatrix regions, the unit calculates whether that object needs new submatrix regions, as described below. This is necessary because, when K is set arbitrarily (for example at initialization), it may be smaller than the number of submatrix regions latent in the given observation matrix X. The calculation is largely the same as the calculation of the possibility of membership in the first process.

Specifically, in the second process, the submatrix region allocation estimation unit 32 estimates whether L new submatrix regions should be generated based on the three elements described below. The first element is the prior fitness for the required number K of submatrix regions, estimated from the membership status of all objects after the first process, that is, from the allocation estimate Z. Here an evaluation value is estimated for deciding whether to increase the number of submatrix regions by 0 or by L. When the number of regions is increased by L, the object i or object j currently being processed is assumed to belong to all L new regions.

Next, as the second element, the second process calculates the degree to which objects of the other domain are assigned to the L newly generated submatrix regions. For example, for L regions newly generated for an object in the first domain, objects in the second domain must also belong to those regions; otherwise the regions would not be rectangles. Therefore, if object i in the first domain is currently being processed, the N2 objects in the second domain are assumed to belong to the new regions according to some criterion (for example, at random), and the prior fitness for that degree of assignment is calculated.

As the third element, the second process sets the observation parameter θ for the L newly generated submatrix regions based on the observation parameter θ stored in the variable storage unit 42, and obtains the prior fitness for the observation parameter θ of the observed values in the new regions.

そして、第２の処理では、以上の三つの要素が定まったときに、「部分行列領域の数をＫ＋Ｌとしたことで観測行列Ｘをどれだけよく説明できるか」を計算する。 In the second process, when the above three elements are determined, “how well the observation matrix X can be explained by setting the number of submatrix regions to K + L” is calculated.

以上の要素を組み合わせて、以下（３）式に基づいて、部分行列領域の数を増やすか否かを推定する。 By combining the above elements, whether or not to increase the number of submatrix regions is estimated based on the following equation (3).

(possibility that the number of submatrix regions is K+L)
= (prior fitness for the required number K of submatrix regions, estimated from the allocation estimate Z)
× (prior fitness representing the degree to which objects of the other domain are assigned to the L newly generated regions)
× (prior fitness for the observation parameter θ of the observed values in the new regions)
× (data fitness representing how much better the observation matrix X is explained by increasing the number of regions) ... (3)

上記（３）式では乗算としているが、加減乗除の使い方等は任意である。また、各適合度の計算はユーザの設計した数学モデルに依存する。なお、Ｌの個数はランダムに決定する。また、後述する実施例では、確率的な意味で最適な実装例を紹介するが、実装方法は実施例における実装例に限定されるものではなく、多様な数学モデルの設計が可能である。 In the above equation (3), multiplication is used, but how to use addition, subtraction, multiplication, and division is arbitrary. In addition, the calculation of each fitness level depends on a mathematical model designed by the user. Note that the number of L is determined randomly. In the embodiments described later, an optimal implementation example is introduced in a probabilistic sense, but the implementation method is not limited to the implementation examples in the embodiment, and various mathematical models can be designed.

そして、第２の処理では、Ｌ個の部分行列領域を生成するべきであると推定された場合には、Ｌ個の部分行列領域を生成し、当該オブジェクトに、生成されたＬ個の部分行列領域を割り当てるように、変数記憶部４２に記憶されている部分行列割当推定値Ｚを更新する。また、部分行列領域の数ＫをＫ＝Ｋ＋Ｌとして変数記憶部４２の部分行列領域の数Ｋを更新する。 Then, in the second process, when it is estimated that L partial matrix regions should be generated, L partial matrix regions are generated, and the generated L partial matrices are generated in the object. The submatrix allocation estimated value Z stored in the variable storage unit 42 is updated so as to allocate the area. Further, the number K of partial matrix regions in the variable storage unit 42 is updated with the number K of partial matrix regions set to K = K + L.
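As an illustration of the second process, the sketch below computes the product form of equation (3) and then appends L new regions to the allocation estimate. The layout is an assumption made for this example only: Z1 and Z2 are lists of 0/1 rows (one row per object, one column per region), the four fitness values are supplied by the user's model, and `z2_new` holds the proposed memberships of the second-domain objects in the new regions.

```python
def possibility_k_plus_l(prior_count, prior_other_domain, prior_theta, data_fit):
    """Product form of equation (3)."""
    return prior_count * prior_other_domain * prior_theta * data_fit

def grow_regions(Z1, Z2, i, z2_new):
    """Append L = len(z2_new[0]) regions for first-domain object i.

    Object i joins every new region; other first-domain objects join none.
    Second-domain object j joins new region l when z2_new[j][l] == 1.
    Z1 and Z2 are modified in place; the new number of regions K is returned.
    """
    L = len(z2_new[0])
    for row_index, row in enumerate(Z1):
        row.extend([1 if row_index == i else 0] * L)
    for row, new_bits in zip(Z2, z2_new):
        row.extend(new_bits)
    return len(Z1[0])
```

Whether to call `grow_regions` is decided by comparing the possibility for K+L with the possibility for leaving K unchanged, either deterministically or probabilistically.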

部分行列領域割当推定部３２は、第３の処理において、上記第１の処理及び第２の処理によって部分行列領域毎に割り当てられた第１ドメインのオブジェクト数又は第２ドメインのオブジェクト数が所定値以下となる部分行列領域を削除し、部分行列領域の数を更新する。 In the third process, the submatrix region allocation estimation unit 32 determines whether the number of objects in the first domain or the number of objects in the second domain allocated for each submatrix region by the first process and the second process is a predetermined value. The following submatrix regions are deleted and the number of submatrix regions is updated.

Specifically, in the third process, the submatrix region allocation estimation unit 32 checks whether the allocation estimate Z contains unnecessary submatrix regions and updates it, as follows. The size of each submatrix region is given by (number of member objects in the first domain) × (number of member objects in the second domain). Because a submatrix region to be extracted is the Cartesian product of a subset of the first domain and a subset of the second domain, a region for which the number of member objects in either domain is at or below a certain value is deleted as unnecessary. For example, the total number of first-domain objects belonging to the k-th region is obtained by summing Z(1,i,k) over all i, and a region whose total is at or below the threshold is deleted. Each time one region is deleted, the number of regions is decreased to K = K−1. Where necessary, the region index k used by the allocation estimate Z, the observation parameter θ, the observation hyperparameter β, and so on is reassigned appropriately. In this embodiment the deletion is performed after the second process, but it is not limited to this and can be performed at any time, for example each time the estimation for one object in the first or second process ends, or each time the estimation for all objects ends.
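The deletion step can be sketched as below. The list-of-rows layout for Z1 and Z2 and the name `min_objects` are assumptions for illustration; the threshold itself is the "certain value" mentioned in the text and is left to the user.

```python
def prune_regions(Z1, Z2, min_objects=0):
    """Drop regions whose member count in either domain is <= min_objects.

    Z1[i][k] and Z2[j][k] are 0/1 memberships. Returns the pruned copies
    and the list of surviving original indices, so that per-region
    parameters (theta, beta) can be re-indexed in the same way.
    """
    K = len(Z1[0]) if Z1 else 0
    keep = [
        k for k in range(K)
        if sum(row[k] for row in Z1) > min_objects
        and sum(row[k] for row in Z2) > min_objects
    ]
    Z1_new = [[row[k] for k in keep] for row in Z1]
    Z2_new = [[row[k] for k in keep] for row in Z2]
    return Z1_new, Z2_new, keep
```

The new number of regions is `len(keep)`.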

部分行列領域ハイパーパラメータ推定部３４は、変数記憶部４２に記憶されている部分行列領域割当推定値Ｚに基づいて、部分行列領域の数および部分行列領域の割り当てに関する数学モデルに必要な部分行列領域ハイパーパラメータαを推定し、変数記憶部４２に記憶する。 The submatrix area hyperparameter estimation unit 34 is based on the submatrix area allocation estimated value Z stored in the variable storage unit 42 and is necessary for a mathematical model relating to the number of submatrix areas and the allocation of the submatrix areas. The hyper parameter α is estimated and stored in the variable storage unit 42.

観測パラメータ推定部３６は、入力部１０において受け付けた観測行列Ｘと、変数記憶部４２に記憶されている部分行列領域割当推定値Ｚとに基づいて、部分行列領域の観測値の表現に用いる数学モデルの観測パラメータθを推定し、変数記憶部４２に記憶する。 Based on the observation matrix X received by the input unit 10 and the partial matrix region allocation estimated value Z stored in the variable storage unit 42, the observation parameter estimation unit 36 performs mathematics used to represent the observation values in the partial matrix region. The model observation parameter θ is estimated and stored in the variable storage unit 42.

観測ハイパーパラメータ推定部３８は、観測パラメータ推定部３６で推定された観測パラメータθに基づいて、観測パラメータθの数学モデルに必要な観測ハイパーパラメータβを推定し、変数記憶部４２に記憶する。 The observation hyperparameter estimation unit 38 estimates the observation hyperparameter β necessary for the mathematical model of the observation parameter θ based on the observation parameter θ estimated by the observation parameter estimation unit 36 and stores it in the variable storage unit 42.

なお、部分行列領域ハイパーパラメータ推定部３４、観測パラメータ推定部３６、及び観測ハイパーパラメータ推定部３８による推定を行わずに初期値のままにしてもよい。 Note that the initial values may be left as they are without performing the estimation by the partial matrix region hyperparameter estimation unit 34, the observation parameter estimation unit 36, and the observation hyperparameter estimation unit 38.

The iteration determination unit 40 repeats the estimation and allocation by the submatrix region allocation estimation unit 32 and the estimation by the submatrix region hyperparameter estimation unit 34, the observation parameter estimation unit 36, and the observation hyperparameter estimation unit 38 until a predetermined termination condition is satisfied. The termination condition constant stored in the input data storage unit 28 may be used as the termination condition.

なお、変数推定部３０の各部の構成は上記に限定されるものではなく、ユーザが想定する観測モデルなどによって依存するため、一概に記述することはできない。後述する実施例では、具体的に設計したモデルに合わせた実装例を紹介する。 Note that the configuration of each part of the variable estimation unit 30 is not limited to the above, and depends on the observation model assumed by the user, and therefore cannot be generally described. In an example described later, an implementation example according to a specifically designed model is introduced.

＜本発明の実施の形態に係る部分行列領域抽出装置の作用＞ <Operation of Submatrix Region Extraction Device According to Embodiment of the Present Invention>

次に、本発明の実施の形態に係る部分行列領域抽出装置１００の作用について説明する。入力部１０において観測行列Ｘを受け付けると、観測行列Ｘを入力データ記憶部２８に格納するとともに、部分行列領域抽出装置１００は、図３に示す部分行列領域抽出処理ルーチンを実行する。 Next, the operation of the submatrix region extraction apparatus 100 according to the embodiment of the present invention will be described. When the observation matrix X is received by the input unit 10, the observation matrix X is stored in the input data storage unit 28, and the partial matrix region extraction device 100 executes a partial matrix region extraction processing routine shown in FIG.

まず、ステップＳ１００では、部分行列領域の数と、第１ドメインの各オブジェクト及び第２ドメインの各オブジェクトに対して部分行列領域毎に割り当てられるかを表す部分行列領域割当推定値と、部分行列領域ハイパーパラメータと、観測ハイパーパラメータとを初期化し、入力データ記憶部２８に格納する。 First, in step S100, the number of submatrix regions, a submatrix region allocation estimation value indicating whether each submatrix region is allocated to each object in the first domain and each object in the second domain, and a submatrix region The hyper parameters and the observation hyper parameters are initialized and stored in the input data storage unit 28.

次に、ステップＳ１０２では、上記部分行列領域割当推定部３２の第１〜第３の処理によって、部分行列割当推定値Ｚ及び部分行列領域の数Ｋを推定する。 Next, in step S102, the partial matrix allocation estimated value Z and the number K of partial matrix regions are estimated by the first to third processes of the partial matrix region allocation estimating unit 32.

ステップＳ１０４では、ステップＳ１０２で推定された部分行列割当推定値Ｚに基づいて、部分行列領域ハイパーパラメータαを推定する。 In step S104, the partial matrix region hyperparameter α is estimated based on the partial matrix allocation estimated value Z estimated in step S102.

ステップＳ１０６では、入力部１０において受け付けた観測行列Ｘと、ステップＳ１０２で推定された部分行列割当推定値Ｚとに基づいて、観測パラメータθを推定する。 In step S106, the observation parameter θ is estimated based on the observation matrix X received in the input unit 10 and the partial matrix allocation estimated value Z estimated in step S102.

ステップＳ１０８では、ステップＳ１０６で推定された観測パラメータθに基づいて、観測ハイパーパラメータβを推定する。 In step S108, the observation hyperparameter β is estimated based on the observation parameter θ estimated in step S106.

In step S110, it is determined whether the predetermined termination condition is satisfied. If it is not, the process returns to step S102 and steps S102 to S108 are repeated; if it is, the process proceeds to step S112.

ステップＳ１１２では、ステップＳ１０２で推定された部分行列割当推定値Ｚ及び部分行列領域の数Ｋを出力部５０により出力し処理を終了する。 In step S112, the partial matrix allocation estimated value Z estimated in step S102 and the number K of partial matrix regions are output by the output unit 50, and the process ends.
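The routine of steps S100 to S112 can be summarized by the following skeleton. It is only a sketch: the `state` dictionary and the `steps` mapping of callables are hypothetical containers introduced here, and the termination condition is assumed to be a fixed iteration budget.

```python
def run_extraction(X, state, steps, max_iterations):
    """Repeat steps S102 to S108 until the termination condition holds.

    `state` is a dict holding Z, K, alpha, theta and beta after the
    initialization of step S100. `steps` maps a step name to a callable
    (X, state) -> state supplied by the user's model.
    """
    for _ in range(max_iterations):          # S110: iteration budget
        state = steps["assign"](X, state)    # S102: update Z and K
        state = steps["alpha"](X, state)     # S104: region hyperparameter
        state = steps["theta"](X, state)     # S106: observation parameter
        state = steps["beta"](X, state)      # S108: observation hyperparameter
    return state["Z"], state["K"]            # S112: output
```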

The process of step S102 is realized by the estimation and allocation processing routine shown in FIGS. 4 and 5.

ステップＳ２００では、第１ドメインのオブジェクトｉを、ｉ＝１と初期化する。 In step S200, the object i of the first domain is initialized as i = 1.

ステップＳ２０４では、部分行列領域ｋを選択する。 In step S204, the submatrix region k is selected.

ステップＳ２０６では、第１ドメインのオブジェクトｉに対し、ステップＳ２０４で選択した部分行列領域について、上記（１）式に従って、当該オブジェクトｉが当該部分行列領域ｋに割り当てられる度合いを表す事前適合度と、当該オブジェクトｉが当該部分行列領域ｋに割り当てられる尤もらしさを表すデータ適合度とに基づいて、当該オブジェクトｉが当該部分行列領域ｋに所属する可能性を算出する。 In step S206, with respect to the object i of the first domain, with respect to the submatrix region selected in step S204, according to the above equation (1), a pre-matching degree indicating the degree to which the object i is assigned to the submatrix region k; The possibility that the object i belongs to the submatrix region k is calculated based on the data suitability representing the likelihood that the object i is assigned to the submatrix region k.

In step S208, based on the calculation result of step S206, it is determined whether object i should belong to submatrix region k, and the allocation estimate Z updated according to the determination result is stored in the variable storage unit 42.

In step S210, it is determined whether all submatrix regions have been estimated for object i. If any region has not yet been estimated, the process returns to step S204, selects that region k, and repeats steps S206 to S208; if all regions have been estimated, the process proceeds to step S212.

ステップＳ２１２では、上記（３）式に従って、部分行列領域の数がＫ＋Ｌである可能性を算出する。このとき、Ｌの値をランダムに決定すればよい。 In step S212, the possibility that the number of sub-matrix regions is K + L is calculated according to the above equation (3). At this time, the value of L may be determined at random.

In step S214, based on the calculation result of step S212, it is determined whether L submatrix regions should be generated. If not, the process proceeds to step S218; if so, it proceeds to step S216.

ステップＳ２１６では、Ｌ個の部分行列領域を生成し、当該オブジェクトｉに、生成されたＬ個の部分行列領域の各々を割り当てて、変数記憶部４２に記憶されている部分行列割当推定値Ｚを更新する。また、変数記憶部４２に記憶されている部分行列領域の数Ｋを更新する。 In step S216, L submatrix regions are generated, each of the generated L submatrix regions is assigned to the object i, and the submatrix allocation estimation value Z stored in the variable storage unit 42 is set. Update. In addition, the number K of sub-matrix regions stored in the variable storage unit 42 is updated.

ステップＳ２１８では、第１ドメインの全てのオブジェクトｉについて推定及び割り当てをしたかを判定し、していなければステップＳ２２０へ移行してｉ＝ｉ＋１として、ステップＳ２０４〜Ｓ２１６の処理を繰り返し、全て推定していればステップＳ２２２へ移行する。 In step S218, it is determined whether all objects i in the first domain have been estimated and assigned. If not, the process proceeds to step S220, i = i + 1 is set, and the processes in steps S204 to S216 are repeated to estimate all. If yes, the process proceeds to step S222.

ステップＳ２２２では、第２ドメインのオブジェクトｊを、ｊ＝１と初期化する。 In step S222, the object j of the second domain is initialized as j = 1.

ここで、ステップＳ２２４〜Ｓ２３６については、第２ドメインのオブジェクトｊについて、上記ステップＳ２０４〜Ｓ２１６で説明したオブジェクトｉについて行った処理と同様の処理を行えばよいため説明を省略する。 Steps S224 to S236 are not described here because the same processing as that performed for the object i described in steps S204 to S216 may be performed for the object j in the second domain.

Then, in step S238, it is determined whether all objects j have been estimated and assigned. If not, the process proceeds to step S240, sets j = j + 1, and repeats steps S224 to S236; if all have been estimated, the process proceeds to step S242.

ステップＳ２４２では、部分行列割当推定値Ｚに基づいて、割り当てられた第１ドメインのオブジェクト数又は第２ドメインのオブジェクト数が所定値以下となる部分行列領域を削除し、変数記憶部４２に記憶されている部分行列領域の数を更新し、推定割当処理ルーチンを終了する。 In step S242, based on the submatrix allocation estimated value Z, a submatrix region in which the allocated number of objects in the first domain or the number of objects in the second domain is equal to or smaller than a predetermined value is deleted and stored in the variable storage unit 42. The number of sub-matrix regions that have been updated is updated, and the estimated allocation processing routine ends.

＜実施例＞ <Example>

本発明の実施の形態に係る手法の実験結果について説明する。ここでは、実数行列データが与えられた場合に、確率的に最適な実装が可能な数学モデルの設計と具体的な変数や実装する計算式を示す。 An experimental result of the method according to the embodiment of the present invention will be described. Here, the design of a mathematical model that can be stochastically optimized when real matrix data is given, specific variables, and calculation formulas to be implemented are shown.

This example describes a probabilistic model based on the Plaid model (see Non-Patent Document 1 and Non-Patent Document 2 above). The Plaid model is one of the existing techniques for submatrix extraction. It represents continuous real-valued observations as a superposition of the mean parameters of multiple submatrix regions and extracts those regions. In this example, the configuration of the embodiment described above was applied to and used to extend this technique, and an experiment was conducted to realize a Plaid model that can automatically determine the required number of submatrix regions and automatically estimate the various parameters.

以下（４）〜（８）式に示す、Ｐｌａｉｄモデルの技術を適用する部分行列領域抽出モデル（拡張Plaid model）の確率的な数式モデルに従って、部分行列領域割当推定値Ｚ、観測パラメータ推定値θ、及び観測行列Ｘを表現する。 In accordance with a probabilistic mathematical model of a partial matrix region extraction model (extended Plaid model) to which the Plaid model technique is applied, shown in the following equations (4) to (8), a partial matrix region allocation estimated value Z, an observation parameter estimated value θ , And an observation matrix X.

\begin{align}
Z(1,i,k) &\sim \mathrm{BePBerP}(\alpha_1) \tag{4} \\
Z(2,j,k) &\sim \mathrm{BePBerP}(\alpha_2) \tag{5} \\
\theta_k &\sim \text{Normal-Wishart}(\beta_k) \tag{6} \\
\theta_0 &\sim \text{Normal-Wishart}(\beta_0) \tag{7} \\
X(i,j) &\sim \mathrm{Normal}\Bigl(m_0 + \sum_{k} m_k,\ \tau\Bigr) \tag{8}
\end{align}

また、上記の変数記憶部４２の各要素に対応する観測行列Ｘ、部分行列領域ハイパーパラメータα、部分行列領域割当推定値Ｚ、観測パラメータθ、観測ハイパーパラメータβを以下に示す。 The observation matrix X, partial matrix region hyperparameter α, partial matrix region allocation estimated value Z, observation parameter θ, and observation hyperparameter β corresponding to each element of the variable storage unit 42 are shown below.

X = {X(i, j)}
α = (α1, α2)
Z = (Z1, Z2)
Z1 = {Z(1, i, k)}
Z2 = {Z(2, j, k)}
θ = (t, θ0, θ1, θ2, ...)
θk = (mk, τk)
β = (β0, βk)

For details of the probability distributions in equations (4) to (8), see Non-Patent Document 4 and Non-Patent Document 5.
[Non-Patent Document 5]: Griffiths and Ghahramani, "The Indian Buffet Process: An Introduction and Review", Journal of Machine Learning Research, Vol. 12, pp. 1185-1224, 2011.
The implementation of the mathematical model in the embodiment of the present invention is completed by designing prior fitness and data fitness functions for the probabilistic model of equations (4) to (8). Any design method may be used, but the probabilistically and statistically optimal way to design these functions is Bayesian estimation (see Non-Patent Document 4).

Under Bayesian estimation, the estimation in the submatrix region allocation estimation unit 32 is defined as follows: the prior fitness is the prior distribution given the other parameters and variables, the data fitness is the likelihood when the value of the variable of interest is fixed, and the estimation of the parameter of interest can be implemented as the calculation of the posterior distribution.

上記実施の形態における部分行列領域割当推定部３２の第１の処理では、以下（９）式に示すに従って、オブジェクトｉが、当該部分行列領域ｋに所属すべきか否かを推定する。 In the first process of the submatrix region allocation estimation unit 32 in the above embodiment, whether or not the object i should belong to the submatrix region k is estimated according to the following equation (9).

\begin{equation}
p(Z(1,i,k) \mid Z_1(-ik), Z_2, \theta) \propto p(X \mid Z(1,i,k), Z_1(-ik), Z_2, \theta) \times p(Z(1,i,k) \mid Z_1(-ik)) \tag{9}
\end{equation}

Here, Z1(−ik) is Z1 with the value of Z(1,i,k) removed. The term p(X | Z(1,i,k), Z1(−ik), Z2, θ) is the data fitness, and p(Z(1,i,k) | Z1(−ik)) is the prior fitness.
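A possible reading of equation (9) in code is shown below. Several points are assumptions made only for this sketch and are not stated in the text: the mean of cell (i, j) is taken as m0 plus the means of the regions that contain both i and j, τ is treated as a precision, and the prior fitness is taken as the Indian-buffet-style ratio (number of other members) / N1. All names are hypothetical.

```python
import math

def row_log_likelihood(i, X, Z1, Z2, m0, m, tau):
    """Log-likelihood of row i of X under Normal(mean, precision tau)."""
    total = 0.0
    for j, x in enumerate(X[i]):
        # Assumed mean: background m0 plus means of regions containing (i, j).
        mean = m0 + sum(m[k] for k in range(len(m)) if Z1[i][k] and Z2[j][k])
        total += 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (x - mean) ** 2
    return total

def posterior_membership(i, k, X, Z1, Z2, m0, m, tau):
    """Return p(Z(1,i,k) = 1 | rest), normalized over the values 0 and 1."""
    n1 = len(Z1)
    others = sum(Z1[r][k] for r in range(n1) if r != i)
    prior_in = others / n1           # assumed prior fitness
    if prior_in == 0.0:
        return 0.0                   # no other member: left to the new-region step
    saved = Z1[i][k]
    log_post = {}
    for value, prior in ((1, prior_in), (0, 1.0 - prior_in)):
        Z1[i][k] = value             # only row i of X depends on Z(1,i,k)
        log_post[value] = math.log(prior) + row_log_likelihood(i, X, Z1, Z2, m0, m, tau)
    Z1[i][k] = saved                 # restore the caller's state
    top = max(log_post.values())     # subtract the maximum for numerical stability
    weight_in = math.exp(log_post[1] - top)
    weight_out = math.exp(log_post[0] - top)
    return weight_in / (weight_in + weight_out)
```

Drawing Z(1,i,k) = 1 with the returned probability gives one Gibbs-style update for the first process.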

上記実施の形態における部分行列領域割当推定部３２の第２の処理では、以下（１０）式に従って、オブジェクトｉを表現するために新たなＬ個の部分行列領域を生成するべきか否かを推定する。 In the second process of the submatrix region allocation estimation unit 32 in the above embodiment, it is estimated whether or not new L submatrix regions should be generated to represent the object i according to the following equation (10). To do.

\begin{equation}
p(L, Z_2^{*}, \theta^{*}) = \min\left(1.0,\ \frac{p(X \mid Z_1, Z_2, Z_1^{*}, Z_2^{*}, L, \theta, \theta^{*})}{p(X \mid Z_1, Z_2, \theta)}\right) \tag{10}
\end{equation}

ここで、Ｚ１＊は、Ｌ個の新たな部分行列領域に対して割り当てられ、所属する第１ドメインのオブジェクトｉの集合であり、Ｚ２＊はＬ個の新たな部分行列領域に対して割り当てられ、所属する第２ドメインのオブジェクトｊの集合であり、θ＊はＬ個の新たな部分行列領域に対する観測パラメータである。 Here, Z1 * is assigned to L new submatrix areas and is a set of objects i of the first domain to which Z1 * belongs, and Z2 * is assigned to L new submatrix areas. , Belonging to the second domain object j, θ * is an observation parameter for L new submatrix regions.
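Equation (10) has the form of a Metropolis-Hastings acceptance probability. A minimal sketch, assuming the two likelihoods are available as log values (the function names are hypothetical), evaluates the ratio in the log domain to avoid underflow:

```python
import math
import random

def acceptance_probability(log_lik_proposed, log_lik_current):
    """min(1, p(X | proposed) / p(X | current)), computed from log values."""
    return math.exp(min(0.0, log_lik_proposed - log_lik_current))

def accept_new_regions(log_lik_proposed, log_lik_current, rng=random):
    """Accept the L proposed regions with the probability of equation (10)."""
    return rng.random() < acceptance_probability(log_lik_proposed, log_lik_current)
```

A proposal that explains X at least as well as the current state is always accepted; a worse proposal is accepted with probability equal to the likelihood ratio.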

次に、上記の部分行列領域ハイパーパラメータ推定部３４では、以下（１１）式及び（１２）式に従って、第１ドメイン及び第２ドメインに対する部分行列領域ハイパーパラメータを推定する。 Next, the partial matrix region hyperparameter estimation unit 34 estimates the partial matrix region hyperparameters for the first domain and the second domain according to the following equations (11) and (12).

第１ドメインに対する部分行列領域ハイパーパラメータα１は、以下（１１）式に従って推定される。 The submatrix region hyperparameter α1 for the first domain is estimated according to the following equation (11).

\begin{equation}
p(\alpha_1 \mid Z_1) \propto p(Z_1 \mid \alpha_1) \times p(\alpha_1) \tag{11}
\end{equation}

第２ドメインに対する部分行列領域ハイパーパラメータα２は、以下（１２）式に従って推定される。 The submatrix region hyperparameter α2 for the second domain is estimated according to the following equation (12).

\begin{equation}
p(\alpha_2 \mid Z_2) \propto p(Z_2 \mid \alpha_2) \times p(\alpha_2) \tag{12}
\end{equation}

ここで事前分布にはガンマ分布を使用している。 Here, a gamma distribution is used for the prior distribution.
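As one concrete possibility for this hyperparameter update, the sketch below assumes the standard one-parameter Indian buffet process likelihood, for which a Gamma(a, b) prior (shape a, rate b) on α is conjugate and the posterior is Gamma(a + K, b + H_N), where K is the number of regions in use and H_N is the N-th harmonic number. Whether the BePBerP prior of equations (4) and (5) reduces to exactly this form is an assumption here, and the default values of a and b are arbitrary.

```python
import random

def harmonic(n):
    """N-th harmonic number H_N = 1 + 1/2 + ... + 1/N."""
    return sum(1.0 / r for r in range(1, n + 1))

def sample_alpha(K, n_objects, a=1.0, b=1.0, rng=random):
    """Draw alpha from Gamma(a + K, rate b + H_N) under the assumed IBP model."""
    shape = a + K
    rate = b + harmonic(n_objects)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)
```

The same function would be called once per domain, with (K, N1) for α1 and (K, N2) for α2.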

上記（９）式〜（１２）式はベイズ推定の１種であるマルコフ連鎖モンテカルロ法を用いた場合の数式となる。各式の具体的な計算方法については、非特許文献４、及び非特許文献５を参照すればよい。 The above formulas (9) to (12) are mathematical formulas when the Markov chain Monte Carlo method, which is one type of Bayesian estimation, is used. Non-patent literature 4 and non-patent literature 5 may be referred to for a specific calculation method of each expression.

The observation parameter estimation unit 36 and the observation hyperparameter estimation unit 38 calculate the posterior distributions and estimate the parameters according to equations (6) and (7), using the conjugacy between the Normal-Wishart distribution and the normal distribution (see Non-Patent Document 4).

以上の数式モデルを構成各部に実装した上で、繰り返し判定部４０では、マルコフ連鎖モンテカルロ法の最大繰り返し回数を終了条件定数として利用し、推定の繰り返し回数が定数に達した時点で推定を終了する。 After the above mathematical model is implemented in each component, the iteration determination unit 40 uses the maximum number of iterations of the Markov chain Monte Carlo method as an end condition constant, and ends the estimation when the number of iterations of estimation reaches the constant. .

表１に実験結果の例を示す。 Table 1 shows examples of experimental results.

実験では、潜在する部分行列領域の数が３あるいは４で設計された人工データを複数準備して、これらの部分行列領域を精度よく抽出できるかどうかを検証した。精度はNormalized Mutual Information（ＮＭＩ）を用いた（非特許文献６参照）。 In the experiment, a plurality of artificial data designed with 3 or 4 latent submatrix regions were prepared, and it was verified whether these submatrix regions could be extracted with high accuracy. For accuracy, Normalized Mutual Information (NMI) was used (see Non-Patent Document 6).

表１中の数値は、計算されたＮＭＩである。値は大きいほど良い結果を表す。最大値は1.0であり、このとき完全に部分行列領域を抽出できたことを表す。比較対象は、事前に抽出すべき部分行列領域の数Ｋを固定して解析する既存手法(非特許文献６)と、この手法を上記実施例の構成で拡張した提案発明手法である。 The values in Table 1 are the computed NMI scores; larger values indicate better results. The maximum is 1.0, which indicates that the submatrix regions were extracted perfectly. The methods compared are an existing method (Non-Patent Document 6), which fixes in advance the number K of submatrix regions to extract, and the proposed method, which extends that existing method with the configuration of the above embodiment.
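
To make the scale of the score concrete, the following sketch computes the standard NMI between two non-overlapping labelings. This is a simplification swapped in for illustration: the experiments cite Non-Patent Document 6, which concerns overlapping structure, and that variant is not reproduced here. The function name `nmi` and the normalization 2I / (H_a + H_b) are choices made for this example.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two flat labelings.

    Uses 2 * I(A; B) / (H(A) + H(B)), which is 1.0 for identical
    partitions (up to relabeling) and 0.0 for independent ones.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0.0:
        # Both labelings put everything in one cluster; treat as a match.
        return 1.0
    return 2.0 * mutual / (h_a + h_b)
```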

［非特許文献６］: Lancichinetti, Fortunato, and Kertesz, “Detecting the Overlapping and Hierarchical Community Structure of Complex Networks”, New Journal of Physics, Vol. 11(3), 2009.

両手法とも、Ｋの初期値Ｋ０を、（１）真の部分行列領域の数、（２）Ｋ０＝１０、（３）Ｋ０＝２０と変更して、既存手法はＫ＝Ｋ０で固定、提案法はＫを自動的に推定させた。表１に示すように、全てのケースで提案法の方が良好な数値を得ている。また、特に人工データ２〜４では提案法の示すＮＭＩはＫの数によらず高い値を維持している。これは、提案手法の構成の効果によって、潜在する部分行列領域の数を自動的に推定しながら部分行列領域を抽出することで正しい部分行列領域の数が予めわからなくても精度よく抽出が可能であることを示している。 For both methods, the initial value K0 of K was set to (1) the true number of submatrix regions, (2) K0 = 10, and (3) K0 = 20. The existing method kept K fixed at K0, while the proposed method estimated K automatically. As Table 1 shows, the proposed method obtains better values in every case. In particular, on artificial data sets 2 to 4 the NMI of the proposed method stays high regardless of the value of K. This shows that, by extracting submatrix regions while automatically estimating the number of latent submatrix regions, the proposed configuration can extract them accurately even when the correct number of submatrix regions is not known in advance.

図６は実際に抽出したものを可視化した例である。提案する手法ではきれいに部分行列領域を抽出できているが、既存手法ではうまくいかない場合の例である。 FIG. 6 visualizes an actual extraction result. It is an example in which the proposed method extracts the submatrix regions cleanly while the existing method fails to do so.

以上説明したように、本発明の実施の形態に係る部分行列領域抽出装置によれば、第１ドメインの各オブジェクト及び第２ドメインの各オブジェクトに対し、部分行列領域毎に、観測行列Ｘ、部分行列領域割当推定値Ｚ、及び部分行列領域ハイパーパラメータαに基づいて、オブジェクトが、部分行列領域に所属すべきか否かを推定して、部分行列領域に所属すべきであると推定された場合にはオブジェクトに対して部分行列領域を割り当てると共に、観測行列Ｘ及び部分行列領域割当推定値Ｚに基づいて、オブジェクトを表現するために新たな部分行列領域を生成するべきか否かを推定し、新たな部分行列領域を生成するべきであると推定された場合には新たな部分行列領域を生成し、オブジェクトに、生成された新たな部分行列領域を割り当てて、部分行列領域の数Ｋを更新し、割り当てられた第１ドメインのオブジェクト数又は第２ドメインのオブジェクト数が所定値以下となる部分行列領域を削除し、推定及び割り当てを予め定めた繰り返し終了条件を満たすまで繰り返すことにより、最適な数の特徴となる部分行列領域を抽出することができる。 As described above, the submatrix region extraction device according to the embodiment of the present invention operates on each object in the first domain and each object in the second domain. For each submatrix region, it estimates whether the object should belong to that region based on the observation matrix X, the submatrix region allocation estimate Z, and the submatrix region hyperparameter α, and allocates the region to the object when it estimates that the object should belong to it. It also estimates, based on the observation matrix X and the submatrix region allocation estimate Z, whether a new submatrix region should be generated to represent the object; when it estimates that one should, it generates the new submatrix region, allocates it to the object, and updates the number K of submatrix regions. It deletes any submatrix region for which the number of allocated first-domain objects or the number of allocated second-domain objects is at or below a predetermined value. By repeating the estimation and allocation until a predetermined iteration termination condition is satisfied, the device can extract an optimal number of characteristic submatrix regions.
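
The deletion step in this summary can be sketched as follows. The example assumes that the allocation estimate is held as two 0/1 matrices, one per domain, with one column per submatrix region; the names `prune_regions`, `Z1`, `Z2`, and `min_count` are hypothetical, and the predetermined value is taken as a parameter because the source does not fix it here.

```python
def prune_regions(Z1, Z2, min_count=0):
    """Delete regions whose object count in either domain is <= min_count.

    Z1 and Z2 are lists of rows (one per object) of 0/1 flags, with one
    column per submatrix region. Returns the pruned matrices and the
    updated number of regions K.
    """
    n_regions = len(Z1[0]) if Z1 else 0
    # A region survives only if both domains exceed the threshold.
    keep = [
        k for k in range(n_regions)
        if sum(row[k] for row in Z1) > min_count
        and sum(row[k] for row in Z2) > min_count
    ]
    Z1_new = [[row[k] for k in keep] for row in Z1]
    Z2_new = [[row[k] for k in keep] for row in Z2]
    return Z1_new, Z2_new, len(keep)
```

With `min_count=0`, a region is removed as soon as it has no rows or no columns left, since it then no longer describes a rectangle in the observation matrix.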

なお、本発明は、上述した実施の形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made without departing from the gist of the present invention.

例えば、上述した実施の形態では、オブジェクトが部分行列領域に割り当てられる度合いを表す事前適合度と、データ適合度とに基づいて、オブジェクトが部分行列領域に所属する可能性を算出する場合を例に説明したが、これに限定されるものではなく、オブジェクトが部分行列領域に割り当てられない度合いを表す事前適合度と、データ適合度とに基づいて、オブジェクトが部分行列領域に所属しない可能性を算出するようにしてもよい。この場合には、オブジェクトが部分行列領域に所属しない可能性に応じて、部分行列領域割当推定値Ｚを更新すればよい。 For example, the above embodiment calculates the possibility that an object belongs to a submatrix region based on a prior fitness, which represents the degree to which the object is allocated to the submatrix region, and a data fitness. The invention is not limited to this: the possibility that the object does not belong to the submatrix region may instead be calculated based on a prior fitness representing the degree to which the object is not allocated to the submatrix region and a data fitness. In that case, the submatrix region allocation estimate Z is updated according to the possibility that the object does not belong to the submatrix region.
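
The relationship between the two formulations can be sketched as below, assuming that the prior fitness and data fitness are available as log values and are combined by addition in log space (that is, multiplied as probabilities). The function name `membership_probability` and its four arguments are hypothetical. Because the two outcomes are complementary, computing either possibility determines the other.

```python
import math


def membership_probability(log_prior_in, log_lik_in, log_prior_out, log_lik_out):
    """Return (p_in, p_out) for one object and one submatrix region.

    Each score is prior fitness plus data fitness in log space; the two
    scores are normalized with log-sum-exp for numerical stability.
    """
    score_in = log_prior_in + log_lik_in
    score_out = log_prior_out + log_lik_out
    top = max(score_in, score_out)
    norm = top + math.log(math.exp(score_in - top) + math.exp(score_out - top))
    p_in = math.exp(score_in - norm)
    return p_in, 1.0 - p_in
```

An allocation update could then set the corresponding entry of Z to 1 with probability `p_in`, or equivalently to 0 with probability `p_out`.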

Description of Symbols
１０入力部 10 Input unit
２０演算部 20 Calculation unit
２６初期化部 26 Initialization unit
２８入力データ記憶部 28 Input data storage unit
３０変数推定部 30 Variable estimation unit
３２部分行列領域割当推定部 32 Submatrix region allocation estimation unit
３４部分行列領域ハイパーパラメータ推定部 34 Submatrix region hyperparameter estimation unit
３６観測パラメータ推定部 36 Observation parameter estimation unit
３８観測ハイパーパラメータ推定部 38 Observation hyperparameter estimation unit
４０判定部 40 Determination unit
４２変数記憶部 42 Variable storage unit
５０出力部 50 Output unit
１００部分行列領域抽出装置 100 Submatrix region extraction device

Claims

A submatrix region extraction device that extracts characteristic submatrix regions from an observation matrix composed of observed values for each pair of an object in a first domain and an object in a second domain, the device comprising:
an initialization unit that initializes the number of submatrix regions and a submatrix region allocation estimate indicating, for each of that number of submatrix regions, whether the submatrix region is allocated to each object in the first domain and each object in the second domain;
a submatrix region allocation estimation unit that, for each object in the first domain and each object in the second domain, estimates for each submatrix region whether the object should belong to the submatrix region based on the observation matrix, the submatrix region allocation estimate, and a submatrix region hyperparameter relating to the allocation of submatrix regions to objects, allocates the submatrix region to the object when it is estimated that the object should belong to the submatrix region, estimates whether a new submatrix region should be generated to represent the object based on the observation matrix and the submatrix region allocation estimate, and, when it is estimated that a new submatrix region should be generated, generates the new submatrix region, allocates the generated new submatrix region to the object, and updates the number of submatrix regions,
the submatrix region allocation estimation unit further deleting any submatrix region for which the number of allocated objects in the first domain or the number of allocated objects in the second domain is at or below a predetermined value, and updating the number of submatrix regions; and
an iteration determination unit that repeats the estimation and allocation by the submatrix region allocation estimation unit until a predetermined iteration termination condition is satisfied.

The submatrix region extraction device according to claim 1, wherein the submatrix region allocation estimation unit, for each object in the first domain and each object in the second domain, and for each of the submatrix regions existing in the number given by the number of submatrix regions, estimates whether the object should belong to the submatrix region by calculating the possibility that the object belongs, or does not belong, to the submatrix region based on a prior fitness representing the degree to which the object is, or is not, allocated to the submatrix region and a data fitness representing the likelihood that the object is, or is not, allocated to the submatrix region, the prior fitness and the data fitness being estimated based on the observation matrix, the submatrix region allocation estimate, and the submatrix region hyperparameter.

The submatrix region extraction device according to claim 2, wherein the submatrix region allocation estimation unit:
for each object in the first domain, estimates whether a new submatrix region should be generated to represent the object by calculating the possibility of the number of submatrix regions with the new submatrix region added, based on a prior fitness relating to the number of required submatrix regions estimated based on the submatrix region allocation estimate, a prior fitness representing the degree to which objects in the second domain are allocated to the newly generated submatrix region, a prior fitness relating to observation parameters for the observed values in the new submatrix region, and a data fitness representing the degree to which the observation matrix can be better explained by increasing the number of submatrix regions; and
for each object in the second domain, estimates whether a new submatrix region should be generated to represent the object by calculating the possibility of the number of submatrix regions with the new submatrix region added, based on a prior fitness relating to the number of required submatrix regions estimated based on the submatrix region allocation estimate, a prior fitness representing the degree to which objects in the first domain are allocated to the newly generated submatrix region, a prior fitness relating to observation parameters for the observed values in the new submatrix region, and a data fitness representing the degree to which the observation matrix can be better explained by increasing the number of submatrix regions.

The submatrix region extraction device, further comprising a submatrix region hyperparameter estimation unit that estimates the submatrix region hyperparameter based on the submatrix region allocation estimate, wherein
the initialization unit further initializes the submatrix region hyperparameter, and
the iteration determination unit repeats the estimation and allocation by the submatrix region allocation estimation unit and the estimation by the submatrix region hyperparameter estimation unit until the predetermined iteration termination condition is satisfied.

A program for causing a computer to function as each unit of the submatrix region extraction device according to any one of claims 1 to 4.