JP2010250377A

JP2010250377A - Link prediction system, method, and program

Info

Publication number: JP2010250377A
Application number: JP2009096248A
Authority: JP
Inventors: Raymond Harry Putra Rudy; ルディ・レイモンド・ハリー・プテラ; Hisatsugu Kajima; 久嗣鹿島
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2009-04-10
Filing date: 2009-04-10
Publication date: 2010-11-04
Anticipated expiration: 2029-04-10
Also published as: JP5225183B2

Abstract

PROBLEM TO BE SOLVED: To provide a scalable link prediction technique that can handle hundreds of thousands to millions of nodes.

SOLUTION: First, the similarity matrices W_Z, W_Y, and W_X are low-rank approximated by a technique such as incomplete Cholesky decomposition. Next, eigenvalue decomposition is performed on the low-rank approximations of W_Z, W_Y, and W_X. Schematically, a low-rank approximation approximates one matrix by the product of two rectangular matrices, which makes the eigenvalue decomposition easier to compute. In the next step, the eigenvalues of the low-rank approximations of W_Z, W_Y, and W_X are used to construct the normalized Laplacian L. The inverse of (σL + I) is then computed efficiently from L and the matrices V~_Z, V~_Y, V~_X whose columns are the eigenvectors of the low-rank approximations of W_Z, W_Y, W_X. Once the inverse of (σL + I) is obtained, F follows from vec(F) = (σL + I)^-1 vec(F^*).

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a technique for predicting links between a plurality of objects. A link here is, for example, the strength of the relationship between objects.

There has long been a demand to predict the links, or the strength of the relationships, between multiple objects or entities. One example is collaborative filtering, which predicts purchase relationships between users and products and is described in, among others, Japanese Patent Application Laid-Open No. 2000-148864 by the present applicant.

Another known field is biological network prediction, which predicts the presence or absence of interactions between different proteins.

These problems have been solved by computer systems as so-called supervised learning: correct answers are given for some pairs of objects, and, based on those values, the link strength is predicted for pairs whose correct answers are unknown.

Later, models were also proposed that consider not a single link between a pair of objects but multiple types of links, that is, multi-type links. In collaborative filtering, for example, not only the purchase relationship between a pair of objects but also ratings and the browsing of product information are taken into account.

Similarly, in biological network prediction, multiple types of links are conceivable, such as the type of interaction and the environment in which it occurs. Realizing such multi-type link prediction makes the computer processing more complicated.

Companies that operate social network services (SNS) or blog services generally have hundreds of thousands to millions of users, and logs of the activities among those users accumulate on the system's servers.

The activities recorded in such logs include reading a blog, writing a blog, posting a comment, bookmarking a blog, becoming friends with another user, and becoming a supporter of a celebrity's blog.

The specification of Japanese Patent Application No. 2007-336919, by the present applicant, describes the concept of a behavior matrix, which makes it possible to handle in a unified way the indicators of multiple behaviors in an online community such as an SNS, for example writing a blog, reading news, and reading messages.

Based on the logs in which this information is recorded, demands have arisen such as recommending celebrity blogs from the data a user has read or commented on, recommending celebrity blogs from the data of blogs a user liked, and measuring how much of a fan a user is of a celebrity from activities related to that celebrity's blog.

In a sense, this reduces to the problem of multi-type link prediction. However, for SNS and blog sites the number of users is enormous, from hundreds of thousands to millions, and the general algorithms used in collaborative filtering and biological network prediction do not scale well enough to sites of that size.

Approaches to the link prediction problem based on machine learning have long been known. They include (1) pairwise prediction models, which predict the presence or absence of a link between any two components or entities, and (2) relational network models, which model the entire network structure.

Pairwise prediction models have the advantage that relatively large problems are easier to handle, because they can be reduced to the ordinary supervised prediction framework. Several researchers, including one of the present inventors, therefore developed a method that uses pairwise feature-based inference within the pairwise prediction model.

This method is characterized in that (a) it is a semi-supervised link prediction method, (b) it predicts multi-type links simultaneously, (c) it uses a novel pairwise similarity based on the Kronecker product or Kronecker sum of node similarity matrices, and (d) it can be computed quickly by the conjugate gradient method.

In this method, labels are attached to triples (node, node, link type). Let X be the set of first nodes, Y the set of second nodes, and Z the set of link types of the triples. Inference is performed on the hypothesis that "similar nodes are likely to have the same label (property)". When inference based on this link propagation principle is expressed as the objective function of an optimization problem, and its partial derivative with respect to the appropriate variable is set to zero, the equation to be solved becomes:

(σL + I) vec(F) = vec(F^*)   ... Equation 1

Here,
σ is a small real number such as 0.01;
L is the Laplacian matrix, L ≡ D - W;
W is the Kronecker product or Kronecker sum of the similarity matrices W_Z, W_Y, W_X of the sets Z, Y, X of the triples;
D is the Kronecker product or Kronecker sum of D_Z, D_Y, D_X, where D_Z, D_Y, D_X are the diagonal matrices whose diagonal entries are the row sums of W_Z, W_Y, W_X, respectively;
I is the identity matrix of the appropriate size;
F^* is a third-order tensor whose entries are indexed by triples (node, node, link type) and which contains the known relations in some of its entries;
F is an unknown third-order tensor of the same shape as F^*, in which the computed values are stored.

In this method, Equation 1 is solved using the conjugate gradient method.
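As an illustration of solving Equation 1 by conjugate gradient, the following is a minimal sketch, assuming a single link type, the Kronecker product, and symmetric nonnegative similarity matrices. The function names `apply_system` and `solve_cg` are hypothetical, and the code is not taken from the cited paper. It relies on the identity (W_Y ⊗ W_X) vec(F) = vec(W_X F W_Y) for symmetric W_Y and column-major vec, so the large Kronecker matrices are never formed.

```python
import numpy as np

def apply_system(F, W_X, W_Y, sigma):
    """Apply (sigma*L + I) to F, with L = D - W, without forming Kronecker matrices.

    Assumes W = W_Y kron W_X and D = D_Y kron D_X, so that L acts on the
    M x N matrix F as D_X F D_Y - W_X F W_Y.
    """
    d_x = W_X.sum(axis=1)          # diagonal of D_X (row sums of W_X)
    d_y = W_Y.sum(axis=1)          # diagonal of D_Y (row sums of W_Y)
    LF = d_x[:, None] * F * d_y[None, :] - W_X @ F @ W_Y
    return sigma * LF + F

def solve_cg(F_star, W_X, W_Y, sigma=0.01, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient on matrices, using the Frobenius inner product."""
    F_star = np.asarray(F_star, dtype=float)
    F = np.zeros_like(F_star)
    R = F_star - apply_system(F, W_X, W_Y, sigma)   # residual
    P = R.copy()                                    # search direction
    rs = np.sum(R * R)
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        AP = apply_system(P, W_X, W_Y, sigma)
        alpha = rs / np.sum(P * AP)
        F += alpha * P
        R -= alpha * AP
        rs_new = np.sum(R * R)
        P = R + (rs_new / rs) * P
        rs = rs_new
    return F
```

Each iteration costs two matrix products with F rather than a product with an MN x MN matrix, which is why this formulation is usable at all for moderately sized problems.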

A more detailed discussion of the above was presented under the title "Link propagation: a semi-supervised learning method for link prediction" at the 73rd Meeting of the Special Interest Group on Fundamental Problems in Artificial Intelligence of the Japanese Society for Artificial Intelligence on March 13, 2009, and is described in the material distributed as the proceedings.

Japanese Patent Application No. 2007-336919

Hisashi Kashima, Tsuyoshi Kato, Yoshihiro Yamanishi, Masashi Sugiyama, Koji Tsuda, "Link propagation: a semi-supervised learning method for link prediction", presentation and paper in the proceedings distributed on March 13, 2009, at the 73rd Meeting of the Special Interest Group on Fundamental Problems in Artificial Intelligence, Japanese Society for Artificial Intelligence

The method proposed in the above paper made it possible to solve the multi-type link prediction problem faster than before, but it turned out to be insufficient in terms of scalability. According to a trial calculation, when there is one link type, the size of one node set X is M, and the size of the other node set Y is N, the computational cost of the method is on the order of O(M³N² + M²N³), and the amount of computer memory that must be reserved for the calculation is on the order of O(MN). Consequently, when the number of users or documents ranges from hundreds of thousands to millions, as on SNS and blog sites, link prediction becomes difficult.

Accordingly, an object of the present invention is to provide a technique that enables a computer to perform link prediction scalably on sets having an enormous number of objects, as on SNS and blog sites.

To achieve the above object, the present invention starts from Equation 1 above, that is,

(σL + I) vec(F) = vec(F^*)

and assumes that the normalized Laplacian matrix L is expressed, in the case of the Kronecker product, in the following form:

[equation not reproduced in this text]

and, in the case of the Kronecker sum, in the following form:

[equation not reproduced in this text]

Here, D_Z^-1/2 W_Z D_Z^-1/2, D_Y^-1/2 W_Y D_Y^-1/2, and D_X^-1/2 W_X D_X^-1/2 can each be regarded, without loss of generality and to simplify the notation, as new matrices W_Z, W_Y, and W_X. In the following, the present invention therefore expresses the Laplacian matrix L in the following general form:

[equation not reproduced in this text]

The operator is as defined in Equation 6 (not reproduced in this text), so that the Kronecker product and the Kronecker sum are described in a unified way. For the Kronecker product c = 1.0, and for the Kronecker sum c = 3.0; however, the value of c can also be learned from data, independently of the operator, so that an optimal value is found efficiently.
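The displayed equations for the normalized Laplacian do not survive in this text. One form that is consistent with the surrounding description and with the stated values of c is sketched below; it is an assumption for illustration, not a reproduction of the original equations. The symbol ∘ stands for the unified operator, and W_Z, W_Y, W_X are the already normalized similarity matrices.

```latex
L = I - \frac{1}{c}\,\bigl(W_Z \circ W_Y \circ W_X\bigr),
\qquad
\circ \in \{\otimes,\ \oplus\},
\qquad
A \oplus B = A \otimes I + I \otimes B,
\qquad
c =
\begin{cases}
1 & \text{if } \circ = \otimes \\
3 & \text{if } \circ = \oplus
\end{cases}
```

Under this reading, the choice c = 3 for the Kronecker sum has a simple motivation: each normalized similarity matrix has eigenvalues in [-1, 1], so the Kronecker sum of three of them has eigenvalues in [-3, 3], and dividing by 3 brings them back to [-1, 1].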

According to one aspect of the present invention, there is provided a method, carried out by computer processing, for obtaining an exact solution that makes it possible to solve Equation 1 scalably even for sets having a large number of objects.

In this method, the similarity matrices W_Z, W_Y, and W_X are first each eigendecomposed.

In the next step, the matrix L is constructed using the eigenvalues obtained for W_Z, W_Y, and W_X. The formula for constructing L differs depending on whether W_Z, W_Y, and W_X are combined by the Kronecker product or by the Kronecker sum.

Once the matrix L has been obtained in this way, the inverse of (σL + I) is computed efficiently using L and the matrices V_Z, V_Y, V_X whose columns are the eigenvectors of W_Z, W_Y, W_X, respectively. Once the inverse of (σL + I) is obtained, F is calculated from

vec(F) = (σL + I)^-1 vec(F^*)

For this exact solution, the computational cost is on the order of O(M³ + N³), which is substantially lower than the O(M³N² + M²N³) of the method described above.
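The exact solution can be sketched as follows for a single link type. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the Kronecker product with c = 1, so that L = I - (W_Y ⊗ W_X), and symmetric, already normalized W_X and W_Y. The function name `solve_exact` is hypothetical.

```python
import numpy as np

def solve_exact(F_star, W_X, W_Y, sigma=0.01):
    """Solve (sigma*L + I) vec(F) = vec(F_star), assuming L = I - (W_Y kron W_X).

    W_X (M x M) and W_Y (N x N) are assumed symmetric and already normalized,
    so their eigenvalues lie in [-1, 1].
    """
    lam_x, V_X = np.linalg.eigh(W_X)      # W_X = V_X diag(lam_x) V_X^T
    lam_y, V_Y = np.linalg.eigh(W_Y)      # W_Y = V_Y diag(lam_y) V_Y^T
    # Eigenvalue of sigma*L + I for the eigenvector pair (i, j):
    # 1 + sigma * (1 - lam_x[i] * lam_y[j]), which is always >= 1.
    denom = 1.0 + sigma * (1.0 - np.outer(lam_x, lam_y))
    G = V_X.T @ F_star @ V_Y              # rotate F_star into the eigenbasis
    return V_X @ (G / denom) @ V_Y.T      # rescale entrywise and rotate back
```

The two eigendecompositions account for the O(M³ + N³) term; the MN x MN system matrix is never built or inverted explicitly.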

According to a second aspect of the present invention, there is provided a method for obtaining an approximate solution by computer processing that requires less computation and less computer memory without sacrificing much of the accuracy of the exact solution.

In this method, the similarity matrices W_Z, W_Y, and W_X are first low-rank approximated by a technique such as incomplete Cholesky decomposition. In addition, by holding the solution one step before the final solution compactly in computer memory, the method can fetch the needed portion of the large solution quickly and on demand.

Next, eigenvalue decomposition is performed on the low-rank approximations of the similarity matrices W_Z, W_Y, and W_X. Schematically, a low-rank approximation approximates one matrix by the product of two rectangular matrices. Because the matrices have been low-rank approximated, the eigenvalue decomposition is easier to compute.
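The two steps just described, low-rank approximation followed by eigenvalue decomposition, can be sketched as below. This is an illustrative sketch that assumes a symmetric positive semidefinite similarity matrix; `incomplete_cholesky` is a textbook pivoted variant and `low_rank_eig` is a hypothetical helper name, neither taken from the source. The point it shows is that the eigenpairs of the n x n matrix G Gᵀ can be obtained from the small d x d matrix Gᵀ G.

```python
import numpy as np

def incomplete_cholesky(K, rank, tol=1e-10):
    """Pivoted incomplete Cholesky: returns G (n x d, d <= rank) with K ~= G @ G.T.

    K is assumed symmetric positive semidefinite.
    """
    n = K.shape[0]
    G = np.zeros((n, rank))
    diag = np.diag(K).astype(float).copy()   # diagonal of the residual K - G G^T
    for j in range(rank):
        i = int(np.argmax(diag))             # pivot on the largest residual diagonal
        if diag[i] <= tol:
            return G[:, :j]                  # residual is numerically zero; stop early
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(diag[i])
        diag -= G[:, j] ** 2
    return G

def low_rank_eig(G):
    """Eigenpairs of G @ G.T computed from the small d x d matrix G.T @ G."""
    s, U = np.linalg.eigh(G.T @ G)           # G^T G = U diag(s) U^T
    keep = s > 1e-12 * s.max()               # drop numerically zero directions
    s, U = s[keep], U[:, keep]
    V = (G @ U) / np.sqrt(s)                 # orthonormal eigenvectors of G G^T
    return s, V
```

With d much smaller than n, the decomposition costs on the order of n d² + d³ operations instead of n³.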

In the next step, the matrix L~ is constructed using the eigenvalues of the low-rank approximations of W_Z, W_Y, and W_X. The formula for constructing L~ differs depending on whether W_Z, W_Y, and W_X are combined by the Kronecker product or by the Kronecker sum.

Once the matrix L~ has been obtained in this way, the inverse of (σL~ + I) is computed efficiently using L~ and the matrices V~_Z, V~_Y, V~_X whose columns are the eigenvectors of the low-rank approximations of W_Z, W_Y, W_X, respectively. Once the inverse of (σL~ + I) is obtained, F is calculated from

vec(F) = (σL~ + I)^-1 vec(F^*)
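One way the low-rank inverse and the on-demand retrieval of entries could work is sketched below. The assumptions are a single link type, the Kronecker product with c = 1 (so L~ = I - W~_Y ⊗ W~_X), and eigenvector matrices with orthonormal columns; the names `low_rank_core` and `predict_entry` are hypothetical. On the subspace spanned by the eigenvectors, the system has eigenvalues 1 + σ(1 - λ_x λ_y); everywhere else it acts as multiplication by 1 + σ.

```python
import numpy as np

def low_rank_core(F_star, lam_x, V_X, lam_y, V_Y, sigma=0.01):
    """Return the small d_M x d_N core from which any entry of F can be rebuilt.

    Assumes W~_X = V_X diag(lam_x) V_X^T and W~_Y = V_Y diag(lam_y) V_Y^T,
    with V_X (M x d_M) and V_Y (N x d_N) having orthonormal columns.
    """
    base = 1.0 / (1.0 + sigma)                            # inverse eigenvalue off the subspace
    inv = 1.0 / (1.0 + sigma * (1.0 - np.outer(lam_x, lam_y)))
    G = V_X.T @ F_star @ V_Y                              # project F_star onto the subspace
    return (inv - base) * G                               # correction relative to `base`

def predict_entry(i, j, F_star, core, V_X, V_Y, sigma=0.01):
    """Compute the single entry F[i, j] on demand, without forming all of F."""
    return F_star[i, j] / (1.0 + sigma) + V_X[i] @ core @ V_Y[j]
```

Keeping only `core`, `V_X`, and `V_Y` in memory corresponds to holding the solution one step before the final one in compact form; the full matrix is F_star / (1 + sigma) + V_X @ core @ V_Y.T when it is actually needed.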

For this approximate solution, assuming one link type, let d_M be the shorter dimension of the rectangular factor of W_X and d_N the shorter dimension of the rectangular factor of W_Y. The cost is then reduced to:

Computation: O(M d_M² + N d_N² + d_M³ + d_N³)
Memory: O(N d_M + M d_N + d_M² + d_N²)

Basically, d_M and d_N are chosen so that d_M << M and d_N << N, within the range that keeps the link prediction accuracy at the desired level.
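To give a feel for the difference between the O(MN) memory bound and the low-rank memory bound, the following sketch plugs in hypothetical sizes (one million nodes on each side, rank 100, 8 bytes per entry). The sizes are assumptions chosen for illustration, and constant factors hidden by the O-notation are ignored.

```python
def dense_entries(M, N):
    """Entries held under the O(MN) bound (constant factors ignored)."""
    return M * N

def low_rank_entries(M, N, d_M, d_N):
    """Entries under the O(N*d_M + M*d_N + d_M^2 + d_N^2) bound (constants ignored)."""
    return N * d_M + M * d_N + d_M ** 2 + d_N ** 2

M = N = 1_000_000        # hypothetical site size
d_M = d_N = 100          # hypothetical ranks of the rectangular factors
BYTES = 8                # one double-precision value per entry

print(dense_entries(M, N) * BYTES / 1e12, "TB")
print(low_rank_entries(M, N, d_M, d_N) * BYTES / 1e9, "GB")
```

With these numbers the dense bound is roughly 8 TB, while the low-rank bound is roughly 1.6 GB, which fits in the main memory of a single server.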

According to the present invention, a scalable link prediction technique is provided that greatly reduces the amount of computation and the memory a computer requires and can handle hundreds of thousands to millions of nodes, opening the way to applying link prediction to online community systems such as SNS and blog sites.

FIG. 1 is a diagram showing client computers connected to an online community server via the Internet.
FIG. 2 is a diagram showing the hardware configuration of a client computer.
FIG. 3 is a diagram showing the hardware configuration of the online community server.
FIG. 4 is a functional logical block diagram of an embodiment of the present invention.
FIG. 5 is a diagram showing an example of links between two sets.
FIG. 6 is a diagram showing an example of user information.
FIG. 7 is a schematic flowchart of the overall processing of the present invention.
FIG. 8 is a flowchart of the parameter setting process.
FIG. 9 is a diagram showing an example of a similarity matrix.
FIG. 10 is a diagram showing an example of a similarity matrix.
FIG. 11 is a diagram showing an example of a similarity matrix.
FIG. 12 is a flowchart of the process of solving the link prediction equation.
FIG. 13 is a flowchart of the process of obtaining the exact solution for link prediction.
FIG. 14 is a flowchart of the process of obtaining the approximate solution for link prediction.

Embodiments of the present invention are described below with reference to the drawings. Unless otherwise noted, the same reference numerals refer to the same objects throughout the drawings. What is described below is an example in which the present invention is used on an online community system. This is only one embodiment; there is no intention to limit the invention to what is described in this example, and it should be understood that the invention can be used for any application that performs link prediction between arbitrary objects, such as collaborative filtering and biological network prediction.

In FIG. 1, a plurality of client computers 106a, 106b, ..., 106z are connected to an online community server 102 via the Internet 104. In the system of FIG. 1, a user of a client computer logs in to the online community server 102 through a Web browser over the Internet 104. Specifically, the user enters a predetermined URL into the Web browser to display a predetermined page. Instead of a Web browser, a dedicated client application program may be used to log in.

To log in, the user of a client computer uses an assigned user ID and the password associated with it. Once logged in, the user carries out activities in the online community such as writing a diary, reading and commenting on the diaries of others to which access has been granted, reading news, forming groups with like-minded friends, chatting, and searching for hobby communities.

Next, a hardware block diagram of the client computers indicated by reference numerals 106a, 106b, ..., 106z in FIG. 1 is described with reference to FIG. 2. In FIG. 2, the client computer has a main memory 206, a CPU 204, and an IDE controller 208, which are connected to a bus 202. A display controller 214, a communication interface 218, a USB interface 220, an audio interface 222, and a keyboard/mouse controller 228 are also connected to the bus 202. A hard disk drive (HDD) 210 and a DVD drive 212 are connected to the IDE controller 208. The DVD drive 212 is used as needed to install programs from a CD-ROM or DVD. A display device 216, preferably one with an LCD screen, is connected to the display controller 214. The display device 216 shows the online community screen through the Web browser.

Devices such as a dedicated controller or an acceleration sensor device can be connected to the USB interface 220 as needed. Such devices can be used to improve operability within the online community.

A keyboard 230 and a mouse 232 are connected to the keyboard/mouse controller 228. The keyboard 230 is typically used in the online community to write chat messages or to describe the community content to be searched for. The mouse 232 is used in the online community to click links to read news, to select and execute actions from menus, and to choose diaries to read.

The CPU 204 may be any CPU based on, for example, a 32-bit or 64-bit architecture; Intel's Pentium (trademark of Intel Corporation) 4 or Core (trademark) 2 Duo, AMD's Athlon (trademark), and the like can be used.

The hard disk drive 210 stores at least an operating system and a Web browser (not shown) that runs on it; the operating system is loaded into the main memory 206 at system startup. Windows XP (trademark of Microsoft Corporation), Windows Vista (trademark of Microsoft Corporation), Linux (trademark of Linus Torvalds), or the like can be used as the operating system.

The communication interface 218 uses the TCP/IP communication function provided by the operating system to communicate with the online community server 102 via the Ethernet (trademark) protocol or the like.

FIG. 3 is a schematic block diagram of the hardware configuration on the online community provider side. As shown in FIG. 3, the client computers 106a, 106b, ..., 106z are connected via the Internet 104 to a communication interface 302 of the online community server 102. The communication interface 302 is in turn connected to a bus 304, to which a CPU 306, a main memory (RAM) 308, and a hard disk drive (HDD) 310 are connected.

Although not shown, a keyboard, a mouse, and a display may also be connected to the online community server 102 and used for overall management and maintenance of the server.

The hard disk drive 310 of the online community server 102 stores an operating system and a table of user IDs and corresponding passwords for managing the logins of the client computers 106a, 106b, ..., 106z. The hard disk drive 310 also stores software such as Apache that makes the online community server 102 function as a Web server; this software is loaded into the main memory 308 and runs when the online community server 102 starts up. This allows the client computers 106a, 106b, ..., 106z to access the online community server 102 using the TCP/IP protocol.

The hard disk drive 310 of the online community server 102 further stores information such as each user's messages, diary or blog, and bulletin board, as well as information on the online community service itself, preferably as HTML files and in multimedia formats such as graphic images, video files, and music files.

The owning user can write to his or her diary or blog and bulletin board, and other users can read them or add comments according to the permissions granted.

As described in detail later, the hard disk drive 310 stores the modules for computing the multi-type link prediction according to the present invention: a module that computes the third-order tensor F^*, a module that computes the similarity matrices, a module that obtains the eigenvalues and eigenvectors of a matrix, a module that computes the Kronecker product or Kronecker sum of matrices, and modules that perform other necessary matrix calculations.

The structure of blogs, bulletin boards, and the like, and user access control for them, can be implemented with well-known programming language tools such as Perl, Ruby, PHP, Servlet, and JSP. Alternatively, C, C++, C#, Java (trademark of Sun Microsystems), or the like can be used.

Furthermore, JavaScript (trademark) can be embedded in the HTML files as appropriate so that the system works together with Perl, Ruby, PHP, and the like.

Content such as blogs, bulletin boards, and news can also be stored in a content management database (CMDB) and managed centrally.

As the online community server 102, server models such as the IBM (trademark of International Business Machines Corporation) System x, System i, and System p, available from International Business Machines Corporation, can be used. Usable operating systems in that case include AIX (trademark of International Business Machines Corporation), UNIX (trademark of The Open Group), Linux (trademark), and Windows (trademark) 2003 Server.

FIG. 4 is a block diagram of the functions according to the present invention. In FIG. 4, user information 402 is the user profile information stored on the hard disk drive 310 of the online community server 102. Specifically, as shown in FIG. 6, it contains personal information such as gender and age for each user x₁, x₂, x₃, and so on. As FIG. 6 also shows, the user information 402 preferably includes information on the links that relate each user to units of content such as individual blog articles.

Content 404 is the collection of blog articles, comments, and other information stored on the hard disk drive 310. On an active site, content is written by users every day and keeps growing.

Log 406 is a file that records every user activity, such as logging in, logging out, writing a blog, reading a blog, and commenting; it is also stored on the hard disk drive 310. An example of the log 406 is shown below.

Date       Time          Activity  User ID
--------------------------------------------------------------------------
2009/03/01 09:51:001 JST Login     0146230
2009/03/01 10:00:050 JST Logout    0099321
2009/03/01 10:11:130 JST PostBlog  0146230 PostID: 004524082
2009/03/01 10:12:020 JST OpenMsg   2965124 MsgID : 019348003

As shown above, the log 406 contains at least the date and time of each activity, the type of activity, and the user ID of the user who performed it. In addition, when a blog entry is written (PostBlog), the ID of the written article is stored, and when a message is opened (OpenMsg), the ID of the opened message is stored. The relationships between users and content elements can therefore be obtained from the entries of the log 406.
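As an illustration of how such relationships might be read from the log, the following minimal sketch parses entries in the layout of the sample above and keeps only those that name a content element. The field layout is inferred from the four sample entries, and the names `LOG_LINE` and `extract_links` are hypothetical; none of this is taken from the actual implementation.

```python
import re

# Assumed layout, inferred from the sample entries: date, time, zone,
# activity, user ID, then an optional "Key: value" content ID.
LOG_LINE = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\w+)\s+(\d+)(?:\s+(\w+)\s*:\s*(\d+))?\s*$"
)

def extract_links(lines):
    """Return (user_id, activity, content_id) triples for entries that name a content element."""
    links = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None or m.group(7) is None:
            continue  # skip malformed lines and entries without a content ID
        links.append((m.group(5), m.group(4), m.group(7)))
    return links

sample = [
    "2009/03/01 09:51:001 JST Login 0146230",
    "2009/03/01 10:00:050 JST Logout 0099321",
    "2009/03/01 10:11:130 JST PostBlog 0146230 PostID: 004524082",
    "2009/03/01 10:12:020 JST OpenMsg 2965124 MsgID: 019348003",
]
links = extract_links(sample)
```

Login and Logout entries carry no content ID, so only the PostBlog and OpenMsg entries yield a user-to-content relationship.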

The calculation module 408 is a program module that performs the processing of the present invention. It comprises the modules for computing the multi-type link prediction described above, namely a module that computes the third-order tensor F^*, a module that computes the similarity matrices, a module that obtains the eigenvalues and eigenvectors of a matrix, a module that computes the Kronecker product or Kronecker sum of matrices, and modules that perform other necessary matrix computations. These modules are preferably written in a well-known programming language such as Java (trademark), C, C++, or C#, stored on the hard disk drive 310 in executable binary form, and loaded into memory and executed by the operating system.

The calculation module 408 uses the user information 402, the content 404, and the log 406 to generate the teacher tensor data F^* 410, the similarity matrices 412, and the eigenvalues and eigenvectors 414 of the similarity matrices. The teacher tensor data 410, the similarity matrices 412, and the eigenvalues and eigenvectors 414 generated in this way are preferably held in the RAM 308 for the time being. For this reason, the RAM capacity of the online community server 102 should be as large as possible.

The calculation module 408 further uses the teacher tensor data 410, the similarity matrices 412, and the eigenvalues and eigenvectors 414 of the similarity matrices to compute the link prediction result F 416. The actual computation is described later with reference to the flowcharts of FIG. 7 and the subsequent figures.

In this case, the link prediction result F 416 is a third-order tensor whose elements are the link prediction values between every user and every content element. This result can therefore be used to predict the content elements most relevant to a particular user or group of users, and thus to identify content to recommend.

FIG. 5 schematically shows the link relationships between the set X of users and the set Y of content elements in the online community of this embodiment. In the example of FIG. 5, user x₁ reads content element y₂, user x₂ comments on content element y₁, and user x₃ writes content element y₃. Existing link relationships of this kind are reflected in the teacher tensor data F^* 410. In practice, this link information can be extracted from the log 406.

In the example of FIG. 5, X ≠ Y. There is also a model that examines links between users within the user set X, in which case X = Y.

Next, the algorithm of the link prediction processing of the present invention is described with reference to FIG. 7 and the subsequent figures. FIG. 7 is a flowchart outlining the overall processing.

Step 702 sets the parameters, that is, it prepares data such as the teacher data tensor and the similarity matrices in advance. It is described in detail later with reference to the flowchart of FIG. 8.

Step 704 solves the link prediction equation, which is the core of the present invention. It is described in detail later with reference to the flowchart of FIG. 12.

Step 706 outputs the predicted values obtained by solving the link prediction equation. In this embodiment, these are the predicted values of each link between each user and each content element. Because they indicate which users have a high predicted link value with a particular content element, they can be used, for example, to recommend content.

Next, the parameter-setting step is described with reference to the flowchart of FIG. 8. The log 406 records user IDs, content element IDs, and activity types. In step 802 of FIG. 8, the calculation module 408 therefore counts the links between users and content elements for each link type, for example by scanning the entries of the log 406.

Let N be the number of (user, content element) pairs for which the presence or absence of a link is known, N⁺ the number of those pairs known to have a link, and N⁻ the number known to have no link. The tensor component [F^*]_{i,j,k} is then set to ε⁺ = N/N⁺ when a link of type k is known to exist between i and j, and to ε⁻ = N/N⁻ when a link of type k is known not to exist between i and j. When the presence or absence of the link is unknown, the component is simply set to 0. Here, the presence or absence of a link is "known" when the relationship among i, j, and k can be obtained from the log 406; otherwise it is unknown, and the component is defined to be a target of prediction. This way of assigning the values of F^* is only an example, and the present invention is not limited to it.
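The weighting just described can be sketched as follows. This is a minimal illustration, not the actual implementation: it stores F^* sparsely as a dictionary, counts known (i, j, k) triples rather than pairs, and the name `build_teacher_tensor` and the input format are assumptions made for the example.

```python
def build_teacher_tensor(known):
    """known maps (i, j, k) -> True (link exists) or False (link known to be absent).

    Returns a sparse dict for F*. Triples missing from the dict are the
    unknown entries, whose value is 0.
    """
    n = len(known)
    n_pos = sum(1 for has_link in known.values() if has_link)
    n_neg = n - n_pos
    eps_pos = n / n_pos if n_pos else 0.0   # epsilon+ = N / N+
    eps_neg = n / n_neg if n_neg else 0.0   # epsilon- = N / N-
    return {ijk: (eps_pos if has_link else eps_neg) for ijk, has_link in known.items()}

# Hypothetical observations: three known links and one known non-link.
known = {(0, 1, 0): True, (1, 0, 1): True, (2, 2, 2): True, (0, 0, 0): False}
f_star = build_teacher_tensor(known)
```

With these four observations, N = 4, N⁺ = 3, and N⁻ = 1, so the rarer class receives the larger weight.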

In step 804, the calculation module 408 computes the similarity matrix of the set X, the similarity matrix of the set Y, and the similarity matrix of the set Z, where X is the set of users and Y the set of content in FIG. 5 and, for later convenience, Z denotes the set of links between X and Y.

The similarity matrices of X, Y, and Z are basically computed from the properties of the elements of each set. FIG. 9 shows an example of the similarity matrix of the user set X. Such a matrix can be generated, for example, by computing normalized distances between the feature vectors of the users shown in FIG. 6. If the number of users is M, the similarity matrix is therefore an M × M matrix. Each element is non-negative, and the matrix is normalized so that its diagonal elements are 1 or some other value. This similarity matrix is denoted W_X.
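One way to obtain a non-negative similarity matrix with unit diagonal from feature vectors is a Gaussian kernel on their distances. The text does not specify the similarity function, so the kernel, the `bandwidth` parameter, and the toy feature vectors below are assumptions chosen for illustration.

```python
import math

def similarity_matrix(features, bandwidth=1.0):
    """Gaussian-kernel similarity: entries between 0 and 1, diagonal exactly 1."""
    m = len(features)
    w = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            sq_dist = sum((p - q) ** 2 for p, q in zip(features[a], features[b]))
            w[a][b] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return w

# Hypothetical user features: (gender encoded as 0/1, age scaled to [0, 1]).
users = [[1.0, 0.25], [1.0, 0.30], [0.0, 0.60]]
w_x = similarity_matrix(users)
```

The first two users have nearly identical features, so their similarity is close to 1, while the third user is less similar to both.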

FIG. 10 shows an example of the similarity matrix of the content set Y. Such a matrix can be generated, for example, by parsing each content element to extract keywords, forming a feature vector from those keywords, and computing normalized distances between the feature vectors. If the number of content elements is N, the similarity matrix is therefore an N × N matrix. Each element is non-negative, and the matrix is normalized so that its diagonal elements are 1 or some other value. This similarity matrix is denoted W_Y.

FIG. 11 shows an example of the similarity matrix of the link set Z. Such a matrix can be generated, for example, by listing the users and content elements associated with each link to form a feature vector and computing normalized distances between the feature vectors. If the number of link types is T, the similarity matrix is therefore a T × T matrix. Each element is non-negative, and the matrix is normalized so that its diagonal elements are 1 or some other value. This similarity matrix is denoted W_Z.

This notation is also used in the description of the algorithm below.

In step 806, the calculation module 408 obtains the variables needed to solve the link prediction equation. One such variable is, for example, the third-order tensor F that stores the result.

Next, the processing for solving the link prediction equation is described with reference to FIG. 12. In step 1202 of FIG. 12, the calculation module 408 determines whether the dimension of the similarity matrices is smaller than a predetermined threshold. This decides whether the exact method or the approximate method of the present invention is applied. The exact method gives more accurate link predictions but requires more computer memory and more computation time, so the choice is made according to the dimension of the similarity matrices.

If the dimension of the similarity matrices is determined to be smaller than the predetermined threshold, the calculation module 408 executes, in step 1204, the step of obtaining the exact solution of the link prediction equation. This step is described in more detail later with reference to FIG. 13.

If the dimension of the similarity matrices is determined to be larger than the predetermined threshold, the calculation module 408 executes, in step 1206, the step of obtaining an approximate solution of the link prediction equation. This step is described in more detail later with reference to FIG. 14.

Next, the algorithm for solving the link prediction equation according to the present invention is described.
First, Equation 1, quoted from the non-patent literature cited above, is restated.
(σL + I) vec(F) = vec(F^*) ... Equation 1

In the description of this embodiment, the diagonal elements of the similarity matrices are all taken to be 1. In practice, however, the elements of a similarity matrix represent similarities between the elements of a set, so any non-negative values are possible. What matters in the present invention is that the sum of each row of such a similarity matrix can be computed efficiently.

When the exact solution is sought, the row sums of a similarity matrix can be computed efficiently in a straightforward way. When the approximate solution is sought, the row sums of the matrix that approximates the similarity matrix can also be computed efficiently. Using the row sums, or the maximum row sum, the similarity matrix can be normalized as shown in Math. 1 and Math. 2 above. For example, computing the row sums of an N × N similarity matrix takes O(N²) steps, whereas computing approximate row sums from an N × d low-rank approximation of that similarity matrix takes only O(Nd).
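The O(Nd) cost follows from reordering the multiplication: if the similarity matrix is approximated as G G^T with an N × d factor G, its row sums are G (G^T 1), which never forms the N × N product. The sketch below illustrates this identity; the factor `g` is made-up data and the function name is hypothetical.

```python
def low_rank_row_sums(g):
    """Row sums of G G^T without forming the N x N product: G (G^T 1)."""
    d = len(g[0])
    # G^T 1: column sums of G, O(N d)
    col_sums = [sum(row[c] for row in g) for c in range(d)]
    # G (G^T 1): one length-d dot product per row, O(N d)
    return [sum(row[c] * col_sums[c] for c in range(d)) for row in g]

g = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]   # N = 3, d = 2
sums = low_rank_row_sums(g)
```

For this small factor the result can be checked by hand against the explicit product G G^T, whose rows sum to 1.5, 2.0, and 5.0.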

Then L is the normalized Laplacian matrix which, restating Math. 3, is expressed as follows.

[equation not reproduced]

The equation to be solved is therefore Equation 2 below. The coefficient σ is set to, for example, 0.01.

[equation not reproduced]

Here, the operator applied between the similarity matrices is either the Kronecker product ⊗ or the Kronecker sum ⊕. Which of the two is used depends on the application. The notation vec(F) denotes the vector formed by arranging the components of the tensor F.
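The displayed forms of the Laplacian and of Equation 2 do not survive in this text. One form consistent with Equation 1 and with the factor ((cσ + 1)I − σ diag(vec(L))) that appears later in the derivation is sketched below. The generic operator symbol ∘, the ordering Z, Y, X of the factors, and the reading of c as the constant multiplying the identity are assumptions, and the original formulas may be written differently.

```latex
\begin{aligned}
L &\equiv cI - W_Z \circ W_Y \circ W_X, \qquad \circ \in \{\otimes, \oplus\}, \\
\bigl( \sigma\,( cI - W_Z \circ W_Y \circ W_X ) + I \bigr)\,\mathrm{vec}(F) &= \mathrm{vec}(F^*).
\end{aligned}
```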

There is no clear-cut criterion for choosing between the Kronecker product and the Kronecker sum. However, because the method of the present invention is fast, the better choice can be learned efficiently from the data, so whichever gives the better accuracy for the problem at hand can be used.

They are defined as follows:
Kronecker product: two triples are regarded as similar if all three pairs of corresponding elements are similar.
Kronecker sum: two triples are regarded as similar if two of their elements are identical and the remaining elements are similar.
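The difference between the two definitions can be seen numerically. The sketch below uses two small similarity matrices instead of three to keep the indices readable, and builds the Kronecker sum from its standard definition A ⊕ B = A ⊗ I + I ⊗ B. The matrices and the helper name `kron_sum` are illustrative assumptions.

```python
import numpy as np

def kron_sum(a, b):
    """Kronecker sum A (+) B = A (x) I + I (x) B."""
    return np.kron(a, np.eye(b.shape[0])) + np.kron(np.eye(a.shape[0]), b)

w_a = np.array([[1.0, 0.8], [0.8, 1.0]])
w_b = np.array([[1.0, 0.5], [0.5, 1.0]])

prod = np.kron(w_a, w_b)
ksum = kron_sum(w_a, w_b)

# Row/column index of the pair (i, j) is i * 2 + j.
# Pair (0, 0) versus pair (1, 1): both coordinates differ.
both_differ_prod = prod[0, 3]   # 0.8 * 0.5: similar because both coordinates are similar
both_differ_sum = ksum[0, 3]    # zero: no coordinate is shared
# Pair (0, 0) versus pair (0, 1): the first coordinate is shared.
shared_first_sum = ksum[0, 1]   # similarity of the second coordinates
```

Under the product, two pairs are similar whenever every coordinate is similar. Under the sum, pairs that share no coordinate have similarity zero.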

Thus, the solution of Equation 2 is written as Equation 3 below.

vec(F) = (σL + I)^{-1} vec(F^*) ... Equation 3

The key to this technique is how to evaluate this inverse quickly and compactly, with little memory.

For the vec() notation, the Kronecker product, and the Kronecker sum, see, for example, A. J. Laub, Matrix Analysis for Scientists and Engineers, Society for Industrial and Applied Mathematics, 2005, or David A. Harville, Matrix Algebra from a Statistician's Perspective, Springer-Verlag, 1997.

Returning now to FIG. 13: in step 1302, the calculation module 408 obtains the eigenvalue decompositions of W_X, W_Y, and W_Z. Before that, a useful theorem, also given in A. J. Laub, Matrix Analysis for Scientists and Engineers, Society for Industrial and Applied Mathematics, 2005, is stated here.

<Theorem 1>
Let
{λ_X^(1), λ_X^(2), ..., λ_X^(M)},
{λ_Y^(1), λ_Y^(2), ..., λ_Y^(N)},
{λ_Z^(1), λ_Z^(2), ..., λ_Z^(T)}
be the eigenvalues of W_X, W_Y, and W_Z, respectively.
Let V_X, V_Y, and V_Z be the matrices whose columns are the eigenvectors of W_X, W_Y, and W_Z, respectively.
Then the eigenvectors of the Kronecker product
W_Z ⊗ W_Y ⊗ W_X
and of the Kronecker sum
W_Z ⊕ W_Y ⊕ W_X
are both given by the columns of
V_Z ⊗ V_Y ⊗ V_X.
The eigenvalues are given by
λ_X^(i) λ_Y^(j) λ_Z^(k)
for the Kronecker product and by
λ_X^(i) + λ_Y^(j) + λ_Z^(k)
for the Kronecker sum.
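Theorem 1 can be checked numerically on small matrices. The sketch below does so for two symmetric matrices rather than three, purely to keep it short; the matrices are made-up examples and `kron_sum` is a helper defined here, not a NumPy function.

```python
import numpy as np

def kron_sum(a, b):
    """Kronecker sum A (+) B = A (x) I + I (x) B."""
    return np.kron(a, np.eye(b.shape[0])) + np.kron(np.eye(a.shape[0]), b)

w_x = np.array([[1.0, 0.6], [0.6, 1.0]])
w_y = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.4], [0.1, 0.4, 1.0]])

lam_x, v_x = np.linalg.eigh(w_x)
lam_y, v_y = np.linalg.eigh(w_y)
v = np.kron(v_y, v_x)  # shared eigenvector matrix V_Y (x) V_X

# Column j * 2 + i of V pairs eigenvalue lam_y[j] with lam_x[i].
lam_prod = np.kron(lam_y, lam_x)
lam_sum = np.add.outer(lam_y, lam_x).ravel()

prod_ok = np.allclose(np.kron(w_y, w_x) @ v, v * lam_prod)
sum_ok = np.allclose(kron_sum(w_y, w_x) @ v, v * lam_sum)
```

Both checks compare W v with λ v column by column, so they confirm that the same eigenvectors serve the product and the sum while only the eigenvalues change.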

In step 1302, the eigenvalue decompositions of W_X, W_Y, and W_Z are therefore written as
W_X = V_X diag(λ_X^(1), λ_X^(2), ..., λ_X^(M)) V_X^T
W_Y = V_Y diag(λ_Y^(1), λ_Y^(2), ..., λ_Y^(N)) V_Y^T
W_Z = V_Z diag(λ_Z^(1), λ_Z^(2), ..., λ_Z^(T)) V_Z^T
where diag() denotes a diagonal matrix.

For the Kronecker product, the third-order tensor L that holds the eigenvalues is defined as
[L]_{i,j,k} ≡ λ_X^(i) λ_Y^(j) λ_Z^(k).
For the Kronecker sum, the third-order tensor L that holds the eigenvalues is defined as
[L]_{i,j,k} ≡ λ_X^(i) + λ_Y^(j) + λ_Z^(k).
The present invention can also predict links by combining the Kronecker product and the Kronecker sum, in which case the third-order tensor L that holds the eigenvalues is defined as
[L]_{i,j,k} ≡ α(λ_X^(i) λ_Y^(j) λ_Z^(k)) + β(λ_X^(i) + λ_Y^(j) + λ_Z^(k)),
where α and β are arbitrary real numbers whose optimal values can be learned from the data. Because the values of the third-order tensor L can be computed efficiently from any function of the eigenvalues of the similarity matrices, or of the eigenvalues of their approximations, L is not limited to what is obtained from the Kronecker sum and the Kronecker product. Which function to use for the values of L can also be learned from the data and selected efficiently.
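The three definitions of the eigenvalue tensor differ only in α and β, so they can be sketched with a single broadcasting expression. The function name `eigenvalue_tensor` and the sample eigenvalues are hypothetical.

```python
import numpy as np

def eigenvalue_tensor(lam_x, lam_y, lam_z, alpha=1.0, beta=0.0):
    """[L]_{i,j,k} = alpha * (product of eigenvalues) + beta * (sum of eigenvalues)."""
    x = np.asarray(lam_x, dtype=float)[:, None, None]
    y = np.asarray(lam_y, dtype=float)[None, :, None]
    z = np.asarray(lam_z, dtype=float)[None, None, :]
    return alpha * (x * y * z) + beta * (x + y + z)

# alpha=1, beta=0 gives the Kronecker-product case; alpha=0, beta=1 the Kronecker-sum case.
l_prod = eigenvalue_tensor([2.0, 0.5], [1.0, 3.0], [1.5])
l_sum = eigenvalue_tensor([2.0, 0.5], [1.0, 3.0], [1.5], alpha=0.0, beta=1.0)
```

Only the M + N + T eigenvalues are needed as input; the M × N × T tensor is produced by broadcasting without any explicit Kronecker product.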

Then, by Theorem 1, the inverse matrix in Equation 3 can be rewritten as

[equation not reproduced]

Using the fact that

[equation not reproduced]

this can be rewritten as

[equation not reproduced]

Since ((cσ + 1)I − σ diag(vec(L))) is a diagonal matrix, its inverse is easily obtained. In step 1304, the third-order tensor D is therefore defined as

[equation not reproduced]

and the solution can then be written as follows.

[equation not reproduced]

This expression can be simplified further as follows.

[equation not reproduced]
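The displayed formulas of this derivation do not survive in the text. The following reconstruction is a sketch consistent with Theorem 1 and with the diagonal factor ((cσ + 1)I − σ diag(vec(L))) quoted above. It assumes that the Laplacian has the form cI minus the Kronecker product or sum of the similarity matrices and that the eigenvector matrices are orthogonal. The shorthand V and the element-wise form of D are introduced here for illustration. As in the text, L on the left-hand side is the Laplacian matrix, while L inside diag(vec(·)) is the eigenvalue tensor.

```latex
\begin{aligned}
(\sigma L + I)^{-1}
  &= \Bigl( V \bigl( (c\sigma + 1) I - \sigma\,\mathrm{diag}(\mathrm{vec}(L)) \bigr) V^{\mathsf T} \Bigr)^{-1},
  \qquad V \equiv V_Z \otimes V_Y \otimes V_X, \\
  &= V \bigl( (c\sigma + 1) I - \sigma\,\mathrm{diag}(\mathrm{vec}(L)) \bigr)^{-1} V^{\mathsf T}
  \qquad \text{since } V V^{\mathsf T} = V^{\mathsf T} V = I, \\
[D]_{i,j,k} &\equiv \frac{1}{(c\sigma + 1) - \sigma\,[L]_{i,j,k}}, \\
\mathrm{vec}(F) &= V\,\mathrm{diag}(\mathrm{vec}(D))\,V^{\mathsf T}\,\mathrm{vec}(F^*).
\end{aligned}
```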

This finally yields Equation 5 below. In step 1306, the result F is obtained using Equation 5.

[Equation 5 not reproduced]

Here, operations such as ×₁ are tensor mode products; their detailed definition is given in, for example, T. G. Kolda & B. W. Bader, "Tensor Decompositions and Applications," Tech. Rep. SAND2007-6702, Sandia National Laboratories.
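The exact method can be sketched with NumPy as follows for the Kronecker-product case: rotate F^* into the eigenbasis with mode products, scale element-wise by a tensor D, and rotate back. This is a minimal sketch under stated assumptions, not the patented implementation: the form [D]_{i,j,k} = 1 / ((cσ + 1) − σ[L]_{i,j,k}), the constant c = 1, the random test matrices, and all function names are choices made for the example. The dense solve at the end exists only to cross-check the result on a tiny problem.

```python
import numpy as np

def solve_exact(f_star, w_x, w_y, w_z, sigma=0.01, c=1.0):
    """Kronecker-product case: F from F* via eigendecompositions and mode products."""
    lam_x, v_x = np.linalg.eigh(w_x)
    lam_y, v_y = np.linalg.eigh(w_y)
    lam_z, v_z = np.linalg.eigh(w_z)
    # Eigenvalue tensor [L]_{ijk} = lam_x[i] * lam_y[j] * lam_z[k]
    lam = lam_x[:, None, None] * lam_y[None, :, None] * lam_z[None, None, :]
    d = 1.0 / ((c * sigma + 1.0) - sigma * lam)
    # Rotate F* into the eigenbasis (mode products with V^T), scale, rotate back.
    core = d * np.einsum("ijk,ia,jb,kc->abc", f_star, v_x, v_y, v_z)
    return np.einsum("abc,ia,jb,kc->ijk", core, v_x, v_y, v_z)

def random_similarity(n, rng):
    """Random symmetric positive semidefinite matrix, scaled to largest eigenvalue 1."""
    a = rng.random((n, n))
    s = a @ a.T
    return s / np.max(np.linalg.eigvalsh(s))

rng = np.random.default_rng(0)
w_x, w_y, w_z = (random_similarity(n, rng) for n in (3, 4, 2))
f_star = rng.random((3, 4, 2))
f = solve_exact(f_star, w_x, w_y, w_z)

# Cross-check: dense solve of (sigma * (c I - W_Z (x) W_Y (x) W_X) + I) vec(F) = vec(F*)
big = np.kron(w_z, np.kron(w_y, w_x))
system = 0.01 * (np.eye(big.shape[0]) - big) + np.eye(big.shape[0])
direct = np.linalg.solve(system, f_star.ravel(order="F")).reshape(f_star.shape, order="F")
```

The mode-product route only ever handles M × M, N × N, and T × T matrices plus the M × N × T tensor, whereas the dense route builds an MNT × MNT system, which is the memory saving the text describes.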

Next, the approximate method is described with reference to the flowchart of FIG. 14. The approximate method is used when the similarity matrices or the solution F are too large to fit in computer memory. In step 1402, to cope with similarity matrices that are too large, the calculation module 408 uses a method such as incomplete Cholesky decomposition to form low-rank approximations of the similarity matrices W_X, W_Y, and W_Z as follows.

W_X ≈ G_X G_X^T, W_Y ≈ G_Y G_Y^T, W_Z ≈ G_Z G_Z^T

Here, G_X is an M × M~ matrix, G_Y is an N × N~ matrix, and G_Z is a T × T~ matrix, with M > M~, N > N~, and T > T~. With incomplete Cholesky decomposition, this factorization can be carried out in O(M M~² + N N~² + T T~²) time without explicitly forming the similarity matrices in memory.
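A pivoted incomplete Cholesky factorization can be sketched as below. It needs only the diagonal of the similarity matrix and a way to compute one column at a time, which is why the full matrix never has to be held in memory. This is a generic textbook variant, not necessarily the one used in the invention; the function name, the `tol` parameter, and the Gaussian test matrix are assumptions. The full matrix `w` is built here only to measure the approximation error on a toy example.

```python
import numpy as np

def incomplete_cholesky(diag, column, rank, tol=1e-10):
    """Pivoted incomplete Cholesky: returns G (n x r, r <= rank) with W ~ G G^T.

    diag   -- 1-D array holding the diagonal of W
    column -- function p -> column p of W as a 1-D array (computed on demand)
    """
    n = len(diag)
    residual = np.array(diag, dtype=float)   # diagonal of W - G G^T
    g = np.zeros((n, rank))
    for r in range(rank):
        p = int(np.argmax(residual))         # pivot: largest remaining diagonal error
        if residual[p] <= tol:
            return g[:, :r]                  # remaining error is negligible
        g[:, r] = (column(p) - g[:, :r] @ g[p, :r]) / np.sqrt(residual[p])
        residual -= g[:, r] ** 2
    return g

points = np.array([[0.0], [0.1], [1.0], [1.1], [2.0]])
w = np.exp(-0.5 * (points - points.T) ** 2)  # 5 x 5 Gaussian similarity matrix
g = incomplete_cholesky(np.diag(w), lambda p: w[:, p], rank=3)
error = np.abs(w - g @ g.T).max()
```

Each of the r steps touches one column and costs O(n r), giving O(n r²) overall, which matches the cost stated in the text for each similarity matrix.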

In step 1404, the calculation module 408 obtains the eigenvalue decompositions of G_X^T G_X, G_Y^T G_Y, and G_Z^T G_Z as follows.
G_X^T G_X = U~_X diag(λ~_X^(1), λ~_X^(2), ..., λ~_X^(M~)) U~_X^T
G_Y^T G_Y = U~_Y diag(λ~_Y^(1), λ~_Y^(2), ..., λ~_Y^(N~)) U~_Y^T
G_Z^T G_Z = U~_Z diag(λ~_Z^(1), λ~_Z^(2), ..., λ~_Z^(T~)) U~_Z^T
Here, U~_X, U~_Y, and U~_Z are the matrices whose columns are the eigenvectors of G_X^T G_X, G_Y^T G_Y, and G_Z^T G_Z, respectively.

Because G_X^T G_X, G_Y^T G_Y, and G_Z^T G_Z can be of much lower order than G_X G_X^T, G_Y G_Y^T, and G_Z G_Z^T, their eigenvalue decompositions can be computed with less memory.

In the next step 1406, the eigenvectors of G_X G_X^T, G_Y G_Y^T, and G_Z G_Z^T are obtained as follows.
V~_X = G_X U~_X diag(λ~_X^(1), λ~_X^(2), ..., λ~_X^(M~))^{-1/2}
V~_Y = G_Y U~_Y diag(λ~_Y^(1), λ~_Y^(2), ..., λ~_Y^(N~))^{-1/2}
V~_Z = G_Z U~_Z diag(λ~_Z^(1), λ~_Z^(2), ..., λ~_Z^(T~))^{-1/2}
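The relation between the small and the large eigenproblem can be verified numerically. The sketch below uses a random 50 × 4 matrix as a stand-in for a low-rank factor G and checks that G U~ diag(λ~)^{-1/2} has orthonormal columns that are eigenvectors of G G^T with the same eigenvalues. The sizes and the random data are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal((50, 4))          # stand-in for a 50 x 4 low-rank factor

lam, u = np.linalg.eigh(g.T @ g)          # small 4 x 4 eigenproblem
v = g @ u / np.sqrt(lam)                  # V~ = G U~ diag(lam)^(-1/2), scaled column by column

orthonormal = np.allclose(v.T @ v, np.eye(4))
is_eigvec = np.allclose((g @ g.T) @ v, v * lam)
```

The 50 × 50 product G G^T is formed here only for the check; the decomposition itself works on the 4 × 4 matrix G^T G.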

Using the above results, the inverse matrix in Equation 2 becomes as follows.

[equation not reproduced]

Here, L~ is defined as
[L~]_{i,j,k} ≡ λ~_X^(i) λ~_Y^(j) λ~_Z^(k)
for the Kronecker product and as
[L~]_{i,j,k} ≡ λ~_X^(i) + λ~_Y^(j) + λ~_Z^(k)
for the Kronecker sum.

Unlike in the exact case, however,

[equation not reproduced]

so the inverse matrix in Equation 2 cannot be transformed as it was in the exact case.

Applying the Woodbury formula, described in C. M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, 2006, then gives the following.

[equation not reproduced]

The third-order tensor D is then defined as in Equation 6 below.

[Equation 6 not reproduced]

The solution is then expressed as follows.

[equation not reproduced]

From this, F is obtained as follows.

[equation not reproduced]
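The formulas of the approximate derivation (the approximated inverse, the Woodbury step, and Equations 6 to 8) do not survive in this text. The sketch below is one reconstruction for the Kronecker-product case, obtained by applying the Woodbury identity to a Laplacian assumed to have the form cI minus the low-rank Kronecker product. The shorthand V~ and a, the element-wise form of D, and the grouping of terms are illustrative and may differ from the original equations; ∗ denotes the element-wise product.

```latex
\begin{aligned}
\tilde V &\equiv \tilde V_Z \otimes \tilde V_Y \otimes \tilde V_X,
  \qquad \tilde V^{\mathsf T} \tilde V = I, \quad \tilde V \tilde V^{\mathsf T} \neq I,
  \qquad a \equiv c\sigma + 1, \\
(\sigma L + I)^{-1}
  &\approx \bigl( aI - \sigma\,\tilde V\,\mathrm{diag}(\mathrm{vec}(\tilde L))\,\tilde V^{\mathsf T} \bigr)^{-1}
   = \tfrac{1}{a} I + \tilde V\,\mathrm{diag}(\mathrm{vec}(D))\,\tilde V^{\mathsf T}, \\
[D]_{i,j,k} &\equiv \frac{\sigma\,[\tilde L]_{i,j,k}}{a\,\bigl( a - \sigma\,[\tilde L]_{i,j,k} \bigr)}, \\
F &\approx \tfrac{1}{a} F^* + \bigl( D * ( F^* \times_1 \tilde V_X^{\mathsf T} \times_2 \tilde V_Y^{\mathsf T} \times_3 \tilde V_Z^{\mathsf T} ) \bigr)
   \times_1 \tilde V_X \times_2 \tilde V_Y \times_3 \tilde V_Z .
\end{aligned}
```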

In particular, in step 1410, the following is retained as the core of the expression.

[equation not reproduced]

Because F itself is large, computing and retaining the result of Equation 8 in advance makes it possible, in step 1412, to compute only the predictions that are needed, on demand.
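The on-demand idea can be sketched as follows for the Kronecker-product case. A small M~ × N~ × T~ core tensor is computed once and kept, and each requested entry of F is then assembled from that core and one row of each eigenvector matrix. The formula used here, F ≈ F^*/a + Ṽ diag(vec(D)) Ṽ^T vec(F^*) with a = cσ + 1 and [D] = σ[L~] / (a(a − σ[L~])), is a Woodbury-based reconstruction and an assumption, as are c = 1, the random factors, and all function names. The dense solve is included only to cross-check one entry on a tiny problem.

```python
import numpy as np

def orthonormal_factor(g):
    """Eigenpairs of G G^T obtained from the small matrix G^T G."""
    lam, u = np.linalg.eigh(g.T @ g)
    return lam, g @ u / np.sqrt(lam)

def build_core(f_star, factors, sigma=0.01, c=1.0):
    """Small core tensor, computed once and retained (Kronecker-product case)."""
    (lam_x, v_x), (lam_y, v_y), (lam_z, v_z) = factors
    lam = lam_x[:, None, None] * lam_y[None, :, None] * lam_z[None, None, :]
    a = c * sigma + 1.0
    d = sigma * lam / (a * (a - sigma * lam))
    return d * np.einsum("ijk,ia,jb,kc->abc", f_star, v_x, v_y, v_z)

def predict(i, j, k, f_star, core, factors, sigma=0.01, c=1.0):
    """One entry of F on demand, without materializing the full tensor."""
    (_, v_x), (_, v_y), (_, v_z) = factors
    low_rank_part = np.einsum("abc,a,b,c->", core, v_x[i], v_y[j], v_z[k])
    return f_star[i, j, k] / (c * sigma + 1.0) + low_rank_part

rng = np.random.default_rng(1)
gs = [rng.standard_normal((n, r)) / np.sqrt(n * r) for n, r in ((6, 2), (5, 3), (4, 2))]
factors = [orthonormal_factor(g) for g in gs]
f_star = rng.random((6, 5, 4))
core = build_core(f_star, factors)
value = predict(2, 3, 1, f_star, core, factors)

# Dense cross-check on this tiny example only.
ws = [g @ g.T for g in gs]
big = np.kron(ws[2], np.kron(ws[1], ws[0]))
system = 0.01 * (np.eye(big.shape[0]) - big) + np.eye(big.shape[0])
dense = np.linalg.solve(system, f_star.ravel(order="F")).reshape(f_star.shape, order="F")
```

Each call to `predict` costs O(M~ N~ T~), so individual predictions can be served without ever storing the full M × N × T result.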

Thus, component (i, j, k) of the resulting tensor F holds the value of the link of type k between element i of the set X and element j of the set Y; the larger the value, the more likely the link of type k is interpreted to be.
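Reading recommendations off the result then amounts to ranking. The following sketch picks, for one user and one link type, the content elements with the largest predicted values. The nested-list layout `f[i][j][k]`, the function name `top_links`, and the numbers are made up for illustration.

```python
def top_links(f, user, link_type, k=2):
    """Indices j of the k content elements with the largest F[user][j][link_type]."""
    scores = [(row[link_type], j) for j, row in enumerate(f[user])]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [j for _, j in scores[:k]]

# f[i][j][k]: 2 users, 3 content elements, 2 link types (made-up values)
f = [
    [[0.1, 0.9], [0.7, 0.2], [0.4, 0.3]],
    [[0.5, 0.5], [0.2, 0.8], [0.6, 0.1]],
]
recommended = top_links(f, user=0, link_type=0)
```

For user 0 and link type 0, the scores are 0.1, 0.7, and 0.4, so content elements 1 and 2 are ranked first.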

As described above, the multi-type link prediction technique of the present invention has been explained using an online community server as an example. However, the present invention is not limited to this and is applicable to any application of multi-type link prediction, such as collaborative filtering and biological network prediction.

Furthermore, in the above embodiment the multi-type link prediction program is executed on a Web server, but those skilled in the art will understand that it can equally be implemented on a stand-alone computer.

402 ... User information
404 ... Content
406 ... Log
408 ... Calculation module
410 ... Teacher tensor data F^*
412 ... Similarity matrix data
414 ... Eigenvector and eigenvalue data
416 ... Link prediction result

Claims

A system for predicting a multitype link between data of a first set of nodes and data of a second set of nodes by computer processing, comprising:
A memory capable of reading and writing data by the computer;
Data of the first set of nodes stored in the memory;
Data of the second set of nodes stored in the memory;
Means for storing in the memory a set of multi-type link information between the first node and the second node;
Means for calculating, from the multi-type link information, third-order tensor teacher data over the first nodes, the second nodes, and the links between them;
Means for calculating data of a first similarity matrix of the first set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Means for calculating data of a second similarity matrix of the second set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Means for calculating data of a third similarity matrix of the set of multitype link information so that its diagonal component is normalized to 1 or a predetermined value;
Means for calculating an eigenvalue decomposition of each of the first similarity matrix, the second similarity matrix, and the third similarity matrix, and obtaining the eigenvalues and eigenvector matrix data in which the eigenvectors are arranged;
Means for calculating a Kronecker product or Kronecker sum of the eigenvector matrix;
Means for calculating a third-order tensor parameter from the value of the eigenvalue;
Means for calculating multi-type link prediction based on the third-order tensor teacher data, the Kronecker product or Kronecker sum of the eigenvector matrix, and the third-order tensor parameters;
Multi-type link prediction system.

The multi-type link prediction system of claim 1, wherein, when the means for calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker product, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are λ_X^(i), λ_Y^(j), and λ_Z^(k), respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of λ_X^(i) λ_Y^(j) λ_Z^(k).

The multi-type link prediction system of claim 1, wherein, when the means for calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker sum, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are λ_X^(i), λ_Y^(j), and λ_Z^(k), respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of λ_X^(i) + λ_Y^(j) + λ_Z^(k).

A method for predicting a multitype link between data of a first set of nodes and data of a second set of nodes by computer processing, comprising:
Loading data of the first set of nodes into memory of the computer;
Loading data of the second set of nodes into the memory;
Loading a set of multi-type link information between the first node and the second node into the memory;
Calculating, from the multi-type link information, third-order tensor teacher data over the first nodes, the second nodes, and the links between them;
Calculating the first similarity matrix data of the first set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a second similarity matrix of the second set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a third similarity matrix of the set of multitype link information so that its diagonal component is normalized to 1 or a predetermined value;
Calculating an eigenvalue decomposition of each of the first similarity matrix, the second similarity matrix, and the third similarity matrix, and obtaining the eigenvalues and eigenvector matrix data in which the eigenvectors are arranged;
Calculating a Kronecker product or Kronecker sum of the eigenvector matrix;
Calculating a third-order tensor parameter from the value of the eigenvalue;
Calculating multi-type link prediction based on the third-order tensor teacher data, the Kronecker product or Kronecker sum of the eigenvector matrix, and the third-order tensor parameters;
Multi-type link prediction method.

The multi-type link prediction method of claim 4, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker product, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are λ_X^(i), λ_Y^(j), and λ_Z^(k), respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of λ_X^(i) λ_Y^(j) λ_Z^(k).

The multi-type link prediction method according to claim 4, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker sum, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} + \lambda_Y^{(j)} + \lambda_Z^{(k)}$.
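For the sum form, the corresponding standard property is that the Kronecker sum W_X ⊗ I ⊗ I + I ⊗ W_Y ⊗ I + I ⊗ I ⊗ W_Z has eigenvalues equal to all sums of the factors' eigenvalues, with the Kronecker product of the eigenvector matrices as its eigenvectors. A small numerical check, with hypothetical function names and example matrices, might look like this:

```python
import numpy as np

def kronecker_sum(W_x, W_y, W_z):
    """Dense Kronecker sum of three square matrices (for checking only)."""
    I_x, I_y, I_z = (np.eye(W.shape[0]) for W in (W_x, W_y, W_z))
    return (np.kron(np.kron(W_x, I_y), I_z)
            + np.kron(np.kron(I_x, W_y), I_z)
            + np.kron(np.kron(I_x, I_y), W_z))

def sum_tensor_parameter(lam_x, lam_y, lam_z):
    """Tensor whose (i, j, k) entry is lam_x[i] + lam_y[j] + lam_z[k]."""
    return (lam_x[:, None, None]
            + lam_y[None, :, None]
            + lam_z[None, None, :])

# Illustrative unit-diagonal similarity matrices.
W_x = np.array([[1.0, 0.2], [0.2, 1.0]])
W_y = np.array([[1.0, 0.1, 0.0], [0.1, 1.0, 0.3], [0.0, 0.3, 1.0]])
W_z = np.array([[1.0, 0.5], [0.5, 1.0]])

lam_x, V_x = np.linalg.eigh(W_x)
lam_y, V_y = np.linalg.eigh(W_y)
lam_z, V_z = np.linalg.eigh(W_z)

T = sum_tensor_parameter(lam_x, lam_y, lam_z)
S = kronecker_sum(W_x, W_y, W_z)
V = np.kron(np.kron(V_x, V_y), V_z)

# V diagonalizes S, and the diagonal is T flattened in row-major order.
diagonalized = V.T @ S @ V
```

Because the eigenvectors are the same Kronecker product in both the product and the sum case, only the eigenvalue tensor changes between the two variants.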

A program for predicting a multi-type link between data of a first set of nodes and data of a second set of nodes by computer processing,
the program causing the computer to execute the steps of:
Loading data of the first set of nodes into memory of the computer;
Loading data of the second set of nodes into the memory;
Loading a set of multi-type link information between the first node and the second node into the memory;
Calculating, from the multi-type link information, third-order tensor teacher data over the first nodes, the second nodes, and the links therebetween;
Calculating data of a first similarity matrix of the first set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a second similarity matrix of the second set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a third similarity matrix of the set of multitype link information so that its diagonal component is normalized to 1 or a predetermined value;
Calculating an eigenvalue decomposition of each of the first similarity matrix, the second similarity matrix, and the third similarity matrix, and obtaining the eigenvalues and eigenvector matrix data in which the eigenvectors are arranged;
Calculating a Kronecker product or Kronecker sum of the eigenvector matrix;
Calculating a third-order tensor parameter from the value of the eigenvalue;
Calculating multi-type link prediction based on the third-order tensor teacher data, the Kronecker product or Kronecker sum of the eigenvector matrix, and the third-order tensor parameters;
Multi-type link prediction program.

The multi-type link prediction program according to claim 7, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker product, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} \lambda_Y^{(j)} \lambda_Z^{(k)}$.

The multi-type link prediction program according to claim 7, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker sum, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} + \lambda_Y^{(j)} + \lambda_Z^{(k)}$.

A system for predicting a multitype link between data of a first set of nodes and data of a second set of nodes by computer processing, comprising:
A memory capable of reading and writing data by the computer;
Data of the first set of nodes stored in the memory;
Data of the second set of nodes stored in the memory;
Means for storing in the memory a set of multi-type link information between the first node and the second node;
Means for calculating, from the multi-type link information, third-order tensor teacher data over the first nodes, the second nodes, and the links therebetween;
Means for calculating data of a first similarity matrix of the first set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Means for calculating data of a second similarity matrix of the second set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Means for calculating data of a third similarity matrix of the set of multitype link information so that its diagonal component is normalized to 1 or a predetermined value;
Means for determining a low rank approximate decomposition of the first similarity matrix, the second similarity matrix, and the third similarity matrix;
Means for obtaining eigenvalue decomposition of the transposed matrix of the low rank approximate decomposition;
Means for obtaining data of an eigenvector matrix in which the eigenvalues of the low rank approximate decomposition and eigenvectors are arranged using the eigenvalue decomposition;
Means for calculating a Kronecker product or Kronecker sum of the eigenvector matrix;
Means for calculating a third-order tensor parameter from the value of the eigenvalue;
Means for calculating multi-type link prediction based on the third-order tensor teacher data, the Kronecker product or Kronecker sum of the eigenvector matrix, and the third-order tensor parameters;
Multi-type link prediction system.

The multi-type link prediction system according to claim 10, wherein, when the means for calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker product, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} \lambda_Y^{(j)} \lambda_Z^{(k)}$.

The multi-type link prediction system according to claim 10, wherein, when the means for calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker sum, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} + \lambda_Y^{(j)} + \lambda_Z^{(k)}$.

A method for predicting a multitype link between data of a first set of nodes and data of a second set of nodes by computer processing, comprising:
Loading data of the first set of nodes into memory of the computer;
Loading data of the second set of nodes into the memory;
Loading a set of multi-type link information between the first node and the second node into the memory;
Calculating, from the multi-type link information, third-order tensor teacher data over the first nodes, the second nodes, and the links therebetween;
Calculating data of a first similarity matrix of the first set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a second similarity matrix of the second set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a third similarity matrix of the set of multitype link information so that its diagonal component is normalized to 1 or a predetermined value;
Obtaining a low rank approximate decomposition of the first similarity matrix, the second similarity matrix, and the third similarity matrix;
Obtaining eigenvalue decomposition of the transposed matrix of the low rank approximate decomposition;
Using the eigenvalue decomposition to determine the eigenvalue of the low rank approximate decomposition and eigenvector matrix data in which eigenvectors are arranged;
Calculating a Kronecker product or Kronecker sum of the eigenvector matrix;
Calculating a third-order tensor parameter from the value of the eigenvalue;
Calculating multi-type link prediction based on the third-order tensor teacher data, the Kronecker product or Kronecker sum of the eigenvector matrix, and the third-order tensor parameters;
Multi-type link prediction method.
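The low-rank steps of this claim can be illustrated as follows. The sketch assumes that the low-rank approximate decomposition has the form W ≈ G Gᵀ with a tall n × r factor G (the abstract names incomplete Cholesky decomposition as one such technique), and it reads the "eigenvalue decomposition of the transposed matrix" as the decomposition of the small r × r matrix Gᵀ G. Both the reading and the function names `incomplete_cholesky` and `eig_from_low_rank` are assumptions made for this example.

```python
import numpy as np

def incomplete_cholesky(W, rank, tol=1e-10):
    """Pivoted incomplete Cholesky: returns G (n x r) with W ~= G @ G.T."""
    n = W.shape[0]
    G = np.zeros((n, rank))
    d = np.diag(W).astype(float).copy()      # residual diagonal
    for m in range(rank):
        p = int(np.argmax(d))                # pivot on largest residual
        if d[p] <= tol:                      # remaining residual is negligible
            return G[:, :m]
        G[:, m] = (W[:, p] - G[:, :m] @ G[p, :m]) / np.sqrt(d[p])
        d -= G[:, m] ** 2
    return G

def eig_from_low_rank(G):
    """Nonzero eigenpairs of G @ G.T, computed from the r x r matrix G.T @ G."""
    lam, U = np.linalg.eigh(G.T @ G)
    keep = lam > 1e-12 * lam.max()           # drop numerically zero modes
    lam, U = lam[keep], U[:, keep]
    V = (G @ U) / np.sqrt(lam)               # orthonormal eigenvectors of G G^T
    return lam, V
```

If u is an eigenvector of Gᵀ G with eigenvalue λ > 0, then G u / √λ is a unit eigenvector of G Gᵀ with the same eigenvalue, so the cost of the eigenvalue decomposition depends on r rather than on n.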

The multi-type link prediction method according to claim 13, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker product, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} \lambda_Y^{(j)} \lambda_Z^{(k)}$.

The multi-type link prediction method according to claim 13, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker sum, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} + \lambda_Y^{(j)} + \lambda_Z^{(k)}$.

A program for predicting a multi-type link between data of a first set of nodes and data of a second set of nodes by computer processing,
the program causing the computer to execute the steps of:
Loading data of the first set of nodes into memory of the computer;
Loading data of the second set of nodes into the memory;
Loading a set of multi-type link information between the first node and the second node into the memory;
Calculating, from the multi-type link information, third-order tensor teacher data over the first nodes, the second nodes, and the links therebetween;
Calculating data of a first similarity matrix of the first set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a second similarity matrix of the second set of nodes so that its diagonal component is normalized to 1 or a predetermined value;
Calculating data of a third similarity matrix of the set of multitype link information so that its diagonal component is normalized to 1 or a predetermined value;
Obtaining a low rank approximate decomposition of the first similarity matrix, the second similarity matrix, and the third similarity matrix;
Obtaining eigenvalue decomposition of the transposed matrix of the low rank approximate decomposition;
Using the eigenvalue decomposition to determine the eigenvalue of the low rank approximate decomposition and eigenvector matrix data in which eigenvectors are arranged;
Calculating a Kronecker product or Kronecker sum of the eigenvector matrix;
Calculating a third-order tensor parameter from the value of the eigenvalue;
Calculating multi-type link prediction based on the third-order tensor teacher data, the Kronecker product or Kronecker sum of the eigenvector matrix, and the third-order tensor parameters;
Multi-type link prediction program.

The multi-type link prediction program according to claim 16, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker product, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} \lambda_Y^{(j)} \lambda_Z^{(k)}$.

The multi-type link prediction program according to claim 16, wherein, when the step of calculating the Kronecker product or the Kronecker sum of the eigenvector matrix calculates the Kronecker sum, and the eigenvalues of the first similarity matrix, the second similarity matrix, and the third similarity matrix are denoted $\lambda_X^{(i)}$, $\lambda_Y^{(j)}$, and $\lambda_Z^{(k)}$, respectively, the (i, j, k) component of the third-order tensor parameter is determined based on the value of $\lambda_X^{(i)} + \lambda_Y^{(j)} + \lambda_Z^{(k)}$.