JP4080939B2

JP4080939B2 - NETWORK DATA LOW DIMENSIONAL EMBEDDING METHOD, NETWORK DATA LOW DIMENSIONAL EMBEDDING DEVICE, NETWORK DATA LOW DIMENSIONAL EMBEDDING PROGRAM, AND RECORDING MEDIUM CONTAINING THE PROGRAM

Info

Publication number: JP4080939B2
Application number: JP2003115148A
Authority: JP
Inventors: Takeshi Yamada; Kazumi Saito; Naonori Ueda
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-04-21
Filing date: 2003-04-21
Publication date: 2008-04-23
Anticipated expiration: 2023-04-21
Also published as: JP2004318739A

Description

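[0001]
[Technical Field of the Invention]
The present invention relates to a network data low-dimensional embedding method and device that arrange a complex network in a low-dimensional space so that it can be easily viewed on a computer display, and to a network data low-dimensional embedding program used to implement the method and a recording medium on which the program is recorded.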
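[0002]
[Prior Art]
In many research fields, networks or graphs are widely used to represent complex relational data. A network, as used here, is defined as a set consisting of multiple nodes and the links that connect them.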
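[0003]
For example, for the Web, a hyperlink network is commonly used in which each Web page is a node and each hyperlink between pages is a link between nodes. In biology, interactions among genes, proteins, metabolites, and the like are represented as gene regulatory networks. Social networks are used to analyze relationships between people or between social entities such as companies.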
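[0004]
In general, a network representation enables mathematical analysis using tools such as graph theory. At the same time, embedding the network in a low-dimensional space and actually displaying it brings out the structural features of the underlying data, which can give researchers important insights.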
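[0005]
Research on network embedding methods is therefore extremely important from the standpoint of discovery science and machine learning, whose aim is to extract the knowledge inherent in a network and clarify its structural principles. However, as networks grow larger and more complex, embedding them becomes more difficult, and more efficient embedding algorithms are required.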
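[0006]
One basic way to understand and analyze network structure intuitively is "browsing". Browsing means embedding a network in a low-dimensional space and directly examining the resulting layout, for example by following links from node to node or by comparing the connection relationships of individual nodes. An efficient embedding algorithm, in this sense, is one that can produce the embedding of a network into a low-dimensional Euclidean space that is best suited to browsing.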
[0007]
Several methods are known for visualizing a network or graph by embedding it in a low-dimensional Euclidean space, such as a two- or three-dimensional space.
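[0008]
Classical multidimensional scaling (see, for example, Non-Patent Document 1) can be used, and the most representative method is the spring model (see, for example, Non-Patent Document 2). Spectral clustering (see, for example, Non-Patent Document 3) is known as a method that arranges nodes on a sphere.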
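[0009]
These conventional methods assume that a distance is defined between every pair of nodes in the network. For example, the spring model described above computes the graph-theoretic distance for every pair of nodes and seeks a low-dimensional embedding that preserves these distances as faithfully as possible.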
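[0010]
Here, the graph distance between two nodes is the length of the shortest path between them in the graph (that is, the minimum number of links), and can be computed, for example, with Floyd's algorithm (Floyd, R. W., "Algorithm 97 (Shortest Path)", Communications of the ACM, 5(6), 345, 1962).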
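As an illustration of this graph distance, the following is a minimal Python sketch of the all-pairs shortest-path recurrence usually called the Floyd-Warshall algorithm. It assumes an unweighted, undirected network given as a 0/1 adjacency matrix; the function name `graph_distances` is hypothetical.

```python
import math


def graph_distances(adj):
    """All-pairs graph distances (minimum number of links) via Floyd's recurrence.

    adj is an N x N 0/1 adjacency matrix given as a list of lists.
    Pairs with no connecting path get math.inf.
    """
    n = len(adj)
    # Start from direct links: 0 to itself, 1 to a linked node, inf otherwise.
    g = [[0 if i == j else (1 if adj[i][j] else math.inf) for j in range(n)]
         for i in range(n)]
    # Allow each node k in turn as an intermediate stop on a path i -> k -> j.
    for k in range(n):
        for i in range(n):
            gik = g[i][k]
            if gik == math.inf:
                continue
            for j in range(n):
                if gik + g[k][j] < g[i][j]:
                    g[i][j] = gik + g[k][j]
    return g


# A path graph 0 - 1 - 2 - 3.
path = [[1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 1, 1]]
print(graph_distances(path)[0])  # [0, 1, 2, 3]
```

The triple loop costs O(N^3) time, which is one reason distance-preserving methods become expensive on large networks.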
[0011]
In summary, every conventional method can be described as seeking a layout that satisfies the following [Principle B] as closely as possible.
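[0012]
[Principle B] (distance-preserving principle) The positions of each pair of nodes are chosen so that their graph distance is reproduced as faithfully as possible in the low-dimensional Euclidean space.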
[0013]
[Non-Patent Document 1]
Torgerson, W. S., "Theory and Methods of Scaling", Wiley, New York, 1958.
[Non-Patent Document 2]
Kamada, T., & Kawai, S., "An Algorithm for Drawing General Undirected Graphs", Information Processing Letters, 31, 7-15, 1989.
[Non-Patent Document 3]
Ng, A., Jordan, M., & Weiss, Y., "On Spectral Clustering: Analysis and an Algorithm", Proc. of NIPS 14, 2001.
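[0014]
[Problems to Be Solved by the Invention]
However, when the embedding dimension is low and the network is large and complex, the embedding has few degrees of freedom, so an embedding that faithfully preserves distances, as the conventional methods attempt, breaks down.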
[0015]
As a result, the prior art often yields only skewed layouts that are poorly suited to browsing.
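[0016]
The present invention was made in view of these circumstances. Its object is to provide a new network data low-dimensional embedding technique that, instead of following the conventional approach of "faithfully preserving distances", takes the more direct approach of "faithfully preserving connectivity", and thereby produces low-dimensional embeddings of network data that are better suited to browsing.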
[0017]
[Means for Solving the Problems]
Consider a network of N nodes, and let its adjacency matrix be A = (a_{i,j}).
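[0018]
For simplicity of explanation, only undirected networks are treated; that is, we assume a_{i,j} ∈ {0, 1}, a_{i,i} = 1, and a_{i,j} = a_{j,i}. The invention can, however, easily be extended to directed networks.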
[0019]
Here, a_{i,j} ∈ {0, 1} means that a_{i,j} takes the value 1 or 0: a_{i,j} = 1 if nodes i and j are connected, and a_{i,j} = 0 if they are not.
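A small Python sketch of these conventions: it builds A from a list of undirected links, setting the diagonal to 1 and storing each link in both directions. The edge-list input format and the function name `adjacency_matrix` are assumptions made for illustration.

```python
def adjacency_matrix(n, edges):
    """Build the N x N adjacency matrix A = (a_ij) of an undirected network.

    Follows the conventions of [0018]: a_ij in {0, 1}, a_ii = 1, a_ij = a_ji.
    """
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1  # each node counts as connected to itself
    for i, j in edges:
        a[i][j] = a[j][i] = 1  # undirected: store both directions
    return a


A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
for row in A:
    print(row)
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [0, 1, 1, 1]
# [0, 0, 1, 1]
```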
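[0020]
Given a network, the object of the present invention is to find an embedding of the N nodes into a K-dimensional space that realizes the following [Principle A] with respect to the K-dimensional Euclidean distance described below.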
[0021]
[Principle A] (connectivity-preserving principle) Each node has its adjacent nodes (that is, the nodes connected to it by a direct link) placed relatively closer to it than its non-adjacent nodes.
[0022]
To this end, the present invention obtains a better embedding by considering an objective function based on cross-entropy, as described below.
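[0023]
Let x_1, x_2, ..., x_N denote the coordinates of the N nodes in the K-dimensional space (these should properly be written as vectors, but vector notation is omitted for convenience). The Euclidean distance between x_i and x_j is defined by the following equation (1).
[0024]
[Equation 1]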

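The image of equation (1) is missing from this copy. The form below is an assumption: it reads d_{i,j} as the squared Euclidean distance, which makes the similarity ρ(d_{i,j}) = exp(−d_{i,j}/2) chosen in [0035] a Gaussian kernel. If d_{i,j} is instead the unsquared distance, the square is dropped and the gradient expressions derived from it change accordingly. The coordinate notation x_{i,k} (the k-th coordinate of x_i) is introduced only for this sketch.

```latex
d_{i,j} = \lVert x_i - x_j \rVert^{2} = \sum_{k=1}^{K} \left( x_{i,k} - x_{j,k} \right)^{2}
```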
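[0025]
Next, consider a monotonically decreasing function ρ(u) ∈ [0, 1] of u ≥ 0 with ρ(0) = 1 and ρ(∞) = 0. Then ρ(d_{i,j}) can be regarded as a (continuous-valued) similarity function between x_i and x_j.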
[0026]
The basic technical idea of the present invention is to restate [Principle A] in a form that makes the actual optimization more tractable.
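[0027]
First, using this similarity, [Principle A] can be restated as: "each node is placed so that its similarity to its adjacent nodes is greater than its similarity to its non-adjacent nodes."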
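[0028]
Further, using a_{i,j} and ρ(d_{i,j}), [Principle A] can be restated in a somewhat relaxed form: "each node is placed so that the continuous-valued similarity function ρ(d_{i,j}) best approximates the discrete similarity a_{i,j}."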
[0029]
To approximate a_{i,j} by ρ(d_{i,j}), we consider the negative cross-entropy between a_{i,j} and ρ(d_{i,j}) given by the following equation (2).
[0030]
[Equation 2]

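Equation (2) is also missing. A form consistent with [0031], which states that it is minimized exactly when ρ(d_{i,j}) = a_{i,j}, is the standard binary cross-entropy between the label a_{i,j} and the predicted similarity ρ(d_{i,j}); the reconstruction below assumes that form.

```latex
E_{i,j} = -\,a_{i,j} \log \rho(d_{i,j}) \;-\; \left(1 - a_{i,j}\right) \log\bigl(1 - \rho(d_{i,j})\bigr)
```

Because a_{i,j} ∈ {0, 1}, only one term is active for any pair: an adjacent pair pays −log ρ(d_{i,j}), which grows as the nodes move apart, and a non-adjacent pair pays −log(1 − ρ(d_{i,j})), which grows as they move together.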
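[0031]
Equation (2) is minimized with respect to x_i and x_j when ρ(d_{i,j}) = a_{i,j}, that is, when the continuous-valued similarity function coincides exactly with the discrete similarity. Noting that E_{i,j} is symmetric, the total energy function to be minimized with respect to x_1, x_2, ..., x_N is given by the following equation (3).
[0032]
[Equation 3]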

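Equation (3) is likewise missing. Given the symmetry E_{i,j} = E_{j,i}, a natural reading is a sum over unordered pairs; this is an assumption, and summing over all ordered pairs would merely double J. The diagonal terms vanish anyway, since a_{i,i} = 1 and d_{i,i} = 0 give E_{i,i} = −log ρ(0) = 0.

```latex
J = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} E_{i,j}
```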
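[0033]
Now fix a node i. Then a_{i,j} can be interpreted as the binary class label of node j. That is, if a_{i,j} = 1, the class label of j is 1, and since a_{i,i} = 1, j belongs to the same class as i. Conversely, if a_{i,j} = 0, the class label of j is 0, and j belongs to a different class from i.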
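[0034]
The problem can therefore be interpreted as simultaneously solving N binary classification problems parameterized by x_1, x_2, ..., x_N, with equation (3), a standard objective function for classification, as the objective.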
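[0035]
Here, ρ(u) = exp(−u/2) is always used as the similarity function (other functions can also be used). With this choice, the energy function of equation (2) is given by the following equation (4).
[0036]
[Equation 4]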

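Substituting ρ(u) = exp(−u/2) into the binary cross-entropy form of equation (2) gives a worked version of equation (4), using −log ρ(d_{i,j}) = d_{i,j}/2:

```latex
E_{i,j} = \frac{a_{i,j}}{2}\, d_{i,j} \;-\; \left(1 - a_{i,j}\right) \log\!\left(1 - e^{-d_{i,j}/2}\right)
```

Under the squared-distance reading of equation (1), the adjacent-pair term acts as a quadratic spring that pulls linked nodes together, while the non-adjacent-pair term is a repulsion that diverges as d_{i,j} → 0 and vanishes as d_{i,j} → ∞.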
[0037]
Using equation (4), the final objective function is defined by the following equation (5), with a regularization term added.
[0038]
[Equation 5]

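[0039]
Here, μ is a regularization coefficient fixed in advance. Introducing this regularization term stabilizes the algorithm and also makes it possible to control the overall size of the embedded layout.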
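The following Python sketch evaluates the objective. It assumes that equation (5) has the form J = Σ_{i<j} E_{i,j} + (μ/2) Σ_i ‖x_i‖², that is, the pair sum of equation (3) plus an L2 penalty on the coordinates. The factor μ/2, the squared-distance reading of d_{i,j}, and the function names are all assumptions.

```python
import math


def sq_dist(p, q):
    """d_ij, read here as the squared Euclidean distance (assumption)."""
    return sum((pk - qk) ** 2 for pk, qk in zip(p, q))


def pair_energy(a_ij, d_ij):
    """E_ij of equation (4) with rho(u) = exp(-u/2)."""
    if a_ij:
        return d_ij / 2.0  # -log rho(d): grows as linked nodes move apart
    if d_ij <= 0.0:
        return math.inf  # coincident non-linked nodes
    return -math.log1p(-math.exp(-d_ij / 2.0))  # -log(1 - rho(d))


def objective(x, a, mu):
    """J = sum_{i<j} E_ij + (mu/2) * sum_i ||x_i||^2 (assumed form of equation (5))."""
    n = len(x)
    pairs = sum(pair_energy(a[i][j], sq_dist(x[i], x[j]))
                for i in range(n) for j in range(i + 1, n))
    penalty = 0.5 * mu * sum(c * c for xi in x for c in xi)
    return pairs + penalty


# Path 0 - 1 - 2 laid out on a line.
A = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
X = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
print(objective(X, A, mu=1e-4))
```

`math.log1p` is used so that −log(1 − ρ) stays accurate when ρ is tiny, that is, for non-adjacent nodes that are already far apart.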
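[0040]
This objective function has the following property. The optimal layout is taken to be one in which, for every node, the nodes directly linked to it are placed closer than the nodes not directly linked to it, and the closer a layout comes to this optimum, the smaller the function's value becomes, approaching a limiting value.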
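[0041]
To embed data represented as a network of multiple nodes and the links connecting them into a low-dimensional space, the network data low-dimensional embedding device of the present invention comprises: (1) storage means for storing network data consisting of the number of nodes N and an adjacency matrix A = (a_{i,j}), in which a_{i,j}, taking the value 1 or 0, describes the connection between node i and node j; (2) writing means for reading the network data from the storage means and writing it into a working memory; (3) setting means for randomly setting initial values of the embedding positions of all N nodes, N being the node count written in the working memory, and writing them into the working memory; (4) calculation means that uses an objective function J defined on the device, namely the objective function J expressed by equation (5) above, which is derived as the total energy function of the cross-entropies E_{i,j} expressed by equation (4) above, each E_{i,j} being defined from a_{i,j} and the monotonically decreasing function ρ(d_{i,j}) = exp(−d_{i,j}/2) of the Euclidean distance d_{i,j} between node i and node j, which takes its maximum at d_{i,j} = 0 and decreases as d_{i,j} grows; taking as its processing target the embedding positions of the nodes, which are updated successively starting from the initial values set by the setting means, the calculation means calculates the gradient vector at each node's embedding position toward the optimal layout that minimizes J and writes it into the working memory; (5) determination means for determining whether to stop updating the node embedding positions by determining whether the magnitude of the gradient vectors written into the working memory by the calculation means has converged; and (6) updating means that, when the determination means determines not to stop, calculates a variation vector for the node embedding positions written in the working memory on the basis of the gradient vectors written in the working memory, updates those embedding positions using the variation vector, and then calls the calculation means again.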
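[0042]
In this configuration, the calculation means may calculate the gradient vector at each node's embedding position by calculating, for each node, the gradient vector obtained when that node is moved while all other nodes are held fixed.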
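[0044]
Also in this configuration, the device may include selection means for selecting the single node whose embedding position is to be updated, namely the node whose gradient vector, as calculated by the calculation means, has the largest magnitude.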
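[0045]
When this selection means is provided, the determination means may determine whether to stop updating the node embedding positions by determining whether the magnitude of the gradient vector of the node selected by the selection means has converged.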
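[0046]
The network data low-dimensional embedding method of the present invention, realized by the operation of the processing means described above, can be implemented as a computer program, which can be provided recorded on a recording medium such as semiconductor memory or provided via a network.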
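[0047]
In the network data low-dimensional embedding device of the present invention configured as described above, once the initial embedding positions of all nodes have been set randomly, the objective function with the properties described above is applied to these initial positions to calculate gradient vectors, which indicate how each node's embedding position should be improved to approach the optimal layout.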
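[0048]
The device then determines whether to stop updating the node embedding positions by determining whether the magnitude of the calculated gradient vectors has converged. If it determines not to stop, it calculates a variation vector for the node embedding positions based on the gradient vectors and uses it to update the embedding positions so that they move toward the optimal layout.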
[0049]
It then repeats the calculation of the gradient vectors, again using the objective function with the properties described above, this time on the updated embedding positions.
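[0050]
In this way, the device repeatedly calculates gradient vectors for the embedding positions, which are updated successively starting from the initial values, and moves the positions toward the optimal layout. When it determines, based on the magnitude of the calculated gradient vectors, that updating should stop, it outputs the embedding positions obtained at that point as the final result.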
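[0051]
Because the present invention determines each node's embedding position so that the nodes directly linked to it are placed closer than the nodes not directly linked to it, it can produce low-dimensional embeddings of network data that are better suited to browsing.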
[0052]
[Embodiments of the Invention]
Next, embodiments of the present invention are described with reference to the drawings.
[0053]
FIG. 1 shows an embodiment of a network data low-dimensional embedding device 1 according to the present invention.
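[0054]
As shown in the figure, the network data low-dimensional embedding device 1 of the present invention takes the network data stored in a network data database 2 and embeds it in a low-dimensional Euclidean space. To perform this embedding, it comprises an embedding layout initialization unit 10, a gradient vector initialization unit 11, a moving-node selection unit 12, a variation vector calculation unit 13, a gradient vector update unit 14, an embedding layout update unit 15, and a working memory unit 16.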
[0055]
The working memory unit 16 temporarily stores intermediate and final results, such as the node coordinates x_1, x_2, ..., x_N.
[0056]
FIG. 2 is a flowchart showing the operation of the network data low-dimensional embedding device 1 configured as described above.
[0057]
The processing of the device 1 is now described in detail following this flowchart.
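[0058]
First, the number of nodes N, the N × N adjacency matrix A = (a_{i,j}), the embedding dimension K, the regularization constant μ, and the convergence tolerance ε that determines the termination condition are input from the network data database 2 to the embedding layout initialization unit 10 (step 20).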
[0059]
μ and ε may, for example, both be set within the range [10^-8, 10^-4].
[0060]
Next, the embedding layout initialization unit 10 sets t = 1 and randomly initializes the embedding layout of all nodes, that is, the coordinates x_1, x_2, ..., x_N (step 21).
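[0061]
Next, for each node i, the gradient vector initialization unit 11 computes the gradient vector J_{x_i} of the objective function J of equation (5) with respect to x_i, using the following equation (6) (step 22).
[0062]
[Equation 6]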

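Equation (6) is missing. Assuming equation (5) is the pair sum of equation (3) plus a penalty (μ/2) Σ_i ‖x_i‖², the gradient with respect to one node collects the derivatives of the N − 1 pair terms that involve that node plus the derivative of its penalty term:

```latex
J_{x_i} = \frac{\partial J}{\partial x_i} = \sum_{j \ne i} \frac{\partial E_{i,j}}{\partial x_i} + \mu\, x_i
```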
[0063]
Here, differentiating equation (4) shows that the derivative ∂E_{i,j} of E_{i,j} can be computed by the following equation (7).
[0064]
[Equation 7]

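[0065]
Next, the moving-node selection unit 12 selects, according to the following equation (8), the node i whose gradient vector has the largest squared norm ‖J_{x_j}^{(t)}‖² over all nodes j (step 23). Below, the squared norm of the gradient vector is sometimes called the gradient norm.
[0066]
[Equation 8]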

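[0067]
Next, it is checked whether the gradient norm ‖J_{x_i}^{(t)}‖² of the node i selected in step 23 is smaller than the convergence tolerance ε (step 24). If it is, the process is regarded as having converged, the current embedding layout x_1, x_2, ..., x_N is output (step 28), and processing ends.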
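[0068]
If, on the other hand, ‖J_{x_i}^{(t)}‖² is not smaller than ε, the process proceeds to step 25, where the variation vector calculation unit 13 computes the variation vector Δx_i that gives the new coordinates of node i, by the following equation (9) using the Hessian matrix H (where x′ denotes the transpose of x).
[0069]
[Equation 9]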

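Equation (9) is missing as well. Given the description (a Hessian H and a transpose x′), it is presumably the Newton step Δx_i = −H^{−1} J_{x_i}^{(t)}, where H = ∂²J/∂x_i∂x_i′ is the K × K Hessian of J with respect to node i alone. Under the squared-distance reading of equation (1), a linked pair contributes the identity matrix to H, and a non-linked pair contributes −ρ/(1 − ρ) I + ρ/(1 − ρ)² (x_i − x_j)(x_i − x_j)′, plus μ I from the penalty; the negative first term of the non-linked contribution is why H need not be positive definite ([0070]). The sketch below solves the K × K system by Gaussian elimination. The helper names are hypothetical, and a singular H raises ZeroDivisionError, which is not handled here.

```python
import math


def node_hessian_and_grad(i, x, a, mu):
    """K x K Hessian H = d^2 J / dx_i dx_i' and gradient J_{x_i}, other nodes fixed."""
    k = len(x[i])
    h = [[mu if r == c else 0.0 for c in range(k)] for r in range(k)]
    g = [mu * c for c in x[i]]
    for j in range(len(x)):
        if j == i:
            continue
        diff = [p - q for p, q in zip(x[i], x[j])]
        if a[i][j]:
            coef, outer = 1.0, 0.0  # linked pair: identity contribution
        else:
            rho = math.exp(-sum(c * c for c in diff) / 2.0)
            coef, outer = -rho / (1.0 - rho), rho / (1.0 - rho) ** 2
        for r in range(k):
            g[r] += coef * diff[r]
            for c in range(k):
                h[r][c] += (coef if r == c else 0.0) + outer * diff[r] * diff[c]
    return h, g


def solve(m, b):
    """Solve m y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (aug[r][n] - sum(aug[r][c] * y[c] for c in range(r + 1, n))) / aug[r][r]
    return y


def newton_step(i, x, a, mu):
    """Delta x_i = -H^{-1} J_{x_i} (assumed form of equation (9))."""
    h, g = node_hessian_and_grad(i, x, a, mu)
    return [-v for v in solve(h, g)]


A = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
X = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
print(newton_step(0, X, A, mu=1e-4))  # node 0 moves toward node 1
```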
[0070]
However, because H is not necessarily positive definite, J^{(t+1)} < J^{(t)} does not always hold.
[0071]
Therefore, after equation (9) is applied, it is checked whether ΔJ = J^{(t+1)} − J^{(t)} > 0. If so, the result of equation (9) is discarded and the following equation (10) is used instead.
[0072]
[Equation 10]

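Equation (10) is missing too. Based on [0073], which describes a step of length λ along the gradient direction, it is presumably a steepest-descent step; the form below is that reconstruction. Halving λ until J^{(t+1)} < J^{(t)} is one common way to choose the step length, but that particular rule is an assumption, not something the text specifies.

```latex
\Delta x_i = -\,\lambda\, J_{x_i}^{(t)}, \qquad \lambda > 0
```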
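[0073]
Here, the step length λ is chosen so that J^{(t+1)} < J^{(t)}. Because J_{x_i}^{(t)} is the gradient, a sufficiently small λ always decreases the objective function J, and this guarantees that the algorithm converges.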
[0074]
Equation (9) is given priority over equation (10) when computing Δx_i because the variation vector from equation (9) tends to be larger, which improves computational efficiency.
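[0075]
Furthermore, since only the coordinates of node i change while those of all other nodes stay fixed, the terms of J that involve only other nodes have the same value at t + 1 as at t. ΔJ can therefore be computed in O(N) time by evaluating only the difference in the terms related to node i, according to the following equation (11). Since equation (10) only occasionally needs to replace equation (9), this check adds little computational burden.
[0076]
[Equation 11]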

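The O(N) evaluation of ΔJ can be sketched in Python as follows: only the N − 1 pair terms that contain node i and node i's own penalty term are re-evaluated. This assumes the same forms of equations (1), (4), and (5) as above (squared distance, penalty (μ/2)Σ‖x_i‖²); the full O(N²) `objective` is included only to check the shortcut, and all names are hypothetical.

```python
import math


def pair_energy(a_ij, xi, xj):
    """E_ij of equation (4), reading d_ij as the squared distance (assumption)."""
    d = sum((p - q) ** 2 for p, q in zip(xi, xj))
    if a_ij:
        return d / 2.0
    return -math.log1p(-math.exp(-d / 2.0)) if d > 0 else math.inf


def objective(x, a, mu):
    """Full J (assumed form of equation (5)), O(N^2) pairs; used only as a reference."""
    n = len(x)
    return (sum(pair_energy(a[i][j], x[i], x[j]) for i in range(n) for j in range(i + 1, n))
            + 0.5 * mu * sum(c * c for xi in x for c in xi))


def delta_objective(i, x, new_xi, a, mu):
    """J^(t+1) - J^(t) when only node i moves: just the N - 1 pairs touching i."""
    old_xi = x[i]
    delta = 0.5 * mu * (sum(c * c for c in new_xi) - sum(c * c for c in old_xi))
    for j in range(len(x)):
        if j != i:
            delta += pair_energy(a[i][j], new_xi, x[j]) - pair_energy(a[i][j], old_xi, x[j])
    return delta


A = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
X = [[0.0, 0.0], [1.0, 0.2], [2.0, -0.1], [3.0, 0.4]]
moved = [row[:] for row in X]
moved[1] = [0.8, 0.5]
print(delta_objective(1, X, moved[1], A, 1e-3))
print(objective(moved, A, 1e-3) - objective(X, A, 1e-3))  # same value up to rounding
```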
[0077]
Next, the gradient vector update unit 14 updates J_{x_j}^{(t)} to J_{x_j}^{(t+1)} (step 26).
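[0078]
To explain this update: the gradient vector J_{x_j} of the objective function J with respect to x_j is computed from equation (5) by the following equation (12).
[0079]
[Equation 12]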

[0080]
Since J_{x_j}^{(t+1)} and J_{x_j}^{(t)} are related by
J_{x_j}^{(t+1)} = J_{x_j}^{(t)} + ΔJ_{x_j},
equation (12) gives the following relation, equation (13).
[0081]
[Equation 13]

[0082]
Since only the coordinates of node i change while those of all other nodes stay fixed, the following relation, equation (14), holds for h ≠ i, j.
[0083]
[Equation 14]

[0084]
Hence, from equations (14) and (13), J_{x_j}^{(t+1)} and J_{x_j}^{(t)} satisfy the following equation (15).
[0085]
[Equation 15]

[0086]
As can be seen from equations (4) and (7), the following relation, equation (16), holds.
[0087]
[Equation 16]

[0088]
Therefore, from equations (16) and (15), J_{x_j}^{(t+1)} and J_{x_j}^{(t)} satisfy the following equation (17).
[0089]
[Equation 17]

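Equations (12) to (17) are missing from this copy. The chain below is a reconstruction consistent with the surrounding text, assuming equation (5) is the pair sum plus a penalty (μ/2)Σ‖x_i‖²: equation (12) is the gradient with respect to x_j; only the (j, i) term changes when node i alone moves, which gives (13) to (15); and because E_{i,j} depends on x_i and x_j only through x_i − x_j, its two partial derivatives are negatives of each other, which gives (16) and (17). The penalty gradient μ x_j does not change for j ≠ i.

```latex
\begin{aligned}
J_{x_j} &= \sum_{h \ne j} \frac{\partial E_{j,h}}{\partial x_j} + \mu\, x_j
  && (12) \\
\Delta J_{x_j} &= \sum_{h \ne j} \left( \frac{\partial E_{j,h}^{(t+1)}}{\partial x_j} - \frac{\partial E_{j,h}^{(t)}}{\partial x_j} \right)
  && (13) \\
\frac{\partial E_{j,h}^{(t+1)}}{\partial x_j} &= \frac{\partial E_{j,h}^{(t)}}{\partial x_j} \quad (h \ne i, j)
  && (14) \\
J_{x_j}^{(t+1)} &= J_{x_j}^{(t)} + \frac{\partial E_{j,i}^{(t+1)}}{\partial x_j} - \frac{\partial E_{j,i}^{(t)}}{\partial x_j}
  && (15) \\
\frac{\partial E_{j,i}}{\partial x_j} &= -\,\frac{\partial E_{i,j}}{\partial x_i}
  && (16) \\
J_{x_j}^{(t+1)} &= J_{x_j}^{(t)} - \left( \frac{\partial E_{i,j}^{(t+1)}}{\partial x_i} - \frac{\partial E_{i,j}^{(t)}}{\partial x_i} \right)
  && (17)
\end{aligned}
```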
[0090]
Accordingly, in step 26, J_{x_j}^{(t+1)} is obtained from J_{x_j}^{(t)} using equation (17), so updating J_{x_j} can be done in O(N) time.
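The incremental update of step 26 can be sketched in Python as below. It relies on the antisymmetry described for equation (16): for every j ≠ i, only the (i, j) pair term of J_{x_j} changes, and its change is the negative of the change in ∂E_{i,j}/∂x_i. Node i's own gradient is recomputed from scratch, as [0091] describes. The squared-distance reading of d_{i,j}, the penalty form, and the function names are assumptions.

```python
import math


def pair_grad(a_ij, xi, xj):
    """dE_ij/dx_i (assumed forms of equations (1), (4), (7)); dE_ji/dx_j is its negative."""
    diff = [p - q for p, q in zip(xi, xj)]
    if a_ij:
        coef = 1.0
    else:
        rho = math.exp(-sum(c * c for c in diff) / 2.0)
        coef = -rho / (1.0 - rho)
    return [coef * c for c in diff]


def full_grad(j, x, a, mu):
    """J_{x_j} computed from scratch (equation (6)): O(N K) work."""
    g = [mu * c for c in x[j]]
    for h in range(len(x)):
        if h != j:
            g = [u + v for u, v in zip(g, pair_grad(a[j][h], x[j], x[h]))]
    return g


def move_node(i, new_xi, x, grads, a, mu):
    """Move node i to new_xi and update every stored gradient in place."""
    old_xi = x[i]
    for j in range(len(x)):
        if j == i:
            continue
        g_new = pair_grad(a[i][j], new_xi, x[j])
        g_old = pair_grad(a[i][j], old_xi, x[j])
        # Only the (i, j) term of J_{x_j} changes, and dE_ji/dx_j = -dE_ij/dx_i.
        grads[j] = [gj - (gn - go) for gj, gn, go in zip(grads[j], g_new, g_old)]
    x[i] = list(new_xi)
    grads[i] = full_grad(i, x, a, mu)  # node i itself is recomputed, as in [0091]


A = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
X = [[-1.0, 0.0], [0.0, 0.5], [1.0, 0.0]]
G = [full_grad(j, X, A, 1e-3) for j in range(3)]
move_node(1, [0.2, -0.3], X, G, A, 1e-3)
print(G[0], full_grad(0, X, A, 1e-3))  # incremental and full values agree
```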
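[0091]
J_{x_i}^{(t+1)}, however, cannot be obtained from J_{x_i}^{(t)} in this way, because the coordinates of node i itself have changed; it is therefore recomputed from scratch using equation (6).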
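[0092]
Next, in step 27, the embedding layout update unit 15 updates the node coordinates x_1, x_2, ..., x_N using the variation vector Δx_i of node i computed by the variation vector calculation unit 13, sets t = t + 1, and returns to the processing of the moving-node selection unit 12 (step 23).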
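Putting steps 20 to 28 together, the following Python sketch is a simplified, hypothetical implementation rather than the device itself. It always takes the gradient step of equation (10), halving λ until J decreases, instead of trying the Newton step of equation (9) first; and for brevity it recomputes every gradient each iteration instead of using the O(N) update of equation (17). It relies on the same assumed forms of equations (1), (4), and (5) as the sketches above, and the names, default parameters, and random initialization range are illustrative choices.

```python
import math
import random


def pair_terms(a_ij, xi, xj):
    """(E_ij, dE_ij/dx_i) under the assumed forms of equations (1), (4) and (7)."""
    diff = [p - q for p, q in zip(xi, xj)]
    d = sum(c * c for c in diff)
    if a_ij:
        return d / 2.0, diff
    rho = math.exp(-d / 2.0)
    if rho >= 1.0:  # coincident non-linked nodes: energy is infinite
        return math.inf, [0.0] * len(diff)
    return -math.log1p(-rho), [-rho / (1.0 - rho) * c for c in diff]


def local_energy(i, xi, x, a, mu):
    """All terms of J that change when node i is placed at xi."""
    e = 0.5 * mu * sum(c * c for c in xi)
    for j in range(len(x)):
        if j != i:
            e += pair_terms(a[i][j], xi, x[j])[0]
    return e


def gradient(i, x, a, mu):
    """J_{x_i} as in equation (6)."""
    g = [mu * c for c in x[i]]
    for j in range(len(x)):
        if j != i:
            g = [u + v for u, v in zip(g, pair_terms(a[i][j], x[i], x[j])[1])]
    return g


def embed(a, k=2, mu=1e-4, eps=1e-6, max_iter=20000, seed=0):
    """Simplified sketch of steps 20 to 28 using only the gradient step (10)."""
    rng = random.Random(seed)
    n = len(a)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]  # step 21
    for _ in range(max_iter):
        grads = [gradient(i, x, a, mu) for i in range(n)]  # steps 22 and 26
        norms = [sum(c * c for c in g) for g in grads]
        i = max(range(n), key=norms.__getitem__)  # step 23, equation (8)
        if norms[i] < eps:  # step 24: converged
            break
        old = local_energy(i, x[i], x, a, mu)
        lam = 1.0
        while lam > 1e-12:  # step 25: halve lambda until J decreases
            cand = [c - lam * g for c, g in zip(x[i], grads[i])]
            if local_energy(i, cand, x, a, mu) < old:
                x[i] = cand  # step 27
                break
            lam /= 2.0
        else:
            break  # no decreasing step found
    return x  # step 28


X = embed([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
print([round(math.dist(X[p], X[q]), 3) for p, q in [(0, 1), (1, 2), (0, 2)]])
```

For the three-node path, the two linked pairs should settle at a distance close to sqrt(ln 3 / 2) ≈ 0.741, with the unlinked end nodes roughly twice as far apart. Because only the node with the largest gradient norm moves in each iteration, a full implementation can use the O(N) bookkeeping of [0075] and [0090].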
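[0093]
In this way, the network data low-dimensional embedding device 1 of the present invention minimizes an objective function based on the cross-entropy of equation (2) and thereby determines each node's embedding position so that, for every node, the nodes directly linked to it are placed closer than the nodes not directly linked to it.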
[0094]
Next, the effectiveness of the present invention is demonstrated by experiments on real data.
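[0095]
(a) Data used in the experiments
To verify the effectiveness of the present invention, experiments were conducted on three networks of different types constructed from real data.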
[0096]
The first network, from biology, is the gene regulatory network of Escherichia coli, taken from Shen-Orr, S. S., Milo, R., Mangan, S., & Alon, U., "Network motifs in the transcriptional regulation network of Escherichia coli", Nature Genetics, 31(1), 64-68, 2002.
[0097]
The second network is the co-author network of the papers presented at the first through twelfth NIPS (Neural Information Processing Systems) conferences, obtained from Roweis, S. T., "Data for MATLAB hackers: NIPS conference papers Vols 0-12", http://www.cs.toronto.edu/roweis/data.html, 2002. In this network, each author is a node, and two authors are linked if they have published at least one paper together.
[0098]
The third network is a WWW (World Wide Web) hyperlink network, constructed by collecting all WWW pages belonging to the applicant's site.
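[0099]
These networks were first converted into undirected networks, and only their largest connected components were used. In general, it is natural to process each connected component separately in this way.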
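[0100]
FIG. 3 shows statistics of the extracted networks. N is the number of nodes, and L^T, L^A, and L^M denote the total, average, and maximum number of links of each network, respectively. Letting L_i be the number of links directly attached to node i, these quantities are related by the following equation (18).
[0101]
[Equation 18]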

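Equation (18) is missing. The definitions below are consistent with the text, assuming that L_i excludes the self-entry a_{i,i} = 1 and that the "average number of links" means the average degree; the factor 1/2 in L^T counts each undirected link once.

```latex
L_i = \sum_{j \ne i} a_{i,j}, \qquad
L^{T} = \frac{1}{2} \sum_{i=1}^{N} L_i, \qquad
L^{A} = \frac{1}{N} \sum_{i=1}^{N} L_i, \qquad
L^{M} = \max_{i} L_i
```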
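[0102]
Next, G^A and G^M are defined as the average and the maximum, respectively, of the graph distance between any two nodes of each network. Letting g_{i,j} be the graph-theoretic distance between nodes i and j, they are given by the following equation (19).
[0103]
[Equation 19]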

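As a sketch of how the FIG. 3 statistics could be computed, the Python function below runs a breadth-first search from every node of a connected network (for example, the largest connected component, as in [0099]). It assumes G^A = (2/(N(N − 1))) Σ_{i<j} g_{i,j} and G^M = max g_{i,j}, together with the link statistics just described; these definitions and the name `network_statistics` are assumptions.

```python
from collections import deque


def network_statistics(a):
    """N, L^T, L^A, L^M, G^A, G^M for a connected undirected network (assumed definitions)."""
    n = len(a)
    degree = [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]  # L_i
    total_dist, max_dist = 0, 0
    for s in range(n):  # breadth-first search from s gives g_{s,j} for all j
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if v != u and a[u][v] and dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total_dist += sum(dist[s + 1:])  # count each unordered pair once
        max_dist = max(max_dist, max(dist))
    pairs = n * (n - 1) / 2
    return {
        "N": n,
        "L^T": sum(degree) // 2,
        "L^A": sum(degree) / n,
        "L^M": max(degree),
        "G^A": total_dist / pairs,
        "G^M": max_dist,
    }


stats = network_statistics([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
print(stats)
```

Breadth-first search is used instead of Floyd's algorithm because the networks are sparse, so N searches cost O(N(N + L^T)) rather than O(N^3) on an adjacency-list representation; the dense-matrix loop above is kept only for brevity.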
[0104]
The adjacency matrices of these networks are all sparse, but the networks differ in each of the statistics. The third network in particular is much larger and more complex than the first and second networks.
[0105]
(b) Evaluation measure
Measuring the effectiveness of the present invention requires an appropriate evaluation measure, defined as follows.
[0106]
Suppose an embedding of N nodes into a K-dimensional Euclidean space has been obtained, with coordinates x_1, x_2, ..., x_N.
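[0107]
For each x_i, consider the K-dimensional ball B_i(r_i) centered at x_i with radius r_i. In an ideal embedding that fully realizes [Principle A], choosing an appropriate radius r_i for each node i would yield a ball B_i(r_i) that contains every node adjacent to i (that is, directly linked to i) and no node that is not adjacent to i.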
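[0108]
In actual embeddings, however, especially of complex networks in a low dimension K, it is generally impossible to choose such an r_i for every node i. One can nevertheless define the optimal r_i under a suitable measure.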
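[0109]
For sparse networks in particular, non-adjacent nodes vastly outnumber adjacent ones, so ordinary accuracy, which counts the nodes correctly included in B_i(r_i) plus the nodes correctly excluded from it, is not an appropriate measure: accuracy is high even when B_i(r_i) contains no nodes at all, adjacent or not. In such cases the F-measure, widely used in fields such as information retrieval and text classification, is appropriate.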
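[0110]
The F-measure is defined as the harmonic mean of precision and recall. Writing #X for the number of elements of a set X, the precision P_i(r_i) of the i-th ball B_i(r_i), determined by x_i and r_i, is defined by the following equation (20).
[0111]
[Equation 20]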

[0112]
Similarly, the recall R_i(r_i) of the i-th ball B_i(r_i) is defined by the following equation (21).
[0113]
[Equation 21]

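Equations (20) and (21) are missing. The usual definitions are reconstructed below, assuming B_i(r_i) is read as the set of nodes j ≠ i lying within distance r_i of x_i, and writing 𝒩_i for the set of nodes adjacent to i (notation introduced only here). Whether node i itself is counted in the ball is not stated in the text; this sketch excludes it.

```latex
\begin{aligned}
B_i(r_i) &= \{\, j \ne i : \lVert x_j - x_i \rVert \le r_i \,\}, &
\mathcal{N}_i &= \{\, j \ne i : a_{i,j} = 1 \,\}, \\
P_i(r_i) &= \frac{\#\bigl(B_i(r_i) \cap \mathcal{N}_i\bigr)}{\# B_i(r_i)}, &
R_i(r_i) &= \frac{\#\bigl(B_i(r_i) \cap \mathcal{N}_i\bigr)}{\# \mathcal{N}_i}
\end{aligned}
```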
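[0114]
Roughly speaking, high precision requires a small r_i and high recall requires a large r_i, so the optimal r_i is determined by the balance between the two. More precisely, the optimal radius ⟨r_i⟩ can be defined as the radius that maximizes the F-measure of the following equation (22). In the experiments, α = 1/2 is always used.
[0115]
[Equation 22]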

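Equation (22) is missing. A form consistent with "harmonic mean" and a weight α is the weighted harmonic mean of precision and recall, which reduces to the plain harmonic mean 2 P_i R_i / (P_i + R_i) at α = 1/2, the value used in the experiments; the exact form in the patent is an assumption here.

```latex
F_i(r_i) = \left( \frac{\alpha}{P_i(r_i)} + \frac{1 - \alpha}{R_i(r_i)} \right)^{-1},
\qquad
\langle r_i \rangle = \operatorname*{arg\,max}_{r_i} F_i(r_i)
```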
[0116]
Averaging equation (22) over all nodes gives the "connectivity F-measure" of the following equation (23), which quantitatively evaluates the layout of a network embedded in a low-dimensional space.
[0117]
[Equation 23]

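A Python sketch of the connectivity F-measure, presumably F = (1/N) Σ_i F_i(⟨r_i⟩), follows. For each node it sorts the other nodes by distance and tries every radius at which the ball gains a node. With α = 1/2, precision hits/k and recall hits/#𝒩_i combine to F = 2·hits/(k + #𝒩_i), where k is the number of nodes in the ball. The ball excludes node i itself, and radii that would split nodes at equal distance are skipped; both are assumptions, as is the function name.

```python
import math


def connectivity_f_measure(x, a):
    """Average over nodes of the best F-measure (alpha = 1/2) over ball radii r_i."""
    n = len(x)
    total = 0.0
    for i in range(n):
        others = sorted((math.dist(x[i], x[j]), a[i][j]) for j in range(n) if j != i)
        n_adj = sum(adj for _, adj in others)
        best, hits = 0.0, 0
        for k, (d, adj) in enumerate(others, start=1):
            hits += adj
            # A ball of radius d contains every node at distance d, so skip ties.
            if k < len(others) and others[k][0] == d:
                continue
            if n_adj:
                best = max(best, 2.0 * hits / (k + n_adj))  # harmonic mean of P and R
        total += best
    return total / n


path = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(connectivity_f_measure([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]], path))  # 1.0
print(connectivity_f_measure([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]], path))
```

In the second call, the unlinked pair 0 and 2 sits closer together than the linked pair 0 and 1, so the score drops to 7/9.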
[0118]
The experiments conducted to verify the effectiveness of the present invention used the connectivity F-measure of equation (23) as the evaluation measure.
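[0119]
(c) Comparison using the connectivity F-measure
The proposed method and three existing methods (classical multidimensional scaling, the spring model, and spectral clustering) were each applied to the three networks described above to obtain embeddings into a K-dimensional space, and the results were evaluated with the connectivity F-measure. K was varied from 2 to 25 for the first network and from 2 to 9 for the second and third networks.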
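[0120]
FIG. 4 shows graphs of the evaluation results. The vertical axis is the connectivity F-measure and the horizontal axis is the embedding dimension K. The labels in the figure denote: CE, the proposed method; CMDS, classical multidimensional scaling; KK, the spring model; SC, spectral clustering.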
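[0121]
A higher connectivity F-measure indicates a better method. Because the proposed method and the spring model can get trapped in local optima, results from five runs with different random initial values are shown for each. These results show almost no variation, so the effect of local optima is negligible. For every method, performance improves monotonically as the dimension K increases and the embedding gains degrees of freedom.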
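[0122]
As expected, FIG. 4 shows that the proposed method outperforms the other methods, especially in low dimensions, where layout is most difficult. The performance gap between the proposed method and the conventional methods is especially pronounced in the experiment on the third network, which is the largest.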
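[0123]
(d) Results of embedding into two dimensions
The left side of each of FIGS. 5(a) to 5(c) shows the actual two-dimensional embedding of the third network obtained by classical MDS (CMDS), the spring model (KK), and the proposed method (CE), respectively. Spectral clustering is omitted because it produces a layout on a three-dimensional sphere.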
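[0124]
In the classical multidimensional scaling result of FIG. 5(a), many nodes collapse onto a single point, which is problematic for visualization and especially for browsing. This is thought to reflect the limitations of a linear method.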
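[0125]
By contrast, the spring model of FIG. 5(b) and the proposed method of FIG. 5(c) show little of this node collapse. On closer inspection, however, the spring model packs many nodes into a semicircular cluster, whereas the proposed method spreads them out more uniformly and radially. The proposed method thus makes more efficient use of the space, which can be seen as the source of the difference in their connectivity F-measures. This is thought to be because the graph distances that the spring model tries to reproduce take only discrete values. The spring-model layout makes browsing difficult, particularly in regions where nodes are densely packed.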
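[0126]
In the left-hand panel of each of FIGS. 5(a) to 5(c), the region enclosed by a dotted line was extracted, and only that part was recomputed and embedded; the results are shown in the corresponding right-hand panels. All six dotted regions, before and after extraction, correspond to the same subnetwork.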
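[0127]
As the figure readily shows, with classical multidimensional scaling and the spring model the layout changes considerably between before and after extraction, whereas with the proposed method it changes relatively little. The proposed method can therefore be considered to have higher extraction stability, which further indicates that it is an embedding method better suited to browsing.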
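[0128]
[Effects of the Invention]
As described above, when data represented as a network of multiple nodes and the links connecting them is to be visualized by embedding it in a low-dimensional Euclidean space of two or three dimensions, the present invention minimizes an objective function based on cross-entropy and thereby determines each node's embedding position so that, for every node, the nodes directly linked to it are placed closer than the nodes not directly linked to it. It can therefore produce embeddings that use space more efficiently and are better suited to browsing than those of conventional methods.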
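[Brief Description of the Drawings]
[FIG. 1] An embodiment of the network data low-dimensional embedding device of the present invention.
[FIG. 2] A flowchart showing the operation of the network data low-dimensional embedding device of the present invention.
[FIG. 3] An explanatory diagram of statistics of the networks used in the experiments conducted to verify the effectiveness of the present invention.
[FIG. 4] Graphs showing the evaluation results of the experiments conducted to verify the effectiveness of the present invention.
[FIG. 5] An explanatory diagram of the two-dimensional embedding results in the experiments conducted to verify the effectiveness of the present invention.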
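[Explanation of Reference Numerals]
1 Network data low-dimensional embedding device
2 Network data database
10 Embedding layout initialization unit
11 Gradient vector initialization unit
12 Moving-node selection unit
13 Variation vector calculation unit
14 Gradient vector update unit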
１５埋め込み配置更新部[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a network data low-dimensional embedding method and apparatus, and a network data low-dimensional embedding method that make it easy to view on a computer display by arranging a complex network in a low-dimensional space. The present invention relates to a network data low-dimensional embedded program used for realization and a recording medium on which the program is recorded.
[0002]
[Prior art]
In many research fields, networks or graphs are often used as a means of representing complex relational data. The network described here is defined as a set of a plurality of nodes and links connecting the nodes.
[0003]
For example, when targeting the Web, a hyperlink network in which each Web page is a node and a hyperlink between pages is a link between nodes is often used. In biology, genes, proteins, metabolites, etc. The interaction is expressed using a gene regulatory network. A social network is used to analyze the relationship between humans or social entities such as companies.
[0004]
In general, network representation enables mathematical analysis that makes full use of graph theory, and at the same time, by embedding it in a low-dimensional space and actually displaying it, the structural features of the underlying data are highlighted, resulting in research. Gaining insights that are important to the
[0005]
In this way, research on network embedding methods is extremely important from the standpoint of discovery science and machine learning, in which knowledge inherent in networks is acquired and structural principles are clarified. However, as the network size increases and becomes more complex, network embedding becomes more difficult and a more efficient embedding algorithm is required.
[0006]
One of the basic methods for intuitively understanding and analyzing the network structure is “browsing”. Browsing refers to directly examining a low-dimensional arrangement such as embedding a network in a low-dimensional space, following links from node to node on the arrangement, and comparing the connection relationship of each node. An efficient embedding algorithm is considered to be an algorithm that can generate an embedding in a low-dimensional Euclidean space of a network that is most suitable for browsing.
[0007]
Several methods for visualizing a network or graph by embedding in a low-dimensional Euclidean space such as two-dimensional or three-dimensional are known.
[0008]
It is possible to use a classic multidimensional scaling method (for example, see Non-Patent Document 1), and the most typical method is a spring model (for example, see Non-Patent Document 2). Spectral clustering (see, for example, Non-Patent Document 3) is known as a method of arranging on a spherical surface.
[0009]
These conventional methods assume that there is a distance between any pair of nodes in the network. For example, in the above-described spring model, a graph distance (graph-theoretic distance) is calculated for an arbitrary pair of nodes, and embedding in a low dimension is considered so that the distance is faithfully preserved.
[0010]
Here, the graph distance is the shortest path length (ie, the minimum number of links) on the graph between two nodes. For example, the Floyd algorithm (Floyd, RW “Algorithm 97 (Shortest path)”, Communications of the ACM, 5 (6), 345, 1962).
[0011]
In summary, all of the conventional methods can be said to be methods for obtaining an arrangement that satisfies the following [Principle B] as much as possible.
[0012]
[Principle B] (distance preserving principle) Each node pair determines the mutual arrangement so that the mutual graph distance is most faithfully restored in the low-dimensional Euclidean space.
[0013]
[Non-Patent Document 1]
Torgerson, W.S., "Theory and Methods of Scaling", Wiley, New York, 1958.
[Non-Patent Document 2]
Kamada, T., & Kawai, S. "An Algorithm for Drawing General Undirected Graph", Information Processing Letters, 32, 7-15, 1989.
[Non-Patent Document 3]
Ng, A., Jordan, M. & Weiss, Y. "On spectral clustering: Analysis and an algorithm" Proc. Of NIPS 14,2001.
[0014]
[Problems to be solved by the invention]
However, especially in the case of a large and complex network with a low embedding dimension, the degree of freedom of embedding is small, and embedding that preserves the distance faithfully as in the conventional method will fail.
[0015]
As a result, according to the prior art, it is often possible to obtain only an offset arrangement that is unsuitable for browsing.
[0016]
The present invention has been made in view of such circumstances, and does not take the approach of “save the distance faithfully” as in the conventional method, but more directly “save the connection relationship faithfully”. By taking an approach, the object is to provide a new network data low-dimensional embedding technique that realizes embedding of network data more suitable for browsing in a low-dimensional space.
[0017]
[Means for Solving the Problems]
Now, consider a network of N nodes, and let the adjacency matrix be A = (a_{i, j}).
[0018]
For simplicity of explanation, only undirected networks will be handled. That is, a_{i, j}∈ {0,1}, a_{i, i}= 1 and a_{i, j}= A_{j, i}Assume that However, the present invention can be easily extended to directed networks.
[0019]
Where a_{i, j}∈ {0,1} is a_{i, j}Indicates that the value of 1 is 1 or 0, and if i and j are connected, a_{i, j}= 1, i and j are not connected_{i, j}= 0.
[0020]
For a given network, an object of the present invention is to seek embedding of N nodes in the K-dimensional space that realizes the following [Principle A] in the meaning of the K-dimensional Euclidean distance described below. .
[0021]
[Principle A] (connectivity preserving principle) Each node arranges a node adjacent to it (that is, a node connected by a direct link) relatively closer to a non-adjacent node.
[0022]
Therefore, in the present invention, more optimal embedding is obtained by considering an objective function based on cross-entropy as described below.
[0023]
The coordinates of N nodes in the K-dimensional space are x₁, x₂, ..., x_N(Originally, it should be expressed as a vector, but is not expressed as a vector for convenience of description). Where x_iAnd x_jThe Euclidean distance between is defined as the following formula (1).
[0024]
[Expression 1]

[0025]
Next, consider a monotonically decreasing function ρ (u) ∈ [0, 1] for u ≧ 0, and ρ (0) = 1, ρ (∞) = 0. At this time, ρ (d_{i, j}) Is x_i, x_jIt can be regarded as a similarity function between (taking continuous values).
[0026]
The basic technical idea of the present invention is to paraphrase [Principle A] so that actual optimization becomes easier to handle.
[0027]
First, using this similarity, [Principle A] can be rephrased as “each node is arranged such that the similarity between an adjacent node and itself is greater than the similarity between a non-adjacent node”.
[0028]
In addition, a_{i, j}And ρ (d_{i, j}) [Principle A] is: “Each node has a similarity function ρ (d_{i, j}) Is the discrete similarity a_{i, j}In other words, the conditions are somewhat relaxed.
[0029]
Therefore, ρ (d_{i, j}) By a_{i, j}To approximate a_{i, j}And ρ (d_{i, j}Consider a negative cross-entropy as shown in the following equation (2).
[0030]
[Expression 2]

[0031]
This equation (2) is expressed as x_i, X_jΡ (d_{i, j}) = A_{i, j}I.e., when the continuous value similarity function completely matches the discrete similarity. E_{i, j}Note that is symmetric, x₁, x₂, ..., x_NThe total energy function to be minimized with respect to is given by equation (3) below.
[0032]
[Equation 3]

[0033]
Here, let's fix node i. Then a_{i, j}Can be interpreted as the binary class label of node j. That is, a_{i, j}If = 1, the label of the class to which j belongs is 1 and a_{i, i}Note that = 1, j belongs to the same class as i. On the other hand, a_{i, j}When = 0, the label of the class to which the node j belongs is 0, and j belongs to a class different from i.
[0034]
Therefore, this problem is expressed by using equation (3) as a standard objective function as a classification problem.₁, x₂, ..., x_NCan be interpreted as simultaneously solving N binary classification problems.
[0035]
Here, ρ (u) = exp (−u / 2) is always used as the similarity function (other functions can also be used). If this is used, the energy function of Formula (2) is given by the following Formula (4).
[0036]
[Expression 4]

[0037]
Using this equation (4), the final objective function is defined as the following equation (5) with a regularization term added.
[0038]
[Equation 5]

[0039]
Here, μ is a regularization coefficient and is determined in advance. By introducing such regularization terms, the algorithm is stable and at the same time the size of the embedded arrangement can be controlled.
[0040]
This objective function is considered to be the optimal arrangement when a node that is directly linked to its own node is arranged closer to the node than a node that is not directly linked to the node, and approaches the optimal arrangement. As the value becomes smaller, the value approaches a certain value.
[0041]
  The network data low-dimensional embedding device according to the present invention performs (1) the number of nodes in order to embed and arrange data represented by a network composed of a plurality of nodes and links connecting the nodes in a low-dimensional space. N, and the connection relationship between node i and node j is a indicating a value of 1, 0_{i, j}The adjacency matrix A = (a_{i, j}And (2) reading the network data from the storage means and writing it into the working memory.writeMeans, (3) setting means for randomly setting the initial values of the embedding positions of all nodes indicated by the number N of nodes written in the working memory, and writing them into the working memory, and (4) defined on the apparatus Objective function J, Euclidean distance d between node i and node j_{i, j}With d as the variable_{i, j}= 0 indicates maximum value, d_{i, j}Monotonically decreasing function ρ (d which shows a small value as the value of_{i, j})= Exp (-d _{i, j} / 2)And a_{i, j}Defined byIt is represented by the above formula (4)Cross entropy E_{i, j} ofDerived as a total energy functionIt is expressed by the above formula (5)Using the objective function J, the embedding position of each node that is sequentially updated starting from the initial value set by the setting means is used as a processing target, and the embedding position of each node toward the optimal arrangement that minimizes the objective function JGradient vectorA calculating means for calculating and writing to the working memory; and (5) the calculating means writes to the working memory.By determining whether the gradient vector has converged,Determining means for determining whether or not the update of the embedded position of the code is terminated;If you decide not to cutTogetherThen, based on the gradient vector written in the working memory, a variation vector of the node embedding position written in the working memory is calculated, and the variation vector is used 
to create the variation vector.Update the node embedding position written in the industrial memory.Newly call up the calculation methodNew means.
[0042]
  When configured in this way, the calculation means, for each node, determines the gradient vector when the other node is fixed and moved.Calculating the gradient vector at the embedding position of each node.May be calculated.
[0044]
  In addition, when configured in this way, the calculation means calculatesThe magnitude of the gradient vectorThere may be provided selection means for selecting one node that is an update target of the embedding position by selecting the node that shows the largest value.
[0045]
  When this selection means is provided, the determination means is for the node selected by the selection means.By determining whether the magnitude of the gradient vector has converged,It may be determined whether or not the update of the embedded position of the card is aborted.
[0046]
The network data low-dimensional embedding method of the present invention realized by the operation of each of the above processing means can be realized by a computer program, and this computer program is provided by being recorded on a recording medium such as a semiconductor memory. Or can be provided via a network.
[0047]
  In the network data low-dimensional embedding device of the present invention configured as described above, when the initial values of the embedding positions of all nodes are set at random, the objective function having the above-described properties is used, and the set initial values are processed. As an improvement indicator for the embedding position of each node for optimal placementThe gradient vectorcalculate.
[0048]
  And calculate thatBy determining whether the magnitude of the gradient vector has converged,Determine whether to cancel the update of the embedded position of theIf it is determined not to cut, a variation vector of the node embedding position is calculated based on the gradient vector, and the variation vector is used to calculate the variation vector.The node embedding position is updated so as to reach an appropriate arrangement.
[0049]
  Then, using the objective function with the above properties, the embedded position of the updated nodeTheImproving the embedding position of each node for optimal placement as a processing targetThe gradient vectorCalculateRepeat that.
[0050]
  In this way, using the objective function having the above-described properties, the embedding position of each node that is sequentially updated starting from the initial value is set as the processing target, and the improvement position of the embedding position of each node toward the optimal arrangement is specified.The gradient vectorBy repeating the calculation, the calculation is performed when updating the node embedding position so that the optimal placement is reached.The magnitude of the gradient vectorBased on this, when it is determined that the update of the node embedding position is to be aborted, the embedding position of each node obtained at that time is output as the final result.
[0051]
As described above, in the present invention, for each node, the embedding position of each node is determined so that the node directly connected to the own node is arranged closer to the node not directly connected to the node. For this reason, it becomes possible to embed network data suitable for browsing in a low-dimensional space.
[0052]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments of the present invention will be described with reference to the drawings.
[0053]
FIG. 1 shows an example of an embodiment of a network data low-dimensional embedding device 1 comprising the present invention.
[0054]
As shown in this figure, the network data low-dimensional embedding device 1 according to the present invention uses network data stored in the network data database 2 as a processing target and embeds the processing target network data in a low-dimensional Euclidean space. In order to realize this embedding process, an embedding arrangement initialization unit 10, a gradient vector initialization unit 11, a mobile node selection unit 12, a variation vector calculation unit 13, a gradient vector The update unit 14, the embedded arrangement update unit 15, and the work memory unit 16 are provided.
[0055]
Here, the working memory unit 16 is arranged to arrange the coordinates x of each node.₁, x₂, ..., x_NIt is prepared for temporarily storing intermediate results and final results.
[0056]
FIG. 2 is a flowchart showing the operation of the network data low-dimensional embedding device 1 of the present invention configured as described above.
[0057]
Next, according to this flowchart, the process of the network data low-dimensional embedding apparatus 1 of this invention comprised in this way is demonstrated in detail.
[0058]
In the network data low-dimensional embedding device 1 according to the present invention, first, the adjacency matrix A = (a_{i, j}), The embedding dimension K, the constant μ for regularization, and the convergence accuracy constant ε for determining the termination condition are input to the embedding arrangement initialization unit 10 (step 20).
[0059]
Here, for μ and ε, for example, both μ and ε are μ, ε∈ [10^-8, 10^-Four] May be set within the range.
[0060]
Next, in the embedded arrangement initialization unit 10, t = 1 and the embedded arrangement for all nodes, that is, the arrangement coordinates x of each node₁, x₂, ..., x_NAre initially set at random (step 21).
[0061]
Next, in the gradient vector initialization unit 11, for each node i, x of the objective function J defined by the above equation (5)._iThe gradient vector J_xiIs calculated by the following equation (6) (step 22).
[0062]
[Formula 6]

[0063]
Here, the derivative ∂E_{i,j} of E_{i,j} is obtained by differentiating equation (4) above and is given by the following equation (7).
[0064]
[Expression 7]

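The pairwise terms behind equations (6) and (7) can be sketched in Python. Equations (4) and (5) are not reproduced in this section, so the sketch assumes that E_{i,j} is the standard binary cross entropy between a_{i,j} and ρ(d_{i,j}) = exp(−d_{i,j}/2), with d_{i,j} read as the plain Euclidean distance ‖x_i − x_j‖ as the claims state, and that J is the sum of E_{i,j} over pairs i < j. The regularization constant μ is left out because its role is not stated here, and all function names are illustrative.

```python
import math


def rho(d):
    """rho(d) = exp(-d / 2): equals 1 at d = 0 and decreases monotonically in d."""
    return math.exp(-d / 2.0)


def pair_energy(xi, xj, a):
    """Assumed cross entropy E_ij = -a log rho(d) - (1 - a) log(1 - rho(d))."""
    d = max(math.dist(xi, xj), 1e-12)  # floor avoids log(0) when two nodes coincide
    # a = 1 reduces to -log rho(d) = d / 2; a = 0 reduces to -log(1 - rho(d))
    return d / 2.0 if a else -math.log(1.0 - rho(d))


def pair_grad(xi, xj, a):
    """dE_ij / dx_i by the chain rule through d = ||x_i - x_j||."""
    d = max(math.dist(xi, xj), 1e-12)
    p = rho(d)
    dE_dd = 0.5 if a else -0.5 * p / (1.0 - p)  # > 0 attracts toward x_j, < 0 repels
    return [dE_dd * (u - v) / d for u, v in zip(xi, xj)]


def objective(X, A):
    """J = sum of E_ij over unordered pairs i < j (no regularization term)."""
    n = len(X)
    return sum(pair_energy(X[i], X[j], A[i][j]) for i in range(n) for j in range(i + 1, n))


def node_grad(X, A, i):
    """J_{x_i}: only the N - 1 pair terms that involve node i depend on x_i."""
    g = [0.0] * len(X[i])
    for j in range(len(X)):
        if j != i:
            g = [s + t for s, t in zip(g, pair_grad(X[i], X[j], A[i][j]))]
    return g
```

A central finite-difference check of `node_grad` against `objective` is a quick way to confirm that the chain rule above is applied consistently.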
[0065]
Next, the mobile node selection unit 12 selects, according to the following equation (8), the node i that maximizes the squared norm ‖J_{x_j}^{(t)}‖² of the gradient vector (step 23). Hereinafter, the squared norm of the gradient vector may be referred to as the gradient norm.
[0066]
[Equation 8]
$$ i = \arg\max_{j} \lVert J_{x_j}^{(t)} \rVert^{2} $$
[0067]
Next, it is determined whether the gradient norm ‖J_{x_i}^{(t)}‖² of the node i selected in step 23 is smaller than the convergence accuracy constant ε (step 24). Because node i has the largest gradient norm of all nodes, this condition means that every node's gradient norm is below ε. If it is, the process is regarded as converged, the current embedded arrangement x_1, x_2, ..., x_N is output (step 28), and the process terminates.
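Steps 23 and 24 amount to an argmax followed by a threshold test. A minimal sketch, with hypothetical function names and each gradient stored as a plain Python list:

```python
def select_node(grads):
    """Step 23 / equation (8): index of the node with the largest squared gradient norm."""
    norms = [sum(c * c for c in g) for g in grads]
    i = max(range(len(norms)), key=norms.__getitem__)
    return i, norms[i]


def converged(grads, eps):
    """Step 24: the selected node has the largest norm, so norm < eps bounds every node."""
    return select_node(grads)[1] < eps
```

For example, with gradients `[0.1, 0.0]`, `[0.3, -0.4]`, and `[0.0, 0.2]`, node 1 is selected with a gradient norm of 0.25 (up to rounding).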
[0068]
If, on the other hand, the gradient norm ‖J_{x_i}^{(t)}‖² is not smaller than ε, the process proceeds to step 25, where the variation vector calculation unit 13 calculates the variation vector Δx_i, which gives the new placement coordinates of node i, by the following equation (9) using the Hessian matrix H (the transpose of x is written x′).
[0069]
[Equation 9]

[0070]
However, since H is not necessarily positive definite, J^{(t+1)} < J^{(t)} does not always hold.
[0071]
Therefore, after equation (9) is applied, it is checked whether ΔJ = J^{(t+1)} − J^{(t)} > 0. If ΔJ > 0, the result of equation (9) is discarded and the following equation (10) is used instead.
[0072]
[Expression 10]
$$ \Delta x_i = -\lambda \, J_{x_i}^{(t)}, \qquad \lambda > 0 $$
[0073]
Here, the step length λ is chosen so that J^{(t+1)} < J^{(t)}. Since J_{x_i}^{(t)} is the gradient direction, the objective function J can always be decreased by choosing a sufficiently small λ. This ensures that the algorithm converges.
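One way to realize "a sufficiently small λ" is backtracking: start from a trial step and halve λ until the energy strictly decreases. The starting value λ = 1, the halving factor, and the cap on the number of halvings are assumptions of this sketch, not values from the source; `local_energy` is a hypothetical callable standing for the part of J that depends on x_i.

```python
def descent_step(x_i, grad_i, local_energy, lam=1.0, shrink=0.5, max_halvings=50):
    """Equation (10) with backtracking: try x_i - lam * grad_i, shrinking lam until
    the energy strictly decreases. Returns (new position, accepted lam); if no
    decrease is found, the position is returned unchanged and lam is reported as 0."""
    e0 = local_energy(x_i)
    for _ in range(max_halvings):
        cand = [u - lam * g for u, g in zip(x_i, grad_i)]
        if local_energy(cand) < e0:
            return cand, lam
        lam *= shrink
    return list(x_i), 0.0
```

On the toy energy (x_0 − 1)² + x_1² starting from [3, 2] with gradient [4, 4], the full step λ = 1 lands at [−1, −2] with no improvement, and the halved step λ = 0.5 reaches the minimum [1, 0].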
[0074]
Equation (9) is given priority over equation (10) when calculating the variation vector Δx_i of node i because equation (9) yields a larger variation vector Δx_i, which increases computational efficiency.
[0075]
Furthermore, only the placement coordinates of node i are changed and those of the other nodes are not, so the terms of J that involve only other nodes have the same value at t + 1 as at t. ΔJ can therefore be computed in O(N) time (computation of order N) by evaluating only the difference in the terms involving node i, as in the following equation (11). Moreover, equation (10) has to be used in place of equation (9) only occasionally, so this fallback adds little computational burden.
[0076]
## EQU11 ##

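The O(N) evaluation of ΔJ can be illustrated by summing only the pair terms that involve node i and comparing the result with a full recomputation of J. So that the block runs on its own, it repeats an assumed E_{i,j} (binary cross entropy with ρ(d) = exp(−d/2) and d = ‖x_i − x_j‖); any regularization term in J is ignored, and `delta_J` is a hypothetical name.

```python
import math


def pair_energy(xi, xj, a):
    """Assumed E_ij: binary cross entropy with rho(d) = exp(-d / 2), d = ||x_i - x_j||."""
    d = max(math.dist(xi, xj), 1e-12)  # floor avoids log(0) for coincident nodes
    return d / 2.0 if a else -math.log(1.0 - math.exp(-d / 2.0))


def delta_J(X, A, i, new_xi):
    """J(t+1) - J(t) when only node i moves to new_xi.

    Terms E_hj with h, j != i are identical before and after the move, so only
    the N - 1 terms involving node i are evaluated: O(N) instead of O(N^2).
    """
    return sum(
        pair_energy(new_xi, X[j], A[i][j]) - pair_energy(X[i], X[j], A[i][j])
        for j in range(len(X))
        if j != i
    )
```

A negative return value means the candidate position lowers J and can be accepted; a positive one triggers the fallback to equation (10).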
[0077]
Next, the gradient vector update unit 14 updates J_{x_j}^{(t)} to J_{x_j}^{(t+1)} (step 26).
[0078]
In more detail, the gradient vector J_{x_j} of the objective function J with respect to x_j is given by the following equation (12), based on equation (5).
[0079]
[Expression 12]

[0080]
The gradients at t + 1 and at t are related by
J_{x_j}^{(t+1)} = J_{x_j}^{(t)} + ΔJ_{x_j},
so the following relation (13) follows from equation (12).
[0081]
[Formula 13]

[0082]
Here, since only the placement coordinates of node i are changed and those of the other nodes are not, the following relation (14) holds for h ≠ i, j.
[0083]
[Expression 14]

[0084]
Therefore, from equations (13) and (14), J_{x_j}^{(t+1)} and J_{x_j}^{(t)} satisfy the following equation (15).
[0085]
[Expression 15]

[0086]
Here, as can be seen from equations (4) and (7), the following equation (16) holds.
[0087]
[Expression 16]

[0088]
Therefore, from equations (15) and (16), J_{x_j}^{(t+1)} and J_{x_j}^{(t)} satisfy the following equation (17).
[0089]
[Expression 17]

[0090]
Accordingly, in step 26, J_{x_j}^{(t+1)} is obtained from J_{x_j}^{(t)} on the basis of equation (17). The gradient vectors J_{x_j} can therefore be updated with O(N) computation (computation of order N).
[0091]
J_{x_i}^{(t+1)}, by contrast, is recomputed from scratch using equation (6), because changing the placement coordinates of node i changes every term of J_{x_i}^{(t)}.
[0092]
Next, in step 27, the embedded arrangement update unit 15 updates the placement coordinates x_1, x_2, ..., x_N using the variation vector Δx_i of node i calculated by the variation vector calculation unit 13, sets t = t + 1, and returns to the processing of the mobile node selection unit 12 (step 23).
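Steps 21 to 28 can be combined into an end-to-end sketch. It assumes that E_{i,j} is the binary cross entropy between a_{i,j} and ρ(d_{i,j}) = exp(−d_{i,j}/2) with d_{i,j} = ‖x_i − x_j‖, that J is the sum of E_{i,j} over pairs with no μ term, and that initial coordinates are drawn uniformly from [−1, 1]. As a deliberate simplification it always takes the gradient step of equation (10) with step halving instead of first trying the Newton step of equation (9). Step 26 patches each J_{x_j} with the change in its single (i, j) term, and J_{x_i} is recomputed from scratch. All names are hypothetical.

```python
import math
import random


def pair_energy(xi, xj, a):
    # Assumed E_ij with rho(d) = exp(-d / 2) and d = ||x_i - x_j|| (floored to avoid log(0))
    d = max(math.dist(xi, xj), 1e-12)
    return d / 2.0 if a else -math.log(1.0 - math.exp(-d / 2.0))


def pair_grad(xi, xj, a):
    # dE_ij / dx_i, i.e. the derivative with respect to the FIRST argument
    d = max(math.dist(xi, xj), 1e-12)
    p = math.exp(-d / 2.0)
    c = 0.5 if a else -0.5 * p / (1.0 - p)
    return [c * (u - v) / d for u, v in zip(xi, xj)]


def node_grad(X, A, i):
    g = [0.0] * len(X[i])
    for j in range(len(X)):
        if j != i:
            g = [s + t for s, t in zip(g, pair_grad(X[i], X[j], A[i][j]))]
    return g


def local_energy(X, A, i, xi):
    # Sum of the N - 1 terms E_ij that change when node i is placed at xi
    return sum(pair_energy(xi, X[j], A[i][j]) for j in range(len(X)) if j != i)


def objective(X, A):
    n = len(X)
    return sum(pair_energy(X[i], X[j], A[i][j]) for i in range(n) for j in range(i + 1, n))


def embed(A, K=2, eps=1e-6, max_iter=5000, seed=0):
    n = len(A)
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(K)] for _ in range(n)]  # step 21
    grads = [node_grad(X, A, i) for i in range(n)]                       # step 22
    for _ in range(max_iter):
        norms = [sum(c * c for c in g) for g in grads]
        i = max(range(n), key=norms.__getitem__)                         # step 23
        if norms[i] < eps:                                               # step 24
            break
        # Step 25, simplified: gradient step with step halving (equation (10) only;
        # the Newton step of equation (9) is omitted in this sketch).
        e0, lam, new_xi = local_energy(X, A, i, X[i]), 1.0, None
        for _ in range(60):
            cand = [u - lam * g for u, g in zip(X[i], grads[i])]
            if local_energy(X, A, i, cand) < e0:                         # Delta J < 0
                new_xi = cand
                break
            lam *= 0.5
        if new_xi is None:  # no decreasing step found; stop instead of looping forever
            break
        # Step 26: for j != i only the (i, j) term of J_{x_j} changes.
        for j in range(n):
            if j != i:
                old = pair_grad(X[j], X[i], A[i][j])
                new = pair_grad(X[j], new_xi, A[i][j])
                grads[j] = [g - o + w for g, o, w in zip(grads[j], old, new)]
        X[i] = new_xi                                                    # step 27
        grads[i] = node_grad(X, A, i)  # J_{x_i} is recomputed from scratch
    return X
```

Because every accepted move has ΔJ < 0, the returned layout always has a lower objective than the random starting layout, provided at least one move is accepted.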
[0093]
In this way, by minimizing the cross-entropy-based objective function shown in equation (2), the network data low-dimensional embedding device 1 according to the present invention determines the embedding position of each node so that, for every node, the nodes directly connected to it by links are placed closer to it than the nodes not directly connected to it.
[0094]
Next, the effectiveness of the present invention is demonstrated by experiments on real data.
[0095]
(B) Data used in the experiment
In order to verify the effectiveness of the present invention, an experiment was conducted using three different types of networks constructed from actual data.
[0096]
The first network, derived from biology, is the Escherichia coli gene regulatory network described in Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U., "Network motifs in the transcriptional regulation network of Escherichia coli", Nature Genetics, Vol. 31, No. 1, pp. 64-68, 2002.
[0097]
The second network is a co-author network of papers published at the 1st through 12th NIPS (Neural Information Processing Systems) international conferences, built from the data set "... 0-12." (http://www.cs.toronto.edu/roweis/data.html, 2002). In this network, each author is a node, and two authors are linked if they have published at least one co-authored paper.
[0098]
The third network is a WWW (World Wide Web) hyperlink network constructed by collecting all WWW pages belonging to the applicant's site.
[0099]
In the experiments, each network was first converted into an undirected network, and only its largest connected component was extracted. This is a natural choice, since separate connected components would normally be embedded separately.
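The preprocessing (treating links as undirected and keeping the largest connected component) can be sketched with a breadth-first search. The edge-list input format, the function name, and the decision to drop self-loops are assumptions made for illustration.

```python
from collections import deque


def largest_component(n, edges):
    """Treat links as undirected and keep only the largest connected component.

    Returns (kept node ids in the original numbering, adjacency matrix of the component).
    """
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # assumption: self-loops carry no placement information and are dropped
            nbrs[u].add(v)
            nbrs[v].add(u)
    seen, best = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        comp, queue = [], deque([s])
        seen[s] = True
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            comp.append(u)
            for w in nbrs[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        if len(comp) > len(best):  # ties keep the component found first
            best = comp
    best.sort()
    index = {u: k for k, u in enumerate(best)}
    A = [[0] * len(best) for _ in best]
    for u in best:
        for w in nbrs[u]:
            A[index[u]][index[w]] = 1
    return best, A
```

For six nodes with directed links (0, 1), (1, 2), (2, 0), (3, 4) and a self-loop on node 5, only nodes 0, 1, 2 are kept, as a fully linked triangle.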
[0100]
FIG. 3 shows statistics for the extracted networks. Here, N is the number of nodes, and L^T, L^A, and L^M denote the total number of links, the average number of links, and the maximum number of links of each network. Letting L_i be the number of links directly connected to node i, these quantities satisfy the following relations (18).
[0101]
[Formula 18]

[0102]
Next, G^A and G^M are defined as the average and the maximum of the graph distance between two nodes in each network. Letting g_{i,j} be the graph-theoretic distance between nodes i and j, they are given by the following relations (19).
[0103]
[Equation 19]

[0104]
Although the adjacency matrices of these networks are all sparse, the statistics show that the networks differ in character. The third network is particularly large and complex compared with the first and second networks.
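Equations (18) and (19) are not reproduced in this section, so the following sketch assumes the natural definitions: L^T = ½ΣL_i (each undirected link counted once), L^A = (1/N)ΣL_i, L^M = max L_i, and G^A and G^M as the mean and maximum of g_{i,j} over pairs i ≠ j. For an unweighted network, one breadth-first search per node gives the same graph distances as Floyd's algorithm at lower cost on sparse networks. The function name and dictionary keys are illustrative.

```python
from collections import deque


def network_stats(A):
    """N, L^T, L^A, L^M from node degrees and G^A, G^M from BFS graph distances."""
    n = len(A)
    nbrs = [[j for j in range(n) if A[i][j]] for i in range(n)]
    deg = [len(v) for v in nbrs]                  # L_i
    stats = {"N": n, "LT": sum(deg) // 2,         # each undirected link appears twice in sum(deg)
             "LA": sum(deg) / n, "LM": max(deg)}
    total, g_max = 0, 0
    for s in range(n):                            # one breadth-first search per source node
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if min(dist) < 0:
            raise ValueError("network must be connected")
        total += sum(dist)                        # sum of g_{s,j} over j (g_{s,s} = 0)
        g_max = max(g_max, max(dist))
    stats["GA"] = total / (n * (n - 1))           # mean over ordered pairs i != j
    stats["GM"] = g_max
    return stats
```

For a path of four nodes this gives L^T = 3, L^A = 1.5, L^M = 2, G^A = 10/6, and G^M = 3.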
[0105]
(B) Evaluation measure
Measuring the effectiveness of the present invention requires an appropriate evaluation measure, described below.
[0106]
Suppose that an embedding of the N nodes in K-dimensional Euclidean space has been obtained, with placement coordinates x_1, x_2, ..., x_N.
[0107]
For each x_i, consider the K-dimensional ball B_i(r_i) of radius r_i centered at x_i. In an ideal embedding that fully realizes [Principle A], an appropriate radius r_i can be chosen for each node i so that B_i(r_i) contains every node adjacent to i (that is, connected to i by a direct link) and, conversely, no node that is not adjacent to i.
[0108]
In an actual embedding, however, and especially for a complex network embedded in a small dimension K, it is generally impossible to choose such an r_i for every node i. An optimal r_i can nevertheless be defined under a suitable measure.
[0109]
In particular, in a sparse network the non-adjacent nodes vastly outnumber the adjacent ones. It is therefore inappropriate to use the usual accuracy, that is, the number of nodes correctly included in B_i(r_i) plus the number of nodes correctly excluded from B_i(r_i), divided by the total number of nodes, because the accuracy is high even if B_i(r_i) contains no nodes at all, adjacent or not. In such cases the F-measure, which is widely used in fields such as information retrieval and text classification, is appropriate.
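A small worked example makes the point concrete. The numbers are hypothetical: a node with 3 neighbors among 999 other nodes. An empty ball scores 996/999 ≈ 0.997 in accuracy but 0 in F-measure, and a ball holding the 3 neighbors plus 3 non-neighbors has exactly the same accuracy as the empty ball, whereas its F-measure is 2/3. The sketch uses the balanced F-measure (α = 1/2) and treats the precision of an empty ball as 0, a convention chosen here.

```python
def accuracy_and_f(n_others, n_adjacent, n_in_ball, n_adjacent_in_ball):
    """Accuracy and balanced F-measure (alpha = 1/2) for one node's ball B_i(r_i)."""
    tp = n_adjacent_in_ball                 # adjacent nodes correctly included
    fp = n_in_ball - tp                     # non-adjacent nodes wrongly included
    fn = n_adjacent - tp                    # adjacent nodes wrongly excluded
    tn = n_others - tp - fp - fn            # non-adjacent nodes correctly excluded
    accuracy = (tp + tn) / n_others
    precision = tp / n_in_ball if n_in_ball else 0.0  # convention: empty ball -> 0
    recall = tp / n_adjacent
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f


empty = accuracy_and_f(999, 3, 0, 0)        # accuracy ~0.997, F = 0
perfect = accuracy_and_f(999, 3, 3, 3)      # accuracy 1.0, F = 1.0
loose = accuracy_and_f(999, 3, 6, 3)        # same accuracy as empty, F = 2/3
```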
[0110]
The F-measure is defined as the harmonic mean of precision and recall. Letting #X denote the number of elements in a set X, the precision P_i(r_i) of the i-th ball B_i(r_i), determined by x_i and r_i, is defined by the following equation (20).
[0111]
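[Expression 20]
$$ P_i(r_i) = \frac{\#\{\, j \neq i : x_j \in B_i(r_i),\ a_{i,j} = 1 \,\}}{\#\{\, j \neq i : x_j \in B_i(r_i) \,\}} $$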
[0112]
Similarly, the recall R_i(r_i) of the i-th ball B_i(r_i), determined by x_i and r_i, is defined by the following equation (21).
[0113]
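[Expression 21]
$$ R_i(r_i) = \frac{\#\{\, j \neq i : x_j \in B_i(r_i),\ a_{i,j} = 1 \,\}}{\#\{\, j : a_{i,j} = 1 \,\}} $$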
[0114]
Roughly speaking, increasing the precision requires a smaller r_i, whereas increasing the recall requires a larger r_i, so r_i is determined by the balance between the two. More precisely, the optimal radius ⟨r_i⟩ can be defined as the radius that maximizes the F-measure given by the following equation (22). In the experiments, α = 1/2 was always used.
[0115]
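[Expression 22]
$$ F_i(r_i) = \left( \frac{\alpha}{P_i(r_i)} + \frac{1-\alpha}{R_i(r_i)} \right)^{-1}, \qquad \langle r_i \rangle = \arg\max_{r_i} F_i(r_i) $$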
[0116]
Averaging equation (22) over all nodes gives the connectivity F-measure, a measure for quantitatively evaluating the placement of a network embedded in a low-dimensional space, as in the following equation (23).
[0117]
[Expression 23]
$$ F = \frac{1}{N} \sum_{i=1}^{N} F_i(\langle r_i \rangle) $$
[0118]
In the experiments conducted to verify the effectiveness of the present invention, the connectivity F-measure defined by equation (23) was used as the evaluation measure.
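The connectivity F-measure can be computed by sorting, for each node, its distances to all other nodes: F_i(r_i) changes only when r_i passes one of these distances, so scanning them finds ⟨r_i⟩. The sketch assumes closed balls (a node at distance exactly r_i is inside), precision and recall counted over nodes j ≠ i as in equations (20) and (21), and the form of F_i given in equation (22); the function names are illustrative.

```python
import math


def node_f_measure(X, A, i, alpha=0.5):
    """max over r of F_i(r) = 1 / (alpha / P_i(r) + (1 - alpha) / R_i(r))."""
    others = sorted((math.dist(X[i], X[j]), A[i][j]) for j in range(len(X)) if j != i)
    n_adj = sum(a for _, a in others)
    best, tp = 0.0, 0
    for k, (d, a) in enumerate(others):
        tp += a                      # adjacent nodes inside the closed ball of radius d
        if k + 1 < len(others) and others[k + 1][0] == d:
            continue                 # a ball of radius d also contains the tied nodes
        if tp == 0:
            continue                 # P = R = 0: F treated as 0
        precision, recall = tp / (k + 1), tp / n_adj
        best = max(best, 1.0 / (alpha / precision + (1.0 - alpha) / recall))
    return best


def connectivity_f_measure(X, A, alpha=0.5):
    """Average of the per-node optimal F-measures, as in equation (23)."""
    return sum(node_f_measure(X, A, i, alpha) for i in range(len(X))) / len(X)
```

For a four-node path laid out in order on a line, every ball can isolate exactly the neighbors, so the measure is 1.0; shuffling the positions along the line lowers it.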
[0119]
(C) Comparison using the connectivity F-measure
The proposed method of the present invention and three existing methods, namely classical multidimensional scaling, the spring model, and spectral clustering, were each applied to the three networks described above to obtain embeddings in K-dimensional space, and the results were evaluated with the connectivity F-measure. K was varied from 2 to 25 for the first network and from 2 to 9 for the second and third networks.
[0120]
FIG. 4 shows the evaluation results as graphs. The vertical axis is the connectivity F-measure and the horizontal axis is the embedding dimension K. The symbols in the figure denote CE: the proposed method of the present invention, CMDS: classical multidimensional scaling, KK: the spring model, and SC: spectral clustering.
[0121]
A higher connectivity F-measure indicates a better solution. Because the proposed method and the spring model can converge to local optima, results for five different random initial values are shown; the results show almost no variation, so the influence of local optima can be ignored. For every method, performance improves monotonically as the dimension K increases and the embedding gains more freedom.
[0122]
As expected, FIG. 4 shows that the proposed method of the present invention outperforms the other methods, particularly in low dimensions where placement is difficult. The performance gap between the proposed method and the conventional methods is most conspicuous for the third network, which is the largest.
[0123]
(D) Results of embedding in two dimensions
The left side of each of FIGS. 5A to 5C shows the result of actually embedding the third network in two dimensions by classical MDS (CMDS), the spring model (KK), and the proposed method of the present invention (CE), respectively. Spectral clustering is omitted because it places the nodes on a three-dimensional sphere.
[0124]
The result of classical multidimensional scaling shown in FIG. 5A is problematic for visualization, and for browsing in particular, because many nodes degenerate to a single point. This is considered a limitation of linear methods.
[0125]
On the other hand, the spring model shown in FIG. 5B and the proposed method of the present invention shown in FIG. 5C do not exhibit this degeneration. Looking at them more closely, however, many nodes are arranged in a semicircle in the former, whereas in the latter the nodes spread out more uniformly and radially. The proposed method thus makes more efficient use of space, and this is where the difference between the two in connectivity F-measure appears. This is presumably because the graph distances that the spring model tries to reproduce take only discrete values. The spring model's result makes browsing difficult, especially in regions where nodes are dense.
[0126]
In each of FIGS. 5A to 5C, the right-hand view is obtained by cutting out the region enclosed by the dotted line and recalculating the embedding for that portion only. The six dotted regions in total, before and after cutting, all correspond to the same subnetwork.
[0127]
As the figures show, the arrangement changes considerably before and after cutting with classical multidimensional scaling and the spring model, whereas the change is relatively small with the proposed method of the present invention, which suggests higher stability under cutting. For this reason as well, the proposed method can be regarded as an embedding method better suited to browsing.
[0128]
【Effect of the Invention】
As described above, the present invention embeds data represented as a network composed of a plurality of nodes and links connecting them in a low-dimensional (two- or three-dimensional) Euclidean space so that the data can be displayed visually. By minimizing an objective function based on cross entropy, it determines the embedding position of each node so that, for every node, the nodes directly connected to it are placed closer to it than the nodes not directly connected to it. Compared with conventional methods, this yields embeddings that make efficient use of space and are well suited to browsing.
[Brief description of the drawings]
FIG. 1 is an example of an embodiment of a network data low-dimensional embedding device according to the present invention.
FIG. 2 is a flowchart showing the operation of the network data low-dimensional embedding device of the present invention.
FIG. 3 is an explanatory diagram of statistics relating to a network used in an experiment conducted to verify the effectiveness of the present invention.
FIG. 4 is a graph showing the evaluation results of experiments conducted to verify the effectiveness of the present invention.
FIG. 5 is an explanatory diagram of a result of embedding in a two-dimensional space in an experiment performed to verify the effectiveness of the present invention.
[Explanation of symbols]
1 Network data low-dimensional embedding device
2 Network data database
10 Embedding arrangement initialization unit
11 Gradient vector initialization unit
12 Mobile node selection unit
13 Variation vector calculation unit
14 Gradient vector update unit
15 Embedded arrangement update unit
16 Working memory unit

Claims

A network data low-dimensional embedding method, executed by a computer, for embedding and arranging in a low-dimensional space data represented as a network composed of a plurality of nodes and links connecting the nodes, the method comprising:
a first step of reading network data, consisting of the number of nodes N and an adjacency matrix A = (a_{i,j}) in which a_{i,j} takes the value 1 or 0 to describe the connection between node i and node j, from storage means storing the network data, and writing the network data to a working memory;
a second step of randomly setting initial values of the embedding positions of all of the N nodes indicated by the number of nodes written to the working memory, and writing the initial values to the working memory;
a third step of calculating the gradient vector at the embedding position of each node and writing it to the working memory, using an objective function J defined on the computer, wherein, with the Euclidean distance d_{i,j} between node i and node j as a variable, a monotonically decreasing function ρ(d_{i,j}) that takes its maximum value at d_{i,j} = 0 and becomes smaller as d_{i,j} increases is
ρ(d_{i,j}) = exp(−d_{i,j}/2),
the cross entropy E_{i,j} is defined from ρ(d_{i,j}) and a_{i,j} as

and the objective function J is derived as the total energy of the cross entropies E_{i,j} as

the gradient vectors being calculated, starting from the initial values, toward the optimal arrangement that minimizes the objective function J, for use in the processing that sequentially updates the embedding position of each node;
a fourth step of determining, from the magnitude of the gradient vectors written to the working memory, whether the process has converged, and thereby whether to terminate the updating of the node embedding positions; and
a fifth step of, when it is determined not to terminate, calculating a variation vector of the embedding position of a node written to the working memory on the basis of the gradient vectors written to the working memory, updating the embedding position of that node in the working memory using the variation vector, and calling the third step again.

The network data low-dimensional embedding method according to claim 1, wherein
in the third step, the gradient vector at the embedding position of each node is calculated by calculating, for each node, the gradient vector obtained when that node is moved while the other nodes are held fixed.

The network data low-dimensional embedding method according to claim 1 or 2, further comprising
a step of selecting the one node whose embedding position is to be updated by selecting the node whose gradient vector has the largest magnitude.

A network data low-dimensional embedding device for embedding and arranging in a low-dimensional space data represented as a network composed of a plurality of nodes and links between the nodes, the device comprising:
storage means for storing network data consisting of the number of nodes N and an adjacency matrix A = (a_{i,j}) in which a_{i,j} takes the value 1 or 0 to describe the connection between node i and node j;
writing means for reading the network data from the storage means and writing it to a working memory;
setting means for randomly setting initial values of the embedding positions of all of the N nodes indicated by the number of nodes written to the working memory, and writing the initial values to the working memory;
calculating means for calculating the gradient vector at the embedding position of each node and writing it to the working memory, using an objective function J defined on the device, wherein, with the Euclidean distance d_{i,j} between node i and node j as a variable, a monotonically decreasing function ρ(d_{i,j}) that takes its maximum value at d_{i,j} = 0 and becomes smaller as d_{i,j} increases is
ρ(d_{i,j}) = exp(−d_{i,j}/2),
the cross entropy E_{i,j} is defined from ρ(d_{i,j}) and a_{i,j} as

and the objective function J is derived as the total energy of the cross entropies E_{i,j} as

the gradient vectors being calculated, starting from the initial values, toward the optimal arrangement that minimizes the objective function J, for use in the processing that sequentially updates the embedding position of each node;
determining means for determining, from the magnitude of the gradient vectors written to the working memory, whether the process has converged, and thereby whether to terminate the updating of the node embedding positions; and
update means for, when it is determined not to terminate, calculating a variation vector of the embedding position of a node written to the working memory on the basis of the gradient vectors written to the working memory, updating the embedding position of that node in the working memory using the variation vector, and calling the calculating means again.

The network data low-dimensional embedding device according to claim 4, wherein
the calculating means calculates the gradient vector at the embedding position of each node by calculating, for each node, the gradient vector obtained when that node is moved while the other nodes are held fixed.

The network data low-dimensional embedding device according to claim 4 or 5, further comprising
selection means for selecting the one node whose embedding position is to be updated by selecting the node whose gradient vector has the largest magnitude.

A network data low-dimensional embedding program for causing a computer to execute the processing used to realize the network data low-dimensional embedding method according to any one of claims 1 to 3.

A recording medium on which is recorded a network data low-dimensional embedding program for causing a computer to execute the processing used to realize the network data low-dimensional embedding method according to any one of claims 1 to 3.