JP5237977B2 - Graph centrality monitoring apparatus, method, and program

Info

Publication number: JP5237977B2
Application number: JP2010011302A
Authority: JP
Inventors: 靖宏藤原
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-01-21
Filing date: 2010-01-21
Publication date: 2013-07-17
Anticipated expiration: 2030-01-21
Also published as: JP2011150539A

Description

The present invention relates to a graph centrality monitoring apparatus, method, and program, and more particularly to a graph centrality monitoring apparatus, method, and program for monitoring the centrality of a graph that grows from moment to moment.

A graph is a data structure that represents real-world entities and the relationships among them as nodes and edges. Because this representation is intuitive, graph data is widely used in mathematics and computer science.

In graph theory, which deals with graph data, the facility location problem of finding a suitable node among a given set of nodes is one of the most important problems. In this problem, a node whose total distance from the other nodes is short is considered more suitable, because the cost of travel from the other nodes is expected to be small.

In graph theory, "centrality" is used as the measure of this suitability. Centrality is computed as the reciprocal of the sum of the distances to the other nodes, and a node with higher centrality is regarded as more suitable. The facility location problem is a typical optimization problem and is applicable to a wide range of fields such as economics and industry. It is used, for example, to analyze the efficiency of transportation, communication, and transmission systems (see, for example, Non-Patent Document 1). However, the problems that graph theory has traditionally addressed rest on the major assumption that the graph is static and the number of nodes does not change.

In recent years, however, it has become necessary to handle graph data in which the number of nodes grows from moment to moment and eventually reaches several hundred thousand or more. This is a result of the availability of huge databases and the Internet. Recent research is clarifying the characteristics of such continuously growing graph data (see, for example, Non-Patent Document 2). Consequently, there is a growing demand for efficiently processing graphs that grow from moment to moment and eventually become very large.

This specification addresses the problem of continuously detecting the node with the highest centrality in a graph that keeps growing from moment to moment.

One application of the problem addressed in this specification is the analysis of social networks.

Social scientists have long studied networks of various kinds of relationships. In these networks, nodes correspond to people or organizations, and edges correspond to social relationships. In this research, the question "Which node is the most important in the network?" has attracted a great deal of interest.

Intuitively, a node that is connected to many edges attracts interest or popularity in the network. An important concrete example of this idea is the network of scientific publications. In this case, nodes represent papers, books, and journals, and edges represent citations. The more citations a paper or book collects, the more useful it has been judged to be by many other researchers, so it is natural to regard such a publication as important. Based on this idea, Garfield proposed the impact factor as a measure of the importance of a journal. The impact factor is defined from the number of citations per publication. It is essentially a simple measure that corresponds to the degree of a node in the graph.

However, such a measure is local when viewed from the graph as a whole, because the degree is determined by the number of nodes adjacent to the node. Therefore, even a node with a high degree does not necessarily have a large influence on the graph as a whole if it belongs to an isolated community within the graph.

Because the centrality value is computed as the reciprocal of the sum of the distances to the other nodes, it is not a local measure. Centrality is therefore considered a more appropriate measure of node importance than degree. The most important node over time can be detected by monitoring a growing graph on the basis of centrality. For example, Nascimento et al. analyzed the co-authorship graph of SIGMOD, a prominent international conference in the database field, from 1975 to 2002 (see, for example, Non-Patent Document 3). They succeeded in detecting that L. A. Rowe, M. Stonebraker, and M. J. Carey were the most influential researchers from 1986 to 1988, from 1989 to 1992, and from 1992 to 2002, respectively. These three are known as very prominent researchers in the database community. Note that an analysis of co-authorship at an international conference in the information retrieval field showed that the author with the most co-authorship relationships (the node with the highest degree) did not necessarily have the highest centrality (see, for example, Non-Patent Document 4).

Such academic networks are known to grow at a very high rate. For example, the co-authorship network in the database field consists of tens of thousands of researchers, and the number of researchers is known to increase by several thousand or more each year (see, for example, Non-Patent Document 5). There is therefore a growing demand for fast analysis methods for networks that keep growing.

Non-Patent Document 1: Ulrik Brandes and Thomas Erlebach, Network Analysis: Methodological Foundations, Springer, 2008.
Non-Patent Document 2: Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos, Graph evolution: Densification and shrinking diameters, TKDD, 2007.
Non-Patent Document 3: Mario A. Nascimento, Jorg Sander, and Jeffrey Pound, Analysis of SIGMOD's co-authorship graph, SIGMOD Record, 2003.
Non-Patent Document 4: Djoerd Hiemstra, Claudia Hauff, Franciska de Jong, and Wessel Kraaij, SIGIR's 30th anniversary: an analysis of trends in IR research and the topology of its community, SIGIR Forum, 2007.
Non-Patent Document 5: Ergin Elmacioglu and Dongwon Lee, On six degrees of separation in DBLP-DB and more, SIGMOD Record, 2005.

Stated formally, this specification addresses the following problem.

The target problem is called "centrality monitoring": given an unweighted, undirected graph G[t] = (V, E) at time t, find the node with the highest centrality value. Here, V and E are the sets of nodes and edges, respectively. The centrality of a node can be computed by breadth-first search (see, for example, Ulrik Brandes, A Faster Algorithm for Betweenness Centrality, Journal of Mathematical Sociology, 2001). Breadth-first search is a widely used graph search technique. Given a graph G = (V, E), it computes distances step by step while visiting the nodes reachable from a start node u in order of proximity. To compute the centrality of node u, existing methods use breadth-first search to compute the distance from node u to every other node and then sum these distances.

Computing the node with the highest centrality using breadth-first search requires O(n² + nm) time, because a breadth-first search, which costs O(n + m), is performed for each of the n nodes (see, for example, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, The MIT Press, 2009). When the graph is large, this requires an enormous amount of computation. Moreover, for a graph that grows from moment to moment, this processing must be repeated every time the graph grows, so finding the most central node by breadth-first search alone is impractical.
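The existing approach described above can be sketched as follows. This is a minimal illustration, not code from the specification: the function names and the adjacency-list dictionary are assumptions, and the graph is assumed to be connected with at least two nodes. One breadth-first search is run per node, which is where the O(n² + nm) cost comes from.

```python
from collections import deque
from fractions import Fraction


def bfs_distance_sum(adj, source):
    """Sum of hop distances from `source` to every reachable node (one BFS, O(n + m))."""
    dist = {source: 0}
    queue = deque([source])
    total = 0
    while queue:
        u = queue.popleft()
        total += dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return total


def most_central_node(adj):
    """Run one BFS per node and keep the node with the smallest distance sum.

    Assumes a connected graph with at least two nodes (hypothetical helper).
    """
    best_node, best_sum = None, None
    for u in adj:
        s = bfs_distance_sum(adj, u)
        if best_sum is None or s < best_sum:
            best_node, best_sum = u, s
    # Centrality is the reciprocal of the distance sum.
    return best_node, Fraction(1, best_sum)
```

In a monitoring setting this whole scan would have to be rerun each time a node or edge arrives, which is the cost the invention aims to avoid.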

The present invention has been made in view of the above points, and its object is to provide a graph centrality monitoring apparatus, method, and program capable of efficiently and exactly finding the node with the highest centrality in a graph that grows from moment to moment.

FIG. 1 is a diagram showing the principle configuration of the present invention.

The present invention (Claim 1) is a monitoring apparatus for finding, in a graph that grows from moment to moment, the node with the highest centrality, centrality being a measure of suitability in graph theory, the apparatus comprising:
data receiving means 110 for receiving, as a graph, nodes or edges that change at each time;
aggregating means 120 for pruning the nodes of the original graph received by the data receiving means 110 by approximate centrality and computing an approximate graph;
data holding means 130 for storing the original graph received by the data receiving means 110 and the approximate graph computed by the aggregating means 120; and
detecting means 140 for reading the original graph and the approximate graph from the data holding means 130 and computing the node with the highest centrality in the graph.

Further, the present invention (Claim 2) is the monitoring apparatus of Claim 1, wherein
the aggregating means 120 includes means for obtaining, for the original node set received by the data receiving means, the reciprocal of the sum of inter-node distances, grouping together nodes whose reciprocal values are close to each other, and storing the result in the data holding means 130 as the approximate graph; and
the detecting means 140 includes means for reading the original graph and the approximate graph from the data holding means 130, obtaining the distances from a start node to all other nodes by breadth-first search, and terminating the search if the upper bound of the centrality of a node is smaller than the centrality of the approximate graph.

FIG. 2 is a diagram for explaining the principle of the present invention.

The present invention (Claim 3) is a monitoring method for finding, in a graph that grows from moment to moment, the node with the highest centrality, centrality being a measure of suitability in graph theory, wherein a computer having storage means for storing a received graph and an approximate graph performs:
a data receiving step (step 1) of receiving, as a graph, nodes or edges that change at each time and storing them in the storage means;
an aggregation step (step 2) of pruning the nodes of the original graph received in the data receiving step by approximate centrality, computing an approximate graph, and storing the approximate graph in the storage means; and
a detection step (step 3) of reading the original graph and the approximate graph from the storage means and computing the node with the highest centrality in the graph.

Further, the present invention (Claim 4) is the monitoring method of Claim 3, wherein
in the aggregation step, the reciprocal of the sum of inter-node distances is obtained for the original node set received in the data receiving step, and nodes whose reciprocal values are close to each other are grouped together and stored in the storage means as the approximate graph; and
in the detection step, the original graph and the approximate graph are read from the storage means, the distances from a start node to all other nodes are obtained by breadth-first search, and the search is terminated if the upper bound of the centrality of a node is smaller than the centrality of the approximate graph.

The present invention (Claim 5) is a monitoring program for causing a computer to function as each of the means constituting the monitoring apparatus according to Claim 1 or 2.

As described above, the present invention provides the following effects.

・Efficient:
Existing methods require an enormous amount of computation for large graphs. According to the present invention, the node with the highest centrality can be detected quickly by using approximate centrality. The approximation of the graph can be computed efficiently even as the original graph grows from moment to moment.

・Exact:
The present invention uses approximate computation to prune nodes, yet the detection result is guaranteed to be exact. That is, no detection misses occur in the result.

・Memory-efficient:
The present invention requires memory to hold the approximate graph, but the amount is small: at most the same amount of memory as is needed to hold the original graph.

FIG. 1 is a diagram showing the principle configuration of the present invention.
FIG. 2 is a diagram for explaining the principle of the present invention.
FIG. 3 is a configuration diagram of the monitoring apparatus in one embodiment of the present invention.
FIG. 4 is an example of a graph in one embodiment of the present invention.
FIG. 5 is an example of node aggregation in one embodiment of the present invention.
FIG. 6 is a diagram for explaining search pruning in one embodiment of the present invention.
FIG. 7 is an example of the algorithm for computing centrality in one embodiment of the present invention.
FIG. 8 is an example of the algorithm for monitoring centrality in one embodiment of the present invention.
FIG. 9 is a diagram showing the processing time for P2P data in an experiment.
FIG. 10 is a diagram showing the processing time for WWW data in an experiment.
FIG. 11 is a diagram showing the processing time for DBLP data in an experiment.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 3 shows the configuration of the monitoring apparatus in one embodiment of the present invention.

The monitoring apparatus 100 shown in the figure comprises a data receiving unit 110, an aggregation unit 120, a data holding unit 130, and a detection unit 140.

The data receiving unit 110 receives the graph (nodes or edges), which changes at each time, passes it to the aggregation unit 120, and sends the nodes and edges of the received graph to the data holding unit 130.

The aggregation unit 120 has an internal memory (not shown), aggregates the nodes of the received original graph, and computes an approximate graph.

The data holding unit 130 is a storage medium such as a disk device or memory, and holds the original graph passed from the data receiving unit 110 and the approximate graph computed by the aggregation unit 120.

The detection unit 140 reads the original graph and the approximate graph from the data holding unit 130 and computes the node with the highest centrality in these graphs.

First, the symbols used in this specification are defined and the necessary background is explained.

The main symbols and their definitions are given below.

A real-world network can be represented as a graph G = (V, E), where V is the set of nodes and E is the set of edges. Let n and m be the numbers of nodes and edges, respectively; that is, n = |V| and m = |E|.

A "path from node u to node v" is a sequence of edges that starts at node u and ends at node v. A path that contains the fewest nodes is called a "shortest path".

The "distance between node u and node v" is the number of edges in a shortest path and is denoted d(u, v). From this definition, d(u, u) = 0 for any node u ∈ V, and d(u, v) = d(v, u) for any nodes u, v ∈ V.

The "centrality of a node" is the reciprocal of the sum of its distances to the other nodes and is defined as follows.

<Definition 1> Node centrality:
In a graph G = (V, E), let the sum of the distances from node u to the other nodes be

$$\sum_{v \in V} d(u, v).$$

Then the centrality C_u of node u is defined as

$$C_u = \frac{1}{\sum_{v \in V} d(u, v)}.$$

FIG. 4 is an example of a graph in one embodiment of the present invention.

In the figure, the centrality of node u_1 is 1/11, because the distances are
d(u_1, u_1) = 0,
d(u_1, u_2) = 1,
d(u_1, u_3) = 2,
d(u_1, u_4) = 2,
d(u_1, u_5) = 3,
d(u_1, u_6) = 3,
which sum to 11.
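The worked example above can be checked with a short script. FIG. 4 itself is not reproduced in this text, so the edge list below is an assumption: it is one set of edges that yields exactly the distances listed for node u_1. The helper names are likewise hypothetical.

```python
from collections import deque
from fractions import Fraction

# Assumed edge list for the six-node graph of FIG. 4 (node k stands for u_k).
# The figure is not available here; these edges are chosen to match the stated distances.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6)]


def build_adjacency(edges):
    """Adjacency lists of an unweighted, undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def distances_from(adj, source):
    """Hop distance d(source, v) for every reachable node v, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centrality(adj, u):
    """Definition 1: reciprocal of the sum of distances from u to the other nodes."""
    return Fraction(1, sum(distances_from(adj, u).values()))


adj = build_adjacency(EDGES)
print(distances_from(adj, 1))  # distances from u_1
print(centrality(adj, 1))      # centrality of u_1
```

Using `Fraction` keeps the centrality as an exact rational number, which avoids floating-point ties when centralities are compared.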

A major feature of the present invention is that it finds the node with the highest centrality quickly and exactly in a graph that grows from moment to moment. An overview of its two techniques is given first, followed by a detailed description of each.

The method of the present invention consists of the following two techniques.

<Node aggregation>
To reduce the enormous computation cost of conventional methods, the present invention performs approximate computation in the aggregation unit 120. Instead of computing the exact centrality of every node, it quickly computes approximate centrality and efficiently prunes nodes with low centrality.

FIG. 5 shows an example of node aggregation.

The aggregation unit 120 first reduces the size of the graph. Given the original graph with n nodes and m edges received from the data receiving unit 110, it groups similar nodes together to compute a new graph with n' nodes and m' edges (n' ≪ n, m' ≪ m). The cost of computing the approximate centrality of a node in this approximate graph is O(n' + m'), a large reduction compared with the O(n + m) cost of computing centrality in the original graph.

The Jaccard coefficient is used to find similar nodes in the original graph. In the present invention, summarizing the received original graph in this way is called "aggregation".

This technique has two advantages. First, pruning nodes by approximate centrality never prunes the node with the highest centrality, because the approximate centrality value is never smaller than the exact centrality value. Nodes that cannot be the answer can thus be pruned efficiently.

Second, the number of nodes for which the approximation is computed is much smaller than the number of nodes in the received original graph, so processing remains efficient even when the graph becomes large.

<Search pruning>
The node aggregation approach of the aggregation unit 120 prunes most of the nodes with low centrality, but exact centrality must still be computed for the detection result to be exact. The technique described below reduces the cost of this computation. FIG. 6 shows an example of pruning.

Computing exact centrality requires the distances from the start node to all other nodes. These distances can be obtained by breadth-first search, but searching every node is undesirable when the graph is large. The detection unit 140 of the present invention therefore terminates the distance computation as soon as the start node can no longer have the highest centrality.

Until the detection result is output, the data holding unit 130 holds a candidate node with high centrality. While computing the centrality of a node, the detection unit 140 estimates the distances of the not-yet-searched nodes held in the data holding unit 130 and thereby computes an upper bound on the centrality of that node. If this upper bound is smaller than the centrality of the candidate, the node cannot be the answer, so the detection unit 140 terminates the unnecessary distance computation.

As described above, because the detection unit 140 estimates centrality, the breadth-first search does not need to explore the entire graph, which makes efficient processing possible.
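The early termination described above might look like the following sketch. The text here does not spell out how the distances of unsearched nodes are estimated, so the bound used below is an assumption: once a node at distance d has been expanded, every undiscovered node must be at least d + 1 hops away. The function names are hypothetical, and the graph is assumed to be connected with at least two nodes.

```python
from collections import deque
from fractions import Fraction


def centrality_or_prune(adj, source, candidate):
    """Exact centrality of `source`, or None once it provably cannot beat `candidate`."""
    n = len(adj)
    dist = {source: 0}
    queue = deque([source])
    known_sum = 0  # sum of distances to the nodes discovered so far
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                known_sum += dist[v]
                queue.append(v)
        # Assumed estimate: each undiscovered node is at least dist[u] + 1 hops away,
        # which gives a lower bound on the distance sum and an upper bound on centrality.
        unseen = n - len(dist)
        lower_bound = known_sum + unseen * (dist[u] + 1)
        if Fraction(1, lower_bound) < candidate:
            return None  # upper bound is below the candidate: stop searching
    return Fraction(1, known_sum)


def most_central_with_pruning(adj):
    """Scan all nodes, keeping the best candidate and pruning hopeless searches."""
    best_node, best = None, Fraction(0)
    for u in adj:
        c = centrality_or_prune(adj, u, best)
        if c is not None and c > best:
            best_node, best = u, c
    return best_node, best
```

Because the bound never underestimates the true centrality, a node is abandoned only when it cannot be the answer, so the result stays exact while many searches stop before visiting the whole graph.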

This technique can be applied not only to the computation of exact centrality but also to the computation of approximate centrality.

Next, the node aggregation process is described in detail.

The aggregation unit 120 acquires the original graph G, which has n nodes and m edges, from the data receiving unit 110. To aggregate the original graph G and compute approximate centrality quickly, it computes from G a graph G' with n' nodes and m' edges. That is, this process aggregates the graph G = (V, E) into the approximate graph G' = (V', E'), which is stored in the data holding unit 130.

The method of computing the edges of the approximate graph is described first, followed by the method of aggregating the nodes of the original graph.

Nodes u' and v' in the approximate graph have an edge {u', v'} ∈ E' only when the original nodes aggregated into them have an edge in the original graph. The approximate graph has no self-loop edges. As described later, these definitions are needed to compute the upper bound of centrality. Formally, the necessary and sufficient condition for the approximate graph to have an edge is as follows.

<Definition 2> Node aggregation:
In the approximate graph G', node u' and node v' have an edge only when both of the following conditions are satisfied.

(1) u' ≠ v'
(2) ∃{u, v} ∈ E, u ∈ u' ∧ v ∈ v'    (2)
Here, u ∈ u' means that the aggregated node u' contains the original node u.

In FIG. 4, let u_1, u_2, u_3, u_4, u_5, and u_6 be the original nodes acquired from the data receiving unit 110. As shown in FIG. 5, the aggregation unit 120 aggregates nodes u_1, u_2, and u_3 into node u'_1 and nodes u_5 and u_6 into node u'_3. Node u'_2 corresponds to node u_4. Because the original graph has edges {u_2, u_4} and {u_4, u_5}, the approximate graph has edges {u'_1, u'_2} and {u'_2, u'_3}.
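Definition 2 and the example above can be illustrated with a short sketch. The edge list of the original graph is an assumption (FIG. 4 is not reproduced here; the edges are chosen to be consistent with the distances and adjacent nodes stated in the text), and `build_approximate_edges` and `group_of` are hypothetical names.

```python
def build_approximate_edges(edges, group_of):
    """Edges of the approximate graph per Definition 2.

    Aggregated nodes a and b share an edge only if a != b (no self-loops) and
    some original edge {u, v} has u in a and v in b.
    """
    approx = set()
    for u, v in edges:
        a, b = group_of[u], group_of[v]
        if a != b:  # condition (1): skip edges inside one aggregated node
            approx.add(frozenset((a, b)))  # condition (2): an original edge links a and b
    return approx


# Assumed original edges (node k stands for u_k).
edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6)]
# Aggregation taken from the example: {u_1, u_2, u_3} -> u'_1, {u_4} -> u'_2, {u_5, u_6} -> u'_3.
group_of = {1: "u'1", 2: "u'1", 3: "u'1", 4: "u'2", 5: "u'3", 6: "u'3"}

approx_edges = build_approximate_edges(edges, group_of)
print(sorted(sorted(e) for e in approx_edges))
```

Storing each edge as a `frozenset` makes the edge undirected and removes duplicates when several original edges map to the same pair of aggregated nodes.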

To reduce the error introduced by approximating the graph, the aggregation unit 120 aggregates nodes that are similar in the original graph. As stated above, two aggregated nodes have an edge only when at least one pair of their original nodes has an edge. Therefore, the more similar the sets of adjacent nodes of the original nodes are, the smaller the error when they are aggregated into one node. The present invention aggregates similar nodes using the Jaccard coefficient, which is commonly used as a measure of set similarity.

Let N_u and N_v be the sets of nodes adjacent to node u and node v, respectively. The Jaccard coefficient is then computed as
|N_u ∩ N_v| / |N_u ∪ N_v|,
that is, the number of common adjacent nodes divided by the number of shared adjacent nodes. As for "adjacent nodes", in the graph of FIG. 4 the nodes adjacent to node u_2 are u_1, u_3, and u_4, and the nodes adjacent to node u_3 are u_2 and u_4. The "common adjacent nodes" are the intersection (AND set) of the adjacent-node sets, which is u_4 for nodes u_2 and u_3. The "shared adjacent nodes" are the union (OR set) of the adjacent-node sets, which is u_1, u_2, u_3, and u_4 for nodes u_2 and u_3.

In this example, the aggregation unit 120 aggregates node u and node v when node v is the node most similar to node u. However, to avoid aggregating dissimilar nodes, the nodes are not aggregated if the number of common adjacent nodes is less than half the number of shared adjacent nodes.
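The selection rule with its one-half threshold might be sketched as follows. The adjacency `ADJ` is an assumed reading of FIG. 4 (the figure is not reproduced here), and `most_similar` and its `threshold` parameter are hypothetical names; the sketch shows only how the most similar node is chosen, not how groups of more than two nodes are formed.

```python
from fractions import Fraction

# Assumed adjacency for FIG. 4 (node numbers stand for u1..u6).
ADJ = {1: {2}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5, 6}, 5: {4}, 6: {4}}


def jaccard(adj, u, v):
    """Jaccard coefficient of the adjacent-node sets of u and v."""
    union = adj[u] | adj[v]
    return Fraction(len(adj[u] & adj[v]), len(union)) if union else Fraction(0)


def most_similar(adj, u, threshold=Fraction(1, 2)):
    """Return the node most similar to u, or None when even the best
    candidate has fewer common nodes than half of the shared nodes."""
    best, best_score = None, Fraction(0)
    for v in sorted(adj):  # sorted for a deterministic tie-break
        if v == u:
            continue
        score = jaccard(adj, u, v)
        if score > best_score:
            best, best_score = v, score
    return best if best_score >= threshold else None


print(most_similar(ADJ, 1), most_similar(ADJ, 5), most_similar(ADJ, 4))
```

With this assumed adjacency, u₁ pairs with u₃ (coefficient 1/2), u₅ pairs with u₆ (coefficient 1), and u₄ is left unaggregated because its best coefficient is below one half.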

In the present invention, an approximate value of centrality is calculated, and this approximate value is never smaller than the original centrality value. First, the method for calculating the approximate value is described. Then, after showing that the distance between nodes in the approximate graph is never longer than the corresponding distance in the original graph, it is shown that this property allows an upper limit value of centrality to be obtained from the approximate graph.

When the detection unit 140 acquires the approximate graph from the data holding unit 130, it obtains the approximate value of centrality as follows.

<Definition 3> Approximate value of centrality:
The detection unit 140 calculates the approximate value of the centrality of node u′ in the approximate graph as follows.

Here, │v′│ is the number of nodes of the original graph that are contained in aggregate node v′.
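The expression for Definition 3 does not survive in this text. One reconstruction, inferred from the worked example for FIG. 5 that follows and from centrality being the reciprocal of the sum of distances (an assumption, not the verbatim formula of the patent), is:

```latex
C_{u'} = \frac{1}{\sum_{v' \neq u'} |v'| \, d(u', v')}
```

In this form, each aggregate node v′ contributes its distance from u′ once for every original node it contains.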

As an example, in the approximate graph of FIG. 5, the approximate value of the centrality of node u′₁ is 1/5. This is because
d(u′₁, u′₂) = 1,
d(u′₁, u′₃) = 2,
│u′₂│ = 1,
│u′₃│ = 2,
so that the weighted sum of distances is 1·1 + 2·2 = 5. Note that this approximate value is larger than the centrality value in the original graph.
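The example can be checked with a short sketch. The weighting of each distance by the size of the aggregate node is inferred from the numbers in the example and is an assumption, as are the names `APPROX_ADJ`, `SIZE`, `bfs_distances`, and `approximate_centrality`; the sketch also assumes a connected approximate graph with at least two aggregate nodes.

```python
from collections import deque
from fractions import Fraction

# Approximate graph of FIG. 5 and the number of original nodes per aggregate node.
APPROX_ADJ = {"u'1": {"u'2"}, "u'2": {"u'1", "u'3"}, "u'3": {"u'2"}}
SIZE = {"u'1": 3, "u'2": 1, "u'3": 2}


def bfs_distances(adj, start):
    """Hop distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def approximate_centrality(approx_adj, size, start):
    """Reciprocal of the size-weighted sum of distances from start."""
    dist = bfs_distances(approx_adj, start)
    weighted_sum = sum(size[v] * d for v, d in dist.items())
    return Fraction(1, weighted_sum)


print(approximate_centrality(APPROX_ADJ, SIZE, "u'1"))  # 1/5
```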

To show that the approximate value of centrality is an upper limit value of the original centrality, the following lemma is given.

<Lemma 1> Shortcut:
The following inequality holds for all nodes in the approximate graph, where u′ and v′ are the aggregate nodes containing the original nodes u and v.

d(u′, v′) ≦ d(u, v) (4)
[Proof]
Let u → v be the shortest path between nodes u and v in the original graph, and let w_i → w_j be a part of that shortest path. If all nodes on the shortest path are aggregated into different aggregate nodes, the length of the shortest path equals the length of the corresponding path in the approximate graph, because the path u′ → w′_i → w′_j → v′ in the approximate graph matches the shortest path u → w_i → w_j → v in the original graph exactly. Otherwise, at least two nodes on the shortest path are aggregated into one aggregate node; that is, nodes w_i and w_j of the original graph are aggregated into node w′. The shortest path u → w_i → w_j → v in the original graph is then shortened to the path u′ → w′ → v′ in the approximate graph. As a result, d(u′, v′) ≦ d(u, v).

From Lemma 1, the following theorem on the approximation of centrality can be shown.

<Theorem 1> (Approximate value of centrality):
The following inequality holds for all nodes in the approximate graph.

C_u′ ≧ C_u (5)
[Proof]
By Lemma 1, the following inequality holds for all aggregate nodes.

Therefore, the following inequality holds.

Therefore, the theorem holds.
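The two inequalities used in this proof are not reproduced in the text. A reconstruction consistent with Lemma 1 and with centrality defined as the reciprocal of the sum of distances is given below; the form of C_{u′} is itself an assumption inferred from the FIG. 5 example.

```latex
\begin{align*}
\sum_{v'} |v'| \, d(u', v') &\le \sum_{v} d(u, v) \\
C_{u'} = \frac{1}{\sum_{v'} |v'| \, d(u', v')} &\ge \frac{1}{\sum_{v} d(u, v)} = C_{u}
\end{align*}
```

The first line follows because every original node v lies in exactly one aggregate node v′ and, by Lemma 1, d(u′, v′) ≦ d(u, v); taking reciprocals of positive quantities reverses the inequality.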

Theorem 1 is used to guarantee that the detection result is exact. The proof is given later.

Next, a method by which the detection unit 140 computes centrality in the original graph at high speed is described. This method terminates the breadth-first search as soon as the estimated centrality of a node becomes smaller than the centrality of the candidate node. This is done by estimating the distances of the unsearched nodes that the breadth-first search has not yet visited.

Here, an expression for estimating the centrality of a node is defined first. Then, after a property of the distances of unsearched nodes in the breadth-first search is shown, it is shown that the defined estimate is an upper limit value of centrality.

In this example, the centrality of node u is estimated during the breadth-first search by the following expression.

<Definition 4> Estimation of centrality:
In order to terminate the computation during the breadth-first search, the detection unit 140 estimates the centrality of node u in the original graph by the estimated value C*_u as follows.

Here, V_ex is the set of nodes that the breadth-first search has already visited; the expression also uses the number of unsearched nodes and d_max, the largest distance among the visited nodes.
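The expressions for Definition 4 are not reproduced in this text. A reconstruction that matches the estimates 1/5 and 1/9 in the FIG. 4 example given later is shown below; it is an assumption, and V (the full node set) is a symbol introduced here.

```latex
C^{*}_{u} = \frac{1}{\sum_{v \in V_{ex}} d(u, v) + |V \setminus V_{ex}| \cdot d_{max}},
\qquad
d_{max} = \max_{v \in V_{ex}} d(u, v)
```

For instance, with start node u₁ and V_ex = {u₁, u₂}, the sum of known distances is 1, four nodes are unsearched, and d_max = 1, which gives 1/(1 + 4·1) = 1/5.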

If all nodes have been visited, this estimate equals the exact centrality value. To show the property of the distance estimate, a property that holds for unsearched nodes is given below. Here, an “unsearched node” is a node whose distance has not yet been computed in the course of a breadth-first search from some start node. For example, in the graph of FIG. 4, with u₁ as the start node, the breadth-first search visits the nodes in order of increasing distance: u₂, u₃, u₄, u₅, u₆. When the search has reached node u₂, the unsearched nodes are u₃, u₄, u₅, and u₆; when it has reached node u₄, the unsearched nodes are u₅ and u₆.

<Lemma 2> Unsearched nodes:
The distance of an unsearched node is never shorter than the distance of any node that has already been visited.

[Proof]
In the breadth-first search, the distance of a node is computed from the distances of nodes that have already been visited. Therefore, the distances of unsearched nodes increase monotonically as the breadth-first search proceeds.

From this lemma, a lower limit on the distance of each unsearched node can be computed. Since the centrality value is the reciprocal of the sum of distances, an upper limit value of centrality can be obtained as a result.

<Theorem 2> Estimation of centrality:
In the original graph, the following inequality holds throughout the breadth-first search.

[Proof]
By Lemma 2, the distance of an unsearched node in the breadth-first search is never shorter than the distance of any visited node. Therefore, the following inequality holds.

This theorem shows that the detection result of the method is exact; the proof is given later.
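The inequalities of Theorem 2 are not reproduced in the text. Because the surrounding prose states that the estimate is an upper limit value of centrality, a reconstruction (an assumption, using the full node set V and the form of C*_u inferred from the FIG. 4 example) is:

```latex
\begin{align*}
\sum_{v \in V_{ex}} d(u, v) + |V \setminus V_{ex}| \cdot d_{max}
  &\le \sum_{v \in V_{ex}} d(u, v) + \sum_{v \in V \setminus V_{ex}} d(u, v)
   = \sum_{v \in V} d(u, v) \\
C^{*}_{u} &\ge C_{u}
\end{align*}
```

The first line uses d(u, v) ≧ d_max for every unsearched node v, which is the content of Lemma 2.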

“Algorithm 1” shown in FIG. 7 is the centrality computation performed by the detection unit 140. Let θ be the exact centrality value of the candidate node. Algorithm 1 uses θ to prune nodes that cannot be the solution. If the estimated centrality of a node is smaller than θ, that node cannot have the highest centrality value, so it can be pruned safely even in the middle of the breadth-first search (lines 6 to 8 of FIG. 7). Because the estimate equals the exact centrality value once all nodes have been visited, the algorithm finally returns the estimated value C*_u of node u (line 10 of FIG. 7). The exact centrality value of the solution candidate is used as θ; how the solution candidate is selected is described in detail later.

Specifically, in the graph of FIG. 4, let node u₁ be the start node and let θ = 1/6. The estimate at node u₂ is 1/5, which is larger than the candidate's centrality value θ = 1/6, so the detection unit 140 continues the search from this node to nodes u₃ and u₄. However, the estimate at nodes u₃ and u₄ is 1/9, which is smaller than θ = 1/6, so the computation is terminated without searching further from these nodes.
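The early termination in this example can be sketched as follows. This is not a transcription of Algorithm 1 (FIG. 7 is not reproduced here): the estimate used is the form inferred from the example values, the adjacency `ADJ` is an assumed reading of FIG. 4, and `centrality_or_prune` is a hypothetical name. The sketch assumes a connected graph with at least two nodes.

```python
from collections import deque
from fractions import Fraction

# Assumed adjacency for FIG. 4 (node numbers stand for u1..u6).
ADJ = {1: {2}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5, 6}, 5: {4}, 6: {4}}


def centrality_or_prune(adj, start, theta):
    """Breadth-first search from start that stops early.

    Returns None as soon as the upper-bound estimate drops below theta,
    otherwise the exact centrality (reciprocal of the sum of distances).
    """
    n = len(adj)
    dist = {start: 0}
    total = 0  # sum of the distances computed so far
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v in dist:
                continue
            dist[v] = dist[u] + 1
            total += dist[v]
            d_max = dist[v]  # BFS assigns distances in non-decreasing order
            # Every unsearched node is at least d_max away from start.
            estimate = Fraction(1, total + (n - len(dist)) * d_max)
            if estimate < theta:
                return None
            queue.append(v)
    return Fraction(1, total)


print(centrality_or_prune(ADJ, 1, Fraction(1, 6)))   # pruned
print(centrality_or_prune(ADJ, 1, Fraction(1, 12)))  # search runs to completion
```

With θ = 1/6 the estimate is 1/5 after u₂ is reached and 1/9 after u₃ is reached, at which point the search stops, in line with the example above.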

The method for terminating the breadth-first search has been described here for the original graph, but the same argument also applies to the approximate graph produced by the aggregation unit 120.

A method for continuously detecting the node with the highest centrality in a graph that keeps growing from moment to moment is described next. In this embodiment, one node is assumed to be added to the graph at each time step.

The main technique for detecting the node with the highest centrality is as described above. That is, the detection unit 140 prunes nodes with low centrality using the approximate graph obtained by the aggregation unit 120, and finds the node with the highest centrality exactly while terminating searches early in the original graph. However, a graph that grows from moment to moment raises the following two problems.

(1) How should the most similar nodes be computed in order to obtain the approximate graph of a graph that grows from moment to moment?

(2) How should the candidate node be selected from a graph that grows from moment to moment in order to prune nodes with low centrality efficiently?

To solve these problems, the following observation about graphs that grow from moment to moment is used.

The graph differs little before and after a node is added.

In a graph that grows from moment to moment, a small number of nodes join a graph made up of a large number of existing nodes. Therefore, even as new nodes are added one after another, the change to the graph before and after each addition is small. The following techniques are based on this observation.

The solution to problem (1) above is to compute similarity only for the nodes in the vicinity of the node that was received by the data receiving unit 110 and newly added to the graph. A naive way to find the most similar node is to compute the similarity between the newly added node and every other node.

However, the present method uses the following lemma to update the most similar nodes at high speed.

<Lemma 3> Similar nodes:
The distance from a node to the node most similar to it is never greater than 2.

[Proof]
By definition, the Jaccard coefficient of two nodes that have no common adjacent node is 0. When two nodes have a common adjacent node, there is a path of length 2 between them through that node, so the distance to any node with a positive Jaccard coefficient is at most 2.

Specifically, in the graph of FIG. 4, the Jaccard coefficient of nodes u₁ and u₅ is 0 because d(u₁, u₅) > 2. For the same reason, neither u₅ nor u₆ can be the node most similar to node u₁.

From Lemma 3, the following theorem holds for a node added to the data holding unit 130.

<Theorem 3> Update of the most similar nodes:
The distance from an added node to the node most similar to it is at most 2.

[Proof]
This is clear from Lemma 3.

Theorem 3 is used to update the most similar nodes at high speed. That is, the aggregation unit 120 computes the Jaccard coefficient only for nodes whose distance from the added node is 2 or less.
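The restriction to nodes within distance 2 can be sketched as a neighbour-of-neighbour scan. The adjacency `ADJ` is an assumed reading of FIG. 4, and `within_two_hops` is a hypothetical helper name, not a function named in the patent.

```python
# Assumed adjacency for FIG. 4 (node numbers stand for u1..u6).
ADJ = {1: {2}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5, 6}, 5: {4}, 6: {4}}


def within_two_hops(adj, u):
    """Nodes at distance 1 or 2 from u: the only candidates whose
    Jaccard coefficient with u can be positive (Theorem 3)."""
    near = set()
    for w in adj[u]:
        near.add(w)        # distance 1
        near |= adj[w]     # distance at most 2
    near.discard(u)
    return near


print(sorted(within_two_hops(ADJ, 1)))  # u5 and u6 are excluded
```

Only the returned candidates need a Jaccard computation, so the cost of an update depends on the size of the two-hop neighbourhood rather than on the whole graph.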

As stated at the outset, only the case where a single node is added has been described above, but Theorem 3 can also be applied when a node is deleted. If multiple nodes are added, the above processing can be performed once for each node. If a single edge is added or deleted, this can be handled by deleting one endpoint of the edge and then adding that node again.

As the solution to the second problem, in this embodiment the detection unit 140 selects the node that had the highest centrality value one time step earlier as the solution candidate. If the centrality of the solution candidate is low, few nodes can be pruned. However, from the observation above, the node that had the highest centrality one time step earlier is expected to have the highest centrality again. Therefore, the exact centrality of that node is computed and used as the candidate's centrality value θ.

FIG. 8 shows “Algorithm 2”, the algorithm for detecting the node with the highest centrality.

Here, u_best[t] is the node with the highest centrality, u_best[t−1] is the node that had the highest centrality one time step earlier, and u_add is the newly added node. V_exact is the set of nodes for which the exact centrality value is computed.

Algorithm 2 can be divided into two stages: update and detection. In the update stage, the aggregation unit 120 computes the most similar nodes and then obtains the approximate graph (lines 2 to 5 of FIG. 8). In the detection stage, the detection unit 140 first computes the candidate's centrality value θ using the node that had the highest centrality one time step earlier (line 8 of FIG. 8). If the approximate value of centrality of an aggregate node is smaller than θ, none of the nodes aggregated into it can be the solution, so that aggregate node is pruned. If the approximate value is not smaller than θ, the nodes aggregated into that aggregate node are added to V_exact (lines 11 to 15 of FIG. 8), because they can still be the solution. The detection unit 140 then computes the exact centrality value of each node in V_exact and checks whether it is the solution (lines 17 to 23 of FIG. 8).
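The detection stage described above might be sketched as follows. This is a simplified illustration under stated assumptions, not a transcription of FIG. 8: the aggregation `GROUP_OF` is supplied directly instead of being produced by an update stage, the exact centralities are computed by a plain breadth-first search without early termination, the adjacency `ADJ` is an assumed reading of FIG. 4, and all function names are hypothetical.

```python
from collections import deque
from fractions import Fraction

# Assumed adjacency for FIG. 4 and the aggregation described for FIG. 5.
ADJ = {1: {2}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5, 6}, 5: {4}, 6: {4}}
GROUP_OF = {1: "a1", 2: "a1", 3: "a1", 4: "a2", 5: "a3", 6: "a3"}


def bfs_distances(adj, start):
    """Hop distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def exact_centrality(adj, u):
    return Fraction(1, sum(bfs_distances(adj, u).values()))


def detect_most_central(adj, group_of, prev_best):
    """Return (node, centrality) of the most central node."""
    theta = exact_centrality(adj, prev_best)  # candidate's exact centrality
    # Build the approximate graph and the member list of each aggregate node.
    members, approx_adj = {}, {}
    for node, g in group_of.items():
        members.setdefault(g, set()).add(node)
        approx_adj.setdefault(g, set())
    for u, nbrs in adj.items():
        for v in nbrs:
            if group_of[u] != group_of[v]:
                approx_adj[group_of[u]].add(group_of[v])
    # Keep only aggregate nodes whose approximate centrality is not below theta.
    v_exact = set()
    for g in approx_adj:
        dist = bfs_distances(approx_adj, g)
        weighted = sum(len(members[h]) * d for h, d in dist.items())
        if Fraction(1, weighted) >= theta:
            v_exact |= members[g]
    # Verify the survivors with exact centralities.
    best, best_c = prev_best, theta
    for u in sorted(v_exact):
        c = exact_centrality(adj, u)
        if c > best_c:
            best, best_c = u, c
    return best, best_c


print(detect_most_central(ADJ, GROUP_OF, prev_best=4))
```

In this assumed graph, starting from the previous best node u₄ (θ = 1/6) prunes the aggregate node containing u₅ and u₆, whose approximate centrality is 1/7, so exact centrality is computed for only four of the six nodes.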

In the following, the exactness of the detection result and the computational cost of the above technique are analyzed theoretically. The following two points are established.

(1) The node with the highest centrality is detected exactly.

(2) The memory requirement and the computation time are equivalent to those of the existing method.

Here, n is the number of nodes and m is the number of edges.

As shown below, the present invention detects the node with the highest centrality exactly.

<Theorem 4> Exact detection:
The present invention is guaranteed to detect the node with the highest centrality exactly.

[Proof]
Let u_best be the node with the highest centrality in the original graph, and let θ_max be the exact centrality value of node u_best. Let θ be the exact centrality value of the solution candidate.

Since θ_max ≧ θ, the approximate centrality value of node u_best in the approximate graph is never smaller than θ (Theorem 1). Similarly, the estimated centrality value of node u_best in the original graph is never smaller than θ (Theorem 2). In the detection processing of the detection unit 140, a node is pruned only when its approximate centrality value or its estimated centrality value is smaller than θ. Therefore, node u_best is never pruned.

First, the existing method based on breadth-first search is considered.

<Lemma 4> (Existing method) Given a graph, the existing method requires O(n + m) memory and O(n² + nm) computation time to find the node with the highest centrality.

[Proof]
To find the node with the highest centrality, the existing method holds the graph in an adjacency list representation, which requires O(n + m) memory. Computing the centrality value of one node of the graph requires a breadth-first search and therefore O(n + m) computation time. Because the existing method must compute the centrality value of every node, O(n² + nm) computation time is required.

<Lemma 5> Memory requirement of the present invention:
The present invention requires O(n + m) memory to find the node with the highest centrality.

[Proof]
In the present invention, the data holding unit 130 holds the received original graph and the approximate graph obtained from the aggregation unit 120. The numbers of nodes and edges in the approximate graph never exceed the numbers of nodes and edges in the original graph, respectively, so the memory needed to hold the approximate graph is O(n + m). The memory needed by the data holding unit 130 to hold the original graph is also O(n + m), so the memory required by the present invention is O(n + m).

<Lemma 6> Computation time of the present invention:
The present invention requires O(n² + nm) computation time to find the node with the highest centrality.

[Proof]
To find the node with the highest centrality, the present invention obtains the approximate graph and then prunes nodes using the approximate graph and the original graph. Obtaining the approximate graph requires O(nm) computation time, because computing one Jaccard coefficient takes O(m) time and it may be necessary to compute the Jaccard coefficient between the newly added node and every other node. Since the approximate graph and the original graph each have at most n nodes and m edges, the computation time for pruning nodes is O(n² + nm). As a result, the computation time of the present invention is O(n² + nm).

From the lemmas on memory and computation time, the following property of the present invention can be shown.

<Theorem 5> Equivalence of computational cost:
The memory requirement and the computation time of the present invention are each equivalent to those of the existing method.

[Proof]
This is clear from Lemmas 4 to 6.

● Experiment:
An experiment was conducted to examine the effectiveness of the present invention.

In the experiment, the present invention was compared with the existing method based on breadth-first search. The publicly available real data sets P2P (Point to Point), WWW (World Wide Web), and DBLP were used. These are, respectively, data from a university P2P network, web data within the “nd.edu” domain, and data from a computer science literature reference network. In the experiment, the largest connected component was extracted from each network, and nodes were added one at a time.

The experiment was run on a Linux server with an Intel Xeon Quad-Core 3.33 GHz CPU and 32 GB of memory. All algorithms were implemented using GCC.

The processing times of the present invention and the existing method were compared. In the experiment, the number of nodes, which strongly affects the processing time, was varied.

FIGS. 9, 10, and 11 show the relationship between the processing time and the number of nodes. Here, the processing time is the total time including both the update stage and the detection stage. The present invention was confirmed to be more than 960 times faster than the existing method.

Whereas the existing method takes O(n + m) time to compute the centrality of one node, the present invention can compute the approximate centrality in O(n′ + m′) time. The present invention computes the approximate centrality for all nodes, but because this computation is effectively terminated early, it has little effect on the processing time. If a node cannot be pruned using the approximate centrality, the present invention computes its exact centrality. However, because most nodes are pruned by the approximate centrality, no significant effect on the processing time was observed.

The operations of the data receiving unit, the aggregation unit, and the detection unit of the above monitoring apparatus can be built as a program, which can be installed and executed on a computer used as the monitoring apparatus or distributed via a network.

The program so built can also be stored on a hard disk or on a portable storage medium such as a flexible disk or a CD-ROM, and can be installed on a computer or distributed.

The present invention is not limited to the above embodiments and examples, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to the analysis of social networks.

DESCRIPTION OF SYMBOLS
100 Monitoring apparatus
110 Data receiving means, data receiving unit
120 Aggregating means, aggregation unit
130 Data holding means, data holding unit
140 Detecting means, detection unit

Claims

1. A monitoring device for finding, in a graph that grows from moment to moment, the node having the highest centrality, centrality being a measure of appropriateness in graph theory, the monitoring device comprising:
data receiving means for receiving, as a graph, nodes or edges that vary with time;
aggregating means for pruning the nodes of the original graph received by the data receiving means using approximate centrality and calculating an approximate graph;
data holding means for storing the original graph received by the data receiving means and the approximate graph calculated by the aggregating means; and
detecting means for acquiring the original graph and the approximate graph from the data holding means and calculating the node having the highest centrality in the graph.

2. The monitoring device according to claim 1, wherein
the aggregating means includes means for obtaining, for the set of original nodes received by the data receiving means, the reciprocal of the sum of the distances between the nodes, grouping together nodes whose reciprocal values are similar, and storing the result in the data holding means as an approximate graph, and
the detecting means includes means for reading the original graph and the approximate graph from the data holding means, obtaining the distances from a start node to all other nodes by breadth-first search, and terminating the search if the upper limit value of the centrality of a node is smaller than the centrality of the approximate graph.

3. A monitoring method for finding, in a graph that grows from moment to moment, the node having the highest centrality, centrality being a measure of appropriateness in graph theory, wherein a computer having storage means for storing a received graph and an approximate graph performs:
a data receiving step of receiving, as a graph, nodes or edges that vary with time and storing them in the storage means;
an aggregation step of pruning the nodes of the original graph received in the data receiving step using approximate centrality, calculating an approximate graph, and storing the approximate graph in the storage means; and
a detecting step of acquiring the original graph and the approximate graph from the storage means and calculating the node having the highest centrality in the graph.

4. The monitoring method according to claim 3, wherein
in the aggregation step, for the set of original nodes received in the data receiving step, the reciprocal of the sum of the distances between the nodes is obtained, nodes whose reciprocal values are similar are grouped together, and the result is stored in the storage means as an approximate graph, and
in the detecting step, the original graph and the approximate graph are read from the storage means, the distances from a start node to all other nodes are obtained by breadth-first search, and the search is terminated if the upper limit value of the centrality of a node is smaller than the centrality of the approximate graph.

5. A monitoring program for causing a computer to function as each of the means constituting the monitoring device according to claim 1 or 2.