JP6005583B2

JP6005583B2 - SEARCH DEVICE, SEARCH METHOD, AND SEARCH PROGRAM

Info

Publication number: JP6005583B2
Application number: JP2013108843A
Authority: JP
Inventors: 靖宏藤原
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-05-23
Filing date: 2013-05-23
Publication date: 2016-10-12
Anticipated expiration: 2033-05-23
Also published as: JP2014229110A

Description

The present invention relates to a search device, a search method, and a search program.

In recent years, the use of large-scale networks, typified by social networks, has been growing, and systems that perform data mining and search on such networks have become increasingly important. A network configuration can be expressed as graph data in which the computers that make up the network are nodes and the links connecting them are edges. Consequently, there is growing interest in querying graph databases, which handle a network configuration as graph data, in order to search, classify, and analyze the nodes of a graph.

For example, when the computers serving as nodes are websites, the PageRank algorithm can be used to determine the importance of a web page. PageRank calculates the importance of a node based on the random surfer model, which models the behavior of a user who clicks links on web pages several times and then jumps to a random page.

The importance of a node under PageRank corresponds to its probability in the steady state of a random walk. At each processing step of PageRank, the walk either selects one of the nodes linked from the current node and moves to it or, with a certain probability, jumps to a random page. Because of its effectiveness, PageRank has been applied in a variety of applications.

Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", 1999

However, the conventional technique described above has the problem that computing PageRank is expensive. Conventional PageRank computation must iterate over the entire graph until the scores of all nodes converge, so nodes of high importance cannot be found quickly in a large-scale graph.

The embodiments disclosed in the present application have been made in view of the above, and their object is to obtain search results quickly in a node search based on PageRank.

An example of the embodiments disclosed in the present application is a search device that searches a graph of a computer network, in which the devices forming the network are nodes and the connections between devices are edges. The search device accepts as input k, the number of solution nodes (k is a positive integer not exceeding the number of nodes), and outputs k solution nodes. For an index i that starts at 0 and is incremented by 1, the search device executes a subgraph construction process: when i = 0, the graph itself is used as the subgraph; when i > 0, the subgraph for the i-th iteration is constructed from the set of nodes that can reach the candidate nodes of the i-th iteration within the subgraph of the (i−1)-th iteration. The search device also calculates the random walk probabilities corresponding to the subgraph. From the random walk probabilities and the candidate nodes of the i-th iteration, it calculates an estimated lower limit and an estimated upper limit of the PageRank score for every candidate node. The search device then calculates the candidate nodes of the (i+1)-th iteration from those of the i-th iteration. If the number of elements in the candidate node set of the (i+1)-th iteration equals k, that set is output as the solution nodes. If the number of elements differs from k, i is incremented by 1 and the subgraph construction process is executed for the new i, after which each of the above processes is executed again, in order, for the new i.

According to the embodiments disclosed in the present application, search results can be obtained quickly in, for example, a node search based on PageRank.

FIG. 1 is a block diagram showing the configuration of the search device.
FIG. 2 is a flowchart showing the search process.
FIG. 3 is a diagram showing the search algorithm.
FIG. 4 is a diagram showing an example of a computer that executes the search program.

Embodiments of the search device and related aspects disclosed in the present application are described below with reference to the drawings. In the following embodiments, a graph of a computer network, in which the devices forming the network are nodes and the connections between devices are edges, is searched on receiving k, the number of solution nodes (k is a positive integer not exceeding the number of nodes), and the candidate nodes with the top k PageRank scores are output as the solution nodes. The following embodiments are merely examples and do not limit the technology disclosed in the present application.

[Definition of symbols]
The symbols used in the description of the embodiments are shown in the table below. In the description, vectors are written in bold lowercase Latin letters and matrices in bold uppercase Latin letters.

(Overview of the conventional technique and its problems)
Before describing the embodiments, the conventional technique and its problems are outlined. In the conventional technique, PageRank starts a random walk from a random node and, at each processing step, continues the random walk with probability s (0 < s < 1) or jumps to a random node with probability (1 − s).

Let V be the set of nodes of the whole graph and E the set of edges; the graph G to be queried can then be expressed as G = {V, E}. Let $\mathbf{p}$ be the column vector whose u-th element p[u] is the PageRank score of node u. Let $\mathbf{c}$ be the column vector whose elements are all 1/N, where N is the number of nodes in the graph. Let $\mathbf{W}$ be the column-normalized adjacency matrix of the graph, where W[u, v] is the probability of moving from node v to node u. The PageRank score of each node is obtained by iterating the following expression (1) until it converges.

$$\mathbf{p}_i = s\,\mathbf{W}\,\mathbf{p}_{i-1} + (1 - s)\,\mathbf{c} \qquad (1)$$

Here, when i = 0, the score vector $\mathbf{p}_0$ is set to an initial vector [expression not reproduced in this text]. The conventional technique repeats this calculation until the PageRank score of every node converges. Letting M be the number of edges in the graph and T the number of iterations until convergence, the iteration costs O((N + M)·T). The conventional technique therefore cannot search a large-scale graph quickly. Here O(·) is Landau notation.

[Embodiment]
(Outline of the embodiment)
The embodiment described below solves the problems of the conventional technique described above. To reduce the computational cost, the embodiment calculates an estimated lower limit and an estimated upper limit of the PageRank score. That is, rather than using the entire graph to be searched as the conventional technique does, the embodiment uses the estimated lower and upper limits to exclude unnecessary nodes and edges from the graph, and performs the node search by iterating the PageRank score calculation on the resulting subgraph.

In the following, the calculation method and theoretical background of the embodiment are described first, followed by the configuration and processing of the search device according to the embodiment.

<Calculation method and theoretical background>
(Method for estimating the lower and upper limits of the PageRank score)
In the embodiment, in the i-th iteration (i = 0, 1, 2, ...; a non-negative integer) of the PageRank score calculation, an estimated lower limit and an estimated upper limit of the PageRank score are calculated for the nodes in the candidate node set. Below, the lower limit of the PageRank score is referred to as the "lower limit", the upper limit of the PageRank score as the "upper limit", the estimated lower limit as the "lower limit estimate", the estimated upper limit as the "upper limit estimate", and the two estimates together as the "estimates", as appropriate. The method for obtaining the candidate node set is described later.

To calculate the upper limit, the set R_i of nodes that can reach the candidate node set C_i is used. Here, node u can reach node v if a path from node u to node v exists in the graph. In addition, an N × 1 column vector is used whose u-th element is determined from the maximum edge weight [symbol and expression not reproduced in this text].

The probabilities of random walks of length i are held in an N × 1 column vector r_i, whose u-th component is written r_i[u]. Using the i-th power of the adjacency matrix W of the graph, r_i can be calculated as [expression not reproduced in this text], where the vector for i = 0 is set to [expression not reproduced in this text]. The lower limit and the upper limit in the i-th iteration are defined as follows. [Definitions not reproduced in this text.]

The properties of these estimates are given by Lemma 1, Lemma 2, and Lemma 3 below. Lemma 3 shows that the estimates converge in the embodiment.

The embodiment iteratively computes candidate nodes in order to find the top k nodes, and ends the iteration once the number of candidate nodes reaches k. To calculate the estimates, a subgraph is computed for the nodes in the candidate node set. The candidate nodes are updated dynamically during the iteration.

(Candidate nodes)
The definitions and properties of candidate nodes and subgraphs are given below. Let the threshold ε_{i−1} be the k-th highest lower limit in the (i−1)-th iteration; the candidate node set C_i in the i-th iteration is defined as follows.

The set C_i has the following property.

From Lemma 4, A ⊆ C_i, so in each iteration the candidate node set C_i can be calculated incrementally from the candidate node set C_{i−1} as follows.

The candidate node sets C_i (i = 0, 1, 2, ...) decrease monotonically with i, as follows.
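The pruning idea behind the candidate node set can be sketched in Python as follows. The exact expressions of Definitions 3 and 4 are not reproduced in this text, so the rule used here (keep a node while its upper limit estimate is at least the k-th highest lower limit estimate) is an assumption based on the usual top-k pruning condition. The names `kth_highest` and `prune_candidates` and the sample values are hypothetical.

```python
def kth_highest(values, k):
    """Return the k-th highest value (k = 1 gives the maximum)."""
    return sorted(values, reverse=True)[k - 1]


def prune_candidates(candidates, lower, upper, k):
    """Keep the candidates whose upper estimate can still reach the top k.

    lower and upper map node -> estimated lower / upper limit of its score.
    Assumed rule: a node stays a candidate if upper[u] >= threshold, where
    threshold is the k-th highest lower estimate among current candidates.
    """
    threshold = kth_highest([lower[u] for u in candidates], k)
    kept = {u for u in candidates if upper[u] >= threshold}
    return kept, threshold


lower = {"a": 0.30, "b": 0.22, "c": 0.10, "d": 0.05}
upper = {"a": 0.40, "b": 0.35, "c": 0.25, "d": 0.12}
kept, eps = prune_candidates({"a", "b", "c", "d"}, lower, upper, k=2)
```

In this example the threshold is 0.22, so node "d" (upper estimate 0.12) is dropped while "c" (upper estimate 0.25) remains undecided. Because the result is always filtered from the previous candidate set, the sets can only shrink from one iteration to the next.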

(Subgraphs)
The embodiment calculates the estimates for the candidate nodes using a subgraph. The subgraph G_i in the i-th iteration is defined as follows.

The following holds for the subgraphs G_i (i = 0, 1, 2, ...).

The subgraphs G_i (i = 0, 1, 2, ...) also decrease monotonically with i, as follows.

(Calculation of the estimates)
The lower limit and the upper limit in the i-th iteration are calculated incrementally using the subgraph G_i, as follows. The method for constructing the subgraph G_i based on Lemma 7 is described later.

The computational cost of calculating the lower limit and the upper limit in the i-th iteration according to Definition 6 is as follows.

From the above, the following two claims are stated as theorems.

<Configuration and processing of the search device>
(Configuration of the search device)
FIG. 1 is a block diagram showing the configuration of the search device. The search device 10 according to the embodiment takes as input the graph G used for the node search query and the number k of solution nodes and, when the number of candidate nodes equals k, outputs the candidate nodes, which have the top k PageRank scores, as the solution nodes. As shown in FIG. 1, the search device 10 includes a subgraph construction unit 11, a random walk probability calculation unit 12, an estimated value calculation unit 13, and a candidate node calculation unit 14.

The subgraph construction unit 11 takes as input the graph G used for the query and k, the number of solution nodes (k is a positive integer not exceeding the number of nodes), and calculates and outputs the subgraph G_i for the iteration index i (i is a non-negative integer).

Specifically, when i = 0, the subgraph construction unit 11 sets C_0 = V for C_0, the initial candidate node set, and G_0 = G for G_0, the initial subgraph. The subgraph construction unit 11 then outputs the subgraph G_0 to the random walk probability calculation unit 12.

When i ≠ 0, on the other hand, the subgraph construction unit 11 increments i by 1. As a result of this increment, the subgraph that the subgraph construction unit 11 previously calculated as G_i is now referred to as G_{i−1}. The subgraph construction unit 11 then calculates the candidate nodes C_i based on expression (5) of Definition 3 and expression (6) of Definition 4. Next, based on Lemma 7, it uses a breadth-first search on the subgraph G_{i−1} to calculate the set R_i of nodes that can reach the candidate nodes C_i. Finally, based on Definition 5, it calculates the subgraph G_i from the node set R_i and outputs it to the random walk probability calculation unit 12.
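The reachability step can be illustrated with a breadth-first search over reversed edges, as in the Python sketch below. The helper names `nodes_reaching` and `induced_subgraph` are hypothetical, and taking the subgraph to be the edges whose endpoints both lie in the reachable set is an assumption, since Definition 5 is not reproduced in this text.

```python
from collections import deque


def nodes_reaching(targets, edges):
    """Return every node that has a path to at least one node in targets.

    edges is an iterable of (v, u) pairs meaning "v links to u". The search
    runs breadth-first over reversed edges, starting from the targets.
    """
    predecessors = {}
    for v, u in edges:
        predecessors.setdefault(u, []).append(v)

    reached = set(targets)
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for pred in predecessors.get(node, []):
            if pred not in reached:
                reached.add(pred)
                queue.append(pred)
    return reached


def induced_subgraph(nodes, edges):
    """Keep only the edges whose two endpoints are both in nodes (assumed rule)."""
    return [(v, u) for v, u in edges if v in nodes and u in nodes]


# Chain 0->1->2->4->5 plus 3->2; only nodes 0, 1, 3 (and 2 itself) can reach node 2.
edges = [(0, 1), (1, 2), (3, 2), (2, 4), (4, 5)]
reachable = nodes_reaching({2}, edges)
sub_edges = induced_subgraph(reachable, edges)
```

Running the search on the previous subgraph rather than on the full graph keeps the traversal small, because nodes that were already excluded never need to be visited again.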

The random walk probability calculation unit 12 takes as input the subgraph G_i output by the subgraph construction unit 11, calculates the random walk probabilities r_i[u] corresponding to the subgraph G_i based on Lemma 6 and Definition 6, and outputs them to the estimated value calculation unit 13.
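A minimal Python sketch of computing random walk probabilities step by step is given below. It applies the column-normalized adjacency matrix once per step instead of forming a matrix power explicitly. The uniform starting vector, the function name `random_walk_probabilities`, and the edge-list input are assumptions for illustration; the patent's own expressions for these quantities are not reproduced in this text.

```python
def random_walk_probabilities(num_nodes, edges, length):
    """Return [r_0, ..., r_length], where r_i = W * r_{i-1}.

    W[u, v] = 1 / out_degree[v] for each edge v -> u, so applying W once per
    step is the same as multiplying r_0 by the i-th power of W.
    Assumption: r_0 is uniform (1/N for every node).
    """
    out_degree = [0] * num_nodes
    for v, _ in edges:
        out_degree[v] += 1

    r = [1.0 / num_nodes] * num_nodes
    history = [r]
    for _ in range(length):
        nxt = [0.0] * num_nodes
        # Each edge v -> u passes an equal share of r[v] on to u.
        for v, u in edges:
            nxt[u] += r[v] / out_degree[v]
        r = nxt
        history.append(r)
    return history


# Small 4-node example: 0->1, 0->2, 1->2, 2->0, 3->2
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
history = random_walk_probabilities(4, edges, length=2)
```

Each step costs one pass over the edges of the graph it is run on, so restricting the computation to a smaller subgraph directly reduces the work per iteration.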

The estimated value calculation unit 13 calculates, based on Definition 6, the lower limit estimate and the upper limit estimate of the PageRank score for every node in the candidate nodes C_i, using the random walk probabilities r_i[u] output by the random walk probability calculation unit 12 and the candidate nodes C_i calculated by the subgraph construction unit 11. It then outputs the calculated estimates to the candidate node calculation unit 14.

The candidate node calculation unit 14 calculates, from the candidate nodes C_i calculated by the subgraph construction unit 11, the threshold ε_i, which is the k-th highest lower limit in the i-th iteration. Based on Definition 3, Definition 4, and the calculated threshold ε_i, it then calculates the candidate node set C_{i+1} for the (i+1)-th iteration and determines whether the number of elements |C_{i+1}| equals k. If |C_{i+1}| equals k, the candidate node calculation unit 14 outputs the candidate node set C_{i+1} as the solution nodes. If |C_{i+1}| differs from k, it causes the subgraph construction unit 11 to perform the processing from the increment of i onward.

(Search process)
FIG. 2 is a flowchart showing the search process. First, the subgraph construction unit 11 of the search device 10 receives as input the graph G used for the query and k, the number of solution nodes (k is a positive integer not exceeding the number of nodes) (step S10). Next, the subgraph construction unit 11 performs initialization: it sets the iteration index i to 0, sets the initial candidate node set C_0 to V, and sets the initial subgraph G_0 to the graph G (step S12).

Next, the subgraph construction unit 11 of the search device 10 determines whether i ≠ 0 (step S13). If i ≠ 0 (step S13: Yes), the processing proceeds to step S14. If i = 0 (step S13: No), the processing proceeds to step S17.

In step S14, the subgraph construction unit 11 of the search device 10 increments i by 1. Next, the subgraph construction unit 11 calculates the candidate nodes C_i based on expression (5) of Definition 3 and expression (6) of Definition 4 and, based on Lemma 7, uses a breadth-first search on the subgraph G_{i−1} to calculate the set R_i of nodes that can reach the candidate nodes C_i (step S15). Then, based on Definition 5, the subgraph construction unit 11 calculates the subgraph G_i from the node set R_i (step S16).

Next, the random walk probability calculation unit 12 of the search device 10 calculates the random walk probabilities r_i[u] corresponding to the subgraph G_i based on Lemma 6 and Definition 6 (step S17).

Next, the estimated value calculation unit 13 of the search device 10 calculates, based on Definition 6, the lower limit estimate and the upper limit estimate of the PageRank score for every node in the candidate nodes C_i from the random walk probabilities r_i[u] and the candidate nodes C_i (step S18).

Next, the candidate node calculation unit 14 of the search device 10 calculates the threshold ε_i from the candidate nodes C_i and, based on Definition 3, Definition 4, and the threshold ε_i, calculates the candidate node set C_{i+1} for the (i+1)-th iteration (step S19). The candidate node calculation unit 14 then determines whether the number of elements |C_{i+1}| of the candidate node set C_{i+1} equals k (step S20). If |C_{i+1}| equals k (step S20: Yes), the processing proceeds to step S21. If |C_{i+1}| differs from k (step S20: No), the processing returns to step S14. In step S21, the candidate node calculation unit 14 outputs the candidate nodes C_{i+1} as the solution nodes. When step S21 ends, the search device 10 ends the search process.

With the search process described above, the search can be performed ad hoc, without any precomputation. In addition, because no internal parameters need to be set, the user can easily perform a search based on PageRank.

(Search algorithm)
FIG. 3 is a diagram showing the search algorithm, which corresponds to the process shown in the flowchart of FIG. 2. As shown in FIG. 3, if i = 0, the search algorithm initializes the set C_0 and the graph G_0 as C_0 = V and G_0 = G, in accordance with Definitions 3 and 5 (lines 2 to 3 of FIG. 3). If i ≠ 0, the search algorithm calculates the set R_i from the set C_i by a breadth-first search on the graph G_{i−1} (line 7 of FIG. 3). This is possible because, by Lemma 7, the subgraph G_i satisfies G_i ⊆ G_{i−1}. The search algorithm then calculates the subgraph G_i from the set R_i in accordance with Definition 5 (line 8 of FIG. 3).

Next, the search algorithm calculates the random walk probability for each node in the subgraph G_i (lines 10 to 12 of FIG. 3), because, by Lemma 6, the random walk probabilities are needed to calculate the estimates. The search algorithm then calculates the estimates for the candidate nodes C_i (lines 13 to 15 of FIG. 3) and calculates the threshold ε_i from the candidate nodes C_i (line 16 of FIG. 3).

The search algorithm then updates the candidate nodes by calculating C_{i+1} (line 17 of FIG. 3). If the number of elements |C_{i+1}| of the set C_{i+1} equals k, that is, |C_{i+1}| = k, then by Lemma 4 every node in the candidate node set C_{i+1} is a solution node. The search algorithm therefore terminates the iteration (line 18 of FIG. 3) and outputs the candidate node set C_{i+1} as the solution nodes (line 19 of FIG. 3).
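The overall bound-and-prune loop can be sketched in Python as follows. This is not the algorithm of FIG. 3: the patent's own lower and upper limit definitions are not reproduced in this text, so the sketch substitutes a series-expansion bound as an assumption. The lower estimate is the partial sum (1 − s)·Σ_{j≤i} s^j·r_j[u], and the upper estimate adds s^(i+1) times the largest incoming transition probability of u. This is valid when r_0 sums to 1 and every column of W sums to at most 1. For brevity the sketch works on the full graph and omits the subgraph construction. The name `top_k_pagerank`, the uniform r_0, and s = 0.85 are hypothetical choices.

```python
def top_k_pagerank(num_nodes, edges, k, s=0.85, max_iter=1000):
    """Return (top-k node set, iteration index at which pruning reached k)."""
    out_degree = [0] * num_nodes
    for v, _ in edges:
        out_degree[v] += 1
    # Largest incoming transition probability per node (0 if no in-edges).
    w_max = [0.0] * num_nodes
    for v, u in edges:
        w_max[u] = max(w_max[u], 1.0 / out_degree[v])

    r = [1.0 / num_nodes] * num_nodes     # r_0, assumed uniform
    lower = [(1.0 - s) * x for x in r]    # partial sum containing only j = 0
    candidates = set(range(num_nodes))    # C_0 = V
    for i in range(max_iter):
        tail = s ** (i + 1)               # bounds all terms with j > i
        upper = [lower[u] + tail * w_max[u] for u in range(num_nodes)]
        # Threshold: k-th highest lower estimate among current candidates.
        threshold = sorted((lower[u] for u in candidates), reverse=True)[k - 1]
        candidates = {u for u in candidates if upper[u] >= threshold}
        if len(candidates) == k:
            return candidates, i
        # Advance the random walk by one step and extend the partial sum.
        nxt = [0.0] * num_nodes
        for v, u in edges:
            nxt[u] += r[v] / out_degree[v]
        r = nxt
        weight = (1.0 - s) * s ** (i + 1)
        lower = [lower[u] + weight * r[u] for u in range(num_nodes)]
    raise RuntimeError("no decision within max_iter (for example, tied scores)")


# Small 4-node example: 0->1, 0->2, 1->2, 2->0, 3->2
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
top, rounds = top_k_pagerank(4, edges, k=2)
```

The loop stops as soon as the bounds separate the top k nodes from the rest, which can happen well before the scores themselves have converged. The `max_iter` guard is there because nodes with exactly tied scores would never be separated by this rule.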

(Effects of the embodiment)
According to the embodiment described above, the PageRank scores are calculated from a subgraph rather than from the entire graph, so the search is faster than with the conventional technique. In addition, for an input parameter k (k is a natural number not exceeding the number of nodes), exactly the k nodes with the highest PageRank scores can be found. Furthermore, no precomputation is required, so an ad hoc search can be performed on an arbitrary graph. Finally, because no internal parameters need to be set, the user can easily perform a search based on PageRank.

(System configuration of the embodiment)
The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

All or any part of the processes performed in the search device 10 may be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU. The processes performed in the search device 10 may also be realized as hardware using wired logic.

Of the processes described in the embodiment, all or part of those described as being performed automatically may be performed manually. Conversely, all or part of those described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings may be changed as appropriate unless otherwise specified.

(About the program)
It is also possible to create a program in which the processing executed by a control device, such as the CPU of the search device 10 described in the embodiment, is written in a computer-executable language. For example, a search program can be created in which the processing executed by the control device is written in a computer-executable language. In this case, the same effects as those of the embodiment can be obtained by having a computer execute the search program. Furthermore, processing similar to that of the embodiment can be realized by recording the search program on a computer-readable recording medium and having a computer read and execute the search program recorded on that medium. An example of a computer that executes a program implementing the same functions as the search device 10 illustrated in FIG. 1 is described below.

FIG. 4 is a diagram illustrating an example of a computer 1000 that executes the search program. The computer 1000 includes a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

As shown in FIG. 4, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041. The serial port interface 1050 is connected to, for example, a mouse 1051 and a keyboard 1052. The video adapter 1060 is connected to, for example, a display 1061.

Here, as illustrated in FIG. 4, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the search program is stored, for example, in the hard disk drive 1031 as a program module in which the instructions to be executed by the computer 1000 are written.

The various data described in the embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1031 into the RAM 1012 as necessary, and then executes each procedure of the search program.

Note that the program module 1093 and the program data 1094 related to the search program are not limited to being stored in the hard disk drive 1031. For example, the program module 1093 and the program data 1094 may be stored in a removable storage medium and read by the CPU 1020 via a disk drive or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)), and may be read and executed by the CPU 1020 via the network interface 1070.

DESCRIPTION OF SYMBOLS
10 Search device
11 Subgraph construction unit
12 Random walk probability calculation unit
13 Estimated value calculation unit
14 Candidate node calculation unit

Claims

A search device that accepts an input of k (k is a positive integer not exceeding the number of nodes) indicating the number of solution nodes in a graph of a computer network, in which the devices forming the computer network are nodes and the connections between the devices are edges, and that searches for and outputs the k solution nodes, the search device comprising:
a subgraph construction unit that, for i incremented by 1 from an initial value of 0, executes a subgraph construction process that, if i = 0, takes the graph itself as the subgraph and, if i > 0, constructs the subgraph of the graph in the i-th iterative calculation from the set of nodes that can reach the candidate nodes in the i-th iterative calculation within the subgraph of the graph in the (i−1)-th iterative calculation;
a random walk probability calculation unit that calculates random walk probabilities corresponding to the subgraph constructed by the subgraph construction unit;
an estimated value calculation unit that calculates, based on the random walk probabilities calculated by the random walk probability calculation unit, an estimated lower limit value and an estimated upper limit value of the PageRank score for every candidate node in the i-th iterative calculation; and
a candidate node calculation unit that calculates the candidate nodes in the (i+1)-th iterative calculation from the candidate nodes in the i-th iterative calculation, outputs the set of candidate nodes in the (i+1)-th iterative calculation as the solution nodes when the number of elements in that set is equal to k, and, when the number of elements in that set differs from k, causes the subgraph construction unit to execute the subgraph construction process for a new i obtained by further incrementing i by 1,
wherein, when the candidate node calculation unit causes the subgraph construction unit to execute the subgraph construction process for the new i, the subgraph construction unit, the random walk probability calculation unit, the estimated value calculation unit, and the candidate node calculation unit sequentially execute their respective processes again for the new i.

A search method executed by a search device that accepts an input of k (k is a positive integer not exceeding the number of nodes) indicating the number of solution nodes in a graph of a computer network, in which the devices forming the computer network are nodes and the connections between the devices are edges, and that searches for and outputs the k solution nodes, the search method comprising:
a subgraph construction step of, for i incremented by 1 from an initial value of 0, executing a subgraph construction process that, if i = 0, takes the graph itself as the subgraph and, if i > 0, constructs the subgraph of the graph in the i-th iterative calculation from the set of nodes that can reach the candidate nodes in the i-th iterative calculation within the subgraph of the graph in the (i−1)-th iterative calculation;
a random walk probability calculation step of calculating random walk probabilities corresponding to the subgraph constructed in the subgraph construction step;
an estimated value calculation step of calculating, based on the random walk probabilities calculated in the random walk probability calculation step, an estimated lower limit value and an estimated upper limit value of the PageRank score for every candidate node in the i-th iterative calculation; and
a candidate node calculation step of calculating the candidate nodes in the (i+1)-th iterative calculation from the candidate nodes in the i-th iterative calculation, outputting the set of candidate nodes in the (i+1)-th iterative calculation as the solution nodes when the number of elements in that set is equal to k, and, when the number of elements in that set differs from k, causing the subgraph construction step to execute the subgraph construction process for a new i obtained by further incrementing i by 1,
wherein, when the candidate node calculation step causes the subgraph construction step to execute the subgraph construction process for the new i, the subgraph construction step, the random walk probability calculation step, the estimated value calculation step, and the candidate node calculation step are sequentially executed again for the new i.

A search program for causing a computer to function as the search device of claim 1.