JP2014146280A - Association calculation device, association calculation system, association calculation method, and association calculation program

Info

Publication number: JP2014146280A
Application number: JP2013015911A
Authority: JP
Inventors: Yasuhiro Fujiwara (藤原 靖宏); Makoto Onizuka (鬼塚 真); Kyota Tsutsumida (堤田 恭太); Akira Nakayama (中山 彰); Hiroyuki Toda (戸田 浩之)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-01-30
Filing date: 2013-01-30
Publication date: 2014-08-14
Anticipated expiration: 2033-01-30
Also published as: JP5735019B2

Abstract

PROBLEM TO BE SOLVED: To provide an association calculation mechanism that includes a technique for reducing the amount of computation required for association calculation. SOLUTION: On graph structure data, which is a data model that expresses relationships among data as relationships between nodes and edges, an association calculation device 10 calculates, for every edge, a propagation value from a node to an adjacent node using the association of the node and the weight of the edge, and repeatedly calculates the association of each node with respect to a set of nodes using the propagation values of the edges. Every time the association of a node is calculated, the association calculation device 10 excludes a predetermined node from the nodes to be calculated on the basis of the association of the node or the propagation value of the edge. The association calculation device 10 then calculates the association of the nodes to be calculated, from which the predetermined node has been excluded.

Description

The present invention relates to a relevance calculation device, a relevance calculation system, a relevance calculation method, and a relevance calculation program.

Graph structure data, a data model that expresses relationships among data as relationships between nodes and edges, is conventionally known. For example, given computers and their connections as illustrated in FIG. 14, graph structure data can express these computers and their connections with a data model as illustrated in FIG. 15. In recent years, interest in the use of graph theory has been growing, driven for example by the advanced, high-quality search processing achieved by applying graph theory to Personalized Search. In such use of graph theory, the relevance between nodes is one of the important attributes. It is applied in applications in various fields, and various methods for calculating relevance have been proposed.

Among such relevance calculation methods, one that is attracting attention as a way to calculate the relevance between nodes is Personalized Page Rank. Unlike measures that have often been used in graph theory, such as the shortest distance between nodes, Personalized Page Rank can calculate relevance on the basis of the structural features of the graph.

In Personalized Page Rank, a random walk starts from a node chosen according to the existence probabilities given by the query distribution and moves at random to an adjacent node with a probability proportional to the edge weight. In addition, each time the walk reaches a node, it returns, with a fixed probability, to a node chosen according to the query distribution. Repeating this operation recursively yields a steady-state probability for each node, and Personalized Page Rank uses this steady-state probability as the relevance.
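The random walk described above can be summarised by a fixed-point equation. The following is a sketch in the usual random-walk-with-restart form, not a formula quoted from the figures. It assumes that W is the adjacency matrix of the graph with W[v,u] the weight of the edge from node u to node v, that the outgoing weights of each node sum to 1, that c is the restart probability, and that d is the query distribution.

```latex
s[v] = (1 - c) \sum_{u \in V} W[v,u]\, s[u] + c\, d[v]
\qquad \text{for every node } v \in V
```

Under these assumptions, an iterative algorithm evaluates the right-hand side repeatedly, starting from s = d, until a stopping condition is met.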

An example of an algorithm for calculating relevance in Personalized Page Rank is described here with reference to FIG. 16. In the algorithm of FIG. 16, V denotes the set of all nodes in the graph, and Cu denotes the set of adjacent nodes reached by the edges leaving node u. Further, s′ denotes the updated relevance calculated in each iteration, and p denotes the propagation value of the relevance calculated between nodes. The propagation value is the product of the relevance of a node and the weight of an edge.

As shown in FIG. 16, the relevance distribution s is first initialized to the query distribution d (see line 1). In each iteration, the updated relevance is first initialized to 0 (see lines 3 to 5). Next, for every node, the relevance values that the edges leaving that node propagate to other nodes are calculated (see lines 6 to 11). The updated relevance is then obtained from the propagated relevance (see lines 12 and 13). The relevance calculation over the entire graph is executed repeatedly in this way until a predetermined termination condition is satisfied.
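The conventional procedure described above can be sketched as follows. This is a minimal illustration, not a transcription of FIG. 16: the function name `personalized_pagerank`, the dictionary layout `W[u][v]` (weight of the edge from u to v), the fixed iteration count `T`, and the restart-mixing step with probability `c` are all assumptions made for the example.

```python
def personalized_pagerank(W, d, c, T):
    """Baseline iteration that uses the whole graph every time.

    W[u][v]: weight of the edge u -> v (assumed: outgoing weights sum to 1).
    d: query distribution, c: restart probability, T: number of iterations.
    All names are illustrative assumptions.
    """
    nodes = list(d)
    s = dict(d)                                # s is initialized to d
    for _ in range(T):
        s_new = {u: 0.0 for u in nodes}        # reset the updated relevance
        for u in nodes:                        # every node propagates
            for v, w in W.get(u, {}).items():  # v ranges over Cu
                p = w * s[u]                   # propagation value
                s_new[v] += p
        # mix the propagated relevance with the query distribution
        s = {v: (1 - c) * s_new[v] + c * d[v] for v in nodes}
    return s
```

Every iteration visits every edge of the graph, which is the cost the invention sets out to reduce.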

“A Survey on PageRank Computing”, Pavel Berkhin, Internet Mathematics, Vol. 2, 2005
“Fast and Exact Top-k Search for Random Walk with Restart”, Yasuhiro Fujiwara, Makoto Nakatsuji, Makoto Onizuka, Masaru Kitsuregawa, PVLDB Endowment, Vol. 5, No. 5, 2012
“Network Analysis (Data Science Learned with R, Vol. 8)”, Tsutomu Suzuki, edited by Mingzhe Jin (金明哲), Kyoritsu Shuppan

However, because the conventional technique described above repeatedly calculates relevance using the entire graph, it has the problem that calculating relevance requires a large amount of computation. Consequently, in order to use relevance effectively in practical applications, for example to realize advanced and efficient search processing, the challenge is to reduce the amount of computation required and to calculate relevance efficiently.

The present invention has been made to solve the above-described problem of the conventional technique, and an object of the invention is to provide a relevance calculation mechanism that includes a technique for reducing the amount of computation required for relevance calculation.

To solve the above-described problem and achieve the object, a relevance calculation device according to the present invention includes: a relevance calculation unit that, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculates for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculates the relevance of each node with respect to a set of nodes using the propagation values of the edges; and a deletion unit that, each time the relevance of the nodes is calculated by the relevance calculation unit, excludes a predetermined node from the nodes to be calculated on the basis of the relevance of the node or the propagation value of the edge, wherein the relevance calculation unit calculates the relevance of the nodes to be calculated from which the predetermined node has been excluded.

A relevance calculation system according to the present invention includes: a relevance calculation unit that, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculates for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculates the relevance of each node with respect to a set of nodes using the propagation values of the edges; and a deletion unit that, each time the relevance of the nodes is calculated by the relevance calculation unit, excludes a predetermined node from the nodes to be calculated on the basis of the relevance of the node or the propagation value of the edge, wherein the relevance calculation unit calculates the relevance of the nodes to be calculated from which the predetermined node has been excluded.

A relevance calculation method according to the present invention includes: a relevance calculation step of, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculating for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculating the relevance of each node with respect to a set of nodes using the propagation values of the edges; and a deletion step of, each time the relevance of the nodes is calculated in the relevance calculation step, excluding a predetermined node from the nodes to be calculated on the basis of the relevance of the node or the propagation value of the edge, wherein the relevance calculation step calculates the relevance of the nodes to be calculated from which the predetermined node has been excluded.

A relevance calculation program according to the present invention causes a computer to execute: a relevance calculation step of, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculating for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculating the relevance of each node with respect to a set of nodes using the propagation values of the edges; and a deletion step of, each time the relevance of the nodes is calculated in the relevance calculation step, excluding a predetermined node from the nodes to be calculated on the basis of the relevance of the node or the propagation value of the edge, wherein the relevance calculation step calculates the relevance of the nodes to be calculated from which the predetermined node has been excluded.

The relevance calculation device, relevance calculation system, relevance calculation method, and relevance calculation program according to the embodiments have the effect of providing a relevance calculation mechanism that includes a technique for reducing the amount of computation required for relevance calculation.

FIG. 1 is a diagram for explaining the configuration of the relevance calculation device according to the first embodiment.
FIG. 2 is a diagram illustrating an example of an algorithm for calculating relevance.
FIG. 3 is a diagram illustrating an example of a graph expressed by nodes and edges, together with values of the query distribution.
FIG. 4 is a diagram illustrating an example of a graph expressed by nodes and edges, together with relevance values.
FIG. 5 is a diagram illustrating an example of a graph expressed by nodes and edges after node pruning, together with relevance values.
FIG. 6 is a flowchart illustrating the flow of the relevance calculation process performed by the relevance calculation device according to the first embodiment.
FIG. 7 is a diagram illustrating an example of results obtained when relevance is calculated by the relevance calculation device according to the first embodiment.
FIG. 8 is a diagram for explaining the configuration of the relevance calculation device according to the second embodiment.
FIG. 9 is a diagram illustrating an example of edges sorted in descending order of weight.
FIG. 10 is a diagram illustrating an example of an algorithm for calculating relevance.
FIG. 11 is a flowchart illustrating the flow of the relevance calculation process performed by the relevance calculation device according to the second embodiment.
FIG. 12 is a diagram illustrating an example of results obtained when relevance is calculated by the relevance calculation device according to the second embodiment.
FIG. 13 is a diagram illustrating that information processing by the relevance calculation program is concretely realized using a computer.
FIG. 14 is a diagram illustrating an example of connections among computers.
FIG. 15 is a diagram illustrating an example of connections among computers expressed as relationships between nodes and edges.
FIG. 16 is a diagram illustrating an example of a conventional algorithm for calculating relevance.

Embodiments of the relevance calculation device, relevance calculation system, relevance calculation method, and relevance calculation program according to the present invention are described below in detail with reference to the accompanying drawings. The present invention is not limited by these embodiments. For example, the following description uses the case where relevance is calculated using Personalized Page Rank, but the invention is not limited to this and can also be applied when relevance is calculated using a method other than Personalized Page Rank (for example, Page Rank or Random walk with restart).

In the following, the configuration of the relevance calculation device included in the relevance calculation system according to the first embodiment and the flow of its processing are described in order, and the effects of the first embodiment are described last.

[Configuration of the relevance calculation device]
First, the relevance calculation device 10 included in the relevance calculation system according to the first embodiment is described with reference to FIG. 1. FIG. 1 is a diagram for explaining the configuration of the relevance calculation device 10 according to the first embodiment. As shown in FIG. 1, the relevance calculation system according to the first embodiment includes the relevance calculation device 10 and a user terminal 20. The relevance calculation device 10 and the user terminal 20 are connected via, for example, the Internet (not shown).

The relevance calculation device 10 includes a communication control unit 11 and a control unit 12, and is connected to the user terminal 20 via the Internet or the like. The processing of each of these units is described below.

The communication control unit 11 controls communication of the various kinds of information exchanged with the user terminal 20 connected via the Internet or the like. Specifically, the communication control unit 11 receives from the user terminal 20 the query distribution d, the adjacency matrix W of the graph, the restart probability c, the number of iterations T, and the pruning threshold θ. The communication control unit 11 also transmits the calculated relevance distribution to the user terminal 20.

The control unit 12 controls the various processes executed in the relevance calculation device 10. Specifically, as shown in FIG. 1, the control unit 12 includes a relevance calculation unit 12a and a deletion unit 12b.

On graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, the relevance calculation unit 12a calculates for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculates the relevance of each node with respect to a set of nodes using the propagation values of the edges. The relevance calculation unit 12a calculates the relevance of the nodes to be calculated from which a predetermined node has been excluded by the deletion unit 12b described later. The relevance calculation unit 12a repeats the calculation of node relevance according to a preset number of iterations T.

For example, the relevance calculation unit 12a calculates the relevance of each node using the algorithm illustrated in FIG. 2. FIG. 2 is a diagram illustrating an example of an algorithm for calculating relevance. In the algorithm of FIG. 2, V denotes the set of all nodes in the graph, and Cu denotes the set of adjacent nodes reached by the edges leaving node u. Further, s′ denotes the updated relevance calculated in each iteration, and p denotes the propagation value of the relevance calculated between nodes. The updated relevance of a node is calculated from the sum of the propagation values from the adjacent nodes connected to that node. V′ denotes the set of nodes after pruning, and θ denotes the pruning threshold.

As shown in FIG. 2, the relevance calculation unit 12a first initializes the relevance distribution s to the query distribution d and initializes the node set V′ to the set V of all nodes (see lines 1 and 2). The relevance calculation unit 12a also sets the iteration counter t to "1". The relevance calculation unit 12a then selects each node u from the set V of all nodes and initializes the updated relevance s′ of the node to 0 (see lines 4 to 6).

Next, in each iteration, the relevance calculation unit 12a calculates the propagation values p not from the set V of all nodes but from the set V′ of nodes remaining after pruning, which narrows down the sources of propagation (see lines 7 to 12). Specifically, the relevance calculation unit 12a selects a node u from V′ and a node v from Cu. The relevance calculation unit 12a then calculates the propagation value p by multiplying the relevance s of the node by the corresponding entry of the adjacency matrix W of the graph, which indicates the edge weight, that is, p=W[v,u]s[u] (see line 9), and uses the calculated p to calculate s′[v]=s′[v]+p (see line 10). This processing is repeated for every pair of nodes from V′ and Cu. After that, the relevance calculation unit 12a calculates s′=(1-c)s′[v]+cd (see line 13) and sets the relevance distribution s to the updated relevance s′ (see line 14). The relevance calculation unit 12a then outputs the calculated relevance distribution s to the deletion unit 12b and causes a storage unit (not shown) of the deletion unit 12b to store the relevance distribution s.
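A single update step of this kind can be sketched as follows. This is an illustrative reading of the description, not a transcription of FIG. 2: the function name `update_relevance`, the dictionary layout `W[u][v]` (weight of the edge from u to v, corresponding to W[v,u] in the text), and the interpretation of the mixing formula as an element-wise update over all nodes are assumptions.

```python
def update_relevance(W, s, d, c, active):
    """One relevance update in which only nodes in `active` (V') propagate.

    W[u][v]: weight of the edge u -> v; s: current relevance;
    d: query distribution; c: restart probability.
    All names are illustrative assumptions.
    """
    s_new = {u: 0.0 for u in s}            # s'[u] = 0 for every node in V
    for u in active:                       # u is taken from V', not from V
        for v, w in W.get(u, {}).items():  # v is taken from Cu
            p = w * s[u]                   # p = W[v,u] s[u]
            s_new[v] += p                  # s'[v] = s'[v] + p
    # s' = (1 - c) s' + c d, applied to each node
    return {v: (1 - c) * s_new[v] + c * d.get(v, 0.0) for v in s}
```

Nodes outside `active` still receive relevance from their neighbours; they only stop acting as sources, so the inner loop runs over fewer edges.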

After that, the set V′, which contains only the nodes whose relevance s′ is equal to or greater than the pruning threshold θ, is stored in a storage unit (not shown) by the deletion unit 12b described later. The relevance calculation unit 12a reads out this V′ and repeats the above relevance calculation process the preset number of iterations T. In this way, in each iteration the relevance calculation unit 12a calculates propagation values only for propagation from highly relevant nodes, that is, nodes whose relevance is equal to or greater than the pruning threshold θ, and as a result calculates relevance at high speed. When the relevance calculation process has been performed T times, the relevance calculation unit 12a outputs the calculated relevance distribution s to the user terminal 20.

Each time the relevance of the nodes is calculated by the relevance calculation unit 12a, the deletion unit 12b excludes a predetermined node from the nodes to be calculated on the basis of the relevance of the node. Specifically, the deletion unit 12b reads from the storage unit the relevance of each node calculated by the relevance calculation unit 12a, determines whether the relevance of the node is less than the pruning threshold, and excludes the nodes whose relevance is less than the pruning threshold from the nodes to be calculated.

With reference to FIG. 2, the deletion unit 12b obtains the set of nodes after pruning from the relevance updated in each iteration (see lines 15 to 20). Specifically, as shown in FIG. 2, the deletion unit 12b initializes the set V′ of nodes after pruning to 0, that is, empties it (see line 15), and then determines whether the relevance s′ of each node u calculated by the relevance calculation unit 12a is equal to or greater than the pruning threshold θ (see line 17). The deletion unit 12b adds to V′ only the nodes whose relevance s′ is equal to or greater than the pruning threshold θ (see line 18), thereby deleting the nodes whose relevance s′ is lower than the pruning threshold θ from the nodes subject to relevance calculation. After that, the deletion unit 12b outputs the set V′ of nodes after pruning to the relevance calculation unit 12a and causes a storage unit (not shown) of the relevance calculation unit 12a to store the set V′.
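The pruning step performed by the deletion unit can be sketched in a few lines. The function name `prune` and the use of a Python dictionary and set are assumptions for illustration; the relevance values in the usage note are hypothetical and are not taken from the figures.

```python
def prune(s, theta):
    """Return V': the nodes whose relevance is at least the threshold theta.

    s maps each node to its updated relevance. Names are illustrative.
    """
    active = set()                 # V' starts empty
    for u, relevance in s.items():
        if relevance >= theta:     # keep nodes at or above the threshold
            active.add(u)
    return active
```

For instance, with the hypothetical values `{"x": 0.5, "y": 0.04, "z": 0.01}` and `theta = 0.04`, nodes x and y are kept and node z is excluded, because the comparison keeps nodes that are equal to the threshold.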

An outline of the relevance calculation process executed by the relevance calculation device 10 is now described with reference to FIG. 3 to FIG. 5. FIG. 3 is a diagram illustrating an example of a graph expressed by nodes and edges, together with values of the query distribution. FIG. 4 is a diagram illustrating an example of a graph expressed by nodes and edges, together with relevance values. FIG. 5 is a diagram illustrating an example of a graph expressed by nodes and edges after node pruning, together with relevance values.

In the examples of FIG. 3 to FIG. 5, there are a plurality of nodes A to H, and the nodes are connected by edges. First, as shown in FIG. 3, existence probabilities determined from the query distribution are set for nodes A to C. As illustrated in FIG. 3, the existence probability of node A is "0.5", that of node B is "0.25", and that of node C is "0.25". The existence probability is the probability that the random walk starts at each node. To give a concrete example, nodes A to C are sites bookmarked by the user, and the existence probability is the probability that the user accesses each site first.

Next, as shown in FIG. 4, in the first relevance calculation the relevance calculation device 10 calculates the relevance of all nodes A to H. As illustrated in FIG. 4, the relevance of node A is "0.22", that of node B is "0.18", that of node C is "0.21", that of node D is "0.18", that of node E is "0.11", that of node F is "0.05", that of node G is "0.02", and that of node H is "0.03". The relevance is an importance that is personalized with respect to the query nodes. To give a concrete example, when nodes A to C are news sites bookmarked by the user, nodes corresponding to news sites that are not bookmarked also receive a high relevance.

Next, after calculating the relevance and before performing the next relevance calculation process, the relevance calculation device 10 determines whether each relevance is lower than the pruning threshold. For example, when the pruning threshold is "0.04", in the example of FIG. 4 the relevance of node G and node H is lower than the pruning threshold "0.04". Therefore, as illustrated in FIG. 5, node G and node H, whose relevance is lower than the pruning threshold "0.04", are deleted from the nodes to be calculated, and the next relevance calculation process is performed.

In this way, in each iteration the nodes whose updated relevance is low, that is, less than the threshold θ (nodes G and H in the examples of FIG. 4 and FIG. 5), are deleted (pruned) from the nodes to be calculated. As a result, the propagation values from the pruned nodes no longer need to be calculated, so the amount of computation for the propagation values is reduced and relevance can be calculated at high speed. This technique is based on the observation that "nodes or edges with low relevance do not greatly affect the updated relevance values". In other words, because excluding nodes or edges with low relevance does not greatly affect the updated relevance values, they can be excluded from the calculation to reduce the amount of computation, and the relevance calculation can be performed efficiently.
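The trade-off described above can be explored with a small experiment. The sketch below combines the update and pruning steps into one loop and compares a run with pruning against a run without it. The function name `relevance`, the dictionary layout `W[u][v]`, and the 4-node graph with its weights are hypothetical and are not taken from the figures.

```python
def relevance(W, d, c, T, theta=0.0):
    """Iterative relevance calculation with threshold pruning.

    W[u][v]: weight of the edge u -> v; d: query distribution;
    c: restart probability; T: iterations; theta: pruning threshold.
    theta=0.0 keeps every node, i.e. no pruning. Names are illustrative.
    """
    nodes = list(d)
    s = dict(d)                  # s is initialized to d
    active = set(nodes)          # V' is initialized to V
    for _ in range(T):
        s_new = dict.fromkeys(nodes, 0.0)
        for u in active:         # only nodes in V' propagate
            for v, w in W.get(u, {}).items():
                s_new[v] += w * s[u]
        s = {v: (1 - c) * s_new[v] + c * d[v] for v in nodes}
        active = {u for u in nodes if s[u] >= theta}   # rebuild V'
    return s


# Hypothetical graph: the outgoing weights of each node sum to 1.
W = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"A": 1.0},
    "C": {"A": 0.9, "D": 0.1},
    "D": {"A": 1.0},
}
d = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}

exact = relevance(W, d, c=0.2, T=50)               # no pruning
approx = relevance(W, d, c=0.2, T=50, theta=0.05)  # with pruning
max_diff = max(abs(exact[u] - approx[u]) for u in d)
print(exact, approx, max_diff)
```

In this hypothetical graph node D stays below the threshold and never propagates in the pruned run, so its outgoing edge is skipped in every iteration. The printed `max_diff` shows how much the pruned result deviates from the unpruned one for this particular choice of threshold; a larger threshold saves more work at the cost of a larger deviation.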

[Relevance calculation process performed by the relevance calculation device]
Next, the flow of the relevance calculation process performed by the relevance calculation device 10 is described with reference to FIG. 6. FIG. 6 is a flowchart illustrating the flow of the relevance calculation process performed by the relevance calculation device according to the first embodiment. As shown in FIG. 6, when the relevance calculation unit 12a of the relevance calculation device 10 receives a relevance calculation start instruction from the user terminal 20 (Yes in step S101), it first initializes the relevance distribution s to the query distribution d and initializes the node set V′ to the set V of all nodes (step S102). The relevance calculation unit 12a then sets the iteration counter t to "1" (step S103).

Subsequently, the relevance calculation unit 12a selects nodes u one at a time from the node set V (step S104) and initializes the updated relevance s′[u] of node u to 0 (step S105). The relevance calculation unit 12a then determines whether all nodes have been selected from V (step S106); if not (No in step S106), the process returns to step S104. When all nodes have been selected from V (Yes in step S106), the relevance calculation unit 12a selects nodes u one at a time from the node set V′ (step S107) and selects nodes v one at a time from the node set C_u (step S108).

Then, the relevance calculation unit 12a calculates p = W[v,u]s[u] and uses the calculated p to compute s′[v] = s′[v] + p (step S109). Subsequently, the relevance calculation unit 12a determines whether all nodes have been selected from C_u (step S110); if not (No in step S110), the process returns to step S108. When all nodes have been selected from C_u (Yes in step S110), the relevance calculation unit 12a determines whether all nodes have been selected from V′ (step S111); if not (No in step S111), the process returns to step S107.

When all nodes have been selected from V′ (Yes in step S111), the relevance calculation unit 12a calculates s′ = (1-c)s′ + cd and sets the relevance distribution s to the updated relevance s′ (step S112).
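
Steps S107 to S112 taken together can be read as the following per-node update. This is a reconstruction from the step descriptions rather than a formula quoted from the figures, and it assumes that the final update in step S112 is applied element-wise to every node v in V.

```latex
s'[v] \;=\; (1-c) \sum_{\substack{u \in V' \\ v \in C_u}} W[v,u]\, s[u] \;+\; c\, d[v]
\qquad \text{for every } v \in V
```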

Then, the deletion unit 12b of the relevance calculation apparatus 10 initializes the post-pruning node set V′ to the empty set (step S113) and selects nodes u one at a time from the node set V (step S114). The deletion unit 12b determines whether the node relevance s′[u] of each node u calculated by the relevance calculation unit 12a is greater than or equal to the pruning threshold θ (step S115). If the node relevance s′[u] is greater than or equal to θ (Yes in step S115), the deletion unit 12b adds node u to the post-pruning node set V′ (step S116) and proceeds to step S117. If the node relevance s′[u] is less than θ (No in step S115), the deletion unit 12b excludes node u from the relevance calculation target nodes and proceeds to step S117.

In step S117, the deletion unit 12b determines whether all nodes have been selected from the node set V; if not (No in step S117), the process returns to step S114. When the deletion unit 12b determines that all nodes have been selected from V (Yes in step S117), the relevance calculation unit 12a adds 1 to the calculation count t (step S118) and determines whether t has exceeded the number of iterations T (step S119).

If the relevance calculation unit 12a determines that the calculation count t has not exceeded the number of iterations T (No in step S119), the process returns to step S104 and the relevance calculation is performed again. In this pass, relevance is calculated from the node set V′ of calculation targets, from which nodes with low relevance have been excluded. If the relevance calculation unit 12a determines that t has exceeded T (Yes in step S119), it outputs the relevance distribution s to the user terminal 20 (step S120) and ends the relevance calculation processing.
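
The flow of steps S102 to S119 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the apparatus: the function name, the dictionary-based graph layout (`out_edges[u]` as a list of `(v, weight)` pairs standing for W[v,u]) and the three-node example graph are all hypothetical.

```python
def rwr_node_pruning(out_edges, d, c=0.15, T=100, theta=1e-7):
    """Iterative relevance calculation with node pruning (sketch of S102-S119).

    out_edges[u] is a list of (v, weight) pairs, where weight stands for W[v,u];
    d maps every node to its value in the inquiry distribution.
    """
    nodes = list(d)
    s = dict(d)                    # S102: s <- d
    active = set(nodes)            # S102: V' <- V
    for _ in range(T):             # S103, S118, S119: T passes
        s_new = {u: 0.0 for u in nodes}            # S104-S106: reset s'
        for u in active:                           # S107: only unpruned nodes propagate
            for v, w in out_edges.get(u, []):      # S108
                s_new[v] += w * s[u]               # S109: p = W[v,u] s[u]
        for v in nodes:                            # S112: s' = (1-c) s' + c d
            s_new[v] = (1 - c) * s_new[v] + c * d[v]
        s = s_new
        # S113-S117: rebuild V' from V, keeping nodes with relevance >= theta
        active = {u for u in nodes if s[u] >= theta}
    return s


# Hypothetical 3-node graph: A -> B, B -> A and C, C -> B.
edges = {"A": [("B", 1.0)], "B": [("A", 0.5), ("C", 0.5)], "C": [("B", 1.0)]}
query = {"A": 1.0, "B": 0.0, "C": 0.0}
exact = rwr_node_pruning(edges, query, T=2, theta=0.0)    # theta = 0 prunes nothing
pruned = rwr_node_pruning(edges, query, T=2, theta=0.2)   # A and C are pruned after pass 1
```

Because V′ is rebuilt from the full set V after every pass, a node pruned in one pass can become a calculation target again later if its relevance rises to θ or above. The threshold of 0.2 in the example is deliberately far larger than the values used in the experiments so that the effect of pruning is visible on a tiny graph.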

[Experimental results of the relevance calculation processing by the relevance calculation apparatus]
Next, experimental results of the relevance calculation processing by the relevance calculation apparatus according to the first embodiment will be described with reference to FIG. 7. The experimental results are merely an example and do not limit the configuration or the processing. FIG. 7 is a diagram illustrating an example of the results obtained when relevance was calculated by the relevance calculation apparatus according to the first embodiment. In FIG. 7, "node pruning" denotes the experimental results of the relevance calculation processing by the relevance calculation apparatus according to the first embodiment, and "original" denotes the experimental results obtained when relevance was calculated according to the conventional algorithm.

This experiment used a graph with 265214 nodes and 420045 edges. The parameters were c = 0.15 and T = 100, and a single node chosen at random was set as the starting node of the random walk. For the node pruning method, the experiment was run with the pruning threshold θ set to 10^-3 and 10^-7.

FIG. 7 shows that the present invention can calculate relevance faster than the conventional method. In the example of FIG. 7, the conventional calculation time is "1.01 [s]", whereas with the method of the present invention it is "0.05 [s]" when the threshold θ is 10^-3 and "0.46 [s]" when θ is 10^-7. FIG. 7 also shows that a larger pruning threshold allows relevance to be calculated faster, because a larger threshold prunes more nodes.

FIG. 7 also shows that the relevance error of the present invention is very small. As shown in FIG. 7, the average error relative to the conventional method is "3.15·10^-8" when the threshold θ is 10^-3 and "4.81·10^-11" when θ is 10^-7. This is because, although the method of the present invention prunes nodes, the threshold is set so as to affect the relevance values as little as possible. FIG. 7 further shows that a smaller threshold allows relevance to be calculated with higher accuracy, because a smaller threshold leaves more nodes or edges in the calculation. These results indicate that, in the present invention, the threshold should be increased when fast calculation is desired and decreased when high-accuracy calculation is desired.

[Effects of the first embodiment]
As described above, for graph structure data, which is a data model that expresses relationships among data as relationships between nodes and edges, the relevance calculation apparatus 10 according to the first embodiment calculates, for each edge, a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculates the relevance of each node with respect to a set of nodes using the propagation values of the edges. Each time the relevance of the nodes is calculated, the relevance calculation apparatus 10 excludes predetermined nodes from the calculation target nodes on the basis of the relevance of the nodes or the propagation values of the edges. The relevance calculation apparatus 10 then calculates the relevance of the calculation target nodes from which the predetermined nodes have been excluded. Because nodes with low relevance can thus be progressively excluded from the calculation target nodes, it is possible to provide a relevance calculation mechanism equipped with a technique for suppressing the amount of computation. In addition, adjusting the threshold makes it possible to adjust the balance between calculation speed and accuracy in the relevance calculation.

The relevance calculation apparatus 10 according to the first embodiment also determines whether the calculated relevance of each node is less than a predetermined threshold, and excludes nodes whose relevance is less than the threshold from the calculation target nodes. This makes it possible to perform pruning that progressively excludes nodes with low relevance from the calculation target nodes. Because the calculation after pruning computes relevance from the narrowed-down set of nodes, the amount of computation for the relevance calculation can be suppressed.

The relevance calculation apparatus 10 according to the first embodiment also repeatedly calculates the relevance of the nodes according to a preset number of iterations. The number of iterations of the relevance calculation can therefore be changed easily.

The first embodiment above describes the case where nodes with low relevance are excluded from the calculation target nodes, but the present invention is not limited to this. For example, the calculation for edges with low relevance may be stopped and the nodes connected by those edges may be excluded from the calculation targets. The second embodiment described below covers this case.

First, the configuration of the relevance calculation apparatus 10A according to the second embodiment will be described with reference to FIG. 8. FIG. 8 is a diagram for explaining the configuration of the relevance calculation apparatus according to the second embodiment. Description of components that are the same as those of the relevance calculation apparatus 10 according to the first embodiment is omitted.

As illustrated in FIG. 8, the relevance calculation apparatus 10A according to the second embodiment further includes a rearrangement unit 12c compared with the relevance calculation apparatus 10 according to the first embodiment. For each node, weight values are set for the edges connected to the node, and the rearrangement unit 12c rearranges the edges in descending order of weight value. Taking the example of FIG. 3, node A is connected to two edges: the edge connected to node B (here, the edge with edge ID "1") and the edge connected to node C (here, the edge with edge ID "2").

As illustrated in FIG. 9, when the weight value of the edge with edge ID 1 connecting node A and node B is "0.6" and the weight value of the edge with edge ID 2 connecting node A and node C is "0.4", the rearrangement unit 12c arranges the edges in descending order of weight value, that is, edge ID 1 followed by edge ID 2. FIG. 9 is a diagram illustrating an example of edges rearranged in descending order of weight.
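
The rearrangement performed by the rearrangement unit 12c can be sketched as a per-node sort. The snippet below is only an illustration: the function name and the `(adjacent node, weight)` tuple layout are assumptions, and the input reuses the weights 0.6 and 0.4 from the FIG. 9 example.

```python
def sort_out_edges(out_edges):
    """Return a copy of the adjacency lists with each node's edges
    sorted in descending order of weight."""
    return {
        u: sorted(edges, key=lambda edge: edge[1], reverse=True)
        for u, edges in out_edges.items()
    }


# Node A with its two edges listed in arbitrary order: (adjacent node, weight).
unsorted_edges = {"A": [("C", 0.4), ("B", 0.6)]}
sorted_edges = sort_out_edges(unsorted_edges)
# sorted_edges["A"] now lists the edge to B (0.6) before the edge to C (0.4).
```

The sort only needs to run once before the iterative calculation, since the edge weights do not change between passes.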

The relevance calculation unit 12a of the relevance calculation apparatus 10A according to the second embodiment calculates the propagation values of the edges in the order produced by the rearrangement unit 12c, and calculates the relevance of each node. Specifically, the relevance calculation unit 12a calculates the relevance distribution using the algorithm illustrated in FIG. 10. In the algorithm of FIG. 10, V is the set of all nodes in the graph and C_u is the set of adjacent nodes reached by the edges leaving node u. s′ is the updated relevance calculated in each iteration, and p is the propagation value of relevance calculated between nodes. The updated relevance of a node is calculated from the sum of the propagation values from the adjacent nodes connected to it. V′ is the set of nodes after pruning, and θ is the pruning threshold.

As shown in FIG. 10, the relevance calculation unit 12a first initializes the relevance distribution s to the inquiry distribution (see line 1). Then, so that calculations can be pruned efficiently during the iterative calculation, the edges leaving each node are sorted in descending order of edge weight (which corresponds to the magnitude of the nonzero elements of the adjacency matrix of the graph) (see lines 2 to 4).

Then, during the iterative calculation, the relevance calculation unit 12a calculates the propagation value p from each node to each adjacent node (see line 11).

The deletion unit 12b determines whether the propagation value of an edge calculated by the relevance calculation unit 12a is less than the pruning threshold. If it is, the deletion unit 12b stops the calculation of propagation values for edges whose weight values are smaller than that of the edge, and excludes the node connected by the edge from the calculation target nodes.

That is, while the propagation values from a node to its adjacent nodes are being calculated using the node's set of edges, once the propagation value to one adjacent node shows that the propagation values to the adjacent nodes still to be calculated will be smaller than the pruning threshold θ, the calculation of propagation values for the remaining uncalculated adjacent nodes is stopped (the calculation is pruned), which results in faster relevance calculation.

Specifically, as shown in FIG. 10, if a calculated propagation value is smaller than the pruning threshold θ, the deletion unit 12b prunes the calculation of propagation values for the remaining uncalculated adjacent nodes, whose edges from that node have equal or smaller weights (see lines 13 to 15). For a fixed node u, the propagation value p = W[v,u]s[u] is proportional to the edge weight, and the edges are sorted in descending order of weight, so the propagation values to the remaining adjacent nodes that would be calculated afterward can never be θ or greater. Since this method efficiently prunes the calculation of propagation values to nodes connected by edges whose propagation values are smaller than the given threshold, relevance can be calculated at high speed.
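
The early termination described above can be sketched for a single node u as follows. This is an illustrative sketch, not the algorithm of FIG. 10 itself: the function name, the argument layout and the example values are assumptions. It follows the order given in the flowchart description of FIG. 11, in which the propagation value is added to s′[v] before it is compared with θ.

```python
def propagate_from(s_u, sorted_edges, s_new, theta):
    """Add the propagation values of one node u to s_new and stop early.

    s_u is the current relevance s[u]; sorted_edges is u's list of
    (v, weight) pairs, already sorted in descending order of weight.
    Returns the number of propagation values that were calculated.
    """
    computed = 0
    for v, w in sorted_edges:
        p = w * s_u                          # p = W[v,u] s[u]
        s_new[v] = s_new.get(v, 0.0) + p     # s'[v] = s'[v] + p
        computed += 1
        if p < theta:
            # Every remaining edge has an equal or smaller weight,
            # so its propagation value would also be below theta.
            break
    return computed


# Hypothetical node with relevance 0.01 and three outgoing edges.
s_next = {}
count = propagate_from(
    0.01, [("B", 0.6), ("C", 0.4), ("D", 0.2)], s_next, theta=0.005
)
# The value for B (about 0.006) is not below theta, so the loop continues;
# the value for C (about 0.004) is below theta, so D is never evaluated.
```

Without the descending sort, a small propagation value would say nothing about the edges that follow, and the loop could not stop safely.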

[Relevance calculation processing by the relevance calculation apparatus]
Next, the flow of the relevance calculation processing by the relevance calculation apparatus 10A will be described with reference to FIG. 11. FIG. 11 is a flowchart illustrating the flow of the relevance calculation processing performed by the relevance calculation apparatus according to the second embodiment. As illustrated in FIG. 11, when the rearrangement unit 12c of the relevance calculation apparatus 10A receives a relevance calculation start instruction from the user terminal 20 (Yes in step S201), it first initializes the relevance distribution s to the inquiry distribution d (step S202).

Subsequently, the rearrangement unit 12c selects nodes u one at a time from the node set V (step S203) and sorts the node set C_u in descending order of edge weight (step S204). The rearrangement unit 12c then determines whether all nodes have been selected from V (step S205); if not (No in step S205), the process returns to step S203. When the rearrangement unit 12c determines that all nodes have been selected from V (Yes in step S205), the relevance calculation unit 12a sets the calculation count t to "1" (step S206).

Subsequently, the relevance calculation unit 12a selects nodes u one at a time from the node set V (step S207) and initializes the updated relevance s′[u] of node u to 0 (step S208). The relevance calculation unit 12a then determines whether all nodes have been selected from V (step S209); if not (No in step S209), the process returns to step S207. When all nodes have been selected from V (Yes in step S209), the relevance calculation unit 12a selects nodes u one at a time from the node set V′ (step S210) and selects nodes v one at a time from the node set C_u (step S211). Here, the nodes v are selected one at a time from C_u in the order produced by the sort in step S204.

Then, the relevance calculation unit 12a calculates p = W[v,u]s[u] and uses the calculated p to compute s′[v] = s′[v] + p (step S212). The deletion unit 12b then determines whether the calculated propagation value p is less than the pruning threshold θ (step S213). If p is not less than θ (No in step S213), the relevance calculation unit 12a determines whether all nodes have been selected from C_u (step S214); if not (No in step S214), the process returns to step S211, and if so (Yes in step S214), the process proceeds to step S215. If p is less than θ (Yes in step S213), the process proceeds directly to step S215.

In step S215, the relevance calculation unit 12a determines whether all nodes have been selected from the node set V; if not (No in step S215), the process returns to step S210. When the relevance calculation unit 12a determines that all nodes have been selected from V (Yes in step S215), it calculates s′ = (1-c)s′ + cd and sets the relevance distribution s to the updated relevance s′ (step S216).

Then, the relevance calculation unit 12a adds 1 to the calculation count t (step S217) and determines whether t has exceeded the number of iterations T (step S218).

If the relevance calculation unit 12a determines that the calculation count t has not exceeded the number of iterations T (No in step S218), the process returns to step S207 and the relevance calculation is performed again. If the relevance calculation unit 12a determines that t has exceeded T (Yes in step S218), it outputs the relevance distribution s to the user terminal 20 (step S219) and ends the relevance calculation processing.

[Experimental results of the relevance calculation processing by the relevance calculation apparatus]
Next, experimental results of the relevance calculation processing by the relevance calculation apparatus according to the second embodiment will be described with reference to FIG. 12. The experimental results are merely an example and do not limit the configuration or the processing. FIG. 12 is a diagram illustrating an example of the results obtained when relevance was calculated by the relevance calculation apparatus according to the second embodiment. In FIG. 12, "edge pruning" denotes the experimental results of the relevance calculation processing by the relevance calculation apparatus according to the second embodiment, and "original" denotes the experimental results obtained when relevance was calculated according to the conventional algorithm.

This experiment used a graph with 265214 nodes and 420045 edges. The parameters were c = 0.15 and T = 100, and a single node chosen at random was set as the starting node of the random walk. For the edge pruning method, the experiment was run with the pruning threshold θ set to 10^-3 and 10^-7.

FIG. 12 shows that the present invention can calculate relevance faster than the conventional method. In the example of FIG. 12, the conventional calculation time is "1.01 [s]", whereas with the method of the present invention it is "0.93 [s]" when the threshold θ is 10^-3 and "0.96 [s]" when θ is 10^-7. FIG. 12 also shows that a larger pruning threshold allows relevance to be calculated faster, because a larger threshold prunes more nodes or edges.

FIG. 12 also shows that the relevance error of the present invention is very small. As shown in FIG. 12, the average error relative to the conventional method is "3.54·10^-8" when the threshold θ is 10^-3 and "3.17·10^-10" when θ is 10^-7. This is because, although the present invention prunes nodes or edges, the threshold is set so as to affect the relevance values as little as possible. FIG. 12 further shows that a smaller threshold allows relevance to be calculated with higher accuracy, because a smaller threshold leaves more nodes or edges in the calculation. These results indicate that, in the present invention, the threshold should be increased when fast calculation is desired and decreased when high-accuracy calculation is desired.

[Effects of the second embodiment]
As described above, in the relevance calculation apparatus 10A according to the second embodiment, weight values are set for the edges connected to each node, and the edges are rearranged in descending order of weight value. The relevance calculation apparatus 10A calculates the propagation values of the edges in the rearranged order and determines whether each calculated propagation value is less than the threshold. If the propagation value of an edge is less than the threshold, the apparatus stops the calculation of propagation values for edges whose weight values are smaller than that of the edge, and excludes the node connected by the edge from the calculation target nodes. Because edges with low relevance can thus be progressively excluded from the calculation, it is possible to provide a relevance calculation mechanism equipped with a technique for suppressing the amount of computation. In addition, adjusting the threshold makes it possible to adjust the balance between calculation speed and accuracy in the relevance calculation.

The relevance calculation apparatuses of the first and second embodiments may be implemented in various forms other than the embodiments described above. The third embodiment therefore describes other embodiments of the relevance calculation system described above.

[Iterative calculation]
The relevance calculation apparatuses of the first and second embodiments have been described for the case where the number of iterations is set in advance as the convergence condition of the iterative calculation. However, the present invention is not limited to this, and a convergence threshold may be set in advance instead. For example, in the repeated relevance calculation, the relevance calculation apparatus may calculate the difference between the relevance calculated in the current iteration and the relevance calculated in the previous iteration, end the relevance calculation if the difference is smaller than the preset convergence threshold, and repeat the relevance calculation if the difference is larger.
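
A convergence-threshold stopping rule of this kind can be sketched as follows. The text does not say how the difference between two relevance distributions is measured, so the sum of absolute differences (L1 distance) used here is an assumption, as are the function name, the `step` callback and the `max_iter` safety limit.

```python
def iterate_until_converged(step, s0, eps=1e-9, max_iter=1000):
    """Repeat one relevance update until successive results differ by less than eps.

    step takes the current distribution (a dict) and returns the updated one.
    Returns the final distribution and the number of updates performed.
    """
    s = dict(s0)
    for t in range(1, max_iter + 1):
        s_next = step(s)
        # Assumed measure: L1 distance between current and previous relevance.
        diff = sum(abs(s_next[u] - s[u]) for u in s_next)
        s = s_next
        if diff < eps:
            return s, t
    return s, max_iter


# Toy update that halves every value, used only to exercise the stopping rule.
def halve(s):
    return {u: x / 2 for u, x in s.items()}


result, passes = iterate_until_converged(halve, {"a": 1.0}, eps=0.1)
```

In the toy run the successive differences are 0.5, 0.25, 0.125 and 0.0625, so the loop stops after the fourth update. Either pruning variant could be supplied as `step` in place of the toy update.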

[System configuration]
Of the processes described in this embodiment, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above document and in the drawings can be changed arbitrarily unless otherwise specified.

The components of each illustrated device are functional concepts and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one, and all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

[Program]
FIG. 13 is a diagram illustrating that information processing by the relevance calculation program according to the disclosed technique is concretely realized using a computer. As illustrated in FIG. 13, the computer 1000 includes, for example, a memory 1010, a CPU (Central Processing Unit) 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. The components of the computer 1000 are connected by a bus 1080.

As illustrated in FIG. 13, the memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041. The serial port interface 1050 is connected to, for example, a mouse 1051 and a keyboard 1052. The video adapter 1060 is connected to, for example, a display 1061.

As illustrated in FIG. 13, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the relevance calculation program according to the disclosed technique is stored, for example, in the hard disk drive 1031 as the program module 1093 in which instructions to be executed by the computer are described. Specifically, program modules each describing a procedure that executes the same information processing as the corresponding unit of the control unit 12 described in the above embodiments are stored in the hard disk drive 1031.

Data used for information processing by the relevance calculation program is stored as the program data 1094, for example, in the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary and executes the various procedures.

The program module 1093 and the program data 1094 of the relevance calculation program need not be stored in the hard disk drive 1031. For example, they may be stored in a removable storage medium. In this case, the CPU 1020 reads the data from the removable storage medium via a disk drive or the like. Similarly, the program module 1093 and the program data 1094 of the relevance calculation program may be stored in another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like). In this case, the CPU 1020 reads the various data by accessing the other computer via the network interface.

[Others]
The relevance calculation program described in this embodiment can be distributed via a network such as the Internet. The program can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO, or a DVD, and executed by being read from the recording medium by a computer.

DESCRIPTION OF SYMBOLS
10, 10A Relevance calculation device
11 Communication control unit
12 Control unit
12a Relevance calculation unit
12b Deletion unit
12c Sorting unit
20 User terminal
1000 Computer
1010 Memory
1020 CPU
1030 Hard disk drive interface
1040 Disk drive interface
1050 Serial port interface
1051 Mouse
1052 Keyboard
1060 Video adapter
1061 Display
1070 Network interface
1080 Bus
1091 OS
1092 Application program
1093 Program module
1094 Program data

Claims

A relevance calculation device comprising:
a relevance calculation unit that, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculates for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculates the relevance of each node with respect to a set of nodes using the propagation values of the edges; and
a deletion unit that, each time the relevance of a node is calculated by the relevance calculation unit, excludes a predetermined node from the calculation target nodes based on the relevance of the node or the propagation value of the edge,
wherein the relevance calculation unit calculates the relevance of the calculation target nodes from which the predetermined node has been excluded.

The relevance calculation device according to claim 1, wherein the deletion unit determines whether the relevance of a node calculated by the relevance calculation unit is less than a predetermined first threshold, and excludes a node whose relevance is less than the first threshold from the calculation target nodes as the predetermined node.
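For illustration only, the combination of repeated propagation and first-threshold exclusion might be sketched as below. The update rule is an assumption: it follows a random-walk-with-restart form in which the propagation value of an edge is taken to be the edge weight times the relevance of its source node. The names `relevance_with_node_pruning` and `restart`, the single query node, and the choice to never exclude the query node are hypothetical and are not taken from the claims.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(adjacent node, edge weight)]


def relevance_with_node_pruning(
    graph: Graph,
    query: str,
    iterations: int,
    first_threshold: float,
    restart: float = 0.2,  # assumed restart probability
) -> Dict[str, float]:
    """Iterate relevance propagation, excluding low-relevance nodes after every round."""
    nodes = set(graph) | {v for adjacent in graph.values() for v, _ in adjacent}
    active = set(nodes)  # calculation target nodes
    relevance = {n: (1.0 if n == query else 0.0) for n in nodes}
    for _ in range(iterations):
        updated = {n: 0.0 for n in active}
        for u in active:
            for v, weight in graph.get(u, []):
                if v in active:
                    # Assumed propagation value of edge (u, v).
                    propagation = weight * relevance[u]
                    updated[v] += (1.0 - restart) * propagation
        updated[query] += restart
        # Deletion step: drop nodes whose relevance is below the first threshold.
        # Keeping the query node unconditionally is an assumption of this sketch.
        active = {n for n in active if n == query or updated[n] >= first_threshold}
        relevance = {n: updated[n] for n in active}
    return relevance
```

Because relevance starts at zero everywhere except the query node, a sketch this simple also excludes nodes that propagation has not yet reached. The exclusion criterion used in the embodiments is not reproduced here.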

The relevance calculation device according to claim 1, further comprising a sorting unit that, a weight being set for each edge connected to each node, sorts the edges of each node in descending order of weight,
wherein the relevance calculation unit calculates the propagation values of the edges in the order sorted by the sorting unit, and
the deletion unit determines whether the propagation value of an edge calculated by the relevance calculation unit is less than a predetermined second threshold and, when the propagation value of the edge is less than the second threshold, stops the calculation of the propagation values of edges whose weights are smaller than the weight of that edge and excludes a node connected by the edge from the calculation target nodes as the predetermined node.
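The early termination made possible by sorting can be sketched for a single source node as follows. The sketch assumes that the propagation value of an edge is its weight times the relevance of the source node, so that once one edge falls below the second threshold every lighter edge does too. The names `sort_edges` and `propagate_sorted` are hypothetical, and treating the destinations of all skipped edges as exclusion candidates is one reading of the claim language, not a confirmed detail.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, float]  # (adjacent node, edge weight)


def sort_edges(edges: List[Edge]) -> List[Edge]:
    """Sort a node's edges in descending order of weight."""
    return sorted(edges, key=lambda edge: edge[1], reverse=True)


def propagate_sorted(
    sorted_edges: List[Edge],
    source_relevance: float,
    second_threshold: float,
) -> Tuple[Dict[str, float], Set[str]]:
    """Compute propagation values heaviest edge first, stopping at the threshold.

    Returns the computed propagation values and the adjacent nodes whose edges
    were skipped (candidates for exclusion from the calculation targets).
    """
    values: Dict[str, float] = {}
    for index, (neighbor, weight) in enumerate(sorted_edges):
        value = weight * source_relevance  # assumed propagation value
        if value < second_threshold:
            # Every remaining edge is no heavier, so its value is no larger.
            skipped = {n for n, _ in sorted_edges[index:]}
            return values, skipped
        values[neighbor] = value
    return values, set()
```

Sorting is done once per node, after which each round of propagation can stop as soon as the first sub-threshold edge is met instead of visiting every edge.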

The relevance calculation device according to any one of claims 1 to 3, wherein the relevance calculation unit repeatedly calculates the relevance of the nodes according to a preset number of iterations.

A relevance calculation system comprising:
a relevance calculation unit that, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculates for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculates the relevance of each node with respect to a set of nodes using the propagation values of the edges; and
a deletion unit that, each time the relevance of a node is calculated by the relevance calculation unit, excludes a predetermined node from the calculation target nodes based on the relevance of the node or the propagation value of the edge,
wherein the relevance calculation unit calculates the relevance of the calculation target nodes from which the predetermined node has been excluded.

A relevance calculation method executed by a relevance calculation device, the method comprising:
a relevance calculation step of, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculating for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculating the relevance of each node with respect to a set of nodes using the propagation values of the edges; and
a deletion step of, each time the relevance of a node is calculated in the relevance calculation step, excluding a predetermined node from the calculation target nodes based on the relevance of the node or the propagation value of the edge,
wherein the relevance calculation step calculates the relevance of the calculation target nodes from which the predetermined node has been excluded.

A relevance calculation program causing a computer to execute:
a relevance calculation step of, on graph structure data, which is a data model expressing relationships among data as relationships between nodes and edges, calculating for each edge a propagation value from a node to an adjacent node using the relevance of the node and the weight of the edge, and repeatedly calculating the relevance of each node with respect to a set of nodes using the propagation values of the edges; and
a deletion step of, each time the relevance of a node is calculated in the relevance calculation step, excluding a predetermined node from the calculation target nodes based on the relevance of the node or the propagation value of the edge,
wherein the relevance calculation step calculates the relevance of the calculation target nodes from which the predetermined node has been excluded.