JP2011248534A

JP2011248534A - Network analysis device, network analysis method and program for network analysis using graph pattern

Info

Publication number: JP2011248534A
Application number: JP2010119709A
Authority: JP
Inventors: Kyoshi Iizuka; 京士飯塚; Hiroyuki Sato; 宏之佐藤; Toru Kobayashi; 透小林
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-05-25
Filing date: 2010-05-25
Publication date: 2011-12-08

Abstract

PROBLEM TO BE SOLVED: To provide a network analysis device, a network analysis method, and a network analysis program that enable highly accurate analysis of a network represented as a labeled directed graph. SOLUTION: The device is connected to an RDF data storage device 10 that stores RDF data forming a labeled directed graph, and includes an analysis graph generation unit 31 and a network analysis unit 33. The analysis graph generation unit 31 substitutes each of a plurality of designated node values, taken from the values of nodes matching a designated label, into the key node of an analysis graph pattern whose key node has the same label as the designated label, and thereby identifies result nodes in the RDF data. It then calculates a coefficient representing the degree of overlap of the identified result nodes and generates an analysis graph, which is an unlabeled undirected graph having an edge between each pair of nodes whose coefficient is at or above a preset threshold. The network analysis unit 33 performs network analysis on this analysis graph.

Description

The present invention relates to a network analysis device using a graph pattern, a network analysis method, and a network analysis program, each of which analyzes a network expressed as a labeled directed graph.

Data mining is a conventional technique for extracting, from a huge amount of data, information that shows characteristic tendencies and is needed for human decision making. Data mining relies on characteristic numerical values calculated by analyzing the relationships among data, mainly with statistical methods.

However, such statistical methods are ill-suited to analyzing relationships in graph structure data, in which events involving many intricately related things are expressed as a network of nodes and the edges connecting them. Network analysis techniques for mathematically analyzing the characteristics of graph structure data have therefore attracted attention and are being actively studied.

Typical network analysis techniques for graph structure data include centrality analysis and clique extraction analysis.

Centrality analysis identifies the central nodes by extracting the highly influential nodes from the graph structure data under analysis. Clique analysis identifies factions of nodes by extracting, from the graph structure data under analysis, portions of the data formed by three or more closely related nodes.
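As a rough illustration of these two analyses, the following sketch computes a centrality score for each node and enumerates the cliques of three or more nodes in a small unlabeled undirected graph. The graph, the node names, the use of degree centrality, and the use of the Bron-Kerbosch recursion are all assumptions made for illustration; the text does not state which centrality measure or clique algorithm is intended.

```python
def degree_centrality(adj):
    """Fraction of the other nodes to which each node is directly connected."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def maximal_cliques(adj, min_size=3):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already processed
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Hypothetical undirected graph given as an edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

centrality = degree_centrality(adj)
cliques = maximal_cliques(adj)
```

In this hypothetical graph, node "c" has the highest centrality, and the nodes "a", "b", and "c" form the only clique of three or more nodes.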

With these analysis techniques, graph structure data containing a large amount of information can be analyzed.

Meanwhile, the Semantic Web has been spreading in recent years. It is a technology that makes large amounts of information processable by computer by expressing a labeled directed graph as a network using RDF (Resource Description Framework), a method of representing metadata.

The Semantic Web can process more sophisticated graph structure data by taking into account the relationships and features used to describe the resources that act as subjects and objects.

Patent Document 1: JP 2008-181331 A

However, the centrality analysis and clique analysis described above assume a network expressed as an unlabeled undirected graph (a graph composed of one type of node and one type of edge). They cannot be applied as they are to a network expressed as a labeled directed graph (a graph composed of multiple types of nodes and of edges that indicate, by their direction, whether a relationship exists between nodes and what type of relationship it is).

To analyze a network expressed as a labeled directed graph, one conceivable approach is to use the technique described in Patent Document 1: pattern matching is performed on the network under analysis using the same graph pattern for each of the nodes that share a label and whose mutual relationships are of interest, and the resulting graph structure data, an unlabeled undirected graph composed of edges of a single kind, is extracted.

However, the edges of the graph structure data extracted in this way carry weights indicating the correlation between nodes. If the data is used for network analysis as is, edges with low correlation act as noise and prevent accurate analysis.

An object of the present invention is therefore to provide a network analysis device, a network analysis method, and a network analysis program using a graph pattern that can analyze a network expressed as a labeled directed graph with high accuracy.

To solve the above problem, a network analysis device of the present invention is connected to a graph structure data storage device and a graph pattern storage device. The graph structure data storage device stores graph structure data having the structure of a labeled directed graph composed of nodes, each consisting of a value indicating a data element and a label indicating the type of that value, and edges, each consisting of a label indicating a relationship between nodes and the direction of that relationship. The graph pattern storage device stores graph patterns, each being a graph structure that includes one key node whose node value is designated from outside and nodes that can take arbitrary values. The network analysis device comprises: a graph pattern acquisition unit that acquires, from the graph pattern storage device, an analysis graph pattern having a key node with the same label as a designated label; a result node specifying unit that specifies result nodes in the graph structure data stored in the graph structure data storage device by substituting each of a plurality of designated node values into the key node of the analysis graph pattern acquired by the graph pattern acquisition unit; an analysis graph generation unit that calculates, for any two of the nodes having the plurality of designated node values, a coefficient indicating the degree of overlap of the result nodes specified by the result node specifying unit, and generates an analysis graph that is an unlabeled undirected graph composed of the nodes having the plurality of node values and having an edge between each pair of nodes whose calculated coefficient satisfies a preset condition; and a network analysis unit that performs network analysis on the analysis graph generated by the analysis graph generation unit using a predetermined analysis method and generates an analysis result.
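The overlap calculation and edge generation described above can be sketched as follows. The drawings listed later in this description refer to a Simpson coefficient, which is commonly defined as the size of the intersection of two sets divided by the size of the smaller set; the sketch assumes that definition and assumes that the preset condition is "coefficient at or above a threshold". The result node sets, the threshold of 0.5, and all function names are hypothetical.

```python
from itertools import combinations


def simpson(a, b):
    """Simpson (overlap) coefficient |A & B| / min(|A|, |B|); 0.0 if a set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def build_analysis_graph(result_nodes, threshold):
    """Return undirected edges between key values whose coefficient meets the threshold."""
    edges = set()
    for u, v in combinations(sorted(result_nodes), 2):
        if simpson(result_nodes[u], result_nodes[v]) >= threshold:
            edges.add((u, v))
    return edges


# Hypothetical result nodes obtained for three key node values.
result_nodes = {
    "RDF": {"Yamamoto", "Suzuki", "Tanaka"},
    "SPARQL": {"Yamamoto", "Suzuki"},
    "XML": {"Tanaka", "Sato", "Ito", "Kato"},
}

analysis_edges = build_analysis_graph(result_nodes, threshold=0.5)
```

With these made-up sets, only the pair "RDF" and "SPARQL" shares enough result nodes to receive an edge; the weakly overlapping pair "RDF" and "XML" is dropped, which is the noise-suppression effect the thresholding is meant to provide.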

A network analysis device according to another aspect of the present invention is connected to a graph structure data storage device that stores graph structure data having the structure of a labeled directed graph composed of nodes, each consisting of a value indicating a data element and a label indicating the type of that value, and edges, each consisting of a label indicating a relationship between nodes and the direction of that relationship. The network analysis device comprises: a frequent graph pattern extraction unit that extracts graph patterns, each being a graph structure that includes one key node whose node value is designated from outside and nodes that can take arbitrary values, with which at least a predetermined number of graph structure portions are matched in the graph structure data stored in the graph structure data storage device; a graph pattern acquisition unit that acquires, as an analysis graph pattern, a graph pattern that was extracted by the frequent graph pattern extraction unit and has a key node with the same label as a designated label; a result node specifying unit that specifies result nodes in the graph structure data stored in the graph structure data storage device by substituting each of a plurality of designated node values into the key node of the analysis graph pattern acquired by the graph pattern acquisition unit; an analysis graph generation unit that calculates, for any two of the nodes having the plurality of designated node values, a coefficient indicating the degree of overlap of the result nodes specified by the result node specifying unit, and generates an analysis graph that is an unlabeled undirected graph composed of the nodes having the plurality of node values and having an edge between each pair of nodes whose calculated coefficient satisfies a preset condition; and a network analysis unit that performs network analysis on the analysis graph generated by the analysis graph generation unit using a predetermined analysis method and generates an analysis result.
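The idea of keeping only graph patterns that match at least a predetermined number of graph structure portions can be sketched in a heavily simplified form. The sketch below assumes that a pattern consists of a single edge described by (subject label, edge label, object label); real graph patterns with a key node and several other nodes would need a general subgraph matcher. The data, the function name, and the minimum count of 2 are hypothetical.

```python
from collections import Counter

# Hypothetical labeled directed edges:
# (subject label, subject value, edge label, object label, object value)
triples = [
    ("paper", "Paper A", "author", "person", "Sachiko Yamamoto"),
    ("paper", "Paper A", "paper keyword", "technical term", "RDF"),
    ("paper", "Paper B", "paper keyword", "technical term", "RDF"),
    ("paper", "Paper B", "author", "person", "Sachiko Yamamoto"),
    ("project", "Project X", "project keyword", "technical term", "RDF"),
]


def frequent_single_edge_patterns(triples, min_matches):
    """Count how often each label-level edge pattern occurs and keep the frequent ones."""
    counts = Counter(
        (s_label, e_label, o_label)
        for s_label, _, e_label, o_label, _ in triples
    )
    return {pattern: n for pattern, n in counts.items() if n >= min_matches}


frequent = frequent_single_edge_patterns(triples, min_matches=2)
```

Here the two patterns that start from a "paper" node are kept because each matches two edges, while the "project keyword" pattern matches only once and is discarded.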

A network analysis method of the present invention is performed by a computer connected to a graph structure data storage device and a graph pattern storage device. The graph structure data storage device stores graph structure data having the structure of a labeled directed graph composed of nodes, each consisting of a value indicating a data element and a label indicating the type of that value, and edges, each consisting of a label indicating a relationship between nodes and the direction of that relationship. The graph pattern storage device stores graph patterns, each being a graph structure that includes one key node whose node value is designated from outside and nodes that can take arbitrary values. The method comprises: a graph pattern acquisition step of acquiring, from the graph pattern storage device, an analysis graph pattern having a key node with the same label as a designated label; a result node specifying step of specifying result nodes in the graph structure data stored in the graph structure data storage device by substituting each of a plurality of designated node values into the key node of the analysis graph pattern acquired in the graph pattern acquisition step; an analysis graph generation step of calculating, for any two of the nodes having the plurality of designated node values, a coefficient indicating the degree of overlap of the result nodes specified in the result node specifying step, and generating an analysis graph that is an unlabeled undirected graph composed of the nodes having the plurality of node values and having an edge between each pair of nodes whose calculated coefficient satisfies a preset condition; and a network analysis step of performing network analysis on the analysis graph generated in the analysis graph generation step using a predetermined analysis method and generating an analysis result.

A network analysis method according to another aspect of the present invention is performed by a computer connected to a graph structure data storage device that stores graph structure data having the structure of a labeled directed graph composed of nodes, each consisting of a value indicating a data element and a label indicating the type of that value, and edges, each consisting of a label indicating a relationship between nodes and the direction of that relationship. The method comprises: a frequent graph pattern extraction step of extracting graph patterns, each being a graph structure that includes one key node whose node value is designated from outside and nodes that can take arbitrary values, with which at least a predetermined number of graph structure portions are matched in the graph structure data stored in the graph structure data storage device; a graph pattern acquisition step of acquiring, as an analysis graph pattern, a graph pattern that was extracted in the frequent graph pattern extraction step and has a key node with the same label as a designated label; a result node specifying step of specifying result nodes in the graph structure data stored in the graph structure data storage device by substituting each of a plurality of designated node values into the key node of the analysis graph pattern acquired in the graph pattern acquisition step; an analysis graph generation step of calculating, for any two of the nodes having the plurality of designated node values, a coefficient indicating the degree of overlap of the result nodes specified in the result node specifying step, and generating an analysis graph that is an unlabeled undirected graph composed of the nodes having the plurality of node values and having an edge between each pair of nodes whose calculated coefficient satisfies a preset condition; and a network analysis step of performing network analysis on the analysis graph generated in the analysis graph generation step using a predetermined analysis method and generating an analysis result.

A network analysis program of the present invention causes a computer to function as the network analysis device of any of the above aspects.

With the network analysis device, network analysis method, and network analysis program using a graph pattern according to the present invention, a network expressed as a labeled directed graph can be analyzed with high accuracy.

FIG. 1 is a block diagram showing the configuration of a network analysis system using the network analysis device according to the first embodiment of the present invention.
FIG. 2 is a sequence diagram showing the operation of the network analysis system using the network analysis device according to the first embodiment of the present invention.
FIG. 3 is an example of RDF data stored in the RDF data storage device connected to the network analysis device according to the first and second embodiments of the present invention.
FIG. 4 is an explanatory diagram showing examples of search graph patterns used by the network analysis device according to the first and second embodiments of the present invention.
FIG. 5 is a flowchart showing the operation of the analysis graph generation process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 6 is an explanatory diagram showing the state in which result nodes are extracted in the analysis graph generation process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 7 is a table showing an example of the Simpson coefficients calculated in the analysis graph generation process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 8 is an explanatory diagram showing an example of an analysis graph generated in the analysis graph generation process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 9 is a table showing another example of the Simpson coefficients calculated in the analysis graph generation process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 10 is an explanatory diagram showing another example of an analysis graph generated in the analysis graph generation process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 11 is a table showing an example of the centrality values calculated in the network analysis process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 12 is an explanatory diagram showing an example of a clique extracted in the network analysis process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 13 is a flowchart showing the operation of the peripheral information search process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 14 is an explanatory diagram showing an example of peripheral information extracted by the peripheral information search process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 15 is an explanatory diagram showing an example of an analysis result generated by the network analysis device according to the first and second embodiments of the present invention.
FIG. 16 is a flowchart showing the operation of the peripheral information search process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 17 is an explanatory diagram showing an example of an analysis result generated by the network analysis device according to the first and second embodiments of the present invention.
FIG. 18 is a flowchart showing the operation of the peripheral information search process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 19 is an explanatory diagram showing an example of peripheral information extracted by the peripheral information search process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 20 is an explanatory diagram showing an example of an analysis result generated by the network analysis device according to the first and second embodiments of the present invention.
FIG. 21 is a flowchart showing the operation of the peripheral information search process executed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 22 is an explanatory diagram showing an example of an analysis result generated by the network analysis device according to the first and second embodiments of the present invention.
FIG. 23 is an explanatory diagram showing an example of an analysis result generated and displayed by the network analysis device according to the first and second embodiments of the present invention.
FIG. 24 is a block diagram showing the configuration of a network analysis system using the network analysis device according to the second embodiment of the present invention.
FIG. 25 is a sequence diagram showing the operation of the network analysis system using the network analysis device according to the second embodiment of the present invention.

<< First Embodiment >>
<Configuration of Network Analysis System Using Network Analysis Device According to First Embodiment>
As shown in FIG. 1, the network analysis system 1 according to the first embodiment of the present invention is configured by connecting an RDF data storage device 10, a user terminal 20, a network analysis device 30, and a graph pattern storage device 40.

The RDF data storage device 10 stores RDF data, in which the data to be analyzed is expressed in RDF. The stored RDF data is written in XML or a similar format. It is expressed as a network in which each piece of data is a node and an edge represents the relationship (predicate) between a node of subject data and a node of data that is the object of that subject.

In the information expressed by this RDF data, each data element is represented as a node value, and each node carries a label indicating the type of its node value. Each edge connecting nodes includes the direction from the node of the data holding the subject node value to the node of the data holding the object node value, and carries a label indicating the type of the edge, that is, the relationship between the subject and the object. A graph that includes these labels and edge directions is called a labeled directed graph.
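The data model described above can be sketched with two small record types. This is a minimal illustration, not the storage format of the RDF data storage device 10: the class names, field names, and the describe helper are hypothetical, and the example edge borrows the "Paper A" and "Sachiko Yamamoto" values that this description uses when explaining FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # type of the node value, for example "person"
    value: str  # the data element itself


@dataclass(frozen=True)
class Edge:
    subject: Node  # node the edge starts from
    label: str     # type of relationship (the predicate)
    obj: Node      # node the edge points to


def describe(edge):
    """Render a labeled directed edge as a plain sentence."""
    return f"The {edge.label} of {edge.subject.value} is {edge.obj.value}."


paper_a = Node("paper", "Paper A")
yamamoto = Node("person", "Sachiko Yamamoto")
graph = [Edge(paper_a, "author", yamamoto)]

sentence = describe(graph[0])
```

Keeping the label separate from the value on both nodes and edges is what distinguishes this labeled directed graph from the unlabeled undirected graphs that centrality and clique analysis expect.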

The user terminal 20 is operated by a user. It accepts input of the information that the network analysis device 30 needs to perform network analysis processing and transmits that information to the network analysis device 30. The user terminal 20 also has a monitor (not shown), on which it displays information transmitted from the network analysis device 30 for the user.

The network analysis device 30 includes an analysis graph generation unit 31, an analysis graph storage unit 32, a network analysis unit 33, and a peripheral information search unit 34.

The analysis graph generation unit 31 acquires, from the graph patterns stored in the graph pattern storage device 40, the graph pattern corresponding to the label designated by the user from the user terminal 20 and uses it as the analysis graph pattern. It then extracts subgraphs matching this analysis graph pattern from the RDF data stored in the RDF data storage device 10.

Using the extracted subgraphs, it then generates, as the analysis graph, a subgraph expressed as an unlabeled undirected graph over the analysis node set, which is the set of node values designated by the user from the user terminal 20.

The analysis graph storage unit 32 is, for example, a memory capable of temporary storage, and stores the analysis graph generated by the analysis graph generation unit 31.

The network analysis unit 33 performs network analysis to mathematically analyze the characteristics of the analysis graph stored in the analysis graph storage unit 32, and generates an analysis result.

The peripheral information search unit 34 acquires, as peripheral information, metadata about characteristic nodes or edges in the analysis graph on which the network analysis unit 33 has performed network analysis, adds it to the analysis result of the network analysis, and outputs the result for provision to the user terminal 20.

The graph pattern storage device 40 stores a plurality of search graph patterns generated in advance for detecting graph structure portions (subgraphs) having a specific structure. Each search graph pattern is a graph structure that includes one key node, whose node value is designated from outside (for example, from the user terminal 20), and nodes that can take arbitrary values.
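How a search graph pattern with one key node might be matched against the stored data can be sketched as follows. The sketch makes several assumptions that are not taken from the text: nodes are (label, value) tuples, pattern terms are (label, variable name) tuples, the key node is the pre-bound variable "?key", and the result node is the variable "?result". The pattern itself (papers that have the key technical term as a keyword, and the authors of those papers) and the entries "Paper B" and "Taro Suzuki" are hypothetical.

```python
def bind(term, node, binding):
    """Bind a pattern term (label, variable) to a data node (label, value) if compatible."""
    (t_label, var), (n_label, value) = term, node
    if t_label != n_label:
        return False
    return binding.setdefault(var, value) == value


def match(pattern, triples, binding):
    """Yield every variable binding under which all pattern edges occur in the data."""
    if not pattern:
        yield binding
        return
    (s_term, e_label, o_term), rest = pattern[0], pattern[1:]
    for s_node, t_label, o_node in triples:
        if t_label != e_label:
            continue
        new = dict(binding)
        if bind(s_term, s_node, new) and bind(o_term, o_node, new):
            yield from match(rest, triples, new)


def result_nodes(pattern, triples, key_value):
    """Substitute a value into the key node and collect the matched result node values."""
    return {b["?result"] for b in match(pattern, triples, {"?key": key_value})}


# Hypothetical pattern: the key node is a technical term, the result node is a person.
pattern = [
    (("paper", "?paper"), "paper keyword", ("technical term", "?key")),
    (("paper", "?paper"), "author", ("person", "?result")),
]

# Hypothetical data in the labeled directed graph form.
triples = [
    (("paper", "Paper A"), "paper keyword", ("technical term", "RDF")),
    (("paper", "Paper A"), "author", ("person", "Sachiko Yamamoto")),
    (("paper", "Paper B"), "paper keyword", ("technical term", "XML")),
    (("paper", "Paper B"), "author", ("person", "Taro Suzuki")),
]

rdf_people = result_nodes(pattern, triples, "RDF")
```

Calling result_nodes once per designated key value yields one set of result nodes per value, which is the input needed for an overlap calculation between key values.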

<Operation of Network Analysis System Using Network Analysis Device According to First Embodiment>
Next, the operation of the network analysis system 1 according to the present embodiment will be described with reference to the sequence diagram of FIG. 2.

In the present embodiment, the RDF data storage device 10 stores, as data to be analyzed, a plurality of sets of RDF data written in XML, each expressed as a network such as the one shown in FIG. 3. In this RDF data, each ellipse is a node representing a piece of data, and each arrow is an edge representing the connection between a node of subject data and a node of data that is the object of that subject.

In FIG. 3, the information written inside each node is its label and node value: the label is to the left of the “:” and the node value is to the right. For example, for the node written as “person: Sachiko Yamamoto”, the label, which is the type of the node value, is “person”, and the node value, which is the content of the stored data, is “Sachiko Yamamoto”. To simplify FIG. 3, the label “technical term” is abbreviated as “技”, the label “paper” as “論”, the label “project” as “プ”, and the label “field” as “分”.

Each edge arrow shows the direction from the node of subject data to the node of object data, and the label indicating the relationship between the subject and the object is distinguished by the type of arrow.

For example, a solid arrow is an edge labeled “author”, a two-dot dashed arrow is an edge labeled “person in charge of project”, a long-dotted line is an edge labeled “paper keyword”, a short-dotted line is an edge labeled “project keyword”, and a one-dot dashed line is an edge labeled “field”. The part where the node “論: Paper A” (label “paper”) is connected to the node “person: Sachiko Yamamoto” by a solid-arrow edge means that the relationship from the subject node “論: Paper A” to the object node “person: Sachiko Yamamoto” is “author”; in other words, it represents the information “the author of Paper A is Sachiko Yamamoto”.

図３に示すようにＲＤＦデータ記憶装置１０に記憶されるＲＤＦデータは異なるラベルのノードとエッジが入り混じっているが、ラベルが異なるノードどうし、ラベルが異なるエッジどうしは厳格に区別される。このＲＤＦデータにおいては、あるラベルが付加されたノードやエッジは特定のラベルのノードやエッジと連結する傾向があり、ノードやエッジの結合の仕方には偏りがある。 As shown in FIG. 3, the RDF data stored in the RDF data storage device 10 is mixed with nodes and edges with different labels, but the nodes with different labels and the edges with different labels are strictly distinguished. In this RDF data, a node or edge to which a certain label is added tends to be connected to a node or edge having a specific label, and there is a bias in the way of coupling the node or edge.

このような分析対象のＲＤＦデータがＲＤＦデータ記憶装置１０に複数記憶されている状態で、ユーザによるユーザ端末２０の操作によりネットワーク分析対象のＲＤＦデータが指定されると、ネットワーク分析装置３０に送信される（Ｓ１）。 When a plurality of RDF data to be analyzed is stored in the RDF data storage device 10 and the RDF data to be analyzed is designated by the user's operation of the user terminal 20, it is transmitted to the network analysis device 30. (S1).

ユーザ端末２０から送信されたＲＤＦデータを指定する情報は、ネットワーク分析装置３０の分析グラフ生成部３１で取得される。 Information specifying RDF data transmitted from the user terminal 20 is acquired by the analysis graph generation unit 31 of the network analysis device 30.

次に、ユーザの操作によりユーザ端末２０から所望の分析対象のノードのラベルである分析対象ラベルが指定される（Ｓ２）。ここでは、分析対象ラベルとして「技術用語」が指定されたものとする。 Next, an analysis target label that is a label of a desired analysis target node is designated from the user terminal 20 by a user operation (S2). Here, it is assumed that “technical term” is designated as the analysis target label.

ユーザによりユーザ端末２０で指定された分析対象ラベルの情報はネットワーク分析装置３０の分析グラフ生成部３１で取得され、この指定された分析対象ラベルに該当するキーノードを有するグラフパターンが、グラフパターン記憶装置４０に要求される（Ｓ３）。 Information on the analysis target label specified by the user on the user terminal 20 is acquired by the analysis graph generation unit 31 of the network analysis device 30, and a graph pattern having a key node corresponding to the specified analysis target label is stored in the graph pattern storage device. 40 is required (S3).

そして、分析グラフ生成部３１からの要求により、ユーザにより指定された分析対象ラベルと同一ラベルのキーノードを有するグラフパターンがグラフパターン記憶装置４０から抽出され、分析グラフ生成部３１に送出されるとともに、ユーザ端末２０に表示されユーザに提供される。 Then, in response to a request from the analysis graph generation unit 31, a graph pattern having a key node of the same label as the analysis target label specified by the user is extracted from the graph pattern storage device 40 and sent to the analysis graph generation unit 31. It is displayed on the user terminal 20 and provided to the user.

ここで、図４（ａ）に示すグラフパターンＡと、図４（ｂ）に示すグラフパターンＢとの２つの情報がユーザに提供されたものとする。 Here, it is assumed that the user is provided with two pieces of information, that is, the graph pattern A shown in FIG. 4A and the graph pattern B shown in FIG.

このグラフパターンＡは、ラベルが「技術情報」でありノード値を任意の変数「?keyNode」としたキーノード１０１と、ラベルが「人」でありノード値を任意の変数「?resultNode」とした結果ノード１０２と、キーノード１０１にラベル「論文キーワード」のエッジ１０３でつながるとともに結果ノード１０２にラベル「著者」のエッジ１０４でつながる中間ノード１０５と、キーノード１０１にラベル「論文キーワード」のエッジ１０６でつながるとともに結果ノード１０２にラベル「著者」のエッジ１０７でつながる中間ノード１０８とにより構成されている。この中間ノード１０５および１０８は、ラベルおよびノード値は指定されておらず、それぞれ任意の変数「?x1：?y1」および「?x2：?y2」で示されている。 Graph pattern A consists of: a key node 101 whose label is “technical information” and whose node value is the variable “?keyNode”; a result node 102 whose label is “person” and whose node value is the variable “?resultNode”; an intermediate node 105 connected to the key node 101 by an edge 103 labeled “paper keyword” and to the result node 102 by an edge 104 labeled “author”; and an intermediate node 108 connected to the key node 101 by an edge 106 labeled “paper keyword” and to the result node 102 by an edge 107 labeled “author”. Neither a label nor a node value is specified for the intermediate nodes 105 and 108; they are represented by the variables “?x1：?y1” and “?x2：?y2”, respectively.

この構造により、グラフパターンＡでは、キーノードと結果ノードとの関係が、「技術用語に関する論文を２本以上書いている著者」となることを意味している。 With this structure, the graph pattern A means that the relationship between the key node and the result node is “an author who has written two or more papers on technical terms”.

またグラフパターンＢは、ラベルが「技術情報」でありノード値を任意の変数「?keyNode」としたキーノード１１１と、ラベルが「人」でありノード値を任意の変数「?resultNode」とした結果ノード１１２と、キーノード１１１にラベル「プロジェクトキーワード」のエッジ１１３でつながるとともに結果ノード１１２にラベル「プロジェクト担当者」のエッジ１１４でつながる中間ノード１１５と、キーノード１１１にラベル「論文キーワード」のエッジ１１６でつながるとともに結果ノード１１２にラベル「著者」のエッジ１１７でつながる中間ノード１１８とにより構成されている。この中間ノード１１５および１１８は、ラベルおよびノード値は指定されておらず、任意の変数「?x1：?y1」および「?x2：?y2」で示されている。 Graph pattern B consists of: a key node 111 whose label is “technical information” and whose node value is the variable “?keyNode”; a result node 112 whose label is “person” and whose node value is the variable “?resultNode”; an intermediate node 115 connected to the key node 111 by an edge 113 labeled “project keyword” and to the result node 112 by an edge 114 labeled “project person in charge”; and an intermediate node 118 connected to the key node 111 by an edge 116 labeled “paper keyword” and to the result node 112 by an edge 117 labeled “author”. Neither a label nor a node value is specified for the intermediate nodes 115 and 118; they are represented by the variables “?x1：?y1” and “?x2：?y2”.

この構造により、グラフパターンＢでは、キーノードと結果ノードとの関係が、「技術用語に関する論文の著者で、技術用語のプロジェクトの担当者も兼ねている者」となることを意味している。 This structure means that in the graph pattern B, the relationship between the key node and the result node is “the author of the paper on technical terms and also the person in charge of the technical term project”.

次に、ユーザ端末２０に表示されたグラフパターンの中から、ユーザの操作により指定されたグラフパターンの情報が、分析グラフパターンとしてネットワーク分析装置３０に送信される（Ｓ４）。 Next, the graph pattern information specified by the user's operation from the graph patterns displayed on the user terminal 20 is transmitted to the network analysis device 30 as an analysis graph pattern (S4).

ここでは、図４に示されたグラフパターンのうち、グラフパターンＡがユーザの操作により指定され、送信されたものとする。 Here, it is assumed that the graph pattern A among the graph patterns shown in FIG. 4 is designated and transmitted by the user's operation.

また、ステップＳ２において指定された分析対象ラベルに対応するノード値のうち、相互の関係性を求めたい複数のノード値の情報が、分析ノード集合としてユーザの操作によりユーザ端末２０で指定され、ネットワーク分析装置３０に送信される（Ｓ５）。 In addition, among the node values corresponding to the analysis target label specified in step S2, information on the plurality of node values whose mutual relationships are to be determined is specified as an analysis node set by the user's operation of the user terminal 20 and transmitted to the network analysis device 30 (S5).

ここでは、分析ノード集合{“Ａ技術”，“Ｂ技術”，“Ｃ技術”，“Ｄ技術”}が、相互の関係性を求めたいノード値としてユーザの操作により指定され、送信されたものとする。 Here, the analysis node set {“Technology A”, “Technology B”, “Technology C”, “Technology D”} is designated and transmitted by the user as a node value for which a mutual relationship is desired. And

さらに、ステップＳ５で指定した分析ノード集合のノード値で示されるノード間において、関係性があると判断するための係数の閾値情報がユーザの操作によりユーザ端末２０で指定され、ネットワーク分析装置３０に送信される（Ｓ６）。 Further, threshold information of coefficients for determining that there is a relationship between the nodes indicated by the node values of the analysis node set specified in step S5 is specified by the user terminal 20 by the user's operation, and is sent to the network analysis device 30. It is transmitted (S6).

ここでは、ノード間の関係性があるか否かをシンプソン係数に基づいて判断される場合の閾値として、「０．５」が指定され、送信されたものとする。 Here, it is assumed that “0.5” is designated and transmitted as a threshold when it is determined whether there is a relationship between nodes based on the Simpson coefficient.

これらの情報がネットワーク分析装置３０に送信されると、分析グラフ生成部３１において分析グラフ生成処理が実行される（Ｓ７）。 When these pieces of information are transmitted to the network analysis device 30, an analysis graph generation process is executed in the analysis graph generation unit 31 (S7).

分析グラフ生成部３１で実行される分析グラフ生成処理について、図５のフローチャートを参照して説明する。 The analysis graph generation process executed by the analysis graph generation unit 31 will be described with reference to the flowchart of FIG.

まず、ステップＳ５において指定された分析ノード集合の中から、一のノード値が分析ノード値として選択される（Ｓ２１）。 First, one node value is selected as an analysis node value from the set of analysis nodes designated in step S5 (S21).

次に、この分析ノード値がステップＳ４で指定された分析グラフパターンのキーノードに代入されることで、このノード値および分析グラフパターンに対応する結果ノードを得るためのマッチングクエリが生成される（Ｓ２２）。 Next, the analysis node value is substituted into the key node of the analysis graph pattern designated in step S4, thereby generating a matching query for obtaining a result node corresponding to the node value and the analysis graph pattern (S22). ).

ここでは、分析グラフパターンとして指定された図４（ａ）のグラフパターンＡのキーノードに、ステップＳ２１で選択されたノード値「Ｂ技術」が代入されたマッチングクエリが生成されたものとする。 Here, it is assumed that a matching query has been generated in which the node value “B technology” selected in step S21 is assigned to the key node of the graph pattern A in FIG. 4A designated as the analysis graph pattern.

次に、生成されたマッチングクエリが用いられ、ＲＤＦデータ記憶装置１０に記憶されたＲＤＦデータへのマッチング処理が行われる（Ｓ２３）。 Next, the generated matching query is used to perform matching processing on the RDF data stored in the RDF data storage device 10 (S23).

ここでは、グラフパターンＡのキーノードにノード値「Ｂ技術」が代入されたマッチングクエリが用いられてＲＤＦデータへのマッチング処理が行われることにより、図６に太線で示すように、ノード値「人：山田太郎」、「人：田中一郎」、「人：鈴木花子」が結果ノード値として得られ、結果ノード集合{“山田太郎”，“田中一郎”，“鈴木花子”}が生成される。 Here, matching against the RDF data with the matching query in which the node value “B technology” is substituted for the key node of graph pattern A yields, as shown by the thick lines in FIG. 6, the nodes “person: Taro Yamada”, “person: Ichiro Tanaka”, and “person: Hanako Suzuki” as result nodes, and the result node set {“Taro Yamada”, “Ichiro Tanaka”, “Hanako Suzuki”} is generated.
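
The matching step for graph pattern A can be sketched in Python as follows. This is a minimal illustration, not the matching query of the embodiment: the triples, the paper names P1 to P4, and “Author X” are hypothetical (the FIG. 3 data is not reproduced in this text), edges are assumed to run from a paper to its keyword and to its author, and the two intermediate nodes are assumed to bind to two distinct papers.

```python
from collections import defaultdict

# Hypothetical (subject, edge label, object) triples; NOT the actual FIG. 3 data.
TRIPLES = [
    ("P1", "paper keyword", "B technology"), ("P1", "author", "Taro Yamada"),
    ("P1", "author", "Ichiro Tanaka"),
    ("P2", "paper keyword", "B technology"), ("P2", "author", "Taro Yamada"),
    ("P2", "author", "Hanako Suzuki"),
    ("P3", "paper keyword", "B technology"), ("P3", "author", "Ichiro Tanaka"),
    ("P3", "author", "Hanako Suzuki"),
    ("P4", "paper keyword", "B technology"), ("P4", "author", "Author X"),
]


def match_pattern_a(triples, key_value):
    """Return persons who are authors of at least two distinct nodes
    that carry key_value as a paper keyword (graph pattern A)."""
    # Intermediate nodes: anything linked to the key node by "paper keyword".
    intermediates = {s for s, p, o in triples
                     if p == "paper keyword" and o == key_value}
    papers_by_author = defaultdict(set)
    for s, p, o in triples:
        if p == "author" and s in intermediates:
            papers_by_author[o].add(s)
    # Two distinct intermediate nodes are required (?x1 and ?x2).
    return {person for person, papers in papers_by_author.items()
            if len(papers) >= 2}


result_nodes = match_pattern_a(TRIPLES, "B technology")
print(sorted(result_nodes))
```

With this hypothetical data the function returns the same three persons as the result node set described above, while “Author X”, who appears on only one matching paper, is excluded.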

そして、分析ノード集合の中のノード値がすべてマッチング処理が行われたか否かが判定され（Ｓ２４）、未処理のノード値があるとき（Ｓ２４の「No」）にはステップＳ２１に戻ってさらにマッチング処理が行われる。 Then, it is determined whether matching processing has been performed for all node values in the analysis node set (S24). When an unprocessed node value remains (“No” in S24), the process returns to step S21 and matching processing is performed again.

本実施形態においては、グラフパターンＡのキーノードに分析ノード集合の中の、ノード値「Ａ技術」が代入されたマッチングクエリにより実行されたマッチング処理により結果ノード集合{“山本幸子”，“中村二郎”，“山田太郎”}が生成され、ノード値「Ｂ技術」が代入されたマッチングクエリにより実行されたマッチング処理により結果ノード集合{“山田太郎”，“田中一郎”，“鈴木花子”}が生成され、ノード値「Ｃ技術」が代入されたマッチングクエリにより実行されたマッチング処理により結果ノード集合{“山田太郎”，“田中一郎”}が生成され、ノード値「Ｄ技術」が代入されたマッチングクエリにより実行されたマッチング処理により結果ノード集合{“田中一郎”，“鈴木花子”}が生成されたものとする。 In the present embodiment, a result node set {“Sachiko Yamamoto”, “Jiro Nakamura” is obtained by a matching process executed by a matching query in which the node value “A technology” is assigned to the key node of the graph pattern A. ”,“ Taro Yamada ”} is generated, and the result node set {“ Taro Yamada ”,“ Ichiro Tanaka ”,“ Hanako Suzuki ”} is generated by the matching process executed by the matching query to which the node value“ B technology ”is assigned. The result node set {“Taro Yamada”, “Ichiro Tanaka”} is generated by the matching process executed by the matching query generated and assigned the node value “C technology”, and the node value “D technology” is substituted. It is assumed that a result node set {“Ichiro Tanaka”, “Hanako Suzuki”} is generated by the matching process executed by the matching query.

ステップＳ２４において、分析ノード集合の中のすべてのノード値に対してマッチング処理が行われたと判定されたとき（Ｓ２４の「Yes」）には、これらの分析ノード集合内のノード値の重なり度合いに基づいて、これらのノード間の関係の強弱を示す値が算出される（Ｓ２５）。 When it is determined in step S24 that matching processing has been performed on all the node values in the analysis node set (“Yes” in S24), the overlapping degree of the node values in these analysis node sets is determined. Based on this, a value indicating the strength of the relationship between these nodes is calculated (S25).

このノード間の関係の強弱を示す値としては、本実施形態においては下記式（１）により算出されるシンプソン係数が用いられる。このシンプソン係数は、ノード間の結果ノードの重なり度合いが多い程関係が強く、高い値が算出される。
In the present embodiment, the Simpson coefficient calculated by equation (1) below is used as the value indicating the strength of the relationship between nodes. The more the result nodes of two nodes overlap, the stronger their relationship and the higher the calculated Simpson coefficient.

上記式（１）により算出された各ノード間のシンプソン係数の例を、図７に示す。図７では、ノード値「Ａ技術」のノードとノード値「Ｂ技術」のノードとの間の係数が「０．３３」であり、ノード値「Ａ技術」のノードとノード値「Ｃ技術」のノードとの間の係数が「０．５」であり、ノード値「Ａ技術」のノードとノード値「Ｄ技術」のノードとの間の係数が「０」であり、ノード値「Ｂ技術」のノードとノード値「Ｃ技術」のノードとの間の係数が「１」であり、ノード値「Ｂ技術」のノードとノード値「Ｄ技術」のノードとの間の係数が「１」であり、ノード値「Ｃ技術」のノードとノード値「Ｄ技術」のノードとの間の係数が「０．５」であることが示されている。 FIG. 7 shows an example of the Simpson coefficients between the nodes calculated by equation (1) above. In FIG. 7, the coefficient between the nodes with node values “A technology” and “B technology” is “0.33”, between “A technology” and “C technology” is “0.5”, between “A technology” and “D technology” is “0”, between “B technology” and “C technology” is “1”, between “B technology” and “D technology” is “1”, and between “C technology” and “D technology” is “0.5”.
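
Equation (1) itself is not reproduced in this text. The sketch below assumes the standard definition of the Simpson (overlap) coefficient, |X ∩ Y| / min(|X|, |Y|), where X and Y are the result node sets of two analysis nodes. The function names are illustrative; the result node sets are those given for “A technology” through “D technology” in this embodiment.

```python
from itertools import combinations

# Result node sets obtained by matching graph pattern A (from the embodiment).
RESULT_SETS = {
    "A technology": {"Sachiko Yamamoto", "Jiro Nakamura", "Taro Yamada"},
    "B technology": {"Taro Yamada", "Ichiro Tanaka", "Hanako Suzuki"},
    "C technology": {"Taro Yamada", "Ichiro Tanaka"},
    "D technology": {"Ichiro Tanaka", "Hanako Suzuki"},
}


def simpson(x, y):
    """Overlap of two sets relative to the smaller set (assumed form of equation (1))."""
    if not x or not y:
        return 0.0  # avoid division by zero for an empty result set
    return len(x & y) / min(len(x), len(y))


def pairwise_simpson(result_sets):
    """Simpson coefficient for every unordered pair of analysis nodes."""
    return {(a, b): simpson(result_sets[a], result_sets[b])
            for a, b in combinations(sorted(result_sets), 2)}


for pair, value in pairwise_simpson(RESULT_SETS).items():
    print(pair, round(value, 2))
```

Under this assumed definition, the six pairwise values agree with those listed for FIG. 7 (1/3 is shown there as 0.33), which supports the reading of equation (1) used here.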

次に、算出された各ノード間のシンプソン係数のうち、ステップＳ６で指定された閾値である「０．５」以上の値が抽出され、分析ノード集合内のノード値のノードと、抽出された該当箇所のエッジにより構成された、ラベル無し無向グラフで表現されたサブグラフが分析グラフとして生成される（Ｓ２６）。 Next, from the calculated Simpson coefficients between the nodes, the values at or above the threshold “0.5” specified in step S6 are extracted, and a subgraph expressed as an unlabeled undirected graph, consisting of the nodes for the node values in the analysis node set and edges for the extracted pairs, is generated as the analysis graph (S26).

ここでは、図７の中からノード値「Ａ技術」のノードとノード値「Ｃ技術」のノードとの間、ノード値「Ｂ技術」のノードとノード値「Ｃ技術」のノードとの間、ノード値「Ｂ技術」のノードとノード値「Ｄ技術」のノードとの間、ノード値「Ｃ技術」のノードとノード値「Ｄ技術」のノードとの間にエッジを有するように構成された、図８に示すような分析グラフが生成される。このグラフパターンＡに基づいて算出された分析グラフを、分析グラフＡとする。 Here, between the node of the node value “A technology” and the node of the node value “C technology” from FIG. 7, between the node of the node value “B technology” and the node of the node value “C technology”, The node value “B technology” is configured to have an edge between the node of the node value “D technology” and the node value “C technology” and the node of the node value “D technology”. An analysis graph as shown in FIG. 8 is generated. An analysis graph calculated based on the graph pattern A is referred to as an analysis graph A.
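
The thresholding in step S26 can be sketched as follows, starting from the coefficient values listed for FIG. 7. The function name is illustrative. The comparison is written as “at or above the threshold”, because the pairs whose coefficient is exactly 0.5 (A and C, C and D) do receive edges in FIG. 8.

```python
# Simpson coefficients as listed for FIG. 7 (graph pattern A).
FIG7 = {
    ("A technology", "B technology"): 0.33,
    ("A technology", "C technology"): 0.5,
    ("A technology", "D technology"): 0.0,
    ("B technology", "C technology"): 1.0,
    ("B technology", "D technology"): 1.0,
    ("C technology", "D technology"): 0.5,
}


def build_analysis_graph(coefficients, threshold):
    """Return unlabeled, undirected edges for pairs whose coefficient >= threshold."""
    # frozenset makes each edge direction-free: {u, v} == {v, u}.
    return {frozenset(pair) for pair, value in coefficients.items()
            if value >= threshold}


edges_a = build_analysis_graph(FIG7, 0.5)
for edge in sorted(sorted(e) for e in edges_a):
    print(" - ".join(edge))
```

Applied to the FIG. 7 values with threshold 0.5, this yields the four edges of analysis graph A. A strict “greater than” comparison would instead drop the A–C and C–D edges.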

また、ステップＳ４においてユーザの操作により図４（ｂ）のグラフパターンＢが指定され、ステップＳ５およびＳ６において上記と同様に分析ノード集合{“Ａ技術”，“Ｂ技術”，“Ｃ技術”，“Ｄ技術”}および閾値情報「０．５」が指定され、分析グラフ生成処理が行われたときに算出されるシンプソン係数の例を、図９に示す。 In step S4, the graph pattern B shown in FIG. 4B is designated by the user's operation. In steps S5 and S6, the analysis node set {“A technology”, “B technology”, “C technology”, FIG. 9 shows an example of Simpson coefficients calculated when “D technique”} and threshold information “0.5” are designated and the analysis graph generation process is performed.

図９では、ノード値「Ａ技術」のノードとノード値「Ｂ技術」のノードとの間の係数が「０」であり、ノード値「Ａ技術」のノードとノード値「Ｃ技術」のノードとの間の係数が「０」であり、ノード値「Ａ技術」のノードとノード値「Ｄ技術」のノードとの間の係数が「０」であり、ノード値「Ｂ技術」のノードとノード値「Ｃ技術」のノードとの間の係数が「０．５」であり、ノード値「Ｂ技術」のノードとノード値「Ｄ技術」のノードとの間の係数が「０．５」であり、ノード値「Ｃ技術」のノードとノード値「Ｄ技術」のノードとの間の係数が「０．５」であることが示されている。 In FIG. 9, the coefficient between the node with the node value “A technology” and the node with the node value “B technology” is “0”, and the node with the node value “A technology” and the node with the node value “C technology” The coefficient between the node having the node value “A technology” and the node having the node value “D technology” is “0”, and the node having the node value “B technology” The coefficient between the node of the node value “C technology” is “0.5”, and the coefficient between the node of the node value “B technology” and the node of the node value “D technology” is “0.5”. It is shown that the coefficient between the node with the node value “C technology” and the node with the node value “D technology” is “0.5”.

そして、算出された各ノード間のシンプソン係数のうち、指定された閾値である「０．５」以上の値が抽出され、上述したようにラベル無し無向グラフで表現されたサブグラフが分析グラフとして生成される。 Then, from the calculated Simpson coefficients between the nodes, the values at or above the specified threshold “0.5” are extracted, and, as described above, a subgraph expressed as an unlabeled undirected graph is generated as the analysis graph.

ここでは、図９の中からノード値「Ｂ技術」のノードとノード値「Ｃ技術」のノードとの間、ノード値「Ｂ技術」のノードとノード値「Ｄ技術」のノードとの間、ノード値「Ｃ技術」のノードとノード値「Ｄ技術」のノードとの間にエッジを有するように構成された、図１０に示すような分析グラフが生成される。このグラフパターンＢに基づいて算出された分析グラフを、分析グラフＢとする。 Here, between the node of the node value “B technology” and the node of the node value “C technology” from FIG. 9, between the node of the node value “B technology” and the node of the node value “D technology”, An analysis graph as illustrated in FIG. 10 is generated, which is configured to have an edge between the node having the node value “C technology” and the node having the node value “D technology”. An analysis graph calculated based on the graph pattern B is referred to as an analysis graph B.

本実施形態において生成されたラベル無し無向グラフで表現された分析グラフは、指定された同一のラベルのノードで構成されているとともに、グラフパターンにより同じ規則を適用して得られたエッジにより構成されているので、すべてのエッジは同一ラベルのエッジとみなすことができ、この分析グラフはラベル無しグラフとみなすことができる。また分析グラフのエッジの向きに関しては、グラフパターンにより同じ規則を適用して得られたことにより、エッジに接続する２ノード間において意味的な対称性が保たれているため、無向グラフとみなして問題はない。 The analysis graph generated in this embodiment, expressed as an unlabeled undirected graph, consists of nodes that all have the same specified label and of edges that were all obtained by applying the same rule through the graph pattern. All of its edges can therefore be regarded as edges with the same label, and the analysis graph can be regarded as an unlabeled graph. As for edge direction, because the edges were obtained by applying the same rule through the graph pattern, semantic symmetry is preserved between the two nodes that each edge connects, so there is no problem in regarding the graph as undirected.

生成された分析グラフは、分析グラフ記憶部３２に記憶される（Ｓ８）。 The generated analysis graph is stored in the analysis graph storage unit 32 (S8).

生成された分析グラフが記憶された状態で、ユーザによりユーザ端末２０が操作されネットワーク分析が要求されると、この要求がネットワーク分析装置３０に送信される（Ｓ９）。 When the user terminal 20 is operated by the user and a network analysis is requested in a state where the generated analysis graph is stored, this request is transmitted to the network analysis device 30 (S9).

ネットワーク分析装置３０では、ユーザ端末２０から送信されたネットワーク分析の要求がネットワーク分析部３３で取得され、記憶された分析グラフについてネットワーク分析処理が実行される（Ｓ１０）。 In the network analysis device 30, the network analysis request transmitted from the user terminal 20 is acquired by the network analysis unit 33, and the network analysis processing is executed for the stored analysis graph (S10).

本実施形態においてネットワーク分析部３３で実行されるネットワーク分析処理として、（１）中心性分析、および（２）クリーク抽出分析が行われる場合について説明する。 A case where (1) centrality analysis and (2) clique extraction analysis are performed as network analysis processing executed by the network analysis unit 33 in the present embodiment will be described.

（１）中心性分析によるネットワーク分析処理 (1) Network analysis processing by centrality analysis
中心性分析とは、分析対象とするグラフ構造データの中から影響力の強いノードを抽出することで中心的な（中心度の高い）ノードを分析するものであり、既存のアルゴリズムにより次数中心、隣接中心、切断中心など様々な観点により中心性が分析される。 Centrality analysis identifies central (high-centrality) nodes by extracting highly influential nodes from the graph-structured data under analysis. Existing algorithms analyze centrality from various viewpoints, such as degree centrality (次数中心), adjacency centrality (隣接中心), and cut centrality (切断中心).

中心性分析を行うための既存のアルゴリズムの例として、連結するノードの数が多い程影響力が強いノードとみなし、隣接ノードの接続数から中心度を算出する次数中心分析のアルゴリズムや、ノード間の媒介的な役割を果たす度合いが高い程影響力が強いノードとみなし、任意の２つのノード間をつなぐ複数のパスの中に存在する確率から中心度を算出する媒介中心分析のアルゴリズムなどがある。 Examples of existing algorithms for centrality analysis include a degree centrality algorithm, which regards a node as more influential the more nodes it is connected to and calculates centrality from the number of connections to adjacent nodes, and a betweenness centrality algorithm, which regards a node as more influential the more it acts as an intermediary between other nodes and calculates centrality from the probability that the node lies on the paths connecting any two nodes.

このうち次数中心分析の中心性分析に用いる中心度は、下記式（２）により算出される。
Of these, the centrality used in the centrality analysis of the order-centric analysis is calculated by the following equation (2).

中心性分析の一例として、次数中心分析のアルゴリズムに従って図８の分析グラフＡを分析したときの結果を図１１に示す。 As an example of the centrality analysis, FIG. 11 shows the results when the analysis graph A of FIG. 8 is analyzed according to the order center analysis algorithm.

図１１に示すように、分析グラフＡからは、「Ａ技術」の中心度（dci）が「０．３３」であり、「Ｂ技術」の中心度（dci）が「０．６６」であり、「Ｃ技術」の中心度（dci）が「１」であり、「Ｄ技術」の中心度（dci）が「０．６６」であることが算出される。これにより、ラベル「技術用語」に関する分析ノード集合{“Ａ技術”，“Ｂ技術”，“Ｃ技術”，“Ｄ技術”}の中で最も影響力が強い中心的なノードは「Ｃ技術」であり、最も影響力が弱いノードは「Ａ技術」であることが分かる。 As shown in FIG. 11, analysis graph A gives a centrality (dci) of “0.33” for “A technology”, “0.66” for “B technology”, “1” for “C technology”, and “0.66” for “D technology”. This shows that, within the analysis node set {“A technology”, “B technology”, “C technology”, “D technology”} for the label “technical term”, the most influential, central node is “C technology” and the least influential node is “A technology”.
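
Equation (2) is not reproduced in this text. The sketch below assumes the common normalization for degree centrality, namely the number of adjacent nodes divided by (n − 1), where n is the number of nodes in the analysis graph. The function name and the short node names are illustrative; the edges are those of analysis graph A.

```python
# Analysis graph A: nodes and undirected edges as described for FIG. 8.
NODES = ["A", "B", "C", "D"]
EDGES_A = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]


def degree_centrality(nodes, edges):
    """Degree of each node divided by (n - 1) (assumed form of equation (2))."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    denom = len(nodes) - 1
    return {n: (degree[n] / denom if denom > 0 else 0.0) for n in nodes}


dc = degree_centrality(NODES, EDGES_A)
for node in NODES:
    print(node, dc[node])
```

Under this assumption the values are 1/3, 2/3, 1 and 2/3 for A, B, C and D, which agrees with FIG. 11 if 2/3 is truncated to 0.66 there.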

このように、予め同一ラベルに関するラベル無し無向グラフ構造による分析グラフを生成しておくことで、複数のラベルが混在するラベル付き有向グラフ構造データでは不可能であった当該ラベルに関する中心性分析を行うことが可能になる。 In this way, by generating an analysis graph with an unlabeled undirected graph structure related to the same label in advance, centrality analysis regarding the label, which was impossible with the directed graph structure data with a plurality of labels mixed, is performed. It becomes possible.

（２）クリーク分析によるネットワーク分析処理 (2) Network analysis processing by clique analysis
クリーク分析とは、分析対象とするグラフ構造データの中から密接な関連を持つ３個以上のノードで形成される一部分のデータ（クリーク）を抽出することでノードの派閥を分析するものである。 Clique analysis analyzes factions of nodes by extracting, from the graph-structured data under analysis, partial data (cliques) formed by three or more closely related nodes.

このクリーク分析では、ネットワーク密度が低いグラフ構造データの中では処理の精度が低下するという欠点がある。 This clique analysis has a drawback that the processing accuracy is reduced in graph structure data having a low network density.

このネットワーク密度とは、グラフ構造データに含まれるエッジの密度であり、「０」から「１」までの値を持つ。ネットワーク密度が「１」の場合は、ノード間がすべてエッジでつながった構成の完全グラフ構造データを示し、ネットワーク密度が「０」の場合は、ノード間にエッジが１本もなくノードが完全孤立したグラフ構造データを示す。 The network density is the density of edges included in the graph structure data, and has a value from “0” to “1”. When the network density is “1”, it shows complete graph structure data in which all the nodes are connected by edges. When the network density is “0”, there is no edge between the nodes and the nodes are completely isolated. The graph structure data is shown.

そこで、予め設定された閾値以上のネットワーク密度を持つデータ部分をクリークとして抽出したり、全ての２つのノードが予め設定された距離Ｎで接続されたデータ部分をＮ−クリークとして抽出したりすることで、クリーク分析を行うことが可能になる。 Therefore, clique analysis can be performed by, for example, extracting as a clique a data portion whose network density is at or above a preset threshold, or extracting as an N-clique a data portion in which every pair of nodes is connected at a preset distance N.

クリーク分析の一例として、予め設定された閾値以上のネットワーク密度を持つクリークを抽出することにより、分析グラフＡを分析したときの結果を図１２に示す。 As an example of clique analysis, FIG. 12 shows a result of analyzing analysis graph A by extracting a clique having a network density equal to or higher than a preset threshold.

図１２では抽出されたクリークを形成するノードおよびエッジを太線で示しており、分析グラフＡからは、ノード「Ｂ技術」、ノード「Ｃ技術」、ノード「Ｄ技術」、ノード「Ｂ技術」とノード「Ｃ技術」とをつなぐエッジ、ノード「Ｂ技術」とノード「Ｄ技術」とをつなぐエッジ、ノード「Ｃ技術」とノード「Ｄ技術」とをつなぐエッジにより構成されるクリークが抽出されたことを示している。 In FIG. 12, the nodes and edges forming the extracted clique are indicated by bold lines. From the analysis graph A, the node “B technology”, the node “C technology”, the node “D technology”, and the node “B technology” are indicated. Cliques composed of an edge connecting node “C technology”, an edge connecting node “B technology” and node “D technology”, and an edge connecting node “C technology” and node “D technology” were extracted. It is shown that.

この分析結果により、「Ｂ技術」、「Ｃ技術」、および「Ｄ技術」はお互いに密接な関連を持つ派閥を形成しており、「Ａ技術」はラベル「技術用語」の分析ノード集合の他のノードとの関連が低く孤立していることが分かる。 From this analysis result, “B technology”, “C technology”, and “D technology” form a faction closely related to each other, and “A technology” is a set of analysis nodes of the label “technical term”. It can be seen that the relationship with other nodes is low and isolated.
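
Density-based clique extraction on analysis graph A can be sketched as follows. Network density is assumed to be the number of edges present divided by the number of possible node pairs, which gives 1 for a complete graph and 0 for a graph with no edges. The density threshold used for FIG. 12 is not stated, so the sketch assumes 1.0 (a complete subgraph). The brute-force search over subsets is for illustration only and does not scale to large graphs.

```python
from itertools import combinations

NODES = ["A", "B", "C", "D"]
EDGES_A = {frozenset(p) for p in [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]}


def density(nodes, edges):
    """Fraction of node pairs that are joined by an edge (0.0 to 1.0)."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    present = sum(1 for pair in pairs if frozenset(pair) in edges)
    return present / len(pairs)


def dense_subgraphs(nodes, edges, min_density=1.0, min_size=3):
    """Largest-first search for node subsets (size >= min_size) meeting min_density."""
    found = []
    for size in range(len(nodes), min_size - 1, -1):
        for subset in combinations(sorted(nodes), size):
            candidate = set(subset)
            if any(candidate <= f for f in found):
                continue  # already contained in a larger extracted subset
            if density(subset, edges) >= min_density:
                found.append(candidate)
    return found


cliques = dense_subgraphs(NODES, EDGES_A)
isolated = set(NODES) - set().union(*cliques) if cliques else set(NODES)
print(cliques, isolated)
```

With the assumed threshold of 1.0, the only extracted subset is {B, C, D}, and A belongs to no clique, in line with FIG. 12. A lower threshold changes the outcome: the whole graph has density 4/6, so a threshold at or below that value would extract all four nodes as one group.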

このように、予め同一ラベルに関するラベル無し無向グラフ構造による分析グラフを生成しておくことで、複数のラベルが混在するラベル付き有向グラフ構造データでは不可能であった当該ラベルに関するクリーク分析を行うことが可能になる。 In this way, by generating an analysis graph with unlabeled undirected graph structure related to the same label in advance, clique analysis related to the label that was not possible with directed graph structure data with a plurality of labels mixed Is possible.

次に、これらのネットワーク分析を行った結果に基づいて、分析グラフ内の所定の条件を満たす要素、例えば、中心的なノード、クリークに属さず孤立したノード、異なるクリークに属するエッジ、またはクリーク内の複数のノードに共通する要素等に関するメタデータを検索する周辺情報検索処理が、周辺情報検索部３４において実行される（Ｓ１１）。 Next, based on the results of the network analysis, elements satisfying a predetermined condition in the analysis graph, for example, a central node, an isolated node that does not belong to a clique, an edge that belongs to a different clique, or an inside of a clique A peripheral information search process for searching for metadata related to elements common to the plurality of nodes is executed in the peripheral information search unit 34 (S11).

周辺情報検索処理を、（１）中心性分析によるネットワーク分析処理を行った結果に対して行う場合と、（２）クリーク分析によるネットワーク分析処理を行った結果に対して行う場合とについて説明する。 A case where the peripheral information search process is performed on the result of the network analysis process by centrality analysis and the case of (2) the result of the network analysis process by clique analysis will be described.

図１３は、（１）中心性分析によるネットワーク分析処理を行った結果に対して周辺情報検索処理を行う場合の動作を示すフローチャートである。この周辺情報検索処理では、まずネットワーク分析処理結果から中心度の高いノードが分析ノードとして抽出される（Ｓ３１、Ｓ３２）。 FIG. 13 is a flowchart showing an operation when (1) peripheral information search processing is performed on the result of network analysis processing by centrality analysis. In this peripheral information search process, first, a node having a high centrality is extracted as an analysis node from the network analysis process result (S31, S32).

この分析ノードは、例えば図１１に示すように算出された各ノードの中心度から、予め設定された閾値などのパラメータに基づいて抽出される。このパラメータは、ステップＳ９においてユーザの操作により送信されるネットワーク分析要求、または当該システムの実行に際し予め記憶された設定ファイル等において設定される。 This analysis node is extracted based on parameters such as a preset threshold value from the centrality of each node calculated as shown in FIG. 11, for example. This parameter is set in a network analysis request transmitted by the user's operation in step S9 or a setting file stored in advance when the system is executed.

具体的な「中心度の高いノード」の抽出処理としては、1）パラメータとして閾値を設定しておき、この閾値以上の中心度のノードを抽出する方法、2）パラメータとして抽出パーセント（ｎ％）を設定しておくとともに処理対象となるノードを中心度の高い順にソートしておき、ソートされたノードの中から、上位ｎ％のノードを抽出する方法、または、1）および2)の方法を併用する方法などがある。 Specific ways to extract “high-centrality nodes” include: 1) setting a threshold as the parameter and extracting the nodes whose centrality is at or above the threshold; 2) setting an extraction percentage (n%) as the parameter, sorting the target nodes in descending order of centrality, and extracting the top n% of the sorted nodes; or using methods 1) and 2) together.
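
The extraction options above can be sketched as a single function. The function and parameter names are illustrative, and rounding the top n% up to a whole number of nodes is an assumption, since the text does not say how fractional counts are handled. The centrality values are those listed for FIG. 11.

```python
import math

# Degree centrality values as listed for FIG. 11.
FIG11 = {"A technology": 0.33, "B technology": 0.66,
         "C technology": 1.0, "D technology": 0.66}


def select_central_nodes(centrality, threshold=None, top_percent=None):
    """Pick high-centrality nodes by threshold, by top n%, or by both together."""
    # Sort in descending order of centrality (method 2 requires this).
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    if top_percent is not None:
        keep = math.ceil(len(ranked) * top_percent / 100)  # assumed rounding
        ranked = ranked[:keep]
    if threshold is not None:
        ranked = [n for n in ranked if centrality[n] >= threshold]
    return ranked


print(select_central_nodes(FIG11, threshold=0.9))
print(select_central_nodes(FIG11, top_percent=25))
print(select_central_nodes(FIG11, threshold=0.5, top_percent=75))
```

With these values, both a threshold of 0.9 and a top 25% setting select only “C technology”, the node identified as central in this embodiment.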

このようにして抽出された分析ノードに対して周辺情報検索処理が実行される（Ｓ３３）。 A peripheral information search process is executed on the analysis node extracted in this way (S33).

この周辺情報検索処理では、分析グラフを抽出した元のＲＤＦデータから、当該分析ノードに一のエッジを介してつながるノードが検索され周辺情報として取得される。 In this peripheral information search process, a node connected to the analysis node via one edge is searched from the original RDF data from which the analysis graph is extracted, and acquired as peripheral information.

例えば、分析グラフＡの中心性分析によるネットワーク分析結果では「Ｃ技術」が中心的なノードとして抽出されているため、分析グラフＡの抽出元のＲＤＦデータからこの「Ｃ技術」に一のエッジを介してつながるノードを検索すると、図１４に示すように、エッジ「分野」につながるノード「Ｌ分野」と、エッジ「プロジェクトキーワード」につながるノード「プロジェクト２」およびノード「プロジェクト１」と、エッジ「論文」につながるノード「論文Ｇ」、「論文Ｈ」、および「論文Ｉ」が、周辺情報として取得される。 For example, in the network analysis result of the centrality analysis of the analysis graph A, “C technology” is extracted as a central node, so one edge is added to this “C technology” from the RDF data from which the analysis graph A is extracted. As shown in FIG. 14, the node “L field” connected to the edge “field”, the node “project 2” and the node “project 1” connected to the edge “project keyword”, and the edge “ Nodes “Paper G”, “Paper H”, and “Paper I” connected to “Paper” are acquired as peripheral information.
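
The one-edge peripheral search can be sketched over a list of triples. The triples are hypothetical stand-ins for the original RDF data: edge directions are assumed, and the papers are attached with the edge label “paper keyword” used for FIG. 3, whereas the passage above refers to that edge simply as “paper”. The function name is illustrative.

```python
# Hypothetical (subject, edge label, object) triples around "C technology".
TRIPLES = [
    ("C technology", "field", "L field"),
    ("project 1", "project keyword", "C technology"),
    ("project 2", "project keyword", "C technology"),
    ("paper G", "paper keyword", "C technology"),
    ("paper H", "paper keyword", "C technology"),
    ("paper I", "paper keyword", "C technology"),
    ("paper G", "author", "Taro Yamada"),  # two edges away; must not be returned
]


def peripheral_info(triples, node):
    """Group the nodes exactly one edge away from `node` by edge label.

    Edge direction is ignored: the node may be subject or object.
    """
    found = {}
    for s, p, o in triples:
        if s == node:
            found.setdefault(p, set()).add(o)
        elif o == node:
            found.setdefault(p, set()).add(s)
    return found


for label, neighbours in peripheral_info(TRIPLES, "C technology").items():
    print(label, sorted(neighbours))
```

The grouped result can then be attached to the analysis node as its peripheral information. Nodes two or more edges away, such as the author of a paper, are deliberately left out.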

次に図１５に示すように、取得された周辺情報が、中心性分析によるネットワーク分析処理を行った結果に付加され、周辺情報が紐付けされた分析結果が得られる（Ｓ３４、Ｓ３５）。 Next, as shown in FIG. 15, the acquired peripheral information is added to the result of the network analysis process based on the centrality analysis, and an analysis result in which the peripheral information is linked is obtained (S34, S35).

また図１６は、（２）クリーク分析によるネットワーク分析処理を行った結果に対して周辺情報検索処理を行う場合の動作を示すフローチャートである。この周辺情報検索処理では、まず上述したようなネットワーク分析処理結果から、どのクリークにも属さないノードが分析ノードとして抽出される（Ｓ４１、Ｓ４２）。この「どのクリークにも属さないノード」は、上述した「クリーク分析によるネットワーク分析処理」により自明に特定されるノードである。 FIG. 16 is a flowchart showing the operation when the peripheral information search process is performed on the result of the network analysis process by (2) clique analysis. In this peripheral information search process, nodes that do not belong to any clique are first extracted as analysis nodes from the network analysis process result as described above (S41, S42). This “node that does not belong to any clique” is a node that is clearly identified by the above-described “network analysis processing by clique analysis”.

この抽出された分析ノードに対してステップＳ３３で説明した周辺情報検索処理と同様の処理が実行される（Ｓ４３）。 A process similar to the peripheral information search process described in step S33 is executed on the extracted analysis node (S43).

例えば、分析グラフＡのクリーク分析によるネットワーク分析結果では「Ａ技術」がどのクリークにも属さないノードとして抽出されているため、分析グラフＡの抽出元のＲＤＦデータからこの「Ａ技術」に一のエッジを介してつながるノードを検索すると、エッジ「分野」につながるノード「Ｌ分野」と、エッジ「論文」につながるノード「論文Ａ」、「論文Ｂ」、「論文Ｃ」、「論文Ｄ」、「論文Ｅ」、および「論文Ｆ」が、周辺情報として取得される。 For example, in the network analysis result of the clique analysis of analysis graph A, “A technology” is extracted as a node that belongs to no clique. Searching the RDF data from which analysis graph A was extracted for nodes connected to “A technology” through a single edge therefore yields, as peripheral information, the node “L field” connected by the edge “field” and the nodes “Paper A”, “Paper B”, “Paper C”, “Paper D”, “Paper E”, and “Paper F” connected by the edge “paper”.

次に、図１７に示すように、取得された周辺情報が、クリーク分析によるネットワーク分析処理を行った結果に付加され、周辺情報が紐付けされた分析結果が得られる（Ｓ４４、Ｓ４５）。 Next, as shown in FIG. 17, the acquired peripheral information is added to the result of performing the network analysis processing by clique analysis, and the analysis result in which the peripheral information is associated is obtained (S44, S45).

また（２）クリーク分析によるネットワーク分析処理を行った結果に対して周辺情報検索処理を行う場合の他の例においては、このネットワーク分析処理結果から、異なるクリークに属するエッジが抽出され（Ｓ５２）、このエッジの両端につながる２つのノード間の関連情報が、周辺情報として検索される（Ｓ５３）。 In another example in which the peripheral information search process is performed on the result of the network analysis process by clique analysis, edges belonging to different cliques are extracted from the network analysis process result (S52), Related information between two nodes connected to both ends of the edge is searched as peripheral information (S53).

この周辺情報検索処理では、分析グラフを抽出した元のＲＤＦデータから、ステップＳ４で指定された分析グラフパターンを用いたマッチング処理の結果ノードにより、当該エッジの両端につながる２つのノード間の関連情報が抽出される。 In this peripheral information search process, the related information between the two nodes connected to both ends of the edge is obtained from the original RDF data from which the analysis graph is extracted and the result of the matching process using the analysis graph pattern specified in step S4. Is extracted.

例えば、分析グラフＡのクリーク分析によるネットワーク分析結果では、異なるクリークに属するエッジとしてノード「Ａ技術」とノード「Ｃ技術」とにつながるエッジが抽出される。 For example, in the network analysis result of the clique analysis of analysis graph A, the edge connecting the node “A technology” and the node “C technology” is extracted as an edge belonging to different cliques.

そして、この分析グラフＡに用いられた分析グラフパターンＡを用いて、ノード「Ａ技術」および「Ｃ技術」をそれぞれキーノードとしてＲＤＦデータにマッチング処理が行われ、図１９に示すように、この２つのキーノードに対して重複して得られる結果ノード「山田太郎」が関連ノード集合として取得される。 Then, using the analysis graph pattern A that was used for analysis graph A, matching processing is performed on the RDF data with the nodes “A technology” and “C technology” each as the key node, and, as shown in FIG. 19, the result node “Taro Yamada”, which is obtained for both of these key nodes, is acquired as the related node set.

This presents, as a matter relating "A technology" and "C technology", the fact that the same person, "Taro Yamada", used them as keywords in two or more papers.
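The overlap step described above can be sketched as a set intersection. This is only an illustration: the result node sets below are hypothetical (only the shared node "Taro Yamada" comes from the text, and "Jiro Sato" is an invented placeholder), and the function name `related_node_set` is an assumption, not a name used in the publication.

```python
# Hypothetical result node sets, as if obtained by matching analysis graph
# pattern A with each key node. Only the overlap ("Taro Yamada") is taken
# from the text; the other members are placeholders.
result_nodes = {
    "A technology": {"Taro Yamada", "Jiro Sato"},
    "C technology": {"Taro Yamada", "Ichiro Tanaka"},
}


def related_node_set(result_nodes, key_a, key_b):
    """Return the result nodes that were obtained for both key nodes."""
    return result_nodes[key_a] & result_nodes[key_b]


related = related_node_set(result_nodes, "A technology", "C technology")
print(related)  # {'Taro Yamada'}
```

The returned set plays the role of the related node set that is attached to the edge between the two key nodes as peripheral information.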

次に、図２０に示すように、取得された関連ノード集合が、クリーク分析によるネットワーク分析処理を行った結果に周辺情報として付加され、周辺情報が紐付けされた分析結果が得られる（Ｓ５４、Ｓ５５）。 Next, as shown in FIG. 20, the acquired related node set is added as the peripheral information to the result of the network analysis processing by clique analysis, and the analysis result in which the peripheral information is linked is obtained (S54, S55).

In yet another example of (2), in which the peripheral information search process is performed on the result of the network analysis process by clique analysis, the set of analysis nodes belonging to each clique is extracted from the network analysis result (S62), and a common information search process is performed for the nodes included in these analysis node sets (S63).

例えば、図１２に示すように抽出されたクリークについては、属する分析ノード集合として{“Ｂ技術”，“Ｃ技術”，“Ｄ技術”}が抽出される。そして、これらのノードに関する共通情報検索処理として、まず、クリークの抽出に使用されたグラフパターンＡが用いられ、分析ノード集合のそれぞれのノードをキーノードに代入してＲＤＦデータにマッチング処理が行われることで、結果ノード集合が生成される。 For example, for the clique extracted as shown in FIG. 12, {"B technology", "C technology", "D technology"} is extracted as the set of analysis nodes to which it belongs. Then, as the common information search processing regarding these nodes, first, the graph pattern A used for clique extraction is used, and each node of the analysis node set is substituted for the key node, and matching processing is performed on the RDF data. Thus, a result node set is generated.

例えば、「Ｂ技術」をキーノードに代入することで結果ノード集合{“山田太郎”，“田中一郎”，“鈴木花子”}が生成され、「Ｃ技術」をキーノードに代入することで結果ノード集合{“山田太郎”，“田中一郎”}が生成され、「Ｄ技術」をキーノードに代入することで結果ノード集合{“田中一郎”，“鈴木花子”}が生成される。 For example, a result node set {“Taro Yamada”, “Ichiro Tanaka”, “Hanako Suzuki”} is generated by substituting “B technology” into a key node, and a result node set by substituting “C technology” into a key node. {“Taro Yamada”, “Ichiro Tanaka”} are generated, and a result node set {“Ichiro Tanaka”, “Hanako Suzuki”} is generated by substituting “D technology” into a key node.
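As a minimal sketch of the key node substitution above, the following assumes that graph pattern A has the shape "key node, reached by a keyword edge from a paper, which has an author edge to the result node". The triples, paper identifiers, and edge labels (`has_keyword`, `written_by`) are hypothetical and were chosen only so that the result node sets agree with the example in the text.

```python
# Hypothetical toy RDF data as (subject, edge label, object) triples.
TRIPLES = [
    ("paper1", "has_keyword", "B technology"),
    ("paper1", "has_keyword", "C technology"),
    ("paper1", "written_by", "Taro Yamada"),
    ("paper2", "has_keyword", "B technology"),
    ("paper2", "has_keyword", "C technology"),
    ("paper2", "has_keyword", "D technology"),
    ("paper2", "written_by", "Ichiro Tanaka"),
    ("paper3", "has_keyword", "B technology"),
    ("paper3", "has_keyword", "D technology"),
    ("paper3", "written_by", "Hanako Suzuki"),
]


def match_result_nodes(triples, key_value):
    """Match the assumed pattern: key <-has_keyword- ?paper -written_by-> ?result."""
    papers = {s for s, p, o in triples if p == "has_keyword" and o == key_value}
    return {o for s, p, o in triples if p == "written_by" and s in papers}


analysis_nodes = ["B technology", "C technology", "D technology"]
result_sets = {key: match_result_nodes(TRIPLES, key) for key in analysis_nodes}
for key, nodes in result_sets.items():
    print(key, sorted(nodes))
```

A real implementation would more likely issue a query against an RDF store; the linear scans here are only meant to make the substitution step concrete.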

Next, for each node in the analysis node set, a score is calculated by dividing the number of node values in its result node set by the number of nodes in the analysis node set (here three: "B technology", "C technology", and "D technology").

例えば、「Ｂ技術」については、結果ノード集合に含まれるノード数「３」÷分析ノード集合に含まれるノード数「３」によりスコア「１」が算出され、「Ｃ技術」については、結果ノード集合に含まれるノード数「２」÷分析ノード集合に含まれるノード数「３」によりスコア「０．６７」が算出され、「Ｄ技術」については、結果ノード集合に含まれるノード数「２」÷分析ノード集合に含まれるノード数「３」によりスコア「０．６７」が算出される。 For example, for “B technology”, the score “1” is calculated by the number of nodes included in the result node set “3” ÷ the number of nodes included in the analysis node set “3”, and for “C technology”, the result node The score “0.67” is calculated by the number of nodes included in the set “2” ÷ the number of nodes included in the analysis node set “3”. For “D technology”, the number of nodes included in the result node set “2” ÷ Score “0.67” is calculated from the number of nodes “3” included in the analysis node set.
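The score calculation can be reproduced in a few lines. The result node sets are those given in the example; the function name `commonality_scores` is an assumption introduced for illustration.

```python
# Result node sets from the example in the text.
result_sets = {
    "B technology": {"Taro Yamada", "Ichiro Tanaka", "Hanako Suzuki"},
    "C technology": {"Taro Yamada", "Ichiro Tanaka"},
    "D technology": {"Ichiro Tanaka", "Hanako Suzuki"},
}


def commonality_scores(result_sets):
    """Score = size of a node's result set / number of analysis nodes."""
    n_analysis_nodes = len(result_sets)
    return {key: len(nodes) / n_analysis_nodes for key, nodes in result_sets.items()}


scores = commonality_scores(result_sets)
for key, score in scores.items():
    print(f"{key}: {score:.2f}")
# B technology: 1.00
# C technology: 0.67
# D technology: 0.67
```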

Then, as necessary, nodes with low commonality are filtered out by deleting nodes whose score is at or below a preset threshold, or nodes with high commonality are narrowed down by extracting only the nodes with a predetermined number of top score values.

Here, assuming the score threshold is "0.5", "B technology", "C technology", and "D technology" all exceed this threshold, so all three are extracted as nodes with high commonality.
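Both narrowing strategies can be sketched as follows, using the scores from the example. The function names are assumptions, and how ties are handled in the top-N variant is not specified in the text; this sketch simply keeps the first N nodes after sorting by score.

```python
# Scores from the example: 3/3, 2/3 and 2/3.
scores = {"B technology": 1.0, "C technology": 2 / 3, "D technology": 2 / 3}


def filter_by_threshold(scores, threshold):
    """Drop nodes whose score is at or below the threshold."""
    return {key: s for key, s in scores.items() if s > threshold}


def top_n(scores, n):
    """Keep the n highest-scoring nodes (tie handling is an assumption)."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


kept = filter_by_threshold(scores, 0.5)
print(sorted(kept))      # ['B technology', 'C technology', 'D technology']
print(top_n(scores, 1))  # ['B technology']
```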

Next, as shown in FIG. 22, the acquired nodes with high commonality are added as peripheral information to the result of the network analysis process by clique analysis, and an analysis result linked with the peripheral information is obtained (S64, S65).

これらのうちいずれかの処理により周辺情報検索処理が行われ周辺情報が紐付けされた分析結果が得られると、ユーザ端末２０に送信されて表示される（Ｓ１２）。 When the peripheral information search process is performed by any one of these processes and an analysis result associated with the peripheral information is obtained, the analysis result is transmitted to the user terminal 20 and displayed (S12).

ユーザ端末２０に表示された分析結果の一例を、図２３に示す。図２３では、クリークａ〜ｄに関し、抽出された共通要素が周辺情報として紐付けされ表示されている。 An example of the analysis result displayed on the user terminal 20 is shown in FIG. In FIG. 23, regarding the cliques a to d, the extracted common elements are linked and displayed as the peripheral information.

According to the present embodiment described above, an analysis graph that is an unlabeled undirected graph, analyzable with existing network analysis techniques, can be generated from a network represented by a labeled directed graph, so highly accurate analysis can be performed with simple operations.

Further, in the present embodiment, network analysis is performed on an analysis graph that is an unlabeled undirected graph narrowed down to a single relationship by using the analysis graph pattern and the analysis target label designated by the user. By switching to a different analysis graph pattern, the user can generate an analysis graph whose edges have a different meaning (relationship between nodes), and by changing the analysis target label, the user can generate an analysis graph composed of nodes with a different label. Analysis results can thus be obtained from various viewpoints.

また、分析グラフの特徴を示す情報に関連する周辺情報を検索して付加することにより、さらに精度の高い分析結果を提供することができる。 Further, by searching for and adding peripheral information related to information indicating the characteristics of the analysis graph, it is possible to provide a more accurate analysis result.

<< Second Embodiment >>
<Configuration of Network Analysis System Using Network Analysis Device According to Second Embodiment>
As shown in FIG. 24, the configuration of the network analysis system 2 according to the present embodiment is the same as that of the network analysis system 1 according to the first embodiment, except that it does not have the graph pattern storage device 40 and that the network analysis device 30 has a frequent graph pattern extraction unit 35 and a frequent graph pattern storage unit 36. Description of components having the same functions is therefore omitted.

頻出グラフパターン抽出部３５は、ＲＤＦデータ記憶装置１０に記憶されたＲＤＦデータから、所定数以上のグラフ構造部分（サブグラフ）がマッチングされる特定の構造を有するグラフパターンを、頻出グラフパターンとして抽出する。 The frequent graph pattern extraction unit 35 extracts, from the RDF data stored in the RDF data storage device 10, a graph pattern having a specific structure that matches a predetermined number or more of graph structure parts (subgraphs) as a frequent graph pattern. .

頻出グラフパターン記憶部３６は例えば一時的に記憶可能なメモリで構成され、頻出グラフパターン抽出部３５で抽出された頻出グラフパターンを記憶する。 The frequent graph pattern storage unit 36 is constituted by a memory that can be temporarily stored, for example, and stores the frequent graph pattern extracted by the frequent graph pattern extraction unit 35.

分析グラフ生成部３１は、頻出グラフパターン記憶部３６に記憶された中からユーザによりユーザ端末２０から指定されたグラフパターンを分析グラフパターンとし、この分析グラフパターンにマッチするサブグラフをＲＤＦデータ記憶装置１０に記憶されたＲＤＦデータから抽出する。 The analysis graph generation unit 31 sets a graph pattern designated by the user from the user terminal 20 from among those stored in the frequent graph pattern storage unit 36 as an analysis graph pattern, and subgraphs matching the analysis graph pattern are stored in the RDF data storage device 10. From the RDF data stored in.

<Operation of Network Analysis System Using Network Analysis Device According to Second Embodiment>
Next, the operation of the network analysis system 1 according to the present embodiment will be described with reference to the sequence diagram of FIG. 25.

As in the first embodiment, with a plurality of RDF data sets to be analyzed stored in the RDF data storage device 10, when the user operates the user terminal 20 to designate the RDF data to be subjected to network analysis, information designating that RDF data is transmitted to the network analysis device 30 (S71).

ユーザ端末２０から送信されたＲＤＦデータを指定する情報は、ネットワーク分析装置３０の頻出グラフパターン抽出部３５で取得される。頻出グラフパターン抽出部３５では、ＲＤＦデータ記憶装置１０に記憶されたＲＤＦデータから、所定数以上のグラフ構造部分（サブグラフ）がマッチングされる特定の構造を有するグラフパターンが、頻出グラフパターンとして抽出される（Ｓ７２）。このグラフパターンは、ユーザ端末２０などの外部から指定された値をノードの値とする１つのキーノードと、任意の値を取り得るノードと含むグラフ構造で構成される。 Information specifying the RDF data transmitted from the user terminal 20 is acquired by the frequent graph pattern extraction unit 35 of the network analysis device 30. The frequent graph pattern extraction unit 35 extracts, as a frequent graph pattern, a graph pattern having a specific structure that matches a predetermined number or more of graph structure parts (subgraphs) from the RDF data stored in the RDF data storage device 10. (S72). This graph pattern has a graph structure including one key node having a value designated from the outside such as the user terminal 20 as a node value and a node capable of taking an arbitrary value.

ここで、サブグラフの数が「所定数」以上であるか否かを判断するための閾値は、当該システムの実行に際し予め記憶された設定ファイル等において設定しておいてもよいし、ステップＳ７１においてＲＤＦデータ指定が行われる際にユーザの操作により設定するようにしてもよい。 Here, the threshold for determining whether or not the number of subgraphs is greater than or equal to the “predetermined number” may be set in a setting file or the like stored in advance when the system is executed, or in step S71. It may be set by a user operation when RDF data designation is performed.
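The frequent graph pattern extraction of step S72 can be sketched as counting, for each candidate pattern, the subgraphs that match it and keeping the patterns whose count reaches the configured threshold. Everything below is a simplifying assumption: the triples, the edge labels, the candidate patterns, and the restriction to chain-shaped patterns in which any node may play the key node role. The publication does not specify how candidate patterns are enumerated or how node labels are checked.

```python
from collections import defaultdict

# Hypothetical toy RDF data as (subject, edge label, object) triples.
TRIPLES = [
    ("paper1", "has_keyword", "A technology"),
    ("paper1", "written_by", "Taro Yamada"),
    ("paper2", "has_keyword", "C technology"),
    ("paper2", "written_by", "Taro Yamada"),
    ("paper3", "has_keyword", "B technology"),
    ("paper3", "cites", "paper1"),
]

# Hypothetical chain patterns starting at the key node. Each step is
# (edge label, direction); "in" follows an edge against its direction.
PATTERNS = {
    "keyword_author": [("has_keyword", "in"), ("written_by", "out")],
    "keyword_citation": [("has_keyword", "in"), ("cites", "out")],
}


def count_matches(triples, steps):
    """Count the paths in the data that match a chain-shaped pattern."""
    out_index, in_index = defaultdict(list), defaultdict(list)
    nodes = set()
    for s, p, o in triples:
        out_index[(s, p)].append(o)
        in_index[(o, p)].append(s)
        nodes.update((s, o))
    paths = [(n,) for n in nodes]  # any node may play the key node role
    for label, direction in steps:
        index = out_index if direction == "out" else in_index
        paths = [path + (nxt,) for path in paths
                 for nxt in index.get((path[-1], label), [])]
    return len(paths)


def frequent_patterns(triples, patterns, min_matches):
    """Keep the patterns matched by at least min_matches subgraphs."""
    return sorted(name for name, steps in patterns.items()
                  if count_matches(triples, steps) >= min_matches)


frequent = frequent_patterns(TRIPLES, PATTERNS, min_matches=2)
print(frequent)  # ['keyword_author']
```

Here `min_matches` stands in for the "predetermined number", which, as described above, may come from a settings file or from the user at step S71.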

抽出された頻出グラフパターンは、頻出グラフパターン記憶部３６に記憶される（Ｓ７３）。 The extracted frequent graph pattern is stored in the frequent graph pattern storage unit 36 (S73).

この頻出グラフパターンを抽出しておくことで、ネットワーク密度が高いサブグラフを分析対象として得ることができ、後述するネットワーク分析処理において有用で精度の高い分析結果を得易い傾向がある。 By extracting this frequent graph pattern, a subgraph having a high network density can be obtained as an analysis target, and there is a tendency that it is easy to obtain an analysis result that is useful and highly accurate in network analysis processing described later.

次に、ユーザの操作によりユーザ端末２０から所望の分析対象のノードのラベルである分析対象ラベルが指定される（Ｓ７４）。ここでは、分析対象ラベルとして「技術用語」が指定されたものとする。 Next, an analysis target label that is a label of a desired analysis target node is designated from the user terminal 20 by a user operation (S74). Here, it is assumed that “technical term” is designated as the analysis target label.

Information on the analysis target label designated by the user at the user terminal 20 is acquired by the analysis graph generation unit 31 of the network analysis device 30, which requests from the frequent graph pattern extraction unit 35 the frequent graph patterns having a key node corresponding to the designated analysis target label (S75).

そして、分析グラフ生成部３１からの要求により、ユーザにより指定された分析対象ラベルと同一ラベルのキーノードを有する頻出グラフパターンが頻出グラフパターン記憶部３６から抽出され、分析グラフ生成部３１に送出されるとともに、ユーザ端末２０に表示されユーザに提供される。 Then, in response to a request from the analysis graph generation unit 31, a frequent graph pattern having a key node with the same label as the analysis target label specified by the user is extracted from the frequent graph pattern storage unit 36 and sent to the analysis graph generation unit 31. At the same time, it is displayed on the user terminal 20 and provided to the user.
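The selection of frequent graph patterns by key node label can be illustrated with a simple filter. The `GraphPattern` structure, its field names, and the stored patterns are hypothetical; only the analysis target label "technical term" comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphPattern:
    """Hypothetical record for a stored frequent graph pattern."""
    name: str
    key_label: str     # label of the key node
    result_label: str  # label of the node that may take any value


stored_patterns = [
    GraphPattern("keyword_author", "technical term", "person"),
    GraphPattern("author_affiliation", "person", "organization"),
]


def patterns_for_label(patterns, target_label):
    """Return the patterns whose key node has the analysis target label."""
    return [p for p in patterns if p.key_label == target_label]


candidates = patterns_for_label(stored_patterns, "technical term")
print([p.name for p in candidates])  # ['keyword_author']
```

The resulting candidates correspond to the patterns that are shown on the user terminal 20, from which the user then picks the analysis graph pattern.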

以降のステップＳ７６〜Ｓ８４の処理は、第１実施形態におけるステップＳ４〜Ｓ１２の処理と同様であるため、詳細な説明は省略する。 Since the subsequent processes in steps S76 to S84 are the same as the processes in steps S4 to S12 in the first embodiment, detailed description thereof is omitted.

以上の本実施形態によれば、予め生成された複数の検索用グラフパターンのうち、該当するＲＤＦデータにおいてマッチするサブグラフの数が所定数以上のものが頻出グラフパターンとして抽出され処理に利用されるため、ネットワーク密度が高いサブグラフを分析対象として得ることができ、ネットワーク分析処理において有用で精度の高い分析結果を得ることができる。 According to the present embodiment described above, among the plurality of search graph patterns generated in advance, those having a predetermined number or more of matching subgraphs in the corresponding RDF data are extracted as frequent graph patterns and used for processing. Therefore, a subgraph having a high network density can be obtained as an analysis target, and a highly accurate analysis result useful in network analysis processing can be obtained.

上記の第１実施形態および第２実施形態においては、ＲＤＦデータの指定情報、分析対象ラベル指定情報、分析ノード集合指定情報、係数閾値指定情報等を必要に応じてユーザ端末２０から入力する場合について説明したが、予めこれらの情報を事前に入力しておくことで、処理を実行するようにしてもよい。 In the first and second embodiments described above, the RDF data designation information, analysis target label designation information, analysis node set designation information, coefficient threshold designation information, and the like are input from the user terminal 20 as necessary. As described above, the processing may be executed by inputting these pieces of information in advance.

また、上記の各実施形態におけるネットワーク分析装置の機能構成をプログラム化してコンピュータに組み込むことにより、当該コンピュータをネットワーク分析装置として機能させるネットワーク分析用プログラムを構築することも可能である。 Moreover, it is also possible to construct a network analysis program that causes the computer to function as a network analysis device by programming the functional configuration of the network analysis device in each of the above embodiments into a computer.

DESCRIPTION OF SYMBOLS
1, 2 ... Network analysis system
10 ... RDF data storage device
20 ... User terminal
30 ... Network analysis device
31 ... Analysis graph generation unit
32 ... Analysis graph storage unit
33 ... Network analysis unit
34 ... Peripheral information search unit
35 ... Frequent graph pattern extraction unit
36 ... Frequent graph pattern storage unit
40 ... Graph pattern storage device

Claims

Stores graph structure data with a labeled directed graph structure consisting of nodes consisting of a value indicating the data element and a label indicating the type of this value, a label indicating the relationship between the nodes, and an edge consisting of the direction of this relationship Graph structure data storage device, one key node having a value designated from the outside as a node value, and a graph pattern storage device storing a graph pattern including a graph structure including a node that can take an arbitrary value; In the network analyzer connected to
A graph pattern acquisition unit for acquiring an analysis graph pattern having a key node having the same label as the specified label from the graph pattern storage device;
By substituting the values of a plurality of designated nodes into the key nodes of the analysis graph pattern acquired by the graph pattern acquisition unit, the result nodes are selected from the graph structure data stored in the graph structure data storage device. A result node identification part to be identified;
A coefficient indicating the degree of overlap of the result nodes specified by the result node specifying unit is calculated for any two nodes having the values of the specified plurality of nodes, and the calculated coefficients are set in advance. An analysis graph generation unit that generates an analysis graph of an unlabeled undirected graph composed of nodes having values of the plurality of nodes so as to have an edge between nodes satisfying the condition;
A network analysis unit that performs a network analysis using a predetermined analysis technique on the analysis graph generated by the analysis graph generation unit, and generates an analysis result;
A network analysis apparatus comprising:

Stores graph structure data with a labeled directed graph structure consisting of nodes consisting of a value indicating the data element and a label indicating the type of this value, a label indicating the relationship between the nodes, and an edge consisting of the direction of this relationship Network analyzer connected to the graph structure data storage device,
A frequent graph pattern extraction unit that extracts, from the graph structure data stored in the graph structure data storage device, graph patterns that are each composed of a graph structure including one key node having a value designated from the outside as a node value and a node capable of taking an arbitrary value, and that are matched by a predetermined number or more of graph structure portions;
A graph pattern acquisition unit that acquires, from among the graph patterns extracted by the frequent graph pattern extraction unit, a graph pattern having a key node with the same label as the specified label as an analysis graph pattern;
By substituting the values of a plurality of designated nodes into the key nodes of the analysis graph pattern acquired by the graph pattern acquisition unit, the result nodes are selected from the graph structure data stored in the graph structure data storage device. A result node identification part to be identified;
A coefficient indicating the degree of overlap of the result nodes specified by the result node specifying unit is calculated for any two nodes having the values of the specified plurality of nodes, and the calculated coefficients are set in advance. An analysis graph generation unit that generates an analysis graph of an unlabeled undirected graph composed of nodes having values of the plurality of nodes so as to have an edge between nodes satisfying the condition;
A network analysis unit that performs a network analysis using a predetermined analysis technique on the analysis graph generated by the analysis graph generation unit, and generates an analysis result;
A network analysis apparatus comprising:

The network analysis device as described, further comprising:
A peripheral information acquisition unit that acquires, as peripheral information, metadata related to a node or an edge that satisfies a predetermined condition in the analysis graph on which network analysis has been performed by the network analysis unit; and
A provision information output unit that adds the peripheral information acquired by the peripheral information acquisition unit to the analysis result generated by the network analysis unit and outputs the result as provision information.

Stores graph structure data with a labeled directed graph structure consisting of nodes consisting of a value indicating the data element and a label indicating the type of this value, a label indicating the relationship between the nodes, and an edge consisting of the direction of this relationship Graph structure data storage device, one key node having a value designated from the outside as a node value, and a graph pattern storage device storing a graph pattern including a graph structure including a node that can take an arbitrary value; The computer connected to
A graph pattern acquisition step of acquiring an analysis graph pattern having a key node having the same label as the specified label from the graph pattern storage device;
By substituting the values of a plurality of specified nodes into the key nodes of the analysis graph pattern acquired in the graph pattern acquisition step, result nodes are selected from the graph structure data stored in the graph structure data storage device. A result node identification step to identify;
A coefficient indicating the degree of overlap of the result nodes specified by the result node specifying unit is calculated for any two nodes having the values of the specified plurality of nodes, and the calculated coefficients are set in advance. An analysis graph generation step of generating an analysis graph of an unlabeled undirected graph composed of nodes having values of the plurality of nodes so as to have an edge between nodes satisfying the condition;
A network analysis step for performing a network analysis on the analysis graph generated in the analysis graph generation step using a predetermined analysis technique and generating an analysis result;
A network analysis method characterized by comprising:

Stores graph structure data with a labeled directed graph structure consisting of nodes consisting of a value indicating the data element and a label indicating the type of this value, a label indicating the relationship between the nodes, and an edge consisting of the direction of this relationship A computer connected to the graph data storage device
A frequent graph pattern extraction step of extracting, from the graph structure data stored in the graph structure data storage device, graph patterns that are each composed of a graph structure including one key node having a value designated from the outside as a node value and a node capable of taking an arbitrary value, and that are matched by a predetermined number or more of graph structure portions;
A graph pattern acquisition step of acquiring, from among the graph patterns extracted in the frequent graph pattern extraction step, a graph pattern having a key node with the same label as the specified label as an analysis graph pattern;
By substituting the values of a plurality of designated nodes into the key nodes of the analysis graph pattern acquired by the graph pattern acquisition unit, the result nodes are selected from the graph structure data stored in the graph structure data storage device. A result node identification step to identify;
A coefficient indicating the degree of overlap of the result nodes specified by the result node specifying unit is calculated for any two nodes having the values of the specified plurality of nodes, and the calculated coefficients are set in advance. An analysis graph generation step of generating an analysis graph of an unlabeled undirected graph composed of nodes having values of the plurality of nodes so as to have an edge between nodes satisfying the condition;
A network analysis step for performing a network analysis on the analysis graph generated in the analysis graph generation step using a predetermined analysis technique and generating an analysis result;
A network analysis method characterized by comprising:

The network analysis method as described, further comprising:
A peripheral information acquisition step of acquiring, as peripheral information, metadata related to a node or an edge that satisfies a predetermined condition in the analysis graph on which network analysis has been performed in the network analysis step; and
A provision information output step of adding the peripheral information acquired in the peripheral information acquisition step to the analysis result generated in the network analysis step and outputting the result as provision information.

A network analysis program for causing a computer to function as the network analysis device according to any one of claims 1 to 3.