JP7157328B2

JP7157328B2 - Graph simplification method, graph simplification program, and information processing device

Info

Publication number: JP7157328B2
Application number: JP2018211325A
Authority: JP
Inventors: 弘伸北島; 康男山根
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-11-09
Filing date: 2018-11-09
Publication date: 2022-10-20
Anticipated expiration: 2038-11-09
Also published as: JP2020077299A

Description

The present invention relates to a graph simplification method, a graph simplification program, and an information processing apparatus.

Large-scale graphs containing many nodes and the edges connecting them are sometimes generated and used for various analyses. For example, entities such as people, companies, and devices may be represented by nodes and the relationships between those entities by edges, and entities or relationships that satisfy specific conditions may then be searched for in the graph.

For example, a personal network analysis and search system has been proposed that extracts, from the communications exchanged between users in the same company, those that are strongly related to work, and quantitatively evaluates the personal connections between users. The proposed system extracts a set of users who communicate heavily with one another as a communication core.

A graph processing method has also been proposed that parallelizes shortest path search in a large-scale graph with scale-free characteristics. In addition, an information processing system has been proposed that searches for the shortest path in a graph by combining hardware processing using a dedicated logic circuit with software processing using a general-purpose processor. The proposed information processing system extracts the node with the largest number of connected edges from the graph as a hub node, and preferentially assigns the group of nodes formed around the hub node to hardware processing.

Japanese Patent Application Laid-Open No. 2007-280000
International Publication No. WO 2013/145001
International Publication No. WO 2015/004788

A graph used for search processing such as route search may consist of hub nodes, which have many edges, and peripheral nodes, which have few edges. In particular, certain kinds of graphs, such as graphs representing social relationships among people and companies, may have a core in which hub nodes are densely connected to one another and a periphery in which peripheral nodes are scattered.

If search processing is performed directly on a graph that has a core, the amount of computation becomes large because the core contains many edges. On the other hand, since many hub nodes inside the core are expected to be connected to one another, the interior of the core is of relatively low importance for search processing, and exact search processing there may be unnecessary. It is therefore conceivable to simplify the graph in order to make search processing more efficient. For example, a route between two peripheral nodes belonging to the periphery may be searched for using a graph in which the core has been simplified.

However, extreme simplification, such as ignoring the core or replacing the core with a single node, discards much of the information about the connections between the hub nodes inside the core and the peripheral nodes, and therefore lowers the accuracy of search processing.

In one aspect, an object of the present invention is to provide a graph simplification method, a graph simplification program, and an information processing apparatus that reduce the effect of simplification on the accuracy of search processing using a graph.

In one aspect, a graph simplification method executed by a computer is provided. Graph data including a plurality of nodes and a plurality of edges connecting the nodes is acquired. From among the plurality of nodes indicated by the graph data, a plurality of hub nodes, each having a number of connected edges greater than a threshold, are detected, and a core node set including two or more of the detected hub nodes that are linked by edges is determined. A central node is set for the core node set on the basis of the distances between the hub nodes belonging to the core node set, and other graph data is generated that includes, in place of the edges connecting the hub nodes of the core node set other than the central node to one another, edges connecting each of those hub nodes to the central node.

In another aspect, a graph simplification program to be executed by a computer is provided. In yet another aspect, an information processing apparatus having a storage unit and a processing unit is provided.

In one aspect, the effect on the accuracy of search processing using a graph is reduced.

FIG. 1 is a diagram illustrating an example of an information processing apparatus according to the first embodiment.
FIG. 2 is a block diagram showing an example of hardware of an analysis device according to the second embodiment.
FIG. 3 is a diagram showing an example of relationship analysis of a corporate network.
FIG. 4 is a diagram showing an example of the shape of a corporate network.
FIG. 5 is a distribution diagram showing an example of the characteristics of a corporate network.
FIG. 6 is a diagram showing an example of route search.
FIG. 7 is a diagram showing an example of simplification of a corporate network.
FIG. 8 is a block diagram showing an example of functions of the analysis device.
FIG. 9 is a diagram showing an example of graph data.
FIG. 10 is a diagram showing an example of connection information.
FIG. 11 is a flowchart showing an example of a graph simplification procedure.
FIG. 12 is a flowchart showing another example of a graph simplification procedure.

Hereinafter, the embodiments will be described with reference to the drawings.
[First embodiment]
A first embodiment will be described.

FIG. 1 is a diagram illustrating an example of an information processing apparatus according to the first embodiment.
The information processing apparatus 10 of the first embodiment analyzes a graph that includes a plurality of nodes and edges between the nodes. The information processing apparatus 10 may also be referred to as a computer, and may be either a client device or a server device. The graph analyzed by the information processing apparatus 10 is a large-scale graph in which entities such as people, companies, and devices are represented by nodes and the relationships between entities are represented by edges. Examples of such graphs include a corporate network representing business relationships among people and companies, a communication network representing connections between communication devices, and a Web network representing links between websites. The information processing apparatus 10 may perform search processing that searches the graph for nodes or edges satisfying a predetermined condition. For example, the search processing may be a route search, such as a shortest path search, that looks for a route from one node to another node via one or more edges.

The information processing apparatus 10 has a storage unit 11 and a processing unit 12. The storage unit 11 may be a volatile semiconductor memory such as a RAM (Random Access Memory), or non-volatile storage such as an HDD (Hard Disk Drive) or flash memory. The processing unit 12 is, for example, a processor such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a DSP (Digital Signal Processor). However, the processing unit 12 may include an application-specific electronic circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor."

The storage unit 11 stores graph data 13. The graph data 13 includes a plurality of nodes and a plurality of edges connecting the nodes. The processing unit 12 converts the graph data 13 into graph data 14. The graph data 14 is a simplified version of the graph data 13, produced in order to reduce the computational cost of search processing, and, like the graph data 13, includes a plurality of nodes and a plurality of edges connecting the nodes. The graph data 14 is, for example, stored in the storage unit 11 and used by the processing unit 12 for search processing such as route search. For example, each node in the graph data 13 and 14 is identified by a node ID, and each edge is identified by the pair of nodes at its two ends.
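One possible in-memory layout for such graph data is sketched below. The class name `GraphData`, its fields, and the default weight of 1.0 are illustrative assumptions and do not appear in the specification; the sketch only shows nodes keyed by node ID and edges keyed by their unordered endpoint pair.

```python
from dataclasses import dataclass, field


@dataclass
class GraphData:
    """Undirected graph: nodes keyed by node ID, edges keyed by endpoint pair."""
    nodes: set = field(default_factory=set)
    # Each edge is identified by the unordered pair of its endpoint IDs
    # and maps to a weight (1.0 when no weight is given; an assumption).
    edges: dict = field(default_factory=dict)

    def add_edge(self, u, v, weight=1.0):
        self.nodes.update((u, v))
        self.edges[frozenset((u, v))] = weight

    def degree(self, v):
        # Number of edges that have v as one endpoint.
        return sum(1 for e in self.edges if v in e)


g = GraphData()
g.add_edge("A", "B")
g.add_edge("B", "A")  # same endpoint pair, so this is still one edge
g.add_edge("A", "C")
```

Keying edges by a `frozenset` makes the pair order irrelevant, which fits an undirected graph.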

In converting the graph data 13 into the graph data 14, the processing unit 12 detects, from among the nodes indicated by the graph data 13, a plurality of hub nodes whose number of connected edges is greater than a threshold. A hub node represents a large entity that has relationships with many other entities. The processing unit 12 determines a core node set 15 that includes two or more of the detected hub nodes linked by edges. The core node set 15 consists of hub nodes that are densely connected to one another and represents a set of large entities. For example, the processing unit 12 extracts the hub nodes and the edges connecting hub nodes from the graph data 13 and, among the connected subgraphs (connected components) formed by the hub nodes, determines the one with the largest number of hub nodes to be the core node set 15.

For example, suppose the graph data 13 includes nodes A, B, C, D, E, F, G, and H. Node A has 4 edges (degree 4), node B has 4 edges, node C has 6 edges, node D has 4 edges, node E has 5 edges, and nodes F, G, and H have 1 edge each. The processing unit 12 detects, for example, nodes A, B, C, D, and E as hub nodes, and determines the connected subgraph formed by nodes A, B, C, D, and E to be the core node set 15.
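The detection of hub nodes and the core node set can be sketched as follows, using the eight-node example. The edge list is reconstructed from the stated degrees (nodes A to E fully connected, plus C-F, C-G, and E-H); the function names and the threshold value of 3 are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Edges consistent with the degrees in the example: A-E fully connected
# (10 edges), plus C-F, C-G and E-H.
EDGES = list(combinations("ABCDE", 2)) + [("C", "F"), ("C", "G"), ("E", "H")]


def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def find_core(edges, threshold):
    """Return (hubs, core): hubs have degree > threshold; core is the largest
    connected component of the subgraph induced by the hubs."""
    adj = build_adjacency(edges)
    hubs = {v for v, nbrs in adj.items() if len(nbrs) > threshold}
    best, seen = set(), set()
    for start in hubs:
        if start in seen:
            continue
        # Depth-first traversal restricted to hub nodes.
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u] & hubs:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    # A core node set needs at least two connected hubs.
    return hubs, (best if len(best) >= 2 else set())


hubs, core = find_core(EDGES, threshold=3)  # threshold 3 is an assumed value
```

With this threshold, nodes F, G, and H (degree 1) are excluded and the five remaining nodes form a single connected component.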

The processing unit 12 sets a central node 16 for the core node set 15 on the basis of the distances between the hub nodes belonging to the core node set 15. The central node 16 may be one of the hub nodes included in the core node set 15 in the graph data 13, or may be a new node that is not included in the core node set 15 in the graph data 13. The processing unit 12 deletes the edges that connect the hub nodes of the core node set 15, other than the central node 16, to one another. In their place, the processing unit 12 adds edges that connect each hub node of the core node set 15, other than the central node 16, to the central node 16. As a result, the core node set 15 is transformed into a star-shaped connected subgraph.

The processing unit 12 generates the graph data 14, which includes the core node set 15 transformed into a star shape. The main difference between the graph data 13 and the graph data 14 lies in the connections among the hub nodes belonging to the core node set 15. The graph data 14 may include the nodes included in the graph data 13. In the graph data 14, the edges connecting nodes that do not belong to the core node set 15 may be the same as in the graph data 13. Likewise, the edges connecting a node that does not belong to the core node set 15 to a hub node of the core node set 15 other than the central node 16 may be the same as in the graph data 13.

As described above, the distances between the hub nodes belonging to the core node set 15 are referred to when setting the central node 16. For example, the processing unit 12 may set a new node as the central node 16 and give the edges between the central node 16 and each hub node the same fixed weight. This edge weight may be determined on the basis of the distances between the hub nodes belonging to the core node set 15; for instance, one half of the average distance between hub nodes may be used as the edge weight. Alternatively, the processing unit 12 may set one of the existing hub nodes as the central node 16 and set the weight of the edge between the central node 16 and each of the other hub nodes to the distance calculated on the graph data 13. In that case, the central node 16 may be selected from among the hub nodes on the basis of a measure derived from the distances between hub nodes, such as closeness centrality.
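Both options can be sketched in a few lines. The sketch assumes unit edge weights, measures distance as the hop count inside the subgraph induced by the core node set, and assumes the core is connected; these choices, the function names, and the three-node example are illustrative and not taken from the specification.

```python
from collections import deque


def hop_distances(adj, source, allowed):
    """BFS hop counts from source, visiting only nodes in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def uniform_weight_for_new_center(adj, core):
    """Half of the average distance over all unordered hub pairs."""
    nodes = sorted(core)
    total, pairs = 0, 0
    for i, u in enumerate(nodes):
        dist = hop_distances(adj, u, core)
        for v in nodes[i + 1:]:
            total += dist[v]
            pairs += 1
    return (total / pairs) / 2


def pick_existing_center(adj, core):
    """Hub with the smallest total distance to the other hubs
    (equivalently, the highest closeness centrality); ties broken by ID."""
    totals = {u: sum(hop_distances(adj, u, core).values()) for u in core}
    center = min(sorted(core), key=lambda u: totals[u])
    weights = hop_distances(adj, center, core)
    del weights[center]  # keep only distances to the other hubs
    return center, weights


# Hypothetical core shaped like a path P - Q - R, so that Q is the closest hub.
adj = {"P": {"Q"}, "Q": {"P", "R"}, "R": {"Q"}}
core = {"P", "Q", "R"}
```

For this hypothetical core, the uniform weight for a new center is half of the average pairwise distance 4/3, and the existing hub chosen as center is Q with weight 1 to each of P and R.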

For example, the processing unit 12 adds a new central node 16 to the core node set 15. The processing unit 12 deletes the 10 edges among nodes A, B, C, D, and E belonging to the core node set 15 and instead adds 5 edges between the central node 16 and nodes A, B, C, D, and E. In the graph data 14 generated in this way, nodes A, B, C, D, and E belonging to the core node set 15 are retained. The edges between node C inside the core node set 15 and nodes F and G outside it are also retained, as is the edge between node E inside the core node set 15 and node H outside it.
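A minimal sketch of this star transformation is given below. The function name `simplify_core`, the node ID "X" for the new central node, and the weight of 1.0 for untouched edges are assumptions; the spoke weight 0.5 is one half of the average hub distance (1 hop) in this example.

```python
from itertools import combinations


def simplify_core(edges, core, center="X", weight=1.0):
    """Replace hub-to-hub edges inside `core` with a star around `center`.

    `edges` is an iterable of (u, v) pairs; the result maps each
    unordered endpoint pair to a weight. Edges with at most one endpoint
    in the core are kept unchanged (with an assumed weight of 1.0).
    """
    result = {}
    for u, v in edges:
        if u in core and v in core:
            continue  # drop edges between hubs of the core
        result[frozenset((u, v))] = 1.0
    for hub in core:
        if hub != center:
            result[frozenset((hub, center))] = weight
    return result


core = set("ABCDE")
edges = list(combinations("ABCDE", 2)) + [("C", "F"), ("C", "G"), ("E", "H")]
simplified = simplify_core(edges, core, center="X", weight=0.5)
```

The 13 original edges become 8: the 10 edges inside the core are replaced by 5 spokes, and the 3 edges to peripheral nodes are unchanged.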

The processing unit 12 may use the graph data 14 to perform search processing such as route search. For example, the processing unit 12 may search for a route from node F outside the core node set 15 to node H, also outside the core node set 15. The route to be found may pass through the core node set 15. However, the hub nodes belonging to the core node set 15 are expected to be densely connected to one another, so there is little need to search exactly for routes inside the core node set 15. On the other hand, even when the core node set 15 is simplified into a star shape, the connections between the hub nodes belonging to the core node set 15 and the nodes outside it are preserved. It is therefore possible to distinguish the case in which two nodes outside the core node set 15 are connected to the same hub node from the case in which they are connected to different hub nodes. Performing search processing using the graph data 14 is thus useful in terms of both efficiency and accuracy.
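The following sketch runs a route search on the simplified example graph. Dijkstra's algorithm is used here only as one common choice of shortest-route algorithm; the specification does not prescribe it. The central node ID "X" and the weights (0.5 per spoke, 1.0 elsewhere) are the same illustrative assumptions as above.

```python
import heapq
from collections import defaultdict


def shortest_path(weighted_edges, source, target):
    """Dijkstra over an undirected graph given as {(u, v): weight}."""
    adj = defaultdict(list)
    for (u, v), w in weighted_edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Simplified example graph: star around a hypothetical new center "X"
# (weight 0.5 per spoke) plus the unchanged peripheral edges (weight 1.0).
simplified = {(hub, "X"): 0.5 for hub in "ABCDE"}
simplified.update({("C", "F"): 1.0, ("C", "G"): 1.0, ("E", "H"): 1.0})

path_fh, length_fh = shortest_path(simplified, "F", "H")
path_fg, length_fg = shortest_path(simplified, "F", "G")
```

Nodes F and G attach to the same hub C, so their route never enters the star, whereas the route from F to H crosses it through two spokes. This is the distinction that would be lost if the core were collapsed into a single node.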

According to the information processing apparatus 10 of the first embodiment, a plurality of hub nodes are detected from the graph data 13, and a core node set 15 including two or more hub nodes linked by edges is determined. A central node 16 is then set on the basis of the distances between the hub nodes belonging to the core node set 15, and graph data 14 is generated in which the hub nodes of the core node set 15 other than the central node 16 are connected to the central node 16 in a star shape.

Consequently, performing search processing such as route search on the graph data 14 requires less computation than performing it on the graph data 13. In addition, compared with extreme simplification such as replacing the core node set 15 with a single node, far less information is lost about the connections between the hub nodes inside the core node set 15 and the nodes outside it. The effect on the accuracy of search processing can therefore be reduced.

[Second embodiment]
Next, a second embodiment will be described.
The analysis device of the second embodiment uses a corporate network, which is a graph representing relationships among people and companies, to analyze the risk of fraudulent transactions that may take place among those people and companies. The analysis device of the second embodiment may also be referred to as an information processing apparatus or a computer. The analysis device may be either a client device or a server device.

FIG. 2 is a block diagram showing an example of hardware of the analysis device according to the second embodiment.
The analysis device 100 has a CPU 101, a RAM 102, an HDD 103, an image signal processing unit 104, an input signal processing unit 105, a medium reader 106, and a communication interface 107, all connected to a bus. The analysis device 100 corresponds to the information processing apparatus 10 of the first embodiment. The CPU 101 corresponds to the processing unit 12 of the first embodiment. The RAM 102 or the HDD 103 corresponds to the storage unit 11 of the first embodiment.

The CPU 101 is a processor that executes program instructions. The CPU 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The CPU 101 may have multiple processor cores, and the analysis device 100 may have multiple processors. A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor."

The RAM 102 is a volatile semiconductor memory that temporarily stores the programs executed by the CPU 101 and the data used by the CPU 101 for computation. The analysis device 100 may have a type of memory other than RAM, and may have multiple memories.

The HDD 103 is non-volatile storage that stores data and software programs such as an OS (Operating System), middleware, and application software. The analysis device 100 may have other types of storage, such as flash memory or an SSD (Solid State Drive), and may have multiple storage devices.

The image signal processing unit 104 outputs images to a display 111 connected to the analysis device 100 in accordance with instructions from the CPU 101. Any type of display can be used as the display 111, such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), or an organic EL (OEL: Organic Electro-Luminescence) display.

The input signal processing unit 105 receives input signals from an input device 112 connected to the analysis device 100. Any type of input device can be used as the input device 112, such as a mouse, a touch panel, a touchpad, or a keyboard. Multiple types of input devices may be connected to the analysis device 100.

The medium reader 106 is a reading device that reads programs and data recorded on a recording medium 113. Examples of the recording medium 113 include magnetic disks such as a flexible disk (FD) and an HDD, optical discs such as a CD (Compact Disc) and a DVD (Digital Versatile Disc), a magneto-optical disk (MO), and semiconductor memory. The medium reader 106 stores, for example, the programs and data read from the recording medium 113 in the RAM 102 or the HDD 103.

The communication interface 107 is connected to a network 114 and communicates with other information processing apparatuses via the network 114. The communication interface 107 may be a wired communication interface connected to a wired communication device such as a switch or a router, or a wireless communication interface connected to a wireless communication device such as a base station or an access point.

Next, the corporate network will be described.
FIG. 3 is a diagram showing an example of relationship analysis of a corporate network.
The corporate network 30 is an undirected or directed graph that includes a plurality of nodes and a plurality of edges connecting the nodes. The second embodiment, however, assumes an undirected graph. Each node in the corporate network 30 represents a person or a company. Each edge in the corporate network 30 represents a relationship between companies or a relationship between a person and a company.

The corporate network 30 can be generated from public investment information disclosed by public databases such as EDINET (Electronic Disclosure for Investors' NETwork) and TDNET (Timely Disclosure Network). The companies represented by nodes include operating companies such as stock companies and membership companies, institutional investors that invest in operating companies, and audit firms that audit operating companies. The people represented by nodes include officers who serve operating companies, such as directors, executive officers, and auditors, and individual investors who invest in operating companies. The relationships between companies represented by edges include group relationships such as parent-subsidiary and affiliate relationships, investment in an operating company by an institutional investor, auditing of an operating company by an audit firm, and other significant business relationships. The relationships between a person and a company represented by edges include appointment as an officer of an operating company, investment in an operating company by an individual investor, and other significant business relationships between an operating company and an individual.

The corporate network 30 is a large-scale graph containing tens of thousands to hundreds of thousands of nodes. The analysis device 100 performs relationship analysis using the corporate network 30. Relationship analysis detects relationships that require attention because fraudulent transactions such as insider trading or money laundering might take place. For relationship analysis, the analysis device 100 searches the corporate network 30 for routes along which two given nodes are linked via one or more other nodes. A route may represent a chain of business relationships involving three or more parties. Here it is preferable to find not only short routes that pass through nodes with many edges, such as nodes representing institutional investors or audit firms (obvious routes), but also long routes that bypass such nodes (non-obvious routes). This makes it possible to discover relationships with a high risk of fraud, such as one company influencing another through an officer.

As an example, the corporate network 30 includes a node 31 representing company C1, a node 32 representing company C2, a node 33 representing company C3, a node 34 representing institutional investor I, and a node 35 representing officer K. There are edges between nodes 31 and 34, nodes 31 and 33, nodes 32 and 34, nodes 32 and 35, and nodes 33 and 35.

The edge between nodes 31 and 34 represents investment in company C1 by institutional investor I. The edge between nodes 32 and 34 represents investment in company C2 by institutional investor I. The route from node 31 to node 32 via node 34 is therefore an "obvious route" and is of relatively low importance when analyzing the relationship between company C1 and company C2. In contrast, the route from node 31 to node 32 via nodes 33 and 35 is a "non-obvious route." Because there is a risk that fraudulent transactions will be carried out through the influence of officer K on companies C2 and C3, this route is of relatively high importance when analyzing the relationship between company C1 and company C2.

To make "non-obvious routes" easier to find, the analysis device 100 performs a degree-weighted route search on the corporate network 30.
Assume that a weight $w(e)$ is assigned to each edge $e$ of an undirected graph, and that a route passes through the $h+1$ nodes $v_0, v_1, \ldots, v_h$ in this order. Node $v_0$ is the start node $s$, and node $v_h$ is the end node $t$. For $k = 1, 2, \ldots, h$, let $e_{k-1,k}$ denote the edge between nodes $v_{k-1}$ and $v_k$, which are adjacent on the route. The length of the route (route length) is then defined as in Equation (1).
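Equation (1) is not reproduced in this text. Assuming the usual definition, in which the length of a route is the sum of the weights of the edges it traverses, it would take the following form. The symbol $L$ for the route length is introduced here only for illustration.

```latex
L(v_0, v_1, \ldots, v_h) = \sum_{k=1}^{h} w(e_{k-1,k}) \tag{1}
```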

When a weight $w(v)$ is assigned to each node $v$ of the undirected graph, the weight $w(e)$ of an edge $e$ can be calculated from the node weights. Suppose nodes $v_i$ and $v_j$ are connected by edge $e_{i,j}$, node $v_i$ is assigned weight $w(v_i)$, and node $v_j$ is assigned weight $w(v_j)$. The weight $w(e_{i,j})$ of edge $e_{i,j}$ is then defined as in Equation (2), $w(e_{i,j}) = \frac{w(v_i) + w(v_j)}{2}$. That is, the weight of an edge is the average of the weights of the nodes at its two ends.

次数ウェイト付き経路探索では、ノードｖのウェイトｗ（ｖ）として、ノードｖに接続されたエッジの数、すなわち、ノードｖの次数ｄｅｇ（ｖ）を用いる。ノードｖ_ｉとノードｖ_ｊがエッジｅ_ｉ，ｊで接続されており、ノードｖ_ｉの次数がｄｅｇ（ｖ_ｉ）であり、ノードｖ_ｊの次数がｄｅｇ（ｖ_ｊ）であるとする。すると、エッジｅ_ｉ，ｊのウェイトｗ（ｅ_ｉ，ｊ）は数式（３）のように定義される。すなわち、あるエッジのウェイトは、そのエッジの両端にあるノードの次数の平均値である。 In the degree-weighted route search, the number of edges connected to the node v, ie, the degree deg(v) of the node v, is used as the weight w(v) of the node v. Suppose that nodes v _i and v _j are connected by an edge e _i,j , the degree of node v _i is deg(v _i ), and the degree of node v _j is deg(v _j ). Then, the weight w(e _i,j ) of the edge e _i,j is defined as Equation (3). That is, the weight of an edge is the average degree of the nodes at both ends of the edge.
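Equation (3) does not appear in this text; from the description it corresponds to w(e_{i,j}) = (deg(v_i) + deg(v_j)) / 2. The following is a minimal sketch of that calculation, assuming the graph is given as a list of undirected node pairs. The function name and the toy data are illustrative and not taken from the patent.

```python
from collections import defaultdict


def degree_edge_weights(edges):
    """Map each undirected edge (u, v) to the mean degree of its endpoints."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): (degree[u] + degree[v]) / 2 for u, v in edges}


# Hypothetical toy graph: investor "I" is linked to three companies,
# and officer "K" is linked to company C1 only.
edges = [("C1", "I"), ("C2", "I"), ("C3", "I"), ("C1", "K")]
weights = degree_edge_weights(edges)
```

In this toy graph the edges that touch the high-degree investor node receive larger weights (2.5 and 2.0) than the edge to the officer (1.5), so routes through the investor come out longer.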

次数ウェイト付き経路探索では、機関投資家を表すノードや監査法人を表すノードなど次数の大きいノードを通過する経路の長さは大きく算出される。このため、次数ウェイト付き経路探索と、短い経路を優先的に探索する経路探索アルゴリズムとを組み合わせると、「当たり前の経路」だけでなく「当たり前ではない経路」も発見しやすくなる。そこで、第２の実施の形態では次数ウェイト付き経路探索を採用する。 In degree-weighted route search, the length of a route that passes through a high-degree node, such as a node representing an institutional investor or an audit firm, is calculated to be large. Combining degree-weighted route search with a route search algorithm that preferentially finds short routes therefore makes it easier to discover not only "obvious routes" but also "non-obvious routes". For this reason, the second embodiment adopts degree-weighted route search.

図４は、企業ネットワークの形状例を示す図である。
人や企業の間の関係を表す企業ネットワークは、図４に示すような形状をとることが多い。企業ネットワーク４０は、コア４１および周辺部４２を含む。 FIG. 4 is a diagram showing an example of the shape of a corporate network.
Corporate networks that represent relationships between people and companies often take the form shown in FIG. 4. The corporate network 40 includes a core 41 and a periphery 42.

コア４１は、多数のハブノードが密結合した連結部分グラフ（任意の２つのノードを結ぶ経路が存在する部分グラフ）である。ハブノードは、企業ネットワーク４０に含まれるノードのうち次数が閾値を超えるノードである。ハブノードとしては、機関投資家を表すノードや監査法人を表すノードなど、多数の他の企業と関係をもつハブ企業を表すノードが想定される。企業ネットワーク４０に含まれるノードのうち数パーセントから数十パーセントのノードが、コア４１に属することがある。図４では、コア４１に属するハブノードを円の外周に配置している。 The core 41 is a connected subgraph (a subgraph in which there is a path connecting any two nodes) in which many hub nodes are tightly coupled. A hub node is a node whose degree exceeds a threshold among the nodes included in the corporate network 40 . The hub node is assumed to be a node representing a hub company having relationships with many other companies, such as a node representing an institutional investor or a node representing an audit firm. A few percent to several tens of percent of the nodes included in the corporate network 40 may belong to the core 41 . In FIG. 4, the hub nodes belonging to the core 41 are arranged on the outer circumference of the circle.
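As an illustration of the hub-node definition above (a node whose degree exceeds a threshold), the following sketch extracts hub nodes from an edge list. The function name, the threshold value, and the toy data are assumptions made for this example.

```python
from collections import Counter


def find_hub_nodes(edges, degree_threshold):
    """Return the set of nodes whose degree exceeds degree_threshold."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {node for node, d in degree.items() if d > degree_threshold}


# Hypothetical graph: H1 and H2 have degree 3, A and B degree 2, C and K degree 1.
edges = [("H1", "H2"), ("H1", "A"), ("H1", "B"),
         ("H2", "B"), ("H2", "C"), ("A", "K")]
hubs = find_hub_nodes(edges, degree_threshold=2)
```

With a threshold of 2, only H1 and H2 qualify as hub nodes in this toy graph.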

周辺部４２は、それぞれ少数の他のノードとのみエッジで接続された周辺ノードから形成される部分グラフである。周辺部４２は２つの階層を含むことがある。周辺部４２の第１階層に属するノードは、コア４１に属する少数のハブノードとエッジで接続されている。ただし、第１階層に属するノード同士が直接接続されていることもある。第１階層に属するノードとしては、機関投資家や監査法人などのハブ企業と関係をもつ一般事業会社を表すノードが想定される。周辺部４２の第２階層に属するノードは、コア４１に属する少数のハブノードや周辺部４２の第１階層に属する少数の周辺ノードとエッジで接続されている。第２階層に属するノードとしては、役員などの人を表すノードが想定される。 Periphery 42 is a subgraph formed from peripheral nodes that are each connected by edges to only a few other nodes. Periphery 42 may include two layers. Nodes belonging to the first layer of the peripheral part 42 are connected to a small number of hub nodes belonging to the core 41 by edges. However, nodes belonging to the first layer may be directly connected to each other. Nodes belonging to the first hierarchy are assumed to represent general business companies that have relationships with hub companies such as institutional investors and audit firms. Nodes belonging to the second layer of the peripheral portion 42 are connected by edges to a small number of hub nodes belonging to the core 41 and a small number of peripheral nodes belonging to the first layer of the peripheral portion 42 . As a node belonging to the second hierarchy, a node representing a person such as an executive is assumed.

図５は、企業ネットワークの特徴例を示す分布図である。
分散図５０は、企業ネットワーク４０に含まれるノードの次数とノードの相対頻度の関係を表す。分散図５０の横軸は、ノードの次数、すなわち、ノードに接続されたエッジの数を対数表記したものである。分散図５０の縦軸は、ある次数をもつノードの出現頻度、すなわち、ある次数をもつノードの全ノードに対する割合を対数表記したものである。分散図５０は領域５１および領域５２を含む。 FIG. 5 is a distribution diagram showing a characteristic example of a corporate network.
The scatter diagram 50 represents the relationship between the degrees of nodes included in the corporate network 40 and the relative frequencies of the nodes. The horizontal axis of the scatter diagram 50 is a logarithmic representation of the degree of a node, that is, the number of edges connected to the node. The vertical axis of the scatter diagram 50 is the logarithmic representation of the appearance frequency of nodes with a certain degree, that is, the ratio of nodes with a certain degree to all nodes. Scatter diagram 50 includes area 51 and area 52 .
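The quantity plotted in the diagram, the relative frequency of each degree, can be computed as in the following sketch. It assumes an edge list as input and counts only nodes that appear in at least one edge; the names are illustrative.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: fraction of nodes having that degree}."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    total = len(degree)
    return {d: counts[d] / total for d in sorted(counts)}


# Hypothetical graph with degrees 3, 3, 2, 2, 1, 1.
edges = [("H1", "H2"), ("H1", "A"), ("H1", "B"),
         ("H2", "B"), ("H2", "C"), ("A", "K")]
distribution = degree_distribution(edges)
```

Plotting the keys against the values on logarithmic axes would give a diagram of the kind described above.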

領域５１は、企業ネットワーク４０の周辺部４２に対応する。領域５１が示すように、周辺部４２はスケールフリーネットワークの特徴をもっていると言える。スケールフリーネットワークは、大多数のノードは少数の相手とのみエッジをもち、一部のノードは多数の相手とエッジをもつという特徴をもつネットワークである。 Region 51 corresponds to the periphery 42 of the corporate network 40. As region 51 indicates, the periphery 42 can be said to have the characteristics of a scale-free network. A scale-free network is a network in which the vast majority of nodes have edges to only a few other nodes, while a small number of nodes have edges to many other nodes.

一方、領域５２は、企業ネットワーク４０のコア４１に対応する。領域５２が示すように、コア４１はランダムグラフの特徴をもっていると言える。ランダムグラフは、任意の２つのノード間に一定の確率ｐでエッジを張ったグラフである。ランダムグラフに含まれるノードの次数と相対頻度との関係はポアソン分布に従う。 Region 52, on the other hand, corresponds to the core 41 of the corporate network 40. As region 52 indicates, the core 41 can be said to have the characteristics of a random graph. A random graph is a graph in which an edge is placed between any two nodes with a fixed probability p. The relationship between node degree and relative frequency in a random graph follows a Poisson distribution.

分析装置１００は、このような特徴をもつ企業ネットワークから、次数ウェイト付き経路探索により経路長が短いものから優先的に複数の経路を探索する。
図６は、経路探索の例を示す図である。 From a corporate network with these characteristics, the analysis device 100 uses degree-weighted route search to find multiple routes, giving priority to routes with shorter route lengths.
FIG. 6 is a diagram showing an example of route search.

経路探索にはダイクストラ（Dijkstra）法や双方向ダイクストラ法などの経路探索アルゴリズムを応用することができる。ただし、第２の実施の形態では、経路長が最も短い最短経路だけでなく、最短経路ではない他の経路も探索される。そこで、純粋なダイクストラ法を拡張したものなど、複数の経路を探索可能な経路探索アルゴリズムを使用する。 Route search algorithms such as Dijkstra's algorithm or bidirectional Dijkstra's algorithm can be applied to the route search. In the second embodiment, however, the search covers not only the shortest route but also other routes that are not the shortest. A route search algorithm capable of finding multiple routes, such as an extension of plain Dijkstra's algorithm, is therefore used.

例えば、ダイクストラ法に従ってノード毎に始点ノードからの経路長および前段ノードを示す情報を更新する際に、経路長が最小の情報だけでなく経路長が最小ではない情報も履歴として残しておく。そして、経路長が最小でない情報も用いて次段ノードの情報を複数パターン作成する。ノード毎の情報としては経路長が小さい方から所定個までの情報を保持しておく。これにより、始点ノードから終点ノードに至る経路として、最短経路に加えて、比較的経路長が小さい準最短経路も探索することができる。 For example, when the information indicating the path length from the start node and the preceding node is updated for each node according to Dijkstra's algorithm, not only the entry with the minimum path length but also entries with non-minimum path lengths are kept as history. The non-minimum entries are then also used to generate multiple candidate entries for the next node. For each node, up to a predetermined number of entries are retained in ascending order of path length. In this way, in addition to the shortest path, near-shortest paths with relatively small path lengths can also be found as routes from the start node to the end node.

一例として、グラフ７０はノード７１～７７（ノードａ，ｂ，ｃ，ｄ，ｅ，ｆ，ｇ）を含む。ノード７１とノード７２との間にはウェイト１のエッジがある。ノード７１とノード７３との間にはウェイト７のエッジがある。ノード７１とノード７４との間にはウェイト２のエッジがある。ノード７２とノード７５との間にはウェイト４のエッジがある。ノード７３とノード７５との間にはウェイト２のエッジがある。ノード７３とノード７６との間にはウェイト３のエッジがある。ノード７４とノード７６との間にはウェイト５のエッジがある。ノード７５とノード７７との間にはウェイト６のエッジがある。ノード７６とノード７７との間にはウェイト２のエッジがある。 As an example, graph 70 includes nodes 71-77 (nodes a, b, c, d, e, f, g). There is an edge of weight 1 between node 71 and node 72 . There is an edge of weight 7 between nodes 71 and 73 . There is an edge of weight 2 between nodes 71 and 74 . There is an edge of weight 4 between nodes 72 and 75 . There is an edge of weight 2 between nodes 73 and 75 . Between nodes 73 and 76 there is an edge of weight 3; Between nodes 74 and 76 there is an edge of weight 5; There is an edge of weight 6 between nodes 75 and 77 . Between nodes 76 and 77 there is a weight 2 edge.

始点ノードをノード７１、終点ノードをノード７７として、ダイクストラ法に準じて経路探索を行うことを考える。まず、ノード７１が選択され、ノード７１に対して長さ０のラベルが付与される。すると、ノード７２に対して長さ１かつ前段ノードａのラベルが付与され、ノード７３に対して長さ７かつ前段ノードａのラベルが付与され、ノード７４に対して長さ２かつ前段ノードａのラベルが付与される。 Consider a route search based on Dijkstra's algorithm with node 71 as the start node and node 77 as the end node. First, node 71 is selected and given a label of length 0. Then node 72 is given a label of length 1 with preceding node a, node 73 is given a label of length 7 with preceding node a, and node 74 is given a label of length 2 with preceding node a.

次に、最小経路長をもつノード７２が選択される。すると、ノード７５に対して長さ５かつ前段ノードｂのラベルが付与される。次に、最小経路長をもつノード７４が選択される。すると、ノード７６に対して長さ７かつ前段ノードｄのラベルが付与される。次に、最小経路長をもつノード７５が選択される。すると、ノード７３に対して、長さ７かつ前段ノードａのラベルを残して、長さ７かつ前段ノードｅのラベルが追加される。また、ノード７７に対して長さ１１かつ前段ノードｅのラベルが付与される。 Next, node 72, which has the minimum path length, is selected, and node 75 is given a label of length 5 with preceding node b. Next, node 74, which has the minimum path length, is selected, and node 76 is given a label of length 7 with preceding node d. Next, node 75, which has the minimum path length, is selected. Node 73 then keeps its label of length 7 with preceding node a and gains an additional label of length 7 with preceding node e. Node 77 is also given a label of length 11 with preceding node e.

次に、最小経路長をもつノードとしてノード７３が選択されたとする。すると、ノード７６に対して、ノード７３の１つ目のラベルに対応して長さ１０かつ前段ノードｃのラベルが追加される。また、ノード７３の２つ目のラベルに対応して長さ１０かつ前段ノードｃのラベルが追加される。このとき、長さ７かつ前段ノードｄのラベルは残しておく。次に、最小経路長をもつノード７６が選択される。すると、ノード７７に対して、ノード７６の１つ目のラベルに対応して長さ９かつ前段ノードｆのラベルが追加される。また、ノード７６の２つ目のラベルに対応して長さ１２かつ前段ノードｆのラベルが追加される。また、ノード７６の３つ目のラベルに対応して長さ１２かつ前段ノードｆのラベルが追加される。このとき、長さ１１かつ前段ノードｅのラベルは残しておく。 Next, suppose that node 73 is selected as the node with the minimum path length. Then, for node 76, a label of length 10 with preceding node c is added for the first label of node 73, and another label of length 10 with preceding node c is added for the second label of node 73. The existing label of length 7 with preceding node d is kept. Next, node 76, which has the minimum path length, is selected. Then, for node 77, a label of length 9 with preceding node f is added for the first label of node 76, a label of length 12 with preceding node f is added for the second label of node 76, and another label of length 12 with preceding node f is added for the third label of node 76. The existing label of length 11 with preceding node e is kept.

これにより、ノード７１，７４，７６，７７を順に辿る経路が長さ９の最短経路として抽出される。また、ノード７１，７２，７５，７７を順に辿る経路が長さ１１の経路として抽出される。また、ノード７１，７２，７５，７３，７６，７７を順に辿る経路が長さ１２の経路として抽出される。また、ノード７１，７３，７６，７７を順に辿る経路が長さ１２の経路として抽出される。このようにして、グラフ７０から、経路長が短い方から優先的に所定個までの経路を探索することが可能である。 As a result, the route through nodes 71, 74, 76, and 77 in that order is extracted as the shortest route, with length 9. The route through nodes 71, 72, 75, and 77 is extracted as a route of length 11. The route through nodes 71, 72, 75, 73, 76, and 77 is extracted as a route of length 12. The route through nodes 71, 73, 76, and 77 is also extracted as a route of length 12. In this way, up to a predetermined number of routes can be found in graph 70, in ascending order of route length.
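The worked example on graph 70 can be reproduced with a short sketch. The code below is a simplified stand-in rather than the exact labeling procedure described above: it expands partial routes in ascending order of length, allows each node to be expanded at most k times (mirroring the per-node limit on retained labels), and assumes that a route never revisits a node. With that cap, some routes could be missed on larger graphs. The function name and the dictionary representation are assumptions for illustration.

```python
import heapq


def k_shortest_routes(graph, source, target, k):
    """Return up to k loop-free routes as (length, [nodes]), shortest first."""
    heap = [(0, [source])]                 # partial routes ordered by length
    expansions = {node: 0 for node in graph}
    routes = []
    while heap and len(routes) < k:
        length, path = heapq.heappop(heap)
        node = path[-1]
        if expansions[node] >= k:          # keep at most k labels per node
            continue
        expansions[node] += 1
        if node == target:
            routes.append((length, path))
            continue
        for neighbor, weight in graph[node].items():
            if neighbor not in path:       # assumption: no node is revisited
                heapq.heappush(heap, (length + weight, path + [neighbor]))
    return routes


# Graph 70: nodes a..g correspond to nodes 71..77, with the edge weights listed above.
graph_70 = {
    "a": {"b": 1, "c": 7, "d": 2},
    "b": {"a": 1, "e": 4},
    "c": {"a": 7, "e": 2, "f": 3},
    "d": {"a": 2, "f": 5},
    "e": {"b": 4, "c": 2, "g": 6},
    "f": {"c": 3, "d": 5, "g": 2},
    "g": {"e": 6, "f": 2},
}
routes = k_shortest_routes(graph_70, "a", "g", k=4)
```

For k = 4 this yields routes of length 9, 11, 12, and 12, matching the four routes extracted in the example.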

次に、企業ネットワークにおける経路探索の問題点について説明する。
前述の図４の企業ネットワーク４０を用いた不正調査では、周辺部４２に属する１つのノードから周辺部４２に属する別のノードに至る経路を複数通り探索したいことがある。例えば、ある人を表すノードから別の人を表すノードに至る経路であって、極端に遠回りではない経路を網羅的に探索したいことがある。探索すべき経路の中にはコア４１を通過しないものがある。例えば、始点ノードから１以上の一般事業会社を表すノードを経由して終点ノードに至る経路が存在することがある。一方、探索すべき経路の中にはコア４１を通過するものもある。例えば、始点ノードから１以上のハブ企業を表すノードを経由して終点ノードに至る経路が存在することがある。 Next, the problem of route search in corporate networks will be described.
In the fraud investigation described above using the corporate network 40 of FIG. 4, one may want to search for multiple routes from one node in the periphery 42 to another node in the periphery 42. For example, one may want to exhaustively search for routes from a node representing one person to a node representing another person that are not extreme detours. Some of the routes to be found do not pass through the core 41. For example, there may be a route from the start node to the end node via one or more nodes representing general business companies. Other routes to be found do pass through the core 41. For example, there may be a route from the start node to the end node via one or more nodes representing hub companies.

しかし、コア４１内ではハブノード同士が密に接続されており経由するハブノードの選択パターンが多数存在するため、コア４１を通過する経路を探索する計算量が大きくなってしまうという問題がある。一方で、企業ネットワーク４０を用いた不正調査では、コア４１内のハブノードは相互に接続されていることが「当たり前」であるため、周辺部４２で何れのノードを経由するかは重要性が高いもののコア４１で何れのノードを通過するかは重要性が低い。すなわち、コア４１内で厳密な探索を行う必要性は低い。そこで、分析装置１００は、企業ネットワーク４０のコア４１を簡略化し、簡略化した企業ネットワークを用いて経路探索を行って計算量を削減する。 However, because the hub nodes within the core 41 are densely connected and there are many possible choices of hub nodes to pass through, searching for routes that pass through the core 41 is computationally expensive. In a fraud investigation using the corporate network 40, on the other hand, it is "obvious" that the hub nodes in the core 41 are interconnected. Which nodes a route passes through in the periphery 42 is therefore highly important, whereas which nodes it passes through in the core 41 is of low importance. In other words, there is little need for an exact search within the core 41. The analysis device 100 therefore simplifies the core 41 of the corporate network 40 and performs the route search on the simplified corporate network to reduce the amount of computation.

ただし、コア４１を１つの仮想ノードに置換するような極端な簡略化を行うと、企業ネットワーク４０がもっていたコア４１と周辺部４２との間の関係についての情報が大きく失われてしまうおそれがある。例えば、周辺部４２の１つのノードと周辺部４２の別ノードとが、コア４１の異なるハブノードに接続されている場合がある。また、周辺部４２の１つのノードと周辺部４２の別ノードとが、コア４１の同じハブノードに接続されている場合がある。コア４１を１つの仮想ノードに置換すると、上記の２つの場合が同一視されてしまう。これは、２以上の役員または一般事業会社が同一のハブ企業と関係をもつ場合と、２以上の役員または一般事業会社が異なるハブ企業と関係をもつ場合とを同一視することになる。その結果、経路探索の精度が低下するという問題がある。 However, an extreme simplification such as replacing the core 41 with a single virtual node risks losing much of the information that the corporate network 40 held about the relationship between the core 41 and the periphery 42. For example, one node in the periphery 42 and another node in the periphery 42 may be connected to different hub nodes of the core 41. Alternatively, the two peripheral nodes may be connected to the same hub node of the core 41. Replacing the core 41 with a single virtual node makes these two cases indistinguishable. This amounts to treating the case where two or more officers or general business companies have relationships with the same hub company the same as the case where they have relationships with different hub companies. As a result, the accuracy of the route search decreases.

そこで、分析装置１００は、コア４１に属するハブノードは残しつつコア４１に含まれるエッジを削減して、コア４１を簡略化した企業ネットワークを生成する。簡略化した企業ネットワークはスター型のコアを含む。 Therefore, the analysis device 100 generates a corporate network in which the core 41 is simplified by reducing the edges included in the core 41 while leaving the hub nodes belonging to the core 41 . A simplified corporate network includes a star-shaped core.

図７は、企業ネットワークの簡略化例を示す図である。
企業ネットワーク６０は、企業ネットワーク４０を簡略化したものである。企業ネットワーク６０は、コア６１および周辺部６２を含む。コア６１は企業ネットワーク４０のコア４１に対応し、周辺部６２は企業ネットワーク４０の周辺部４２に対応する。 FIG. 7 is a diagram showing a simplified example of a corporate network.
The corporate network 60 is a simplified version of the corporate network 40. The corporate network 60 includes a core 61 and a periphery 62. The core 61 corresponds to the core 41 of the corporate network 40, and the periphery 62 corresponds to the periphery 42 of the corporate network 40.

コア６１は、コア４１と同じハブノードを含む。また、コア６１は中心ノード６３を含む。中心ノード６３は、コア４１に含まれていなかった新規ノードでもよいし、コア４１に含まれている何れか１つのハブノードでもよい。コア６１では、コア４１に含まれていたハブノード間のエッジは削除されている。その代わりにコア６１では、設定された中心ノード６３と各ハブノードとがエッジで直接接続され、スター型部分グラフを形成している。 Core 61 contains the same hub nodes as core 41 . Core 61 also includes central node 63 . The central node 63 may be a new node not included in the core 41 or any one hub node included in the core 41 . In core 61, the edges between hub nodes included in core 41 are deleted. Instead, in the core 61, the set central node 63 and each hub node are directly connected by edges to form a star subgraph.

周辺部６２は周辺部４２と同様である。周辺部６２に属するノードとコア６１に属するハブノードとは、企業ネットワーク４０と同様のエッジで接続されている。ただし、中心ノード６３がコア４１に属するハブノードの１つである場合、中心ノード６３と周辺部６２に属するノードとの間のエッジは削除される。すなわち、中心ノード６３はコア６１に属するハブノードのみと接続され、周辺部６２に属するノードとは接続されない。 The periphery 62 is the same as the periphery 42. Nodes belonging to the periphery 62 and hub nodes belonging to the core 61 are connected by the same edges as in the corporate network 40. However, if the central node 63 is one of the hub nodes belonging to the core 41, the edges between the central node 63 and nodes belonging to the periphery 62 are deleted. That is, the central node 63 is connected only to hub nodes belonging to the core 61 and not to nodes belonging to the periphery 62.

中心ノード６３が新規ノードである場合、分析装置１００は、中心ノード６３とコア６１に属するハブノードとの間にあるエッジのウェイトを一定値に統一する。
具体的には、分析装置１００は、簡略化前のコア４１の中で、コア４１に属するハブノード間の最短経路の経路長（距離）の平均値を算出する。このとき、コア４１に属するハブノード間の最短経路として、ハブノード同士を接続するエッジのみを辿る経路を探索すればよく、周辺部４２を経由する経路は考慮しなくてよい。分析装置１００は、全てのハブノードの組み合わせについて距離を算出して実際平均値を求めてもよい。また、分析装置１００は、コア４１から一部のハブノードの組み合わせをサンプリングし、サンプリングした組み合わせのみ距離を算出して推定平均値を求めてもよい。分析装置１００は、算出した平均値の２分の１を、中心ノード６３と各ハブノードとの間のエッジのウェイトとする。これにより、ハブノード間の平均距離でコア６１を通過できるようになる。 When the center node 63 is a new node, the analysis device 100 unifies the weights of the edges between the center node 63 and hub nodes belonging to the core 61 to a constant value.
Specifically, the analysis device 100 calculates the average path length (distance) of the shortest paths between the hub nodes belonging to the core 41 before simplification. Here, the shortest path between two hub nodes of the core 41 only needs to follow edges that connect hub nodes to each other; routes via the periphery 42 need not be considered. The analysis device 100 may compute the distance for every pair of hub nodes to obtain the exact average. Alternatively, it may sample some pairs of hub nodes from the core 41 and compute the distance only for the sampled pairs to obtain an estimated average. The analysis device 100 sets the weight of the edge between the central node 63 and each hub node to one half of the calculated average. As a result, a route that crosses the core 61 from one hub node through the central node 63 to another hub node has a length equal to the average distance between hub nodes.
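The new-central-node method can be sketched as follows. This is an illustrative outline under several assumptions: the graph is a dictionary of dictionaries with comparable node names, the core is connected and contains at least two hub nodes, the exact average over all hub pairs is used (no sampling), and the name "CENTER" does not collide with an existing node. None of the function names come from the patent.

```python
import heapq
from itertools import combinations


def shortest_distances(graph, source):
    """Dijkstra's algorithm on a dict-of-dicts graph; returns {node: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def star_core_with_new_center(graph, hubs, center="CENTER"):
    """Replace hub-to-hub edges with uniform spokes to a new central node."""
    # Core subgraph: only edges whose two endpoints are both hub nodes.
    core = {u: {v: w for v, w in graph[u].items() if v in hubs} for u in hubs}
    dist_from = {u: shortest_distances(core, u) for u in hubs}
    pairs = list(combinations(sorted(hubs), 2))
    # Assumes a connected core; a KeyError here means it is not connected.
    mean_distance = sum(dist_from[u][v] for u, v in pairs) / len(pairs)
    spoke = mean_distance / 2
    # Copy the graph without hub-to-hub edges, then add the spokes.
    simplified = {
        u: {v: w for v, w in nbrs.items() if not (u in hubs and v in hubs)}
        for u, nbrs in graph.items()
    }
    simplified[center] = {}
    for hub in hubs:
        simplified[hub][center] = spoke
        simplified[center][hub] = spoke
    return simplified


# Hypothetical example: three hub nodes (H1..H3) and two peripheral nodes (A, B).
graph = {
    "H1": {"H2": 2, "H3": 10, "A": 1},
    "H2": {"H1": 2, "H3": 4},
    "H3": {"H1": 10, "H2": 4, "B": 1},
    "A": {"H1": 1},
    "B": {"H3": 1},
}
simplified = star_core_with_new_center(graph, {"H1", "H2", "H3"})
```

In this example the inter-hub shortest distances are 2, 6, and 4, so the average is 4 and every spoke receives weight 2; crossing the core between any two hubs then costs 4.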

一方、中心ノード６３が既存ハブノードの１つである場合、分析装置１００は、コア４１の中から何れか１つのハブノードを中心ノード６３として選択する。そして、分析装置１００は、中心ノード６３と他のハブノードそれぞれとの間にあるエッジのウェイトを、コア４１における当該２つのノードの間の距離に設定する。 On the other hand, if the central node 63 is one of the existing hub nodes, the analysis device 100 selects one hub node from the core 41 as the central node 63. The analysis device 100 then sets the weight of the edge between the central node 63 and each of the other hub nodes to the distance between those two nodes in the core 41.

具体的には、分析装置１００は、コア４１に属するハブノードそれぞれに対して、中心性を表す指標として近接中心性を算出する。近接中心性は、あるノードが他のノードとの関係でグラフの中心にどの程度近いかを示す指標である。着目するノードをｖ_ｉ、他の１つのノードをｖ_ｊ、ノードｖ_ｉとノードｖ_ｊの距離をｄ（ｖ_ｉ，ｖ_ｊ）とする。すると、ノードｖ_ｉの近接中心性Ｃ_ｃ（ｖ_ｉ）は数式（４）のように定義される。数式（４）の近接中心性Ｃ_ｃ（ｖ_ｉ）は、他のノードとの間の距離の和の逆数である。ただし、ノード数をｎとして、ノードｖ_ｉの近接中心性Ｃ_ｃ（ｖ_ｉ）を数式（５）のように定義することもできる。数式（５）の近接中心性Ｃ_ｃ（ｖ_ｉ）は、他のノードとの間の平均距離の逆数である。 Specifically, the analysis device 100 calculates closeness centrality, as an index of centrality, for each hub node belonging to the core 41. Closeness centrality indicates how close a node is to the center of the graph in terms of its relationship to the other nodes. Let v_i be the node of interest, v_j be another node, and d(v_i, v_j) be the distance between node v_i and node v_j. The closeness centrality C_c(v_i) of node v_i is then defined as in Equation (4). The closeness centrality C_c(v_i) of Equation (4) is the reciprocal of the sum of the distances to the other nodes. Alternatively, with n denoting the number of nodes, the closeness centrality C_c(v_i) of node v_i can be defined as in Equation (5). The closeness centrality C_c(v_i) of Equation (5) is the reciprocal of the average distance to the other nodes.
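Equations (4) and (5) do not appear in this text. Reconstructions consistent with the description are given below. For Equation (5), it is assumed that the average is taken over the n - 1 other nodes, which is the common convention; the source does not state the denominator explicitly.

```latex
C_c(v_i) = \frac{1}{\sum_{j \neq i} d(v_i, v_j)} \tag{4}

C_c(v_i) = \frac{n - 1}{\sum_{j \neq i} d(v_i, v_j)} \tag{5}
```

Both forms rank the nodes of a given graph in the same order, since they differ only by the constant factor n - 1.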

分析装置１００は、コア４１から近接中心性が最大のハブノードを中心ノード６３として選択する。中心ノード６３とコア６１の他のハブノードとの間にあるエッジのウェイトは、近接中心性を求める際に算出した当該２つのハブノードの間の距離を用いればよい。よって、中心ノード６３を新規ノードとする方法の場合、コア６１に含まれるエッジのウェイトは一定値に統一されるのに対し、中心ノード６３を既存ハブノードの１つとする方法の場合、コア６１に含まれるエッジのウェイトは統一されない。 The analysis device 100 selects the hub node with the highest closeness centrality in the core 41 as the central node 63. For the weight of the edge between the central node 63 and each other hub node of the core 61, the distance between the two hub nodes computed when obtaining the closeness centrality can be used. Thus, with the method that uses a new node as the central node 63, the weights of the edges in the core 61 are unified to a single value, whereas with the method that uses one of the existing hub nodes as the central node 63, the weights of the edges in the core 61 are not uniform.
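The existing-hub method can be sketched in the same style. The sketch assumes a dictionary-of-dictionaries graph with comparable node names, a connected core with at least two hub nodes, and closeness defined as the reciprocal of the summed distances; ties are broken by node name, which is an arbitrary choice made here. The function names are illustrative.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm on a dict-of-dicts graph; returns {node: distance}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def star_core_with_existing_center(graph, hubs):
    """Pick the hub with the highest closeness as center and build the star."""
    core = {u: {v: w for v, w in graph[u].items() if v in hubs} for u in hubs}
    dist_from = {u: shortest_distances(core, u) for u in hubs}
    # Closeness: reciprocal of the summed distances to the other hubs.
    closeness = {
        u: 1.0 / sum(dist_from[u][v] for v in hubs if v != u) for u in hubs
    }
    center = max(sorted(hubs), key=closeness.get)
    # Copy the graph without hub-to-hub edges.
    simplified = {
        u: {v: w for v, w in nbrs.items() if not (u in hubs and v in hubs)}
        for u, nbrs in graph.items()
    }
    # The central node keeps no edges to peripheral nodes.
    for peripheral in list(simplified[center]):
        del simplified[peripheral][center]
    simplified[center] = {}
    # Spoke weights are the core distances from the center to each other hub.
    for hub in hubs:
        if hub != center:
            simplified[center][hub] = dist_from[center][hub]
            simplified[hub][center] = dist_from[center][hub]
    return center, simplified


# Hypothetical example: three hub nodes and three peripheral nodes.
graph = {
    "H1": {"H2": 2, "H3": 10, "A": 1},
    "H2": {"H1": 2, "H3": 4, "C": 3},
    "H3": {"H1": 10, "H2": 4, "B": 1},
    "A": {"H1": 1},
    "B": {"H3": 1},
    "C": {"H2": 3},
}
center, simplified = star_core_with_existing_center(graph, {"H1", "H2", "H3"})
```

Here H2 has the smallest summed distance (6), so it becomes the central node; its spokes take weights 2 and 4, and its edge to the peripheral node C is removed.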

中心ノード６３を既存ハブノードの１つとする方法では、コア６１に含まれる複数のエッジはそれぞれ、コア４１のハブノード間の関係を考慮した異なるウェイトをもつ。よって、コア４１のハブノードの個性をコア６１に反映させることができ、企業ネットワーク６０を用いた経路探索の精度を向上させることができる。一方、中心ノード６３を新規ノードとする方法では、近接中心性の計算など中心ノード６３を選択するための処理が軽減される。また、ハブノード間の距離の平均値をサンプリングによって推定することが可能であり、ハブノード間の距離計算を軽減することもできる。 With the method that uses one of the existing hub nodes as the central node 63, the edges in the core 61 each have a different weight that reflects the relationships between the hub nodes of the core 41. The individual characteristics of the hub nodes of the core 41 can therefore be reflected in the core 61, which improves the accuracy of route search on the corporate network 60. With the method that uses a new node as the central node 63, on the other hand, the processing needed to select the central node 63, such as computing closeness centrality, is reduced. In addition, the average distance between hub nodes can be estimated by sampling, which also reduces the cost of computing distances between hub nodes.

次に、分析装置１００の機能および処理手順について説明する。
図８は、分析装置の機能例を示すブロック図である。
分析装置１００は、グラフデータ記憶部１２１、接続情報生成部１２２、接続情報記憶部１２３、経路探索部１２４および探索結果表示部１２５を有する。グラフデータ記憶部１２１および接続情報記憶部１２３は、例えば、ＲＡＭ１０２またはＨＤＤ１０３の記憶領域を用いて実装される。接続情報生成部１２２、経路探索部１２４および探索結果表示部１２５は、例えば、ＣＰＵ１０１が実行するプログラムを用いて実装される。 Next, the functions and processing procedure of the analysis device 100 are described.
FIG. 8 is a block diagram showing an example of the functions of the analysis device.
The analysis device 100 has a graph data storage unit 121, a connection information generation unit 122, a connection information storage unit 123, a route search unit 124, and a search result display unit 125. The graph data storage unit 121 and the connection information storage unit 123 are implemented using, for example, storage areas of the RAM 102 or the HDD 103. The connection information generation unit 122, the route search unit 124, and the search result display unit 125 are implemented using, for example, programs executed by the CPU 101.

グラフデータ記憶部１２１は、簡略化前の企業ネットワークを示すグラフデータを記憶する。グラフデータは、ＥＤＩＮＥＴやＴＤＮＥＴなどの公開データベースが開示する公開投資情報から生成される。グラフデータは、分析装置１００が生成してもよいし他の情報処理装置が生成してもよい。グラフデータは、それぞれ人または企業を表す複数のノードと、人や企業の間の関係を表す複数のエッジとを含む。エッジのウェイトは、接続情報生成部１２２においてグラフデータから自動的に算出される。 The graph data storage unit 121 stores graph data representing a corporate network before simplification. Graph data is generated from public investment information disclosed by public databases such as EDINET and TDNET. The graph data may be generated by the analysis device 100 or by another information processing device. The graph data includes multiple nodes, each representing a person or company, and multiple edges, representing relationships between the people or companies. Edge weights are automatically calculated from the graph data in the connection information generator 122 .

接続情報生成部１２２は、グラフデータ記憶部１２１に記憶されたグラフデータから、ノード間の接続関係を示す接続情報を生成する。接続情報は隣接行列として表現される。このとき、接続情報生成部１２２は、各ノードの次数をウェイトとして算出し、各ノードのウェイトを用いてノード間を接続する各エッジのウェイトを算出する。 The connection information generation unit 122 generates connection information indicating connection relationships between nodes from the graph data stored in the graph data storage unit 121 . Connection information is expressed as an adjacency matrix. At this time, the connection information generating unit 122 calculates the degree of each node as a weight, and uses the weight of each node to calculate the weight of each edge connecting the nodes.

接続情報の生成にあたり、接続情報生成部１２２は、簡略化フラグおよび次数閾値を含むオプション情報をユーザから受け付ける。簡略化フラグは、企業ネットワークの簡略化を行うか否かを示すフラグである。次数閾値は、企業ネットワークの簡略化を行う場合に適用されるパラメータであり、ハブノードとみなすノードの次数の閾値である。簡略化フラグがＯＦＦである場合、接続情報生成部１２２は、グラフデータが示す簡略化前の企業ネットワークに相当する元隣接行列を出力する。一方、簡略化フラグがＯＮである場合、接続情報生成部１２２は、簡略化前の企業ネットワークを簡略化後の企業ネットワークに変換し、簡略化後の企業ネットワークに相当する簡略化隣接行列を出力する。 When generating the connection information, the connection information generation unit 122 receives option information, including a simplification flag and a degree threshold, from the user. The simplification flag indicates whether or not to simplify the corporate network. The degree threshold is a parameter applied when the corporate network is simplified; it is the degree above which a node is regarded as a hub node. When the simplification flag is OFF, the connection information generation unit 122 outputs the original adjacency matrix corresponding to the unsimplified corporate network indicated by the graph data. When the simplification flag is ON, the connection information generation unit 122 converts the unsimplified corporate network into a simplified corporate network and outputs a simplified adjacency matrix corresponding to the simplified corporate network.

接続情報記憶部１２３は、接続情報生成部１２２が生成した接続情報を記憶する。簡略化フラグがＯＦＦである場合、接続情報記憶部１２３には、簡略化前の企業ネットワークに相当する元隣接行列が記憶される。簡略化フラグがＯＮである場合、接続情報記憶部１２３には、簡略化後の企業ネットワークに相当する簡略化隣接行列が記憶される。接続情報が１度生成されれば、生成された接続情報を用いて経路探索を繰り返し実行できる。 The connection information storage unit 123 stores connection information generated by the connection information generation unit 122 . When the simplification flag is OFF, the connection information storage unit 123 stores the original adjacency matrix corresponding to the corporate network before simplification. When the simplification flag is ON, the connection information storage unit 123 stores a simplified adjacency matrix corresponding to the corporate network after simplification. Once connection information is generated, route searches can be repeatedly executed using the generated connection information.

経路探索部１２４は、接続情報記憶部１２３に記憶された接続情報を用いて、企業ネットワーク上の経路探索を行う。経路探索にあたり、経路探索部１２４は、始点ノード、終点ノード、出力する経路の個数およびウェイトを考慮するか否かのフラグを含むオプション情報をユーザから受け付ける。すると、経路探索部１２４は、ダイクストラ法に準じた経路探索アルゴリズムなどを用いて、経路長の短い方から優先的に始点ノードから終点ノードに至る複数の経路を探索する。ウェイトを考慮するか否かのフラグがＯＦＦである場合、経路探索部１２４は、接続情報が示すウェイトにかかわらず各エッジのウェイトを１とみなして経路探索を行う。ウェイトを考慮するか否かのフラグがＯＮである場合、経路探索部１２４は、接続情報が示すウェイトに基づいて経路探索を行う。 The route search unit 124 uses the connection information stored in the connection information storage unit 123 to search for routes on the corporate network. When searching for a route, the route searching unit 124 receives from the user option information including a start node, an end node, the number of routes to be output, and a flag indicating whether or not to consider weights. Then, the route search unit 124 searches for a plurality of routes from the start node to the end node, preferentially starting from the route having the shortest route length, using a route search algorithm based on Dijkstra's algorithm. When the flag indicating whether or not to consider the weight is OFF, the route search unit 124 considers the weight of each edge to be 1 regardless of the weight indicated by the connection information, and performs the route search. When the flag indicating whether or not to consider the weight is ON, the route search unit 124 performs route search based on the weight indicated by the connection information.

探索結果表示部１２５は、経路探索部１２４の探索結果をディスプレイ１１１に表示する。例えば、探索結果表示部１２５は、複数のノードおよび複数のエッジを含む企業ネットワークを図形情報として表示し、探索された経路を太線で表すなど企業ネットワーク上で強調表示する。また、例えば、探索結果表示部１２５は、経由するノードを列挙した文字列で経路を表現し、複数の経路を表す複数の文字列を並べたリストを表示する。また、探索結果表示部１２５は、探索結果をＨＤＤ１０３などのストレージに保存してもよいし、ネットワーク１１４を介して他の情報処理装置に送信してもよいし、分析装置１００が有する他の出力デバイスに出力するようにしてもよい。 The search result display unit 125 displays the search results of the route search unit 124 on the display 111. For example, the search result display unit 125 displays the corporate network, including its nodes and edges, as graphic information and highlights the found routes on the corporate network, for instance by drawing them with thick lines. As another example, the search result display unit 125 expresses each route as a character string listing the nodes it passes through and displays a list of the character strings for the multiple routes. The search result display unit 125 may also save the search results in storage such as the HDD 103, transmit them to another information processing device via the network 114, or output them to another output device of the analysis device 100.

図９は、グラフデータの例を示す図である。
ノードテーブル１３１は、企業ネットワークに含まれるノードを示す。ノードテーブル１３１は、グラフデータ記憶部１２１に記憶される。ノードテーブル１３１は、ノードＩＤ、ノード名およびウェイトの項目を含む。ノードＩＤは、ノードを識別する識別子である。ノード名は、人物名や企業名などノードの意味を表す文字列である。ウェイトは、接続情報生成部１２２によって算出されるノードの次数である。 FIG. 9 is a diagram showing an example of graph data.
The node table 131 lists the nodes included in the corporate network. The node table 131 is stored in the graph data storage unit 121. The node table 131 includes the items node ID, node name, and weight. The node ID is an identifier that identifies a node. The node name is a character string that describes the node, such as a person's name or a company name. The weight is the degree of the node, calculated by the connection information generation unit 122.

エッジテーブル１３２は、企業ネットワークに含まれるエッジを示す。エッジテーブル１３２は、グラフデータ記憶部１２１に記憶される。エッジテーブル１３２は、エッジＩＤ、始点、終点およびウェイトの項目を含む。エッジＩＤは、エッジを識別する識別子である。始点は、エッジの一端にあるノードを識別する識別子である。終点は、エッジの他端にあるノードを識別する識別子である。ウェイトは、接続情報生成部１２２によって両端のノードのウェイトから算出されるエッジのウェイトである。 The edge table 132 lists the edges included in the corporate network. The edge table 132 is stored in the graph data storage unit 121. The edge table 132 includes the items edge ID, start point, end point, and weight. The edge ID is an identifier that identifies an edge. The start point is an identifier of the node at one end of the edge. The end point is an identifier of the node at the other end of the edge. The weight is the edge weight calculated by the connection information generation unit 122 from the weights of the nodes at both ends.

図１０は、接続情報の例を示す図である。
隣接行列１３３は、企業ネットワークに含まれるノードの間に存在するエッジを示す接続情報である。隣接行列１３３は、接続情報記憶部１２３に記憶される。生成時に簡略化フラグがＯＦＦに指定された場合、接続情報記憶部１２３に保存される隣接行列１３３は、簡略化前の企業ネットワークに相当する元隣接行列である。生成時に簡略化フラグがＯＮに指定された場合、接続情報記憶部１２３に保存される隣接行列１３３は、簡略化後の企業ネットワークに相当する簡略化隣接行列である。 FIG. 10 is a diagram showing an example of connection information.
The adjacency matrix 133 is connection information indicating edges existing between nodes included in the corporate network. The adjacency matrix 133 is stored in the connection information storage unit 123 . When the simplification flag is designated as OFF at the time of generation, the adjacency matrix 133 stored in the connection information storage unit 123 is the original adjacency matrix corresponding to the corporate network before simplification. If the simplification flag is set to ON at the time of generation, the adjacency matrix 133 stored in the connection information storage unit 123 is a simplified adjacency matrix corresponding to the corporate network after simplification.

The rows of the adjacency matrix 133 are associated with the node IDs that identify the nodes included in the corporate network. Likewise, the columns of the adjacency matrix 133 are associated with those node IDs. Each element of the adjacency matrix 133 indicates the weight of the edge between the node corresponding to the element's row number and the node corresponding to its column number. An element value of 0 indicates that no edge exists. The connection information generation unit 122 first generates the original adjacency matrix representing the corporate network before simplification. When the simplification flag is set to ON, the connection information generation unit 122 updates the original adjacency matrix to generate the simplified adjacency matrix representing the corporate network after simplification.

Next, the simplification of the corporate network by the connection information generation unit 122 will be described. A first method, which adds a new node to the core as the central node, is described first. A second method, which selects one of the existing hub nodes in the core as the central node, is described after that.

FIG. 11 is a flowchart showing an example procedure for graph simplification.
(S10) The connection information generation unit 122 reads, from the graph data storage unit 121, graph data generated in advance from public investment information disclosed by EDINET or TDNET. The connection information generation unit 122 also reads option information entered by the user. The option information includes a simplification flag and a degree threshold.

(S11) Based on the graph data, the connection information generation unit 122 counts the edges connected to each node and takes the resulting degree of each node as its weight (degree weight). For each edge, the connection information generation unit 122 calculates the weight of the edge by averaging the weights of the nodes at its two ends.
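The weighting in step S11 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the representation of edges as `(start, end)` pairs, and the toy graph are all assumptions made for this example.

```python
from collections import Counter

def degree_weights(edges):
    """Count the edges incident to each node; the degree is the node weight."""
    degree = Counter()
    for start, end in edges:
        degree[start] += 1
        degree[end] += 1
    return dict(degree)

def edge_weights(edges, degree):
    """Weight each edge by the average of the degree weights of its two ends."""
    return {(s, e): (degree[s] + degree[e]) / 2 for s, e in edges}

# Hypothetical toy graph: node IDs are strings, edges are (start, end) pairs.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degree = degree_weights(edges)
weights = edge_weights(edges, degree)
```

Here node `A` has degree 3 and node `D` has degree 1, so the edge between them receives the weight 2.0.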

(S12) The connection information generation unit 122 generates the original adjacency matrix representing the corporate network before simplification. The number of rows and the number of columns of the original adjacency matrix both equal the number of nodes indicated by the graph data. The elements of the original adjacency matrix are the edge weights calculated in step S11.
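A minimal sketch of building the original adjacency matrix of step S12 from weighted edges. It assumes a list-of-lists matrix and a symmetric (undirected) graph; the description does not state how the matrix is stored, and the helper name `build_adjacency_matrix` is hypothetical.

```python
def build_adjacency_matrix(node_ids, weighted_edges):
    """Return (matrix, index) where matrix[i][j] is the edge weight, 0 if no edge."""
    index = {node_id: i for i, node_id in enumerate(node_ids)}
    n = len(node_ids)
    matrix = [[0.0] * n for _ in range(n)]
    for (start, end), weight in weighted_edges.items():
        i, j = index[start], index[end]
        matrix[i][j] = weight
        matrix[j][i] = weight  # assumption: edges are treated as undirected
    return matrix, index

# Hypothetical input: three nodes and one weighted edge between A and B.
matrix, index = build_adjacency_matrix(["A", "B", "C"], {("A", "B"): 2.0})
```

Rows and columns follow the order of `node_ids`, so `index` is what maps a node ID back to its row and column number.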

(S13) The connection information generation unit 122 determines whether the simplification flag included in the option information read in step S10 is ON. If the simplification flag is ON, the process proceeds to step S14. If the simplification flag is OFF, the process proceeds to step S21.

(S14) Based on the graph data or the original adjacency matrix, the connection information generation unit 122 extracts from the corporate network, as hub nodes, the nodes whose degree is greater than a threshold. The threshold used here is the degree threshold included in the option information read in step S10.
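Hub extraction in step S14 can be sketched directly on the adjacency matrix by counting the nonzero elements in each row. The function name and the toy matrix are assumptions for illustration only.

```python
def extract_hubs(matrix, degree_threshold):
    """Return the row indices of nodes whose degree exceeds the threshold."""
    hubs = []
    for i, row in enumerate(matrix):
        degree = sum(1 for value in row if value != 0)  # nonzero element = edge
        if degree > degree_threshold:
            hubs.append(i)
    return hubs

# Hypothetical 4-node graph with edges 0-1, 0-2, 0-3, 1-2 (all of weight 1).
n = 4
matrix = [[0.0] * n for _ in range(n)]
for a, b in [(0, 1), (0, 2), (0, 3), (1, 2)]:
    matrix[a][b] = matrix[b][a] = 1.0

hubs = extract_hubs(matrix, degree_threshold=1)
```

The comparison is strict, matching the wording "greater than a threshold": node 3 has degree 1 and is not a hub when the threshold is 1.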

(S15) The connection information generation unit 122 extracts connected subgraphs in which two or more hub nodes are linked by edges. A connected subgraph can be identified, for example, by extracting from the original adjacency matrix only the rows and columns corresponding to hub nodes. Two or more mutually disconnected connected subgraphs may be extracted at this point. Among the extracted connected subgraphs, the connection information generation unit 122 determines the one with the largest number of hub nodes to be the core.
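One way to realize step S15 is a breadth-first search restricted to hub nodes, keeping the largest component that contains at least two hubs. The sketch below assumes a list-of-lists adjacency matrix; `find_core` and the example graph are hypothetical, and ties between equally large components are broken arbitrarily here.

```python
from collections import deque

def find_core(matrix, hubs):
    """Return the largest connected group of two or more hubs (sorted indices)."""
    hub_set = set(hubs)
    seen = set()
    components = []
    for start in hubs:
        if start in seen:
            continue
        seen.add(start)
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            # Only edges between two hub nodes are followed.
            for v in hub_set:
                if v not in seen and matrix[u][v] != 0:
                    seen.add(v)
                    component.append(v)
                    queue.append(v)
        if len(component) >= 2:
            components.append(sorted(component))
    return max(components, key=len, default=[])

# Hypothetical 6-node example: hubs 0-1-2 form a chain, hubs 4-5 form a pair,
# and node 3 is a non-hub neighbor of hub 2.
n = 6
matrix = [[0.0] * n for _ in range(n)]
for a, b in [(0, 1), (1, 2), (4, 5), (2, 3)]:
    matrix[a][b] = matrix[b][a] = 1.0

core = find_core(matrix, hubs=[0, 1, 2, 4, 5])
```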

(S16) The connection information generation unit 122 calculates the average distance between the hub nodes belonging to the core. For example, the connection information generation unit 122 takes each hub node belonging to the core as a start node and calculates its distance to the other hub nodes with a route search algorithm such as Dijkstra's algorithm. The connection information generation unit 122 may exhaustively calculate the distances for all pairs of hub nodes belonging to the core and take their average. Alternatively, it may sample some pairs of hub nodes belonging to the core and obtain an estimate of the average distance.
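The exhaustive variant of step S16 can be sketched with a heap-based Dijkstra search. Two assumptions are made here that the description leaves open: the matrix element values are used as edge lengths, and shortest paths are searched over the whole original matrix rather than only inside the core. The function names are hypothetical.

```python
import heapq
from itertools import combinations

def dijkstra(matrix, source):
    """Shortest distances from source to every node; 0 in the matrix means no edge."""
    dist = [float("inf")] * len(matrix)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in enumerate(matrix[u]):
            if w != 0 and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def average_core_distance(matrix, core):
    """Average shortest distance over all unordered pairs of core hub nodes."""
    dist_from = {h: dijkstra(matrix, h) for h in core}
    pairs = list(combinations(core, 2))
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)

# Hypothetical triangle: 0-1 and 1-2 have length 2, the direct edge 0-2 has length 6.
matrix = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 2.0],
    [6.0, 2.0, 0.0],
]
average = average_core_distance(matrix, [0, 1, 2])
```

In this example the shortest path from node 0 to node 2 runs through node 1 with length 4, so the average over the three pairs is 8/3. The sampling variant would replace `pairs` with a random subset of the pairs.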

(S17) The connection information generation unit 122 adds a new central node to the core. For example, the connection information generation unit 122 adds one row and one column to the adjacency matrix.
(S18) The connection information generation unit 122 deletes the edges between hub nodes in the core, that is, the edges from one hub node in the core to another hub node in the core. For example, among the elements of the adjacency matrix, the connection information generation unit 122 rewrites to 0 the value of every element whose row number corresponds to a hub node in the core and whose column number corresponds to a hub node in the core. The connection information generation unit 122 then adds an edge between each hub node in the core and the central node. In the adjacency matrix, each added edge corresponds to the element whose row number corresponds to the central node and whose column number corresponds to the hub node in the core, and to the element whose row number corresponds to that hub node and whose column number corresponds to the central node.

(S19) The connection information generation unit 122 sets the weights of the edges added in step S18. The weight of each added edge is one half of the average distance calculated in step S16. For example, among the elements of the adjacency matrix, the connection information generation unit 122 sets the value of each element corresponding to an added edge identified in step S18 to one half of the average distance.
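Steps S17 to S19 can be sketched together as one matrix update. This is an illustration under assumptions: the matrix is a list of lists, the average distance from step S16 is passed in as a parameter, and the function name `add_central_node` is hypothetical.

```python
def add_central_node(matrix, core, average_distance):
    """Return (new_matrix, center_index) with the core rewired into a star."""
    n = len(matrix)
    # S17: append one column to every row and one new row for the central node.
    new = [row[:] + [0.0] for row in matrix]
    new.append([0.0] * (n + 1))
    center = n
    # S18: delete every edge between two hub nodes of the core.
    for i in core:
        for j in core:
            new[i][j] = 0.0
    # S18/S19: connect each hub to the center with half the average distance.
    for h in core:
        new[center][h] = average_distance / 2
        new[h][center] = average_distance / 2
    return new, center

# Hypothetical graph: hubs 0, 1, 2 form a triangle of weight 2;
# peripheral node 3 hangs off hub 0 with weight 1.
matrix = [
    [0.0, 2.0, 2.0, 1.0],
    [2.0, 0.0, 2.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
simplified, center = add_central_node(matrix, [0, 1, 2], average_distance=2.0)
```

Because each added edge carries half the average distance, a path from one hub to another through the central node has a total length equal to the average distance. Edges between hubs and peripheral nodes, such as the one between nodes 0 and 3, are left untouched.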

(S20) The connection information generation unit 122 outputs the simplified adjacency matrix generated through steps S14 to S20 to the connection information storage unit 123. Graph simplification then ends.
(S21) The connection information generation unit 122 outputs the original adjacency matrix generated in step S12 to the connection information storage unit 123.

FIG. 12 is a flowchart showing another example procedure for graph simplification.
Steps S30 to S35, S40, and S41 are the same as steps S10 to S15, S20, and S21 of FIG. 11, so their description is omitted.

(S36) The connection information generation unit 122 calculates the closeness centrality of each hub node belonging to the core. For example, the connection information generation unit 122 takes each hub node belonging to the core as a start node and calculates its distance to the other hub nodes with a route search algorithm such as Dijkstra's algorithm. For each hub node, the connection information generation unit 122 then calculates, for example, the reciprocal of the sum of its distances to the other hub nodes, or the reciprocal of its average distance to them, as the closeness centrality.
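The closeness centrality of step S36 can be sketched as the reciprocal of the summed distances. The sketch assumes the pairwise hub distances have already been computed (for example by Dijkstra's algorithm) and are supplied as a nested dictionary; that layout and the function name are assumptions for illustration.

```python
def closeness_centrality(distances, core):
    """Map each core hub to 1 / (sum of its distances to the other core hubs)."""
    centrality = {}
    for h in core:
        total = sum(distances[h][other] for other in core if other != h)
        centrality[h] = 1.0 / total
    return centrality

# Hypothetical precomputed shortest distances between three core hubs.
distances = {
    0: {1: 2.0, 2: 4.0},
    1: {0: 2.0, 2: 2.0},
    2: {0: 4.0, 1: 2.0},
}
centrality = closeness_centrality(distances, [0, 1, 2])
```

Hub 1 has the smallest distance sum (4.0) and therefore the largest centrality (0.25). Using the reciprocal of the average distance instead only rescales every value by the same factor, so the ranking of the hubs is unchanged.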

(S37) From among the hub nodes in the core, the connection information generation unit 122 selects as the central node the hub node with the largest closeness centrality calculated in step S36.
(S38) The connection information generation unit 122 deletes the edges between hub nodes in the core, that is, the edges from one hub node in the core to another hub node in the core. The connection information generation unit 122 then adds edges between the central node selected in step S37 and the other hub nodes in the core. In the adjacency matrix, each added edge corresponds to the element whose row number corresponds to the central node and whose column number corresponds to the other hub node, and to the element whose row number corresponds to the other hub node and whose column number corresponds to the central node.

(S39) The connection information generation unit 122 sets the weights of the edges added in step S38. The weight of each added edge is the distance between the hub nodes at its two ends. This distance is calculated with reference to the connections of the corporate network before simplification, and the distances obtained while computing the closeness centrality in step S36 can be reused. For example, among the elements of the adjacency matrix, the connection information generation unit 122 sets the value of each element corresponding to an added edge identified in step S38 to the distance between the nodes at its two ends.
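Steps S37 to S39 can be sketched as a single update that picks the most central hub and rewires the core around it. As assumptions for this illustration, the pre-simplification distances and the closeness centralities are passed in as dictionaries, the matrix is a list of lists, and the function name is hypothetical.

```python
def star_core_with_existing_hub(matrix, core, distances, centrality):
    """Return (new_matrix, center) using an existing hub as the central node."""
    # S37: the hub with the largest closeness centrality becomes the center.
    center = max(core, key=lambda h: centrality[h])
    new = [row[:] for row in matrix]
    # S38: delete every edge between two hub nodes of the core.
    for i in core:
        for j in core:
            new[i][j] = 0.0
    # S38/S39: connect the center to each other hub, weighted by the
    # distance measured on the network before simplification.
    for h in core:
        if h != center:
            new[center][h] = distances[center][h]
            new[h][center] = distances[center][h]
    return new, center

# Hypothetical triangle of hubs: 0-1 and 1-2 have length 2, 0-2 has length 6.
matrix = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 2.0],
    [6.0, 2.0, 0.0],
]
distances = {
    0: {1: 2.0, 2: 4.0},
    1: {0: 2.0, 2: 2.0},
    2: {0: 4.0, 1: 2.0},
}
centrality = {0: 1 / 6, 1: 1 / 4, 2: 1 / 6}
simplified, center = star_core_with_existing_hub(matrix, [0, 1, 2], distances, centrality)
```

Unlike the first method, the matrix keeps its original size, because no node is added. In this example the direct edge between hubs 0 and 2 disappears and both remain reachable through hub 1.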

According to the analysis apparatus 100 of the second embodiment, hub nodes with a large degree are detected in the corporate network, and a core in which a plurality of hub nodes are linked is detected. A central node is then set in the core, and the corporate network is simplified so that the core becomes a star-shaped core in which the hub nodes belonging to the core are connected to one another through the core. This reduces the amount of computation needed to search for routes that pass through the core and speeds up route searches on the corporate network. In addition, unlike an extreme simplification such as replacing the core with a single virtual node, the hub nodes belonging to the core are not deleted, and the information on the connections between the hub nodes belonging to the core and the peripheral nodes belonging to the periphery is maintained. A decrease in route search accuracy can therefore be suppressed.

REFERENCE SIGNS LIST
10 information processing device
11 storage unit
12 processing unit
13, 14 graph data
15 core node set
16 central node

Claims

A graph simplification method in which a computer:
acquires graph data including a plurality of nodes and a plurality of edges connecting the nodes;
detects, from among the plurality of nodes indicated by the graph data, a plurality of hub nodes each having a number of connected edges greater than a threshold, and determines a core node set including three or more hub nodes that are connected by edges among the detected plurality of hub nodes; and
selects a central node from among the hub nodes belonging to the core node set based on the distance between each hub node belonging to the core node set and the other hub nodes belonging to the core node set, and generates other graph data that includes, in place of the edges connecting the hub nodes belonging to the core node set to one another, edges connecting the central node with the hub nodes belonging to the core node set other than the central node.

The graph simplification method of claim 1, wherein, in setting the central node, the weight of each edge connecting the central node with a hub node belonging to the core node set other than the central node is determined based on the distance between the central node and that hub node.

The graph simplification method of claim 1, wherein the computer further searches, based on the other graph data, for a path between a first node that does not belong to the core node set and a second node that does not belong to the core node set.

The graph simplification method of claim 1, wherein:
at least some of the plurality of nodes represent companies;
at least some of the plurality of edges represent relationships between companies; and
the plurality of hub nodes represent hub companies that have relationships with more other companies than the companies represented by nodes other than the plurality of hub nodes.

A graph simplification program that causes a computer to execute a process comprising:
acquiring graph data including a plurality of nodes and a plurality of edges connecting the nodes;
detecting, from among the plurality of nodes indicated by the graph data, a plurality of hub nodes each having a number of connected edges greater than a threshold, and determining a core node set including three or more hub nodes that are connected by edges among the detected plurality of hub nodes; and
selecting a central node from among the hub nodes belonging to the core node set based on the distance between each hub node belonging to the core node set and the other hub nodes belonging to the core node set, and generating other graph data that includes, in place of the edges connecting the hub nodes belonging to the core node set to one another, edges connecting the central node with the hub nodes belonging to the core node set other than the central node.

An information processing device comprising:
a storage unit that stores graph data including a plurality of nodes and a plurality of edges connecting the nodes; and
a processing unit that detects, from among the plurality of nodes indicated by the graph data, a plurality of hub nodes each having a number of connected edges greater than a threshold, determines a core node set including three or more hub nodes that are connected by edges among the detected plurality of hub nodes, selects a central node from among the hub nodes belonging to the core node set based on the distance between each hub node belonging to the core node set and the other hub nodes belonging to the core node set, and generates other graph data that includes, in place of the edges connecting the hub nodes belonging to the core node set to one another, edges connecting the central node with the hub nodes belonging to the core node set other than the central node.