JP6638919B2

JP6638919B2 - Clustering device, clustering method, and clustering program

Info

Publication number: JP6638919B2
Application number: JP2016095846A
Authority: JP
Inventors: 淳也新井; 真鬼塚
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2016-05-12
Filing date: 2016-05-12
Publication date: 2020-02-05
Anticipated expiration: 2036-05-12
Also published as: JP2017204161A

Description

The present invention relates to a clustering device, a clustering method, and a clustering program.

Conventionally, clustering by sequential node aggregation is known as a graph clustering method (see, for example, Patent Document 1). In this method, all nodes and edges of the graph are first input. An arbitrary node is then selected, and the selected node is aggregated into one of its adjacent nodes. The adjacent node chosen as the aggregation destination is the one that maximizes the modularity improvement produced by the aggregation.

Parallel clustering by graph partitioning is also known as a clustering method for parallel computers that use multiple CPUs or a multi-core CPU (see, for example, Patent Document 2). In this method, the graph is first divided into a plurality of subgraphs, and each subgraph is assigned to one of the threads operating in parallel. Each thread classifies the nodes of its assigned subgraph so as to maximize modularity. After classification has completed in all threads, the nodes classified into the same cluster are aggregated into one node.

Patent Document 1: JP 2013-156698 A
Patent Document 2: JP 2014-160336 A

However, conventional clustering methods have the problem that, in some cases, they cannot cluster graph data both at high speed and with a result of high modularity.

For example, because clustering by sequential node aggregation runs in a single thread, it may be unable to cluster a large-scale graph as quickly as clustering performed with a plurality of threads.

In addition, parallel clustering by graph partitioning increases the modularity within each of the divided subgraphs, so the modularity of the graph as a whole may not become large. Suppose a first node and a second node are placed in different subgraphs. Even if aggregating the second node into the first node would yield the greatest modularity, the first node and the second node are never aggregated.

A clustering device according to the present invention includes: an input unit that receives input of graph data having a plurality of nodes; an allocating unit that divides the plurality of nodes into N groups (N is an integer of 2 or more) and assigns to each of the N groups one of N threads, each thread being a unit of processing execution; a selecting unit that, in each thread, selects nodes in a predetermined order from the nodes included in the group assigned by the allocating unit; a changing unit that, in each thread, each time a node is selected by the selecting unit, changes information about the selected node to a predetermined state; an extracting unit that, in each thread, each time the information about the selected node is changed by the changing unit, extracts a classification destination, which is the node or cluster, among the nodes or clusters adjacent to the selected node, that maximizes modularity (a measure of the accuracy of the clustering result) when classified into the same cluster as the selected node; a classifying unit that, in each thread, each time a classification destination is extracted by the extracting unit, determines whether information about the classification destination has been changed to the predetermined state by another thread and, when it determines that the information has not been changed, classifies the selected node and the classification destination into the same cluster; and an aggregating unit that, in each thread, aggregates into one node a plurality of nodes that belong to the same cluster among the nodes classified by the classifying unit.

A clustering method according to the present invention is a clustering method executed by a clustering device and includes: an input step of receiving input of graph data having a plurality of nodes; an allocating step of dividing the plurality of nodes into N groups and assigning to each of the N groups one of N threads, each thread being a unit of processing execution; a selecting step of selecting, in each thread, nodes in a predetermined order from the nodes included in the group assigned in the allocating step; a changing step of changing, in each thread, each time a node is selected in the selecting step, information about the selected node to a predetermined state; an extracting step of extracting, in each thread, each time the information about the selected node is changed in the changing step, a classification destination, which is the node or cluster, among the nodes or clusters adjacent to the selected node, that maximizes modularity (a measure of the accuracy of the clustering result) when classified into the same cluster as the selected node; a classifying step of determining, in each thread, each time a classification destination is extracted in the extracting step, whether information about the classification destination has been changed to the predetermined state by another thread and, when it is determined that the information has not been changed, classifying the selected node and the classification destination into the same cluster; and an aggregating step of aggregating into one node, in each thread, a plurality of nodes that belong to the same cluster among the nodes classified in the classifying step.

According to the present invention, graph data can be clustered at high speed while obtaining a clustering result with high modularity.

FIG. 1 is a diagram illustrating an example of the configuration of the clustering device according to the first embodiment.
FIG. 2 is a diagram for explaining the assignment of node groups to threads.
FIG. 3 is a diagram for explaining the selection of a node and the extraction of a cluster.
FIG. 4 is a diagram for explaining node classification and node aggregation.
FIG. 5 is a flowchart illustrating an example of the processing flow of the clustering device according to the first embodiment.
FIG. 6 is a flowchart illustrating an example of the processing flow in each thread of the clustering device according to the first embodiment.
FIG. 7 is a diagram illustrating an example of the data structure of a cluster member list.
FIG. 8 is a flowchart illustrating an example of the processing flow of the classifying unit in the clustering device according to the first embodiment.
FIG. 9 is a diagram illustrating an example in which clustering with low modularity is performed.
FIG. 10 is a diagram illustrating an example of a computer that implements the clustering device by executing a program.

Hereinafter, embodiments of the clustering device, clustering method, and clustering program according to the present application will be described in detail with reference to the drawings. These embodiments do not limit the present invention.

[Configuration of First Embodiment]
First, the configuration of the clustering device according to the first embodiment will be described with reference to FIG. 1. As shown in FIG. 1, the clustering device 10 performs clustering on input graph data 20 and outputs a clustering result 21. Here, the graph data 20 is data representing a plurality of nodes and information on the edges of each node. The clustering result 21 is, for example, data representing the cluster into which each node included in the graph data 20 has been classified. As shown in FIG. 1, the clustering device 10 includes an input unit 11, an output unit 12, a control unit 13, and a storage unit 14.

The input unit 11 receives input of the graph data 20. For example, the input unit 11 may receive the input by receiving graph data 20 stored in an external storage device. In this case, the input unit 11 is, for example, a NIC (Network Interface Card). The input unit 11 may also receive graph data 20 entered by a user. In this case, the input unit 11 is, for example, an input device such as a mouse or a keyboard.

The output unit 12 outputs the clustering result 21. The output unit 12 may transmit the clustering result 21 to an external storage device. In this case, the output unit 12 is, for example, a NIC. The output unit 12 may also display the clustering result 21 on a screen. In this case, the output unit 12 is, for example, a display device such as a monitor.

The control unit 13 controls the clustering device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 has an internal memory for storing programs that define various processing procedures and control data, and executes each process using the internal memory. The control unit 13 functions as various processing units when the various programs run.

In the present embodiment, the control unit 13 executes processing in a plurality of threads, thereby causing the clustering device 10 to function as a parallel computer. For example, the control unit 13 is implemented by a multi-CPU configuration composed of a plurality of CPUs or by a multi-core CPU having a plurality of cores. The control unit 13 includes an allocating unit 131, a selecting unit 132, a changing unit 133, an aggregating unit 134, an extracting unit 135, and a classifying unit 136.

The allocating unit 131 divides the plurality of nodes into N groups (N is an integer of 2 or more) and assigns to each of the N groups one of N threads, each thread being a unit of processing execution. Because the processing by the allocating unit 131 completes in a very short time relative to the time required for the processing as a whole, the allocating unit 131 may execute its processing in a single thread.

For example, when the graph data 20 contains n nodes (n is an integer of 2 or more) and there are N threads, the allocating unit 131 may divide the nodes so that the first n/N nodes form group V_1, the next n/N nodes form group V_2, and so on. In this case, the allocating unit 131 assigns, for example, the nodes of group V_1 to the first thread, the nodes of group V_2 to the second thread, and so on.

Here, N corresponds to the degree of parallelism of the processing performed in the subsequent steps and is generally equal to the number of cores of the CPU, that is, the total number of hardware threads. As the number of threads increases, fewer nodes are assigned to each thread, which shortens the processing time of each thread and of the device as a whole.

The assignment of node groups to threads by the allocating unit 131 will be described with reference to FIG. 2. As shown in FIG. 2, the graph data 20 has nodes 101 to 103, 201 to 203, 301 to 303, and 401 to 403. The control unit 13 performs processing in four threads, t1 to t4. In this case, the allocating unit 131 assigns, for example, nodes 101 to 103 to thread t1, nodes 201 to 203 to thread t2, nodes 301 to 303 to thread t3, and nodes 401 to 403 to thread t4. As FIG. 2 shows, the nodes within a node group need not be adjacent to one another.
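The contiguous split performed by the allocating unit 131 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_groups` is hypothetical, and the way leftover nodes are spread when n is not a multiple of N is an assumption, since the description only gives the n/N case.

```python
def split_into_groups(nodes, num_threads):
    """Split `nodes` into `num_threads` contiguous groups V_1 .. V_N.

    Assumption: when len(nodes) is not divisible by num_threads, the
    first (len(nodes) % num_threads) groups receive one extra node.
    """
    if num_threads < 2:
        raise ValueError("N must be an integer of 2 or more")
    base, extra = divmod(len(nodes), num_threads)
    groups = []
    start = 0
    for i in range(num_threads):
        size = base + (1 if i < extra else 0)
        groups.append(nodes[start:start + size])
        start += size
    return groups


# Nodes and thread count taken from the FIG. 2 example (threads t1 to t4).
nodes = [101, 102, 103, 201, 202, 203, 301, 302, 303, 401, 402, 403]
groups = split_into_groups(nodes, 4)
```

With the FIG. 2 nodes, `groups[0]` holds the nodes for thread t1, `groups[1]` those for thread t2, and so on.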

The selecting unit 132, the changing unit 133, the aggregating unit 134, the extracting unit 135, and the classifying unit 136 perform their processing in parallel in each thread. That is, while the processing by the selecting unit 132, the changing unit 133, the aggregating unit 134, the extracting unit 135, and the classifying unit 136 is being performed in thread t1, for example, the same processing is also being performed in thread t2.

Each node is classified by being given a label that identifies the cluster to which it belongs. At the point when the allocating unit 131 assigns nodes to threads, no classification into clusters has yet been performed, so each node is assumed to carry a distinct label. In other words, each node is initially classified into a cluster consisting of that single node.

The selecting unit 132 selects, in each thread, nodes in a predetermined order from the nodes included in the group assigned by the allocating unit 131. When the assigned node group contains an unselected node, the selecting unit 132 selects a node from the node group. The order in which the selecting unit 132 selects nodes can be set arbitrarily.

The changing unit 133 changes, in each thread, information about the selected node (the node selected by the selecting unit 132) to a predetermined state each time a node is selected by the selecting unit 132. For example, the changing unit 133 sets the cluster into which the selected node is classified to the predetermined state, namely a change-prohibited state. The changing unit 133 attaches a change prohibition mark indicating that the cluster must not be changed, for example by turning on a predetermined flag in the data representing the cluster.

The aggregating unit 134 aggregates, in each thread, a plurality of nodes that belong to the same cluster, among the nodes classified by the classifying unit 136, into one node. For example, after the changing unit 133 has changed the information about the selected node and before the extracting unit 135 extracts a classification destination, the aggregating unit 134 aggregates the selected node and the nodes classified into the same cluster as the selected node into one node.

When the cluster into which the selected node is classified contains a plurality of nodes, the aggregating unit 134 aggregates those nodes into the selected node. Specifically, the aggregating unit 134 deletes the nodes of the cluster other than the selected node and reattaches the edges of the deleted nodes to the selected node.
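The delete-and-reattach step can be sketched on an adjacency map as follows. The representation (`adj[a][b]` holding the weight of an undirected edge) and the name `aggregate_into` are assumptions made for illustration. Turning an edge between two members of the same cluster into a self-loop on the selected node is also an assumption; the description does not say how such edges are stored.

```python
def aggregate_into(adj, u, members):
    """Merge every node in `members` other than u into u, in place.

    adj is a symmetric map: adj[a][b] == adj[b][a] == weight of edge a-b.
    Assumption: an edge between merged nodes becomes a self-loop on u.
    """
    for x in members:
        if x == u:
            continue
        # Delete node x and walk over the edges it had.
        for y, w in adj.pop(x).items():
            if y != x:
                del adj[y][x]  # drop the reverse entry of the old edge
            if y == x or y == u:
                # Edge inside the merged cluster: keep it as a self-loop.
                adj[u][u] = adj[u].get(u, 0) + w
            else:
                # Reattach the edge x-y as u-y, summing parallel edges.
                adj[u][y] = adj[u].get(y, 0) + w
                adj[y][u] = adj[y].get(u, 0) + w
    return adj


# Hypothetical weights; node 101 is merged into the selected node 102.
adj = {
    101: {102: 1, 201: 1},
    102: {101: 1, 103: 2},
    103: {102: 2},
    201: {101: 1},
}
aggregate_into(adj, 102, [101, 102])
```

After the call, node 101 no longer exists and its edge to node 201 is attached to node 102.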

The extracting unit 135 extracts, in each thread, a classification destination each time the changing unit 133 changes the information about the selected node. The classification destination is the node or cluster, among the nodes or clusters adjacent to the selected node, that maximizes modularity (a measure of the accuracy of the clustering result) when classified into the same cluster as the selected node.

When the aggregating unit 134 performs aggregation, the extracting unit 135 executes its processing after the nodes have been aggregated. The nodes adjacent to the selected node, and the nodes included in clusters adjacent to the selected node, that are candidates for extraction by the extracting unit 135 may belong to the same group as the selected node or to a different group.

The selection of a node and the extraction of a cluster will be described with reference to FIG. 3. As shown in FIG. 3, when the selected node is node 101, the clusters adjacent to the selected node are clusters c1 to c4. Nodes 102 and 203 are regarded as clusters c3 and c5, respectively, each consisting of a single node. The node marked u in the figure is the selected node.

The extracting unit 135 extracts, from clusters c1 to c4, the cluster that yields the greatest modularity when node 101 is placed in the same cluster. To do so, the extracting unit 135 uses Equation (1) to calculate, for each of clusters c1 to c4 treated as if aggregated into a single node, the modularity improvement ΔQ(u, v) obtained when that aggregated node and the selected node are classified into the same cluster, and thereby extracts the cluster that results in the greatest modularity.

In Equation (1), u is the selected node and v is the node obtained by aggregating one of the adjacent clusters. In the example of FIG. 3, u is node 101 and v is the single node obtained by aggregating one of clusters c1 to c4. For cluster c3, v is node 102 itself. Further, m is the number of edges of the graph in its initial state, w_uv is the weight of the edge between nodes u and v, and d(u) and d(v) are the weighted degrees of nodes u and v, respectively.
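Equation (1) itself is not reproduced in this text. As a hedged illustration only, the sketch below uses the widely used modularity gain for merging two nodes, ΔQ(u, v) = 2 * (w_uv / (2m) - d(u) d(v) / (2m)^2), which uses exactly the symbols defined above; whether Equation (1) has this precise form is an assumption. The function names and the candidate values are hypothetical.

```python
def modularity_gain(w_uv, d_u, d_v, m):
    """Assumed form of the gain: 2 * (w_uv / 2m - d(u) * d(v) / (2m)^2)."""
    two_m = 2.0 * m
    return 2.0 * (w_uv / two_m - (d_u * d_v) / two_m ** 2)


def best_destination(candidates, d_u, m):
    """Pick the adjacent cluster with the largest positive gain.

    candidates maps a cluster id to (w_uv, d_v) for that cluster treated
    as one aggregated node. Returns (None, 0.0) when no gain exceeds 0,
    matching the description that no destination is extracted when the
    improvement is 0 or less.
    """
    best_id, best_gain = None, 0.0
    for cluster_id, (w_uv, d_v) in candidates.items():
        gain = modularity_gain(w_uv, d_u, d_v, m)
        if gain > best_gain:
            best_id, best_gain = cluster_id, gain
    return best_id, best_gain


# Hypothetical numbers: selected node u with d(u) = 2 in a graph with m = 4.
choice = best_destination({"c1": (1, 6), "c3": (1, 2)}, d_u=2, m=4)
```

Under this assumed formula, a lightly connected neighbor with a small weighted degree (c3 here) is preferred over a neighbor with a large weighted degree joined by an edge of the same weight.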

The classifying unit 136 determines, in each thread, each time the extracting unit 135 extracts a classification destination, whether the information about the classification destination has been changed to the predetermined state by another thread. When it determines that the information has not been changed, it classifies the selected node and the classification destination into the same cluster.

The classifying unit 136 performs classification through the following series of steps. Because the classification destination extracted by the extracting unit 135 may have been given a change prohibition mark by the changing unit 133 running in another thread, the classifying unit 136 first determines whether the classification destination carries a change prohibition mark. If it does not, the classifying unit 136 adds the selected node to the members of the destination cluster. The classifying unit 136 then changes the label of the selected node to the label of the cluster to which it was added, completing the classification.

Here, the classifying unit 136 performs the check for a change prohibition mark and the addition of the selected node to the cluster members as a single atomic operation. For example, the classifying unit 136 uses a CAS (Compare And Swap) instruction to determine whether the information about the classification destination has been changed and, when it determines that the information has not been changed, adds the selected node to the members of the same cluster as the classification destination.
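The "check the mark, then add the member" operation can be sketched as follows. Python exposes no CAS instruction, so this sketch substitutes a `threading.Lock` that makes the check and the append one indivisible step; it illustrates the required atomicity rather than the CAS-based mechanism itself. The `Cluster` class and its method names are hypothetical.

```python
import threading


class Cluster:
    """A cluster with a member list and a change prohibition mark."""

    def __init__(self, label):
        self.label = label
        self.members = [label]
        self.prohibited = False          # the change prohibition mark
        self._guard = threading.Lock()   # stand-in for a CAS instruction

    def mark(self):
        with self._guard:
            self.prohibited = True

    def unmark(self):
        with self._guard:
            self.prohibited = False

    def try_add_member(self, node):
        """Atomically: if not marked, add `node`; report success."""
        with self._guard:
            if self.prohibited:
                return False
            self.members.append(node)
            return True


c3 = Cluster(102)
first = c3.try_add_member(101)    # no mark yet, so the add succeeds
c3.mark()                         # e.g. another thread selects node 102
second = c3.try_add_member(103)   # rejected because of the mark
```

Because the check and the append happen under the same guard, a mark set by another thread can never slip in between them, which is the property the description requires of the atomic operation.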

Node classification and node aggregation will be described with reference to FIG. 4. In FIG. 4(a), for example, nodes 202, 301, 302, and 401 to 403 are nodes that have already been selected as selected nodes by the selecting unit 132.

As shown in FIG. 4(a), the selecting unit 132 first selects node 101 as the selected node. The changing unit 133 then regards node 101 as a cluster and attaches a change prohibition mark to it. At this point, no node other than node 101 belongs to the same cluster as node 101, so the aggregating unit 134 performs no aggregation. The extracting unit 135 then extracts, from clusters c1 to c4 adjacent to node 101, the cluster that gives the greatest modularity improvement when node 101 is classified into it. In the example of FIG. 4(a), the extracting unit 135 is assumed to extract cluster c3.

As shown in FIG. 4(b), when cluster c3 carries no change prohibition mark, the classifying unit 136 classifies node 101 into cluster c3. Next, as shown in FIG. 4(c), the selecting unit 132 selects node 102 as the selected node. The changing unit 133 then attaches a change prohibition mark to cluster c3, the cluster of node 102. At this point, as shown in FIG. 4(d), the aggregating unit 134 aggregates the nodes of cluster c3 into node 102. The extracting unit 135 then extracts, from clusters c1, c2, c4, and c5 adjacent to node 102, the cluster that gives the greatest modularity improvement when node 102 is classified into it. In the example of FIG. 4(d), the extracting unit 135 is assumed to extract cluster c4.

As shown in FIG. 4(e), when cluster c4 carries no change prohibition mark, the classifying unit 136 classifies node 102 into cluster c4. Finally, the clustering result shown in FIG. 4(f) is obtained. Note that the modularity improvement calculated by the extracting unit 135 may be 0 or less. In that case, the extracting unit 135 does not extract a classification destination, so the classifying unit 136 does not classify the selected node. In the example of FIG. 4(f), for instance, when node 103 is selected, it is conceivable that modularity does not improve whether the node of cluster c4, namely node 103, is classified into cluster c1 or into cluster c2.

[Processing of First Embodiment]
The processing flow of the clustering device 10 will be described with reference to FIG. 5. As shown in FIG. 5, the allocating unit 131 of the clustering device 10 divides the nodes of the input graph data 20 into N node groups V_1, V_2, ..., V_N (step S100). The allocating unit 131 also assigns each of the divided node groups to one of the N threads.

Next, the selecting unit 132, the changing unit 133, the aggregating unit 134, the extracting unit 135, and the classifying unit 136 perform, in parallel in each thread, the aggregation of the N node groups and the labeling that indicates clusters (step S110).

The processing performed on the i-th node group V_i in step S110 of FIG. 5 will now be described with reference to FIG. 6. As shown in FIG. 6, when node group V_i is not empty (step S200, Yes), the selecting unit 132 selects an arbitrary node u from node group V_i (step S210). When node group V_i is empty (step S200, No), the clustering device 10 ends the processing.

The changing unit 133 then attaches a change prohibition mark to cluster C_u, the cluster of node u (step S220). The change prohibition mark prevents other threads from changing cluster C_u while node u is being processed. Next, the aggregating unit 134 aggregates the nodes included in cluster C_u into node u (step S230).

The extracting unit 135 sets C_v to the adjacent cluster of node u that gives the greatest modularity improvement (step S240). In the present embodiment, aggregation is deferred, so nodes are not aggregated immediately after they are given a label indicating a cluster. The adjacent clusters therefore include both clusters consisting of a single aggregated node and clusters consisting of a plurality of nodes that have not yet been aggregated. For a cluster consisting of a plurality of unaggregated nodes, the modularity improvement is calculated on the assumption that the cluster has been aggregated into one node.
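One way to treat an unaggregated cluster "as if aggregated" is to sum over its members. The sketch below is an illustration under stated assumptions: the adjacency-map representation (`adj[a][b]` as the edge weight), the name `cluster_as_single_node`, and the choice to take the weighted degree of the virtual aggregated node as the sum of its members' weighted degrees are all assumptions, not details given in the description.

```python
def cluster_as_single_node(adj, u, members):
    """Return (w_uv, d_v) for a cluster viewed as one aggregated node v.

    w_uv: total weight of edges between u and any member of the cluster.
    d_v:  sum of the members' weighted degrees (assumed convention).
    """
    members = set(members)
    w_uv = sum(w for y, w in adj[u].items() if y in members)
    d_v = sum(sum(adj[x].values()) for x in members)
    return w_uv, d_v


# Hypothetical weights: nodes 201 and 202 form one unaggregated cluster
# adjacent to the selected node 101.
adj = {
    101: {102: 1, 201: 1, 202: 1},
    102: {101: 1},
    201: {101: 1, 202: 1},
    202: {101: 1, 201: 1},
}
w_uv, d_v = cluster_as_single_node(adj, 101, [201, 202])
```

The resulting pair can be fed into whatever gain formula is used for a single node v, without physically merging the cluster first.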

The classification unit 136 determines whether modularity improves when node u is classified into cluster C_v (step S250). If the classification unit 136 determines that modularity does not improve (step S250, No), it deletes node u from the node group V_i (step S300), removes the change prohibition mark from cluster C_u (step S310), and returns the processing to step S200. The classification unit 136 determines that modularity improves when, for example, the modularity improvement is greater than 0.

If the classification unit 136 determines that modularity improves (step S250, Yes), it atomically executes the operation "if cluster C_v has no change prohibition mark, add node u to the members of cluster C_v" (step S260). If the atomic change does not succeed (step S270, No), the classification unit 136 removes the change prohibition mark from cluster C_u (step S310) and returns the processing to step S200. The atomic change may fail because of, for example, processing by another thread.

On the other hand, if the atomic change succeeds (step S270, Yes), the classification unit 136 labels node u so that it is classified into the same cluster as the nodes of cluster C_v (step S280). The classification unit 136 then deletes node u from the node group V_i (step S290) and returns the processing to step S200. Thereafter, the classification processing continues for the unselected nodes of the node group V_i.
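The control flow of FIG. 6 (steps S200 to S310) can be summarized by the following Python sketch. It shows only the branching structure: the operations of the individual units are passed in as callables, and every name here (`process_group`, `mark`, `unmark`, `aggregate`, `best_cluster`, `try_add_member`, `set_label`) is a hypothetical stand-in rather than part of the embodiment. Because step S210 allows any node to be selected, the sketch takes nodes from the front of a queue and re-queues a node at the back when the atomic change fails.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional, Tuple

Node = Hashable


def process_group(
    group: Iterable[Node],
    mark: Callable[[Node], None],            # S220: mark cluster C_u of u
    unmark: Callable[[Node], None],          # S310: remove the mark from C_u
    aggregate: Callable[[Node], None],       # S230: aggregate C_u into u
    best_cluster: Callable[[Node], Tuple[Optional[Node], float]],  # S240
    try_add_member: Callable[[Node, Node], bool],                  # S260
    set_label: Callable[[Node, Node], None],                       # S280
) -> None:
    """Run the per-thread loop on one node group V_i (hypothetical sketch)."""
    pending = deque(group)
    while pending:                        # S200: is V_i non-empty?
        u = pending.popleft()             # S210: select a node u
        mark(u)                           # S220
        aggregate(u)                      # S230
        c_v, gain = best_cluster(u)       # S240
        if c_v is None or gain <= 0:      # S250: No
            unmark(u)                     # S300 (u is not re-queued), S310
            continue
        if not try_add_member(c_v, u):    # S260, S270: No
            unmark(u)                     # S310
            pending.append(u)             # u stays in V_i and is retried later
            continue
        set_label(u, c_v)                 # S280; S290: u was already dequeued
```

As in the flowchart, the sketch places no bound on retries, and the mark on C_u is not removed on the success path.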

Here, in some CPUs, the amount of data that can be rewritten by a CAS (compare-and-swap) instruction is limited to 16 bytes. In this case, to realize atomic processing with a CAS instruction and without using a lock, it is necessary to adopt a data structure in which a change to the members of a cluster can be performed by rewriting 16 bytes or less.

Therefore, in the present embodiment, a singly linked list structure that supports atomic operations, as shown in FIG. 7, is used as the data structure for representing the members of a cluster (reference: Timothy L. Harris, "A Pragmatic Implementation of Non-blocking Linked-Lists," in Proceedings of the 15th International Conference on Distributed Computing, DISC '01, pages 300-314, London, UK, 2001, Springer-Verlag). FIG. 7 is a diagram illustrating an example of the data structure of a cluster member list.

As shown in FIG. 7, the dummy element (head) representing the head of the singly linked list has a pointer to the next element of the list (next) and a change prohibition flag (is_readonly). Each subsequent list element has a pointer to the next element and arbitrary information such as a node identifier. head represents a cluster, and each subsequent list element represents a node that is a member of that cluster. The changing unit 133 puts a change prohibition mark on a cluster by setting the change prohibition flag to 1.

Assuming Intel's x86-64 architecture, which is now commonly used in personal computers and server devices, a pointer is 8 bytes, and the change prohibition flag only needs to represent 0 (change permitted) or 1 (change prohibited). Therefore, even when the pointer to the next element and the change prohibition flag are combined, the amount of data rewritten when a cluster member is changed can be kept to 16 bytes or less, which can be modified by CAS.

The processing of step S260 in FIG. 6 using the data structure of FIG. 7 is described with reference to FIG. 8. FIG. 8 is a flowchart illustrating an example of the processing flow of the classification unit in the clustering device according to the first embodiment.

As shown in FIG. 8, the classification unit 136 creates two copies of head and stores them in the variables head_old and head_new (step S400). Next, the classification unit 136 allocates a memory area for a new list element and stores the allocated memory area in the variable elem (step S410).

The classification unit 136 assigns 0 to is_readonly of both head_old and head_new, and assigns the address of elem to next of head_new (step S420). The classification unit 136 then sets information such as the node identifier of the node to be added to the cluster in elem, and assigns the value of next of head_old to next of elem (step S430).

Here, the classification unit 136 atomically executes the operation "if head = head_old, assign head_new to head" with a CAS instruction (step S440). Because head_old is a copy of head whose is_readonly was set to 0 in step S420, head = head_old holds unless is_readonly of head has been set to 1 by the changing unit 133 running in another thread. Conversely, if is_readonly of head has been set to 1 by the changing unit 133 running in another thread, head = head_old does not hold.
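Steps S400 to S440 can be illustrated with the following Python sketch. Python has no hardware CAS instruction, so the sketch emulates the atomicity of CAS with a `threading.Lock` inside `compare_and_swap`; this is an assumption made only so that the example runs, whereas the embodiment uses a lock-free CAS instruction on a head of 16 bytes or less. The class and function names (`Elem`, `Head`, `Cluster`, `try_add_member`, `set_readonly`, `members`) are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)  # compared by identity, like a pointer value
class Elem:
    node_id: int
    next: Optional["Elem"] = None


@dataclass(frozen=True)  # immutable snapshot of (next, is_readonly)
class Head:
    next: Optional[Elem]
    is_readonly: int


class Cluster:
    """Member list of one cluster; the head is replaced only through CAS."""

    def __init__(self) -> None:
        self._head = Head(next=None, is_readonly=0)
        self._lock = threading.Lock()  # emulates the atomicity of a CAS instruction

    def load(self) -> Head:
        return self._head

    def compare_and_swap(self, expected: Head, new: Head) -> bool:
        """Assign new to head only if head still equals expected."""
        with self._lock:
            if self._head == expected:
                self._head = new
                return True
            return False

    def set_readonly(self, flag: int) -> None:
        """Set or clear the change prohibition flag, retrying until CAS succeeds."""
        while True:
            old = self.load()
            if self.compare_and_swap(old, Head(old.next, flag)):
                return

    def members(self) -> Iterator[int]:
        elem = self._head.next
        while elem is not None:
            yield elem.node_id
            elem = elem.next


def try_add_member(cluster: Cluster, node_id: int) -> bool:
    """Add node_id to the cluster unless it is marked read-only."""
    snapshot = cluster.load()
    head_old = Head(snapshot.next, 0)  # S400, S420: copy of head, is_readonly = 0
    elem = Elem(node_id)               # S410: new list element
    head_new = Head(elem, 0)           # S420: next of head_new points to elem
    elem.next = head_old.next          # S430: elem links to the old first element
    return cluster.compare_and_swap(head_old, head_new)  # S440
```

Because `head_old` always carries `is_readonly = 0`, the comparison in `compare_and_swap` fails whenever the real head has its flag set to 1, which is how the check for the change prohibition mark and the insertion happen in a single atomic step.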

[Effects of the First Embodiment]
The input unit 11 receives input of graph data having a plurality of nodes. The allocation unit 131 divides the plurality of nodes into N groups and assigns one of N threads, each of which is a unit of processing execution, to each of the N groups. The selection unit 132, in each thread, selects nodes in a predetermined order from the nodes included in the group assigned by the allocation unit 131. The changing unit 133, in each thread, each time a node is selected by the selection unit 132, changes information about the selected node, which is the node selected by the selection unit 132, to a predetermined state. The extraction unit 135, in each thread, each time the changing unit 133 changes the information about the selected node, extracts a classification destination from among the nodes or clusters adjacent to the selected node. The classification destination is the node or cluster that maximizes modularity, which represents the accuracy of the clustering result, when it is classified into the same cluster as the selected node. The classification unit 136, in each thread, each time a classification destination is extracted by the extraction unit 135, determines whether information about the classification destination has been changed to the predetermined state by another thread and, if it determines that the information has not been changed, classifies the selected node and the classification destination into the same cluster. The aggregation unit 134, in each thread, aggregates into one node a plurality of nodes that belong to the same cluster among the nodes classified by the classification unit 136.

This makes it possible to cluster graph data at high speed while obtaining a result with high modularity. Specifically, the processing is fast because the nodes are divided into a plurality of node groups and processed by multiple threads. In addition, because the cluster into which each node is classified does not need to consist of nodes from the same group, the modularity is higher than when clustering is performed only within each group.

Here, a problem of conventional parallel clustering by graph partitioning is described with reference to FIG. 9. FIG. 9 is a diagram illustrating an example in which clustering with low modularity is performed. When clustering is performed on a graph such as that shown in FIG. 9(a), the modularity is considered to be high if a clustering result such as that shown in FIG. 9(b) is obtained. However, if the conventional method divides the graph into subgraphs as shown in FIG. 9(c), a clustering result with low modularity, as shown in FIG. 9(d), is obtained. In the conventional method, for example, node u and node v, which are included in the same group, cannot be included in a cluster of another group.

In contrast, the extraction unit 135 of the present embodiment extracts the classification destination from among the nodes or clusters adjacent to the selected node regardless of the groups, so a clustering result with higher modularity than that of the conventional method can be obtained.

A method that simply combines conventional clustering by sequential node aggregation with parallel clustering by graph partitioning is also conceivable, that is, a method that performs clustering by sequential node aggregation for each group in parallel. However, such a method has the problem that conflicts occur when threads running in parallel attempt to process the same node. One possible countermeasure is to lock the node being processed, but locking leaves some threads unable to proceed at the same time, which reduces the degree of parallelism. For this reason, the locking approach has low scalability, that is, a low rate of performance improvement as the number of CPU cores increases.

In contrast, the classification unit 136 of the present embodiment performs optimistic parallel processing, in which a change is committed only after determining whether the information about the node being processed has been changed, and therefore does not cause the loss of parallelism that locking does.

In addition, the aggregation unit 134 aggregates, into one node, the selected node and the nodes classified into the same cluster as the selected node, after the changing unit 133 changes the information about the selected node and before the extraction unit 135 extracts the classification destination. In other words, the aggregation unit 134 performs aggregation not immediately after classification by the classification unit 136, but when a classified node is selected as the selected node. This reduces the amount of work that the classification unit 136 must perform atomically.

In addition, the classification unit 136 uses a CAS instruction to determine whether the information about the classification destination has been changed and, if it determines that the information has not been changed, performs the classification by adding the selected node to the members of the same cluster as the classification destination. In this way, the classification unit 136 can realize atomic processing with a CAS instruction. Furthermore, because the classification unit 136 does not aggregate nodes, the processing can be executed even with a CAS instruction whose rewritable data is limited to 16 bytes.

The output unit 12 outputs information indicating the clusters into which the plurality of nodes have been classified by the classification unit 136. By excluding information such as edges from the clustering result in this way, the data size of the clustering result can be reduced.

[System Configuration, etc.]
Each component of each illustrated device is a functional concept and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device can be realized by a CPU and a program analyzed and executed by that CPU, or can be realized as hardware using wired logic.

Of the processes described in the present embodiment, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and the drawings can be changed arbitrarily unless otherwise specified.

[Program]
As one embodiment, the clustering device can be implemented by installing, on a desired computer, a clustering program that executes the above clustering as packaged software or online software. For example, by causing an information processing device to execute the clustering program, the information processing device can be made to function as the clustering device. The information processing device referred to here includes desktop and notebook personal computers. The category of information processing devices also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants).

The clustering device can also be implemented as a server device that treats a terminal device used by a user as a client and provides the client with services related to the above clustering. For example, the clustering device is implemented as a server device that provides a clustering service that takes graph data as input and outputs a clustering result. In this case, the clustering device may be implemented as a Web server, or as a cloud that provides the above clustering services by outsourcing.

FIG. 10 is a diagram illustrating an example of a computer that realizes the clustering device by executing a program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080. The CPU 1020 is a CPU capable of executing processing in a plurality of threads, for example, a multi-CPU configuration consisting of a plurality of CPUs or a multi-core CPU having a plurality of cores.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the clustering device is implemented as the program module 1093, in which computer-executable code is written. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, a program module 1093 for executing processing equivalent to the functional configuration of the clustering device is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The setting data used in the processing of the above embodiment is stored as the program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.

The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090. They may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)) and read from that computer by the CPU 1020 via the network interface 1070.

[Description of Symbols]
10 Clustering device
11 Input unit
12 Output unit
13 Control unit
14 Storage unit
20 Graph data
21 Clustering result
101, 102, 103, 201, 202, 203, 301, 302, 303, 401, 402, 403 Nodes
131 Allocation unit
132 Selection unit
133 Changing unit
134 Aggregation unit
135 Extraction unit
136 Classification unit
c1, c2, c3, c4, c5 Clusters
t1, t2, t3, t4 Threads

Claims

1. A clustering device comprising:
an input unit that receives input of graph data having a plurality of nodes;
an allocation unit that divides the plurality of nodes into N groups (N being an integer of 2 or more) and assigns one of N threads, each being a unit of processing execution, to each of the N groups;
a selection unit that, in each of the threads, selects nodes in a predetermined order from the nodes included in the group assigned by the allocation unit;
a changing unit that, in each of the threads, each time a node is selected by the selection unit, changes information about a selected node, which is the node selected by the selection unit, to a predetermined state;
an extraction unit that, in each of the threads, each time the information about the selected node is changed by the changing unit, extracts, from among the nodes or clusters adjacent to the selected node, a classification destination that is the node or cluster that maximizes modularity, which represents the accuracy of the clustering result, when classified into the same cluster as the selected node;
a classification unit that, in each of the threads, each time the classification destination is extracted by the extraction unit, determines whether information about the classification destination has been changed to the predetermined state by another thread and, if it determines that the information has not been changed, classifies the selected node and the classification destination into the same cluster; and
an aggregation unit that, in each of the threads, aggregates into one node a plurality of nodes that belong to the same cluster among the nodes classified by the classification unit.

2. The clustering device according to claim 1, wherein the aggregation unit aggregates, into one node, the selected node and the nodes classified into the same cluster as the selected node, after the information about the selected node is changed by the changing unit and before the classification destination is extracted by the extraction unit.

3. The clustering device according to claim 2, wherein the classification unit determines, by a CAS instruction, whether the information about the classification destination has been changed and, if it determines that the information has not been changed, performs the classification by adding the selected node to the members of the same cluster as the classification destination.

4. The clustering device according to claim 1, further comprising an output unit that outputs information indicating the clusters into which the plurality of nodes have been classified by the classification unit.

5. A clustering method performed by a clustering device, the method comprising:
an input step of receiving input of graph data having a plurality of nodes;
an allocation step of dividing the plurality of nodes into N groups and assigning one of N threads, each being a unit of processing execution, to each of the N groups;
a selection step of, in each of the threads, selecting nodes in a predetermined order from the nodes included in the group assigned in the allocation step;
a changing step of, in each of the threads, each time a node is selected in the selection step, changing information about a selected node, which is the node selected in the selection step, to a predetermined state;
an extraction step of, in each of the threads, each time the information about the selected node is changed in the changing step, extracting, from among the nodes or clusters adjacent to the selected node, a classification destination that is the node or cluster that maximizes modularity, which represents the accuracy of the clustering result, when classified into the same cluster as the selected node;
a classification step of, in each of the threads, each time the classification destination is extracted in the extraction step, determining whether information about the classification destination has been changed to the predetermined state by another thread and, if it is determined that the information has not been changed, classifying the selected node and the classification destination into the same cluster; and
an aggregation step of, in each of the threads, aggregating into one node a plurality of nodes that belong to the same cluster among the nodes classified in the classification step.

6. A clustering program for causing a computer to function as the clustering device according to claim 1.