JP6036848B2

JP6036848B2 - Information processing system

Info

Publication number: JP6036848B2
Application number: JP2014553987A
Authority: JP
Inventors: 林 真人; 加藤 猛
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-12-28
Filing date: 2012-12-28
Publication date: 2016-11-30
Anticipated expiration: 2032-12-28
Also published as: WO2014102996A1; JPWO2014102996A1

Description

The present invention relates to an information processing system, and more particularly to the control of a network that connects resources.

Patent Document 1 discloses background art in this technical field. It describes an invention comprising at least one circuit switch controlled by an application framework and a network to which a large number of computing nodes are connected, in which the circuit switch is controlled based on monitoring results such as CPU load, memory usage, and bottlenecks.

Patent Document 2 discloses a technique in which a stream processing graph is assigned to sets of processing nodes, each consisting of a plurality of processors (called "supernode clusters" in Patent Document 2), and an optical circuit switch is controlled so that the sets are interconnected.

Patent Document 1: US Patent No. 8,125,984
Patent Document 2: US Patent No. 8,037,284

In recent years, the analysis of large-scale connected data, such as human relationships in social networks and business relationships between companies, has attracted attention. Such data is generally represented as a graph. Depending on the required processing speed and data size, analyzing a graph calls for parallel distributed processing across a plurality of servers. Naturally occurring graphs like those above generally have a complex structure and are highly uneven in density, so the communication load between servers tends to become skewed during parallel distributed processing. A server with a high communication load takes longer to finish because of that load and easily becomes a bottleneck for overall processing performance.

In the system described in Patent Document 1, the network configuration can be changed based on the results of load monitoring during execution, but the topology change performed while the program is running is itself an overhead. In addition, the document does not address which network topology should be used when program execution starts.

In the system described in Patent Document 2, the network topology is changed using an optical circuit switch based on the assignment of the stream graph to processing nodes, but the document merely states that every pair of processing nodes is given a bandwidth exceeding the grand total of the traffic that occurs. As graph processing grows in scale, the traffic between processing nodes is more often heavily skewed, so giving every pair of processing nodes a bandwidth exceeding the grand total is wasteful and disadvantageous in terms of the amount of hardware required.

The present invention has been made in view of the above problems. Its object is to provide an information processing system that can determine a network topology matched to the skew in traffic between processing nodes, based on predicted values of the traffic volume arising between the processing nodes.

The information processing system of the present invention solves the above problems by distributing a processing load across a plurality of computation nodes, predicting the communication load caused by the processing load, and determining the network topology between the computation nodes based on the prediction result.

According to the present invention, even when the traffic between computation nodes is skewed during parallel processing, as in graph processing, a network topology matched to that skew can be configured, which in turn speeds up processing.

FIG. 1 is a block diagram showing the configuration of the information processing system.
FIG. 2 is a block diagram showing the functions of the information processing system.
FIG. 3 is an explanatory diagram of the connection-variable network.
FIG. 4 is a flowchart showing the operation of the information processing system as a whole.
FIG. 5 is a diagram showing an example of load placement in the information processing system.
FIG. 6 is a diagram showing an example of load placement information in the information processing system.
FIG. 7 is a diagram showing an example of the correspondence between loads and servers in the information processing system.
FIG. 8 is a diagram showing an example of the correspondence between servers and network switches in the information processing system.
FIG. 9 is a flowchart showing the procedure for predicting the communication load from the load placement information in the information processing system.
FIG. 10 is a diagram showing an example of the result of predicting the communication load from the load placement information in the information processing system.
FIG. 11 is a diagram showing an example of topology determination in the information processing system.
FIG. 12 is a diagram showing an example of the network topology between processing servers in the information processing system.
FIG. 13 is a diagram showing an example of a graph given as input to the information processing system.

Embodiments are described below with reference to the drawings.

FIG. 13 shows an example of an input graph as an example of the processing load handled by the information processing system of the present invention. The graph structure is drawn with vertices as circles and edges as arrows connecting the vertices. Each vertex is given an ID, which is a unique serial number. Vertices connected by an edge are said to be "adjacent". Concrete graph processing includes computing the shortest path between vertices and centrality calculations that identify important vertices in the graph, among many others. Most of these are built around traversal processing, in which information is exchanged along edges.

Traversal processing exchanges data between vertices along edges, and performs the following steps for each vertex. First, data is received from adjacent vertices along the edges. Next, a computation is performed on the received data and the vertex's data is updated. Finally, the updated data is sent along the edges to adjacent vertices.
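The three steps above (receive, update, send) can be sketched as follows. This is a minimal single-machine illustration, not the patent's implementation: the function names `traverse_step` and `relax`, the dictionary-based graph representation, and the hop-count update rule are all assumptions chosen for the example.

```python
from collections import defaultdict

def traverse_step(out_edges, values, inbox, update):
    """Run one traversal step for every vertex; return (new_values, outbox)."""
    new_values = {}
    outbox = defaultdict(list)
    for vertex, value in values.items():
        received = inbox.get(vertex, [])              # 1. receive along edges
        new_values[vertex] = update(value, received)  # 2. update the vertex data
        for neighbour in out_edges.get(vertex, []):   # 3. send along edges
            outbox[neighbour].append(new_values[vertex])
    return new_values, dict(outbox)

def relax(value, received):
    """Hypothetical update rule: hop count from a source vertex."""
    return min([value] + [r + 1 for r in received])

INF = float("inf")
out_edges = {1: [2], 2: [3]}          # directed edges 1 -> 2 -> 3
values = {1: 0, 2: INF, 3: INF}       # vertex 1 is the source
inbox = {}
for _ in range(3):
    values, inbox = traverse_step(out_edges, values, inbox, relax)
print(values)
```

When the two endpoints of an edge live on different servers, the "send" in step 3 becomes a network message, which is the traffic the rest of the description sets out to predict.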

When such graph processing is executed in parallel on a plurality of servers, the graph is partitioned and each part is assigned to a server. If the vertices at the two ends of an edge are assigned to different servers, the data sent and received along that edge during traversal processing is carried out as communication between the servers over the network.

FIG. 1 shows an information processing system 100 according to an embodiment of the present invention. The information processing system 100 includes a management server 101, a plurality of processing servers 106-1 to 106-8, and a storage 111. The management server 101 and the processing servers 106-1 to 106-8 are connected to their respective network switches 110-1 to 110-5. The network switches 110-1 to 110-5 are interconnected via a connection-variable network device 112. The processing servers 106-1 to 106-8 act as computation nodes that perform parallel processing.

The management server 101 includes a memory 102, a central processing unit (CPU) 103, a network interface 104, and a connection-variable network control interface 105. The management server 101 acts as the management node within the information processing system 100. The management server 101 can additionally act as a computation node in the same way as the processing servers 106-1 to 106-8.

The connection-variable network control interface 105 may use a dedicated connection suited to the connection-variable network device 112, or a general-purpose connection such as USB or a serial connection. The processing servers 106-1 to 106-8 share the same hardware configuration, each including a memory 107, a CPU 108, and a network interface 109.

The connection-variable network device 112 is described with reference to FIG. 3. The connection-variable network device 112 has a plurality of ports 302 and is connected to each target network switch by one or more links 301. The links 301 can be, for example, optical fibers or electrical cables. The connection-variable network device 112 includes a control unit 113 and a variable link unit 117. The control unit 113 includes a control interface (I/F) 114, a CPU 115, and a memory 116. The variable link unit 117 includes a control device 118 and a plurality of variable links 303. The variable link unit 117 can establish a variable link 303 between any pair of ports based on instructions from the control device 118. The example in FIG. 3 shows a state in which ports 1 and 7, ports 2 and 4, ports 8 and 10, and ports 6 and 12 are connected. Controlling the variable links 303 is equivalent to changing the physical wiring between the network switches connected to the connection-variable network device 112. This allows the bandwidth between network switches to be set more flexibly than when routing is controlled over a network with fixed wiring. The connection-variable network device 112 can be, for example, an optical switch based on MEMS (Micro Electro Mechanical Systems).
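As a rough illustration of the variable link unit described above, the following sketch models ports and variable links as a simple pairing table. The class name `VariableLinkUnit` and its methods are hypothetical; the only values taken from the description are the four port pairs in the example, and the 12-port size is an assumption made to accommodate them.

```python
class VariableLinkUnit:
    """Toy model: each port can be an endpoint of at most one variable link."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.peer = {}  # port -> port it is currently linked to

    def connect(self, a, b):
        for port in (a, b):
            if not 1 <= port <= self.num_ports:
                raise ValueError(f"port {port} does not exist")
            if port in self.peer:
                raise ValueError(f"port {port} is already linked")
        if a == b:
            raise ValueError("a link needs two distinct ports")
        self.peer[a] = b
        self.peer[b] = a

    def disconnect(self, port):
        other = self.peer.pop(port)
        del self.peer[other]

    def links(self):
        # Report each link once, as (lower port, higher port).
        return sorted((a, b) for a, b in self.peer.items() if a < b)

unit = VariableLinkUnit(12)
for a, b in [(1, 7), (2, 4), (8, 10), (6, 12)]:  # port pairs from the example
    unit.connect(a, b)
print(unit.links())
```

Rewiring then amounts to a `disconnect` followed by a `connect` on different ports, which mirrors the idea that changing a variable link changes the physical wiring between switches.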

FIG. 2 shows a functional block diagram of the information processing system 100. The units and information of the management server 101 and of the processing servers are stored in their respective memories. The management server 101 stores, in the memory 102, a load placement unit 201, a predicted traffic aggregation unit 202, a predicted traffic notification unit 203, a traffic prediction result 204, a load placement list 205, and network switch correspondence information 206. Each of the processing servers 106-1 to 106-8 stores, in its memory 107, a traffic prediction unit 207, a processing unit 208, the load placement list 205, the network switch correspondence information 206, and processing load information 209. The storage 111 stores load data 212 and the network switch correspondence information 206.

FIG. 4 shows the operation flow of the information processing system 100. In step 401, the management server 101 reads the network switch correspondence information 206 stored in the storage 111 and stores it in the memory 102. The management server 101 then sends the network switch correspondence information 206 to all of the processing servers 106-1 to 106-8. Each of the processing servers 106-1 to 106-8 stores the received network switch correspondence information in its own memory 107. In this embodiment, each server is connected to a fixed network switch and the server-to-switch correspondence does not change during execution, so the information is assumed to be prepared in advance as static configuration information. FIG. 8 shows an example of the network switch correspondence information. For each server's identification information (ID), it stores the ID of the network switch to which that server is connected. If the correspondence is regular, it may instead be expressed as a formula.
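To make the "table or formula" remark concrete, here is a small sketch of both forms. The layout is an assumption for illustration only: it supposes that processing servers are numbered from 1 and that every two consecutive servers share one switch. The description itself only states that processing server 1 is connected to network switch 1; the function name `switch_of` and the parameter `servers_per_switch` are hypothetical.

```python
def switch_of(server_id, servers_per_switch=2):
    """Formula form: consecutive servers share a switch (assumed layout)."""
    return (server_id - 1) // servers_per_switch + 1

# Table form: the same correspondence materialized as server ID -> switch ID.
switch_table = {server_id: switch_of(server_id) for server_id in range(1, 9)}
print(switch_table)
```

The formula form avoids distributing a table to every server, at the cost of only working when the cabling really follows the regular pattern.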

In step 402, the load placement unit 201 of the management server 101 causes the load data 212 stored in the storage 111 to be read into the memories 107 of the processing servers 106-1 to 106-8. Because distributed processing is assumed, each processing server stores a different portion of the load data in its memory and processes it. For example, the load placement unit 201 of the management server 101 obtains the data size of the load data 212 from the storage 111. It then divides the data size of the load data 212 by the number of processing servers (eight in this embodiment) to determine the read size for one processing server. It then notifies each processing server of the range to read. Each processing server reads its portion of the load data 212 from the storage 111 according to the range it received and stores it in its memory 107 as processing load information 209.
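The range computation in step 402 can be sketched as below. The description only says that the data size is divided by the number of processing servers; how a remainder is handled is not stated, so spreading it over the first few servers is an assumption of this sketch, as are the function name `read_ranges` and the half-open `(start, end)` convention.

```python
def read_ranges(data_size, num_servers):
    """Split [0, data_size) into num_servers contiguous half-open ranges."""
    base, extra = divmod(data_size, num_servers)
    ranges = []
    start = 0
    for index in range(num_servers):
        # Assumption: the first `extra` servers take one additional unit.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges

print(read_ranges(48, 8))
```

Each tuple is what the management server would send to one processing server as its read range.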

FIG. 5 illustrates the load placement produced by step 402. Vertices with consecutive numbers are stored on each server: processing server 1 holds vertices 1 to 6, processing server 2 holds vertices 7 to 12, and so on. FIG. 6 is an example of the placement information stored on each of the processing servers 106-1 to 106-8. Each list stores the vertices placed on that server together with the numbers of the vertices adjacent to each of them. For example, it shows that vertices 1 to 6 are assigned to server 1 and that vertices 2 and 43 are adjacent to vertex 1.
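One way to picture these data structures is as nested dictionaries, with the vertex-to-server lookup derived by inverting the per-server placement. In the sketch below only the entry "vertex 1 is adjacent to vertices 2 and 43, on server 1" comes from the description; the other adjacency entries and the placement of vertex 43 on server 8 are illustrative assumptions, and `build_load_placement_list` is a hypothetical helper name.

```python
def build_load_placement_list(placement_by_server):
    """Invert {server: {vertex: [neighbours]}} into {vertex: server}."""
    vertex_to_server = {}
    for server, adjacency in placement_by_server.items():
        for vertex in adjacency:
            if vertex in vertex_to_server:
                raise ValueError(f"vertex {vertex} is placed on two servers")
            vertex_to_server[vertex] = server
    return vertex_to_server

placement_by_server = {
    1: {1: [2, 43], 2: [1]},  # vertex 1 -> [2, 43] is from the text
    8: {43: [1]},             # assumed: vertex 43 lives on server 8
}
vertex_to_server = build_load_placement_list(placement_by_server)
print(vertex_to_server)
```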

In step 403, the traffic between servers is predicted based on the result of step 402. FIG. 9 shows the detailed procedure of step 403. The traffic prediction process is executed independently by each of the processing servers 106-1 to 106-8 on instruction from the management server 101, and the predicted traffic aggregation unit 202 of the management server 101 finally aggregates the results.

In step 901, the predicted traffic aggregation unit 202 of the management server 101 selects one processing server that has not yet executed the prediction process. For example, the management server 101 selects the processing server 106-1.

In step 902, the management server 101 instructs the processing server selected in step 901 to start the traffic prediction process.

In step 903, the traffic prediction unit 207 of the processing server selected in step 901 refers to the network switch correspondence information 206 stored in its own memory and obtains the ID of the network switch to which the server is connected. As shown in FIG. 8, the processing server 106-1 is connected to network switch 1. The traffic prediction unit 207 of that processing server also initializes to 0 the traffic counters between the network switch with the obtained ID and every network switch, including that switch itself.

In step 904, the traffic prediction unit 207 of the processing server selected in step 901 selects a vertex assigned to that processing server that has not yet been processed. The selection order is arbitrary; for example, vertices are selected in ascending order of ID. As shown in FIG. 6, vertices 1 to 6 are assigned to the selected processing server 1, and no vertex has been processed in the initial state, so vertex 1 is selected first.

In step 905, the traffic prediction unit 207 of the processing server selected in step 901 selects a vertex adjacent to the vertex selected in step 904 that has not yet been processed. The selection order is arbitrary; for example, vertices are selected in ascending order of ID. As shown in FIG. 6, vertices 2 and 43 are adjacent to vertex 1 selected in step 904. Neither has been processed in the initial state, so vertex 2 is selected.

In step 906, the traffic prediction unit 207 of the processing server selected in step 901 refers to the load placement list 205 to obtain the ID of the server to which the vertex selected in step 905 belongs. As FIG. 7 shows, vertex 2 selected in step 905 belongs to server 1.

In step 907, the traffic prediction unit 207 of the processing server selected in step 901 refers to the network switch correspondence information 206 to obtain the ID of the network switch to which the server with the ID obtained in step 906 is connected. As FIG. 8 shows, server 1 is connected to network switch 1.

In step 908, the traffic prediction unit 207 of the processing server selected in step 901 increments the traffic volume between the network switch with the ID obtained in step 903 and the network switch with the ID obtained in step 907. Here the network switch ID obtained in step 907 is 1, so 1 is added to the traffic volume for network switch 1.

In step 910, the traffic prediction unit 207 of the processing server selected in step 901 determines whether any vertex adjacent to the vertex selected in step 904 remains unprocessed. If so, the process returns to step 905 and the processing server selects an adjacent vertex different from those selected previously. If not, the process proceeds to step 911. Here, as described above, vertices 2 and 43 are adjacent to the selected vertex 1 and vertex 43 is still unprocessed, so the determination is "yes" and the process returns to step 905, where vertex 43 is selected.

In step 911, the traffic prediction unit 207 of the processing server selected in step 901 determines whether any vertex assigned to that processing server remains unprocessed. If so, the process returns to step 904 and the traffic prediction unit 207 selects a vertex different from those selected previously. If not, the process proceeds to step 912. Here, as described above, vertices 1 to 6 are assigned to the selected server 1 and vertices 2 to 6 are still unprocessed, so the determination is "yes" and the process returns to step 904, where a different vertex is selected.

In step 912, the processing server selected in step 901 sends the result obtained in steps 903 to 911 to the management server 101.

In step 913, the predicted traffic aggregation unit 202 of the management server 101 determines whether any processing server has not yet performed the traffic prediction process. If so, the process returns to step 901 and the management server 101 selects a processing server different from those selected previously. If not, the management server 101 ends the traffic prediction process.

FIG. 10 shows the traffic prediction result output by step 403. The result takes the form of a matrix that stores the predicted traffic volume for each pair of network switches. It is stored in the memory 102 of the management server 101 as the traffic prediction result 204. For ease of explanation, the processing servers were selected one at a time to perform the traffic prediction process, but the processing servers may also perform it in parallel.
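Steps 901 to 913 amount to counting, for every edge, which pair of switches its endpoints sit behind. The sketch below collapses the per-server loop and the aggregation into one function for readability; it is an illustration under stated assumptions, not the patented implementation. The function name `predict_switch_traffic`, the dictionary inputs, and the sample data (in particular server 8 holding vertex 43 and being attached to switch 4) are hypothetical. Following step 903, edges whose endpoints share a switch are counted too, on the diagonal.

```python
def predict_switch_traffic(placement_by_server, vertex_to_server,
                           server_to_switch, num_switches):
    """Return a matrix where traffic[i][j] counts edges from switch i+1 to j+1."""
    traffic = [[0] * num_switches for _ in range(num_switches)]
    for server, adjacency in placement_by_server.items():    # steps 901, 913
        own_switch = server_to_switch[server]                 # step 903
        for vertex, neighbours in adjacency.items():          # steps 904, 911
            for neighbour in neighbours:                      # steps 905, 910
                peer_server = vertex_to_server[neighbour]     # step 906
                peer_switch = server_to_switch[peer_server]   # step 907
                traffic[own_switch - 1][peer_switch - 1] += 1  # step 908
    return traffic

# Illustrative data: only "vertex 1 -> [2, 43] on server 1" is from the text.
placement_by_server = {1: {1: [2, 43], 2: [1]}, 8: {43: [1]}}
vertex_to_server = {1: 1, 2: 1, 43: 8}
server_to_switch = {1: 1, 8: 4}
matrix = predict_switch_traffic(placement_by_server, vertex_to_server,
                                server_to_switch, num_switches=4)
for row in matrix:
    print(row)
```

Because each server only touches the row of its own switch, the per-server parts can run in parallel and be summed by the management node afterwards.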

In step 404, the predicted traffic notification unit 203 of the management server 101 sends the traffic prediction result 204 calculated in step 403 to the connection-variable network device 112 through the control interface 105.

In step 405, the topology optimization unit 213, executed by the CPU 115 of the connection-variable network device 112, determines a topology according to the received traffic prediction result. One way to determine the topology is, for example, to distribute links according to the share of the total traffic that each pair of network switches accounts for.

For example, enumerating the traffic volume of every pair of network switches from the traffic prediction result shown in FIG. 10 yields a result like FIG. 11. Summing the traffic volume of all pairs gives the total traffic volume, which is 30 in the example of FIG. 11 (not shown in the figure). Computing each pair's share of this total gives the "ratio" column of FIG. 11. The links are distributed according to these ratios. If, as shown in FIG. 3, each network switch has links to three ports, a total of 12 ports are available for variable links. Since each variable link uses two ports, at most six variable links can be used. Distributing the six links according to the ratios gives the "number of links" column of FIG. 11. Because the number of links must be an integer, fractions are rounded down. The topology optimization unit 213 executed by the CPU 115 sends the resulting topology optimization result to the control device 118, which changes the variable links 303 to reflect it. FIG. 12 is an example of a topology reflecting this result. Between one particular pair of network switches, two variable links are established, between ports 8 and 11 and between ports 9 and 10, so that more bandwidth is available between those switches.
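The proportional distribution with rounding down can be sketched as follows. The per-pair traffic values are made up for illustration (they are not the contents of FIG. 11); only the total of 30 and the limit of six links follow the description. The function name `distribute_links` is hypothetical, and integer arithmetic is used so that the floor is exact.

```python
def distribute_links(pair_traffic, total_links):
    """Give each switch pair floor(share of total traffic * total_links) links."""
    total = sum(pair_traffic.values())
    if total == 0:
        return {pair: 0 for pair in pair_traffic}
    return {pair: (traffic * total_links) // total
            for pair, traffic in pair_traffic.items()}

# Hypothetical traffic per switch pair, summing to 30.
pair_traffic = {(1, 2): 10, (1, 3): 5, (1, 4): 2,
                (2, 3): 3, (2, 4): 5, (3, 4): 5}
allocation = distribute_links(pair_traffic, total_links=6)
print(allocation)
print(sum(allocation.values()), "of 6 links assigned")
```

Rounding down can leave some links unassigned and some low-traffic pairs with no direct link. The description does not say how those cases are handled, so this sketch leaves them as they fall.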

In step 406, the management server 101 notifies each of the processing servers 106-1 to 106-8 that preparations for starting processing are complete. On receiving the notification, each of the processing servers 106-1 to 106-8 starts executing its processing unit 208.

Through this series of steps, a network topology can be configured that reflects the traffic skew caused by the non-uniformity of the graph data. This relieves the bottleneck that would otherwise arise in communication between servers, so processing can be executed faster.

100: information processing system, 101: management server, 102: memory, 103: CPU, 104: network interface, 105: control interface, 106-1 to 106-8: processing servers, 107: memory, 108: CPU, 109: network interface, 110-1 to 110-5: network switches, 111: storage, 112: connection-variable network device, 113: control unit, 114: control interface, 115: CPU, 117: variable link unit, 118: control device, 201: load placement unit, 202: predicted traffic aggregation unit, 203: predicted traffic notification unit, 204: traffic prediction result, 205: load placement list, 206: network switch correspondence information, 207: traffic prediction unit, 208: processing unit, 209: processing load information, 212: load data, 213: topology optimization unit, 301: link, 302: port, 303: variable link, 1301: graph vertex, 1302: graph edge.

Claims

An information processing system comprising a plurality of computation nodes and a network device that makes the connections between the plurality of computation nodes variable, wherein:
a processing load is distributed among the plurality of computation nodes;
the traffic volume between the plurality of computation nodes caused by the processing load is predicted;
the connections are determined based on the prediction result;
the processing load has a graph structure;
when the processing load is distributed, graph vertices are placed on each computation node; and
when the traffic volume is predicted, it is predicted based on the number of edges connecting graph vertices placed on different computation nodes.

The information processing system according to claim 1,
further comprising a management node,
wherein the management node
distributes the processing load to the plurality of computation nodes, and
instructs each computation node to perform the prediction.

The information processing system according to claim 1,
wherein the plurality of computation nodes are divided into groups and connected to a switch for each group, and
each switch is connected to a port of the network device.

The information processing system according to claim 3,
wherein, when the connections are determined based on the prediction result,
the traffic volume between the switches is predicted from the prediction result and the connections are determined accordingly.

The information processing system according to claim 3,
wherein each switch is connected to a plurality of the ports.

The information processing system according to claim 1,
wherein the network device is an optical switch.