JP2012065281A - Communication program, communication apparatus, communication method, and communication system

Info

Publication number: JP2012065281A
Application number: JP2010209961A
Authority: JP
Inventors: Takahiro Kawashima; Minoru Tanaka
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-09-17
Filing date: 2010-09-17
Publication date: 2012-03-29
Also published as: EP2431885A1; US20120072607A1

Abstract

PROBLEM TO BE SOLVED: To prevent a protocol with lower communication performance than the other protocols from being selected when a protocol is selected. SOLUTION: A communication program causes a computer, which is included in a computer group and transmits data to another computer in the computer group using one of a plurality of communication protocols, to select, from the plurality of communication protocols, the communication protocol whose transfer time, predicted on the basis of the number of hops on the communication path from the computer to the other computer and the size of the data, is shorter than that of the other communication protocols, and to transmit the data using the selected communication protocol.

Description

The present invention relates to a communication program, a communication apparatus, a communication method, and a communication system, including, for example, those related to the control of RDMA communication.

A communication technique for transferring data stored in the memory of one computer to the memory of another computer is called RDMA (Remote Direct Memory Access). When data is transferred by RDMA, it is transferred to a designated destination memory area according to a procedure such as the Eager protocol or the Rendezvous protocol.

With regard to data transfer, there is, for example, a technique for specifying the message length used as the threshold for switching the transfer method (see, for example, Non-Patent Document 1).

Patent Document 1: JP 2007-336101 A

Non-Patent Document 1: "IBM System BlueGene Solution: Application Development", June 2007, fifth edition, p. 5
Non-Patent Document 2: "MVAPICH2 1.5 User Guide", 11.21 MV2_IBA_EAGER_THRESHOLD, [online], July 11, 2010, [retrieved September 16, 2010], Internet (URL: http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.5.html#x1-11000011.21)

Because the number of communications required for a data transfer differs from protocol to protocol, the communication delay between the transfer source and the transfer destination affects the transfer time differently for each protocol. Therefore, even when the protocol is selected according to the size of the data to be transferred, the communication delay may be such that a protocol that was not selected would have completed the transfer in less time.

An object of one aspect of the present invention is to prevent the selection of a protocol having lower communication performance than other protocols.

To solve the above problem, a communication program causes a computer, which is included in a computer group and transmits data to another computer included in the computer group using one of a plurality of communication protocols, to select, from among the plurality of communication protocols, a communication protocol whose transfer time, predicted based on the number of hops on the communication path from the computer to the other computer and on the data size of the data, is shorter than that of the other communication protocols, and to transmit the data using the selected communication protocol.

According to one aspect of the present invention, when a protocol is selected, the selection of a protocol having lower communication performance than other protocols can be prevented.

FIG. 1 is a diagram for explaining the Eager protocol in the present embodiment.
FIG. 2 is a diagram for explaining the Rendezvous protocol in the present embodiment.
FIG. 3 is a diagram for explaining the number of hops.
FIG. 4 is a diagram showing an example of a fat tree.
FIG. 5 is a diagram showing a connection example of a computer group forming a mesh.
FIG. 6 is a diagram showing a connection example of a computer group forming a torus.
FIG. 7 is a diagram showing a configuration example of the computer system in the first embodiment.
FIG. 8 is a diagram showing a hardware configuration example of the computer in the first embodiment.
FIG. 9 is a diagram showing a functional configuration example of the computer in the first embodiment.
FIG. 10 is a diagram showing a configuration example of the hop count management table.
FIG. 11 is a flowchart for explaining an example of the data transmission process executed by the transmitting computer.
FIG. 12 is a flowchart for explaining an example of the data reception process executed by the receiving computer.
FIG. 13 is a diagram showing a functional configuration example of the job management apparatus in the second embodiment.
FIG. 14 is a diagram showing a configuration example of the coordinate information storage unit.
FIG. 15 is a diagram showing a functional configuration example of the computer in the second embodiment.
FIG. 16 is a sequence diagram for explaining an example of the processing procedure at the start of application execution in the second embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. First, the concept underlying the present embodiment will be described.

Transferring data stored in the main memory of one computer to the main memory of another, remote computer by DMA (Direct Memory Access) is called RDMA (Remote Direct Memory Access). When RDMA WRITE (a data write request, also called PUT) is used to transfer data d1 in area r1 of the main memory of one computer c1 to area r2 of the main memory of another computer c2, the computer c1 needs to know the virtual address and related information of the area r2. In addition, data is not always transferred between the same areas. That is, transfers from area r1' to area r2' and from area r1'' to area r2'' may also occur. The Eager protocol or the Rendezvous protocol is used when data is transferred by RDMA.

FIG. 1 is a diagram for explaining the Eager protocol in the present embodiment.

In FIG. 1, a computer C1 and a computer C2 each have an RDMA (Remote Direct Memory Access) communication function and are connected via a network such as an interconnect. The processing procedure for transferring data from the computer C1 to the computer C2 using the Eager protocol will be described with reference to FIG. 1.

First, the computer C1 copies the transfer target data d1, stored in area R1 of the main memory M1, to the transmission buffer R3 in the main memory M1 (S1). Next, the computer C1 adds control information h1 either before or after the data d1 in the transmission buffer R3 (S2). The control information h1 includes, for example, an identifier of the Eager protocol. The computer C1 then transfers the data d1 and the control information h1 in the transmission buffer R3 to the reception buffer R4 in the main memory M2 of the computer C2 (S3). The computer C1 knows the virtual address and related information of the reception buffer R4 in advance.

Next, the computer C2 decodes the control information h1 in the reception buffer R4 (S4). The computer C2 then determines the storage area (area R2) and copies the data d1 to the area R2 in the main memory M2 (S5). In S5, the computer C2 may also perform processing such as saving the data d1 in a save area (not shown) until the storage area is determined.
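Steps S1 to S5 can be mimicked in memory as a minimal sketch. The `Node` class, the one-byte `EAGER_ID`, and the 4-byte length field in the header are hypothetical choices made for illustration only; the source does not specify the layout of the control information h1.

```python
from dataclasses import dataclass, field

EAGER_ID = 0x01  # hypothetical identifier carried in control information h1


@dataclass
class Node:
    """A computer whose main memory is modeled as named byte regions."""
    memory: dict = field(default_factory=dict)


def eager_send(sender: Node, receiver: Node, src: str, dst: str) -> None:
    # S1: copy the payload d1 from the source region into send buffer R3.
    payload = bytes(sender.memory[src])
    # S2: add control information h1 (assumed: identifier + 4-byte length).
    header = bytes([EAGER_ID]) + len(payload).to_bytes(4, "big")
    sender.memory["R3"] = header + payload
    # S3: a single transfer into the receiver's known receive buffer R4.
    receiver.memory["R4"] = bytes(sender.memory["R3"])
    # S4: the receiver decodes h1.
    buf = receiver.memory["R4"]
    if buf[0] != EAGER_ID:
        raise ValueError("unexpected protocol identifier")
    size = int.from_bytes(buf[1:5], "big")
    # S5: the receiver fixes the storage region and copies d1 into it.
    receiver.memory[dst] = buf[5:5 + size]
```

The two memory copies (S1 and S5) are what the two D/W terms in the Eager cost model account for.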

In FIG. 1, the time required for communication under the Eager protocol (T_Eager) for the transfer of the data d1 is expressed, for example, by the following equation (E).

T_Eager = (D / W) + (L_1 + D / B) + (D / W) + S_Eager ... (E)
D: data size (bytes)
W: memory copy bandwidth (bytes/sec)
B: interconnect bandwidth (RDMA communication bandwidth) (bytes/sec)
L_1: RDMA communication delay of the interconnect (sec)
S_Eager: software overhead time in the Eager protocol (sec)

Here, the first term (D / W) on the right side of equation (E) is the time required for the memory copy in step S1. The second term (L_1 + D / B) is the time required for the data transfer in step S3. The third term (D / W) is the time required for the memory copy in step S5. The fourth term S_Eager is the overhead time required for software processing, including steps S2 and S4.

Note that the communication delay L_1 means the hardware overhead time required to transfer 0 bytes of data.
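Equation (E) can be written as a small function, with one line per term so that each maps back to a step in FIG. 1. This is only a sketch; the function and parameter names are illustrative and not taken from the source.

```python
def t_eager(d, w, b, l1, s_eager):
    """Predicted Eager transfer time in seconds, following equation (E).

    d: data size (bytes), w: memory copy bandwidth (bytes/sec),
    b: interconnect bandwidth (bytes/sec), l1: RDMA communication delay (sec),
    s_eager: software overhead of the Eager protocol (sec).
    """
    copy_to_send_buffer = d / w       # step S1
    rdma_transfer = l1 + d / b        # step S3
    copy_from_recv_buffer = d / w     # step S5
    return copy_to_send_buffer + rdma_transfer + copy_from_recv_buffer + s_eager
```

For a 0-byte message the model reduces to `l1 + s_eager`, which matches the definition of L_1 as the pure hardware overhead.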

FIG. 2, on the other hand, is a diagram for explaining the Rendezvous protocol in the present embodiment. The relationship between the computer C1 and the computer C2 is the same as in FIG. 1. The processing procedure for transferring data from the computer C1 to the computer C2 using the Rendezvous protocol will be described with reference to FIG. 2.

First, the computer C1 transmits control information h2 to the computer C2 (S11). The control information h2 includes, for example, an identifier of the Rendezvous protocol. On receiving the control information h2, the computer C2 decodes it (S12). The computer C2 then transmits control information h3 to the computer C1 (S13). The control information h3 includes, for example, the virtual address of the reception area R2 in the main memory M2.

On receiving the control information h3, the computer C1 decodes it and obtains the virtual address and related information of the area R2 (S14). The computer C1 then transfers the data d1 stored in the area R1 of the main memory M1 to the area R2 of the computer C2 (S15).
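Steps S11 to S15 can be sketched in the same in-memory style. The dictionaries standing in for main memory, the `RENDEZVOUS_ID` value, and the fields of h2 and h3 (including a size field in h2) are assumptions for illustration; the source states only that h2 carries a protocol identifier and h3 carries the address of R2.

```python
RENDEZVOUS_ID = 0x02  # hypothetical identifier carried in control information h2


def rendezvous_send(sender_mem: dict, receiver_mem: dict, src: str, dst: str) -> list:
    """Simulate one Rendezvous transfer; returns the messages that crossed the network."""
    wire_log = []
    # S11: the sender announces the transfer with control information h2.
    h2 = {"protocol": RENDEZVOUS_ID, "size": len(sender_mem[src])}
    wire_log.append("h2")
    # S12: the receiver decodes h2 and prepares the receive region R2.
    if h2["protocol"] != RENDEZVOUS_ID:
        raise ValueError("unexpected protocol identifier")
    receiver_mem[dst] = bytearray(h2["size"])
    # S13: the receiver replies with h3, carrying the "address" of R2.
    h3 = {"address": dst}
    wire_log.append("h3")
    # S14: the sender decodes h3 to learn where to write.
    target = h3["address"]
    # S15: the data goes straight from R1 into R2, with no buffer copies.
    receiver_mem[target][:] = sender_mem[src]
    wire_log.append("data")
    return wire_log
```

Compared with the Eager procedure, two extra control messages cross the network, but no intermediate memory copy is made on either side.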

In FIG. 2, the time required for communication under the Rendezvous protocol (T_Rendezvous) for the transfer of the data d1 is expressed, for example, by the following equation (R).

T_Rendezvous = L_2 × 2 + (L_1 + D / B) + S_Rendezvous ... (R)
D: data size (bytes)
B: interconnect bandwidth (bytes/sec)
L_1: RDMA communication delay of the interconnect (sec)
L_2: communication delay of control communication over the interconnect (sec)
S_Rendezvous: software overhead time in the Rendezvous protocol (sec)

Here, the first term L_2 × 2 on the right side of equation (R) is the time required to transmit and receive the control information h2 and h3 in steps S11 and S13. Because the control information h2 and h3 are very small, each can be considered to take only the communication delay L_2. The second term (L_1 + D / B) is the time required for step S15. The third term S_Rendezvous is the overhead time required for software processing, including steps S12 and S14.
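Equation (R) can be sketched in the same way, one line per term. The function and parameter names are illustrative.

```python
def t_rendezvous(d, b, l1, l2, s_rendezvous):
    """Predicted Rendezvous transfer time in seconds, following equation (R).

    d: data size (bytes), b: interconnect bandwidth (bytes/sec),
    l1: RDMA communication delay (sec), l2: control communication delay (sec),
    s_rendezvous: software overhead of the Rendezvous protocol (sec).
    """
    control_messages = 2 * l2     # steps S11 and S13 (h2 and h3)
    rdma_transfer = l1 + d / b    # step S15
    return control_messages + rdma_transfer + s_rendezvous
```

Unlike the Eager model, no term depends on the memory copy bandwidth W, because the data is written directly into its final destination.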

Regarding equations (E) and (R), when the network topology is one in which the communication time between arbitrary nodes (computers) is substantially uniform, the values of W, B, L_1, L_2, S_Eager, and S_Rendezvous are generally determined by the characteristics of the hardware and software. These parameters can therefore be regarded as approximately constant regardless of which computers communicate.

Under normal network conditions, the smaller the data size D, the more T_Eager << T_Rendezvous holds. Conversely, the larger the data size D, the more T_Eager >> T_Rendezvous holds. This is clear from comparing the equations for T_Eager and T_Rendezvous with D set to 0 and with D set to ∞ (infinity).

Furthermore, equations (E) and (R) are both linear in the data size D. The threshold D_Threshold of the data size D at which to switch between the Eager protocol and the Rendezvous protocol is therefore obtained by setting T_Eager = T_Rendezvous in equations (E) and (R) and solving for D. The result is the following equation (Dt1).

D_Threshold = (L_2 + (S_Rendezvous - S_Eager) / 2) × W ... (Dt1)

When the network topology is one in which the communication time between arbitrary nodes is substantially uniform, the time required for communication is reduced by, for example, computing D_Threshold with the values of W, L_2, S_Eager, and S_Rendezvous substituted as constants, and switching the protocol according to whether the size of the data to be transferred is larger or smaller than D_Threshold.
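As a numerical check of equation (Dt1), the sketch below computes the threshold and confirms that the two cost models agree at that data size. All parameter values are hypothetical round numbers chosen for illustration, not measurements from any system.

```python
def threshold_dt1(w, l2, s_eager, s_rendezvous):
    """Switching threshold in bytes, following equation (Dt1)."""
    return (l2 + (s_rendezvous - s_eager) / 2) * w


def t_eager(d, w, b, l1, s_eager):
    """Equation (E)."""
    return 2 * d / w + l1 + d / b + s_eager


def t_rendezvous(d, b, l1, l2, s_rendezvous):
    """Equation (R)."""
    return 2 * l2 + l1 + d / b + s_rendezvous


# Hypothetical parameters: 4 GB/s memory copy, 2 GB/s interconnect,
# 1 us delays, 0.5 us and 1.0 us software overheads.
W, B = 4e9, 2e9
L1, L2 = 1e-6, 1e-6
S_EAGER, S_RENDEZVOUS = 0.5e-6, 1.0e-6

d_threshold = threshold_dt1(W, L2, S_EAGER, S_RENDEZVOUS)
gap_at_threshold = (t_eager(d_threshold, W, B, L1, S_EAGER)
                    - t_rendezvous(d_threshold, B, L1, L2, S_RENDEZVOUS))
```

With these values `d_threshold` is about 5000 bytes and `gap_at_threshold` is zero up to floating-point rounding. Note that B and L_1 cancel out of the threshold because the term (L_1 + D / B) appears in both equations.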

FIG. 3 is a diagram for explaining the number of hops. The number of hops H is the number of interconnect connections on the communication path between two computers Ci. The nodes N1 to N5 are computers, switches, or the like. For example, when the node N1 is the transmission source and the node N5 is the transmission destination, the number of hops H is 4.
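The definition can be stated as a one-line function: the hop count is the number of links on the path, that is, one less than the number of nodes visited. The sketch assumes, consistently with H = 4 in the example, that N1 to N5 are connected in a chain.

```python
def hop_count(path):
    """Number of interconnect links traversed along a path given as a node list."""
    if not path:
        raise ValueError("path must contain at least the source node")
    return len(path) - 1


hops_n1_to_n5 = hop_count(["N1", "N2", "N3", "N4", "N5"])
```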

Examples of network topologies include the fat tree, the mesh, and the torus.

FIG. 4 is a diagram showing an example of a fat tree. In the fat tree of FIG. 4, the root switches SW1 and SW2 are switches that serve as root nodes. The leaf switches SW3 to SW6 are connected to each of the root switches SW1 and SW2. The computers c31 to c34, c41 to c44, c51 to c54, and c61 to c64, which serve as compute nodes, are connected to the leaf switches SW3, SW4, SW5, and SW6, respectively.

In an ordinary tree network topology there is only one root node, so the communication load concentrates near the root node and degrades communication performance. To avoid this drawback of the ordinary tree, a fat tree network topology has a plurality of root nodes.

In a fat tree, the values of W, B, S_Eager, and S_Rendezvous can be regarded as approximately constant. L_1 and L_2 are expected to take different values depending on whether communication passes through a root node (for example, communication between c31 and c64 or between c31 and c51) or not (for example, communication between c31 and c32), because the number of hops H differs. However, for most of the compute nodes in the fat tree as communication destinations (for c31, these are c41 to c44, c51 to c54, and c61 to c64), the values are expected to be approximately the same.

Therefore, when the computers C1 and C2 shown in FIG. 1 or FIG. 2 are arranged as compute nodes in a fat tree, the communication time between the computer C1 and the computer C2 can be expected to be shortened by, for example, switching between the Eager protocol and the Rendezvous protocol according to the data size D of the data d1.

FIG. 5 is a diagram showing a connection example in which the computers C1 to C9 form a mesh. FIG. 6 is a diagram showing a connection example in which the computers C1 to C9 form a torus. Each line segment between computers Ci represents an interconnect connecting the computers at its two ends.

FIG. 5 shows a two-dimensional mesh, and FIG. 6 shows a two-dimensional torus. However, the number of dimensions of either is not limited to any particular value.

As FIGS. 5 and 6 show, in a mesh or a torus the number of hops between nodes is not fixed; it varies with the pair of nodes.
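To illustrate how the hop count varies with the node pair, the following sketch computes shortest-path hop counts from grid coordinates. It assumes minimal routing and that each node is identified by integer coordinates; neither the coordinate scheme nor the function names come from the source.

```python
def mesh_hops(a, b):
    """Shortest-path hops between coordinates a and b in a mesh (no wraparound)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def torus_hops(a, b, dims):
    """Shortest-path hops in a torus, where each dimension of size n wraps around."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


# Opposite corners of a 3 x 3 grid.
corner_to_corner_mesh = mesh_hops((0, 0), (2, 2))
corner_to_corner_torus = torus_hops((0, 0), (2, 2), (3, 3))
```

In the 3 x 3 case, opposite corners are 4 hops apart in the mesh but only 2 hops apart in the torus, while adjacent nodes are 1 hop apart in both. The functions work for any number of dimensions.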

Depending on the network topology, the values of L_1 and L_2 vary with the combination of communicating computers. L_1 and L_2 are each expressed as a linear function of the number of hops H between the communicating computers, as follows.

L_1 = L_1N + A_1 × H ... (L1)
L_2 = L_2N + A_2 × H ... (L2)

Here, A_1 and A_2 are each the increase in the interconnect communication delay (the communication delay of RDMA communication) per hop (sec/hop). The values of A_1 and A_2 are normally equal, but separate variables are used in equations (L1) and (L2) for generality.

L_1N and L_2N are the hardware overhead times (sec) in the communication delay of the first hop (between the transmission source and its adjacent node).

The values of A_1, A_2, L_1N, and L_2N may be measured in advance on the computer system that is actually used. Specifically, A_1 is obtained by measuring how much the communication delay increases each time the number of hops increases by one when the Eager protocol is used. L_1N is obtained by subtracting A_1 × H from the actual communication delay measured for communication over H hops using the Eager protocol.

Similarly, A_2 is obtained by measuring how much the communication delay increases each time the number of hops increases by one when the Rendezvous protocol is used. L_2N is obtained by subtracting A_2 × H from the actual communication delay measured for communication over H hops using the Rendezvous protocol.
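One way to obtain the per-hop increase and the fixed overhead from several delay measurements at once is an ordinary least-squares line fit. This is a substitute technique, not the procedure described in the text, which takes the per-hop difference directly and then subtracts A × H from a measured delay. On noise-free data both give the same result. The function name and sample values are hypothetical.

```python
def fit_delay_model(samples):
    """Fit delay = l_n + a * hops to (hops, delay_sec) samples by least squares.

    Returns (l_n, a): the fixed overhead and the per-hop increase.
    """
    n = len(samples)
    mean_h = sum(h for h, _ in samples) / n
    mean_t = sum(t for _, t in samples) / n
    spread = sum((h - mean_h) ** 2 for h, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover at least two distinct hop counts")
    a = sum((h - mean_h) * (t - mean_t) for h, t in samples) / spread
    l_n = mean_t - a * mean_h
    return l_n, a


# Hypothetical measurements: 1.0 us fixed overhead plus 0.1 us per hop.
measured = [(1, 1.1e-6), (2, 1.2e-6), (4, 1.4e-6)]
l_n_est, a_est = fit_delay_model(measured)
```

Running the fit once on control-message delays and once on RDMA delays would yield (L_2N, A_2) and (L_1N, A_1), respectively.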

Because L_1 and L_2 are affected by the number of hops H, the threshold D_Threshold is also affected by H. For a network topology in which L_1 and L_2 cannot be treated as constants, the threshold D_Threshold calculated by equation (Dt1) is therefore insufficient. In other words, there is room to further improve communication performance by refining how the threshold D_Threshold is calculated.

Substituting equations (L1) and (L2) into equations (E) and (R) yields the following equations (Eh) and (Rh).

T_Eager = (D / W) × 2 + (L_1N + A_1 × H) + D / B + S_Eager ... (Eh)

T_Rendezvous = (L_2N + A_2 × H) × 2 + (L_1N + A_1 × H) + D / B + S_Rendezvous ... (Rh)

That is, T_Eager and T_Rendezvous are both linear in the number of hops H.

The threshold D_Threshold for switching between the Eager protocol and the Rendezvous protocol is obtained by setting T_Eager = T_Rendezvous in equations (Eh) and (Rh) and solving for D.

That is, from
(2 / W) × D = (L_2N + A_2 × H) × 2 + (S_Rendezvous - S_Eager)
the following is derived:
D_Threshold = [A_2 × H + L_2N + (S_Rendezvous - S_Eager) / 2] × W ... (Dt2)

In the present embodiment, the communication protocol to be used is switched according to a comparison between the result of equation (Dt2) and the data size D of the data to be transferred. Equation (Dt2) was derived taking into account that L_1 and L_2 change with the number of hops. With the threshold D_Threshold based on equation (Dt2), the communication protocol can therefore be selected in consideration of not only the data size D but also the number of hops H between the communicating computers. As a result, an improvement in communication performance can be expected compared with selecting the communication protocol based on equation (Dt1), particularly in mesh and torus network topologies.
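The effect of the hop count on the threshold can be seen by evaluating equation (Dt2) for several values of H. The parameter values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def threshold_dt2(hops, w, l2n, a2, s_eager, s_rendezvous):
    """Hop-aware switching threshold in bytes, following equation (Dt2)."""
    return (a2 * hops + l2n + (s_rendezvous - s_eager) / 2) * w


# Hypothetical parameters: 4 GB/s memory copy, 1 us fixed control delay,
# 0.1 us extra delay per hop, 0.5 us and 1.0 us software overheads.
PARAMS = dict(w=4e9, l2n=1e-6, a2=0.1e-6, s_eager=0.5e-6, s_rendezvous=1.0e-6)

thresholds = {h: threshold_dt2(h, **PARAMS) for h in (1, 2, 4)}
```

With these values the threshold is about 5400 bytes at 1 hop and about 6600 bytes at 4 hops, increasing by `a2 * w` (here 400 bytes) per hop. A 6000-byte message would thus be sent with the Rendezvous protocol to a neighbor but with the Eager protocol to a node 4 hops away, because the two control messages of the Rendezvous protocol become more expensive as the path lengthens.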

An example of a specific computer system to which the above concept is applied will now be described.

FIG. 7 is a diagram showing a configuration example of the computer system in the first embodiment. In FIG. 7, a job management apparatus 20 and a computer group Cs are connected via a network 30 such as a LAN (Local Area Network).

The job management apparatus 20 is a computer that accepts job input from users, determines the order in which the input jobs are assigned to the computer group Cs (the dispatch order), and dispatches the jobs. A job may be input by operating the job management apparatus 20 directly, or through communication with a terminal apparatus 50 via a network 40. The job management apparatus 20 is an example of an information processing apparatus.

The computer group Cs is a set of computers forming a distributed-memory parallel computer. The computers Ci (hereinafter, i is an integer that differs for each computer) are connected by links (a network) such as an interconnect, and each has an RDMA communication function.

For example, the computer group Cs is connected so as to form a mesh, a torus, or the like as its network topology.

FIG. 8 is a diagram showing a hardware configuration example of the computer in the first embodiment. In FIG. 8, the computer Ci includes a drive device 100, an auxiliary storage device 102, a main storage device 103, a CPU 104, and an interface device 105, which are connected to one another by a bus B.

A program that implements the processing in the computer Ci is provided on a recording medium 101 such as a CD-ROM. When the recording medium 101 on which the program is recorded is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. However, the program does not have to be installed from the recording medium 101; it may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program as well as necessary files, data, and the like.

When an instruction to start the program is given, the main storage device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 executes the functions of the computer Ci according to the program stored in the main storage device 103. The interface device 105 is used as an interface for connecting to a network (interconnect).

FIG. 9 is a diagram showing a functional configuration example of the computer in the first embodiment. In FIG. 9, the computer C1 is the data transmitting side and the computer C2 is the data receiving side.

The computer C1 includes an application 11, a transmission control unit 12, and the like. The application 11 is a program that executes predetermined processing using RDMA communication. Each application 11 is started as a process on, for example, a computer Ci to which a job has been assigned by the job management apparatus 20.

The transmission control unit 12 controls data transmission by RDMA communication in response to data transmission requests from the application 11. The transmission control unit 12 is realized by processing that a program installed on the computer C1 causes the CPU 104 of the computer C1 to execute. The transmission control unit 12 is implemented, for example, as part of an MPI (Message Passing Interface) library.

In FIG. 9, the transmission control unit 12 includes a threshold calculation unit 121, a parameter storage unit 122, a protocol selection unit 123, an E transmission control unit 124, an R transmission control unit 125, and the like.

閾値算出部１２１は、上述の式（Ｄｔ２）に基づいて、閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄを算出する。パラメータ記憶部１２２は、閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄの算出に必要な各種のパラメータを、例えば、補助記憶装置１０２を用いて記憶する。具体的には、パラメータ記憶部１２２は、Ｗ、Ｌ＿２Ｎ、Ａ＿２、Ｓ＿Ｒｅｎｄｅｚｖｏｕｓ、及びＳ＿Ｅａｇｅｒについて、予め計測された値又は理論値を記憶する。 The threshold value calculation unit 121 calculates the threshold value D_Threshold based on the above formula (Dt2). The parameter storage unit 122 stores the various parameters needed to calculate the threshold value D_Threshold, using, for example, the auxiliary storage device 102. Specifically, for each of W, L_2N, A_2, S_Rendezvous, and S_Eager, the parameter storage unit 122 stores either a value measured in advance or a theoretical value.

また、パラメータ記憶部１２２は、当該コンピュータＣｉから他のコンピュータＣｊへの最短の通信経路のホップ数Ｈが登録されたホップ数管理テーブル１２２ｔを記憶する。 The parameter storage unit 122 also stores a hop number management table 122t in which the hop number H of the shortest communication path from the computer Ci to another computer Cj is registered.

図１０は、ホップ数管理テーブルの構成例を示す図である。同図に示されるように、ホップ数管理テーブル１２２ｔには、コンピュータ番号ごとに、ホップ数Ｈが記憶されている。ホップ数Ｈは、当該ホップ数管理テーブル１２２ｔを記憶しているコンピュータＣｉを起点とした相対的な値である。同図では、図５に示されるメッシュを構成する場合の、コンピュータＣ１から各コンピュータＣｊへのホップ数Ｈが記憶された例が示されている。なお、コンピュータ番号とは、各コンピュータＣｉ（厳密には、各コンピュータで動作するアプリケーション１１のプロセス）を識別するための番号である。 FIG. 10 is a diagram illustrating a configuration example of the hop number management table. As shown in the figure, the hop count management table 122t stores the hop count H for each computer number. The hop number H is a relative value starting from the computer Ci storing the hop number management table 122t. The figure shows an example in which the number of hops H from the computer C1 to each computer Cj when the mesh shown in FIG. 5 is configured is stored. The computer number is a number for identifying each computer Ci (strictly speaking, the process of the application 11 operating on each computer).

プロトコル選択部１２３は、アプリケーション１１より送信を要求されたデータのデータサイズと、閾値算出部１２１によって算出された閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄとの比較に基づいて、利用する通信プロトコルを選択する。Ｅ送信制御部１２４は、Ｅａｇｅｒプロトコルによるデータの送信処理を制御する。Ｒ送信制御部１２５は、Ｒｅｎｄｅｚｖｏｕｓプロトコルによるデータの送信処理を制御する。 The protocol selection unit 123 selects a communication protocol to be used based on a comparison between the data size of the data requested to be transmitted from the application 11 and the threshold value D_Threshold calculated by the threshold value calculation unit 121. The E transmission control unit 124 controls data transmission processing by the eager protocol. The R transmission control unit 125 controls data transmission processing according to the Rendezvous protocol.

一方、受信側であるコンピュータＣ２は、アプリケーション１１及び受信制御部１３等を有する。アプリケーション１１については、上述した通りである。但し、受信側において、アプリケーション１１は、データの受信を受信制御部１３に要求する。 On the other hand, the computer C2 on the receiving side includes the application 11, the reception control unit 13, and the like. The application 11 is as described above. However, on the receiving side, the application 11 requests the reception control unit 13 to receive data.

受信制御部１３は、アプリケーション１１からのデータの受信要求に応じ、ＲＤＭＡ通信によるデータの受信を制御する。受信制御部１３は、コンピュータＣ２にインストールされたプログラムがコンピュータＣ２のＣＰＵ１０４に実行させる処理によって実現される。受信制御部１３は、例えば、ＭＰＩライブラリの一部として実装される。 The reception control unit 13 controls reception of data by RDMA communication in response to a data reception request from the application 11. The reception control unit 13 is realized by processing that a program installed in the computer C2 causes the CPU 104 of the computer C2 to execute. The reception control unit 13 is implemented, for example, as part of an MPI library.

同図において、受信制御部１３は、振り分け部１３１、Ｅ受信制御部１３２、及びＲ受信制御部１３３等を有する。振り分け部１３１は、送信側でいずれの通信プロトコルが選択されたかを判定し、受信処理の実行主体をＥ受信制御部１３２又はＲ受信制御部１３３に振り分ける。Ｅ受信制御部１３２は、Ｅａｇｅｒプロトコルによるデータの受信処理を制御する。Ｒ受信制御部１３３は、Ｒｅｎｄｅｚｖｏｕｓプロトコルによるデータの受信処理を制御する。 In the figure, the reception control unit 13 includes a distribution unit 131, an E reception control unit 132, an R reception control unit 133, and the like. The distribution unit 131 determines which communication protocol is selected on the transmission side, and distributes the execution subject of the reception process to the E reception control unit 132 or the R reception control unit 133. The E reception control unit 132 controls data reception processing by the eager protocol. The R reception control unit 133 controls data reception processing according to the Rendezvous protocol.

なお、送信側と受信側とは相対的な関係である。すなわち、各コンピュータＣｉは、或る時は送信側となり、或る時は受信側となる。したがって、各コンピュータＣｉは、送信制御部１２及び受信制御部１３の双方を有する。各コンピュータＣｉのアプリケーション１１は、送信制御部１２を利用してデータの送信を行い、受信制御部１３を利用したデータの受信を行う。 Note that the transmission side and the reception side have a relative relationship. That is, each computer Ci is a transmitting side at a certain time and a receiving side at a certain time. Therefore, each computer Ci has both the transmission control unit 12 and the reception control unit 13. The application 11 of each computer Ci transmits data using the transmission control unit 12 and receives data using the reception control unit 13.

以下、コンピュータＣｉが実行する処理手順について説明する。図１１は、送信側のコンピュータが実行するデータの送信処理の処理手順の一例を説明するためのフローチャートである。 Hereinafter, a processing procedure executed by the computer Ci will be described. FIG. 11 is a flowchart for explaining an example of a processing procedure of data transmission processing executed by the transmission-side computer.

ステップＳ１０１において、送信制御部１２は、アプリケーション１１よりデータｄ１の送信要求を受け付ける。当該送信要求には、データｄ１、当該データｄ１のデータサイズＤ、及び送信先のコンピュータ番号等のパラメータが指定される。 In step S101, the transmission control unit 12 receives a transmission request for data d1 from the application 11. The transmission request specifies parameters such as the data d1, the data size D of the data d1, and the computer number of the transmission destination.

続いて、閾値算出部１２１は、閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄを算出するためのパラメータである、Ｗ、Ｌ＿２Ｎ、Ａ＿２、Ｓ＿Ｒｅｎｄｅｚｖｏｕｓ、Ｓ＿Ｅａｇｅｒ、及びホップ数Ｈをパラメータ記憶部１２２より取得する（Ｓ１０２）。なお、ホップ数Ｈは、送信先のコンピュータＣｊのコンピュータ番号に対応付けられている値がホップ数管理テーブル１２２ｔより取得される。続いて、閾値算出部１２１は、取得されたパラメータを式（Ｄｔ２）に代入して、閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄを算出する（Ｓ１０３）。 Subsequently, the threshold value calculation unit 121 acquires W, L_2N, A_2, S_Rendezvous, S_Eager, and the hop count H, which are parameters for calculating the threshold value D_Threshold, from the parameter storage unit 122 (S102). As the hop number H, a value associated with the computer number of the transmission destination computer Cj is acquired from the hop number management table 122t. Subsequently, the threshold value calculation unit 121 calculates the threshold value D_Threshold by substituting the acquired parameter into the equation (Dt2) (S103).
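Steps S102 and S103 can be sketched as follows. This is a minimal illustration of formula (Dt2), not the implementation in the patent: the class and function names are hypothetical, the parameter meanings are those defined earlier in the description, and the numeric values are invented so that the arithmetic is easy to check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdParams:
    """Values held by the parameter storage unit 122 (measured or theoretical)."""
    w: float             # W
    l_2n: float          # L_2N
    a_2: float           # A_2
    s_rendezvous: float  # S_Rendezvous
    s_eager: float       # S_Eager


def d_threshold(p: ThresholdParams, hops: int) -> float:
    """Evaluate formula (Dt2) for a destination that is `hops` hops away."""
    return (p.a_2 * hops + p.l_2n + (p.s_rendezvous - p.s_eager) / 2) * p.w


# Hypothetical values, chosen only for illustration.
params = ThresholdParams(w=1000.0, l_2n=0.5, a_2=0.25, s_rendezvous=3.0, s_eager=1.0)

# Hypothetical hop count management table: computer number -> hop count H.
hop_table = {2: 1, 3: 2, 4: 1}

# S102: look up H for destination computer number 3; S103: apply (Dt2).
threshold = d_threshold(params, hop_table[3])
```

Because H is looked up per destination, the same sender can end up with a different threshold for each communication partner.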

続いて、プロトコル選択部１２３は、データサイズＤと閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄとを比較して、利用する通信プロトコルを選択する（Ｓ１０４）。データサイズＤが、閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄより小さい場合（Ｓ１０４でＹｅｓ）、プロトコル選択部１２３は、Ｅａｇｅｒプロトコルを選択する。当該選択に応じ、Ｅ送信制御部１２４は、図１のステップＳ１〜Ｓ３において説明した処理手順で、インターコネクトを介したデータｄ１の送信処理を実行する（Ｓ１０５）。 Subsequently, the protocol selection unit 123 selects the communication protocol to be used by comparing the data size D with the threshold value D_Threshold (S104). When the data size D is smaller than the threshold value D_Threshold (Yes in S104), the protocol selection unit 123 selects the eager protocol. In response to the selection, the E transmission control unit 124 executes the transmission process of the data d1 via the interconnect according to the processing procedure described in steps S1 to S3 in FIG. 1 (S105).

一方、データサイズＤが、閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄ以上である場合（Ｓ１０４でＮｏ）、プロトコル選択部１２３は、Ｒｅｎｄｅｚｖｏｕｓプロトコルを選択する。当該選択に応じ、Ｒ送信制御部１２５は、図２のステップＳ１１〜Ｓ１５において説明した手順で、インターコネクトを介した、データｄ１の送信を実行する（Ｓ１０６）。 On the other hand, when the data size D is equal to or greater than the threshold value D_Threshold (No in S104), the protocol selection unit 123 selects the Rendezvous protocol. In response to the selection, the R transmission control unit 125 performs transmission of data d1 through the interconnect in the procedure described in steps S11 to S15 in FIG. 2 (S106).

なお、図１１では、データサイズＤと閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄとが一致する場合、Ｒｅｎｄｅｚｖｏｕｓプロトコルが選択されるが、Ｅａｇｅｒプロトコルが選択されても良い。 Note that in FIG. 11 the Rendezvous protocol is selected when the data size D equals the threshold value D_Threshold, but the Eager protocol may be selected instead.
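The comparison in step S104 can be written as a short function. This is a sketch with hypothetical names; it follows FIG. 11 in sending the equal case to Rendezvous, although the text allows either choice there.

```python
from enum import Enum


class Protocol(Enum):
    EAGER = "eager"
    RENDEZVOUS = "rendezvous"


def select_protocol(data_size: float, threshold: float) -> Protocol:
    """S104: Eager when D < D_Threshold, Rendezvous when D >= D_Threshold."""
    if data_size < threshold:
        return Protocol.EAGER
    return Protocol.RENDEZVOUS
```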

続いて、図１２は、受信側のコンピュータが実行するデータの受信処理の処理手順の一例を説明するためのフローチャートである。 Next, FIG. 12 is a flowchart for explaining an example of a processing procedure of data reception processing executed by the receiving computer.

ステップＳ２０１において、受信制御部１３は、データｄ１の受信要求を受け付ける。続いて、振り分け部１３１は、インターコネクトを介して情報が受信されるのを待機する（Ｓ２０２）。情報が受信されると（Ｓ２０２でＹｅｓ）、振り分け部１３１は、受信情報を解読することにより、送信側においていずれの通信プロトコルが選択されたのかを判定する（Ｓ２０３）。すなわち、送信側においてＥａｇｅｒプロトコルが選択された場合、最初に受信される情報は、制御情報ｈ１及びデータｄ１である（図１参照）。一方、Ｒｅｎｄｅｚｖｏｕｓプロトコルの場合、最初に受信される情報は、制御情報ｈ２である（図２参照）。したがって、振り分け部１３１は、制御情報ｈ１又は制御情報ｈ２を解読して、送信側において選択された通信プロトコルを判定する。 In step S201, the reception control unit 13 receives a reception request for data d1. Subsequently, the distribution unit 131 waits for information to be received via the interconnect (S202). When the information is received (Yes in S202), the distribution unit 131 determines which communication protocol is selected on the transmission side by decoding the received information (S203). That is, when the eager protocol is selected on the transmission side, the information received first is the control information h1 and the data d1 (see FIG. 1). On the other hand, in the case of the Rendezvous protocol, the information received first is the control information h2 (see FIG. 2). Therefore, the distribution unit 131 decodes the control information h1 or the control information h2, and determines the communication protocol selected on the transmission side.

振り分け部１３１は、送信側においてＥａｇｅｒプロトコルが選択されたと判定すると（Ｓ２０４でＥａｇｅｒ）、以降の受信処理をＥ受信制御部１３２に委譲する。したがって、Ｅ受信制御部１３２は、図１のステップＳ３〜Ｓ５において説明した処理手順で、インターコネクトを介したデータｄ１の受信処理を実行する（Ｓ２０５）。 When determining that the eager protocol has been selected on the transmission side (Eager in S204), the distribution unit 131 delegates subsequent reception processing to the E reception control unit 132. Therefore, the E reception control unit 132 executes the reception process for the data d1 via the interconnect in accordance with the processing procedure described in steps S3 to S5 in FIG. 1 (S205).

一方、振り分け部１３１は、送信側においてＲｅｎｄｅｚｖｏｕｓプロトコルが選択されたと判定すると（Ｓ２０４でＲｅｎｄｅｚｖｏｕｓ）、以降の受信処理をＲ受信制御部１３３に委譲する。したがって、Ｒ受信制御部１３３は、図２のステップＳ１１〜Ｓ１５において説明した処理手順で、インターコネクトを介したデータｄ１の受信処理を実行する（Ｓ２０６）。 On the other hand, when determining that the Rendezvous protocol is selected on the transmission side (Rendezvous in S204), the distribution unit 131 delegates subsequent reception processing to the R reception control unit 133. Therefore, the R reception control unit 133 executes the reception process of the data d1 via the interconnect in the processing procedure described in steps S11 to S15 in FIG. 2 (S206).
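The dispatch performed by the distribution unit 131 in steps S203 and S204 might look like the sketch below. The description does not give the layout of the control information h1 and h2, so the `control` tag used here is purely an assumption, and the two handler functions are placeholders for the E and R reception control units.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstMessage:
    control: str                  # hypothetical tag: "h1" (Eager) or "h2" (Rendezvous)
    data: Optional[bytes] = None  # d1 arrives together with h1 in the Eager case


def e_receive(msg: FirstMessage) -> str:
    # Placeholder for the E reception control unit 132 (S205).
    return "eager"


def r_receive(msg: FirstMessage) -> str:
    # Placeholder for the R reception control unit 133 (S206).
    return "rendezvous"


def distribute(msg: FirstMessage) -> str:
    """S203/S204: inspect the control information and delegate the rest."""
    if msg.control == "h1":
        return e_receive(msg)
    if msg.control == "h2":
        return r_receive(msg)
    raise ValueError(f"unknown control information: {msg.control!r}")
```

The receiver needs no knowledge of the sender's threshold; the first message alone identifies the protocol.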

上述したように、第一の実施の形態によれば、ホップ数を考慮して、利用する通信プロトコルを判定するための閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄが決定される。ここで、閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄは、通信相手となるコンピュータＣｉに応じて異なりうる。したがって、メッシュやトーラスのように、任意のノード間の通信時間が均一でないネットワークトポロジーが構成されている場合であっても、通信性能の観点においてより効果的な通信プロトコルを選択することができる。 As described above, according to the first embodiment, the threshold value D_Threshold used to decide which communication protocol to use is determined with the number of hops taken into account. The threshold value D_Threshold can therefore differ depending on the computer Ci that is the communication partner. As a result, even in a network topology such as a mesh or a torus, in which the communication time between nodes is not uniform, the communication protocol that is more effective in terms of communication performance can be selected.

なお、本実施の形態は、ネットワークトポロジーがファットツリーである場合であっても適用可能である。この点について説明するため、以下に改めて式（Ｄｔ１）と式（Ｄｔ２）とを示す。 Note that this embodiment is applicable even when the network topology is a fat tree. In order to explain this point, equations (Dt1) and (Dt2) are shown below again.

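Ｄ＿Ｔｈｒｅｓｈｏｌｄ＝［Ｌ＿２＋（Ｓ＿Ｒｅｎｄｅｚｖｏｕｓ−Ｓ＿Ｅａｇｅｒ）／２］×Ｗ・・・（Ｄｔ１） D_Threshold = [L_2 + (S_Rendezvous − S_Eager) / 2] × W (Dt1)

Ｄ＿Ｔｈｒｅｓｈｏｌｄ＝［Ａ＿２×Ｈ＋Ｌ＿２Ｎ＋（Ｓ＿Ｒｅｎｄｅｚｖｏｕｓ−Ｓ＿Ｅａｇｅｒ）／２］×Ｗ・・・（Ｄｔ２） D_Threshold = [A_2 × H + L_2N + (S_Rendezvous − S_Eager) / 2] × W (Dt2)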

上記より明らかなように、式（Ｄｔ１）と式（Ｄｔ２）とは、式（Ｄｔ１）におけるＬ＿２が、式（Ｄｔ２）では、Ａ＿２×Ｈ＋Ｌ＿２Ｎに置き換わっている点について相違する。任意のノード間のホップ数の変化量が小さい場合、例えば、式（Ｄｔ２）のホップ数Ｈに、固定値（例えば、０）を代入すればよい。その場合、式（Ｄｔ２）は、式（Ｄｔ１）の近似式となる。 As apparent from the above, the formula (Dt1) and the formula (Dt2) are different from each other in that L_2 in the formula (Dt1) is replaced with A_2 × H + L_2N in the formula (Dt2). When the amount of change in the number of hops between arbitrary nodes is small, for example, a fixed value (for example, 0) may be substituted for the number of hops H in Expression (Dt2). In that case, the expression (Dt2) is an approximate expression of the expression (Dt1).

また、式（Ｄｔ２）は、運用上許容される範囲内において、簡素化されてもよい。例えば、Ｅａｇｅｒプロトコルにおけるソフトウェアオーバーヘッド時間及びＲｅｎｄｅｚｖｏｕｓプロトコルにおけるソフトウェアオーバーヘッド時間が運用上無視できる程度のものである場合、式（Ｄｔ２）から、（Ｓ＿Ｒｅｎｄｅｚｖｏｕｓ−Ｓ＿Ｅａｇｅｒ）／２の項が除去された、以下の式（Ｄｔ３）が用いられてもよい。 Formula (Dt2) may also be simplified to the extent that operation permits. For example, when the software overhead time of the Eager protocol and the software overhead time of the Rendezvous protocol are negligible in operation, the following formula (Dt3), in which the term (S_Rendezvous − S_Eager) / 2 is removed from formula (Dt2), may be used.

Ｄ＿Ｔｈｒｅｓｈｏｌｄ＝（Ａ＿２×Ｈ＋Ｌ＿２Ｎ）×Ｗ・・・（Ｄｔ３） D_Threshold = (A_2 × H + L_2N) × W (Dt3)

または、Ｌ＿２Ｎの値が運用上無視できる程度のものである場合、式（Ｄｔ２）から、Ｌ＿２Ｎの項が除去された、以下の式（Ｄｔ４）が用いられてもよい。 Alternatively, when the value of L_2N is negligible in operation, the following formula (Dt4), in which the L_2N term is removed from formula (Dt2), may be used.

Ｄ＿Ｔｈｒｅｓｈｏｌｄ＝［Ａ＿２×Ｈ＋（Ｓ＿Ｒｅｎｄｅｚｖｏｕｓ−Ｓ＿Ｅａｇｅｒ）／２］×Ｗ・・・（Ｄｔ４） D_Threshold = [A_2 × H + (S_Rendezvous−S_Eager) / 2] × W (Dt4)

更に、（Ｓ＿Ｒｅｎｄｅｚｖｏｕｓ−Ｓ＿Ｅａｇｅｒ）／２及びＬ＿２Ｎの双方が除去された、以下の式（Ｄｔ５）が用いられてもよい。 Furthermore, the following equation (Dt5) in which both (S_Rendezvous−S_Eager) / 2 and L_2N are removed may be used.

Ｄ＿Ｔｈｒｅｓｈｏｌｄ＝（Ａ＿２×Ｈ）×Ｗ・・・（Ｄｔ５） D_Threshold = (A_2 × H) × W (Dt5)

換言すれば、式（Ｄｔ４）又は式（Ｄｔ２）を利用する場合、ソフトウェアオーバーヘッド時間をも考慮して閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄを算出することができる。また、式（Ｄｔ３）又は式（Ｄｔ２）を利用する場合、Ｌ＿２Ｎの値をも考慮して閾値Ｄ＿Ｔｈｒｅｓｈｏｌｄを算出することができる。 In other words, when the formula (Dt4) or the formula (Dt2) is used, the threshold D_Threshold can be calculated in consideration of the software overhead time. Further, when the formula (Dt3) or the formula (Dt2) is used, the threshold value D_Threshold can be calculated in consideration of the value of L_2N.
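The relationship among formulas (Dt2) to (Dt5) can be seen by treating each dropped term as zero. The sketch below is one way to express that; the function name and the numeric values are hypothetical and serve only to make the four results comparable.

```python
def d_threshold(hops: int, w: float, a_2: float,
                l_2n: float = 0.0,
                s_rendezvous: float = 0.0,
                s_eager: float = 0.0) -> float:
    """Formula (Dt2); leaving a term at its default of 0 drops it."""
    return (a_2 * hops + l_2n + (s_rendezvous - s_eager) / 2) * w


# Hypothetical values for illustration only.
H, W, A_2, L_2N, S_R, S_E = 2, 1000.0, 0.25, 0.5, 3.0, 1.0

dt2 = d_threshold(H, W, A_2, l_2n=L_2N, s_rendezvous=S_R, s_eager=S_E)  # full form
dt3 = d_threshold(H, W, A_2, l_2n=L_2N)                                 # no software term
dt4 = d_threshold(H, W, A_2, s_rendezvous=S_R, s_eager=S_E)             # no L_2N term
dt5 = d_threshold(H, W, A_2)                                            # neither term
```

With these sample values the simplified forms give smaller thresholds than (Dt2), which shows why a term should be dropped only when it is truly negligible in the target system.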

ところで、第一の実施の形態では、各コンピュータＣｉにおいて、ホップ数管理テーブル１２２ｔが記憶される。ホップ数管理テーブル１２２ｔの情報量は、コンピュータＣｉの台数の増加に応じて大きくなる。また、ホップ数管理テーブル１２２ｔの内容は、コンピュータＣｉごとに異なる。したがって、多数のコンピュータＣｉを利用する場合、ホップ数管理テーブル１２２ｔの設定作業の負担は非常に大きなものとなる。また、コンピュータＣｉの接続関係が変化した場合や、コンピュータＣｉの台数に増減が発生した場合等において、新たな構成に応じて各コンピュータＣｉのホップ数管理テーブル１２２ｔを更新するのは非常に大変である。 In the first embodiment, the hop count management table 122t is stored in each computer Ci. The amount of information in the hop count management table 122t grows as the number of computers Ci increases, and the contents of the table differ for each computer Ci. Therefore, when a large number of computers Ci are used, the burden of setting up the hop count management tables 122t becomes very large. Likewise, when the connection relationship among the computers Ci changes, or when the number of computers Ci increases or decreases, updating the hop count management table 122t of every computer Ci to match the new configuration is very laborious.

そこで、第二の実施の形態では、ホップ数管理テーブル１２２ｔの保守作業を簡素化した例について説明する。なお、第二の実施の形態では、第一の実施の形態と異なる点について説明する。したがって、特に言及しない点については第一の実施の形態と同様でよい。 Therefore, in the second embodiment, an example in which the maintenance work of the hop number management table 122t is simplified will be described. In the second embodiment, differences from the first embodiment will be described. Accordingly, the points not particularly mentioned may be the same as those in the first embodiment.

図１３は、第二の実施の形態におけるジョブ管理装置の機能構成例を示す図である。同図において、ジョブ管理装置２０は、ジョブ割当部２１、ホップ数算出部２２、及び座標情報記憶部２３等を有する。コンピュータ群Ｃｓのうち、いずれかのコンピュータＣｉが、ジョブ割当部２１、ホップ数算出部２２、及び座標情報記憶部２３を含んで、ホップ数の算出を行なっても良い。もしくは、コンピュータ群ＣｓのうちのコンピュータＣｉが、ホップ数算出部２２を含み、ジョブ管理装置に含まれる座標情報記憶部２３に記憶された座標値に基づいて、ホップ数の算出を行なっても良い。 FIG. 13 is a diagram illustrating a functional configuration example of the job management apparatus according to the second embodiment. In the figure, the job management apparatus 20 includes a job allocation unit 21, a hop count calculation unit 22, a coordinate information storage unit 23, and the like. Alternatively, one of the computers Ci in the computer group Cs may include the job allocation unit 21, the hop count calculation unit 22, and the coordinate information storage unit 23 and calculate the hop count itself. As a further alternative, a computer Ci in the computer group Cs may include only the hop count calculation unit 22 and calculate the hop count based on the coordinate values stored in the coordinate information storage unit 23 of the job management apparatus.

ジョブ割当部２１は、ユーザからアプリケーション１１の実行指示を受け付け、アプリケーション１１の実行（すなわち、ジョブの実行）をコンピュータＣｎに指示する。 The job allocation unit 21 receives an instruction to execute the application 11 from the user, and instructs the computer Cn to execute the application 11 (that is, execute the job).

座標情報記憶部２３は、ネットワークトポロジーの座標系における、各コンピュータＣｉの座標値（位置情報）を、例えば、ジョブ管理装置２０の補助記憶装置を用いて記憶する。 The coordinate information storage unit 23 stores the coordinate value (position information) of each computer Ci in the network topology coordinate system using, for example, an auxiliary storage device of the job management device 20.

すなわち、メッシュやトーラスのネットワークトポロジーでは、コンピュータＣｉがｎ次元座標空間の論理的な格子点上に、通常ｎ次元直方体の形に配置される。ｎ次元座標系を（ｘ０，ｘ１，．．．，ｘｎ−１）と表現すれば、コンピュータＣｉはｎ次元直方体ｘｉ＿ｍｉｎ＜＝ｘｉ＜＝ｘｉ＿ｍａｘ（ｉ＝０，．．．，ｎ−１）に配置される。したがって、各コンピュータＣｉは座標値（ｘ０，ｘ１，．．．，ｘｎ−１）を有する。座標情報記憶部２３は、斯かる座標値をコンピュータＣｉごとに記憶する。 That is, in a mesh or torus network topology, the computers Ci are arranged on the logical lattice points of an n-dimensional coordinate space, usually in the shape of an n-dimensional rectangular parallelepiped. If the n-dimensional coordinate system is expressed as (x0, x1, ..., xn-1), the computers Ci are placed within the n-dimensional rectangular parallelepiped xi_min <= xi <= xi_max (i = 0, ..., n-1). Accordingly, each computer Ci has coordinate values (x0, x1, ..., xn-1). The coordinate information storage unit 23 stores these coordinate values for each computer Ci.

図１４は、座標情報記憶部の構成例を示す図である。同図に示されるように、座標情報記憶部２３には、コンピュータ番号ごとに、ネットワークトポロジーの座標系における座標値が記憶されている。同図では、ネットワークトポロジーが、図５に示される２次元のメッシュの場合の各コンピュータＣｉの座標値が記憶された例が示されている。 FIG. 14 is a diagram illustrating a configuration example of the coordinate information storage unit. As shown in the figure, the coordinate information storage unit 23 stores coordinate values in the network topology coordinate system for each computer number. In the figure, an example is shown in which the coordinate values of each computer Ci are stored when the network topology is the two-dimensional mesh shown in FIG.

ホップ数算出部２２は、座標情報記憶部２３に記録されている座標値に基づいて、各コンピュータＣｉ間のホップ数Ｈを算出する。ホップ数Ｈの算出方法は、ネットワークトポロジーによって異なる。 The hop count calculation unit 22 calculates the hop count H between the computers Ci based on the coordinate values recorded in the coordinate information storage unit 23. The calculation method of the hop number H differs depending on the network topology.

ホップ数の算出方法について説明する。メッシュのネットワークトポロジーではｘｉ＿ｍｉｎに位置するコンピュータＣｉとｘｉ＿ｍａｘに位置するコンピュータＣｉとがインターコネクトによって直接的に接続されていない。例えば、図５において、コンピュータＣ１（ｘ１＿ｍｉｎ，ｘ２＿ｍｉｎ）とコンピュータＣ３（ｘ１＿ｍａｘ，ｘ２＿ｍｉｎ）とは直接的に接続されていない。したがって、ｎ次元直方体におけるコンピュータＣ１の座標を（ｘ０＿１，ｘ１＿１，．．．，ｘｎ−１＿１）とし、Ｃ２の座標を（ｘ０＿２，ｘ１＿２，．．．，ｘｎ−１＿２）とした場合、この間の最短の通信経路のホップ数Ｈは、以下の式（Ｈｍ）によって算出される。 The method of calculating the hop count is described next. In a mesh network topology, the computer Ci located at xi_min and the computer Ci located at xi_max are not directly connected by the interconnect. For example, in FIG. 5, the computer C1 (x1_min, x2_min) and the computer C3 (x1_max, x2_min) are not directly connected. Therefore, when the coordinates of the computer C1 in the n-dimensional rectangular parallelepiped are (x0_1, x1_1, ..., xn-1_1) and the coordinates of C2 are (x0_2, x1_2, ..., xn-1_2), the hop count H of the shortest communication path between them is calculated by the following formula (Hm).

Σ｜ｘｉ＿１−ｘｉ＿２｜＝｜ｘ０＿１−ｘ０＿２｜＋｜ｘ１＿１−ｘ１＿２｜＋．．．＋｜ｘｎ−１＿１−ｘｎ−１＿２｜・・・（Ｈｍ） Σ|xi_1 − xi_2| = |x0_1 − x0_2| + |x1_1 − x1_2| + ... + |xn-1_1 − xn-1_2| (Hm)

例えば、隣接したコンピュータ間の最短の通信経路のホップ数Ｈは１である。 For example, the hop number H of the shortest communication path between adjacent computers is 1.
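Formula (Hm) is the sum of per-dimension coordinate differences, which can be sketched as below. The function name is hypothetical and the coordinates in the usage lines are illustrative, not taken from FIG. 5.

```python
from typing import Sequence


def mesh_hops(a: Sequence[int], b: Sequence[int]) -> int:
    """Formula (Hm): sum of |xi_1 - xi_2| over all n dimensions."""
    if len(a) != len(b):
        raise ValueError("both coordinates must have the same dimension")
    return sum(abs(x1 - x2) for x1, x2 in zip(a, b))


adjacent = mesh_hops((0, 0), (1, 0))   # neighbours along one axis
diagonal = mesh_hops((0, 0), (2, 2))   # opposite corners of a 3 x 3 mesh
```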

一方、トーラスのネットワークトポロジーでは、ｘｉ＿ｍｉｎに位置するコンピュータＣｉとｘｉ＿ｍａｘに位置するコンピュータＣｉとがインターコネクトによって直接的に接続されている。例えば、図６において、コンピュータＣ１（ｘ１＿ｍｉｎ，ｘ２＿ｍｉｎ）とコンピュータＣ３（ｘ１＿ｍａｘ，ｘ２＿ｍｉｎ）とは直接的に接続されている。したがって、コンピュータＣ１とコンピュータＣ２との間の最短の通信経路のホップ数Ｈの計算式は、より複雑なものとなる。 On the other hand, in the torus network topology, the computer Ci located at xi_min and the computer Ci located at xi_max are directly connected by an interconnect. For example, in FIG. 6, the computer C1 (x1_min, x2_min) and the computer C3 (x1_max, x2_min) are directly connected. Therefore, the calculation formula of the hop count H of the shortest communication path between the computer C1 and the computer C2 becomes more complicated.

トーラスのネットワークトポロジーのｎ次元直方体のｉ次元の辺の長さＤｉａｍｅｔｅｒ＿ｉを、以下のように定義する。 The length Diameter_i of the side in dimension i of the n-dimensional rectangular parallelepiped of the torus network topology is defined as follows.

Ｄｉａｍｅｔｅｒ＿ｉ＝ｘｉ＿ｍａｘ−ｘｉ＿ｍｉｎ＋１ Diameter_i = xi_max − xi_min + 1

続いて、Ｄｉａｍｅｔｅｒ＿ｉの１／２の長さＲａｄｉｕｓ＿ｉを以下のように求める。 Subsequently, a length Radius_i that is ½ of Diameter_i is obtained as follows.

Ｒａｄｉｕｓ＿ｉ＝（ｘｉ＿ｍａｘ−ｘｉ＿ｍｉｎ＋１）／２ Radius_i = (xi_max-xi_min + 1) / 2

ｎ次元直方体におけるコンピュータＣ１の座標を（ｘ０＿１，ｘ１＿１，．．．，ｘｎ−１＿１）とし、Ｃ２の座標を（ｘ０＿２，ｘ１＿２，．．．，ｘｎ−１＿２）とする。この場合、コンピュータＣ１とコンピュータＣ２との最短の通信経路のホップ数Ｈは、以下の式（Ｈｔ１）又は式（Ｈｔ２）によって算出される。 The coordinates of the computer C1 in the n-dimensional rectangular parallelepiped are (x0_1, x1_1, ..., xn-1_1), and the coordinates of C2 are (x0_2, x1_2, ..., xn-1_2). In this case, the hop count H of the shortest communication path between the computer C1 and the computer C2 is calculated by the following formula (Ht1) or formula (Ht2).

上記より、ホップ数算出部２２は、ネットワークトポロジーがメッシュであれば、式（Ｈｍ）に基づいてホップ数Ｈを算出する。また、ホップ数算出部２２は、ネットワークトポロジーがトーラスであれば、式（Ｈｔ１）又は式（Ｈｔ２）に基づいてホップ数Ｈを算出する。 From the above, if the network topology is a mesh, the hop number calculation unit 22 calculates the hop number H based on the formula (Hm). Further, if the network topology is a torus, the hop number calculation unit 22 calculates the hop number H based on the formula (Ht1) or the formula (Ht2).
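Formulas (Ht1) and (Ht2) are not reproduced in this text, so the sketch below does not claim to match them. It shows the usual wraparound distance on a torus instead: in each dimension, the shorter of the direct distance and the distance around the ring of length Diameter_i. The function and parameter names are hypothetical.

```python
from typing import Sequence


def torus_hops(a: Sequence[int], b: Sequence[int],
               x_min: Sequence[int], x_max: Sequence[int]) -> int:
    """Shortest hop count on a torus, summed over all dimensions.

    Assumed formulation, not necessarily identical to (Ht1)/(Ht2).
    """
    total = 0
    for a_i, b_i, lo, hi in zip(a, b, x_min, x_max):
        diameter = hi - lo + 1               # Diameter_i
        direct = abs(a_i - b_i)              # path without using the wraparound link
        total += min(direct, diameter - direct)
    return total


# 3 x 3 torus: the two ends of a row are directly connected.
ends_of_row = torus_hops((0, 0), (2, 0), x_min=(0, 0), x_max=(2, 2))
```

In a mesh of the same size the same pair of nodes would be two hops apart, which is the difference the description points out between FIG. 5 and FIG. 6.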

また、図１５は、第二の実施の形態におけるコンピュータの機能構成例を示す図である。図１５中、図９と同一部分には同一符号を付し、その説明は省略する。 FIG. 15 is a diagram illustrating a functional configuration example of a computer according to the second embodiment. In FIG. 15, the same parts as those of FIG. 9 are denoted by the same reference numerals, and the description thereof is omitted.

図１５において、コンピュータＣｉは、初期化部１４が明示されている。初期化部１４は、通信処理の実行前に必要とされる初期化処理を実行する。初期化部１４は、初期化処理において、ホップ数Ｈ等をジョブ管理装置２０より取得する。すなわち、初期化部は、取得手段の一例である。初期化部１４は、例えば、ＭＰＩライブラリの一部として実装される。この場合、初期化部１４は、初期化用の関数に相当する。 In FIG. 15, the initialization unit 14 is clearly shown in the computer Ci. The initialization unit 14 performs an initialization process that is required before the execution of the communication process. The initialization unit 14 acquires the hop count H and the like from the job management apparatus 20 in the initialization process. That is, the initialization unit is an example of an acquisition unit. The initialization unit 14 is implemented as a part of the MPI library, for example. In this case, the initialization unit 14 corresponds to a function for initialization.

以下、第二の実施の形態のコンピュータシステムの処理手順について説明する。図１６は、第二の実施の形態におけるアプリケーション実行開始時の処理手順の一例を説明するためのシーケンス図である。 Hereinafter, a processing procedure of the computer system according to the second embodiment will be described. FIG. 16 is a sequence diagram for explaining an example of a processing procedure at the start of application execution in the second embodiment.

ステップＳ３０１において、ジョブ管理装置２０のジョブ割当部２１は、ユーザよりアプリケーション１１の実行指示を受け付ける。当該実行指示では、使用するコンピュータＣｉの台数も指定される。 In step S301, the job assignment unit 21 of the job management apparatus 20 receives an execution instruction for the application 11 from the user. In the execution instruction, the number of computers Ci to be used is also specified.

続いて、ジョブ割当部２１は、コンピュータ群Ｃｓの中から、ユーザによって指定された台数分のコンピュータＣｉを、アプリケーション１１の実行先として選択する（Ｓ３０２）。具体的には、アプリケーション１１の実行先として使用するコンピュータＣｉのコンピュータ番号が決定される。なお、コンピュータＣｉの選択は、公知のジョブスケジューリング技術等に基づいて行われればよい。続いて、ジョブ割当部２１は、選択された各コンピュータＣｉに、アプリケーション１１の実行指示を送信する（Ｓ３０３）。 Subsequently, the job allocation unit 21 selects, from the computer group Cs, as many computers Ci designated by the user as the execution destination of the application 11 (S302). Specifically, the computer number of the computer Ci used as the execution destination of the application 11 is determined. The computer Ci may be selected based on a known job scheduling technique or the like. Subsequently, the job assignment unit 21 transmits an execution instruction of the application 11 to each selected computer Ci (S303).

アプリケーション１１の実行を指示された各コンピュータＣｉは、アプリケーション１１を起動させる（Ｓ３０４）。アプリケーション１１は、起動に応じ、初期化部１４に対して初期化処理の実行を要求する（Ｓ３０５）。初期化部１４は、初期化処理の中で、アプリケーション１１の実行先として選択された全てのコンピュータＣｉのコンピュータ番号及びホップ数Ｈを問い合わせる（Ｓ３０６）。なお、当該問い合わせには、問い合わせ元のコンピュータＣｉのコンピュータ番号も指定される。 Each computer Ci instructed to execute the application 11 activates the application 11 (S304). In response to activation, the application 11 requests the initialization unit 14 to execute initialization processing (S305). The initialization unit 14 inquires about the computer numbers and the hop numbers H of all the computers Ci selected as the execution destination of the application 11 during the initialization process (S306). In the inquiry, the computer number of the computer Ci as the inquiry source is also specified.

ジョブ管理装置２０のホップ数算出部２２は、問い合わせ元のコンピュータＣｉのコンピュータ番号と、アプリケーション１１の実行先として選択された他のコンピュータＣｉの座標値とを座標情報記憶部２３より取得する（Ｓ３０７）。続いて、ホップ数算出部２２は、取得された座標値に基づいて、問い合わせ元のコンピュータＣｉから当該他のコンピュータＣｉへのホップ数Ｈを算出する（Ｓ３０８）。すなわち、問い合わせ元のコンピュータＣｉと、当該他のコンピュータＣｉとの位置関係に基づいてホップ数Ｈが算出される。なお、ネットワークトポロジーは静的に決まっているため、式（Ｈｍ）、又は式（Ｈｔ１）及び式（Ｈｔ２）のいずれを用いるかは予め決まっている。 The hop count calculation unit 22 of the job management apparatus 20 acquires, from the coordinate information storage unit 23, the computer number of the inquiring computer Ci and the coordinate values of the other computers Ci selected as execution destinations of the application 11 (S307). Subsequently, the hop count calculation unit 22 calculates, based on the acquired coordinate values, the hop count H from the inquiring computer Ci to each of the other computers Ci (S308). That is, the hop count H is calculated from the positional relationship between the inquiring computer Ci and the other computer Ci. Because the network topology is determined statically, whether to use formula (Hm) or formulas (Ht1) and (Ht2) is decided in advance.

続いて、ホップ数算出部２２は、当該他のコンピュータＣｉのコンピュータ番号とホップ数Ｈとを問い合わせ元のコンピュータＣｉに返信する（Ｓ３０９）。当該他のコンピュータＣｉが複数である場合、コンピュータ番号とホップ数Ｈの組は複数返信される。 Subsequently, the hop number calculation unit 22 returns the computer number of the other computer Ci and the hop number H to the computer Ci that has made the inquiry (S309). When there are a plurality of other computers Ci, a plurality of sets of computer numbers and hop numbers H are returned.

続いて、初期化部１４は、受信されたコンピュータ番号及びホップ数Ｈを、パラメータ記憶部１２２のホップ数管理テーブル１２２ｔに記録する（Ｓ３１０）。 Subsequently, the initialization unit 14 records the received computer number and hop number H in the hop number management table 122t of the parameter storage unit 122 (S310).
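Steps S306 to S310 amount to building, for the inquiring computer, a table of hop counts to every other selected computer. A minimal sketch follows, assuming a mesh topology so that formula (Hm) applies; the function names and the coordinate values are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, ...]


def mesh_hops(a: Coord, b: Coord) -> int:
    # Formula (Hm): sum of per-dimension coordinate differences.
    return sum(abs(x1 - x2) for x1, x2 in zip(a, b))


def build_hop_table(requester: int,
                    selected: Iterable[int],
                    coords: Dict[int, Coord]) -> Dict[int, int]:
    """S307-S309 on the job management side: computer number -> hop count H."""
    origin = coords[requester]
    return {n: mesh_hops(origin, coords[n]) for n in selected if n != requester}


# Hypothetical contents of the coordinate information storage unit 23.
coords = {1: (0, 0), 2: (1, 0), 3: (2, 0), 4: (0, 1)}

# S310: the initialization unit would store this result as table 122t.
hop_table = build_hop_table(requester=1, selected=[1, 2, 3, 4], coords=coords)
```

Only the single coordinate table has to be maintained by hand; every per-computer hop table is derived from it at job start.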

その後に実行される通信処理については、第一の実施の形態と同様でよい。 The communication process executed after that may be the same as in the first embodiment.

上述したように、第二の実施の形態によれば、各コンピュータＣｉに対して他のコンピュータＣｉのホップ数Ｈが自動的に登録される。したがって、各コンピュータＣｉへのホップ数Ｈの登録作業の作業負担を著しく軽減することができる。すなわち、管理者等は、ジョブ管理装置２０において一元的に管理されている座標情報記憶部２３を編集すればよい。 As described above, according to the second embodiment, the hop count H of another computer Ci is automatically registered for each computer Ci. Therefore, it is possible to remarkably reduce the work burden of the registration work of the hop number H to each computer Ci. That is, the administrator or the like may edit the coordinate information storage unit 23 that is centrally managed in the job management apparatus 20.

なお、第一の実施の形態において各コンピュータＣｉに分散されているホップ数管理テーブル１２２ｔが、座標情報記憶部２３の代わりにジョブ管理装置２０に一元的に記憶されていてもよい。この場合も、図１６と同様の処理手順によって、各コンピュータＣｉに自動的にホップ数Ｈを登録することができる。この場合、ホップ数算出部２２は、ホップ数Ｈを算出する必要はない。ホップ数算出部２２は、問い合わせ元のコンピュータＣｉに関して記憶されているホップ数管理テーブル１２２ｔに基づいて、ホップ数Ｈ等を返信すればよい。 Note that the hop count management table 122t distributed to each computer Ci in the first embodiment may be centrally stored in the job management apparatus 20 instead of the coordinate information storage unit 23. Also in this case, the hop count H can be automatically registered in each computer Ci by the same processing procedure as in FIG. In this case, the hop number calculation unit 22 does not need to calculate the hop number H. The hop count calculation unit 22 may return the hop count H or the like based on the hop count management table 122t stored with respect to the inquiry source computer Ci.

但し、ホップ数管理テーブル１２２ｔは、コンピュータＣｉの台数分作成される必要がある。したがって、座標情報記憶部２３の登録作業の方が作業負担は小さいものと考えられる。 In that case, however, a hop count management table 122t must be created for each computer Ci. Registering the data in the coordinate information storage unit 23 is therefore considered to be the smaller workload.

以上、本発明の実施例について詳述したが、本発明は斯かる特定の実施形態に限定されるものではなく、特許請求の範囲に記載された本発明の要旨の範囲内において、種々の変形・変更が可能である。 Although the embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

１１アプリケーション
１２送信制御部
１３受信制御部
１４初期化部
２０ジョブ管理装置
２１ジョブ割当部
２２ホップ数算出部
２３座標情報記憶部
１００ドライブ装置
１０１記録媒体
１０２補助記憶装置
１０３主記憶装置
１０４ＣＰＵ
１０５インタフェース装置
１２１閾値算出部
１２２パラメータ記憶部
１２３プロトコル選択部
１２４Ｅ送信制御部
１２５Ｒ送信制御部
１３１振り分け部
１３２Ｅ受信制御部
１３３Ｒ受信制御部
Ｂバス
Ｃｓコンピュータ群
Ｃｉコンピュータ

11 Application
12 Transmission control unit
13 Reception control unit
14 Initialization unit
20 Job management apparatus
21 Job allocation unit
22 Hop count calculation unit
23 Coordinate information storage unit
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Main storage device
104 CPU
105 Interface device
121 Threshold calculation unit
122 Parameter storage unit
123 Protocol selection unit
124 E transmission control unit
125 R transmission control unit
131 Distribution unit
132 E reception control unit
133 R reception control unit
B Bus
Cs Computer group
Ci Computer

Claims

A communication program that causes a computer, which is included in a computer group and transmits data to another computer included in the computer group using any one of a plurality of communication protocols, to execute a process comprising:
selecting, from among the plurality of communication protocols, a communication protocol whose transfer time, predicted based on the number of hops on the communication path from the computer to the other computer and the data size of the data, is shorter than that of the other communication protocols; and
transmitting the data using the selected communication protocol.

The communication program according to claim 1, further causing the computer to:
calculate the number of hops on the communication path from the computer to the other computer based on information, stored in advance in storage means, indicating the connection relationship between the computers included in the computer group; and
select, from among the plurality of communication protocols, a communication protocol whose transfer time, predicted based on the calculated number of hops and the data size of the data, is shorter than that of the other communication protocols.

The communication program according to claim 1 or 2, wherein the plurality of communication protocols include communication protocols that differ from each other in the number of communications between the computer and the other computer required for the computer to transmit the data to the other computer.

The communication program according to any one of claims 1 to 3, further causing the computer to:
set, for the data size and based on the number of hops, a threshold at which the communication protocol having a shorter transfer time than the other communication protocols included in the plurality of communication protocols switches; and
select the communication protocol having a shorter transfer time than the other communication protocols according to the set threshold and the data size of the data.

A communication device that transmits data to another communication device included in a communication device group using any one of a plurality of communication protocols, the communication device comprising:
a selection means for selecting, from among the plurality of communication protocols, a communication protocol whose transfer time, predicted based on the number of hops on the communication path from the communication device to the other communication device and the data size of the data, is shorter than that of the other communication protocols; and
a transmission control means for transmitting the data using the selected communication protocol.

A communication method in which a computer included in a computer group, the computer transmitting data to another computer included in the computer group using any one of a plurality of communication protocols:
selects, from among the plurality of communication protocols, a communication protocol whose transfer time, predicted based on the number of hops on the communication path from the computer to the other computer and the data size of the data, is shorter than that of the other communication protocols; and
transmits the data using the selected communication protocol.

A communication system having a communication device that transmits data to another communication device included in a communication device group using any one of a plurality of communication protocols, and an information processing device not included in the communication device group, wherein the communication device includes:
an acquisition means for acquiring, from the information processing device, the number of hops on the communication path from the communication device to the other communication device;
a selection means for selecting, from among the plurality of communication protocols, a communication protocol whose transfer time, predicted based on the number of hops acquired by the acquisition means and the data size of the data, is shorter than that of the other communication protocols; and
a transmission control means for transmitting the data using the selected communication protocol,
and wherein the information processing device
calculates the number of hops on the communication path from the communication device to the other communication device, based on information that is stored in advance in a storage means and indicates the connection relationship between the communication devices included in the communication device group.