JP3644158B2

JP3644158B2 - Data transmission / reception method in parallel computer

Info

Publication number: JP3644158B2
Application number: JP30442696A
Authority: JP
Inventors: 明彦坂口; 暢俊佐川; 常之今木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1996-11-15
Filing date: 1996-11-15
Publication date: 2005-04-27
Anticipated expiration: 2016-11-15
Also published as: JPH10143486A

Description

【０００１】
【発明の属する技術分野】
本発明は複数の要素計算機（プロセッシングユニット、以下PU）を通信網によって結合した並列計算機におけるPU間のデータ送受信方法に係わり、特にメッセージパッシングの高速性とデータの安全性の確保するデータ送受信方法に関する。
【０００２】
【従来の技術】
並列計算機は、複数のPUを通信網によって結合し、それらを同時に稼働させることによって処理速度を向上させる。本発明では特に、各PUがそれに付随するメモリ空間のみをアクセスすることができる分散メモリ型の並列計算機を対象とする。分散メモリ型並列計算機では、他のPUのメモリ上にあるデータを直接アクセスすることはできない。データが必要となる度に送受信を行ってそのデータを自PUに移動する必要がある。
【０００３】
分散メモリ型並列計算機では、PU間のデータのやりとりをすべてプログラム中に記述する必要がある。ここでPU間で受け渡されるデータをメッセージと呼ぶ。このメッセージをやりとりすることをメッセージパッシングと呼ぶ。並列計算機用プログラムでは、他のPUで必要となるデータが自PUのメモリ上にある場合には自PUはあらかじめこれらデータを他のPUへ送信し、他のPUのメモリ上にあるデータを自PUが必要とする場合には自PUはあらかじめ他のPUからこれらデータを受信しておくような指示を各PUのプログラム中に明示的に記述する必要がある。
【０００４】
多くの並列計算機システムでは、このようなPU間のメッセージの送受信をサポートする目的で、メッセージパッシングライブラリと呼ばれる関数（あるいはサブルーチン）群があらかじめ用意されており、通信はCやFORTRANなどのプログラムからの関数コールとして記述できるようになっている。メッセージパッシングライブラリの中には、異なる並列計算機ハードウェア上にインプリメントされ、事実上の標準としての通信環境を提供するものも現れている。米国のOak Ridge National Laboratoryで開発されたPVMや、近年標準化が進められているMPIはその例である。これらの通信ライブラリをコールすることにより書かれた並列プログラムは、異なる並列計算機上でも再コンパイルのみで動作させうる可能性（可搬性）が高い。
【０００５】
通信ライブラリでPU間のメッセージの受け渡しを行うには、送信側PUでメッセージの送信関数をコールし、受信側PUでそれに対応するメッセージの受信関数をコールし、これらの間でメッセージを送受信する方式が現在一般に用いられている。送信関数より受信関数が先にコールされた場合には受信関数はデータの到着までブロック（停止）し、送信関数が先にコールされた場合には受信関数の開始までブロックするか、メッセージがシステム内にバッファリングされる。これはsend/receive方式と呼ばれる。
【０００６】
PU間のデータ移動を高速に実現する方法として、リモートメモリ書き込み機構を持つ分散メモリ型の並列計算機がある。リモートメモリ書き込みでは、各PUは相手PUの介入なしに直接相手PU内の特定メモリ領域へのデータ転送が可能である。リモートメモリ書き込みを行うことのできる特定メモリ領域は、リモートメモリ書き込み領域と呼ばれる。リモートメモリ書き込み領域は、実アドレスが連続であり、スワッピングされないために各PUが随時データ転送することができる。
【０００７】
図２は、リモートメモリ書き込み機構を持つ並列計算機上でメッセージパッシングを実現するための従来手法を示す。各PU間の実際のデータ転送は、リモートメモリ書き込み領域間で行われる。まず送信関数がコールされると、送信側のユーザプログラム中のバッファ(２０１)からリモートメモリ書き込み領域内のバッファ(２０２)へデータがコピーされる。次にそこから受信側のリモートメモリ書き込み領域内のバッファ(２０３)にリモートメモリ書き込みを用いてデータが転送される。最後に受信関数がコールされて、リモートメモリ書き込み領域(２０３)からユーザプログラム中のバッファ(２０４)にメッセージがコピーされて、メッセージの受け渡しが完了する。
【０００８】
【発明が解決しようとする課題】
上述のように、リモートメモリ書き込みでは相手PUの介入なしにデータの転送を行う際は、受信側のリモートメモリ書き込み領域の使用を確認しないと、受信側で必要なメモリ領域を書き込みデータで上書きしてしまい、受信側のデータを壊してしまう恐れがあった。そこで受信側のリモートメモリ書き込み領域を確認しないと、続けてデータ転送を行うことができなかった。また、送信側でのユーザプログラムのバッファからリモートメモリ書き込み領域へのデータ転送、送信側のリモートメモリ領域から受信側のリモートメモリ領域へのデータ転送、受信側でのリモートメモリ書き込み領域からユーザプログラム内のバッファへのデータ転送と、計３回のデータ転送が必要であった。また、リモートメモリ書き込み領域に別々に送られてきたデータ間の順序を保証する事も出来なかった。
【０００９】
本発明の目的は、リモートメモリ転送を限られた量の転送用メモリで高速に行うことと、相手PUが独自に送ってきたデータの順序を保証することにある。
【００１０】
【課題を解決するための手段】
本発明は、複数台の計算機とこれらを相互接続する通信路からなる並列計算機において、各計算機内に、通信相手となる計算機毎に固定された領域である静的受信バッファ領域と、通信の発生時に動的に割り当てられる動的受信バッファ領域を確保するステップと、送信データのデータ長が予め定められた値より短い場合には、送信先の該静的バッファ領域に送信データを書き込むステップと、送信データのデータ長が予め定められた値より長い場合には、送信先の該動的バッファ領域のアドレスを該静的バッファ領域を用いて受信するステップと、送信先での該受信したアドレスの該動的バッファ領域に当該送信データを書き込むステップを設けることによって、達成される。
【００１１】
また予め定められた値より長いデータを転送する場合には、パイプライン処理でデータを転送することで、限られたメモリ量において高速にデータ転送を行うことができる。
【００１２】
また、送られたデータの順序性を、使用した転送用メモリ（バッファ）を順につなぐことで保証することができる。
【００１３】
【発明の実施の形態】
以下、図を参照して本発明の詳細を説明する。
【００１４】
まず、本発明の実装方法の具体例を図を参照して説明する。図１に本発明の全体構成図を示す。１０１,１０２はPU（プロセッサユニット）を示し、１０３,１０４はそれらのCPU、１０５,１０６はメモリである。１０７はそれらを結ぶ通信路（ＰＵを相互接続できるネットワークであればよい）である。PUの数は任意であるが、説明のために２つのPUからなる並列計算機を示している。１０８,１０９はOS(オペレーティングシステム)である。ユーザプログラムを実行する際には、まずメモリ上にユーザプログラムが図１に示されない補助記憶装置等からローディングされる(１１０,１１１)。なお、ユーザプログラムはあらかじめ本発明のメッセージパッシングライブラリ(１１２,１１３)とリンクされているものとする。メッセージパッシングライブラリ中には他PUからリモートメモリ書き込み可能なリモートメモリ書き込み領域(１１４,１１５)が設けられている。さらに、リモートメモリ書き込み領域内部は通信相手PUごとにあらかじめアドレスが割り当てられている静的バッファ(１１６,１１７)とアドレスが動的に変化する動的バッファ(１１８,１１９)が存在する。
【００１５】
以上の構成要素のうち、メッセージパッシングライブラリが本発明の特徴をなす構成要素である。以下、詳細に説明する。
【００１６】
（Ａ）バッファの構成
図３、図４を用いて、本発明におけるメッセージパッシング用のバッファ構成について説明する。上述したように本発明では、静的バッファ(３０１)と動的バッファ(４０１)の二種類のバッファを用いる。なお、静的バッファ３０１は図１の静的バッファ１１６、１１７に相当し、動的バッファ４０１は図１の動的バッファ１１８、１１９に相当する。
【００１７】
まず、静的バッファは通信相手PU（＃０、＃１、＃２、・・・＃ｎ）ごとに複数のブロック（図３ではPU＃１に対して６個のブロックが示される）が用意されている。静的バッファの各ブロックは、大きく分けてヘッダ(３０２)とメッセージ本体(３０３)の二つに分かれており、さらにヘッダ内にはtag(３０４)、length(３０５)、first address(３０６)、last address(３０７)の情報が含まれている。tagは対応する送受信の組を選択するための識別子、lengthは通信するメッセージの長さ、first addressは動的バッファを割り当てた時の先頭アドレス、last addressは動的バッファを割り当てた時の最終アドレスを格納するための領域である。なお、通信路（ネットワーク）内での送信先ＰＵおよび送信元ＰＵの識別情報は別途管理され、メッセージはネットワーク内を転送されるものとする。静的バッファは各PUごとに送信用(３０１)と受信用(３０９)の同一形状のバッファが用意されており、バッファの使用状況などの情報が通信相手PUと共有化されている。静的バッファは通信相手PUごとにあらかじめ設定されており初期化の時点でお互いのPUがアドレスを知ることが出来る。送信側は送信用バッファに空きがある限りは常に受信側の受信用バッファにデータの転送を行う事が出来る。
【００１８】
動的バッファは、通信相手PUごとに区別されていない複数のブロック(４０１)からなる。各ブロックは、ヘッダ(４０２)とメッセージの本体(４０３)とに分けられ、ヘッダは受信側がデータが到着したかどうかの確認を行うために使用される。なお、ヘッダ部分の領域の構成は静的バッファの構成と同じであり、送信先ＰＵおよび送信元ＰＵのネットワーク内での識別情報（アドレス）は別途管理されるものとする。さらにメッセージ長に合わせたバッファ量を選択するために、各ブロックは幾つかで束になって管理されている（図４の複数ブロック４０１ではでは、この束を太線の枠で示している）。その束ごとに管理ヘッダ(４０４)に登録されており、受信側PUはメッセージ長に合わせて最適なサイズのバッファ束を取得する（図４の例では１ブロックの束と４ブロックの束がそれぞれ複数面用意されており、メッセージが１ブロックのサイズより小さい時は１ブロックの束が、それより大きい時には４ブロックの束が取得される）。受信側PUは取得したバッファ束の先頭アドレスと最終アドレスを静的バッファのヘッダ内のfirst address、last addressに格納し送信側PUへと送信することになる。また、送信側は動的バッファを２面用意しており(４０５)、これを交互に使用することでパイプライン処理が可能となる。パイプライン処理の詳細については後述する。
【００１９】
（Ｂ）通信プロトコル
一般にメッセージ長が短い時には、より高速にメッセージの転送が行われる（レイテンシが低い）ことが求められ、一方メッセージ長が長い時には、単位時間当りにより大量のメッセージの転送が行われる（スループットが大きい）ことが求められる。この２つの必ずしも両立しない要求を満たすため、本発明ではメッセージ長が短いメッセージを送信する場合に使用するショートプロトコルとメッセージ長が長いメッセージを送信する場合に使用するロングプロトコルの２つの通信プロトコルを用意し、これを切り替えて用いることで遅延の少ないデータ転送を実現する。
【００２０】
図５は、ショートプロトコルのタイミングチャートを表している。ショートプロトコルは、メッセージの転送に静的バッファを用いる。ユーザプログラムにより送信関数がコールされると(５０１)、送信側PUは静的バッファのヘッダとメッセージ本体を受信側に送信する(５０３)。一方、受信関数がコールされると(５０２)、受信側PUは静的バッファでメッセージを受け取り、送信側に受信完了通知を送信し(５０４)する。１往復のデータ通信でメッセージの通信を完了することができる。
【００２１】
しかし、リモートメモリ書き込み領域には限りがあるため、静的バッファの長さ、数量には制限が生じる。そのため全てのメッセージを静的バッファで送信すると大量のメッセージを通信する時には、静的バッファが空くのを待つ必要があり、逆に通信速度が落ちてしまう。そのためメッセージ長が長い時には、PUごとに区別されていない、それがゆえに大量に用意の出来る動的バッファを用いてメッセージ転送を行う。これが、ロングプロトコルである。静的バッファのメッセージ本体部分の長さを境界として、静的バッファの容量より少ないメッセージ長のメッセージを送信する場合にショートプロトコルを用い、静的バッファの容量より大きいメッセージ長のメッセージを送信する場合にロングプロトコルを用いるように、制御される。
【００２２】
図６は、ロングプロトコルのタイミングチャートを表している。ユーザプログラムにより送信関数がコールされると(６０１)、送信側PUはまず、送信するデータ長の長さ（送信するデータのデータ量）を検出し、このデータ量が静的バッファの容量より大きいと、ロングプロトコルを用いると判定する。静的バッファの容量より送信するデータ量が小さい場合は、前述のショートプロトコルを用いる。ロングプロトコルの場合、静的バッファのヘッダを受信側に送信する(６０３)。一方、受信関数がコールされると(６０２)受信側PUは静的バッファでメッセージの情報を受け取り、それに合わせた動的バッファのアドレス情報を静的バッファを用いて送り返し(６０４)、以後送信側PUが受け取ったバッファ情報に基づきメッセージを送信し(６０５)、最後に受信側PUが送信側に受信完了通知を送信する(６０６)。２往復のデータ転送が必要でありショートプロトコルに比較してレイテンシは高くなるが、動的バッファは静的バッファに比べ大量のデータ転送を可能とするためスループットを大きくすることが可能である。
【００２３】
以下、各プロトコルの動作を詳細に説明する。
まず、ショートプロトコルの動作を図７を用いて説明する。送信側PUは、静的バッファのヘッダにメッセージ長と識別子を格納する(７０１)。さらにメッセージをユーザ領域からメッセージ本体部にコピーする(７０２)。次いで、送信側から受信側へ静的バッファのリモートメモリ書き込みを行う(７０３)。一方、受信側PUは、メッセージを静的バッファで受け取り(７０５)、そこからユーザ領域へとコピーし、受信が完了する(７０６)。最後に受信側は受信完了通知を静的バッファを使い送り出し(７０７)、それを受けて送信側も処理を終了する(７０４)。
【００２４】
次に、ロングプロトコルの動作を図８を用いて説明する。まず送信側PUはショートプロトコルと同様、静的バッファのヘッダにメッセージ長と識別子を格納する(８０１)。ロングプロトコルの場合静的バッファではメッセージを送りきれないためヘッダのみを受信側へリモートメモリ書き込みする(８０２)。受信側PUは静的バッファでメッセージの情報を受け取る(８０７)と、lengthに合わせて適当な長さの動的バッファを確保しその先頭アドレスと最終アドレスを取得する(８０８)。それらのアドレスを静的バッファのfirst address、last addressにセットして、送信側に送り返す(８０９)。送信側はバッファ情報を受け取り(８０３)、全てのメッセージを送信するまでループを繰り返し(８０４)、動的バッファのブロックを単位として、ユーザ領域からバッファにコピーし送信する(８０５)。さらに受信側も全てのメッセージを受信するまでループを繰り返し(８１０)、受信したブロックからユーザ領域にメッセージをコピーする(８１１)。最後に受信側は受信完了通知を静的バッファを使い送り出し(８１２)、それを受けて送信側も処理を終了する(８０６)。
【００２５】
（Ｃ）ロングプロトコルにおけるパイプライン転送
リモートメモリ書き込みを用いたメッセージパッシングでは、ユーザ領域からリモートメモリ書き込み領域へのコピー、送信側から受信側へのリモートメッセージ転送、リモートメモリ書き込み領域からユーザ領域へのコピー、とメッセージの転送が３回必要となる。したがって少なくともリモートメモリ書き込みの約３倍の時間が必要となる。そこで本発明では送信側の動的バッファを２面用意し、パイプライン処理を行う事で性能向上を図る。
【００２６】
以下に図９を用いてパイプライン処理時の動作を説明する。図９において送信側PUのリモートメモリ書き込み領域内のバッファ(２０２)は、図４における２面ある送信側の動的バッファ(４０５)を、受信側PUのリモートメモリ書き込み領域内のバッファ(２０３)は、受信側の動的バッファ(４０１)を簡易化して表している。受信側の動的バッファは多面用意されているが、ここではABCDのデータを転送するのに必要な４面のみ表記している。ステップ１で送信側においてユーザ領域からリモートメモリ書き込み領域へメッセージAのコピーを行う(９０１)。次にステップ２で、メッセージAを送信側から受信側へリモートメッセージ転送で送信する(９０２)と同時に、メッセージBをリモートメモリ書き込み領域へコピーする(９０３)。ステップ３では、受信側でメッセージAをリモートメモリ書き込み領域からユーザ領域へコピーし(９０４)、メッセージBを送信側から受信側へリモートメモリ転送し(９０５)、送信側でメッセージCをユーザ領域からリモートメモリ書き込み領域へコピーする(９０６)。ステップ４では、受信側でのメッセージBのコピー(９０７)、メッセージCの送信側から受信側へのリモートメモリ書き込み転送(９０８)、送信側でのメッセージDのコピー(９０９)を同時に行う。
【００２７】
図１０は、パイプライン処理の送信側PUと受信側PUごとの動作を示す。送信側では、まずデータAをユーザ領域からリモートメモリ書き込み領域へメモリコピーし(１００１)、次にデータAを受信側に送信すると同時にデータBのメモリコピーを行い(１００２)、以下順次同様の動作が続き(１００３,１００４)、最後にデータDの送信が行われる(１００５)。一方、受信側では、まずデータAを送信側から受信し(１００６)、次にデータBを受信すると同時にデータAのメモリコピーを行い(１００７)、以下順次同様の動作が続き(１００８,１００９)、最後にデータDのメモリコピーが行われる(１０１０)。
【００２８】
（Ｄ）メッセージパッシングライブラリのインタフェース
メッセージパッシングライブラリは、ユーザプログラムの中から関数コールの形でメッセージパッシングを行うための関数群である。以下に、本発明におけるメッセージパッシングライブラリの関数のインタフェースとその動作を説明する。なお、関数名称、引き数名称などは任意であり、必ずしもここで説明する仕様と同じである必要はない。
【００２９】
(１)Init()
本関数中で、メッセージパッシングライブラリは必要な初期化操作を行う。メッセージパッシングライブラリの使用時には、全てのPUが必ず最初に本関数をコールしなければならない。本関数がユーザプログラムからコールされると、リモートメモリ書き込み領域に静的バッファ(図１：１１６,１１７)と動的バッファ(図１：１１８,１１９)を作成し、各バッファの初期化を行う。静的バッファは各PUごとにアドレスが固定であり、全てのPUは静的バッファの送信時の相手先アドレスをこの初期化時に通信しあうことが出来る。以下に挙げる関数は、初期化関数をコールした後にのみ使用する事が出来る。
【００３０】
(２)Send(buf, dest, tag, length)
ここで、bufは送信するメッセージの格納されたユーザメモリの先頭アドレスで、destは送信先PU番号（ネットワーク内での送り先ＰＵを識別する情報）、tagはメッセージの識別子、lengthはメッセージの長さを表す。ユーザが本関数をコールすると、ライブラリはlengthによってショートプロトコル(図７：７０１〜７０４)かロングプロトコル(図８：８０１〜８０６)を用いてメッセージの送信を行う。
【００３１】
(３)Recv(buf, src, tag)
本関数のbufは受信したメッセージを格納するユーザメモリの先頭アドレスで、srcは送信元PU番号（ネットワーク内でＰＵを識別する情報）、tagはメッセージ識別子を表す。ユーザが本関数をコールすると、送信元ＰＵ番号が一致した静的バッファのヘッダ部分でメッセージの長さを受け取り、それに合わせてショートプロトコル(図７：７０５〜７０７)またはロングプロトコル(図８：８０７〜８１２)でメッセージの受信を行う。
【００３２】
（Ｅ）ノンブロッキング動作における順序性の保証
メッセージパッシングライブラリにおける送受信には、ブロッキング関数とノンブロッキング関数がある。ブロッキング関数とは、送受信関数がコールされてから送受信が完了するまでプログラムの動作をブロック(停止)する関数であり、ノンブロッキング関数とは、関数のコール後送受信が完了する前にリターンし、PUはその間に他の動作を行う事が可能な関数である。前項まではブロッキング関数を前提としていた。本項ではノンブロッキングを実現するための追加機構を説明する。ノンブロッキング関数では、送受信が完了する前に複数の送受信関数が発行される事がある。この時にtagの同じ送受信関数では、発行された順序で送信関数と受信関数が対応しない事がある。以下に本発明の送受信関数の順序性の保証法について図１１を用いて説明する。
【００３３】
本発明のメッセージパッシングライブラリは、送信時には、まず静的バッファのヘッダにメッセージ情報(tag, length)をセットして受信側へと連絡する(ショートプロトコル時にはメッセージの本体も同時に送信する)。この時に静的バッファのヘッダにnext(１１０１)というメンバを加え、このnextで次に送信する静的バッファのブロックを指定する。使用中の静的バッファの各ブロックは、nextによりチェーンでつながれており、チェーンの順に送信されることになる。受信側が受け取る静的バッファは送信側から送られたものであり、送信側と同様nextでつながっている。したがってチェーンの順で検索し、送信関数の発行順序を確定することが出来る。
【００３４】
また、送信側での送信関数発行時に静的バッファが空いていない時や、受信側での受信関数発行時に静的バッファのヘッダがまだ送られてきていない時には、送受信関数の順序を静的バッファのブロックの順序で表す事が出来ない。そこで本発明では、関数発行時にすぐに処理できない関数の順序を保持しておくために、未処理の関数の発行順序を管理するためのリクエストオブジェクトを導入する(１１０２,１１０３)。リクエストオブジェクトには、メッセージの情報を保持するtag(１１０４,１１０５)、dest(src)(１１０６,１１０７)、length(１１０８,１１０９)と順序を保持するnext(１１１０,１１１１)の計４つの要素を持つ。終了していない関数は静的バッファのブロックと同様、nextによってチェーンでつながれ、送信側、受信側、それぞれで、その順に処理される。静的バッファのブロックが順に処理された後は、リクエストオブジェクトの順に処理が進む。以上の方式によりノンブロッキング関数における順序性は保証される。
【００３５】
（Ｆ）ノンブロッキング動作の追加インタフェース
順序性が保証されれば、以下のインタフェースを追加することによってノンブロッキング関数を実現できる。
【００３６】
(１)Isend(buf, dest, tag, length)
各引き数はSendと同じ仕様である。ユーザが本関数をコールすると静的バッファに空きがある場合には使用するバッファをチェーンにつないでから転送し、静的バッファに空きがない場合にはリクエストオブジェクトを作成して本関数をチェーンにつなぐ。リクエストオブジェクトのチェーンは発行順に処理される。静的バッファを受信側に転送した後は、ショートプロトコル(図７：７０１〜７０４)かロングプロトコル(図８：８０１〜８０６)で非同期にデータが転送される。
【００３７】
(２)Irecv(buf, src, tag)
各引き数はRecvと同じ仕様である。ユーザが関数をコールするとリクエストオブジェクトを作成してチェーンにつなぐ。チェーンの先頭の関数から順に処理され、静的バッファのチェーンの先頭から順に対応するtagを持つ送信関数が発行されているかを検索する。対応する送信が検索された後は、ショートプロトコル(図７：７０４〜７０７)かロングプロトコル(図８：８０７〜８１２)で非同期にデータが転送される。
【００３８】
(３)Wait()
本関数は、ノンブロッキング関数の完了を待つための関数である。ノンブロッキング関数は、関数コール後すぐにリターンしてしまうため、関数がいつ完了するかユーザには分からない。そのためノンブロッキング関数の完了を明示するために本関数は使われる。本関数が発行されると、完了確認をしていないノンブロッキング関数が完了するまでPUはブロックされる。全ての関数が完了することで本関数も完了する。本関数完了後はまた新たにSend/Isend、Recv/Irecvが発行され通信が再開される。
【００３９】
【発明の効果】
本発明のリモートメモリ転送制御方式によれば、リモートメモリ書き込みを用いたデータ転送において、データの長さによってあらかじめPUごとにアドレスの割り当てられた領域を用いて転送するか、転送時に動的に割り当てられる領域を用いてパイプライン処理で転送するかを選択することができ、それによって高速にデータの転送が出来るようになる。図１２に示す通りパイプライン動作を導入すると、最初と最後の２回ずつを除き、３回のメッセージ転送が重なって生じる。したがって、従来のリモートメモリ書き込みを用いたメッセージパッシングに比べ、約３倍の性能が得られる。
【００４０】
また、本発明のリモートメモリ転送制御方式によれば、送信関数が発行された順にデータが転送され、受信関数が発行された順に転送されたデータを受け取る事を保証することができる。これにより、ノンブロッキング動作を行う送受信関数におけるデータの順序性を保証することができるようになる。
【図面の簡単な説明】
【図１】本発明の実施例の全体構成図。
【図２】従来のリモートメモリ書き込みを用いたデータ転送制御方式。
【図３】静的バッファの説明図。
【図４】動的バッファの説明図。
【図５】ショートプロトコルのタイミングチャート。
【図６】ロングプロトコルのタイミングチャート。
【図７】ショートプロトコルのフローチャート。
【図８】ロングプロトコルのフローチャート。
【図９】パイプライン動作の説明図。
【図１０】パイプライン動作のフローチャート。
【図１１】順序性保証のための説明図。
【図１２】パイプライン動作の動作図。
【符号の説明】
１０１,１０２...要素計算機、１０３,１０４...CPU、１０５,１０６...メモリ、１０７...通信路、１０８,１０９...オペレーティングシステム、１１０,１１１...ユーザプログラム、１１２,１１３...メッセージ通信ライブラリ、１１４,１１５...データ転送用メモリ領域、１１６,１１７,３０１,３０９...静的バッファ、１１８,１１９,４０１,４０５,９１０,９１１...動的バッファ、２０１,２０４...ユーザバッファ、２０２,２０３...データ転送用バッファ、３０２...静的バッファのヘッダ、３０３,３０８...静的バッファのメッセージ本体、３０４...メッセージの識別子を格納する領域、３０５...メッセージの長さを格納する領域、３０６...確保した動的バッファの先頭アドレスを格納する領域、３０７...確保した動的バッファの最終アドレスを格納する領域、４０２...動的バッファにメッセージが到着しているかを確認するためのヘッダ、４０３...動的バッファのメッセージの本体、４０４...動的バッファの管理ヘッダ、１１０１...静的バッファの順序を格納する領域、１１０２,１１０３...リクエストオブジェクト、１１０４,１１０５...メッセージの識別子を格納するリクエストオブジェクトの領域、１１０６,１１０７...メッセージの送受信相手を格納するリクエストオブジェクトの領域、１１０８,１１０９...メッセージの長さを格納するリクエストオブジェクトの領域、１１１０,１１１１...リクエストオブジェクトの順序を格納する領域。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a method for transmitting and receiving data between PUs in a parallel computer in which a plurality of element computers (processing units, hereinafter PUs) are connected by a communication network, and more particularly to a data transmission/reception method that ensures both high-speed message passing and data safety.
[0002]
[Prior art]
A parallel computer improves processing speed by connecting multiple PUs via a communication network and operating them simultaneously. The present invention is directed in particular to a distributed-memory parallel computer, in which each PU can access only the memory space associated with it. In a distributed-memory parallel computer, a PU cannot directly access data in the memory of another PU. Each time such data is needed, it must be sent and received so that it is moved to the PU that needs it.
[0003]
In a distributed-memory parallel computer, every exchange of data between PUs must be written in the program. Data passed between PUs is called a message, and exchanging such messages is called message passing. In a program for a parallel computer, the program of each PU must explicitly contain instructions so that, when data needed by another PU resides in the memory of the own PU, the own PU sends that data to the other PU in advance, and when the own PU needs data residing in the memory of another PU, the own PU receives that data from the other PU in advance.
[0004]
In many parallel computer systems, a group of functions (or subroutines) called a message passing library is provided in advance to support the sending and receiving of messages between PUs, so that communication can be written as function calls from programs in languages such as C and FORTRAN. Some message passing libraries have been implemented on different parallel computer hardware and provide a de facto standard communication environment. Examples include PVM, developed at Oak Ridge National Laboratory in the United States, and MPI, whose standardization has progressed in recent years. A parallel program written with calls to these communication libraries is highly portable: it is likely to run on a different parallel computer after recompilation alone.
[0005]
To pass a message between PUs with a communication library, the method in general use today is for the sending PU to call a message send function and the receiving PU to call the corresponding message receive function, with the message transmitted between the two. If the receive function is called before the send function, the receive function blocks (stops) until the data arrives. If the send function is called first, it either blocks until the receive function starts or the message is buffered in the system. This is called the send/receive method.
[0006]
One way to move data between PUs at high speed is a distributed-memory parallel computer with a remote memory writing mechanism. With remote memory writing, each PU can transfer data directly to a specific memory area in the partner PU without the intervention of the partner PU. A specific memory area to which remote memory writes can be made is called a remote memory writing area. Because the remote memory writing area has contiguous real addresses and is not swapped out, each PU can transfer data to it at any time.
[0007]
FIG. 2 shows a conventional method for realizing message passing on a parallel computer having a remote memory writing mechanism. The actual data transfer between PUs is performed between remote memory writing areas. First, when the send function is called, data is copied from the buffer (201) in the user program on the sending side to the buffer (202) in the remote memory writing area. Next, the data is transferred from there to the buffer (203) in the remote memory writing area on the receiving side by remote memory writing. Finally, the receive function is called, the message is copied from the remote memory writing area (203) to the buffer (204) in the user program, and the message passing is complete.
[0008]
[Problems to be solved by the invention]
As described above, when data is transferred by remote memory writing without the intervention of the partner PU, the sender must first check how the remote memory writing area on the receiving side is being used. Otherwise a memory area still needed by the receiving side may be overwritten with the written data, destroying data on the receiving side. Consequently, data could not be transferred in succession without checking the remote memory writing area on the receiving side. In addition, a total of three data transfers were required: from the user program buffer to the remote memory writing area on the sending side, from the remote memory area on the sending side to the remote memory area on the receiving side, and from the remote memory writing area on the receiving side to the buffer in the user program. Furthermore, the order of data sent separately to the remote memory writing area could not be guaranteed.
[0009]
An object of the present invention is to perform remote memory transfer at high speed with a limited amount of transfer memory, and to guarantee the order of data sent independently by the partner PU.
[0010]
[Means for Solving the Problems]
In a parallel computer comprising a plurality of computers and a communication path interconnecting them, the above object is achieved by providing: a step of securing, in each computer, a static reception buffer area, which is an area fixed for each computer that is a communication partner, and a dynamic reception buffer area, which is allocated dynamically when communication occurs; a step of writing the transmission data into the static buffer area of the transmission destination when the data length of the transmission data is shorter than a predetermined value; a step of receiving, by way of the static buffer area, the address of the dynamic buffer area at the transmission destination when the data length of the transmission data is longer than the predetermined value; and a step of writing the transmission data into the dynamic buffer area at the received address at the transmission destination.
[0011]
When data longer than a predetermined value is transferred, the data can be transferred at high speed with a limited amount of memory by transferring the data by pipeline processing.
[0012]
Further, the order of the transmitted data can be ensured by connecting the used transfer memories (buffers) in order.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the details of the present invention will be described with reference to the drawings.
[0014]
First, a specific example of how the present invention is implemented will be described with reference to the drawings. FIG. 1 shows the overall configuration of the present invention. Reference numerals 101 and 102 denote PUs (processor units), 103 and 104 their CPUs, and 105 and 106 their memories. Reference numeral 107 denotes the communication path connecting them (any network capable of interconnecting PUs will do). The number of PUs is arbitrary, but a parallel computer consisting of two PUs is shown for the purpose of explanation. Reference numerals 108 and 109 denote OSs (operating systems). When a user program is executed, it is first loaded into memory (110, 111) from an auxiliary storage device or the like not shown in FIG. 1. The user program is assumed to have been linked in advance with the message passing library (112, 113) of the present invention. The message passing library contains remote memory writing areas (114, 115), to which other PUs can perform remote memory writes. Within each remote memory writing area there are static buffers (116, 117), whose addresses are assigned in advance for each communication partner PU, and dynamic buffers (118, 119), whose addresses change dynamically.
[0015]
Among the above components, the message passing library is a component that characterizes the present invention. This will be described in detail below.
[0016]
(A) Buffer configuration
The buffer configuration for message passing in the present invention will be described with reference to FIGS. 3 and 4. As described above, the present invention uses two types of buffers, the static buffer (301) and the dynamic buffer (401). The static buffer 301 corresponds to the static buffers 116 and 117 in FIG. 1, and the dynamic buffer 401 corresponds to the dynamic buffers 118 and 119 in FIG. 1.
[0017]
First, the static buffer provides a plurality of blocks for each communication partner PU (#0, #1, #2, ... #n); FIG. 3 shows six blocks for PU #1. Each block of the static buffer is broadly divided into a header (302) and a message body (303), and the header contains the fields tag (304), length (305), first address (306), and last address (307). tag is an identifier for selecting the matching send/receive pair, length is the length of the message to be communicated, first address stores the start address of a dynamic buffer when one is allocated, and last address stores the final address of that dynamic buffer. The identification information of the destination PU and the source PU within the communication path (network) is managed separately, and messages are assumed to be transferred through the network. For each PU, a static buffer for transmission (301) and one for reception (309) of identical layout are provided, and information such as buffer usage status is shared with the communication partner PU. The static buffers are set up in advance for each communication partner PU, so the PUs can learn each other's addresses at initialization. As long as there is free space in the transmission buffer, the sending side can always transfer data to the reception buffer on the receiving side.
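The block layout described in this paragraph can be sketched as a small data structure. This is only an illustration: the body capacity, the `in_use` flag (a stand-in for the shared usage status), and the helper names `make_static_buffer` and `find_free_block` are assumptions and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional

BODY_CAPACITY = 64  # assumed size in bytes of the message body (303); the text gives no figure


@dataclass
class StaticBlock:
    """One block of the static buffer: header (302) plus message body (303)."""
    tag: int = 0                          # 304: identifier matching a send to a receive
    length: int = 0                       # 305: length of the message
    first_address: Optional[int] = None   # 306: start address of an allocated dynamic buffer
    last_address: Optional[int] = None    # 307: final address of that dynamic buffer
    body: bytes = b""                     # 303: message body
    in_use: bool = False                  # assumed stand-in for the shared usage status


def make_static_buffer(num_pus: int, blocks_per_pu: int = 6):
    """Fixed set of blocks per communication partner PU (FIG. 3 shows six for PU #1)."""
    return {pu: [StaticBlock() for _ in range(blocks_per_pu)] for pu in range(num_pus)}


def find_free_block(blocks):
    """Return the first unused block, or None when the sender would have to wait."""
    for block in blocks:
        if not block.in_use:
            return block
    return None
```

Keying the buffer by partner PU number reflects the statement that static buffers are fixed per communication partner, which is what lets addresses be exchanged once at initialization.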
[0018]
The dynamic buffer consists of a plurality of blocks (401) that are not distinguished by communication partner PU. Each block is divided into a header (402) and a message body (403), and the header is used by the receiving side to check whether data has arrived. The layout of the header area is the same as in the static buffer, and the identification information (addresses) of the destination PU and the source PU within the network is managed separately. Further, so that an amount of buffer suited to the message length can be selected, the blocks are managed in bundles of several blocks each (in the blocks 401 of FIG. 4, each bundle is indicated by a bold frame). Each bundle is registered in the management header (404), and the receiving PU acquires a buffer bundle of the most suitable size for the message length. In the example of FIG. 4, several 1-block bundles and several 4-block bundles are provided; a 1-block bundle is acquired when the message is smaller than one block, and a 4-block bundle when it is larger. The receiving PU stores the start address and final address of the acquired buffer bundle in the first address and last address fields of the static buffer header and sends them to the sending PU. In addition, the sending side has two dynamic buffers (405), and using them alternately makes pipeline processing possible. Details of the pipeline processing are described later.
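A minimal sketch of bundle selection through the management header follows. The block size, the contiguous address layout, and the names `Bundle`, `build_management_header`, and `acquire_bundle` are assumptions made for illustration. The text does not say which bundle is used when the message is exactly one block long; the sketch treats that case as fitting a 1-block bundle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BLOCK_SIZE = 1024  # assumed block size in bytes; not given in the text


@dataclass
class Bundle:
    first_address: int
    num_blocks: int
    free: bool = True

    @property
    def last_address(self) -> int:
        return self.first_address + self.num_blocks * BLOCK_SIZE - 1


def build_management_header(counts: List[Tuple[int, int]]) -> List[Bundle]:
    """Lay out bundles contiguously; counts is [(blocks_per_bundle, how_many), ...]."""
    bundles, address = [], 0
    for blocks_per_bundle, how_many in counts:
        for _ in range(how_many):
            bundles.append(Bundle(address, blocks_per_bundle))
            address += blocks_per_bundle * BLOCK_SIZE
    return bundles


def acquire_bundle(bundles: List[Bundle], length: int) -> Optional[Bundle]:
    """Pick a free 1-block bundle for messages up to one block, else a free 4-block bundle."""
    wanted = 1 if length <= BLOCK_SIZE else 4
    for bundle in bundles:
        if bundle.free and bundle.num_blocks == wanted:
            bundle.free = False
            return bundle
    return None  # no free bundle of that size; the caller would have to wait
```

The `first_address` and `last_address` of the returned bundle are the two values the receiving PU would place in the static buffer header before replying to the sender.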
[0019]
(B) Communication protocol
In general, short messages call for faster transfer (low latency), whereas long messages call for a larger volume of data transferred per unit time (high throughput). To satisfy these two requirements, which are not always compatible, the present invention provides two communication protocols, a short protocol used for messages with a short message length and a long protocol used for messages with a long message length, and realizes low-delay data transfer by switching between them.
[0020]
FIG. 5 shows a timing chart of the short protocol. The short protocol uses the static buffer for message transfer. When the send function is called by the user program (501), the sending PU sends the header and message body of the static buffer to the receiving side (503). When the receive function is called (502), the receiving PU receives the message in the static buffer and sends a reception completion notification to the sending side (504). Message communication thus completes in a single round trip.
[0021]
However, because the remote memory writing area is limited, the length and number of static buffers are also limited. If all messages were sent through static buffers, then when a large volume of messages is communicated the sender would have to wait for a static buffer to become free, and communication would actually slow down. Therefore, when the message length is long, the message is transferred using dynamic buffers, which are not distinguished by PU and can therefore be provided in large numbers. This is the long protocol. With the length of the message body portion of the static buffer as the boundary, control is performed so that the short protocol is used for a message shorter than the static buffer capacity and the long protocol is used for a message longer than the static buffer capacity.
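The switching rule in this paragraph reduces to a single comparison, sketched below. The capacity value and the function name `choose_protocol` are assumptions. The text leaves open which protocol handles a message whose length equals the capacity exactly; the sketch sends an exact fit through the short protocol.

```python
STATIC_BODY_CAPACITY = 64  # assumed capacity in bytes of a static buffer's message body


def choose_protocol(length: int, capacity: int = STATIC_BODY_CAPACITY) -> str:
    """Return "short" when the message fits in a static buffer body, else "long"."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # The text does not say which protocol handles length == capacity;
    # treating an exact fit as short is an assumption.
    return "short" if length <= capacity else "long"
```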
[0022]
FIG. 6 shows a timing chart of the long protocol. When the send function is called by the user program (601), the sending PU first detects the length of the data to be sent (the amount of data to be sent) and, if this amount is larger than the capacity of the static buffer, determines that the long protocol is to be used. If the amount of data to be sent is smaller than the capacity of the static buffer, the short protocol described above is used. In the long protocol, the header of the static buffer is sent to the receiving side (603). When the receive function is called (602), the receiving PU receives the message information in the static buffer and sends back the address information of a suitable dynamic buffer using the static buffer (604). The sending PU then sends the message based on the buffer information it received (605), and finally the receiving PU sends a reception completion notification to the sending side (606). Two round trips of data transfer are required, so latency is higher than with the short protocol, but because dynamic buffers allow a larger amount of data to be transferred than static buffers, throughput can be increased.
[0023]
Hereinafter, the operation of each protocol will be described in detail.
First, the operation of the short protocol will be described with reference to FIG. 7. The sending PU stores the message length and identifier in the header of the static buffer (701). It then copies the message from the user area to the message body (702). Next, the static buffer is written from the sending side to the receiving side by remote memory writing (703). The receiving PU receives the message in the static buffer (705), copies it from there to the user area, and completes reception (706). Finally, the receiving side sends a reception completion notification using the static buffer (707), and on receiving it the sending side also ends its processing (704).
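The short protocol steps 701 to 707 can be walked through in a single process as follows. This is a toy model: the remote memory write is represented by a plain copy between two Python objects, and the `done` flag and the function names `short_send` and `short_recv` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class StaticBlock:
    tag: int = 0
    length: int = 0
    body: bytes = b""
    done: bool = False  # assumed flag standing in for the completion notification


def short_send(user_data: bytes, tag: int,
               send_block: StaticBlock, recv_block: StaticBlock) -> None:
    send_block.tag, send_block.length = tag, len(user_data)   # 701: fill in the header
    send_block.body = bytes(user_data)                        # 702: copy from the user area
    # 703: remote memory write, modelled here as a copy into the receiver's block
    recv_block.tag = send_block.tag
    recv_block.length = send_block.length
    recv_block.body = send_block.body


def short_recv(tag: int, recv_block: StaticBlock, send_block: StaticBlock) -> bytes:
    if recv_block.tag != tag:                                 # 705: look at the arrived header
        raise LookupError("no message with this tag")
    user_data = recv_block.body[:recv_block.length]           # 706: copy to the user area
    send_block.done = True                                    # 707: notify the sender (704)
    return user_data
```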
[0024]
Next, the operation of the long protocol will be described with reference to FIG. 8. First, as in the short protocol, the sending PU stores the message length and identifier in the header of the static buffer (801). In the long protocol the message does not fit in the static buffer, so only the header is written to the receiving side by remote memory writing (802). When the receiving PU receives the message information in the static buffer (807), it secures a dynamic buffer of a suitable length according to length and obtains its start address and final address (808). It sets these addresses in the first address and last address fields of the static buffer and sends them back to the sending side (809). The sending side receives the buffer information (803) and repeats a loop until the whole message has been sent (804), copying from the user area to the buffer and transmitting in units of dynamic buffer blocks (805). The receiving side likewise repeats a loop until the whole message has been received (810), copying the message from each received block to the user area (811). Finally, the receiving side sends a reception completion notification using the static buffer (812), and on receiving it the sending side also ends its processing (806).
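The long protocol handshake and block loop (801 to 812) can be sketched in one process as shown below. The block size is deliberately tiny, addresses are modelled as offsets into a `bytearray`, and the function name `long_transfer` and the event log strings are assumptions made for illustration only.

```python
BLOCK_SIZE = 4  # assumed dynamic buffer block size in bytes, kept tiny for illustration


def long_transfer(user_data: bytes, tag: int):
    """Walk through steps 801-812 in one process; return (received bytes, event log)."""
    log = []
    header = {"tag": tag, "length": len(user_data)}             # 801: header in static buffer
    log.append("header -> receiver")                            # 802: header-only remote write

    # 807-809: receiver reserves a dynamic buffer and returns its address range
    num_blocks = -(-header["length"] // BLOCK_SIZE)             # ceiling division
    dynamic_buffer = bytearray(num_blocks * BLOCK_SIZE)
    header["first_address"] = 0
    header["last_address"] = len(dynamic_buffer) - 1
    log.append("addresses -> sender")

    received = bytearray()
    # 804-805 and 810-811: one dynamic buffer block per loop iteration
    for offset in range(0, header["length"], BLOCK_SIZE):
        chunk = user_data[offset:offset + BLOCK_SIZE]           # sender copies from user area
        dynamic_buffer[offset:offset + len(chunk)] = chunk      # remote write into receiver
        received += dynamic_buffer[offset:offset + len(chunk)]  # receiver copies to user area
        log.append(f"block at {offset}")

    log.append("completion -> sender")                          # 812, then 806 on the sender
    return bytes(received), log
```

The log makes the two round trips visible: the header and address reply form the first, and the data blocks and completion notification form the second.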
[0025]
(C) Pipeline transfer in long protocol
In message passing using remote memory writing, a message must be transferred three times: a copy from the user area to the remote memory writing area, a remote message transfer from the sending side to the receiving side, and a copy from the remote memory writing area to the user area. The whole operation therefore takes at least about three times as long as the remote memory write alone. For this reason, the present invention provides two dynamic buffers on the sending side and improves performance through pipeline processing.
[0026]
The operation during pipeline processing is described below with reference to FIG. 9. In FIG. 9, the buffer (202) in the remote memory writing area of the sending PU is a simplified representation of the two sending-side dynamic buffers (405) of FIG. 4, and the buffer (203) in the remote memory writing area of the receiving PU is a simplified representation of the receiving-side dynamic buffers (401). Many dynamic buffers are provided on the receiving side, but only the four needed to transfer the data A, B, C, and D are shown here. In step 1, message A is copied from the user area to the remote memory writing area on the sending side (901). In step 2, message A is sent from the sending side to the receiving side by remote message transfer (902) while, at the same time, message B is copied to the remote memory writing area (903). In step 3, message A is copied from the remote memory writing area to the user area on the receiving side (904), message B is transferred from the sending side to the receiving side by remote memory transfer (905), and message C is copied from the user area to the remote memory writing area on the sending side (906). In step 4, the copy of message B on the receiving side (907), the remote memory write transfer of message C from the sending side to the receiving side (908), and the copy of message D on the sending side (909) are performed simultaneously.
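The overlap described in steps 1 to 4 is a three-stage pipeline. The sketch below lists which operations run together in each step; the stage labels and the function name `pipeline_schedule` are assumptions, and each stage is idealized as taking one step.

```python
def pipeline_schedule(messages):
    """Return, for each step, the (message, stage) pairs that overlap in a 3-stage pipeline."""
    stages = ("copy-in at sender", "remote write", "copy-out at receiver")
    steps = []
    for step in range(len(messages) + len(stages) - 1):
        active = []
        for stage_index, stage in enumerate(stages):
            message_index = step - stage_index  # message occupying this stage in this step
            if 0 <= message_index < len(messages):
                active.append((messages[message_index], stage))
        steps.append(active)
    return steps


if __name__ == "__main__":
    for number, active in enumerate(pipeline_schedule(["A", "B", "C", "D"]), start=1):
        print(number, active)
```

Under the idealization that all three stages take equal time, n blocks finish in n + 2 steps instead of the 3n steps needed when the stages run one after another, so four blocks take 6 steps rather than 12.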
[0027]
FIG. 10 shows the operation of the sending PU and of the receiving PU during pipeline processing. The sending side first copies data A from the user area to the remote memory writing area (1001), then sends data A to the receiving side while simultaneously copying data B (1002). The same operation continues in sequence (1003, 1004), and finally data D is sent (1005). The receiving side first receives data A from the sending side (1006), then receives data B while simultaneously copying data A to memory (1007). The same operation continues in sequence (1008, 1009), and finally data D is copied (1010).
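The sending-side sequence 1001 to 1005 relies on the two sending-side dynamic buffers being used alternately. The sketch below assigns a buffer to each block and checks that no buffer is refilled during the step in which it is still being sent. The names `sender_timeline` and `buffers_conflict`, and the one-step-per-operation timing, are assumptions.

```python
def sender_timeline(messages, num_buffers=2):
    """List (step, action, message, buffer index) tuples for the sending PU."""
    timeline = []
    for i, message in enumerate(messages):
        buffer_index = i % num_buffers  # alternate between the sending-side buffers
        timeline.append((i + 1, "copy", message, buffer_index))  # user area -> buffer
        timeline.append((i + 2, "send", message, buffer_index))  # buffer -> receiving PU
    return sorted(timeline)


def buffers_conflict(timeline):
    """True if some buffer is refilled in the same step in which it is still being sent."""
    busy = {(step, buf) for step, action, _, buf in timeline if action == "send"}
    return any(action == "copy" and (step, buf) in busy
               for step, action, _, buf in timeline)
```

With two buffers the copy of B goes into buffer 1 while A is sent from buffer 0, so the overlap is safe; with a single buffer the same schedule would overwrite data that is still in flight, which is why the check returns True in that case.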
[0028]
(D) Message passing library interface
The message passing library is a group of functions for performing message passing in the form of function calls from the user program. The interface of the message passing library functions in the present invention and their operation are described below. Note that the function names, argument names, and the like are arbitrary, and an implementation need not follow exactly the specifications described here.
[0029]
(1) Init ()
In this function, the message passing library performs the necessary initialization. When using the message passing library, every PU must call this function first. When this function is called from the user program, a static buffer (FIG. 1: 116, 117) and a dynamic buffer (FIG. 1: 118, 119) are created in the remote memory writing area, and each buffer is initialized. The static buffer has a fixed address for each PU, so all PUs can communicate with one another once initialization is complete. The functions listed below can be used only after the initialization function has been called.
[0030]
(2) Send (buf, dest, tag, length)
Here, buf is the start address of the user memory in which the message to be transmitted is stored, dest is the destination PU number (information identifying the destination PU in the network), tag is the identifier of the message, and length is the length of the message. When the user calls this function, the library transmits the message using the short protocol (FIG. 7: 701 to 704) or the long protocol (FIG. 8: 801 to 806), depending on the length.
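The choice between the two protocols can be sketched as follows. The names `SHORT_LIMIT` and `send_plan` and the value 1024 are hypothetical; the text only speaks of a predetermined value and does not say how a length exactly equal to it is handled, so this sketch arbitrarily treats that case as long.

```python
SHORT_LIMIT = 1024  # hypothetical threshold in bytes; the source gives no number


def send_plan(length, short_limit=SHORT_LIMIT):
    """List the transfers a Send would need for a message of this length."""
    if length < short_limit:
        # Short protocol: header and body are written to the receiver's
        # static buffer in a single remote memory write.
        return ["write header+body -> receiver static buffer"]
    # Long protocol: only the header goes to the static buffer; the body
    # follows once the receiver has reported a dynamic buffer address.
    return [
        "write header -> receiver static buffer",
        "wait for dynamic buffer address from receiver",
        "write body -> receiver dynamic buffer",
    ]


print(send_plan(100))
print(send_plan(1_000_000))
```

The short path avoids the extra round trip for the buffer address, which is why it suits small messages, while the long path avoids reserving large fixed buffers for every communication partner.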
[0031]
(3) Recv (buf, src, tag)
In this function, buf is the start address of the user memory that stores the received message, src is the source PU number (information identifying the PU in the network), and tag is the message identifier. When the user calls this function, the library obtains the message length from the header of the static buffer corresponding to the source PU number and, depending on that length, receives the message using the short protocol (FIG. 7: 705 to 707) or the long protocol (FIG. 8: 807 to 812).
[0032]
(E) Guarantee of order in non-blocking operation
The message passing library provides blocking and non-blocking functions for transmission and reception. A blocking function blocks (stops) the program from the time the transmission / reception function is called until the transmission / reception is completed. A non-blocking function returns before the transmission / reception is completed, so the program can perform other operations in the meantime. Up to the previous section, blocking functions were assumed. This section describes an additional mechanism for realizing non-blocking operation. With non-blocking functions, a plurality of transmission / reception functions may be issued before earlier transmissions / receptions are completed. In that case, among transmission / reception functions with the same tag, the transmission functions and the reception functions might not be matched in the order in which they were issued. The method for guaranteeing the order of transmission / reception functions according to the present invention is described below with reference to FIG. 11.
[0033]
When transmitting, the message passing library of the present invention first sets the message information (tag, length) in the header of the static buffer and sends it to the receiving side (with the short protocol, the message body is sent at the same time). A member next (1101) is added to the header of the static buffer, and this next designates the static buffer block to be transmitted next. The static buffer blocks in use are chained by next and are transmitted in the order of the chain. The static buffer blocks that arrive at the receiving side are those sent from the transmitting side, so they are chained by next in the same way as on the transmitting side. The receiving side can therefore search them in chain order and determine the order in which the transmission functions were issued.
[0034]
If the static buffer has no free space when a transmission function is issued on the transmitting side, or if the static buffer header has not yet arrived when a reception function is issued on the receiving side, the order of the transmission / reception functions cannot be expressed by the order of the static buffer blocks. Therefore, in the present invention, a request object that manages the issue order of unprocessed functions is introduced (1102, 1103) to preserve the order of functions that cannot be processed immediately when they are issued. The request object has four elements in total: tag (1104, 1105), dest (src) (1106, 1107) and length (1108, 1109), which hold the message information, and next (1110, 1111), which holds the order. Unfinished functions are chained by next, as the static buffer blocks are, and are processed in that order on the transmitting and receiving sides. After the static buffer blocks have been processed in order, processing proceeds in the order of the request objects. Ordering of non-blocking functions is guaranteed by the above method.
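The interplay of the static buffer chain and the request object chain on the transmitting side can be sketched as two FIFO queues. All names here (`Request`, `SendQueue`, `isend`, `complete_oldest`) are hypothetical, a `deque` stands in for the next pointers, and the number of static buffer blocks is an arbitrary assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    tag: int
    dest: int
    length: int
    # The text chains requests through a `next` member; a deque plays that role here.


class SendQueue:
    def __init__(self, static_blocks):
        self.free_blocks = static_blocks
        self.in_flight = deque()  # static buffer blocks, in issue order
        self.pending = deque()    # request objects, in issue order

    def isend(self, tag, dest, length):
        req = Request(tag, dest, length)
        # Use a static block only if no older request is still waiting,
        # otherwise a newer message could overtake an older one.
        if self.free_blocks > 0 and not self.pending:
            self.free_blocks -= 1
            self.in_flight.append(req)
        else:
            self.pending.append(req)

    def complete_oldest(self):
        """Finish the oldest transfer, then promote the oldest pending request."""
        done = self.in_flight.popleft()
        self.free_blocks += 1
        if self.pending:
            self.free_blocks -= 1
            self.in_flight.append(self.pending.popleft())
        return done


q = SendQueue(static_blocks=2)
for tag in (1, 2, 3, 4):
    q.isend(tag, dest=0, length=8)
print([q.complete_oldest().tag for _ in range(4)])
```

With two static blocks and four requests, the third and fourth requests wait as request objects, yet the completion order still equals the issue order.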
[0035]
(F) Additional interface for non-blocking operation
If ordering is guaranteed, a non-blocking function can be realized by adding the following interface.
[0036]
(1) Isend (buf, dest, tag, length)
Each argument has the same specification as in Send. When the user calls this function, if there is free space in the static buffer, the block to be used is linked into the chain and then transferred. If there is no free space in the static buffer, a request object is created and linked into the request object chain. The request object chain is processed in the order of issue. After the static buffer is transferred to the receiving side, the data is transferred asynchronously using the short protocol (FIG. 7: 701 to 704) or the long protocol (FIG. 8: 801 to 806).
[0037]
(2) Irecv (buf, src, tag)
Each argument has the same specification as in Recv. When the user calls this function, a request object is created and linked into the chain. The requests are processed in order from the head of the chain; for each one, the static buffer chain is searched from its head for a transmission function with the corresponding tag. Once the corresponding transmission is found, the data is transferred asynchronously using the short protocol (FIG. 7: 704 to 707) or the long protocol (FIG. 8: 807 to 812).
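The receive-side search can be sketched as a first-match scan over the arrived headers. The function `match_receives` and the tuple layouts are hypothetical simplifications: plain lists replace the next-linked chains, and a payload string stands in for the transferred data.

```python
def match_receives(arrived, requests):
    """Pair receive requests with arrived headers, both taken in chain order.

    arrived  -- list of (src, tag, payload) in the order the sender issued them
    requests -- list of (src, tag) in the order Irecv was called
    Returns one payload per request, or None where nothing matches yet.
    """
    remaining = list(arrived)
    results = []
    for src, tag in requests:
        for i, (a_src, a_tag, payload) in enumerate(remaining):
            if (a_src, a_tag) == (src, tag):
                results.append(payload)
                del remaining[i]  # each arrived message is consumed once
                break
        else:
            results.append(None)  # no matching transmission has arrived
    return results


arrived = [(0, 7, "first"), (0, 9, "x"), (0, 7, "second")]
print(match_receives(arrived, [(0, 7), (0, 7), (0, 9)]))
```

Because both chains are scanned from the head, two receives with the same tag obtain the messages in the order the corresponding sends were issued.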
[0038]
(3) Wait ()
This function waits for the completion of non-blocking functions. A non-blocking function returns immediately after it is called, so the user does not know when it completes. This function is therefore used to make the completion of non-blocking functions explicit. When this function is issued, the PU is blocked until every non-blocking function whose completion has not yet been confirmed has completed, and this function itself completes when all of those functions have completed. After this function completes, Send / Isend and Recv / Irecv can be newly issued and communication resumes.
[0039]
[Effect of the invention]
According to the remote memory transfer control method of the present invention, in data transfer using remote memory writing, it is possible to select, according to the data length, whether data is transferred using an area whose address is assigned in advance for each PU or transferred by pipeline processing using an area that is dynamically assigned at the time of transfer, thereby enabling high-speed data transfer. When the pipeline operation is introduced, as shown in FIG. 12, three message transfer operations overlap in every step except the first two and the last two. Therefore, about three times the performance can be obtained compared with message passing using conventional remote memory writing.
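The "about three times" figure can be checked with simple step counting. This is a sketch under stated assumptions: the function `speedup` is hypothetical, each of the three operations (copy on the transmitting side, remote memory write, copy on the receiving side) is assumed to take one step of equal duration, and the conventional method is assumed to perform them without overlap.

```python
def speedup(n_packets):
    """Ratio of non-overlapped steps to pipelined steps for n_packets packets."""
    sequential_steps = 3 * n_packets  # three operations per packet, one at a time
    pipelined_steps = n_packets + 2   # three-stage pipeline: fill and drain add 2 steps
    return sequential_steps / pipelined_steps


for n in (4, 16, 1000):
    print(n, round(speedup(n), 3))
```

For the four packets of FIG. 9 the ratio is 2, and it approaches 3 as the number of packets grows, since the fill and drain steps become negligible.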
[0040]
Further, according to the remote memory transfer control system of the present invention, it is possible to ensure that data is transferred in the order in which the transmission functions are issued and that the transferred data is received in the order in which the reception functions are issued. This makes it possible to guarantee the order of data in the transmission / reception function that performs the non-blocking operation.
[Brief description of the drawings]
FIG. 1 is an overall configuration diagram of an embodiment of the present invention.
FIG. 2 is an explanatory diagram of a conventional data transfer control method using remote memory writing.
FIG. 3 is an explanatory diagram of a static buffer.
FIG. 4 is an explanatory diagram of a dynamic buffer.
FIG. 5 is a timing chart of a short protocol.
FIG. 6 is a timing chart of a long protocol.
FIG. 7 is a flowchart of a short protocol.
FIG. 8 is a flowchart of a long protocol.
FIG. 9 is an explanatory diagram of pipeline operation.
FIG. 10 is a flowchart of pipeline operation.
FIG. 11 is an explanatory diagram for guaranteeing order.
FIG. 12 is an operation diagram of pipeline operation.
[Explanation of symbols]
101, 102 ... element computer, 103, 104 ... CPU, 105, 106 ... memory, 107 ... communication path, 108, 109 ... operating system, 110, 111 ... user program, 112, 113 ... message communication library, 114, 115 ... memory area for data transfer, 116, 117, 301, 309 ... static buffer, 118, 119, 401, 405, 910, 911 ... dynamic buffer, 201, 204 ... user buffer, 202, 203 ... data transfer buffer, 302 ... static buffer header, 303, 308 ... static buffer message body, 304 ... area for storing the identifier of the message, 305 ... area for storing the message length, 306 ... area for storing the start address of the reserved dynamic buffer, 307 ... area for storing the address, 402 ... header for confirming whether a message has arrived in the dynamic buffer, 403 ... message body in the dynamic buffer, 404 ... management header for the dynamic buffer, 1101 ... area for storing the order of static buffers, 1102, 1103 ... request object, 1104, 1105 ... request object area for storing the message identifier, 1106, 1107 ... request object area for storing the destination (source), 1108, 1109 ... request object area for storing the message length, 1110, 1111 ... area for storing the order of request objects.

Claims

A data transmission / reception method in a parallel computer comprising a plurality of computers and a communication path interconnecting them, the method comprising:
a step of securing in advance, in each of the computers, a static reception buffer area that is fixed for each communication partner computer and a dynamic reception buffer area that is not fixed for each communication partner computer;
a step of, when the data length of transmission data is shorter than a predetermined value, performing data transfer by writing the transmission data to the static buffer area of the transmission destination by remote memory write from the transmission source computer; and
when the data length of the transmission data is longer than the predetermined value, a step of transmitting information including the data length of the transmission data from the transmission source computer to the transmission destination computer, a step of determining, at the transmission destination computer, a dynamic buffer area to receive the transmission data according to the information and returning the address of the determined dynamic buffer area to the transmission source computer by remote writing using the static buffer area, and a step of writing the transmission data to the dynamic buffer area indicated by the returned address by remote writing from the transmission source computer.

A data transmission / reception method between two arbitrary element computers in a parallel computer in which a plurality of element computers are connected by a communication path and which has a remote memory writing mechanism that writes data in the data transfer memory area on any element computer to the data transfer memory area on any other element computer, the method comprising:
a step in which each element computer, triggered by a communication initialization request from a user program, reserves in the data transfer memory area a static reception buffer area whose address is fixed in advance for each communication partner computer and a dynamic reception buffer area from which a buffer is dynamically allocated each time communication occurs;
a step in which the transmission-side element computer, triggered by a transmission request from the user program, copies the transmission data on the user memory to the data transfer memory area, configures a header including information on the data length, and, when the data length is shorter than a predetermined value, writes the header and the data to the static buffer area prepared on the reception-side element computer using the remote memory writing mechanism, or, when the data length of the transmission data is longer than the predetermined value, writes the header to the static buffer area prepared on the reception-side element computer using the remote memory writing mechanism;
a step in which the reception-side element computer, triggered by the arrival of the header, refers to the header to acquire the data length and, when the data length of the transmission data is longer than the predetermined value, secures a buffer of the length required for the data in the dynamic buffer area and notifies the transmission-side element computer of the address information of the buffer;
a step in which, when the data length of the transmission data is longer than the predetermined value, the transmission-side element computer, triggered by the arrival of the address information, refers to the address information and writes the data to the dynamic buffer secured on the reception-side element computer using the remote memory writing mechanism; and
a step in which the reception-side element computer, triggered by a reception request from the user program, copies the data written in the static buffer area or the dynamic buffer area to the reception buffer on the user memory.

The data transmission / reception method according to claim 2, further comprising a step of, when a plurality of transmission requests are issued from the user program, storing in the header information the address information of the static reception buffer to be used for the next transmission request, thereby guaranteeing the issue order of the plurality of transmission requests.

A data transmission / reception method between two arbitrary element computers in a parallel computer in which a plurality of element computers are connected by a communication path and which has a remote memory writing mechanism that writes data on any element computer to the memory of any other element computer, the method comprising:
a step in which each element computer reserves, on the data transfer memory area, two transmission buffers and a dynamic reception buffer area from which a buffer is dynamically allocated each time communication occurs, the dynamic reception buffer area being composed of a plurality of blocks of a predetermined fixed length;
a step in which the transmission-side element computer, triggered by a transmission request from the user program, divides the transmission data into a plurality of packets of the predetermined fixed length, configures a header including information on the length of the data, and notifies the reception-side element computer of the header, and a step in which the reception-side element computer, triggered by the arrival of the header information, refers to the header information, secures a reception buffer composed of the necessary number of blocks on the reception buffer area, and notifies the transmission-side element computer of the address information of the buffer;
a step in which the transmission-side element computer, triggered by the arrival of the address information of the buffer, copies the n-th packet of the plurality of packets to one of the two transmission buffers on the data transfer memory area, and a step of writing the (n−1)-th packet, copied to the other of the two transmission buffers on the data transfer memory area, to the (n−1)-th block of the dynamic reception buffer on the reception-side element computer using the remote memory writing mechanism, these two steps being applied sequentially to all of the plurality of packets; and
a step in which the reception-side element computer, triggered by a reception request from the user program, sequentially copies the data written in the plurality of blocks of the reception buffer to the reception buffer on the user memory.