JP3628514B2

JP3628514B2 - Data transmission / reception method between computers

Info

Publication number: JP3628514B2
Application number: JP13236498A
Authority: JP
Inventors: 常之今木; 暢俊佐川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-05-14
Filing date: 1998-05-14
Publication date: 2005-03-16
Anticipated expiration: 2018-05-14
Also published as: JPH11328134A

Description

【０００１】
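【Technical Field of the Invention】
The present invention relates to a method for transmitting and receiving data between computers in a computer system having a plurality of computers connected by a plurality of types of communication networks, and in particular to such a method suited to message-passing communication.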
【０００２】
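【Prior Art】
TCP/IP is very widely used as a communication protocol between computers. A program configured to communicate with other programs using TCP/IP is hereinafter called a TCP/IP application program. Computer systems also exist in which such a TCP/IP application program can use a protocol other than TCP/IP. That is, systems have been proposed in which the computers are connected by a plurality of networks of differing performance and, when one computer communicates with another, TCP/IP or another protocol is used depending on which network carries the communication. For example, the system developed by Steven H. Rodrigues et al. at the University of California, Berkeley consists of a wide-area network usable with the TCP/IP protocol and a local network, covering a narrower area, that uses a simpler and lower-overhead protocol. Computers connected to the local network communicate with each other using that protocol, and computers connected to the wide-area network communicate according to TCP/IP. See, for example, "High-Performance Local-Area Communication With Fast Sockets", USENIX '97 Annual Technical Conference, pp. 257-274.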
【０００３】
This computer system provides a TCP/IP emulation library so that TCP/IP application programs can also use this protocol. In that system the TCP/IP emulation library is called Fast Sockets.
【０００４】
Fast Sockets uses Active Messages, developed for workstation clusters, as its simple protocol. See, for example, T. von Eicken, D. E. Culler, S. C. Goldstein, K. E. Schauser, "Active Messages: a Mechanism for Integrated Communication and Computation", in Proceedings of the 19th International Symposium on Computer Architecture, May 1992, pp. 256-266. With Active Messages, the sending application program interrupts the receiving application program, and the receiver performs data reception processing triggered by that interrupt.
【０００５】
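A parallel computer is a computer in which a plurality of computers cooperate, communicating with one another, to solve a single problem. To this end, the computers inside a parallel computer are generally interconnected by a high-speed internal communication network, and at least some of them are further connected to external computers through a wider-area network such as a LAN.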
【０００６】
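The protocol used on wide-area networks is mainly TCP/IP. To use the internal high-speed network, a parallel computer equips each computer with high-speed communication hardware for communicating with the other internal computers over that network, and with a high-speed communication library for using that hardware. The communication scheme adopted by many parallel computers today is message passing. In message-passing communication, a transfer takes place when a send command issued by the sending application program is matched one-to-one with a receive command issued by the receiving application program. In many cases this scheme suits the high-speed communication hardware inside a parallel computer. The high-speed communication library used to implement message passing is mainly what is called a remote memory write library, or PUT library.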
【０００７】
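The overhead of communication inside a parallel computer can be reduced considerably by using high-speed communication hardware. With the interrupt-driven scheme used by Active Messages, the interrupt overhead then becomes conspicuous. Message passing, by contrast, does not presuppose interrupts and is therefore better suited than Active Messages to communication inside a parallel computer.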
【０００８】
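【Problems to Be Solved by the Invention】
As described above, the protocol for using the internal high-speed network of a parallel computer is generally message passing. However, most business application programs used when a parallel computer is applied in business fields are written to use TCP/IP. Such application programs therefore cannot use message-passing communication as they are. Moreover, no method is known for realizing stream communication on top of message-passing communication, and the method used to realize stream communication with interrupt-based Active Messages cannot simply be carried over to message passing.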
【０００９】
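Accordingly, an object of the present invention is to provide an inter-computer data transmission and reception method that enables stream communication between application programs running on computers configured to perform message-passing communication.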
【００１０】
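A more specific object of the present invention is to provide an inter-computer data transmission and reception method that enables application programs configured to communicate over a first computer network using a protocol other than message passing, such as TCP/IP, to communicate by message passing over a communication network faster than that computer network.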
【００１１】
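A further object of the present invention is to provide an inter-computer data transmission and reception method that enables an application program, running on a computer connected to both a first communication network and a faster second communication network and configured to communicate with other application programs under a first protocol such as TCP/IP, to communicate under that protocol with application programs running on other computers connected to the first network, and to communicate at high speed over the second network with application programs running on other computers connected to the second network.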
【００１２】
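A further specific object of the present invention is to provide an inter-computer data transmission and reception method in which the communication over the second communication network can be message-passing communication.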
【００１３】
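【Means for Solving the Problems】
To achieve the above objects, in the inter-computer data transmission and reception method according to the present invention,
each of a plurality of pieces of transmission data specified by a plurality of send commands issued by a first application program running on a first computer is received by message-passing communication in response to a plurality of receive commands issued by a second application program running on a second computer. This reception is controlled by an emulation library provided on the second computer.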
【００１４】
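Furthermore, each piece of received transmission data is processed so as to realize stream communication, in which the single continuous run of data formed by concatenating the pieces of transmission data is divided into portions of the sizes specified by the receive commands, and each portion is stored in the buffer specified by the corresponding receive command. This processing is also controlled by the emulation library.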
【００１５】
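More specifically, the inter-computer data transmission and reception method according to the present invention performs the following processing.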
【００１６】
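(a) The receiving-side emulation library detects the length of the data that the sending application is about to send with one send command.
(b) The receiving-side emulation library compares that transmission length with the receive length that the application specifies in one receive command.
(c) If the comparison shows that the transmission length exceeds the receive length, the receiving-side emulation library first receives all of the data into a buffer area reserved in memory and copies from there to the application area only as much data as the application requested. If the transmission length is less than or equal to the receive length, the library receives the data directly into the application area.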
【００１７】
(d) When the application issues a receive command and data remains in the buffer area, data is copied from there to the application area.
【００１８】
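More specifically still, a TCP/IP emulation library is provided to implement the data transmission and reception method of the present invention. This library has the same interface as the TCP/IP socket application program interface and switches between two paths: if the communication partner is outside the parallel computer it uses the conventional system calls, and if the partner is inside it uses the high-speed communication network through a message-passing scheme for parallel computers such as MPI. The TCP/IP emulation library is given the following features.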
【００１９】
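(1) For communication inside the parallel computer, it provides a stream communication service equivalent to that of TCP/IP, using a communication procedure suited to message passing.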
【００２０】
(2) It selects the communication method by means appropriate to a TCP/IP emulation library.
【００２１】
(3) It detects data from external communication and data from internal communication by a spin loop, or alternatively detects the two in separate threads.
【００２２】
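【Embodiments of the Invention】
＜Prior Art and Its Problems＞
Before describing embodiments of the present invention, the prior art and its problems are described.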
【００２３】
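(1) TCP/IP
Fig. 13 shows the layers of TCP/IP. TCP/IP defines four layers, from the bottom: link layer 301, IP layer 302, TCP layer 303, and application layer 306. A physical layer actually lies below link layer 301 but is not discussed here for simplicity. Below, the layer group 304 consisting of layers 303, 302, and 301 beneath the application layer is collectively called the TCP/IP layer. To communicate by the procedure that TCP/IP defines, each of the layers making up TCP/IP layer 304 performs its own specific processing. In this specification, the processing performed by these layers, or the program routines that perform it, are collectively called the TCP/IP processing routine. This routine is generally part of the OS.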
【００２４】
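The socket application programming interface (socket API) 111 is generally used as the interface through which an application program uses the TCP/IP processing routine. For this purpose, a group of system calls called the socket library is provided as part of the OS.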
【００２５】
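In this specification, calling a system call or function for receiving data, or one for sending data, is sometimes described as issuing a receive command or a send command, respectively. The socket application program interface is defined as the application program interface of the socket library. An example of an application program that communicates over TCP/IP and is written according to the socket application program interface (hereinafter, a TCP/IP application program) is shown below.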
【００２６】

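First, the server-side and client-side TCP/IP application programs that are to communicate each call the system call socket, included in the TCP/IP processing routine of the computer on which the program is running. The socket system call creates an object called a socket, which serves as a communication endpoint, and returns a socket descriptor. In the above example, the server-side and client-side TCP/IP application programs receive the returned socket descriptors as sa and sb0, respectively.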
【００２７】
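A socket descriptor is an integer identifier (ID) determined uniquely for each socket within an application program. All operations on a created socket are performed by specifying this descriptor. Unless otherwise noted, socket and socket descriptor are treated as synonymous below. The argument AF_INET of the above system call stands for the Internet address family and indicates that the socket is used for communication over the Internet. The argument SOCK_STREAM requests stream communication. The last argument specifies the protocol; when its value is 0, the protocol is determined by the two preceding arguments, which in this case selects TCP/IP. After the sockets are created, the server-side and client-side TCP/IP application programs connect them by different procedures.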
【００２８】
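The server side calls the system call bind, included in the TCP/IP processing routine. bind associates the name specified in its arguments with the socket sa specified in its arguments. A name consists of a combination of an IP address and a port number. In the above example, socket descriptor sa is associated with the name formed by the IP address and port number stored in the structure server, whose size is given by slen. The server side then calls the system call listen, also included in the TCP/IP processing routine.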
【００２９】
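listen sets up the socket sa specified in its argument as a socket for accepting connection requests. Its second argument gives the requested size of the queue that temporarily holds pending connection requests; in this case the call requests a queue capable of holding five connection requests.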
【００３０】
As a result of the above processing, a communication path between the server side and the client side has logically been determined.
【００３１】
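The server side then calls the system call accept, included in the TCP/IP processing routine. This system call puts the socket sa specified in its argument into a state of waiting for a connection request. Its second and third arguments represent the IP address, and its length, of the client that issues the awaited connection request.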
【００３２】
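Meanwhile, the client side calls the system call connect, included in the TCP/IP processing routine. This system call connects the socket sb0 specified in its arguments to the name specified in its arguments, in this case the name given to the server-side socket sa (the combination of IP address sin_addr and port number sin_port stored in the structure server).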
【００３３】
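On the server computer, the accept system call issued earlier receives this connection request and, in accordance with its arguments, newly creates socket sa1 as the socket for communicating with this client. A communication path is thus established between server-side socket sa1 and client-side socket sb0.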
【００３４】
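When the client sends data to the server with the sockets connected, the client calls the system call write, included in the TCP/IP processing routine. The arguments specify the address buffer0 of the buffer holding the data to send and the buffer length length0. write sends the data in this buffer through the socket sb0 specified in the call. The server calls the system call read, included in the TCP/IP processing routine, which receives the data transferred through the socket sa1 specified in its arguments and writes it to the buffer at the address buffer1 specified in its arguments. Data is thus transferred between the two application programs. If necessary, the client calls write multiple times to send subsequent data, and the server calls read one or more times to receive it.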
【００３５】
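A socket descriptor is defined as a kind of file descriptor, which is used for file I/O, standard I/O, and the like. The socket application program interface therefore allows data to be sent and received through the same interface as file I/O.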
【００３６】
When communication between the server and client programs ends, the server calls the system call close to close the socket sa1 specified in its argument, then calls close again to close socket sa. The client likewise calls close to close socket sb0.
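The call sequence walked through above (socket, bind, listen, accept, connect, write, read, close) can be sketched as follows. This is not the listing from the specification, which did not survive in this text; it is a minimal Python illustration that reuses the names sa, sa1, and sb0 from the description, runs both sides in one process over the loopback address, and uses `sendall`/`recv` where the description uses write/read. The helper names `run_server` and `demo` and the message contents are assumptions made for the example.

```python
import socket
import threading

def run_server(sa, received):
    """Server side: accept one connection and read until the client closes."""
    sa1, _client_addr = sa.accept()           # accept(sa, ...) yields the new socket sa1
    with sa1:                                 # close(sa1) on exit
        chunks = []
        while True:
            data = sa1.recv(4096)             # read(sa1, buffer1, ...)
            if not data:                      # empty result: peer closed the stream
                break
            chunks.append(data)
    received.append(b"".join(chunks))

def demo():
    sa = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # socket(AF_INET, SOCK_STREAM, 0)
    sa.bind(("127.0.0.1", 0))                 # bind(sa, server, slen); port 0 picks a free port
    sa.listen(5)                              # listen(sa, 5): queue up to five requests
    server_name = sa.getsockname()            # (IP address, port) pair, the socket's "name"

    received = []
    server = threading.Thread(target=run_server, args=(sa, received))
    server.start()

    sb0 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sb0.connect(server_name)                  # connect(sb0, server, slen)
    sb0.sendall(b"hello ")                    # write(sb0, buffer0, length0)
    sb0.sendall(b"stream")                    # a second write joins the same byte stream
    sb0.close()                               # close(sb0)

    server.join()
    sa.close()                                # close(sa)
    return received[0]
```

Calling `demo()` returns the concatenation of the two writes, which is the stream behaviour the later sections rely on.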
【００３７】
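(2) Switching communication methods in Fast Sockets
In conventional TCP/IP, the bind system call associates a name (server) consisting of an IP address and port number with a socket (sa). At this point Fast Sockets additionally creates a socket dedicated to high-speed communication, as follows. First, a hash function is applied to the port number of the name that the application specifies when calling bind (port number sin_port stored in the structure server) to derive a new shadow port number. Next, the socket system call is called to create a new shadow socket. Finally, the bind system call is called to associate the shadow port number with the shadow socket. For high-speed communication, the server and client use this shadow socket when they connect by calling accept and connect, thereby setting up a communication path dedicated to high-speed communication.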
【００３８】
This method requires special processing whenever the bind or listen system call is called.
【００３９】
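(3) Stream communication by the TCP/IP processing routine
The TCP/IP socket library provides stream communication. Stream communication is a communication method in which the pieces of data that the sending application program sends through a series of write system calls are treated as one continuous data stream, and the receiving application program receives that stream divided into one or more pieces of arbitrary length, as specified by the one or more read system calls it issues.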
【００４０】
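Conventional stream communication is described with reference to Fig. 14, which shows sending application program 801 sending data to receiving application program 802. Sending application program 801 first calls a first write system call to send 50 kilobytes (KB) of data in its buffer 803 (805). The unit KB is omitted in Fig. 14 for simplicity. The sending-side TCP/IP processing routine copies this data from buffer 803 into a buffer reserved in the OS (not shown), divides it into packets, and sends them to the receiving-side OS. Packets are usually 40 to 1500 bytes in size. The receiving-side OS receives the packets into buffers reserved in the OS (not shown) and links them into a list to reconstruct the data stream. Sending application program 801 then issues a second write system call to send 80 KB of data in another buffer 804 (806). The sending-side and receiving-side TCP/IP processing routines operate in the same way, and this data is combined with the previously sent data and held in the OS buffers as a single data stream.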
【００４１】
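In the figure, 809 schematically represents this data stream held in OS 900. Leading data 807 and following data 808 are the 50 KB and 80 KB of data sent by the first and second write system calls, respectively; that is, the contents of buffers 803 and 804 in sending application program 801 are held as data portions 807 and 808 of data stream 809 in OS 900. In stream communication, the receiving application program sees these data as a single continuous 130 KB data stream 809. Receiving application program 802 can therefore issue a first read system call requesting that 30 KB of the 50 KB leading data 807 in stream 809 be received into its 30 KB buffer 812 (814). In response, the receiving-side TCP/IP processing routine copies the leading 30 KB of data 810 from data stream 809 to buffer 812. When receiving application program 802 issues a second read system call, the remaining 100 KB of data 811 in stream 809 is copied, according to the length that call specifies, to the buffer 813 it specifies (815).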
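The Fig. 14 example, two writes of 50 KB and 80 KB consumed by reads of 30 KB and 100 KB, can be reproduced with a small in-memory model. The class `ByteStream` below is a hypothetical stand-in for the stream held in the OS and is not part of any socket library; it only illustrates that write boundaries are invisible to the reader.

```python
KB = 1024

class ByteStream:
    """Toy model of data stream 809: writes append, reads take from the front."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, data):
        self._buf += data                  # successive writes form one continuous stream
        return len(data)

    def read(self, n):
        out = bytes(self._buf[:n])         # up to n bytes, regardless of write boundaries
        del self._buf[:n]
        return out

stream = ByteStream()
stream.write(b"a" * 50 * KB)               # first write: data 807 (50 KB)
stream.write(b"b" * 80 * KB)               # second write: data 808 (80 KB)
first = stream.read(30 * KB)               # first read: 30 KB into buffer 812
second = stream.read(100 * KB)             # second read: remaining 100 KB into buffer 813
```

The second read returns the last 20 KB of the first write followed by all 80 KB of the second write.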
【００４２】
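(4) Stream communication with Active Messages
The system using Fast Sockets realizes stream communication while employing Active Message communication. In this system, when the sending application program issues a send request, the request is executed without waiting for the receiving application program to issue a receive request. The send request raises an interrupt on the receiving computer, the interrupt starts an interrupt handler, and the transmitted data is first received into a buffer in the interrupt handler. When the receiving application program issues a receive request, the interrupt handler transfers, out of the data already received, data of the requested size to the receiving application program's buffer. If the size requested by the receiving application program is smaller than the size of the data already received, processing of the receive request ends there. The remaining data held by the interrupt handler is transferred to the application program's buffer when the receiving application program issues a new receive request.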
【００４３】
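Conversely, if the size requested by the receiving application program is larger than the size of the data already received, the interrupt handler supplies further data to the receiving application program when the sending application program subsequently sends it. Likewise, when the receiving application program issues a subsequent receive request before the sending application program issues a send request, the receiving-side interrupt handler supplies the subsequent data to the receiving application program once the sender sends it. If this subsequent data is larger than the shortfall of the first receive request, or than the size requested by the subsequent receive request, the interrupt handler holds the excess for later receive requests.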
【００４４】
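In this way, the pieces of data sent by the send requests of the sending application program are supplied to the receiving application program in response to its series of receive requests. This method thus realizes stream communication by temporarily holding the transmitted data in a buffer in the interrupt handler.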
【００４５】
However, message-passing communication, which does not use interrupts, cannot realize stream communication by this method.
【００４６】
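The inter-computer data transmission and reception method according to the present invention is described below in more detail with reference to several embodiments shown in the drawings. In the following, the same reference numbers denote the same or similar items. From the second embodiment onward, mainly the differences from the first embodiment are described.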
【００４７】
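＜Embodiment 1 of the Invention＞
(1) Overview of the apparatus
Fig. 1 shows an example of a computer system for carrying out the inter-computer transmission and reception method according to the present invention. In the figure, two computers 102 and 103 inside parallel computer 101 and one external computer 104 are assumed to be connected to one another by communication networks. In practice, the numbers of computers inside and outside parallel computer 101 are arbitrary. Internal computers 102 and 103 are connected by internal high-speed communication network 105, and each has network interface hardware 119, 120 dedicated to network 105. Internal high-speed communication network 105 is a network capable of transferring multiple packets in parallel at high speed, for example a hyper-crossbar switch. All of internal computers 102, 103 and external computer 104 are also connected to global communication network 106, and each computer has network interface hardware 121, 122, 123 dedicated to network 106.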
【００４８】
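Data is sent and received between application programs 107, 108, and 109 loaded in memories 124, 125, and 126 of the respective computers. Besides the application program, each memory holds an OS 127, 128, 129, and each OS contains a TCP/IP processing routine 114, 115, 110. As noted above, TCP/IP processing routine is the collective name for the processing that the layers of TCP/IP layer 304 perform in order to communicate by the procedure TCP/IP defines. TCP/IP processing routines 114, 115, and 110 are the same as known ones and, as already described, include a number of functions callable as system calls. For wide-area communication, they communicate through network interface hardware 121, 122, and 123 dedicated to global communication network 106.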
【００４９】
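Memory 124 of computer 102 is further loaded with TCP/IP emulation library 112, message-passing library 140, and high-speed communication library 135. Likewise, memory 125 of computer 103 is loaded with TCP/IP emulation library 113, message-passing library 141, and high-speed communication library 136.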
【００５０】
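High-speed communication libraries 135 and 136 on computers 102 and 103 are libraries for communicating through network interface hardware 119 and 120 dedicated to internal high-speed communication network 105, for the purpose of high-speed communication inside parallel computer 101. What is called a remote memory write library or PUT library is commonly used. This embodiment also uses such a PUT library for high-speed communication libraries 135 and 136. The present invention is not limited to this library, however; other libraries, for example what is called a PUT/GET library, can also be used.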
【００５１】
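In general, communication through internal high-speed network 105 is far faster than communication through global network 106. TCP/IP emulation libraries 112 and 113 are therefore configured so that, when application programs on internal computers attempt TCP/IP communication with each other, message-passing communication is carried out using message-passing libraries 140, 141, high-speed communication libraries 135, 136, and internal high-speed communication network 105 instead of TCP/IP processing routines 114, 115, thereby speeding up communication.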
【００５２】
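Message-passing libraries 140 and 141 are libraries that invoke high-speed communication library 135 or 136 in response to requests from TCP/IP emulation library 112 or 113. In general terms, they are libraries that present a message-passing interface to an application program (in this embodiment, to TCP/IP emulation libraries 112 and 113).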
【００５３】
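TCP/IP emulation libraries 112 and 113 further realize, in this message-passing communication, the same stream communication that the conventional TCP/IP processing routine provided. Because TCP/IP processing routines 114 and 115 are functions of OS 127 and 128, in the prior art a context-switch overhead is always incurred when an application program uses them. This embodiment uses high-speed communication libraries 135 and 136 and does not go through the OS, so this overhead is avoided and a further speed-up can be expected.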
【００５４】
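(2) Logical configuration and interfaces between components
Fig. 1(B) expresses the above hardware configuration as a logical configuration. Parts not relevant to the following description are omitted. High-speed communication libraries 135, 136 and internal high-speed communication hardware 119, 120 are grouped together as high-speed communication mechanisms 116 and 117. Interfaces 111 and 118 between components are newly shown. The rectangles representing computers 102, 103, 104 and parallel computer 101 indicate that the logical components inside them run on that one computer or on parallel computer 101.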
【００５５】
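On computer 104, application program 109 is linked with TCP/IP 110 through socket application program interface 111, as is conventional. Application programs 107 and 108 running on computers 102 and 103 inside parallel computer 101 are linked to TCP/IP emulation libraries 112 and 113 through socket application program interface 111. TCP/IP emulation libraries 112 and 113 are in turn linked to the conventional TCP/IP processing routines 114 and 115, which are OS functions, through socket application program interface 111, and at the same time to high-speed communication mechanisms 116 and 117 through MPI-specification interface 118.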
【００５６】
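(3) Application programs 107 and 108
In this embodiment, when an application program, for example 107, running on a computer 102 in parallel computer 101 communicates with another application program running on another computer, it uses either high-speed communication mechanism 116 or TCP/IP processing routine 114, depending on whether the other program is, for example, application program 108 running on computer 103 inside parallel computer 101 or application program 109 running on computer 104 outside it. For two application programs 107 and 108 running on different computers 102 and 103 in parallel computer 101 to communicate through high-speed communication mechanisms 116 and 117, special processing is needed so that those mechanisms are used not only when data is actually exchanged but also when the sockets of the two programs are connected to each other. TCP/IP emulation library 112 or 113 contains a set of functions that perform this special processing. In this embodiment, the names of functions provided in TCP/IP emulation libraries 112 and 113 carry the prefix EMU_ to distinguish them from the names of the aforementioned functions in TCP/IP processing routines 114, 115, or 110.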
【００５７】
Specifically, of application programs 107 and 108 running on computers 102 and 103 in parallel computer 101, those acting as server and client are written to execute the following programs, respectively.
【００５８】

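The server-side application program calls the system calls socket, bind, and listen as before, and these are processed by the corresponding TCP/IP processing routine as before. The client-side application program likewise issues the socket system call as before, and it too is processed by the corresponding TCP/IP processing routine. Sockets sa and sb0 are thus created for the server and client sides in the conventional way.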
【００５９】
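The server-side application program then calls, instead of the conventional accept system call, the function EMU_accept provided in the TCP/IP emulation library on the computer where the program runs. The client-side application program calls, instead of the conventional connect, the function EMU_connect provided in the TCP/IP emulation library on its computer. Further, the server-side application program calls the function EMU_read in the TCP/IP emulation library instead of the conventional read system call, and the client-side application program calls the function EMU_write in the corresponding TCP/IP emulation library instead of the conventional write system call. The processing performed by these new functions is described below.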
【００６０】
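(4) Socket connection by the TCP/IP emulation library
In Fig. 2, when the server-side and client-side application programs call the above EMU_accept and EMU_connect functions, respectively (steps 501, 502), the function EMU_accept in the server-side TCP/IP emulation library and the function EMU_connect in the client-side TCP/IP emulation library are invoked. The arguments of these calls are the same as when the server-side and client-side application programs call the system calls accept and connect to request TCP/IP processing routines 114 and 115 to connect sockets sa and sb0.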
【００６１】
The invoked functions EMU_accept and EMU_connect first issue the accept and connect system calls, respectively (steps 503, 504), passing their own arguments through unchanged. The accept system call in the server-side TCP/IP processing routine and the connect system call in the client-side TCP/IP processing routine are thereby called. As in the conventional case, accept receives the connection request issued by connect, socket sa1 is created, and sockets sa1 and sb0 become connected through communication path 106.
【００６２】
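Functions EMU_accept and EMU_connect then each check whether the peer's IP address is the address of a computer inside parallel computer 101 (steps 506, 507). If the peer is inside parallel computer 101, the socket descriptor is registered in internal table 410 or 411 (steps 510, 511). In this case, socket sa1 for server-side application program 107 and socket sb0 for the client-side application program are registered in internal tables 410 and 411, respectively. TCP/IP emulation libraries 112 and 113 also exchange the data needed to use high-speed communication mechanisms 116 and 117, such as the identifier of each computer within parallel computer 101. For this exchange, read and write system calls are issued on the application program 107 and 108 sides (steps 512, 513, 515, 516). The exchange uses the already connected sockets sa1 and sb0 and goes through TCP/IP processing routines 114, 115 and communication path 106 in the conventional way. Control then returns to server-side and client-side application programs 107 and 108 (steps 518, 519). At this point, server-side TCP/IP emulation library 112 returns the identifier of the created socket sa1 to application program 107.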
【００６３】
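If steps 506 and 507 find that the peer is not a computer inside parallel computer 101, function EMU_connect returns to client-side application program 108, and function EMU_accept returns socket sa1 to server-side application program 107 (steps 508, 509). Socket sa1 for the server-side application program and socket sb0 for the client-side application program are then connected through their corresponding TCP/IP processing routines and communication path 106.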
【００６４】
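Fig. 3 shows one state in which sockets created by application programs 107, 108, and 109 according to this embodiment are connected to one another. Here, socket SA1 (402) of application program 107 is connected to socket SB0 (403) of application program 108 (407), socket SB1 (404) of application program 108 is connected to socket SC0 (405) of application program 109 (408), and socket SC1 (406) of application program 109 is connected to socket SA0 (401) of application program 107 (409). TCP/IP emulation libraries 112 and 113 hold internal socket tables 410 and 411. A socket descriptor is registered in internal socket table 410 or 411 when the socket's peer is on an internal computer. For example, socket SB0 (403), the peer of socket SA1 (402) of application program 107, belongs to application program 108 running on internal computer 103, so SA1 is registered in internal socket table 410. Likewise, socket SA1, the peer of SB0, belongs to application program 107 running on internal computer 102, so SB0 is registered in internal socket table 411. By contrast, socket SC1 (406), the peer of socket SA0 (401), belongs to application program 109 on the external computer, so SA0 is not registered in internal socket table 410. Similarly, SB1, which is connected to SC0, is not registered in internal socket table 411.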
【００６５】
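(5) Distinguishing internal from external communication
The server-side and client-side application programs then begin data communication. One of the two programs calls function EMU_write to send data and the other calls EMU_read to receive it. In the client and server program examples given earlier, the server-side application program calls EMU_read and the client-side application program calls EMU_write. The arguments are the same as those of the write and read system calls included in TCP/IP processing routines 114 and 115.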
【００６６】
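Referring to Fig. 4, when EMU_read and EMU_write are called (steps 701, 702), each function determines whether the socket descriptor of the socket specified in its arguments, sa1 or sb0, is registered in the corresponding internal socket table 410 or 411 (steps 703, 704). If the descriptor is registered, message-passing communication is performed using high-speed communication mechanisms 116, 117 and internal high-speed network 105 by the procedure described in detail later (steps 707, 708). If the descriptor is not registered, the conventional read or write system call is issued unchanged for the specified socket (steps 705, 706). These system calls are processed by the corresponding TCP/IP processing routines 114 and 115, which carry out communication over global communication path 106.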
【００６７】
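Fig. 5 is an overall configuration diagram illustrating this selection by showing the data flow when application programs 107, 108, and 109 communicate with one another in the socket connection state of Fig. 3. When application programs 107 and 108 communicate, sockets sa1 (402) and sb0 (403) serve as the communication endpoints. These sockets are registered in internal socket tables 410 and 411 held by TCP/IP emulation libraries 112 and 113, so the libraries use high-speed communication mechanisms 116 and 117 rather than TCP/IP processing routines 114 and 115 for the data transfer (601). When application programs 108 and 109 communicate, sockets SB1 (404) and SC0 (405) serve as the endpoints. SB1 is not registered in internal socket table 411 held by TCP/IP emulation library 113, which is linked with application program 108, so library 113 uses TCP/IP processing routine 115 as is (602). This makes data exchange with external TCP/IP processing routine 110 possible.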
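The registration at connection time (steps 506 to 511) and the per-call selection (steps 703 to 708) amount to a set-membership test on the socket descriptor. The sketch below is an illustration under stated assumptions: the class name `EmulationLibrary`, the method `register_if_internal`, and the two callables standing in for the high-speed path and the TCP/IP path are all hypothetical, and the IP addresses in the usage are made up.

```python
class EmulationLibrary:
    """Toy model of the internal socket table (410 / 411) and the path selection."""

    def __init__(self, internal_addresses, fast_path, tcp_path):
        self.internal_addresses = set(internal_addresses)  # addresses inside the parallel computer
        self.internal_sockets = set()                       # the internal socket table
        self.fast_path = fast_path                          # stand-in for mechanism 116 / 117
        self.tcp_path = tcp_path                            # stand-in for routine 114 / 115

    def register_if_internal(self, sock_fd, peer_ip):
        """Steps 506-511: record the descriptor when the peer is an internal computer."""
        if peer_ip in self.internal_addresses:
            self.internal_sockets.add(sock_fd)
            return True
        return False

    def emu_write(self, sock_fd, data):
        """Steps 704, 706, 708: choose the path from the table, not from the arguments."""
        if sock_fd in self.internal_sockets:
            return self.fast_path(sock_fd, data)
        return self.tcp_path(sock_fd, data)

# Usage: descriptor 4 has an internal peer, descriptor 5 an external one.
log = []
lib = EmulationLibrary(
    {"10.0.0.2", "10.0.0.3"},
    fast_path=lambda fd, data: log.append(("fast", fd)) or len(data),
    tcp_path=lambda fd, data: log.append(("tcp", fd)) or len(data),
)
lib.register_if_internal(4, "10.0.0.3")
lib.register_if_internal(5, "192.168.1.9")
lib.emu_write(4, b"abc")
lib.emu_write(5, b"abcd")
```

Because the decision depends only on the table, the application passes the same arguments it would pass to write or read.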
【００６８】
With this scheme, internal and external communication can be distinguished while the bind and listen system calls of the conventional TCP/IP processing routine are used unchanged.
【００６９】
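(6) Message-passing libraries 140 and 141
The communication hardware 119, 120 and high-speed communication libraries 135, 136 for using internal high-speed network 105 in each computer of a parallel computer are often vendor-specific, so the way they are used varies from machine to machine. An application program written against a particular machine's high-speed communication hardware and library has therefore inevitably been machine-specific and not portable. In response, there is active work worldwide to improve application portability by providing a library for communicating over the communication hardware inside a parallel computer and standardizing that library's application program interface.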
【００７０】
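The general-purpose interface now widely used for this message-based communication is the Message Passing Interface (MPI). See, for example, "MPI: Message Passing Interface Standard version 1.1", MPI Forum, University of Tennessee, 1995. This interface presupposes the existence of the high-speed communication library described above for realizing message-passing communication; even when this interface is used, message-passing communication is still fundamentally carried out by that high-speed communication library.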
【００７１】
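Many parallel computer vendors provide high-speed communication libraries that allow MPI-compliant application programs to use the high-speed communication hardware inside the parallel computer.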
【００７２】
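In this specification, an interface for message-based communication is called a message-passing type interface, and a library having such an interface is called a message-passing library. In particular, the MPI-specification interface is sometimes called MPI, the MPI interface, or the Message Passing Interface, and a library having that interface is sometimes called an MPI library or Message Passing Interface library.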
【００７３】
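In this embodiment, message-passing libraries 140 and 141 are libraries that exchange commands and data through the interface defined by the standard MPI specification, which is available on many parallel computers. This increases the portability of TCP/IP emulation libraries 112 and 113. Interfaces of other specifications may also be used.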
【００７４】
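An example of a conventional MPI-compliant application program is shown below. In this embodiment, TCP/IP emulation libraries 112 and 113 contain the program portions shown below for the part that invokes message-passing libraries 140 and 141. Further details of these portions are given later.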
【００７５】

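In MPI, all processes that are to communicate are launched together. At launch, each process is assigned a process identifier called a rank. The above example launches two processes, a sending process sender and a receiving process receiver, which are given ranks 0 and 1. Each process first initializes MPI with the MPI initialization function MPI_Init and obtains its own rank with the function MPI_Comm_rank. Each then performs the processing for its rank. In the above example, the rank 0 process receives with the MPI receive function MPI_Recv the data that the rank 1 process sends with the MPI send function MPI_Send.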
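The rank-based structure described here (every process runs the same program, learns its rank, then branches) can be mimicked without an MPI installation. The sketch below does not use MPI; `ToyComm` is a hypothetical in-process stand-in built on threads and queues, and its `send` and `recv` only loosely mirror MPI_Send and MPI_Recv. It does keep one property that matters later in the text: a receive fails when the message is longer than the receive buffer.

```python
import queue
import threading

class ToyComm:
    """Hypothetical stand-in for an MPI communicator; this is not the MPI API."""

    def __init__(self, size):
        self._inbox = [queue.Queue() for _ in range(size)]   # one mailbox per rank

    def send(self, data, dest):
        self._inbox[dest].put(bytes(data))                    # loosely mirrors MPI_Send

    def recv(self, rank, max_len):
        data = self._inbox[rank].get()                        # loosely mirrors MPI_Recv (blocks)
        if len(data) > max_len:
            raise ValueError("message longer than receive buffer")
        return data

def process(comm, rank, results):
    """Same program for every rank; behaviour branches on the rank."""
    if rank == 1:
        comm.send(b"payload", dest=0)                         # rank 1 is the sender
    else:
        results[rank] = comm.recv(rank, max_len=64)           # rank 0 is the receiver

def launch(size=2):
    comm = ToyComm(size)
    results = {}
    threads = [threading.Thread(target=process, args=(comm, r, results))
               for r in range(size)]
    for t in threads:                                         # all "processes" start together
        t.start()
    for t in threads:
        t.join()
    return results
```

`launch()` returns a dictionary holding what rank 0 received.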
【００７６】
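(7) Message-passing communication using the high-speed communication mechanism
Before detailing data reception processing 707 and data transmission processing 708, which use high-speed communication mechanisms 116 and 117, this section outlines the message-passing communication over those mechanisms on which they depend.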
【００７７】
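In general, to perform message-passing communication, the sending computer executes a send function to send data and the receiving computer executes a receive function to receive it. These functions are executed repeatedly to transfer subsequent data between the computers. In this embodiment, message passing under the MPI specification is described as a concrete example.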
【００７８】
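In MPI message passing, the sending computer executes the send function MPI_Send and the receiving computer executes the receive function MPI_Recv. Here, when transmission processing 708 is started, sending-side TCP/IP emulation library 113 calls MPI_Send, as in the MPI application program example above. Initialization calls such as MPI_Init and MPI_Comm_rank must be made before MPI_Send is called; these are performed together with the internal table registration (steps 510, 511) in functions EMU_accept and EMU_connect described earlier. Receiving-side TCP/IP emulation library 112, for its part, calls the receive function MPI_Recv to use internal high-speed network 105.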
【００７９】
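The buffer address buffer and buffer length length that sending-side TCP/IP emulation library 113 passes to MPI_Send in transmission processing 708 are set equal to the corresponding arguments of the EMU_write call issued by sending application program 108. The buffer address buffer and buffer length length that receiving-side TCP/IP emulation library 112 passes to MPI_Recv in reception processing 707 are, as described later, set equal either to the buffer address and length specified by the arguments of the EMU_read call issued by receiving application program 107, or to the address and size of a particular buffer provided in receiving-side TCP/IP emulation library 112.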
【００８０】
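When invoked, function MPI_Send in sending-side message-passing library 141 requests the corresponding high-speed communication library 136 to transfer data of the length specified in its arguments, from the buffer at the address specified in its arguments, to receiving computer 102 through internal high-speed network 105. When invoked, function MPI_Recv in receiving-side message-passing library 140 requests the corresponding high-speed communication library 135 to receive the transferred data through internal high-speed network 105 and write data of the length specified in its arguments to the buffer at the address specified in its arguments.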
【００８１】
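The sending-side high-speed communication library 136 and receiving-side high-speed communication library 135 thus invoked send and receive the requested data through internal high-speed network 105 by a method known per se. Concretely, the transfer proceeds as follows. A command generally called a remote memory write command or PUT command is used; it is called a PUT command below. A data transfer through internal high-speed network 105 is carried out with three PUT commands.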
【００８２】
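First, when sending-side TCP/IP emulation library 113 calls MPI_Send, sending-side message-passing library 141 issues a first PUT command requesting sending-side high-speed communication library 136 to send a header containing attribute information about the data, including the buffer length specified in the arguments (that is, the length of the data to be sent). High-speed communication library 136 sends this header to receiving computer 102. Internal high-speed communication hardware 119 in computer 102 writes the header directly to a predetermined location in receiving-side memory 124.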
【００８３】
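Meanwhile, when receiving-side TCP/IP emulation library 112 calls the receive command MPI_Recv, receiving-side message-passing library 140 first checks whether a header has already arrived from sending computer 103. The data probe command MPI_Probe, described later, is used for this. If no header has arrived, execution of the receive command ends. If a header has arrived, the library determines from the buffer length in the header whether the transmission data can be received, that is, whether its length is less than or equal to the buffer length specified by MPI_Recv. Not only in MPI but in message-passing communication generally, when the transmission length specified by the send command exceeds the receive buffer size specified by the receive command, the receiving computer does not receive the data, because receiving data larger than the receive buffer could corrupt memory outside that buffer. If receiving-side message-passing library 140 determines that the data can be received, it issues a second PUT command requesting receiving-side high-speed communication library 135 to send the address of the receive buffer specified in the arguments of MPI_Recv. High-speed communication library 135 sends this buffer address to sending computer 103, and internal high-speed communication hardware 120 in computer 103 writes the address to a predetermined location in sending-side memory 125. If the data cannot be received, the receiving-side message-passing library does not return a buffer address and instead reports an error to the sending computer.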
【００８４】
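Finally, the sending-side message-passing library issues a third PUT command requesting sending-side high-speed communication library 136 to send the transmission data together with the received buffer address. High-speed communication library 136 sends the data and buffer address to receiving computer 102. Internal high-speed communication hardware 119 in receiving computer 102 receives them and writes the data directly into the buffer at that address. The data transfer is then complete.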
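The three-PUT exchange (header to the receiver, buffer address back to the sender, data to the receiver) can be traced with a toy memory model. Everything below is an illustrative assumption: `Node`, `put`, `put_data`, and `transfer` are hypothetical names, a dictionary stands in for the predetermined memory locations, and both sides are driven from one function so the order of steps stays visible.

```python
class Node:
    """Toy computer: a mailbox of fixed locations plus addressable receive buffers."""

    def __init__(self, name):
        self.name = name
        self.mailbox = {}      # predetermined locations that remote PUTs write into
        self.buffers = {}      # address -> bytearray

def put(target, location, value):
    """Remote memory write of a small control record into the target's memory."""
    target.mailbox[location] = value

def put_data(target, address, data):
    """Remote memory write of payload bytes directly into a buffer on the target."""
    target.buffers[address][:len(data)] = data

def transfer(sender, receiver, data, recv_address):
    # PUT 1 (sender -> receiver): header carrying the transmission length.
    put(receiver, "header", {"length": len(data)})

    # Receiver: compare the announced length with its receive buffer size.
    header = receiver.mailbox.pop("header")
    if header["length"] > len(receiver.buffers[recv_address]):
        put(sender, "reply", {"error": "receive buffer too small"})
        return False           # nothing is written, so no memory is overrun

    # PUT 2 (receiver -> sender): address of the receive buffer.
    put(sender, "reply", {"address": recv_address})

    # PUT 3 (sender -> receiver): payload written straight into that buffer.
    reply = sender.mailbox.pop("reply")
    put_data(receiver, reply["address"], data)
    return True

# Usage: an 8-byte receive buffer accepts 4 bytes and rejects 10 bytes.
s, r = Node("sender"), Node("receiver")
r.buffers[0x1000] = bytearray(8)
ok = transfer(s, r, b"abcd", 0x1000)
too_big = transfer(s, r, b"0123456789", 0x1000)
```

The rejected case is the situation the stream-communication procedure in the next section is designed to avoid.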
【００８５】
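Data communication through internal high-speed network 105 is not limited to the above method. On some parallel computers, for example, the high-speed communication library can execute, in addition to the PUT command, a remote memory read command, or GET command. On such a machine, instead of sending-side message-passing library 141 issuing the third PUT command, receiving-side message-passing library 140 issues a GET command and receiving-side high-speed communication library 135 reads the transmission data from sending-side memory 125.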
【００８６】
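(8) Stream communication over message passing
Data reception processing 707 and transmission processing 708 of Fig. 4 are described in detail with reference to the concrete examples in Figs. 7 and 8. In Fig. 4, as already described, when sending application program 108 calls function EMU_write (702) and transmission processing 708 is started, function MPI_Send in message-passing library 141 is called (781). In the example of Fig. 7, the arguments of the EMU_write call (702) are assumed to specify address Sbuff of buffer 803 and a buffer size of 50 kilobytes (KB). The buffer address and size in the arguments of the MPI_Send call (781) are set equal to these values. Other arguments of the functions are not described, for simplicity. Buffer sizes are assumed to be in KB, and the unit is omitted where function arguments are shown in the figures. In response to call 781, message-passing library 141 instructs high-speed communication mechanism 117 to send the header and then the data, as already described.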
【００８７】
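On the receiving side, in Fig. 4, when receiving application program 107 calls function EMU_read (701) and reception processing 707 is started, the processing shown in Fig. 6 is performed. In the example of Fig. 7, the arguments of EMU_read call 701 specify buffer address Rbuff and a buffer size of 30 KB, which is assumed to be smaller than the buffer size specified by EMU_write call 702.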
【００８８】
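Reception processing 707 first determines whether the particular receive buffer provided in receiving-side TCP/IP emulation library 112 (buffer 901, shown later) still holds data that has been received but not yet transferred to the receiving application program (step 202). The command for this check is not shown in Fig. 7. When EMU_read is executed for the first time, as assumed here, the result is negative. The transmission-data probe command MPI_Probe 771 is then issued to determine whether there is data about to be sent to the receiving application program (step 206). Concretely, this command determines whether high-speed communication mechanism 116 has received the header, described above, for data addressed to application program 107. If the header has not yet been sent from sending application program 108, reception processing 707 ends and returns to receiving application program 107 (step 207). Receiving application program 107 then performs its handling for a failed receive, for example calling function 701 repeatedly until reception succeeds.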
【００８９】
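Assuming the header has already been sent from sending application program 108, it is determined whether the transmission data described by the header fits in buffer 812 of the receiving application program (step 208). In the example of Fig. 7 the transmission data is 50 KB and receiving buffer 812 is 30 KB, so the result is negative. If, in such a case, receiving-side TCP/IP emulation library 112 were to issue an MPI_Recv command receiving this data into buffer 812 of receiving application program 107, receiving-side message-passing library 140 would normally, under the MPI specification, either treat the command as an error or discard the part of the data that does not fit in buffer 812.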
【００９０】
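To prevent this, in this embodiment a special buffer 901 is provided in receiving-side TCP/IP emulation library 112, which is linked with receiving application program 107, and the library requests that the transmission data first be received into buffer 901. That is, it calls function MPI_Recv specifying address Ebuff of buffer 901 and the full transmission size of 50 KB (772).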
【００９１】
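In response to call 772, receiving-side high-speed communication mechanism 116 and sending-side high-speed communication mechanism 117 exchange the data as already described, and mechanism 116 writes it into buffer 901. Receiving-side TCP/IP emulation library 112 then issues a memory copy command (memcpy(Rbuff, Ebuff, 30KB)) 773 that copies, out of the received data, the 30 KB specified in the arguments of the EMU_read call into buffer 812 of receiving application program 107 (step 209). Control then returns to the application program (step 211). The exchange between the application programs thus completes with part 906 of the received data remaining in buffer 901 in TCP/IP emulation library 112.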
【００９２】
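When sending application program 108 subsequently calls function EMU_write(Sbuff', 80KB) (790) (Fig. 7(B)) to send the 80 KB of data in buffer 804 at address Sbuff', transmission processing 708 is executed in the same way and, within it, function MPI_Send(Sbuff', 80) (791) is called. When receiving application program 107 calls function EMU_read(Rbuff', 100KB) (775) to receive 100 KB of data into buffer 813 at address Rbuff', reception processing 707 is likewise executed.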
【００９３】
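In this reception processing 707, the first check, step 202, finds that untransferred data 906 remains in receive buffer 901 in receiving-side TCP/IP emulation library 112. A memory copy command memcpy(Rbuff', Ebuff+30KB, 20KB) (776) is therefore issued to copy data 906 to leading area 907 of buffer 813 in receiving application program 107 (step 203). The length copied is the length of the data held in buffer 901, limited so as not to exceed the buffer length specified by this second EMU_read call. Here buffer 901 holds 20 KB, which is less than the 100 KB requested, so all 20 KB is copied.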
【００９４】
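Step 206 is then executed to receive the remaining 80 KB requested by receiving application program 107. As already described, step 206 issues the command MPI_Probe() to check whether there is transmission data, that is, whether a header for transmission data has been received. Assuming the sending-side call 791 has already been executed, 80 KB of transmission data is found. It is then determined whether this data fits in buffer 813 specified by receiving application program 107 (step 208). Here the remaining area 908 of buffer 813 is 80 KB, so the transmission data fits in area 908.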
【００９５】
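As a result, function MPI_Recv(Rbuff'+20KB, 80KB) (778) is called, specifying the address of remaining area 908 of buffer 813 and the transmission size of 80 KB. The transmission data is thus received directly into area 908 of buffer 813 (step 210). It is then determined whether all the requested data has been received (step 212). Here the result is affirmative, so reception processing 707 ends and control returns to the application program (step 211).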
【００９６】
By the above procedure, stream communication is realized in which the transmission data specified by multiple EMU_write calls (702, 790) is received as one continuous data stream by multiple EMU_read calls (701, 775).
【００９７】
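Stream communication is also realized simply when the size of the transmission data specified by the sending application program is smaller than the buffer size specified by the receiving application program. As an example, the case in which the sending application program repeatedly requests transmission of 50 KB and the receiving application program requests reception of 100 KB is described with reference to Fig. 8.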
【００９８】
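In transmission processing 708 (Fig. 4) for function EMU_write (702) called by the sending application program, function MPI_Send (781) is called, specifying address Sbuff of buffer 803 of the sending application program and a size of 50 KB.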
【００９９】
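Reception processing 707 (Fig. 4) for function EMU_read (701) called by the receiving application program also proceeds according to Fig. 6, as already described. Under the present assumptions the check in step 202 is negative. Assume that when data probe command 771 is executed in step 206, transmission data is found. Here receiving buffer 812 is larger than the sending buffer, so the result of step 208 is affirmative and step 210 is executed. In this step, function MPI_Recv (774) is called to receive the transmission data directly into buffer 812 specified by the receiving application program; the call specifies start address Rbuff of buffer 812 and the 50 KB size of sending buffer 803. All data in sending buffer 803 is thus written to the leading 50 KB area 907 of receiving buffer 812. Step 212 is executed next. Here 100 KB was requested but only 50 KB has been received, so part of the requested data is still outstanding. The result of step 212 is therefore negative, and step 206 is executed again to receive the rest.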
【０１００】
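If the sending application program next calls function EMU_write 790, function MPI_Send (791) is likewise called in the corresponding transmission processing 708 (Fig. 4). This call specifies address Sbuff' of the next buffer 804 of the sending application program and a size of 50 KB.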
【０１０１】
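If, when step 206 is repeated, the sending application program has already made this next EMU_write call (790), the result of step 206 is affirmative and processing moves to step 208. Here the remaining area 910 of receiving buffer 812 equals the size of the data about to be sent, so the result is affirmative. Step 210 is therefore executed, and a second receive function 779 is called to write the transmission data directly into remaining area 910 of buffer 812. This call specifies address Rbuff+50KB of remaining area 910 and the transmission size of 50 KB. Step 212 then finds that all requested data has been received, and reception processing 707 completes. If no transmission data exists when step 206 is repeated, reception processing 707 ends and control returns to the receiving application program. If the result of step 208 is negative when it is repeated, step 209 is executed, as already described. As shown above, stream communication is realized even when the buffer size specified by the send command EMU_write issued by the sending application program differs from the buffer size specified by the receive command EMU_read issued by the receiving application program, and even when the number of EMU_write commands issued differs from the number of EMU_read commands issued.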
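The reception procedure of Fig. 6 as described above (steps 202, 203, 206, 208, 209, 210, 212) can be condensed into a short simulation. It is a sketch under stated assumptions, not the patented implementation: the class `Receiver` is hypothetical, a `deque` of byte strings stands in for messages whose headers have arrived, `probe` stands in for the probe command, and sizes are in bytes rather than KB. Appending a message to the output models the direct receive of step 210; passing through `staging` models buffer 901.

```python
from collections import deque

class Receiver:
    """Toy model of reception processing 707 with library buffer 901."""

    def __init__(self):
        self.pending = deque()      # messages already announced by the sender
        self.staging = bytearray()  # leftover data held by the library (buffer 901)

    def probe(self):
        """Length of the next pending message, or None if nothing has been sent."""
        return len(self.pending[0]) if self.pending else None

    def emu_read(self, want):
        out = bytearray()
        # Steps 202/203: hand over leftovers from an earlier oversized message first.
        if self.staging:
            take = min(want, len(self.staging))
            out += self.staging[:take]
            del self.staging[:take]
        # Step 212 loops back to step 206 until the request is satisfied.
        while len(out) < want:
            msg_len = self.probe()              # step 206
            if msg_len is None:
                break                           # step 207: return what we have
            room = want - len(out)
            msg = self.pending.popleft()
            if msg_len <= room:                 # step 208: message fits
                out += msg                      # step 210: receive directly
            else:
                self.staging += msg             # step 209: receive whole message into 901
                out += self.staging[:room]      #           copy only the requested part
                del self.staging[:room]
        return bytes(out)

# Fig. 7 pattern: writes of 50 and 80 units, reads of 30 and 100 units.
r = Receiver()
r.pending.extend([b"a" * 50, b"b" * 80])
read1 = r.emu_read(30)
read2 = r.emu_read(100)

# Fig. 8 pattern: two writes of 50 units satisfied by one read of 100 units.
r2 = Receiver()
r2.pending.extend([b"c" * 50, b"d" * 50])
read3 = r2.emu_read(100)
```

In the Fig. 7 pattern only the first message passes through the staging buffer; the 80-unit message fits the remaining room and is taken directly.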
【０１０２】
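As is clear from the above description, according to this embodiment, applications running on computers provided with a message-passing high-speed communication mechanism, as inside a parallel computer, can communicate at high speed, exploiting the characteristics of that mechanism, when they exchange data using TCP/IP. Conventional TCP/IP communication with applications running on other computers remains guaranteed. Users need not modify existing TCP/IP applications at all.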
【０１０３】
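＜Modifications of Embodiment 1＞
The present invention is not limited to Embodiment 1 and can be carried out in various forms, including the modifications exemplified below and others.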
【０１０４】
(1) TCP/IP was used as the wide-area network protocol, but another protocol may be used instead. Needless to say, the TCP/IP processing routine and the emulation library must then be changed accordingly.
【０１０５】
(2) Embodiment 1 used a message-passing library, but it can be omitted. The emulation library then calls the high-speed communication library directly.
【０１０６】
(3) The communication library can also be eliminated; for example, a dedicated circuit can be used in its place.
【０１０７】
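(4) Embodiment 1 assumed that all internal computers are connected to the wide-area network. The present invention applies equally when only some of the internal computers are connected to it.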
【０１０８】
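(5) Embodiment 1 assumed the use of both the wide-area network and the internal high-speed network. However, the stream communication of the present invention is itself applicable between any computers capable of message-passing communication, and can therefore also be applied to communication between application programs that use only message passing and no TCP/IP communication. In that case multiple types of communication network need not be used. A variation in which the sending-side emulation library is essentially unused is also possible.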
【０１０９】
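＜Embodiment 2 of the Invention＞
The TCP/IP processing routine has conventionally provided a select function for use by application programs. As noted, a socket descriptor is defined as a kind of file descriptor, and select is provided as a system call for checking whether data can be obtained from the object a file descriptor designates. For example, select can check whether data that can be received from a given socket has been sent (or is about to be sent) from the sending side. Concretely, it can determine whether an application program to which a given socket is assigned has issued the send system call write. With select, the file descriptors to watch are specified as a bit string. Each bit corresponds to one file descriptor, and setting a bit to 1 selects that descriptor. Setting several bits to 1 lets a single select call watch several file descriptors at once. select blocks until one of the watched file descriptors becomes ready to deliver data.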
【０１１０】
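Because the TCP/IP emulation library of Embodiment 1 does not use the conventional socket library for communication inside the parallel computer, the select system call cannot determine whether data can be received from a socket used for internal communication. This embodiment therefore makes the select function available to application programs as before, even in a computer system that uses the TCP/IP emulation library as in Embodiment 1.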
【０１１１】
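In this embodiment, a select function EMU_select is provided in TCP/IP emulation library 113. It performs processing equivalent to a select dedicated to internal communication for socket descriptors registered in internal socket tables 410, 411, and so on, and uses the conventional select system call unchanged for all other file descriptors.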
【０１１２】
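The internal-communication select and the select system call must, however, run concurrently. If the two were run one after the other, then while, say, the internal select is waiting for data, a socket for external communication or standard input becoming ready could not be detected.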
【０１１３】
ｓｅｌｅｃｔの同時実行を疑似的に実現する手段として、内部用ｓｅｌｅｃｔとｓｅｌｅｃｔシステムコールをノンブロッキングで続けて発行することをスピンループで繰り返す、という方法が考えられる。しかしこの方法を用いると、データが受け取り可能になるまで計算機を占有してしまい、同一計算機上で走っている他のプロセスに処理が渡らなくなってしまう。
【０１１４】
これに対して本実施の形態では、１ループ毎に処理を他のプロセスに譲渡する命令を挿入するという方法を採る。この方法により、スピンループによる計算機の占有を避けることができる。
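[0115]
FIG. 9 shows, for the connection state of FIG. 3, application programs 108 and 109 issuing transmission commands 1002 and 1003 to send data to application program 107, while application program 107 watches for the arrival of that data with a select command 1001. The bit string specified in the select command 1001 is assumed to correspond to the sockets SA0 and sa1 (1004, 1005). Of these, sa1 is registered in the internal socket table and is therefore watched by the internal select (1008). SA0, on the other hand, is not registered in the internal socket table and is therefore watched by the select system call (1009). Because the select system call issued by the TCP/IP emulation library does not need to watch sa1, it is given the bit string specified in the select command 1001 issued by application program 107 with the bit corresponding to sa1 cleared to 0. Processes 1008 and 1009 are issued in non-blocking mode and repeated alternately (1010, 1011). In the course of the repetition, however, execution is temporarily yielded to other processes.
[0116]
To make the select command and the internal select function shown in FIG. 9 available to application programs, the TCP/IP emulation libraries 112, 113, and so on are provided with an emulation select function EMU_select, which application programs call. When calling EMU_select, an application program specifies, as before, a bit string ap_bits in which each bit corresponds to one socket. In response to this function call, the TCP/IP emulation library executes the processing shown in FIG. 10.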
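[0117]
First, a mask is applied to the bit string ap_bits (process 1101) to clear to 0 the bits corresponding to the socket descriptors registered in the internal socket table (process 1102). in_mask is a bit string in which the bits corresponding to all socket descriptors registered in the internal socket table are 0 and all other bits are 1. The bit string ex_bits, obtained by applying in_mask to ap_bits, therefore represents the file descriptors specified in ap_bits with the socket descriptors for internal communication removed. Next, the select system call with ex_bits as its argument and the internal select processing are each executed once in non-blocking mode (processes 1103, 1104). If any of the file descriptors examined is ready for data to be received, the function returns (process 1106). Otherwise, execution is temporarily yielded to other processes (process 1107), and the select processing is repeated (process 1108).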
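The masking and spin loop of FIG. 10 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the library's actual code: bit strings are modelled as Python integers, `poll_internal` is a hypothetical callable standing in for the non-blocking internal select, and `time.sleep(0)` stands in for the instruction that yields to other processes (on Unix, `os.sched_yield()` would be a closer match). The names `fds_to_bits` and `bits_to_fds` are also introduced only for this example.

```python
import select
import time

def fds_to_bits(fds):
    """Build a bit string (as an int) with bit n set for each descriptor n."""
    bits = 0
    for fd in fds:
        bits |= 1 << fd
    return bits

def bits_to_fds(bits):
    """List the descriptor numbers whose bit is 1."""
    return [fd for fd in range(bits.bit_length()) if (bits >> fd) & 1]

def emu_select(ap_bits, internal_table, poll_internal):
    # in_mask: 0 for every descriptor in the internal socket table, 1 elsewhere.
    in_mask = ~fds_to_bits(internal_table)
    ex_bits = ap_bits & in_mask      # left to the select system call
    in_bits = ap_bits & ~in_mask     # handled by the internal select
    ex_fds = bits_to_fds(ex_bits)
    while True:
        # Non-blocking select system call on the external descriptors.
        ready = select.select(ex_fds, [], [], 0)[0] if ex_fds else []
        # Non-blocking internal select (hypothetical callable).
        ready_bits = fds_to_bits(ready) | poll_internal(in_bits)
        if ready_bits:
            return ready_bits
        time.sleep(0)                # yield the processor before retrying
```

The return value is a bit string of the descriptors found ready, combining both kinds of descriptor when both are ready in the same iteration.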
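[0118]
Thus, according to this embodiment, application programs can use the select function even in a computer system that also uses an internal high-speed communication mechanism as in Embodiment 1.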
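[0119]
<Embodiment 3 of the Invention>
The spin-loop implementation of the select function described above avoids monopolizing the computer by inserting process 1107, which yields execution to other processes, into the spin loop. However, if a process running on the same computer has low priority, that process may still never be given the processor. This can be avoided by a blocking wait, which sleeps until data arrives, but a blocking-wait internal select and a blocking-wait select system call cannot be executed simultaneously in a single process with a single thread.
[0120]
In contrast, this embodiment first creates two threads and executes the internal select on one and the select system call on the other. Because the internal select and the select system call then operate independently on their own threads, both can block and watch for data at the same time.
[0121]
FIG. 11 shows the same communication state as FIG. 9, but the method of executing the internal select and the select system call concurrently differs. In FIG. 11, the two select processes are executed on separate threads (1203, 1204). As in process 1009, the select system call on thread 1203 is given the bit string with the bit corresponding to sa1 cleared to 0.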
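[0122]
Referring to FIG. 12, in this embodiment the processing up to the creation of ex_bits, in which a mask that clears the bits corresponding to the socket descriptors registered in the internal socket table is applied to the bit string ap_bits specified when the application program issues the EMU_select function, is the same as the processing of FIG. 10 (1101, 1102) (processes 1301, 1302). Next, the select system call (process 1303) and the internal select processing (process 1304) are each performed once in non-blocking mode. If any of the file descriptors examined is ready for data to be received, the function returns. Otherwise, two threads are created (process 1307), and the select system call (process 1308) and the internal select (process 1309) are executed on them. Each of these blocks until data becomes receivable. Whichever unblocks first cancels the other thread (processes 1310, 1311) and returns (processes 1314, 1315). This cancellation does not forcibly terminate the thread; it marks the thread as no longer needed. When a thread unblocks, it checks whether it has been marked (processes 1316, 1317), and if so, it has been cancelled and simply terminates (processes 1318, 1319).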
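[0123]
In the above select processing, the non-blocking select processes 1303 and 1304 are each performed once before the threads are split for the following reason. If both an internal socket and another file descriptor are already ready for data to be received before process 1301 is issued, the select function must be able to detect both. However, if the threads were split immediately and began watching the internal sockets and the other file descriptors separately, whichever thread detected readiness slightly earlier would cancel the other, so only one thread's state could be detected. Executing processes 1303 and 1304 ensures that the states of both the internal sockets and the other descriptors, as they were before process 1301 was issued, are reliably examined.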
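The two-thread scheme of FIG. 12, including the preliminary non-blocking check and cancellation by marking, might look roughly like this in Python. It is a sketch under assumptions: the four callables (`poll_external`, `poll_internal`, `wait_external`, `wait_internal`) are hypothetical stand-ins for the non-blocking and blocking forms of the select system call and the internal select, each returning a bit string as an integer.

```python
import threading

def emu_select_threads(poll_external, poll_internal, wait_external, wait_internal):
    # Check both sides once without blocking, so that readiness that predates
    # this call is reported for both kinds of descriptor.
    ready = poll_external() | poll_internal()
    if ready:
        return ready

    done = threading.Event()
    lock = threading.Lock()
    state = {"result": 0, "cancelled": set()}

    def worker(name, other, blocking_wait):
        bits = blocking_wait()                # blocks until data is receivable
        with lock:
            if name in state["cancelled"]:    # marked as no longer needed: exit
                return
            state["cancelled"].add(other)     # cancel by marking, not by killing
            state["result"] = bits
        done.set()

    for name, other, wait in (("ext", "int", wait_external),
                              ("int", "ext", wait_internal)):
        threading.Thread(target=worker, args=(name, other, wait),
                         daemon=True).start()
    done.wait()                               # sleep until one side unblocks
    return state["result"]
```

The cancelled thread is left blocked and discovers its mark only when it eventually unblocks, which mirrors the marking approach described above rather than forced termination.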
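[0124]
Because this method sleeps while waiting for data to arrive, data is detected later than with the spin-loop waiting of Embodiment 2, but it avoids obstructing even low-priority processes.
[0125]
[Effects of the Invention]
According to the present invention, stream communication can be realized by means of message passing type communication.
[0126]
According to another aspect of the present invention, an application program operating on a computer connected to a first communication network and to a second communication network faster than the first can communicate, based on a first communication protocol, with another application program operating on another computer connected to the first communication network, and can further perform high-speed communication using the second communication network with another application program operating on another computer connected to the second communication network.
[0127]
More specifically, the TCP/IP communication protocol can be used as the first communication protocol. In addition, the communication using the second communication network can be message passing type communication.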
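[Brief Description of the Drawings]
[FIG. 1] Overall configuration diagram of an embodiment of the present invention.
[FIG. 2] Flowchart of socket connection.
[FIG. 3] Diagram for explaining socket connection.
[FIG. 4] Flowchart of the method for distinguishing internal and external communication.
[FIG. 5] Diagram for explaining the method for distinguishing internal and external communication.
[FIG. 6] Flowchart of stream communication.
[FIG. 7] Diagram showing an example of an instruction sequence used for transmission and reception.
[FIG. 8] Diagram showing another example of an instruction sequence used for transmission and reception.
[FIG. 9] Diagram for explaining the method of implementing the select function with a spin loop.
[FIG. 10] Flowchart of the method of implementing the select function with a spin loop.
[FIG. 11] Diagram for explaining the method of implementing the select function with threads.
[FIG. 12] Flowchart of the method of implementing the select function with threads.
[FIG. 13] Layer diagram of the conventional TCP/IP communication protocol.
[FIG. 14] Diagram for explaining conventional stream communication.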
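[Explanation of Reference Signs]
105 ... internal communication network of the parallel computer, 106 ... global communication network,
111 ... socket application program interface, 118 ... MPI-specification interface, 119, 120, 121, 122, 123 ... network interface hardware, 140, 141 ... message passing type libraries, 401, 402, 403, 404, 405, 406 ... sockets, 407, 408, 409 ... socket connections, 410, 411 ... internal socket tables, 601 ... data path for communication inside the parallel computer, 602 ... data path for communication with an external computer, 803, 804, 812, 813 ... application program buffers, 809 ... stream, 901 ... buffer in the TCP/IP emulation library.
[0001]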
BACKGROUND OF THE INVENTION
The present invention relates to a method of transmitting and receiving data between computers in a computer system having a plurality of computers connected by a plurality of types of communication networks, and in particular to an inter-computer data transmission/reception method suitable for executing message passing type communication.
[0002]
[Prior art]
TCP/IP is very commonly used as a communication protocol between computers. A program configured to communicate with other programs using TCP/IP is hereinafter referred to as a TCP/IP application program. There are also computer systems in which a TCP/IP application program can use a communication protocol other than TCP/IP. That is, systems have been proposed in which the computers constituting the system are connected by a plurality of networks of differing performance and, when a computer in the system communicates with another computer, TCP/IP or another communication protocol is used depending on which of these networks is used for that communication. For example, the system developed by Steven H. Rodrigues et al. of the University of California, Berkeley consists of a wide area network usable with the TCP/IP protocol and a local network that covers a smaller area and uses a simpler communication protocol with lower overhead. Computers connected to the local network communicate with each other using this protocol, and computers connected to the wide area network communicate with each other according to TCP/IP. See, for example, “High-Performance Local-Area Communication With Fast Socket”, USENIX '97 Annual Technical Conference, pp. 257-274.
[0003]
In this computer system, a TCP/IP emulation library is provided so that the communication protocol can also be used from a TCP/IP application program. In the above system, this TCP/IP emulation library is called a high-speed socket (Fast Socket).
[0004]
The high-speed socket uses Active Messages, developed for workstation clusters, as its simple communication protocol. See, for example, T. von Eicken, D. E. Culler, S. C. Goldstein, K. E. Schauser, “Active Messages: a Mechanism for Integrated Communication and Computation”, in Proceedings of the 19th International Symposium on Computer Architecture, May 1992, pp. 256-266. With active messages, data is exchanged in such a manner that the application program on the sending side interrupts the application program on the receiving side, and the receiving side performs data reception processing in response to the interrupt.
[0005]
A parallel computer is a computer in which a plurality of computers communicate with each other and cooperate to solve a single problem. To this end, the computers inside a parallel computer are generally connected to each other by a high-speed internal communication network, and at least some of the computers in the parallel computer are further connected to external computers by a wider-area network such as a LAN.
[0006]
The communication protocol used for a wide area network is mainly TCP/IP. In order to use the internal high-speed communication network, a parallel computer provides each computer with high-speed communication hardware for communicating with the other internal computers via that network, and with a high-speed communication library for using the hardware. Currently, the communication protocol adopted in many parallel computers is message passing type communication. Message passing type communication is communication that takes place when a transmission command issued by a transmitting application program and a reception command issued by a receiving application program are matched one to one. In many cases, this communication method is well suited to the high-speed communication hardware inside a parallel computer. The high-speed communication library used to realize message passing type communication is mainly one called a remote memory write library or PUT library.
[0007]
The communication overhead inside the parallel computer can be significantly reduced by using high speed communication hardware. In the interrupt type communication method used for the active message, the overhead of the interrupt becomes conspicuous. However, the message passing communication method does not presuppose interrupts, and is therefore more suitable for communication inside a parallel computer than active message communication.
[0008]
[Problems to be solved by the invention]
As described above, the communication protocol for using the internal high-speed communication network of the parallel computer is generally a message passing type. However, many business application programs used when a parallel computer is used in the business field are configured to use the TCP / IP protocol. Therefore, such an application program cannot directly use the message passing communication system as it is. In addition, there is no known method for realizing stream communication in a state where the message passing communication method is used. Further, the stream communication realization method used in active message type communication using interrupts cannot be directly used for stream communication in message passing type communication.
[0009]
Accordingly, an object of the present invention is to provide an inter-computer data transmission / reception method that enables stream communication to be executed between a plurality of application programs operating on a computer configured to execute message passing communication.
[0010]
A more specific object of the present invention is to provide an inter-computer data transmission/reception method that enables a plurality of application programs, configured to communicate via a first computer network using a communication protocol different from message passing communication, such as TCP/IP, to communicate by message passing communication using a communication network faster than that computer network.
[0011]
Another object of the present invention is to provide an inter-computer data transmission/reception method for an application program that operates on a computer connected to a first communication network and to a second communication network faster than the first, and that is configured to communicate with other application programs based on a first communication protocol such as the TCP/IP communication protocol. The method enables this application program to communicate, based on the first communication protocol, with another application program operating on another computer connected to the first communication network, and to perform high-speed communication using the second communication network with another application program operating on another computer connected to the second communication network.
[0012]
Furthermore, another specific object of the present invention is to provide an inter-computer data transmission / reception method that enables communication using the second communication network to be message passing communication.
[0013]
[Means for Solving the Problems]
In order to achieve the above object, in the inter-computer data transmission/reception method according to the present invention, each of a plurality of transmission data specified by a plurality of transmission commands issued by a first application program executed on a first computer is received, by message passing communication, in response to a plurality of reception commands issued by a second application program executed on a second computer. This reception is controlled by an emulation library provided on the second computer.
[0014]
Further, the received transmission data is processed so that, of the series of data formed by concatenating the plurality of transmission data, the portions obtained by dividing that series into the sizes designated by the plurality of reception commands are stored in the plurality of buffers designated by the respective reception commands, thereby realizing stream communication. This processing is also controlled by the emulation library.
[0015]
More specifically, the inter-computer data transmission / reception method according to the present invention executes the following processing.
[0016]
(A) The receiving-side emulation library detects the length of the data that the transmitting-side application intends to transmit with a single transmission command.
(B) The receiving-side emulation library compares this transmission data length with the data reception length specified by the application in a single reception command.
(C) If, in this comparison, the transmission data length is longer than the data reception length, the receiving-side emulation library first receives all the data into a buffer area secured in memory and then copies from there to the application area the amount of data corresponding to the reception length specified by the application. If the transmission data length is equal to or shorter than the data reception length, the receiving-side emulation library receives the data directly into the application area.
[0017]
(D) If data remains in the buffer area when the application issues a reception command, the data is copied from there to the application area.
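Steps (A) to (D) can be sketched as follows. This is an illustrative Python model under assumptions, not the emulation library itself: the message passing channel is modelled as a `deque` holding one `bytes` object per transmission command, the class name `StreamReceiver` is hypothetical, and a reception command that finds less leftover data than it requested simply returns the shorter amount, which is a choice made for this sketch.

```python
from collections import deque

class StreamReceiver:
    def __init__(self, channel):
        self.channel = channel      # one bytes object per transmission command
        self.buffer = bytearray()   # library-side buffer area for leftover data

    def recv(self, length):
        # (D) Data left over from an earlier, longer message is served first.
        if self.buffer:
            out = bytes(self.buffer[:length])
            del self.buffer[:length]
            return out
        # (A) Detect the length the sender intends to transmit.
        send_len = len(self.channel[0])
        # (B) Compare it with the length requested by this reception command.
        if send_len > length:
            # (C) Longer: receive everything into the library buffer, then
            # copy only the requested amount to the application.
            self.buffer += self.channel.popleft()
            out = bytes(self.buffer[:length])
            del self.buffer[:length]
            return out
        # (C) Equal or shorter: receive directly into the application area.
        return self.channel.popleft()
```

For example, with a 50-byte message followed by an 80-byte message, `recv(30)` returns the first 30 bytes, the next `recv(100)` returns the 20 buffered bytes, and a further `recv(100)` receives the 80-byte message directly.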
[0018]
More specifically, a TCP/IP emulation library is prepared in order to realize the data transmission/reception method according to the present invention. This library has the same interface as the TCP/IP socket application program interface. If the communication partner is outside the parallel computer, the library makes the conventional system call; if the communication partner is inside, it uses the internal high-speed communication network through a message passing communication method for parallel computers, such as MPI. That is, this TCP/IP emulation library has the following characteristics.
[0019]
(1) When communicating inside a parallel computer, a stream communication service equivalent to TCP / IP is provided using a communication procedure suitable for the message passing communication method.
[0020]
(2) The TCP/IP emulation library switches between the communication methods using appropriate means.
[0021]
(3) External communication data and internal communication data are detected by a spin loop. Alternatively, external communication data and internal communication data are detected by separate threads.
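Characteristic (2) can be pictured with a small dispatch sketch. The Python below is illustrative only: the class name `EmulationLibrary`, the method `register_internal`, and the two injected callables are assumptions for this example, and the set `internal_sockets` merely plays the role of a table of descriptors connected inside the parallel computer.

```python
class EmulationLibrary:
    def __init__(self, system_write, message_send):
        self.system_write = system_write    # path through the OS's TCP/IP routine
        self.message_send = message_send    # path through the message passing library
        self.internal_sockets = set()       # descriptors connected inside the machine

    def register_internal(self, fd):
        """Record that this descriptor's peer is inside the parallel computer."""
        self.internal_sockets.add(fd)

    def write(self, fd, data):
        # Same call for the application; the library picks the transport.
        if fd in self.internal_sockets:
            return self.message_send(fd, data)
        return self.system_write(fd, data)
```

The application calls `write` in the same way for either kind of partner, which is why existing TCP/IP applications need no changes.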
[0022]
DETAILED DESCRIPTION OF THE INVENTION
<Conventional technology and its problems>
Prior to describing embodiments of the present invention, conventional techniques and problems thereof will be described.
[0023]
(1) TCP / IP
FIG. 13 shows the TCP/IP hierarchy. In TCP/IP, four layers are defined, from the bottom: a link layer 301, an IP layer 302, a TCP layer 303, and an application layer 306. A physical layer actually exists below the link layer 301, but it is not discussed here for simplicity. Hereinafter, for simplicity, the layer group 304 consisting of the layers 303, 302, and 301 below the application layer is collectively referred to as the TCP/IP layer. Generally, a program (hereinafter referred to as a TCP/IP processing routine) necessary for performing communication according to the procedure determined by TCP/IP is included in the OS. In order to execute communication according to the procedure determined by TCP/IP, specific processing is executed by each of the layers constituting the TCP/IP layer 304. In this specification, the processes executed by these layers, or the program routines that execute them, are collectively referred to as a TCP/IP processing routine.
[0024]
As an interface for the application program to use the TCP / IP processing routine, a socket application programming interface (socket API) 111 is generally used. In order for an application program to use a TCP / IP processing routine, a system call group called a socket library is provided as a part of the OS.
[0025]
In this specification, calling a system call or function for data reception, or a system call or function for data transmission, may be described as issuing a reception command or a transmission command, respectively. The socket application program interface is defined as the application program interface of the socket library. The operation of an application program that communicates over TCP/IP and is written in accordance with the socket application program interface (hereinafter referred to as a TCP/IP application program) is described below.
[0026]

First, the server-side TCP/IP application program and the client-side TCP/IP application program that wish to communicate each call the system call socket included in the TCP/IP processing routine provided in the computer on which the program is running. The called system call socket creates an object called a socket, which serves as a communication endpoint, and returns a socket descriptor. In this example, the server-side and client-side TCP/IP application programs receive the returned socket descriptor as sa and sb0, respectively.
[0027]
The socket descriptor is an identifier (ID) of a socket that is uniquely determined for each socket in the application program, and includes an integer value. All operations on the created socket are performed by specifying this socket descriptor. In the following description, the socket and the socket descriptor are synonymous unless otherwise specified. The system call argument AF_INET above represents the address family Internet and indicates that a socket is used for communication over the Internet. Furthermore, the argument SOCK_STREAM requests stream communication. The last argument is an argument that specifies the protocol. When this value is 0, the protocol is determined by the two arguments before that argument. In this case, the TCP / IP protocol is used. After the socket is generated, the TCP / IP application program on the server side and the client side perform socket connection processing in different procedures.
[0028]
The server side calls the system call bind included in the TCP/IP processing routine. The called system call bind associates the name specified by its argument with the socket sa specified by its argument. The name consists of a combination of an IP address and a port number. In this example, the socket descriptor sa is associated with the name consisting of the pair of an IP address and a port number stored in the structure server, whose size is specified by slen. The server side further calls the system call listen included in the TCP/IP processing routine.
[0029]
The system call listen sets the socket sa specified by its argument as a socket for receiving connection requests. The second argument of this system call represents the size required for the queue used to temporarily hold connection requests; in this case, the call requests a queue that can hold five connection requests.
[0030]
As a result of the above processing, logically, a communication path is determined between the server side and the client side.
[0031]
Thereafter, the server side calls a system call accept (accept) included in the TCP / IP processing routine. This system call places the socket sa designated by the argument into a connection request waiting state. The second and third arguments of this system call represent the IP address and length on the client side of the connection request to be waited for.
[0032]
On the other hand, the client side calls the system call connect included in the TCP/IP processing routine. This system call connects the socket sb0 specified by its argument to the name specified by its argument, in this case the name given to the server-side socket sa (the combination of the IP address sin_addr and the port number sin_port stored in the structure server).
[0033]
In the server-side computer, the previously called system call accept receives this connection request and newly generates the socket sa1 as a socket for communication with the client side. In this way, a communication path is established between the server-side socket sa1 and the client-side socket sb0.
[0034]
When the client side transmits data to the server side with the connection between the sockets established, the client side calls the system call write included in the TCP/IP processing routine. The arguments of this system call specify the address buffer0 of the buffer holding the data to be transmitted and the length length0 of that buffer. The system call write transmits the data in this buffer using the socket sb0 specified in the call. The server side calls the system call read included in the TCP/IP processing routine. This system call receives the data transferred through the socket sa1 specified by its argument and writes it into the buffer at the address buffer1 specified by its argument. In this way, data is transferred between the two application programs. If necessary, the client side calls further write system calls to send subsequent data, and the server side calls one or more read system calls to receive that subsequent data.
[0035]
The socket descriptor is defined as a kind of file descriptor used when performing input / output to a file, standard input / output, and the like. Therefore, in the socket application program interface, data can be transmitted and received through the same interface as that for inputting and outputting files and the like.
[0036]
When the communication between the server side and the client side program ends, the server side calls a system call close (close) to close the socket sa1 specified by the argument. The server side also calls the same system call close again, which in turn closes socket sa. The client side similarly calls a system call close, and this system call closes the socket sb0.
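The call sequence described above can be tried out with the following Python sketch, which uses the standard `socket` module on the loopback address. It is an illustration, not a listing from the specification: the variable names `sa`, `sa1`, and `sb0` follow the description, while the loopback address, the automatically chosen port, the helper `run_server`, and the use of a thread to play the server are assumptions for the example.

```python
import socket
import threading

def run_server(sa, received):
    sa1, _client_name = sa.accept()      # new socket for talking to the client
    chunks = []
    while True:
        chunk = sa1.recv(4096)           # plays the role of the read system call
        if not chunk:                    # empty result: the client closed sb0
            break
        chunks.append(chunk)
    sa1.close()
    received.append(b"".join(chunks))

# Server side: socket, bind, listen.
sa = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sa.bind(("127.0.0.1", 0))                # port 0 lets the OS pick a free port
sa.listen(5)                             # queue for up to five connection requests
server_name = sa.getsockname()           # (IP address, port number) pair

received = []
server = threading.Thread(target=run_server, args=(sa, received))
server.start()

# Client side: socket, connect, two writes, close.
sb0 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sb0.connect(server_name)
sb0.sendall(b"first ")
sb0.sendall(b"second")
sb0.close()

server.join()
sa.close()
```

The server sees the two writes as one continuous byte sequence, regardless of how many reads it takes to collect them.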
[0037]
(2) Switching communication method for high-speed sockets
In conventional TCP/IP, a name (server) consisting of an IP address and a port number is associated with a socket (sa) by the system call bind. In the high-speed socket, a socket dedicated to high-speed communication is generated as follows. First, a hash function is applied to the port number (the port number sin_port stored in the structure server) of the name specified by the application when calling the system call bind, and a new shadow port number is derived. Next, a shadow socket is newly created by calling the system call socket. Finally, the system call bind is called to associate the shadow port number with the shadow socket. For high-speed communication, when the server and the client call accept and connect to establish a connection, a communication path dedicated to high-speed communication is set up using this shadow socket.
[0038]
This method requires special processing when a system call bind or system call listen is invoked.
[0039]
(3) Stream communication by TCP / IP processing routine
The TCP/IP socket library provides stream communication. Stream communication is a communication method in which a plurality of data sent by the transmission-side application program through a series of write system calls is treated as one continuous data stream, and the reception-side application program receives that data stream as one or more pieces of data of arbitrary length, as specified by one or more read system calls that it issues.
[0040]
The operation of conventional stream communication will be described with reference to FIG. 14. Here, an example in which the transmission application program 801 sends data to the reception application program 802 is shown. The transmission application program 801 first calls the first write system call and transmits 50 kilobytes (KB) of data in the buffer 803 of the program (805). In FIG. 14, the unit KB is omitted for simplicity. The TCP/IP processing routine on the transmission side first copies this transmission data from the buffer 803 to a buffer (not shown) secured in the OS, divides it into a plurality of packets, and transmits them to the receiving-side OS. The normal packet size is 40 to 1500 bytes. The receiving-side OS receives these packets in a plurality of buffers (not shown) secured in the OS, links the packets in a list, and reconstructs the data stream. The transmission application program 801 then issues a second write system call and transmits 80 KB of data in another buffer 804 of the program (806). The TCP/IP processing routines on the transmitting and receiving sides operate in the same manner as before, and the receiving side holds this data in the above-mentioned buffer in the OS, combined with the previously transmitted data, as one data stream.
[0041]
In the figure, reference numeral 809 schematically represents this data stream held in the OS 900. The head data 807 and the subsequent data 808 represent the 50 KB and 80 KB of data transmitted by the first and second write system calls, respectively. In stream communication, the receiving application program views these data as a single continuous 130 KB data stream 809. The data in the buffers 803 and 804 in the transmission application program 801 are held as the data portions 807 and 808, respectively, of the data stream 809 in the OS 900. Accordingly, the reception application program 802 issues a first read system call and requests that the first 30 KB of the 50 KB head data 807 in the data stream 809 be received into its 30 KB buffer 812 (814). In response to this read system call, the TCP/IP processing routine on the receiving side copies the first 30 KB of data 810 from the data stream 809 to the buffer 812. When the receiving application program 802 further issues a second read system call, the remaining 100 KB of data 811 in the data stream 809 is copied to the buffer 813 specified by this system call, according to the length specified by this system call (815).
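The arithmetic of this example (writes of 50 KB and 80 KB, reads of 30 KB and 100 KB) can be checked with a small in-memory model. The Python class `ByteStream` below is a hypothetical stand-in for the data stream 809 held in the OS; it ignores packets and the network and only shows that write boundaries are not preserved.

```python
KB = 1024

class ByteStream:
    """In-memory stand-in for the data stream held on the receiving side."""

    def __init__(self):
        self._data = bytearray()

    def write(self, payload):
        self._data += payload            # appended to the end of the stream
        return len(payload)

    def read(self, length):
        out = bytes(self._data[:length]) # taken from the head of the stream
        del self._data[:length]
        return out

stream = ByteStream()
stream.write(b"A" * 50 * KB)             # first write: head data
stream.write(b"B" * 80 * KB)             # second write: subsequent data

first = stream.read(30 * KB)             # first read: 30 KB of the head data
second = stream.read(100 * KB)           # second read: remaining 100 KB
```

The second read spans the boundary between the two writes: it contains the last 20 KB of the first write followed by all 80 KB of the second.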
[0042]
(4) Stream communication in active message type communication
The system using the high-speed socket realizes stream communication while adopting active message type communication. That is, in this system, when a transmission request is issued by the transmission-side application program, the request is executed without waiting for the reception-side application program to issue a reception request. In response to the transmission request, an interrupt is generated in the computer on the receiving side, the interrupt handler is activated by this interrupt, and the transmission data is first received in a buffer in the interrupt handler. When the reception-side application program issues a reception request, the interrupt handler transfers, from the data already received, data of the size requested by the reception request to the buffer of the reception-side application program. If the size requested by the reception-side application program is smaller than the size of the received data, processing of the reception request ends there. The remaining data held in the interrupt handler is transferred to the buffer of the application program when the reception-side application program issues a new reception request.
[0043]
Conversely, if the size requested by the reception-side application program is larger than the size of the received data, the interrupt handler supplies the subsequent data to the reception-side application program when the transmission-side application program later transmits it. Similarly, when a subsequent reception request is issued by the reception-side application program before the transmission-side application program issues a transmission request, the reception-side interrupt handler supplies the subsequent data to the reception-side application program when the transmission-side application program later transmits it. If this subsequent data is larger than the shortfall of the first reception request or the size required by the subsequent reception request, the interrupt handler holds the excess data for further reception requests.
[0044]
In this way, a plurality of data transmitted by a plurality of transmission requests issued by the transmission side application program are supplied to the reception side application program in response to a series of reception requests issued by the reception side application program. Thus, in this method, stream communication is realized by temporarily holding transmission data in a buffer in the interrupt handler.
[0045]
However, in message passing type communication that does not use interrupts, stream communication cannot be realized using such a method.
[0046]
Hereinafter, the inter-computer data transmission / reception method according to the present invention will be described in more detail with reference to some embodiments shown in the drawings. In the following, the same reference numerals represent the same or similar items. In the second and subsequent embodiments of the invention, differences from the first embodiment of the invention will be mainly described.
[0047]
<Embodiment 1 of the Invention>
(1) Outline of the device
FIG. 1 shows an example of a computer system for executing the inter-computer data transmission/reception method according to the present invention. In the figure, it is assumed that two computers 102 and 103 inside the parallel computer 101 and one external computer 104 are connected to each other via communication networks. In practice, the number of computers inside and outside the parallel computer 101 is arbitrary. The internal computers 102 and 103 are connected by an internal high-speed communication network 105, and they have network interface hardware 119 and 120, respectively, dedicated to the internal high-speed communication network 105. The internal high-speed communication network 105 is configured as a network capable of transferring a plurality of packets in parallel at high speed, such as a hyper crossbar switch. The internal computers 102 and 103 and the external computer 104 are all connected to a global communication network 106, and the computers have network interface hardware 121, 122, and 123, respectively, dedicated to the communication network 106.
[0048]
Data transmission/reception is performed between the application programs 107, 108, and 109 loaded in the memories 124, 125, and 126 of the respective computers. In addition to the application programs, OSs 127, 128, and 129 are loaded in the respective memories, and TCP/IP processing routines 114, 115, and 110 exist in the respective OSs. In order to execute communication according to the procedure determined by TCP/IP, processing is executed by each layer constituting the TCP/IP layer 304. The TCP/IP processing routine is a general term for the processes executed by each of these layers in order to execute communication according to the procedure determined by TCP/IP. The TCP/IP processing routines 114, 115, and 110 are themselves the same as publicly known ones and include multiple functions that can be called as system calls, as described above. The TCP/IP processing routines 114, 115, and 110 communicate with the network interface hardware 121, 122, and 123 dedicated to the global communication network 106 for the purpose of wide area communication.
[0049]
The memory 124 of the computer 102 is further loaded with a TCP / IP emulation library 112, a message passing library 140, and a high-speed communication library 135. Similarly, a TCP / IP emulation library 113, a message passing library 141, and a high-speed communication library 136 are loaded in the memory 125 of the computer 103.
[0050]
The high-speed communication libraries 135 and 136 on the computers 102 and 103 are libraries for communicating with the network interface hardware 119 and 120 dedicated to the internal high-speed communication network 105 for the purpose of high-speed communication inside the parallel computer 101. A so-called remote memory write library, or PUT library, is often used for this purpose. In this embodiment as well, this PUT library is used for the high-speed communication libraries 135 and 136. However, the present invention is not limited to this library, and other libraries, such as a so-called PUT / GET library, can also be used.
[0051]
In general, communication through the internal high-speed communication network 105 is much faster than communication through the global communication network 106. Therefore, when application programs on the internal computers try to perform TCP / IP communication with each other, the TCP / IP emulation libraries 112 and 113 use not the TCP / IP processing routines 114 and 115 but the message passing type libraries 140 and 141, the high-speed communication libraries 135 and 136, and the internal high-speed communication network 105 to carry out message passing type communication, thereby achieving high-speed communication.
[0052]
The message passing type libraries 140 and 141 are libraries for starting the high-speed communication library 135 or 136 in accordance with a request from the TCP / IP emulation library 112 or 113. In general, the message passing type libraries 140 and 141 are libraries that present a message passing type interface to application programs (in the present embodiment, to the TCP / IP emulation libraries 112 and 113).
[0053]
The TCP / IP emulation libraries 112 and 113 further realize, on top of the message passing type communication, the stream communication provided by the conventional TCP / IP processing routines, thereby achieving high-speed communication. Since the TCP / IP processing routines 114 and 115 are functions of the OSs 127 and 128, in the conventional technique the overhead of a context switch always occurs when an application program uses them. In this embodiment, the high-speed communication libraries 135 and 136 are used instead, so the OS is not involved and this overhead is avoided, which further increases the communication speed.
[0054]
(2) Logical configuration and interface between components
FIG. 1B is a diagram representing the hardware configuration as a logical configuration. In this figure, all parts not relevant to the following description are omitted. The high-speed communication libraries 135 and 136 and the hardware 119 and 120 dedicated to internal high-speed communication are collectively shown as high-speed communication mechanisms 116 and 117. In addition, the interfaces 111 and 118 between the components are newly shown. The squares representing the computers 102, 103, and 104 and the parallel computer 101 indicate that the logical components inside each of them are executed on that computer or on the parallel computer 101.
[0055]
On the computer 104, the application program 109 is linked to the TCP / IP processing routine 110 by the socket application program interface 111, as is conventional. The application programs 107 and 108 running on the computers 102 and 103 inside the parallel computer 101 are linked to the TCP / IP emulation libraries 112 and 113 by the socket application program interface 111. The TCP / IP emulation libraries 112 and 113 are linked to the conventional TCP / IP processing routines 114 and 115, which are functions of the OS, by the socket application program interface 111. At the same time, they are linked to the high-speed communication mechanisms 116 and 117 by the MPI specification interface 118.
[0056]
(3) Application programs 107 and 108
In the present embodiment, when an application program running on a computer in the parallel computer 101, for example the application program 107 on the computer 102, communicates with another application program running on another computer, the high-speed communication mechanism 116 or the TCP / IP processing routine 114 is selected depending on whether the other application program runs on a computer inside the parallel computer 101, for example the application program 108 on the computer 103, or on a computer outside the parallel computer 101, for example the application program 109 on the computer 104. For two application programs 107 and 108 running on different computers 102 and 103 in the parallel computer 101 to communicate with each other using the high-speed communication mechanisms 116 and 117, special processing is required so that the high-speed communication mechanisms 116 and 117 can be used. This special processing is needed not only when data is actually transmitted and received between these application programs but also when the sockets of the respective application programs are connected to each other. The TCP / IP emulation libraries 112 and 113 each include a plurality of functions for executing this special processing. In the present embodiment, the names of the functions provided in the TCP / IP emulation libraries 112 and 113 are prefixed with EMU_ to distinguish them from the names of the functions provided in the TCP / IP processing routines 114, 115, and 110.
[0057]
Specifically, of the application programs 107 and 108 running on the computers 102 and 103 in the parallel computer 101, the one operating as the server and the one operating as the client are written so as to execute the following programs, respectively.
[0058]

The server-side application program issues the socket, bind, and listen system calls as in the conventional case. These system calls are processed in the same manner as before by the corresponding TCP / IP processing routine. The client-side application program also issues a socket system call as in the conventional case, and this system call is likewise processed in the same manner as before by the corresponding TCP / IP processing routine. In this way, sockets sa and sb0 are generated for the server side and the client side as in the conventional case.
[0059]
Thereafter, the server-side application program calls the function EMU_accept (emulation accept) provided in the TCP / IP emulation library of the computer on which the application program is running, instead of the conventional system call accept. The client-side application program calls the function EMU_connect (emulation connect) provided in the TCP / IP emulation library of the computer on which it is running, instead of the conventional system call connect. Further, the server-side application program calls the function EMU_read (emulation read) provided in the TCP / IP emulation library instead of the conventional system call read, and the client-side application program calls the function EMU_write (emulation write) provided in the corresponding TCP / IP emulation library instead of the conventional system call write. The processing performed by these new functions is described below.
[0060]
(4) Socket connection by TCP / IP emulation library
In FIG. 2, when the server-side application program and the client-side application program call the above-described functions EMU_accept and EMU_connect (processes 501 and 502), the function EMU_accept provided in the server-side TCP / IP emulation library and the function EMU_connect provided in the client-side TCP / IP emulation library are invoked. The arguments of these calls are the same as those used when the server-side and client-side application programs call the system calls accept and connect, respectively, to request the TCP / IP processing routines 114 and 115 to connect the sockets sa and sb0.
[0061]
The called functions EMU_accept and EMU_connect first issue an accept system call and a connect system call, respectively (processes 503 and 504). The arguments passed to the functions EMU_accept and EMU_connect are used as they are as the arguments of these system calls. As a result, the system call accept provided in the TCP / IP processing routine on the server side and the system call connect provided in the TCP / IP processing routine on the client side are called. As before, the called system call accept receives the connection request issued by the called system call connect, the socket sa1 is generated, and the socket sa1 and the socket sb0 are connected via the communication path 106.
[0062]
Thereafter, the functions EMU_accept and EMU_connect check whether the IP address of the other party is the address of a computer inside the parallel computer 101 (processes 506 and 507). If the other party is inside the parallel computer 101, the socket descriptor is registered in the internal table 410 or 411 (processes 510 and 511). In this case, the socket sa1 for the server-side application program 107 and the socket sb0 for the client-side application program 108 are registered in the internal tables 410 and 411, respectively. Further, the TCP / IP emulation libraries 112 and 113 exchange the data necessary for using the high-speed communication mechanisms 116 and 117, such as the identifiers of the computers in the parallel computer 101. For this data exchange, the application programs 107 and 108 issue read and write system calls, respectively (processes 512, 513, 515, and 516). This data exchange is performed through the TCP / IP processing routines 114 and 115 and the communication path 106 in the conventional manner, using the already connected sockets sa1 and sb0. Thereafter, control returns to the server-side and client-side application programs 107 and 108 (processes 518 and 519). At this time, the TCP / IP emulation library 112 on the server side returns the identifier of the generated socket sa1 to the application program 107.
[0063]
When it is determined in the processes 506 and 507 that the other party is not a computer inside the parallel computer 101, the function EMU_connect returns to the client-side application program 108, and the function EMU_accept returns the socket sa1 to the server-side application program 107 and returns to that program (processes 508 and 509). In this way, the socket sa1 for the server-side application program and the socket sb0 for the client-side application program are connected through the corresponding TCP / IP processing routines via the communication path 106.
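The steps that follow the ordinary accept / connect (processes 506 to 516) can be modeled with a minimal Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the address set INTERNAL_ADDRESSES, the helper names recv_exact, finish_connection, and demo, the 4-byte identifier format, and the choice to store the peer identifier in the table are all hypothetical, and socket.socketpair() merely stands in for the already connected sockets sa1 and sb0.

```python
import socket
import struct
import threading

# Hypothetical addresses of the internal computers 102 and 103.
INTERNAL_ADDRESSES = {"10.0.0.2", "10.0.0.3"}


def recv_exact(sock, size):
    """Read exactly `size` bytes from a connected stream socket."""
    data = b""
    while len(data) < size:
        part = sock.recv(size - len(data))
        if not part:
            raise ConnectionError("peer closed the connection")
        data += part
    return data


def finish_connection(sock, peer_ip, my_node_id, internal_table):
    """Model of processes 506-516, run after accept/connect has succeeded.

    Returns True when the socket was registered for the fast path.
    """
    if peer_ip not in INTERNAL_ADDRESSES:        # processes 506/507
        return False                             # processes 508/509
    fd = sock.fileno()
    internal_table[fd] = None                    # processes 510/511: register descriptor
    # Processes 512-516: exchange node identifiers over the TCP socket.
    sock.sendall(struct.pack("!I", my_node_id))
    (peer_node_id,) = struct.unpack("!I", recv_exact(sock, 4))
    internal_table[fd] = peer_node_id            # assumption: keep the peer id here
    return True


def demo():
    # socketpair() stands in for sockets sa1 and sb0 after accept/connect.
    sa1, sb0 = socket.socketpair()
    table_410, table_411 = {}, {}
    server = threading.Thread(
        target=finish_connection, args=(sa1, "10.0.0.3", 102, table_410))
    server.start()
    finish_connection(sb0, "10.0.0.2", 103, table_411)
    server.join()
    result = (table_410[sa1.fileno()], table_411[sb0.fileno()])
    sa1.close()
    sb0.close()
    return result
```

Calling demo() runs both sides once; each side ends up with the other side's identifier recorded against its own descriptor, while a peer address outside the set leaves the table untouched.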
[0064]
FIG. 3 shows one state in which sockets generated by the application programs 107, 108, and 109 according to this embodiment are connected to each other. Here, the socket SA1 (402) of the application program 107 and the socket SB0 (403) of the application program 108 are connected (407), the socket SB1 (404) of the application program 108 and the socket SC0 (405) of the application program 109 are connected (408), and the socket SC1 (406) of the application program 109 and the socket SA0 (401) of the application program 107 are connected (409). The TCP / IP emulation libraries 112 and 113 hold internal socket tables 410 and 411, respectively. A socket descriptor is registered in the internal socket table 410 or 411 when the connection destination of the socket is an internal computer. For example, since the socket SB0 (403), which is the connection destination of the socket SA1 (402) of the application program 107, is a socket of the application program 108 that runs on the internal computer 103, SA1 is registered in the internal socket table 410. Similarly, the socket SA1, which is the connection destination of the socket SB0, is a socket of the application program 107 that runs on the internal computer 102, so SB0 is registered in the internal socket table 411. On the other hand, since the socket SC1 (406), which is the connection destination of the socket SA0 (401), is a socket of the application program 109 on the external computer, SA0 is not registered in the internal socket table 410. Similarly, SB1, which is connected to SC0, is not registered in the internal socket table 411.
[0065]
(5) Separation method between internal communication and external communication
Thereafter, the server-side application program and the client-side application program start data communication. One of these two programs calls the function EMU_write for data transmission and the other calls the function EMU_read for data reception. In the example of the client-side and server-side programs shown above, the server-side application program calls the function EMU_read, and the client-side application program calls the function EMU_write. The arguments specified in these calls are the same as the arguments of the system calls write and read included in the TCP / IP processing routines 114 and 115.
[0066]
Referring to FIG. 4, when EMU_read and EMU_write are called (processes 701 and 702), each function determines whether the socket descriptor of the socket sa1 or sb0 specified by its argument is registered in the corresponding internal socket table 410 or 411 (processes 703 and 704). When the socket identifier is registered in the internal socket table 410 or 411, message passing type communication is performed using the high-speed communication mechanisms 116 and 117 and the internal high-speed communication network 105 by a procedure described in detail later (processes 707 and 708). If the socket identifier is not registered in the internal socket table 410 or 411, the conventional read or write system call is issued to the designated socket as it is (processes 705 and 706). These system calls are processed by the corresponding TCP / IP processing routines 114 and 115, which carry out communication over the global communication path 106.
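The table lookup that separates internal from external traffic can be illustrated with a short Python sketch of the write side. It is a model only: RecordingTransport, EmulationLibrary, register_internal, and emu_write are hypothetical names, the two transports merely record what they are asked to send, and the descriptor values are arbitrary.

```python
class RecordingTransport:
    """Stand-in for a transport; it only records what it was asked to send."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def send(self, fd, data):
        self.sent.append((fd, bytes(data)))
        return len(data)


class EmulationLibrary:
    """Toy model of the dispatch performed by EMU_write."""

    def __init__(self, fast_path, tcp_path):
        self.internal_sockets = set()   # plays the role of table 410 / 411
        self.fast_path = fast_path      # high-speed communication mechanism
        self.tcp_path = tcp_path        # ordinary TCP/IP processing routine

    def register_internal(self, fd):
        self.internal_sockets.add(fd)

    def emu_write(self, fd, data):
        if fd in self.internal_sockets:             # process 704
            return self.fast_path.send(fd, data)    # process 708
        return self.tcp_path.send(fd, data)         # process 706


fast = RecordingTransport("fast")
tcp = RecordingTransport("tcp")
lib_113 = EmulationLibrary(fast, tcp)

SB0, SB1 = 403, 404               # descriptor values are arbitrary
lib_113.register_internal(SB0)    # SB0's peer SA1 is on internal computer 102

lib_113.emu_write(SB0, b"to 107")   # goes to the fast path
lib_113.emu_write(SB1, b"to 109")   # falls through to TCP/IP
```

Because the decision is a per-descriptor lookup made inside the library, the calling application uses the same call for both kinds of peer.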
[0067]
FIG. 5 is an overall configuration diagram for explaining the above-described separation method. It shows the flow of data when the application programs 107, 108, and 109 communicate with each other in the socket connection state shown in FIG. When the application program 107 and the application program 108 communicate, the socket sa1 (402) and the socket sb0 (403) are used as the communication endpoints. These sockets are registered in the internal socket tables 410 and 411 held by the TCP / IP emulation libraries 112 and 113. Therefore, the TCP / IP emulation libraries 112 and 113 use the high-speed communication mechanisms 116 and 117 instead of the TCP / IP processing routines 114 and 115 for data communication processing (601). On the other hand, when the application program 108 and the application program 109 communicate, the socket SB1 (404) and the socket SC0 (405) are used as the communication endpoints. SB1 is not registered in the internal socket table 411 held by the TCP / IP emulation library 113 linked with the application program 108. Therefore, the TCP / IP emulation library 113 uses the TCP / IP processing routine 115 as it is for data communication processing (602). As a result, data can be exchanged with the external TCP / IP processing routine 110.
[0068]
According to this method, the bind system call and the listen system call can be used as they are without changing from the conventional TCP / IP processing routine, and the internal communication and the external communication can be separated.
[0069]
(6) Message passing type libraries 140 and 141
The communication hardware 119 and 120 and the high-speed communication libraries 135 and 136 that each computer of the parallel computer uses to access the internal high-speed communication network 105 are often vendor-specific, and how they are used varies from machine to machine. Therefore, an application program written directly against the high-speed communication hardware and high-speed communication library of a particular machine becomes specialized for that machine and has low versatility. On the other hand, there are active efforts around the world to improve the versatility of application programs by preparing a library for communication using the communication hardware inside the parallel computer and defining the application program interface of this library as a standard.
[0070]
An interface that is currently widely used as a general-purpose interface for message passing type communication is the Message Passing Interface (MPI). For example, see "MPI: Message Passing Interface Standard version 1.1", MPI Forum, University of Tennessee, 1995. This interface presupposes the existence of the high-speed communication library described above for realizing message passing type communication. Even when this interface is used, the message passing type communication is basically no different from that realized by the high-speed communication library.
[0071]
Many parallel computer vendors provide high-speed communication libraries that allow MPI-compliant application programs to use high-speed communication hardware inside the parallel computer.
[0072]
In this specification, an interface for using message type communication is called a message passing type interface, and a library having the interface is called a message passing type library. In particular, an MPI specification interface may be referred to as an MPI or MPI interface or a message passing interface, and a library having the interface may be referred to as an MPI library or a message passing interface library.
[0073]
In this embodiment, a library that exchanges commands and data through the interface defined by the standard MPI specification, which can be used on many parallel computers, is used as the message passing type libraries 140 and 141. This improves the versatility of the TCP / IP emulation libraries 112 and 113. However, interfaces of other specifications may be used.
[0074]
A description example of a conventional application program compliant with MPI is shown below. In the present embodiment, the TCP / IP emulation libraries 112 and 113 have a program part like the following as the part that starts the message passing type libraries 140 and 141. This program part will be described in more detail later.
[0075]

In MPI, all processes to be communicated are launched simultaneously. At this time, a process identifier called a rank is determined for each process. The above example shows a case where two processes, that is, a sender process sender and a receiver process receiver are started up. At this time, rank 0 and 1 are assigned to each process. Each process first initializes MPI using the MPI initialization function MPI_Init, and acquires its own rank using the MPI communication rank function MPI_Comm_rank. Thereafter, processing for each rank is performed. The above example shows a case where the rank 1 process receives the data sent by the MPI transmission function MPI_Send by the MPI reception function MPI_Recv.
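The rank-based structure described in this paragraph can be mimicked without an MPI installation. The following Python sketch is a toy model, not MPI and not the listing referred to above: World, process, and run_job are hypothetical names, threads stand in for the simultaneously launched processes, and a per-rank queue stands in for matched send and receive calls.

```python
import queue
import threading


class World:
    """Toy stand-in for an MPI job: one mailbox per rank."""

    def __init__(self, size):
        self.mailboxes = [queue.Queue() for _ in range(size)]

    def send(self, data, dest):
        # Plays the role of MPI_Send: deliver data to the destination rank.
        self.mailboxes[dest].put(data)

    def recv(self, rank):
        # Plays the role of MPI_Recv: block until data for this rank arrives.
        return self.mailboxes[rank].get(timeout=5)


def process(world, rank, results):
    """Every rank runs the same code and branches on its own rank."""
    if rank == 0:                       # sender
        world.send(b"hello", dest=1)
    elif rank == 1:                     # receiver
        results[rank] = world.recv(rank)


def run_job(size=2):
    """Start all ranks at once, as an MPI launcher would, and wait for them."""
    world = World(size)
    results = {}
    threads = [
        threading.Thread(target=process, args=(world, rank, results))
        for rank in range(size)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In a real MPI program the rank would come from MPI_Comm_rank after MPI_Init; here it is simply passed to each thread.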
[0076]
(7) Message passing type communication using high-speed communication mechanism
Before describing the details of the data reception processing 707 and the data transmission processing 708 using the high-speed communication mechanisms 116 and 117, an outline is given of the message passing type communication using the high-speed communication mechanisms 116 and 117 that is needed to execute these processes.
[0077]
In general, in order to execute message passing type communication, a transmission computer executes a transmission function for transmitting data, and a reception computer executes a reception function for receiving the data. These functions are repeatedly executed to transfer subsequent data between the computers. In the present embodiment, message passing communication based on the MPI specification will be described as a specific description of message passing communication.
[0078]
In message passing communication according to the MPI specification, the sending computer executes the transmission function MPI_Send, and the receiving computer executes the reception function MPI_Recv. In the present embodiment, when the transmission processing 708 is activated, the TCP / IP emulation library 113 on the transmission side calls the transmission function MPI_Send, as in the example of the MPI-compliant application program described above. Before the transmission function MPI_Send is called, functions such as MPI_Init and MPI_Comm_rank must be called as initial processing. These calls are made together with the internal table registration processing (510, 511). On the other hand, the TCP / IP emulation library 112 on the receiving side calls the reception function MPI_Recv in order to use the internal high-speed communication network 105.
[0079]
The buffer address buffer and the buffer length length specified by the arguments of the function MPI_Send that the TCP / IP emulation library 113 on the transmission side calls in the transmission processing 708 are made equal to the corresponding arguments of the function EMU_write issued by the application program 108 on the transmission side. On the other hand, the buffer address buffer and the buffer length length specified by the arguments of the function MPI_Recv that the TCP / IP emulation library 112 on the receiving side calls in the reception processing 707 are, as will be described later, made equal either to the buffer address and buffer length specified by the arguments of the function EMU_read issued by the application program 107 on the receiving side, or to the address and size of a specific buffer provided in the TCP / IP emulation library 112 on the receiving side.
[0080]
When the function MPI_Send in the message passing type library 141 on the transmission side is activated, it requests the corresponding high-speed communication library 136 to transfer the data of the length specified by the argument, held in the buffer at the address specified by the argument of MPI_Send, to the receiving computer 124 via the internal high-speed communication network 105. When the function MPI_Recv in the message passing type library 140 on the receiving side is activated, it requests the corresponding high-speed communication library 135 to receive the transferred data via the internal high-speed communication network 105 and to write data of the length specified by the argument into the buffer at the address specified by the argument of MPI_Recv.
[0081]
The activated high-speed communication library 136 on the transmission side and high-speed communication library 135 on the reception side transmit and receive the requested data via the internal high-speed communication network 105 by a method known per se. Specifically, this data transfer is performed as follows. In general, a command called a remote memory write command or a PUT command is used; hereinafter, it is referred to as a PUT command. Data is transferred via the internal high-speed communication network 105 by means of three PUT commands.
[0082]
First, when the function MPI_Send is called from the TCP / IP emulation library 113 on the transmission side, the message passing type library 141 on the transmission side issues a first PUT command requesting the high-speed communication library 136 on the transmission side to transmit a header containing attribute information of the data, including the buffer length specified by the argument (this is the length of the data to be transmitted). The high-speed communication library 136 transmits this header to the computer 102 on the receiving side. The hardware 119 dedicated to internal high-speed communication in the computer 102 writes this header directly at a predetermined position in the memory 124 on the receiving side.
[0083]
On the other hand, when the reception function MPI_Recv is called from the TCP / IP emulation library 112 on the receiving side, the message passing type library 140 on the receiving side first checks whether a header has already been sent from the computer 103 on the transmission side. For this purpose, the data check instruction MPI_probe described later is used. If the header has not been sent, execution of the reception function ends. If the header has already been sent, it is determined from the buffer length in the header whether the transmission data can be received, that is, whether the length of the transmission data is equal to or less than the buffer length specified in the reception function MPI_Recv. In message passing type communication in general, not only in MPI, the receiving computer is supposed not to receive the transmission data if the data length specified by the transmission command is larger than the reception buffer size specified by the reception command. This is because data exceeding the size of the reception buffer could destroy memory areas outside the buffer. If the message passing type library 140 on the receiving side determines that the transmission data can be received, it issues a second PUT command requesting the high-speed communication library 135 on the receiving side to transmit the address of the reception buffer specified by the argument of the reception function MPI_Recv. In response to this command, the high-speed communication library 135 transmits the buffer address to the computer 103 on the transmission side. The hardware 119 dedicated to internal high-speed communication in the computer 102 writes this address at a predetermined position in the memory 125 on the transmission side. If it is determined that the transmission data cannot be received, the message passing type library on the receiving side notifies the computer on the transmission side of an error without returning the buffer address.
[0084]
Finally, the message passing type library on the transmission side issues a third PUT command requesting the high-speed communication library 136 on the transmission side to transmit the transmission data together with the received buffer address. The high-speed communication library 136 transmits this data and buffer address to the computer 102 on the receiving side. The hardware 119 dedicated to internal high-speed communication in the computer 102 on the receiving side receives this buffer address and data and writes the received data directly into the buffer at that address. This completes the data transfer.
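The three PUT commands can be traced with a small in-memory Python model. It is a sketch under stated assumptions rather than the actual library: Node, put_header, put_buffer_address, and put_data are hypothetical names, the "predetermined positions" are modeled as two attributes, the header is reduced to the data length alone, and the error notification to the sender is reduced to a returned status string.

```python
class Node:
    """One computer: flat memory plus two fixed slots that remote PUTs write into."""

    def __init__(self, memory_size):
        self.memory = bytearray(memory_size)
        self.header_slot = None    # written by the sender's first PUT
        self.address_slot = None   # written by the receiver's second PUT


def put_header(receiver, length):
    """First PUT: the sender announces the length of the data."""
    receiver.header_slot = length


def put_buffer_address(receiver, sender, buf_addr, buf_len):
    """Second PUT: the receiver answers with its buffer address if the data fits."""
    if receiver.header_slot is None:
        return "no-header"             # nothing announced yet
    if receiver.header_slot > buf_len:
        return "too-long"              # error case: no address is returned
    sender.address_slot = buf_addr
    return "ok"


def put_data(sender, receiver, data):
    """Third PUT: the sender writes the data straight into the receiver's buffer."""
    addr = sender.address_slot
    receiver.memory[addr:addr + len(data)] = data
    receiver.header_slot = None        # handshake finished, clear both slots
    sender.address_slot = None


node_103, node_102 = Node(64), Node(64)    # sender and receiver
payload = b"stream data"

put_header(node_102, len(payload))
status = put_buffer_address(node_102, node_103, buf_addr=16, buf_len=32)
if status == "ok":
    put_data(node_103, node_102, payload)
```

The length check before the second PUT is what keeps the third PUT from writing past the end of the receive buffer.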
[0085]
The data communication method via the internal high-speed communication network 105 is not limited to the above method, and other methods can be used. For example, on some parallel computers the high-speed communication library can execute, in addition to the PUT command, a remote memory read command, that is, a GET command. On such a parallel computer, instead of the message passing type library 141 on the transmission side issuing the third PUT command, the message passing type library 140 on the receiving side issues a GET command, and the high-speed communication library 135 reads the transmission data from the memory 125 on the transmission side.
[0086]
(8) Stream communication in message passing type communication
Details of the data reception processing 707 and the transmission processing 708 shown in FIG. 4 will be described with reference to the specific examples shown in FIGS. In FIG. 4, as described above, when the transmission-side application program 108 calls the function EMU_write (702) and the transmission processing 708 is activated, the function MPI_Send included in the message passing type library 141 is called (781). In the example shown in FIG. 7, it is assumed that the address Sbuff of the buffer 803 and a buffer size of 50 kilobytes (KB) are specified as arguments when the function EMU_write is called (702). The buffer address and buffer size specified by the arguments of the MPI_Send call (781) are made equal to these values. In the following, the other arguments of the functions are omitted for simplicity. Further, the unit of the specified buffer size is assumed to be KB, and this unit is omitted when the arguments of a function are shown. In response to the function call 781, the message passing type library 141 instructs the high-speed communication mechanism 117 to transmit a header as described above, and then instructs it to transmit the data.
[0087]
On the other hand, in FIG. 4, as described above, when the reception-side application program 107 calls the function EMU_read (701) and the reception processing 707 is started, the reception processing 707 performs the processing shown in FIG. In the example shown in FIG. 7, it is assumed that the buffer address specified by the argument of the EMU_read call 701 is Rbuff and the buffer size is 30 KB, a size smaller than the buffer size specified in the call 702 of the transmission function EMU_write.
[0088]
In this reception processing 707, it is first determined whether data that has been received in a specific data reception buffer (buffer 901 described later) provided in the TCP / IP emulation library 112 on the reception side, and has not yet been passed to the application program on the reception side, remains (process 202). This instruction is not shown in FIG. As assumed here, when the function EMU_read is executed for the first time, the result of this determination is negative. Thereafter, a transmission data detection command MPI_probe 771 is issued to determine whether there is data to be transmitted to the reception-side application program (process 206). Specifically, this command determines whether the high-speed communication mechanism 116 has already received the header described above for data to be transmitted to the application program 107. If this header has not yet been sent from the transmission-side application program 108, the reception processing 707 ends and control returns to the reception-side application program 107 (process 207). The reception-side application program 107 then executes its processing for a failed reception; for example, it repeats the function call 701 until reception succeeds.
[0089]
Assuming that the header has already been sent from the application program 108 on the transmission side, it is determined whether the transmission data specified by the header fits in the buffer 812 of the application program on the reception side (process 208). In the example of FIG. 7, the size of the transmission data is 50 KB and the size of the buffer 812 on the receiving side is 30 KB, so the result of this determination is negative. In such a case, if the TCP / IP emulation library 112 on the receiving side issued an MPI_Recv instruction to receive this transmission data into the buffer 812 of the application program 107 on the receiving side, the message passing type library 140 on the receiving side would either process this instruction as an error according to the MPI specification or discard the portion of the transmission data that does not fit in the buffer 812.
[0090]
To prevent this, in this embodiment a special buffer 901 is prepared in the reception-side TCP / IP emulation library 112 linked to the reception-side application program 107, and the reception-side TCP / IP emulation library 112 first requests that the transmission data be received into this buffer 901. That is, the function MPI_Recv is called with the address Ebuff of this buffer 901 and the full transmission data size of 50 KB specified (772).
[0091]
In response to this function call 772, the high-speed communication mechanism 116 on the reception side and the high-speed communication mechanism 117 on the transmission side transmit and receive the data as described above, and the high-speed communication mechanism 116 on the reception side writes the data to the above-described buffer 901. The TCP / IP emulation library 112 on the receiving side then issues a memory copy instruction memcpy (Rbuff, Ebuff, 30 KB) (773) to copy 30 KB of data, the size specified by the argument in the call to the function EMU_read, to the buffer 812 of the application program 107 on the receiving side (process 209). Thereafter, the process returns to the application program (process 211). In this way, this transmission / reception between the application programs is completed with a part 906 of the received data remaining in the buffer 901 in the TCP / IP emulation library 112.
[0092]
Thereafter, when the transmission-side application program 108 further calls the function EMU_write (Sbuff ', 80 KB) (790) (FIG. 7B) to transmit the 80 KB of data in the buffer 804 at the address Sbuff ', the transmission process 708 is executed in the same manner, and the function MPI_Send (Sbuff ', 80 KB) (791) is called in this process. Meanwhile, when the receiving-side application program 107 calls the function EMU_read (Rbuff ′, 100 KB) (775) to receive 100 KB of data into the buffer 813 at the address Rbuff ′, the reception process 707 is likewise executed.
[0093]
In the reception process 707, the first determination process 202 finds that untransferred data 906 remains in the reception buffer 901 in the TCP / IP emulation library 112 on the reception side. As a result, a memory copy instruction memcpy (Rbuff ′, Ebuff + 30 KB, 20 KB) (776) is issued, and this data 906 is copied to the head area 907 of the buffer 813 in the application program 107 on the receiving side (process 203). The length of the data to be copied is the length of the data held in the buffer 901, limited so that it does not exceed the buffer length specified by the second call to the function EMU_read. In this case, the length of the data held in the buffer 901 is 20 KB, which is smaller than the 100 KB requested to be received. Therefore, all 20 KB of the data is copied.
[0094]
Thereafter, in order to receive the remaining 80 KB of data requested by the receiving-side application program 107, the process 206 is executed. In this process 206, as already described, the command MPI_Probe () for checking whether there is transmission data is issued. Specifically, it is determined whether a header for transmission data has been received. In this case, assuming that the function call 791 on the transmission side has already been executed, it is found that there is 80 KB of transmission data. It is then determined whether the transmission data fits in the buffer 813 designated by the reception-side application program 107 (process 208). In this case, since the size of the remaining area 908 of the buffer 813 of the application program on the receiving side is 80 KB, the transmission data fits completely in this area 908 of the buffer 813.
[0095]
As a result, the function MPI_Recv (Rbuff ′ + 20 KB, 80 KB) (778) is called. This function call specifies the address of the remaining area 908 of the buffer 813 and the size of transmission data of 80 KB. In this way, the transmission data is directly received in the area 908 of the buffer 813 (process 210). Thereafter, it is determined whether or not all the data requested to be received have been received (process 212). In this case, since the determination result is affirmative, the reception process 707 ends, and the process returns to the application program (process 211).
[0096]
Through the above procedure, stream communication can be realized in which the transmission data specified by a plurality of calls to the function EMU_write (702, 790) is received as one continuous data stream by a plurality of calls to the function EMU_read (701, 775).
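The reception flow of processes 202 to 212 can be sketched as a small simulation. This is only an illustrative model under stated assumptions, not the patented implementation: the class EmuReceiver, its method names, and the in-memory queue that stands in for the message passing library and the header probe are all hypothetical, and a direct receive into the application buffer is modeled as a plain append.

```python
from collections import deque

KB = 1024


class EmuReceiver:
    """Toy model of the reception-side emulation library (hypothetical names)."""

    def __init__(self):
        self.incoming = deque()  # messages whose header has arrived (assumption)
        self.ebuff = b""         # untransferred data left in the library buffer (901)

    def deliver(self, message: bytes) -> None:
        """Model one message sent by one EMU_write call on the transmission side."""
        self.incoming.append(message)

    def emu_read(self, size: int) -> bytes:
        """Return up to `size` bytes of the stream, in the manner of EMU_read."""
        out = bytearray()
        # Processes 202/203: hand over leftover data from the library buffer first.
        if self.ebuff:
            take = min(len(self.ebuff), size)
            out += self.ebuff[:take]
            self.ebuff = self.ebuff[take:]
        # Processes 206 to 212: receive while space remains and a message is pending.
        while len(out) < size and self.incoming:
            message = self.incoming.popleft()
            room = size - len(out)
            if len(message) <= room:
                out += message                # process 210: fits, receive directly
            else:
                out += message[:room]         # process 209: stage in library buffer,
                self.ebuff = message[room:]   # copy what fits, keep the remainder
        return bytes(out)


# The example of FIG. 7: writes of 50 KB and 80 KB, reads of 30 KB and 100 KB.
rx = EmuReceiver()
rx.deliver(b"a" * (50 * KB))
first = rx.emu_read(30 * KB)
leftover_after_first = len(rx.ebuff)
rx.deliver(b"b" * (80 * KB))
second = rx.emu_read(100 * KB)
```

In this model the second read returns the 20 KB left over from the first message followed by the whole 80 KB of the second message, which is the stream behavior described above. When no message is pending, emu_read simply returns what it has, corresponding to the return to the application in process 207.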
[0097]
Even when the size of transmission data specified by the application program on the transmission side is smaller than the size of the buffer specified by the application program on the reception side, stream communication is easily realized as follows. For example, the stream communication in the case where the transmission side application program repeatedly requests transmission of 50 KB data and the reception side application program requests reception of 100 KB data will be described with reference to FIG.
[0098]
The function MPI_send (781) is called in the transmission process 708 (FIG. 4) for the function EMU_write (702) called by the transmission side application program. In this function call, the address Sbuff and the size 50 KB of the buffer 803 of the transmission side application program are specified.
[0099]
The reception processing 707 (FIG. 4) for the function EMU_read (701) called by the reception-side application program is also processed in accordance with FIG. Under the current assumption, the determination in the process 202 fails. In process 206, it is assumed that it is determined that there is transmission data when the data check instruction 771 is executed. In this case, since the size of the buffer 812 on the reception side is larger than the size of the buffer on the transmission side, the determination result in the processing 208 is affirmative. As a result, the process 210 is executed. In this process, a function MPI_recv (774) for directly receiving the transmission data in the buffer 812 designated by the reception side application program is called. This function call designates the start address Rbuff of the buffer 812 of the reception side application program and the size 50 KB of the transmission side buffer 803. In this way, all the data in the transmission side buffer 803 is written into the first 50 KB area 907 in the reception side buffer 812. Next, the process 212 is executed. In this case, the size of data requested to be received is 100 KB, whereas the size of already received data is 50 KB. Thus, some of the requested data has not yet been received. Accordingly, the result of determination 212 is negative and process 206 is performed again to receive the remaining data.
[0100]
If the application program on the transmission side next calls the function EMU_write 790, the function MPI_send (791) is similarly called in the transmission processing 708 (FIG. 4) corresponding thereto. Also in this function call, the address Sbuff ′ and the size 50 KB of the next buffer 804 of the transmission side application program are designated.
[0101]
If the transmission-side application program has already issued its next transmission request (function call 791) by the time the process 206 is repeated, the determination result in the process 206 becomes affirmative, and the process proceeds to the determination process 208. In this case, since the size of the remaining area 910 of the buffer 812 on the receiving side is equal to the size of the data to be transmitted, the result of this determination is affirmative. As a result, the process 210 is executed, and the second reception function 779 for directly writing the transmission data to the remaining area 910 of the reception-side buffer 812 is called. In this function call, the address Rbuff + 50 KB of the remaining area 910 of the buffer 812 of the reception side application program and the transmission data size of 50 KB are designated. The process 212 then determines that all the requested data has been received, and the reception process 707 is completed. If there is no transmission data when the process 206 is executed again, the reception process 707 ends and the process returns to the receiving application program. Further, if the determination result is negative when the process 208 is executed again, the process 209 is executed. The contents of this process are the same as those already described. As described above, stream communication is realized even if the buffer size specified by the transmission command EMU_write issued by the transmission-side application program differs from the buffer size specified by the reception command EMU_read issued by the reception-side application program, and even if the number of transmission commands EMU_write issued by the transmission-side application program differs from the number of reception commands EMU_read issued by the reception-side application program.
[0102]
As can be seen from the above description, according to the present embodiment, when applications running on computers provided with a high-speed communication mechanism of a message passing type communication method, such as the computers inside a parallel computer, perform data communication with each other using TCP / IP, high-speed communication that exploits the features of the high-speed communication mechanism becomes possible. In addition, conventional TCP / IP communication remains available with applications running on other computers. The user does not need to change any existing TCP / IP application.
[0103]
<Modification of Embodiment 1 of the Invention>
The present invention is not limited to the contents of the first embodiment, but can be implemented by various embodiments including the following modifications and other modifications.
[0104]
(1) Although TCP / IP is used as the communication protocol for the wide area network, other communication protocols can be used instead. At that time, it is needless to say that the TCP / IP processing routine and the emulation library need to be changed.
[0105]
(2) In the first embodiment, the message passing type library is used, but it is also possible not to use it. At this time, the emulation library directly calls the high-speed communication library.
[0106]
(3) Furthermore, it is possible to eliminate this communication library. For example, instead of this, a dedicated circuit can be used.
[0107]
(4) In the first embodiment, it is assumed that all internal computers are connected to a wide area communication network. However, the present invention can be similarly applied to a case where some internal computers are connected to a wide area communication network.
[0108]
(5) In the first embodiment, it is assumed that both the wide area communication network and the internal high-speed communication network are used. However, the stream communication according to the present invention can be applied between any computers capable of executing message passing type communication. Therefore, it can also be applied to communication between application programs that use only message passing type communication, without using TCP / IP communication to execute this stream communication. In this case, a plurality of types of communication networks need not be used. A modification in which the emulation library on the transmission side is substantially not used is also possible in this case.
[0109]
<Embodiment 2 of the Invention>
The TCP / IP processing routine is conventionally provided with a select function as a function that can be used by an application program. In the first place, a socket descriptor is defined as a kind of file descriptor. A select function is prepared as a system call for checking whether it is possible to acquire data from the object specified by the file descriptor. For example, whether or not data that can be received from a certain socket has been sent (or is about to be sent) from the transmission side can be checked by the select function. Specifically, it can be determined whether or not an application program to which a certain socket is assigned has issued a transmission system call write. In the select function, a file descriptor to be watched is specified by a bit string. Each bit of this bit string corresponds to an individual file descriptor, and by setting the bit to 1, the file descriptor is specified. If a plurality of bits are set to 1, a plurality of file descriptors can be watched simultaneously by a single select function. The select function blocks until any of the watched file descriptors are ready to receive data.
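For readers unfamiliar with select, the following is a small sketch of watching two descriptors at once. It uses Python's standard select module, which takes lists of objects instead of the bit string (fd_set) of the C interface described above; the socket pairs and variable names are made up for illustration.

```python
import select
import socket

# Two connected socket pairs stand in for two peers that may send data.
recv_a, send_a = socket.socketpair()
recv_b, send_b = socket.socketpair()

# Only the second peer has written so far.
send_b.sendall(b"hello")

# Watch both receiving ends with a single call. A timeout is given so the
# call cannot block forever if nothing becomes readable.
readable, _, _ = select.select([recv_a, recv_b], [], [], 1.0)

# Record which of the watched sockets were reported as ready to receive.
ready_names = ["a" if s is recv_a else "b" for s in readable]

for s in (recv_a, send_a, recv_b, send_b):
    s.close()
```

Only the socket that has pending data is reported, which is the condition the select function waits for.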
[0110]
In the TCP / IP emulation library used in the first embodiment of the invention, the conventional socket library is not used for communication inside the parallel computer. Therefore, the select system call cannot check whether data can be received from a socket used for internal communication. The present embodiment therefore allows the application program to use the select function as before, even in a computer system that uses the TCP / IP emulation library as in the first embodiment.
[0111]
In the present embodiment, a select function EMU_select is provided in the TCP / IP emulation library 113 to divide the work: socket descriptors registered in the internal socket tables 410, 411, etc. are handled by select-equivalent processing dedicated to internal communication, while the other file descriptors are handled by the conventional select system call.
[0112]
However, the select for internal communication and the select system call must be executed simultaneously in parallel. If the two selects were executed one after the other, then, for example, while the select for internal communication is waiting for data, it would be impossible to detect that a socket for communication with the outside, the standard input / output, or the like has become ready to receive data.
[0113]
As a means for realizing the simultaneous execution of the select in a pseudo manner, it is conceivable to repeat the issuance of the internal select and the select system call in a non-blocking manner in a spin loop. However, if this method is used, the computer will be occupied until data can be received, and processing will not be passed to other processes running on the same computer.
[0114]
On the other hand, in this embodiment, a method of inserting an instruction for transferring processing to another process for each loop is adopted. By this method, the occupation of the computer by the spin loop can be avoided.
[0115]
FIG. 9 shows a state in which the application programs 108 and 109 issue transmission instructions 1002 and 1003 to transmit data to the application program 107 in the connection state shown in FIG., and the application program 107 watches for this data with a select instruction 1001. It is assumed that the bit string specified by the select instruction 1001 corresponds to the sockets SA0 and sa1 (1004, 1005). Of these, sa1 is registered in the internal socket table, and is watched by the internal select (1008). On the other hand, since SA0 is not registered in the internal socket table, it is watched by the select system call (1009). The select system call issued by the TCP / IP emulation library does not need to watch sa1, so it specifies the bit string given in the select instruction 1001 issued by the application program 107 with the bit corresponding to sa1 set to 0. Processes 1008 and 1009 are issued in a non-blocking manner and are repeated alternately (1010, 1011). However, processing is temporarily handed over to another process during each repetition.
[0116]
In order to enable the application program to use the select instruction and the internal select function shown in FIG. 9, the emulation select function EMU_select is provided in the TCP / IP emulation libraries 112, 113, etc., and the application program calls it. When the application program calls the EMU_select function, it designates a bit string ap_bits in which each bit corresponds to one socket, as in the conventional case. The TCP / IP emulation library executes processing in accordance with FIG. 10 in response to this function call.
[0117]
First, a mask in_mask that sets the bits corresponding to the socket descriptors registered in the internal socket table to 0 is applied to the bit string ap_bits (processes 1101 and 1102). in_mask is a bit string in which the bits corresponding to all the socket descriptors registered in the internal socket table are 0 and the other bits are 1. Therefore, the bit string ex_bits created by applying in_mask to ap_bits is a bit string that specifies the file descriptors in ap_bits excluding the socket descriptors for internal communication. After that, the select system call with ex_bits as an argument and the internal select processing are each executed once in a non-blocking manner (processes 1103 and 1104). If any of the file descriptors examined in this processing is ready to receive data, the process returns (process 1106). If not, processing is temporarily handed over to another process (process 1107), and the select processing is repeated (process 1108).
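The masking and spin loop of FIG. 10 can be sketched as follows. This is a hedged illustration, not the library's actual code: bit strings are modeled as Python integers, the two non-blocking checks are passed in as hypothetical callables system_select and internal_select, in_bits is a name introduced here for the internal part of ap_bits, and max_spins is an added safety bound that the description above does not have.

```python
import time
from typing import Callable


def emu_select(ap_bits: int, in_mask: int,
               system_select: Callable[[int], int],
               internal_select: Callable[[int], int],
               max_spins: int = 1000) -> int:
    """Return the bits of ready descriptors, or 0 if max_spins is exhausted."""
    ex_bits = ap_bits & in_mask    # processes 1101/1102: drop internal sockets
    in_bits = ap_bits & ~in_mask   # internal sockets the caller asked about
    for _ in range(max_spins):
        # Processes 1103/1104: one non-blocking check of each kind.
        ready = system_select(ex_bits) | internal_select(in_bits)
        if ready:
            return ready           # process 1106
        time.sleep(0)              # process 1107: let other work run
    return 0


# Hypothetical stand-ins: bit 0 is an external socket, bit 1 an internal one.
seen_ex_bits = []
calls = {"n": 0}


def fake_system_select(bits: int) -> int:
    seen_ex_bits.append(bits)      # record what the system-call side was asked
    return 0                       # the external socket never becomes ready


def fake_internal_select(bits: int) -> int:
    calls["n"] += 1
    return bits if calls["n"] >= 3 else 0   # ready on the third check


IN_MASK = 0xFF & ~0b10             # 8-bit mask with the internal bit cleared
ready = emu_select(0b11, IN_MASK, fake_system_select, fake_internal_select)
```

Here the system-call side is only ever asked about bit 0, and the loop returns as soon as the internal side reports bit 1 as ready. In a C implementation the yield would more likely be an operating system call such as sched_yield; time.sleep(0) is used here only as a portable stand-in.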
[0118]
Thus, according to the present embodiment, the select function can be used by the application program even in the computer system using the internal high-speed communication mechanism as in the first embodiment.
[0119]
<Third Embodiment of the Invention>
In the above-described method of realizing a select function using a spin loop, the process 1107 for handing processing over to another process is inserted into the spin loop to prevent it from monopolizing the computer. However, another process running on the same computer may still not get processed if its priority is low. This can be avoided by using a blocking wait that sleeps until data arrives, but the select processing for internal communication and the select system call cannot both perform a blocking wait at the same time in a single process with a single thread.
[0120]
In this embodiment, by contrast, two threads are first generated, and the internal select is executed on one thread and the select system call on the other. According to this method, the internal select and the select system call operate independently on their own threads, so both can block while watching for data at the same time.
[0121]
FIG. 11 shows the same communication state as shown in FIG. However, the method for realizing simultaneous execution of the select for internal use and the select system call is different. In FIG. 11, these two select processes are executed on different threads (1203, 1204). Similar to the process 1009, the select system call on the thread 1203 specifies a bit string in which the bit corresponding to sa1 is set to 0.
[0122]
Referring to FIG. 12, in this embodiment, the processing up to the creation of ex_bits, in which the mask that sets the bits corresponding to the socket descriptors registered in the internal socket table to 0 is applied to the bit string ap_bits specified when the application program calls the EMU_select function, is the same as the processing (1101, 1102) of FIG. 10 (processes 1301 and 1302). After that, a non-blocking select system call (process 1303) and a non-blocking internal select (process 1304) are first performed once each. If any of the file descriptors examined in this processing is ready to receive data, the process returns. Otherwise, two threads are generated (process 1307), and a select system call (process 1308) and an internal select (process 1309) are executed on them, one on each. Each of these blocks until a descriptor becomes ready to receive data. Whichever of the two is unblocked first cancels the other thread (processes 1310 and 1311) and returns (processes 1314 and 1315). In this cancellation, the other thread is not forcibly terminated but is marked as no longer needed. When its block is released, that thread checks whether the mark has been set (processes 1316 and 1317). If the mark has been set, the thread cancels itself and disappears (processes 1318 and 1319).
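A minimal sketch of this two-thread arrangement is given below, using Python's threading module. Every name is hypothetical, the two select operations are passed in as callables that take a blocking flag, and the "no longer needed" mark is modeled simply as the shared result already having been filled in by the other thread.

```python
import threading
from typing import Callable, List

SelectFn = Callable[[int, bool], int]   # (bits, block) -> ready bits


def emu_select_threaded(ex_bits: int, in_bits: int,
                        system_select: SelectFn,
                        internal_select: SelectFn) -> int:
    # Processes 1303/1304: one non-blocking pass over both kinds first,
    # so that descriptors of both kinds that are already ready are all seen.
    ready = system_select(ex_bits, False) | internal_select(in_bits, False)
    if ready:
        return ready

    lock = threading.Lock()
    done = threading.Event()
    result: List[int] = []   # filled by whichever watcher unblocks first

    def watcher(select_fn: SelectFn, bits: int) -> None:
        found = select_fn(bits, True)   # processes 1308/1309: blocking wait
        with lock:
            if result:                  # mark set by the other thread:
                return                  # quietly disappear (1316 to 1319)
            result.append(found)        # first to unblock wins (1310/1311)
        done.set()

    # Process 1307: one thread per kind of select.
    for fn, bits in ((system_select, ex_bits), (internal_select, in_bits)):
        threading.Thread(target=watcher, args=(fn, bits), daemon=True).start()
    done.wait()
    return result[0]


# Hypothetical stand-ins: the external side stays blocked until released,
# while the internal side becomes ready as soon as it is allowed to block.
sys_gate = threading.Event()


def fake_system_select(bits: int, block: bool) -> int:
    if not block:
        return 0
    sys_gate.wait()
    return bits


def fake_internal_select(bits: int, block: bool) -> int:
    return bits if block else 0


ready = emu_select_threaded(0b01, 0b10, fake_system_select, fake_internal_select)
sys_gate.set()   # release the losing thread so it can notice the mark and exit
```

The losing thread is never killed. It wakes up later, sees that a result already exists, and returns without reporting anything, mirroring the mark-and-check cancellation described above.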
[0123]
In the select process, the non-blocking select processing of the processes 1303 and 1304 is performed once before the threads are split for the following reason. If both an internal socket and another file descriptor are already ready to receive data before the process 1301 is issued, the select function must be able to detect both. However, if the threads were split immediately and the internal sockets and the other file descriptors were watched separately, whichever thread detected readiness slightly earlier would cancel the other thread, so only one of the two states could be detected. By executing the processes 1303 and 1304 first, the states of both the internal sockets and the other descriptors as they were before the process 1301 was issued can be checked reliably.
[0124]
Because this method sleeps while waiting for data to arrive, data is detected later than with the spin-loop wait of the second embodiment, but it avoids hindering the processing of other processes, even low-priority ones.
[0125]
[Effects of the invention]
According to the present invention, stream communication can be realized by message passing type communication.
[0126]
According to another aspect of the present invention, an application program that operates on a computer connected to a first communication network and to a second communication network faster than the first communication network can communicate, on the basis of the first communication protocol, with another application program that operates on another computer connected to the first communication network, and can perform high-speed communication using the second communication network with another application program that operates on another computer connected to the second communication network.
[0127]
More specifically, the first communication protocol can use a TCP / IP communication protocol. Further, communication using the second communication network can be message passing communication.
[Brief description of the drawings]
FIG. 1 is an overall configuration diagram of an embodiment of the present invention.
FIG. 2 is a flowchart of socket connection.
FIG. 3 is a diagram for explaining socket connection;
FIG. 4 is a flowchart of an internal / external communication separation method.
FIG. 5 is a diagram for explaining an internal / external communication separation method;
FIG. 6 is a flowchart of stream communication.
FIG. 7 is a diagram showing an example of an instruction sequence used for transmission / reception operations.
FIG. 8 is a diagram showing another example of an instruction sequence used for transmission / reception operations.
FIG. 9 is a diagram for explaining a method for realizing a select function by a spin loop;
FIG. 10 is a flowchart of a method for realizing a select function in a spin loop.
FIG. 11 is a diagram for explaining a method for realizing a select function using a thread;
FIG. 12 is a flowchart of a method for realizing a select function using threads.
FIG. 13 is a hierarchical diagram of a conventional TCP / IP communication protocol.
FIG. 14 is a diagram for explaining conventional stream communication.
[Explanation of symbols]
105. . . Parallel computer internal communication network, 106. . . Global communication network,
111. . . Socket application program interface, 118. . . MPI specification interface, 119, 120, 121, 122, 123. . . Network interface hardware, 140, 141. . . Message passing type library, 401, 402, 403, 404, 405, 406. . . Socket, 407, 408, 409. . . Socket connection, 410, 411. . . Internal socket table, 601. . . Data path in parallel computer internal communication, 602. . . Data path at the time of communication with an external computer, 803, 804, 812, 813. . . Application program buffer, 809. . . Stream, 901. . . A buffer in the TCP / IP emulation library.

Claims

A computer-to-computer data transmission / reception method performed between application programs that run on different computers and that each presuppose the use of a communication protocol other than a message passing type communication method, by means of data communication based on a message passing type communication method in which data is not received when the transmission data length specified by a transmission command is larger than the reception buffer size specified by a reception command, wherein
a plurality of transmission data respectively specified by a plurality of transmission commands issued by a first application program running on a first computer are each temporarily received by message passing type communication into a buffer in an emulation library provided on a second computer, under the control of the emulation library, in response to a plurality of reception commands issued by a second application program running on the second computer, and
each received transmission data is processed under the control of the emulation library so that, of a series of data formed by the sequence of the plurality of transmission data, the portions obtained by dividing that series into the sizes specified by the respective reception commands are stored in the buffers of the application areas designated by the respective reception commands.

An inter-computer data transmission / reception method performed between application programs that run on different computers and that each presuppose the use of a communication protocol other than a message passing type communication method, by means of data communication based on a message passing type communication method in which data is not received when the transmission data length specified by a transmission command is larger than the reception buffer size specified by a reception command, wherein
in response to a plurality of transmission commands issued by a first application program running on a first computer having a first communication library for executing message passing type communication, a plurality of transmission commands respectively requesting transmission of a plurality of transmission data held in the application area buffers respectively designated by the plurality of transmission commands are issued to the first communication library by a first emulation library provided in the first computer,
in response to a plurality of reception commands issued by a second application program executed on a second computer having a second communication library for executing message passing type communication, a plurality of reception commands for temporarily receiving each of the plurality of transmission data by message passing type communication into a buffer in a second emulation library provided in the second computer are issued to the second communication library by the second emulation library, and
A series of data composed of a series of a plurality of transmission data specified by the plurality of transmission commands issued by the first emulation library is divided into size portions respectively specified by the plurality of reception commands. The plurality of transmission data received by the plurality of reception commands issued by the second emulation library are stored in the second computer so as to be stored in the buffers of the respective application areas specified by the reception commands. A computer-to-computer data transmission / reception method of controlling the position and movement in the second computer after receiving each transmission data by the second emulation library.

An inter-computer data transmission / reception method performed between application programs that run on different computers and that each presuppose the use of a communication protocol other than a message passing type communication method, by means of data communication based on a message passing type communication method in which data is not received when the transmission data length specified by a transmission command is larger than the reception buffer size specified by a reception command, wherein
in response to a plurality of transmission commands issued by a first application program executed on a first computer having a first communication library for executing message passing communication, a plurality of transmission commands respectively requesting transmission of a plurality of transmission data respectively held in the application area buffers designated by the plurality of transmission commands are issued to the first communication library by a first emulation library provided in the first computer,
in response to a plurality of reception commands issued by a second application program executed on a second computer having a second communication library for executing message passing communication, a plurality of reception commands for receiving the plurality of transmission data by message passing type communication into a buffer in a second emulation library provided in the second computer are issued to the second communication library by the second emulation library,
so that a series of data formed by the sequence of the plurality of transmission data specified by the plurality of transmission commands issued by the first emulation library is stored in the buffers of the application areas respectively specified by the plurality of reception commands issued by the second application program, the second emulation library (a) sets the storage location of the reception data specified by each of the plurality of reception commands issued by the second emulation library to either an address in the application data area designated by the reception command, issued by the second application program, that is currently being processed, or an address belonging to an available buffer of the second communication library, and (b) controls the subsequent movement, within the second computer, of transmission data received at an address of the available buffer of the second communication library by a reception command issued by the second emulation library,
The plurality of reception commands issued by the second emulation library are issued in correspondence with one of the plurality of transmission commands issued by the first emulation library,
The size of the reception data specified by each of the plurality of reception commands issued by the second emulation library is specified by one corresponding transmission command among the plurality of transmission commands issued by the first emulation library. An inter-computer data transmission / reception method set to be equal to the size of transmission data.

An inter-computer data transmission / reception method performed between application programs that run on different computers and that each presuppose the use of a communication protocol other than a message passing type communication method, by means of data communication based on a message passing type communication method in which data is not received when the transmission data length specified by a transmission command is larger than the reception buffer size specified by a reception command, wherein
(a) issuing, in the transmission-side computer, a transmission command defined by the message-passing communication that requests transmission of transmission data specified by the transmission-side application;
(b) detecting, by the reception-side emulation library operating on the reception-side computer, the transmission data length specified by the transmission command before the transmission data is transmitted;
(c) comparing, by the reception-side emulation library, the reception data length specified by the reception command issued by the reception-side application with the transmission data length;
(d) issuing, by the reception-side emulation library, a reception command defined by the message-passing communication that, depending on the result of the comparison, requests reception of the transmission data either into an emulation library buffer in the reception-side computer or into an application-area buffer; and
(e) in response to the transmission command and the reception command issued by the reception-side emulation library, transferring, by the transmission-side computer and the reception-side computer, the transmission data from the transmission-side computer to the emulation library buffer or to the application-area buffer.
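
Steps (c) and (d) above amount to a length comparison followed by a choice of receive destination. The following Python fragment is only an illustrative sketch, not the claimed implementation: the names `RecvTarget` and `choose_receive_target` are hypothetical, and the mapping "fits, so receive directly into the application area; too long, so receive into the emulation library buffer" is an assumption inferred from the preamble, which states that a message longer than the reception buffer is not received under message-passing communication. In both cases the reception command is sized to the transmission data length.

```python
from dataclasses import dataclass


@dataclass
class RecvTarget:
    """Hypothetical description of the reception command issued in step (d)."""
    kind: str   # "application" or "library" (assumed labels)
    size: int   # size given to the message-passing reception command


def choose_receive_target(send_len: int, recv_len: int) -> RecvTarget:
    """Sketch of steps (c)-(d): compare lengths, then pick the destination.

    Assumption: data that fits in the application's buffer is received
    there directly; longer data goes to the emulation library buffer.
    """
    if send_len <= recv_len:
        return RecvTarget("application", send_len)
    return RecvTarget("library", send_len)


if __name__ == "__main__":
    print(choose_receive_target(send_len=100, recv_len=256))
    print(choose_receive_target(send_len=512, recv_len=256))
```

Sizing the reception command to `send_len` in either branch keeps the message-passing send and receive matched one to one, which is what allows a stream-style application to run over such a library.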

5. The inter-computer data transmission / reception method according to claim 4, further comprising a step of (f) when the transmission data has been transferred to the emulation library buffer, copying, by the reception-side emulation library, only the reception data length of data from the emulation library buffer to the application-area buffer specified by the reception command.

6. The inter-computer data transmission / reception method according to claim 5, further comprising the steps of:
(g) when the transmission data has been transferred to the application-area buffer in step (e), waiting for step (a) to be executed in the transmission-side computer for subsequent transmission data; and
(h) when step (a) has been executed for the subsequent transmission data, executing step (b) and the following steps with respect to the subsequent transmission data and the remaining portion of the reception data length requested by the reception command.

The inter-computer data transmission / reception method according to claim 6, further comprising the steps of:
(i) before executing step (a), determining whether data that has been received but not yet transferred to any application area remains in the emulation library buffer; and
(j) when such data remains, copying it from the emulation library buffer to the application-area buffer, up to an amount that does not exceed the reception data length.
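
Taken together, steps (b) through (j) describe how a stream-style receive can be served from discrete messages. The Python simulation below is a rough sketch under stated assumptions, not the patented implementation: the class `ReceiverEmulation`, its `incoming` queue (a stand-in for the network and for the sender's step (a)), and the `lib_buf` attribute are all hypothetical names. Where step (g) would wait for the next transmission, this sketch simply returns the bytes gathered so far.

```python
from collections import deque


class ReceiverEmulation:
    """Toy model of a reception-side emulation library (hypothetical)."""

    def __init__(self, incoming):
        self.incoming = deque(incoming)  # messages already sent (stand-in for step (a))
        self.lib_buf = bytearray()       # emulation library buffer

    def recv(self, recv_len: int) -> bytes:
        app_buf = bytearray()            # application-area buffer
        # Steps (i)-(j): first hand over data left in the library buffer.
        if self.lib_buf:
            take = min(len(self.lib_buf), recv_len)
            app_buf += self.lib_buf[:take]
            del self.lib_buf[:take]
        # Steps (b)-(h): consume messages until the request is satisfied.
        while len(app_buf) < recv_len and self.incoming:
            msg = self.incoming.popleft()        # (b): length known before transfer
            remaining = recv_len - len(app_buf)
            if len(msg) <= remaining:            # (c)-(d): fits in application area
                app_buf += msg                   # (e): direct transfer
            else:
                self.lib_buf += msg              # (e): transfer to library buffer
                app_buf += self.lib_buf[:remaining]  # (f): copy only what was asked
                del self.lib_buf[:remaining]
        return bytes(app_buf)


if __name__ == "__main__":
    r = ReceiverEmulation([b"abcdef", b"gh"])
    print(r.recv(4))
    print(r.recv(4))
```

In this model the first call receives a six-byte message into the library buffer and hands four bytes to the application; the second call drains the two leftover bytes and then takes the next message directly into the application area.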