JP3641837B2 - Data transfer method for distributed memory parallel computer

Info

Publication number: JP3641837B2
Application number: JP14939994A
Authority: JP
Inventors: 啓明藤井; 泰弘稲上; 俊明垂井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-06-30
Filing date: 1994-06-30
Publication date: 2005-04-27
Anticipated expiration: 2020-04-27
Also published as: JPH0816539A

Description

【０００１】
【産業上の利用分野】
本発明は、分散メモリ型並列計算機におけるデータ転送方法、それを実現する分散メモリ型並列計算機および要素プロセッサに関し、特に、並列計算機を構成する任意の要素プロセッサが、自らを含めて任意の２つの（送信側と受信側）要素プロセッサが具備する主記憶装置間でデータの送受信を可能とするデータ転送方法、それを実現する分散メモリ型並列計算機および要素プロセッサに関する。
【０００２】
【従来の技術】
近年の高度な情報化社会において、情報処理装置に対する処理量の増大、処理速度の高速化などが強く要求され、その要求に答えるために複数の演算プロセッサを連携して構成した並列計算機が開発された。ある並列計算機は、数台の演算プロセッサを有し、その数台の演算プロセッサで１つのメモリを共有して用いる形で構成された。この種の並列計算機はＴＣＭＰ（ＴｉｇｈｔｌｙＣｏｕｐｌｅｄＭｕｌｔｉ−Ｐｒｏｃｅｓｓｏｒ）型の並列計算機と呼ばれている。一方でＴＣＭＰ型よりもより多くの演算プロセッサ、具体的には数百台から数千台の演算プロセッサを有する並列計算機も登場した。この並列計算機は、ハードウェア上の実現の難易度の観点から、全演算プロセッサで１つのメモリを共有するような方式をとらずに、それぞれの演算プロセッサが独立してメモリを有する方式をとったため、分散メモリ型の並列計算機と呼ばれている。
分散メモリ型の並列計算機はＴＣＭＰ型の並列計算機に比べて高性能を達成できる。しかし、分散メモリ型の並列計算機は、メモリが複数の演算プロセッサに分散されて設けられるため、単一演算プロセッサと単一メモリを想定した従来プログラミングスタイルに基づくプログラムの移植性やプログラミングの容易性などに問題点があるとの指摘も存在した。そこで、最近では、米国Ｓｔａｎｆｏｒｄ大学の研究に代表されるような分散メモリ型の並列計算機に対して、各演算プロセッサが互いに他の演算プロセッサが有するメモリを参照できるようにする分散共有メモリ方式を導入する傾向が高くなっている。
【０００３】
分散共有メモリを実現するためには、他の演算プロセッサが有するメモリをいかにして参照させるかという課題が存在する。この課題はアドレッシングによって解決する。具体的には、自らのアドレス空間に他の演算プロセッサが有するメモリをマッピングする。これによって実現されるアドレス空間を以降グローバルアドレス空間と呼ぶ。図６は、グローバルアドレス空間の例である。グローバルアドレス空間６０１は、並列計算機を構成する要素プロセッサの台数分に分割される。分割された領域６０３、６０５、・・・６０７はそれぞれ異なる要素プロセッサ用に割り当てられる。そして、それぞれの領域６０３、６０５、・・・６０７の中の領域６０２、６０４、・・・６０６に対して、該当する要素プロセッサが具備する主記憶装置がマップされる。
例えば、ＩＢＭが実験的に試作した並列計算機であるＲＰ３では、１９８５年のＩｎｔｅｒｎａｔｉｏｎａｌＣｏｎｆｅｒｅｎｃｅｏｎＰａｒａｌｌｅｌＰｒｏｃｅｓｓｉｎｇの予稿集７８２ページから７８９ページの予稿である”ＲＰ３Ｐｒｏｃｅｓｓｏｒ−ＭｅｍｏｒｙＥｌｅｍｅｎｔ”および特公平５−２０７７６号に開示されているとおり、図１３に示す形態のアドレスを用いて他の演算プロセッサが有するメモリを参照する。図１３のアドレスでは、参照すべきメモリを有する演算プロセッサをプロセッサ番号フィールド１３０１で指定し、そのメモリ内のアドレスをオフセットフィールド１３０２で指定している。
【０００４】
【発明が解決しようとする課題】
上述した従来の分散共有メモリ方式では、ある演算プロセッサが他の演算プロセッサが有するメモリを参照する場合に、自プロセッサが有するメモリを参照する時と同様なロード／ストア命令を用いていた。すなわち、分散共有メモリ方式を並列計算機を構成する要素プロセッサ間のデータ転送インタフェースとして捉えるならば、従来の分散共有メモリ実現方式では、ワード単位の小粒度のデータ転送しか実現し得なかった。
例えば、データベース処理にこの並列計算機を適用することを考えると、大規模なデータベースの（メモリ間）コピーが発生した場合に、多量のワード単位データ転送を行わなければならないため、オーバヘッドが大きくなり性能的に問題が大きい。
また、このインタフェースでは、データ転送を起動するプロセッサ自身が必ずデータ転送元あるいはデータ転送先のどちらかになる必要がある。すなわち、このインタフェースは、２方向のみのインタフェースである。
これに対し、分散メモリ型の並列計算機が基本的にサポートしているメッセージ・パッシング・インタフェースは、数ワードから数百、数千のワードを一度に転送できるインタフェースである。しかし、従来のメッセージ・パッシング・インタフェースでは、明示的にデータの送信先のプロセッサ番号を指定する必要があった。また、送信するデータは、自プロセッサの有するメモリ内に存在しなければならなかった。すなわち、従来のメッセージ・パッシング・インタフェースは、自プロセッサから他プロセッサへの一方向のインタフェースであった。
本発明の目的は、データ群の帰属先プロセッサを意識せず、かつ、可変量のデータ群を対象とし、しかも、任意要素プロセッサ間に対して任意要素プロセッサが起動可能なデータ転送方法、それを実現する分散メモリ型並列計算機および要素プロセッサを提供することにある。
【０００５】
【課題を解決するための手段】
上記課題を解決するために、本発明は、従来のメッセージ・パッシング・インタフェースに基づくデータ転送方式に、分散共有メモリ方式で実現するグローバルアドレス空間の考え方を導入したものである。
具体的には、図５に示すように、送信するデータ群の送信元プロセッサ（この場合、自プロセッサ）での先頭アドレス（ｓｒｃ−ａｄｒ）、送信先のプロセッサ番号（ｄｓｔ−ＰＵ＃）、送信先でのデータ群の書き込みメモリ領域の先頭アドレス（ｄｓｔ−ａｄｒ）、データ転送量（ｌｅｎｇｔｈ）、および、送受信対象データのメモリ領域における存在間隔（ｓｔｒｉｄｅ）の主に５つのパラメータで表現される従来のメッセージ・パッシング・インタフェースを変更し、図３に示すように、送信するデータ群の先頭グローバルアドレス（ｓｒｃ−ａｄｒ）、転送データ群の書き込み先の先頭グローバルアドレス（ｄｓｔ−ａｄｒ）、データ転送量（ｌｅｎｇｔｈ）、および、送受信対象データのメモリ領域における存在間隔（ｓｔｒｉｄｅ）という４つのパラメータで表現されるインタフェースを定義する。本インタフェースは、分散メモリ型の並列計算機において、該並列計算機を構成する要素プロセッサの各々に所属する主記憶装置を全てグローバルアドレス空間によって参照して、任意のグローバルアドレス領域から他の任意のグローバルアドレス領域へのメモリ領域間データコピーを実現するインタフェースである。本発明は、このようなインタフェースを用いてコピー態様でデータ転送を実現することを特徴としている。
【０００６】
【作用】
本発明は、上記手段によって、分散メモリ型の並列計算機において、データ転送をメモリ領域間データコピーの概念で実現できる。したがって、グローバルアドレス空間参照に際しても、ワード単位から数百、数千ワード以上のデータ群を一度に対象とできる。
また、データ転送という観点からは、データ転送起動者がデータまたはデータ群の帰属先プロセッサを意識する必要がなくなる。この特徴は、上記手段を適用する並列計算機向けのプログラムの記述容易性を高める効果がある。
さらに、上記手段によって実現されるデータ転送方法では、任意要素プロセッサ間（任意主記憶装置間）のデータ転送が可能であり、また、データ転送起動者を、データ転送元あるいはデータ転送先のいずれとも規定しない。すなわち、要素プロセッサＢから要素プロセッサＣへのデータ転送を要素プロセッサＢでも要素プロセッサＣでもない要素プロセッサＡが指示できる。これは、一方向のみのインタフェースであった従来のメッセージ・パッシング・インタフェースや、せいぜい２方向であった従来の分散共有メモリ方式に基づくデータ転送インタフェースを凌駕する多方向のインタフェースであり、この特徴がプログラムの記述容易性を高める効果も大きい。特にこの特徴はサーバ・クライアント・モデルのプログラム記述にとって効果が大きいと考えられる。
【０００７】
【実施例】
本発明の実施例を図を用いて詳細に説明する。
図２は、分散メモリ型の並列計算機を構成する要素プロセッサの一実施例である。同図において、要素プロセッサ２０１は、プログラム処理を行う命令プロセッサ２０２、命令プロセッサ２０２に接続され、命令プロセッサ２０２から出されるコマンド／アドレス／データの組に従って、後述する主記憶装置２０７、Ｉ／Ｏデバイス２０５およびネットワークインタフェース２０８内部などへのアクセスを発行するメモリアクセスインタフェース２０３、Ｉ／Ｏインタフェース２０４、メモリ制御ユニット２０６、他の要素プロセッサ（２０１と同様な構成を有する）と要素プロセッサ間結合網（ネットワーク）を介してパケットおよびデータの受渡しを行うネットワークインタフェース２０８、Ｉ／Ｏインタフェース２０４に接続されるＩ／Ｏデバイス２０５、メモリ制御ユニット２０６に接続される主記憶装置２０７、および、メモリアクセスインタフェース２０３、Ｉ／Ｏインタフェース２０４、メモリ制御ユニット２０６およびネットワークインタフェース２０８を接続するバス２０９などから構成される。
本発明は、データ転送を実現するデータ転送機構の根幹であるネットワークインタフェース２０８に関するものである。
【０００８】
次に、本発明で定義するデータ転送インタフェースについて説明する。
図３は、本発明で定義するデータ転送インタフェースをＣ言語などのプログラミング言語を使って関数の形で表現したものである。該インタフェースを適用する並列計算機向きのプログラム中でデータ転送を表現する場合には、実際に図３に準ずる形で記述される。
図３の第１パラメータ“ｓｒｃ−ａｄｒ”は、転送する一連のデータ群の先頭グローバルアドレスである。また、第２パラメータ“ｄｓｔ−ａｄｒ”は、転送データ群の書き込み先の先頭グローバルアドレスである。第３パラメータ“ｌｅｎｇｔｈ”は、転送データ量であり、第４パラメータ“ｓｔｒｉｄｅ”は、転送対象データのメモリ領域における存在間隔である。図１４に示すとおり、“ｓｔｒｉｄｅ”は、例えば、転送順で連続する転送対象データがアドレス順で隣り合うときに１、アドレス順で１つおきのとき２（以下同様）となる。
【０００９】
本発明で定義するデータ転送インタフェースでは、図３に示した４つのパラメータのうち、最低限第１、第２、第３の３つのパラメータを指定する必要がある（ｓｔｒｉｄｅは１に固定することで省略可能である。逆に、本データ転送方法を拡張すれば他にもパラメータを設定可能である）。
図３は、“ｓｒｃ−ａｄｒ”というグローバルアドレスから始まる“ｌｅｎｇｔｈ”×“ｓｔｒｉｄｅ”個分のデータ領域から、データを“ｌｅｎｇｔｈ”個だけ“ｓｔｒｉｄｅ”間隔で読出した後、読み出した全データを、“ｄｓｔ−ａｄｒ”というグローバルアドレスから始まる“ｌｅｎｇｔｈ”×“ｓｔｒｉｄｅ”個分のデータ領域へ“ｓｔｒｉｄｅ”間隔に“ｌｅｎｇｔｈ”個だけ書き込むという操作を表現している。すなわち、このインタフェースを用いて実現するのはデータ転送というよりは、むしろ、任意のグローバルアドレス領域から他の任意のグローバルアドレス領域へのメモリ領域間データコピーとみなすことができる。
なお、上記でグローバルアドレスと表現しているのは、図６に例示するようなグローバルアドレス空間上のアドレスであり、例えば、図１３のような形式をとる。図１３のアドレスは、参照すべきメモリを有する要素プロセッサをプロセッサ番号フィールド１３０１で指定し、そのメモリ内のアドレスをオフセットフィールド１３０２で指定している。また、図６のグローバルアドレス空間６０１は、要素プロセッサの台数分に分割されており、分割された領域６０３、６０５、・・・６０７はそれぞれ異なる要素プロセッサ用に割り当てられている。そして、それぞれの領域６０３、６０５、・・・６０７の中の領域６０２、６０４、・・・６０６に対して、該当する要素プロセッサが具備する主記憶がマップされている。
【００１０】
次に、図１を用いて本発明に基づくネットワークインタフェース２０８の構成および各部動作を詳細に説明する。なお、前もって、図１の信号線Ｌ１５、Ｌ１６、Ｌ１７について誤解のないように説明しておく。それぞれの信号線は紙面の都合上いくつかのレジスタを表現する四角の下側あるいは裏側を走っているイメージで書き入れている。信号線Ｌ１５は、送信元アドレスレジスタ１１７の下を通ってメッセージ送出部１０３の入力信号となっている。信号線Ｌ１６は、送信先アドレスレジスタ１１８および送信元アドレスレジスタ１１７の下を通ってメッセージ送出部１０３の入力信号となっている。信号線Ｌ１７は、送信データ長レジスタ１１９、送信先アドレスレジスタ１１８および送信元アドレスレジスタ１１７の下を通ってメッセージ送出部１０３の入力信号となっている。
図１におけるネットワークインタフェース２０８とバス２０９、ネットワークの接続関係は先に図２の説明で述べたとおりである。
【００１１】
ネットワークインタフェース２０８は、大きく分けてメッセージ送信部、メッセージ受信部、主記憶アクセス部およびバスインタフェース部１０１の４つの部分から構成される。
メッセージ送信部は、メッセージ送出部１０３、送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９、送信ストライド幅レジスタ１２０、書き込み制御部１２１、セレクタ１２３、セレクタ１２４、自要素プロセッサからの要求とネットワークを介したデータ転送要求（ヘッダ解析部から）を調停する要求調停部１２２、アドレス加算部１１６、比較器１０４、自ＰＵ番号レジスタ１０５（ＰＵは要素プロセッサの略称）などからなる。
メッセージ受信部は、メッセージ受取部１０６、ヘッダ解析部１０７およびアドレス加算部１１２などから構成される。
主記憶アクセス部は、主記憶読出し部１２５および主記憶書き込み部１２８から構成される。
バスインタフェース部１０１は、バス２０９に接続され、命令プロセッサ２０２からメモリアクセスインタフェース２０３を通し、さらにバス２０９を介して伝えられる以下の３種の要求を受取り、必要な処理をする。
（１）メッセージ送出部１０３へのメッセージ送信開始指令信号Ｌ１の伝達。
（２）要求調停部１２２へのメッセージ送信要求信号Ｌ１３の伝達。
（３）送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９、送信ストライド幅レジスタ１２０への値の書き込み。
さらに、バスインタフェース部１０１は、要求調停部１２２の調停結果を逆にバス２０９を介して、メモリアクセスインタフェース２０３を通し、命令プロセッサ２０２に伝える。また、バスインタフェース部１０１は、ネットワークインタフェース２０８内の主記憶読出し部１２５および主記憶書き込み部１２８からの主記憶アクセスを実現する。
【００１２】
メッセージ送信部内の送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９、送信ストライド幅レジスタ１２０は、それぞれ図３に示した４つのパラメータである“ｓｒｃ−ａｄｒ”すなわち転送する一連のデータ群の先頭グローバルアドレス、“ｄｓｔ−ａｄｒ”すなわち転送データ群の書き込み先の先頭グローバルアドレス、“ｌｅｎｇｔｈ”すなわち転送データ量、“ｓｔｒｉｄｅ”すなわち転送対象データのメモリ領域における存在間隔を格納するためのレジスタである。
送信元アドレスレジスタ１１７および送信先アドレスレジスタ１１８には、図１３に例示するような形式をとるグローバルアドレスが格納されるため、図７に示す形のレジスタ７０１を用いる。レジスタ７０１は、図１３におけるプロセッサ番号フィールド１３０１およびオフセットフィールド１３０２を格納するために、それぞれＰＵ番号フィールド７０２およびＰＵ内アドレスフィールド７０３を有する。
【００１３】
メッセージ送信部からは、送信元アドレスレジスタ１１７の内容に応じて要求メッセージ送信とデータメッセージ送信の２種類のメッセージ送信が発生しうる。送信元アドレスレジスタ１１７のＰＵ番号フィールドの内容を伝える信号線Ｌ２５の値と自ＰＵ番号レジスタ１０５の値を比較器１０４で比較した結果、値が等しければデータメッセージ送信が発生する。逆に、比較器１０４での比較の結果、値が異なれば、送信元アドレスレジスタ１１７のＰＵ番号フィールドの内容が示す要素プロセッサに対してデータ転送を要求する要求メッセージ送信が発生する。
要求メッセージとデータメッセージのそれぞれに対しては、図９、図１０に示す別個のメッセージヘッダ９０１および１００１が定義されている。
【００１４】
要求メッセージに対する図９のメッセージヘッダ９０１には、メッセージ種類９０２、送信元ＰＵ番号９０３、送信元アドレス９０４、送信先アドレス９０５、送信データ長９０６、送信ストライド幅９０７などの情報が含まれている。メッセージ種類９０２は、要求メッセージ／データメッセージの別を示す情報（１ビットで可）であり、この場合要求メッセージを示す。送信元ＰＵ番号９０３は、信号線Ｌ１４を介してメッセージ送出部１０３に伝えられる送信元アドレスレジスタ１１７のＰＵ番号フィールドの内容であり、すなわち、転送すべきデータが格納されている主記憶装置を有する要素プロセッサの番号である。送信元ＰＵ番号９０３は、この要求メッセージ自身の送信先要素プロセッサの番号でもある。送信元アドレス９０４、送信先アドレス９０５、送信データ長９０６、送信ストライド幅９０７は、それぞれ信号線Ｌ１４、Ｌ１５、Ｌ１６、Ｌ１７を介してメッセージ送出部１０３に伝えられる送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９、送信ストライド幅レジスタ１２０の内容である。
【００１５】
データメッセージに対する図１０のメッセージヘッダ１００１には、メッセージ種類９０２、送信先ＰＵ番号１００３、送信先アドレス９０５、送信データ長９０６、送信ストライド幅９０７などの情報が含まれている。メッセージ種類９０２は、前述のとおり要求メッセージ／データメッセージの別を示す情報であり、この場合データメッセージを示す。送信先ＰＵ番号１００３は、信号線Ｌ１５を介してメッセージ送出部１０３に伝えられる送信先アドレスレジスタ１１８のＰＵ番号フィールドの内容であり、すなわち、転送するデータを書き込むべき主記憶装置を有する要素プロセッサの番号である。送信先ＰＵ番号１００３は、このデータメッセージ自身の送信先要素プロセッサの番号でもある。送信先アドレス９０５、送信データ長９０６、送信ストライド幅９０７は、それぞれ信号線Ｌ１５、Ｌ１６、Ｌ１７を介してメッセージ送出部１０３に伝えられる送信先アドレスレジスタ１１８、送信データ長レジスタ１１９、送信ストライド幅レジスタ１２０の内容である。
【００１６】
メッセージ送出部１０３は、信号線Ｌ８を介して伝えられる比較器１０４での比較結果にしたがって、信号線Ｌ１４、Ｌ１５、Ｌ１６、Ｌ１７を介して伝えられる情報をもとに、上述のような要求メッセージ用およびデータメッセージ用のメッセージヘッダの作り分けを行い、そのメッセージヘッダを信号線Ｌ４を介してネットワークへ送出することで、要求メッセージおよびデータメッセージの種別分けを行う。さらに、その送信がデータメッセージ送信であった場合には、メッセージヘッダの送信に続いて転送データの送出を行う。データメッセージパケットを図１５に示す。
転送データの送出は、メッセージ送出部１０３が信号線Ｌ６を介して主記憶読出し部１２５に主記憶読出し要求を伝えて実現する。主記憶読出し部１２５は、バスインタフェース部１０１を介して主記憶読出しを行い、読み出したデータを信号線Ｌ３６を介して順次メッセージ送出部１０３に転送する。なお、メッセージ送出部１０３に転送する場合には、信号線Ｌ７を用いて有効信号も転送する。なお、有効信号とは、そのマシンサイクルにおいて、信号線Ｌ上に有効な読みだしデータがのっていることを示す信号である。メッセージ送出部１０３では、読出しデータを信号線Ｌ４を介して逐次ネットワークに送出する。送出したデータ数はメッセージ送出部１０３でカウントされ、そのカウント値が信号線Ｌ１６を介して伝えられる送信データ長と等しくなれば転送データの送出を完了し、これをもってメッセージ送出を完了する。
一方、送出したメッセージヘッダが要求メッセージ用であった場合には、メッセージヘッダの送出を完了し次第、メッセージ送出を完了する。
【００１７】
なお、メッセージ送出部１０３が上述のような動作を開始するためには、信号線Ｌ３を介してメッセージ送出開始信号が伝えられる必要がある。信号線Ｌ３は、信号線Ｌ１と信号線Ｌ２のＯＲ信号である。信号線Ｌ１は、前述したとおり命令プロセッサ２０２がメッセージ送出開始を要求した結果真値が伝えられる信号線であり、信号線Ｌ２は、メッセージ受信部内のヘッダ解析部１０７がメッセージ送出開始を要求して真値を伝える信号線である。また、メッセージ送出部１０３は、メッセージ送出を完了すると、その状態を信号線Ｌ４１を介して要求調停部１２２に伝える。命令プロセッサ２０２およびヘッダ解析部１０７がメッセージ送出開始を要求するためには、それぞれがまず要求調停部１２２に対して、メッセージ送出要求を伝える必要がある。命令プロセッサ２０２の要求は前述したとおり信号線Ｌ１３で伝えられ、ヘッダ解析部１０７の要求は信号線Ｌ１１で伝えられる。要求調停部１２２は、これらの要求を受けて何等かの形で優先度制御を行った後、メッセージ送出が完了している状態のときに、要求を認める側を示す信号を信号線Ｌ１２にのせる。信号線Ｌ１２の内容を見た命令プロセッサ２０２およびヘッダ解析部１０７は、その内容が自身を示していれば、前述のメッセージ送出開始を要求する。
【００１８】
メッセージ送信部が要求メッセージ送信とデータメッセージ送信の２種類のメッセージ送信を行うため、メッセージ受信部のメッセージ受取部１０６には、２種類のメッセージが到着しうる。概略的に述べると、メッセージ受信部は、要求メッセージが到着した場合には、同じネットワークインタフェース２０８内のメッセージ送信部に依頼して、要求されたデータ転送を開始する。また、データメッセージが到着した場合には、主記憶書き込み部１２８に依頼して主記憶装置への転送データの書き込みを行う。
メッセージが到着すると、メッセージ受取部１０６は、そのメッセージが伝える最初の情報であるメッセージヘッダ内のメッセージ種類９０２によってメッセージの種類を判別する。メッセージ種類が要求メッセージであった場合には、メッセージヘッダ９０１内のメッセージ種類９０２、送信元アドレス９０４、送信先アドレス９０５、送信データ長９０６、送信ストライド幅９０７の各情報を信号線Ｌ９を介してヘッダ解析部１０７内のヘッダレジスタ１０８に格納してメッセージ受信を完了する。
一方、メッセージ種類がデータメッセージであった場合には、メッセージヘッダ１００１内のメッセージ種類９０２、送信先アドレス９０５、送信データ長９０６、送信ストライド幅９０７の各情報を信号線Ｌ９を介してヘッダ解析部１０７内のヘッダレジスタ１０８に格納し、さらに後続する転送データを信号線Ｌ１０を介して主記憶書き込み部１２８に伝える。
【００１９】
メッセージ種類がデータメッセージであった場合、ヘッダ解析部１０７は、ヘッダレジスタ１０８内の送信先アドレスおよび送信ストライド幅をそれぞれ信号線Ｌ３１およびＬ３２を介してアドレス加算部１１２に伝え、送信データ長を、信号線Ｌ３５を介して主記憶書き込み部１２８内のカウンタ１２９に初期値として伝える。さらに、信号線Ｌ３３を介して主記憶書き込み部１２８に対して主記憶書き込みを要求する。要求を受けた主記憶書き込み部１２８は、アドレス加算部１１２が信号線Ｌ３４を介して与えるアドレスと、メッセージ受取部１０６から信号線Ｌ１０を介して伝えられ、主記憶書き込み部１２８内のレジスタ１３０にセットされるデータを持って主記憶アクセスを行い、これをカウンタ１２９に初期値として与えられた回数だけ繰り返す。メッセージ受取部１０６が送信データ長分の転送データを全て受取り、主記憶書き込み部１２８内のレジスタ１３０に最後のデータを書き込んだ時点で、メッセージ受取部１０６はメッセージ受信を完了し、メッセージ受取部１０６およびヘッダ解析部１０７は、受信したデータメッセージに対する処理を完了する。
【００２０】
一方、メッセージ種類が要求メッセージであった場合、ヘッダ解析部１０７は、前述のとおり、要求調停部１２２に対して信号線Ｌ１１を用いてメッセージ送出要求を伝え、然るべき後に要求調停部１２２から信号線Ｌ１２を介して要求を認める信号を受け取る。ヘッダ解析部１０７は、メッセージ送出要求が認められると、送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０を順次選択し、各々のレジスタを指定するレジスタ選択信号を信号線Ｌ２２を介してセレクタ１２３に順次伝え、その都度各々のレジスタに書き込むべき値を、ヘッダレジスタ１０８の該当する領域から選択し、その値を順次信号線Ｌ２４でセレクタ１２４に伝える。送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０の全てのレジスタへの値の設定が終了すると、ヘッダ解析部１０７は、メッセージ送出部１０３に対して信号線Ｌ２を介してメッセージ送出開始信号を伝達し、メッセージ送出を開始させる。ヘッダ解析部１０７は、メッセージ送出開始信号をメッセージ送出部１０３に伝達した時点で、受信した要求メッセージに対する処理を完了する。
【００２１】
ネットワークインタフェース２０８内のメッセージ送信部に対する処理依頼は、受信した要求メッセージに対するヘッダ解析部１０７の処理手順で説明したとおりの以下の手順で行われる。
（１）要求調停部１２２へのメッセージ送出要求伝達。
（２）要求調停部１２２からのメッセージ送出承認。
（３）送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０へのメッセージ送信に係わるパラメータ値の設定。
（４）メッセージ送出部１０３へのメッセージ送出開始信号の伝達。
メッセージ送信部に対して処理を依頼する主体は、命令プロセッサ２０２およびヘッダ解析部１０７である。ヘッダ解析部１０７の処理依頼に係わる全動作については既に述べたとおりであり、命令プロセッサ２０２の処理依頼に係わる動作についても、（１）、（２）、（４）については既述した。命令プロセッサ２０２の（３）に係わる動作は、基本的にヘッダ解析部１０７の動作と同様であり、レジスタを指定するレジスタ選択信号を、（既に説明を加えたアクセスパスを介して最終的に）信号線Ｌ２１を介してセレクタ１２３に順次伝え、その都度各々のレジスタに書き込むべき値を、順次信号線Ｌ２３でセレクタ１２４に伝える。
なお、命令プロセッサ２０２からの処理依頼に対しては、結果として要求メッセージ送出とデータメッセージ送出の２種類が発行されうるが、ヘッダ解析部１０７からの処理依頼に対しては、結果としてデータメッセージ送出しか発行されえない。
【００２２】
送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０への値の書き込みは、書き込み制御部１２１とセレクタ１２３およびセレクタ１２４を用いて実現する。
セレクタ１２３とセレクタ１２４は組となって機能し、セレクタ１２３が送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０のいずれかを指定するレジスタ指定信号を信号線Ｌ１９を介して書き込み制御部１２１に伝達し、セレクタ１２４が信号線Ｌ１９で指定されるレジスタに書き込むべき値を信号線Ｌ２０を介して書き込み制御部１２１に伝達する。
信号線Ｌ１９の値は、信号線Ｌ２１および信号線Ｌ２２のうちのいずれかであり、信号線Ｌ２０の値は、信号線Ｌ２３および信号線Ｌ２４のうちのいずれかである。どちらを選択するかは、信号線Ｌ１２の値によって、すなわち、要求調停部１２２が命令プロセッサ２０２あるいはヘッダ解析部１０７のどちらに対してメッセージ送出承認を行っているかで定まる。要求調停部１２２が命令プロセッサ２０２に対してメッセージ送出承認を行っている場合には、信号線Ｌ２１の値と信号線Ｌ２３の値がそれぞれ信号線Ｌ１９の値と信号線Ｌ２０の値になる。要求調停部１２２がヘッダ解析部１０７に対してメッセージ送出承認を行っている場合には、信号線Ｌ２２の値と信号線Ｌ２４の値がそれぞれ信号線Ｌ１９の値と信号線Ｌ２０の値になる。
書き込み制御部１２１は、信号線Ｌ１９を介して伝わるレジスタ指定信号に基づいて送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０のいずれかを選択し、その選択に対応して各々のレジスタに対応して設けられている書き込みパスＬ１８ａ、Ｌ１８ｂ、Ｌ１８ｃおよびＬ１８ｄのいずれかを有効にし、その有効になった書き込みパス上に信号線Ｌ２０を介して伝わる書き込み値をのせる。結果として、信号線Ｌ１９の値が指定するレジスタに信号線Ｌ２０の値が書き込まれる。
【００２３】
次に、先に概略的に述べた主記憶アクセス部の動作をさらに詳細に説明する。主記憶読出し部１２５は、データメッセージ送出時にメッセージ送出部１０３から信号線Ｌ６を介して伝わる起動信号（主記憶読出し要求）によって起動される。主記憶読出し部１２５は内部にカウンタ１２６およびデータ用レジスタ１２７を有する。カウンタ１２６には、主記憶読出し部１２５の起動時に信号線Ｌ２６を介して伝えられるその時点での送信データ長レジスタ１１９に格納されている値が初期値として設定される。以降、主記憶読出し部１２５が主記憶読出し要求を発行する度にカウンタ１２６の値は１づつ減じられる。主記憶読出し部１２５は、起動されてから以降、カウンタ１２６の値が０となるまで主記憶読出し要求の発行を繰り返す。カウンタ１２６の値が０となると、そのデータメッセージ送出に係わる主記憶読出し要求の発行は完了する。主記憶読出し要求の発行時、主記憶読出し部１２５は、主記憶アクセスコマンド線であるＬ３７に読出しコマンドを伝え、同時に主記憶読出しアドレス線Ｌ２８で読出しアドレスを伝える。なお、主記憶読出しアドレス線Ｌ２８はアドレス加算部１１６から伝えられる信号である。
【００２４】
アドレス加算部１１６は、内部に加算器１１５、セレクタ１１４、アドレス用レジスタ１１３を有する。加算器１１５は、アドレス用レジスタ１１３の値に、（（信号線Ｌ２７を介して伝えられる送信ストライド幅レジスタの値）×（送信単位データのバイトサイズ））の値を加えてその結果を信号線Ｌ２９に出力する。セレクタ１１４は、信号線Ｌ２９の値と信号線Ｌ１４を介して伝わる送信元アドレスレジスタ１１７の値のうちどちらかを選択し、その値をアドレス用レジスタ１１３にセットする。ただし、セレクタ１１４が信号線Ｌ１４の値を選択するのは、信号線Ｌ６によって主記憶読出し起動信号が伝わる時だけである。それ以外の場合は、信号線Ｌ２９の値を選択する。これによって、アドレス用レジスタ１１３の値を信号線Ｌ２８を介して主記憶読出しアドレスとして供給するアドレス加算部１１６は、主記憶読出し起動時にその回のデータメッセージ送出に係わる転送元データ領域の先頭アドレスを供給し、以降、その値にストライドを反映させた値を供給することができる。
主記憶読出しデータ線Ｌ３８を介して伝わる主記憶装置からの読出しデータは、逐次データ用レジスタ１２７で受け、信号線Ｌ３６を介してメッセージ送出部１０３に伝えられる。
【００２５】
一方、主記憶書き込み部１２８は、データメッセージ受信時にヘッダ解析部１０７から信号線Ｌ３３を介して伝わる起動信号（主記憶書き込み要求）によって起動される。主記憶書き込み部１２８は内部にカウンタ１２９およびデータ用レジスタ１３０を有する。カウンタ１２９には、主記憶書き込み部１２８の起動時に信号線Ｌ３５を介してヘッダ解析部１０７内のヘッダレジスタ１０８の該当領域から伝えられる送信データ長値が初期値として設定される。以降、主記憶書き込み部１２８が主記憶書き込み要求を発行する度にカウンタ１２９の値は１づつ減じられる。主記憶書き込み部１２８は、起動されてから以降、カウンタ１２９の値が０となるまで主記憶書き込み要求の発行を繰り返す。カウンタ１２９の値が０となると、そのデータメッセージ受信に係わる主記憶書き込みは完了する。主記憶書き込み要求の発行時、主記憶書き込み部１２８は、主記憶アクセスコマンド線であるＬ３９に書き込みコマンドを伝え、同時に主記憶読出しアドレス線Ｌ３４で書き込みアドレスを伝え、主記憶書き込みデータ線Ｌ４０を介して、メッセージ受取部１０６から信号線Ｌ１０を介してセットされているデータ用レジスタ１３０の値を伝える。なお、主記憶読出しアドレス線Ｌ３４はアドレス加算部１１２から伝えられる信号である。
【００２６】
アドレス加算部１１２は、アドレス加算部１１６と同様に、内部に加算器１１１、セレクタ１１０、アドレス用レジスタ１０９を有する。加算器は、アドレス用レジスタ１０９の値に、（（信号線Ｌ３２を介してヘッダ解析部１０７内のヘッダレジスタ１０８の該当領域から伝えられる送信ストライド幅値）×（送信単位データのバイトサイズ））の値を加えてその結果を信号線Ｌ３０に出力する。セレクタ１１０は、信号線Ｌ３０の値と信号線Ｌ３１を介してヘッダ解析部１０７内のヘッダレジスタ１０８の該当領域から伝わる送信先（書き込み先）アドレス値のうちどちらかを選択し、その値をアドレス用レジスタ１０９にセットする。ただし、セレクタ１１０が信号線Ｌ３１の値を選択するのは、信号線Ｌ３３によって主記憶書き込み起動信号が伝わる時だけである。それ以外の場合は、信号線Ｌ３０の値を選択する。これによって、アドレス用レジスタ１０９の値を信号線Ｌ３４を介して主記憶書き込みアドレスとして供給するアドレス加算部１１２は、主記憶書き込み起動時にその回のデータメッセージ受信に係わる転送先（書き込み先）データ領域の先頭アドレスを供給し、以降、その値にストライドを反映させた値を供給することができる。
【００２７】
以上で図１に示した本発明に基づくネットワークインタフェース２０８の構成および各部動作の説明を終了する。次に、本発明に係わるデータ転送方法に基づくデータ転送の処理の流れを説明する。
データ転送要求は、命令プロセッサ２０２から発行される。命令プロセッサ２０２は、ネットワークインタフェース２０８内の要求調停部１２２に対してデータ転送要求を発行し、要求調停部１２２からの許可を待つ。この時要求調停部１２２には、同じネットワークインタフェース２０８内のメッセージ受信部側（具体的には、ヘッダ解析部１０７）からもデータメッセージ送出要求が届いている場合があり、その場合には優先度制御の結果メッセージ受信部側に許可がおりる場合もある。
命令プロセッサ２０２は、要求調停部１２２からの許可を得ると、データ転送のためのパラメータである“ｓｒｃ−ａｄｒ”すなわち転送する一連のデータ群の先頭グローバルアドレス、“ｄｓｔ−ａｄｒ”すなわち転送データ群の書き込み先の先頭グローバルアドレス、“ｌｅｎｇｔｈ”すなわち転送データ量、“ｓｔｒｉｄｅ”すなわち転送対象データのメモリ領域における存在間隔をそれぞれ順番に送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０にセットする。このパラメータの設定が終了すると、メッセージ送出部１０３へのデータ転送開始信号を伝達する。これでデータ転送が開始され、データ転送に係わる命令プロセッサ２０２の役割は終了する。
【００２８】
命令プロセッサ２０２から開始されたデータ転送に関して、送信元アドレスレジスタ１１７にセットされているパラメータ“ｓｒｃ−ａｄｒ”すなわち転送する一連のデータ群の先頭グローバルアドレスのＰＵ番号フィールド値が自プロセッサ番号である場合、メッセージ送出部１０３の制御の下、主記憶読出し部１２５が自らの主記憶装置２０７からデータを読み出して、データメッセージの送出が始まる。すなわち、実際にデータの転送が始まる。このデータメッセージは、送信先アドレスレジスタ１１８にセットされているパラメータ“ｄｓｔ−ａｄｒ”すなわち転送データ群の書き込み先先頭グローバルアドレスのＰＵ番号フィールド値が示す要素プロセッサ２０１に対して送出される。
データメッセージの送信先となった要素プロセッサ２０１は、ネットワークインタフェース２０８内のメッセージ受取部１０６でメッセージを受け、データメッセージであることを認識すると、メッセージ受取部１０６およびヘッダ解析部１０７の制御の下、主記憶書き込み部１２８が受信したデータを自らの主記憶装置２０７に書き込んでいく。全データの書き込みが終了した時点でこのデータ転送が完了する。
【００２９】
一方、命令プロセッサ２０２から開始されたデータ転送に関して、送信元アドレスレジスタ１１７にセットされているパラメータ“ｓｒｃ−ａｄｒ”すなわち転送する一連のデータ群の先頭グローバルアドレスのＰＵ番号フィールド値が自プロセッサ番号でない場合、そのＰＵ番号フィールド値が示す要素プロセッサ２０１に対してメッセージ送出部１０３が要求メッセージを送信する。
要求メッセージの送信先となった要素プロセッサ２０１は、ネットワークインタフェース２０８内のメッセージ受取部１０６でメッセージを受け、要求メッセージであることを認識すると、ヘッダ解析部１０７からデータメッセージ送信要求が発行される。ヘッダ解析部１０７は、ネットワークインタフェース２０８内の要求調停部１２２に対してデータメッセージ送信要求を発行し、要求調停部１２２からの許可を待つ。この時要求調停部１２２には、同じ要素プロセッサ２０１内の命令プロセッサ２０２からもデータ転送要求が届いている場合があり、その場合には優先度制御の結果命令プロセッサ２０２に許可がおりる場合もある。
【００３０】
ヘッダ解析部１０７は、要求調停部１２２からの許可を得ると、ヘッダレジスタ１０８に格納されている送信元アドレス９０４、送信先アドレス９０５、送信データ長９０６、送信ストライド幅９０７の各情報をそれぞれ順番に送信元アドレスレジスタ１１７、送信先アドレスレジスタ１１８、送信データ長レジスタ１１９および送信ストライド幅レジスタ１２０にセットする。この設定が終了すると、メッセージ送出部１０３へのデータメッセージ送出開始信号を伝達する。この時、送信元アドレスレジスタ１１７にセットされているグローバルアドレスのＰＵ番号フィールド値は常に自プロセッサ番号である。したがって、これでデータメッセージ送信が開始される。
以降、メッセージ送出部１０３の制御の下、主記憶読出し部１２５が自らの主記憶装置２０７からデータを読み出して、データメッセージを送出する。このデータメッセージは、送信先アドレスレジスタ１１８にセットされているグローバルアドレスのＰＵ番号フィールド値が示す要素プロセッサ２０１に対して送出される。
このデータメッセージの送信先となった要素プロセッサ２０１は、ネットワークインタフェース２０８内のメッセージ受取部１０６でメッセージを受け、データメッセージであることを認識すると、メッセージ受取部１０６およびヘッダ解析部１０７の制御の下、主記憶書き込み部１２８が受信したデータを自らの主記憶装置２０７に書き込んでいく。全データの書き込みが終了した時点でこのデータ転送が完了する。
以上が本発明に係わる実施例である。
なお、本実施例の変形例として次のものが考えられる。
【００３１】
（変形例１）
図３に示したデータ転送インタフェースを図４に示すようなインタフェースに変形する。図４の“ｓｒｃ−ａｄｒ”および“ｄｓｔ−ａｄｒ”は、グローバルアドレスではなく、それぞれ、送信元の要素プロセッサ２０１が所有する主記憶装置のアドレスおよび送信先の要素プロセッサ２０１が所有する主記憶装置のアドレスである。図４に示すインタフェースでは、“ｓｒｃ−ａｄｒ”および“ｄｓｔ−ａｄｒ”をグローバルアドレスとしない代わりに、データ転送の送信元および送信先をそれぞれ明示するための新たなパラメータ“ｓｒｃ−ＰＵ＃”および“ｄｓｔ−ＰＵ＃”を定義する。残りの“ｌｅｎｇｔｈ”、“ｓｔｒｉｄｅ”については図３のそれと同じである。
図４に示すインタフェースとした場合、図３に示すインタフェースのデータ転送に係わる要素プロセッサを意識しないでよいという特徴は失われるが、任意の要素プロセッサ間（主記憶装置間）のデータ転送を任意の要素プロセッサが起動できるという特徴はそのまま保有している。
【００３２】
図４に示すインタフェースを採用した場合の実施例からの機構上の主な変更点は以下の２点である。
（１）図７のような構成であった送信元アドレスレジスタ１１７および送信先アドレスレジスタ１１８を図８のような構成とし、このＰＵ番号レジスタ８０１とＰＵ内アドレスレジスタ８０２を連結して用いる。連結して用いれば、ＰＵ番号レジスタ８０１をＰＵ番号フィールド７０２として、さらに、ＰＵ内アドレスレジスタ８０２をＰＵ内アドレスフィールド７０３としてレジスタ７０１を擬似的に実現できる。
（２）図９、図１０に示したメッセージヘッダをそれぞれ図１１、図１２に示すように変更する。細かく記述すると、図９の送信元アドレス９０４は、図１１の送信元ＰＵ内アドレス１１０４に代わり、図９の送信先アドレス９０５は、図１１の送信先ＰＵ番号１１０５と送信先ＰＵ内アドレス１１０６に代わる。また、図１０の送信先アドレス９０５は、図１２の送信先ＰＵ内アドレス１２０５に代わる。
【００３３】
（変形例２）
バス２０９でメモリアクセスインタフェース２０３とネットワークインタフェース２０８を接続するのをやめ、メモリアクセスインタフェース２０３とネットワークインタフェース２０８を直結とする。この時、バスインタフェース部１０１に代わり新たなインタフェース処理部がネットワークインタフェース２０８内に必要となる。
【００３４】
【発明の効果】
本発明によれば、分散メモリ型並列計算機において、分散共有メモリ方式で実現される“データ転送起動者がデータまたはデータ群の帰属先プロセッサを特別意識する必要がない”というプログラム記述容易性の高さを継承した上で、分散共有メモリ方式上で実現されるデータ転送方式によっては従来実現できなかった数百、数千ワード以上のデータ群の一括転送が可能になった。
さらに、本発明によれば、任意要素プロセッサ間（任意主記憶装置間）のデータ転送が可能となり、また、データ転送起動者を、データ転送元あるいはデータ転送先のいずれとも規定しない。すなわち、要素プロセッサＢから要素プロセッサＣへのデータ転送を要素プロセッサＢでも要素プロセッサＣでもない要素プロセッサＡが指示できるようになった。これは、一方向のみのインタフェースであった従来のメッセージ・パッシング・インタフェースや、せいぜい２方向であった従来の分散共有メモリ方式上で実現されるデータ転送インタフェースを凌駕する多方向のインタフェースであり、この特徴によりプログラム記述容易性が一層向上する。
【図面の簡単な説明】
【図１】実施例におけるデータ転送方法を実現する分散メモリ型並列計算機の根幹であるネットワークインタフェースの構成図である。
【図２】実施例における並列計算機を構成する要素プロセッサの構成例を示す図である。
【図３】実施例におけるデータ転送インタフェースを示す図である。
【図４】変形例１におけるデータ転送インタフェースを示す図である。
【図５】従来のメッセージ・パッシング・インタフェースを示す図である。
【図６】実施例におけるグローバルアドレス空間を例示する図である。
【図７】実施例におけるグローバルアドレスを格納するためのレジスタを示す図である。
【図８】変形例１における並列計算機内の任意の主記憶アドレスを表現するための値の組を格納するレジスタ群を示す図である。
【図９】実施例における要求メッセージヘッダを示す図である。
【図１０】実施例におけるデータメッセージヘッダを示す図である。
【図１１】変形例１における要求メッセージヘッダを示す図である。
【図１２】変形例１におけるデータメッセージヘッダを示す図である。
【図１３】実施例におけるグローバルアドレスのフォーマットを例示する図である。
【図１４】実施例におけるデータ転送時のストライド値を説明するための図である。
【図１５】実施例におけるデータパケットを示す図である。
【符号の説明】
１０９アドレス用レジスタ
１１３アドレス用レジスタ
１２７データ用レジスタ
１３０データ用レジスタ
２０９バス
７０１グローバルアドレスレジスタ
８０１ＰＵ番号レジスタ
８０２ＰＵ内アドレスレジスタ
９０１要求メッセージヘッダ
１００１データメッセージヘッダ
１１０１要求メッセージヘッダ
１２０１データメッセージヘッダ
１３０１プロセッサ番号フィールド
１３０２オフセットフィールド

[0001]
[Field of Industrial Application]
The present invention relates to a data transfer method for a distributed memory type parallel computer, and to a distributed memory type parallel computer and an element processor that implement the method. In particular, it relates to a data transfer method that allows any element processor of the parallel computer to have data sent and received between the main storage devices of any two element processors (a sending side and a receiving side), which may include the initiating processor itself, and to a distributed memory type parallel computer and an element processor that implement the method.
[0002]
[Prior art]
In today's advanced information society, there is strong demand for information processing equipment to handle larger workloads at higher speeds, and parallel computers built from multiple cooperating arithmetic processors have been developed to meet that demand. One kind of parallel computer has a handful of arithmetic processors that share a single memory. This kind is called a TCMP (Tightly Coupled Multi-Processor) type parallel computer. Meanwhile, parallel computers with many more arithmetic processors than the TCMP type, specifically hundreds to thousands, have also appeared. Because sharing one memory among all the arithmetic processors is difficult to realize in hardware, these machines instead give each arithmetic processor its own independent memory, and are therefore called distributed memory type parallel computers.
A distributed memory type parallel computer can achieve higher performance than a TCMP type parallel computer. However, because its memory is distributed among multiple arithmetic processors, it has been pointed out that there are problems with the portability of programs written in the conventional programming style, which assumes a single arithmetic processor and a single memory, and with ease of programming. For this reason, there has recently been a growing tendency, typified by research at Stanford University in the United States, to introduce into distributed memory type parallel computers a distributed shared memory system, in which each arithmetic processor can refer to the memory of the other arithmetic processors.
[0003]
To realize a distributed shared memory, the problem of how to let a processor refer to memory belonging to another arithmetic processor must be solved. This problem is solved by addressing. Specifically, the memory of the other arithmetic processors is mapped into each processor's own address space. The address space realized in this way is hereinafter called the global address space. FIG. 6 is an example of the global address space. The global address space 601 is divided into as many areas as there are element processors in the parallel computer. The divided areas 603, 605, ... 607 are each assigned to a different element processor. Within areas 603, 605, ... 607, the main storage device of the corresponding element processor is mapped to areas 602, 604, ... 606, respectively.
For example, RP3, a parallel computer built experimentally by IBM, refers to memory belonging to another arithmetic processor using an address of the form shown in FIG. 13, as disclosed in "RP3 Processor-Memory Element" (Proceedings of the 1985 International Conference on Parallel Processing, pages 782 to 789) and in Japanese Examined Patent Publication No. 5-20776. In the address of FIG. 13, the arithmetic processor having the memory to be referred to is designated by a processor number field 1301, and the address within that memory is designated by an offset field 1302.
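The two-field address of FIG. 13 can be illustrated with a minimal C sketch. The field widths (12 bits for the processor number, 20 bits for the offset) and all names below are assumptions chosen for illustration; the patent does not state the widths.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not given in the patent): a 32-bit global address whose
   upper 12 bits hold the processor number (field 1301) and whose lower
   20 bits hold the offset within that processor's memory (field 1302). */
#define PU_BITS     12u
#define OFFSET_BITS 20u
#define PU_MAX      ((1u << PU_BITS) - 1u)
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1u)

typedef uint32_t global_addr_t;  /* hypothetical type name */

/* Pack a processor number and an offset into one global address. */
static global_addr_t make_global_addr(uint32_t pu, uint32_t offset) {
    assert(pu <= PU_MAX);
    assert(offset <= OFFSET_MASK);
    return (pu << OFFSET_BITS) | offset;
}

/* Extract the processor number field. */
static uint32_t pu_of(global_addr_t ga) { return ga >> OFFSET_BITS; }

/* Extract the offset field. */
static uint32_t offset_of(global_addr_t ga) { return ga & OFFSET_MASK; }
```

With this layout, `make_global_addr(3, 0x100)` names offset 0x100 in the memory of processor 3, and `pu_of` and `offset_of` recover the two fields.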
[0004]
[Problems to be solved by the invention]
In the conventional distributed shared memory system described above, when an arithmetic processor refers to memory belonging to another arithmetic processor, it uses the same load/store instructions as when it refers to its own memory. In other words, if the distributed shared memory system is regarded as a data transfer interface between the element processors constituting the parallel computer, the conventional implementations could realize only fine-grained data transfer in units of words.
For example, if this parallel computer is applied to database processing, copying a large database (from memory to memory) requires a very large number of word-by-word data transfers, so the overhead grows and performance suffers badly.
In this interface, the processor that activates data transfer must always be either the data transfer source or the data transfer destination. That is, this interface is an interface in only two directions.
On the other hand, the message passing interface that is basically supported by the distributed memory parallel computer is an interface that can transfer several words to several hundreds or thousands of words at a time. However, in the conventional message passing interface, it is necessary to explicitly specify the processor number of the data transmission destination. In addition, the data to be transmitted must exist in the memory of the own processor. That is, the conventional message passing interface is a one-way interface from its own processor to another processor.
An object of the present invention is to provide a data transfer method that does not require awareness of the processor to which a data group belongs, that handles data groups of variable size, and that any element processor can initiate between any pair of element processors, together with a distributed memory type parallel computer and an element processor that implement the method.
[0005]
[Means for Solving the Problems]
In order to solve the above-described problems, the present invention introduces the concept of a global address space realized by a distributed shared memory method to a data transfer method based on a conventional message passing interface.
Specifically, a conventional message passing interface is expressed, as shown in FIG. 5, by five main parameters: the start address of the data group to be sent in the source processor, which in this case is the own processor (src-adr); the destination processor number (dst-PU#); the start address of the memory area at the destination into which the data group is written (dst-adr); the data transfer amount (length); and the interval at which the data to be sent and received lie in the memory area (stride). The present invention changes this interface and defines, as shown in FIG. 3, an interface expressed by four parameters: the start global address of the data group to be sent (src-adr); the start global address of the write destination of the transferred data group (dst-adr); the data transfer amount (length); and the interval at which the data to be sent and received lie in the memory area (stride). In a distributed memory type parallel computer, this interface refers to all the main storage devices belonging to the element processors of the parallel computer through the global address space, and realizes data copy between memory areas, from any global address area to any other global address area. The present invention is characterized in that data transfer is realized as a copy using such an interface.
[0006]
[Action]
With the above means, the present invention can realize data transfer in a distributed memory type parallel computer as a data copy between memory areas. Consequently, even when referring to the global address space, anything from a single word to a data group of hundreds or thousands of words or more can be handled at one time.
Further, from the viewpoint of data transfer, it is not necessary for the data transfer initiator to be aware of the processor to which the data or data group belongs. This feature has the effect of improving the ease of describing a program for a parallel computer to which the above means is applied.
Further, the data transfer method realized by the above means allows data transfer between arbitrary element processors (between arbitrary main storage devices), and does not require the initiator of a data transfer to be either the data transfer source or the data transfer destination. In other words, an element processor A that is neither element processor B nor element processor C can instruct a data transfer from element processor B to element processor C. This is a multi-directional interface that surpasses both the conventional message passing interface, which was one-way only, and the data transfer interface based on the conventional distributed shared memory system, which was two-way at most, and this feature also does much to make programs easier to write. It is considered particularly effective for writing programs based on the server-client model.
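Paragraph 【００１３】 of the embodiment describes how an initiator that is neither source nor destination is handled: the network interface compares the PU number field of src-adr with its own PU number and sends either a data message or a request message. The C sketch below illustrates only that decision. The type and function names and the machine size `NUM_PU` are hypothetical, and the real mechanism is hardware (comparator 104 and message sending unit 103), not software.

```c
#include <assert.h>

enum { NUM_PU = 8 };  /* hypothetical number of element processors */

typedef enum { MSG_REQUEST, MSG_DATA } msg_kind;

/* The first message an initiating network interface sends, and to whom. */
typedef struct {
    msg_kind kind;
    unsigned target_pu;
} first_message;

/* Decide the first message from the PU numbers carried in src-adr and dst-adr.
   If the source data live in the initiator's own memory, a data message goes
   straight to the destination PU. Otherwise a request message goes to the PU
   that owns the source data, which then sends the data message itself. */
static first_message first_message_for(unsigned own_pu,
                                       unsigned src_pu,
                                       unsigned dst_pu) {
    first_message m;
    assert(own_pu < NUM_PU && src_pu < NUM_PU && dst_pu < NUM_PU);
    if (src_pu == own_pu) {
        m.kind = MSG_DATA;
        m.target_pu = dst_pu;
    } else {
        m.kind = MSG_REQUEST;
        m.target_pu = src_pu;
    }
    return m;
}
```

For example, when processor 0 asks for a copy from processor 1 to processor 2, the sketch yields a request message addressed to processor 1; processor 1 then finds that src-adr names its own memory and sends the data message to processor 2.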
[0007]
[Embodiment]
Embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 2 shows an embodiment of an element processor constituting a distributed memory type parallel computer. In the figure, the element processor 201 comprises: an instruction processor 202 that executes programs; a memory access interface 203 that is connected to the instruction processor 202 and, according to the command/address/data sets issued by the instruction processor 202, issues accesses to the main storage device 207, the I/O device 205, the inside of the network interface 208, and so on, all described below; an I/O interface 204; a memory control unit 206; a network interface 208 that exchanges packets and data with other element processors (which have the same configuration as 201) via an inter-element-processor interconnection network (the network); an I/O device 205 connected to the I/O interface 204; a main storage device 207 connected to the memory control unit 206; and a bus 209 that connects the memory access interface 203, the I/O interface 204, the memory control unit 206, and the network interface 208.
The present invention relates to a network interface 208 that is the basis of a data transfer mechanism for realizing data transfer.
[0008]
Next, the data transfer interface defined in the present invention will be described.
FIG. 3 represents the data transfer interface defined in the present invention in the form of a function using a programming language such as C language. When data transfer is expressed in a program for a parallel computer to which the interface is applied, it is actually described in a form similar to FIG.
The first parameter “src-adr” in FIG. 3 is the start global address of the series of data to be transferred. The second parameter “dst-adr” is the start global address of the write destination of the transferred data group. The third parameter “length” is the amount of data to transfer, and the fourth parameter “stride” is the interval at which the data to be transferred lie in the memory area. As shown in FIG. 14, “stride” is, for example, 1 when data items that are consecutive in transfer order are adjacent in address order, 2 when they are every other item in address order, and so on.
[0009]
In the data transfer interface defined by the present invention, at least the first, second, and third of the four parameters shown in FIG. 3 must be specified (stride can be omitted by fixing it at 1; conversely, further parameters can be added if this data transfer method is extended).
FIG. 3 expresses the following operation: from the data area of “length” × “stride” elements starting at the global address “src-adr”, “length” data items are read at intervals of “stride”, and all the data read are then written, “length” items at intervals of “stride”, into the data area of “length” × “stride” elements starting at the global address “dst-adr”. In other words, what this interface realizes can be regarded less as data transfer than as a data copy between memory areas, from any global address area to any other global address area.
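The copy semantics of the four-parameter interface can be modelled in software as follows. This is a minimal sketch under stated assumptions: the memories of all element processors are simulated as one two-dimensional array, the data unit is a 32-bit word, and `gaddr`, `global_copy`, `NUM_PU`, and `WORDS_PER_PU` are hypothetical names that do not appear in the patent. In the patent the work is done by the network interfaces exchanging messages, not by a single loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_PU = 4, WORDS_PER_PU = 64 };  /* hypothetical machine size */

/* Simulated main storage of every element processor. */
static uint32_t main_memory[NUM_PU][WORDS_PER_PU];

/* Hypothetical global address: processor number plus word offset. */
typedef struct {
    unsigned pu;
    size_t offset;
} gaddr;

/* Read `length` words at `stride` intervals starting at src, then write
   them at `stride` intervals starting at dst (read all, then write all,
   as described for FIG. 3). */
static void global_copy(gaddr src, gaddr dst, size_t length, size_t stride) {
    uint32_t staged[WORDS_PER_PU];
    size_t i;

    assert(src.pu < NUM_PU && dst.pu < NUM_PU);
    assert(stride >= 1);
    if (length == 0) {
        return;
    }
    /* The last element touched must lie inside each processor's memory. */
    assert(src.offset + (length - 1) * stride < WORDS_PER_PU);
    assert(dst.offset + (length - 1) * stride < WORDS_PER_PU);

    for (i = 0; i < length; i++) {
        staged[i] = main_memory[src.pu][src.offset + i * stride];
    }
    for (i = 0; i < length; i++) {
        main_memory[dst.pu][dst.offset + i * stride] = staged[i];
    }
}
```

Note that the caller names only two global addresses; which processors own the source and destination areas follows from the addresses themselves, which is the property the text emphasizes.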
In the above, a global address means an address in the global address space illustrated in FIG. 6, and takes, for example, the format of FIG. 13, in which the element processor having the memory to be referenced is designated by a processor number field 1301 and the address within that memory is designated by an offset field 1302. The global address space 601 of FIG. 6 is divided into as many areas as there are element processors, and the divided areas 603, 605, ... 607 are assigned to different element processors. The main storage device of the corresponding element processor is mapped to the areas 602, 604, ... 606 within the respective areas 603, 605, ... 607.
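As an illustration of the two-field address format, the following C sketch packs and unpacks a global address. The field widths (a 32-bit address with a 22-bit offset field) and the names `gaddr_make`, `gaddr_pu`, and `gaddr_offset` are assumptions made for the example; the description does not state the widths of the fields 1301 and 1302.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 22u                         /* assumed width of offset field 1302 */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1u)

typedef uint32_t gaddr_t;                       /* assumed 32-bit global address */

/* Combine a processor number (field 1301) and an offset (field 1302). */
static gaddr_t gaddr_make(uint32_t pu, uint32_t offset)
{
    assert(pu < (1u << (32u - OFFSET_BITS)));
    assert(offset <= OFFSET_MASK);
    return (pu << OFFSET_BITS) | offset;
}

/* Extract the processor number field. */
static uint32_t gaddr_pu(gaddr_t a) { return a >> OFFSET_BITS; }

/* Extract the offset field. */
static uint32_t gaddr_offset(gaddr_t a) { return a & OFFSET_MASK; }
```

Placing the processor number in the upper bits gives each element processor one contiguous region of the global address space, which matches the division into areas shown in FIG. 6.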
[0010]
Next, the configuration and operation of each part of the network interface 208 according to the present invention will be described in detail with reference to FIG. Note that the signal lines L15, L16, and L17 in FIG. 1 will be described in advance so as not to be misunderstood. Each signal line is written with an image running below or behind the square representing some registers for the sake of space. The signal line L15 passes under the transmission source address register 117 and serves as an input signal of the message transmission unit 103. The signal line L16 passes under the transmission destination address register 118 and the transmission source address register 117 and serves as an input signal of the message transmission unit 103. The signal line L17 passes under the transmission data length register 119, the transmission destination address register 118, and the transmission source address register 117, and becomes an input signal of the message transmission unit 103.
The connection relationship between the network interface 208, the bus 209, and the network in FIG. 1 is as described above with reference to FIG.
[0011]
The network interface 208 is roughly composed of four parts: a message transmission unit, a message reception unit, a main memory access unit, and a bus interface unit 101.
The message transmission unit includes a message sending unit 103, a transmission source address register 117, a transmission destination address register 118, a transmission data length register 119, a transmission stride width register 120, a write control unit 121, a selector 123, a selector 124, a request arbitration unit 122 that arbitrates between data transfer requests from the own processor and data transfer requests arriving via the network (from the header analysis unit), an address addition unit 116, a comparator 104, an own PU number register 105 (PU is an abbreviation for element processor), and the like.
The message receiving unit includes a message receiving unit 106, a header analyzing unit 107, an address adding unit 112, and the like.
The main memory access unit includes a main memory reading unit 125 and a main memory writing unit 128.
The bus interface unit 101 is connected to the bus 209, receives the following three types of requests transmitted from the instruction processor 202 through the memory access interface 203 and further via the bus 209, and performs necessary processing.
(1) Transmission of a message transmission start command signal L1 to the message transmission unit 103.
(2) Transmission of the message transmission request signal L13 to the request arbitration unit 122.
(3) Write values to the transmission source address register 117, transmission destination address register 118, transmission data length register 119, and transmission stride width register 120.
Further, the bus interface unit 101 transmits the arbitration result of the request arbitration unit 122 to the instruction processor 202 via the memory access interface 203 via the bus 209. The bus interface unit 101 realizes main memory access from the main memory reading unit 125 and the main memory writing unit 128 in the network interface 208.
[0012]
The transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120 in the message transmission unit are registers for storing, respectively, the four parameters of the data transfer interface: “src-adr”, that is, the leading global address of the series of data to be transferred; “dst-adr”, that is, the leading global address of the write destination of the transfer data group; “length”, that is, the transfer data amount; and “stride”, that is, the interval in the memory area of the transfer target data.
The transmission source address register 117 and the transmission destination address register 118 store global addresses in the format illustrated in FIG. 13, and therefore each uses a register 701 of the form shown in FIG. 7. The register 701 has a PU number field 702 and an intra-PU address field 703, which store the processor number field 1301 and the offset field 1302 of the global address, respectively.
[0013]
From the message transmission unit, two types of message transmission, request message transmission and data message transmission, can occur according to the contents of the source address register 117. As a result of comparing the value of the signal line L25 that conveys the contents of the PU number field of the transmission source address register 117 with the value of the own PU number register 105, if the values are equal, data message transmission occurs. On the other hand, if the comparison results in the comparator 104 indicate that the values are different, a request message transmission requesting data transfer to the element processor indicated by the contents of the PU number field of the transmission source address register 117 occurs.
Separate message headers 901 and 1001 shown in FIGS. 9 and 10 are defined for each of the request message and the data message.
[0014]
The message header 901 of FIG. 9 for the request message includes information such as a message type 902, a transmission source PU number 903, a transmission source address 904, a transmission destination address 905, a transmission data length 906, and a transmission stride width 907. The message type 902 is information (1 bit suffices) indicating whether the message is a request message or a data message; in this case it indicates a request message. The transmission source PU number 903 is the content of the PU number field of the transmission source address register 117 transmitted to the message transmission unit 103 via the signal line L14, that is, the number of the element processor whose main storage device holds the data to be transferred. The transmission source PU number 903 is also the number of the element processor to which the request message itself is sent. The transmission source address 904, the transmission destination address 905, the transmission data length 906, and the transmission stride width 907 are the contents of the transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120, transmitted to the message transmission unit 103 via the signal lines L14, L15, L16, and L17, respectively.
[0015]
The message header 1001 of FIG. 10 for the data message includes information such as a message type 902, a transmission destination PU number 1003, a transmission destination address 905, a transmission data length 906, and a transmission stride width 907. The message type 902 is information indicating a request message or a data message as described above; in this case it indicates a data message. The transmission destination PU number 1003 is the content of the PU number field of the transmission destination address register 118 transmitted to the message transmission unit 103 via the signal line L15, that is, the number of the element processor whose main storage device the transferred data is to be written to. The transmission destination PU number 1003 is also the number of the element processor to which the data message itself is sent. The transmission destination address 905, the transmission data length 906, and the transmission stride width 907 are the contents of the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120, transmitted to the message transmission unit 103 via the signal lines L15, L16, and L17, respectively.
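The choice between the two message types and the contents of the two headers can be sketched in C as follows. This is an illustrative model, not the actual header layout: the field widths, the single merged header structure, and the names `global_addr_reg_t`, `send_regs_t`, `msg_header_t`, and `build_header` are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t pu; uint32_t addr; } global_addr_reg_t; /* models register 701 */

typedef struct {                      /* models registers 117 to 120 */
    global_addr_reg_t src, dst;
    uint32_t length, stride;
} send_regs_t;

typedef enum { MSG_REQUEST = 0, MSG_DATA = 1 } msg_type_t;  /* message type 902 */

typedef struct {
    msg_type_t        type;
    uint32_t          route_pu;  /* 903 in a request message, 1003 in a data message */
    global_addr_reg_t src;       /* 904, meaningful only in a request message */
    global_addr_reg_t dst;       /* 905 */
    uint32_t          length;    /* 906 */
    uint32_t          stride;    /* 907 */
} msg_header_t;

/* Build the header the way the comparator result selects it: equal PU
 * numbers give a data message, different PU numbers give a request message. */
static msg_header_t build_header(const send_regs_t *r, uint32_t own_pu)
{
    msg_header_t h = {0};
    assert(r->stride >= 1);
    h.dst = r->dst;
    h.length = r->length;
    h.stride = r->stride;
    if (r->src.pu == own_pu) {
        h.type = MSG_DATA;
        h.route_pu = r->dst.pu;   /* the data goes to the write-destination processor */
    } else {
        h.type = MSG_REQUEST;
        h.route_pu = r->src.pu;   /* the request goes to the processor holding the data */
        h.src = r->src;
    }
    return h;
}
```

In both cases the PU number placed first in the header doubles as the routing destination of the message itself, as the description states for the fields 903 and 1003.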
[0016]
In accordance with the comparison result of the comparator 104 transmitted via the signal line L8, the message sending unit 103 creates the message header for either a request message or a data message, as described above, from the information transmitted via the signal lines L14, L15, L16, and L17, and sends the message header to the network via the signal line L4, thereby distinguishing request messages from data messages. Further, when the message is a data message, the transfer data is sent following the message header. A data message packet is shown in FIG.
Transmission of the transfer data is realized by the message transmission unit 103 transmitting a main memory read request to the main memory read unit 125 via the signal line L6. The main memory reading unit 125 performs main memory reading via the bus interface unit 101, and sequentially transfers the read data to the message sending unit 103 via the signal line L36. In addition, when transferring to the message transmission part 103, a valid signal is also transferred using the signal line L7. The valid signal is a signal indicating that valid read data is carried on the signal line L in the machine cycle. The message sending unit 103 sequentially sends the read data to the network via the signal line L4. The number of transmitted data is counted by the message transmission unit 103. When the count value becomes equal to the transmission data length transmitted via the signal line L16, the transmission of the transfer data is completed, and the message transmission is completed.
On the other hand, when the transmitted message header is for a request message, the message transmission is completed as soon as the message header transmission is completed.
[0017]
In order for the message sending unit 103 to start the operation described above, a message transmission start signal must be transmitted via the signal line L3. The signal line L3 carries the OR of the signal lines L1 and L2. The signal line L1 carries a true value when the instruction processor 202 requests the start of message transmission as described above, and the signal line L2 carries a true value when the header analysis unit 107 in the message receiving unit requests the start of message transmission. When the message sending unit 103 completes a message transmission, it notifies the request arbitration unit 122 of that state via the signal line L41. Before the instruction processor 202 or the header analysis unit 107 can request the start of message transmission, each must first send a message transmission request to the request arbitration unit 122. The request from the instruction processor 202 is transmitted via the signal line L13 as described above, and the request from the header analysis unit 107 is transmitted via the signal line L11. The request arbitration unit 122 receives these requests, performs priority control in some form, and, once the current message transmission has completed, outputs on the signal line L12 a signal indicating which side's request has been accepted. The instruction processor 202 and the header analysis unit 107 observe the contents of the signal line L12, and whichever one the contents indicate then requests the start of message transmission described above.
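The arbitration handshake can be sketched in C as follows. The description leaves the priority rule open ("priority control in some form"), so the fixed priority given here to the header analysis unit is purely an assumption, as are the names `arbiter_t`, `arbiter_step`, and `arbiter_complete`.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { GRANT_NONE, GRANT_PROCESSOR, GRANT_HEADER_UNIT } grant_t; /* value on L12 */

typedef struct {
    bool busy;  /* a granted transmission has not yet reported completion (L41) */
} arbiter_t;

/* One arbitration step. req_processor models L13 and req_header_unit
 * models L11. No new grant is issued while a transmission is in progress.
 * Assumed policy: the header analysis unit wins when both request at once. */
static grant_t arbiter_step(arbiter_t *a, bool req_processor, bool req_header_unit)
{
    if (a->busy) return GRANT_NONE;
    if (req_header_unit) { a->busy = true; return GRANT_HEADER_UNIT; }
    if (req_processor)   { a->busy = true; return GRANT_PROCESSOR; }
    return GRANT_NONE;
}

/* Completion notice from the message sending unit (signal line L41). */
static void arbiter_complete(arbiter_t *a)
{
    assert(a->busy);
    a->busy = false;
}
```

A requester that is not granted simply keeps its request asserted and is considered again after the completion notice arrives.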
[0018]
Since the message transmission unit transmits two types of messages, that is, request message transmission and data message transmission, two types of messages can arrive at the message reception unit 106 of the message reception unit. In general, when a request message arrives, the message receiving unit requests the message transmitting unit in the same network interface 208 to start the requested data transfer. When the data message arrives, the main memory writing unit 128 is requested to write the transfer data to the main storage device.
When a message arrives, the message receiving unit 106 determines the message type from the message type 902 in the message header, which is the first information carried by the message. If the message type is a request message, the information of the message type 902, the transmission source address 904, the transmission destination address 905, the transmission data length 906, and the transmission stride width 907 in the message header 901 is transmitted via the signal line L9 and stored in the header register 108 in the header analysis unit 107, and the message reception is completed.
On the other hand, if the message type is a data message, the information of the message type 902, the transmission destination address 905, the transmission data length 906, and the transmission stride width 907 in the message header 1001 is transmitted via the signal line L9 and stored in the header register 108 in the header analysis unit 107, and the subsequent transfer data is transmitted to the main memory writing unit 128 via the signal line L10.
[0019]
When the message type is a data message, the header analysis unit 107 transmits the transmission destination address and the transmission stride width in the header register 108 to the address addition unit 112 via the signal lines L31 and L32, respectively, and transmits the transmission data length to the counter 129 in the main memory writing unit 128 via the signal line L35 as its initial value. It also requests the main memory writing unit 128 to perform main memory writing via the signal line L33. On receiving the request, the main memory writing unit 128 performs a main memory access using the address supplied by the address addition unit 112 via the signal line L34 and the data that the message receiving unit 106 sets in the register 130 in the main memory writing unit 128 via the signal line L10, and repeats this as many times as the initial value given to the counter 129. When the message receiving unit 106 has received all the transfer data for the transmission data length and has written the last data to the register 130 in the main memory writing unit 128, the message reception is complete, and the message receiving unit 106 and the header analysis unit 107 complete the processing of the received data message.
[0020]
On the other hand, if the message type is a request message, the header analysis unit 107 sends a message transmission request to the request arbitration unit 122 via the signal line L11 as described above, and then receives a signal acknowledging the request from the request arbitration unit 122 via the signal line L12. When the message transmission request is accepted, the header analysis unit 107 selects the transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120 in turn, sequentially sends a register selection signal designating each register to the selector 123 via the signal line L22, and each time selects the value to be written to that register from the corresponding area of the header register 108 and sends it to the selector 124 via the signal line L24. When values have been set in all of the transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120, the header analysis unit 107 sends a message transmission start signal to the message transmission unit 103 via the signal line L2, and message transmission starts. The header analysis unit 107 completes the processing of the received request message at the point when it has sent the message transmission start signal to the message transmission unit 103.
[0021]
The processing request to the message transmission unit in the network interface 208 is performed according to the following procedure as described in the processing procedure of the header analysis unit 107 for the received request message.
(1) Message transmission request transmission to the request arbitration unit 122.
(2) Message transmission approval from the request arbitration unit 122.
(3) Setting of parameter values related to message transmission to the transmission source address register 117, transmission destination address register 118, transmission data length register 119, and transmission stride width register 120.
(4) Transmission of a message transmission start signal to the message transmission unit 103.
The instruction processor 202 and the header analysis unit 107 are the entities that request processing from the message transmission unit. All the operations related to the processing request of the header analysis unit 107 are as described above, and for the instruction processor 202, steps (1), (2), and (4) have also been described above. Step (3) for the instruction processor 202 is basically the same as for the header analysis unit 107: via the access path already described, a register selection signal designating each register is ultimately sent sequentially to the selector 123 via the signal line L21, and the value to be written to each register is sent sequentially to the selector 124 via the signal line L23.
Note that a processing request from the instruction processor 202 can result in either a request message transmission or a data message transmission, whereas a processing request from the header analysis unit 107 can result only in a data message transmission.
[0022]
Writing values to the transmission source address register 117, transmission destination address register 118, transmission data length register 119, and transmission stride width register 120 is realized using a write control unit 121, a selector 123, and a selector 124.
The selector 123 and the selector 124 function as a pair: the selector 123 sends to the write control unit 121, via the signal line L19, a register designation signal designating one of the transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120, and the selector 124 sends to the write control unit 121, via the signal line L20, the value to be written to the register designated on the signal line L19.
The value of the signal line L19 is one of the signal line L21 and the signal line L22, and the value of the signal line L20 is one of the signal line L23 and the signal line L24. Which one is selected depends on the value of the signal line L12, that is, whether the request arbitration unit 122 has approved the message transmission to the instruction processor 202 or the header analysis unit 107. When the request arbitration unit 122 approves message transmission to the instruction processor 202, the value of the signal line L21 and the value of the signal line L23 become the value of the signal line L19 and the value of the signal line L20, respectively. When the request arbitration unit 122 approves message transmission to the header analysis unit 107, the value of the signal line L22 and the value of the signal line L24 become the value of the signal line L19 and the value of the signal line L20, respectively.
The write control unit 121 selects one of the transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120 based on the register designation signal transmitted via the signal line L19, enables the one of the write paths L18a, L18b, L18c, and L18d (one provided for each register) that corresponds to the selection, and places the write value transmitted via the signal line L20 on the enabled write path. As a result, the value of the signal line L20 is written into the register specified by the value of the signal line L19.
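The register write path formed by the two selectors and the write control unit can be modeled in C as a pair of two-way multiplexers followed by an indexed write. The sketch below is a functional model only; the 64-bit register width and the names `reg_sel_t`, `grant_t`, and `write_register` are assumptions, and variables are named after the signal lines they stand for.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REG_SRC = 0, REG_DST, REG_LENGTH, REG_STRIDE, REG_COUNT } reg_sel_t;
typedef enum { FROM_PROCESSOR, FROM_HEADER_UNIT } grant_t;   /* value on L12 */

typedef struct { uint64_t reg[REG_COUNT]; } send_regs_t;     /* registers 117 to 120 */

/* l21/l23: register selection and value from the instruction processor.
 * l22/l24: register selection and value from the header analysis unit. */
static void write_register(send_regs_t *regs, grant_t l12,
                           reg_sel_t l21, uint64_t l23,
                           reg_sel_t l22, uint64_t l24)
{
    reg_sel_t l19 = (l12 == FROM_PROCESSOR) ? l21 : l22;  /* selector 123 */
    uint64_t  l20 = (l12 == FROM_PROCESSOR) ? l23 : l24;  /* selector 124 */
    assert(l19 < REG_COUNT);
    regs->reg[l19] = l20;  /* write control unit 121 enables one of L18a to L18d */
}
```

Because both selectors are steered by the same arbitration result, a register selection from one requester can never be paired with a value from the other.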
[0023]
Next, the operation of the main memory access unit outlined above will be described in more detail. The main memory reading unit 125 is activated by an activation signal (main memory reading request) transmitted from the message transmission unit 103 via the signal line L6 when a data message is transmitted. The main memory reading unit 125 has a counter 126 and a data register 127 inside. In the counter 126, the value stored in the transmission data length register 119 at that time, which is transmitted via the signal line L26 when the main memory reading unit 125 is started, is set as an initial value. Thereafter, each time the main memory read unit 125 issues a main memory read request, the value of the counter 126 is decremented by one. After being activated, the main memory reading unit 125 repeatedly issues a main memory reading request until the value of the counter 126 becomes zero. When the value of the counter 126 becomes 0, the issue of the main memory read request relating to the transmission of the data message is completed. When the main memory read request is issued, the main memory read unit 125 transmits a read command to the main memory access command line L37 and simultaneously transmits a read address through the main memory read address line L28. The main memory read address line L28 is a signal transmitted from the address adder 116.
[0024]
The address addition unit 116 includes an adder 115, a selector 114, and an address register 113. The adder 115 adds ((value of the transmission stride width register transmitted via the signal line L27) × (byte size of one unit of transmission data)) to the value of the address register 113 and outputs the result to the signal line L29. The selector 114 selects either the value of the signal line L29 or the value of the transmission source address register 117 transmitted via the signal line L14, and sets the selected value in the address register 113. The selector 114 selects the value of the signal line L14 only when the main memory read activation signal is transmitted via the signal line L6; otherwise it selects the value of the signal line L29. As a result, the address addition unit 116, which supplies the value of the address register 113 as the main memory read address via the signal line L28, supplies the start address of the transfer source data area for the data message transmission when main memory reading starts, and thereafter supplies addresses advanced by the stride.
Read data from the main memory, transmitted through the main memory read data line L38, is received sequentially by the data register 127 and transmitted to the message sending unit 103 through the signal line L36.
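The interplay of the counter 126 and the address addition unit 116 can be sketched in C as an address generator. This is a behavioral model under assumptions: the unit data size of 8 bytes is chosen arbitrarily, addresses are plain 32-bit byte offsets, and the names `addr_adder_t`, `addr_adder_step`, and `read_addresses` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_BYTES 8u   /* assumed byte size of one unit of transmission data */

typedef struct { uint32_t addr_reg; } addr_adder_t;   /* address register 113 */

/* One step of the address addition unit: on the start signal the selector
 * loads the start address (L14); otherwise it loads the adder output (L29). */
static uint32_t addr_adder_step(addr_adder_t *a, int start,
                                uint32_t start_addr, uint32_t stride)
{
    a->addr_reg = start ? start_addr : a->addr_reg + stride * UNIT_BYTES;
    return a->addr_reg;   /* supplied as the read address on L28 */
}

/* Generate the read addresses for one data message. The counter starts at
 * the transmission data length and one request is issued per decrement.
 * Returns the number of read requests issued. */
static size_t read_addresses(uint32_t src, uint32_t length, uint32_t stride,
                             uint32_t *out, size_t cap)
{
    addr_adder_t adder = {0};
    uint32_t counter = length;   /* counter 126 */
    size_t n = 0;
    assert(length <= cap);
    while (counter != 0) {
        out[n] = addr_adder_step(&adder, n == 0, src, stride);
        n++;
        counter--;
    }
    return n;
}
```

For a start address of 0x1000, a length of 3, and a stride of 2, this model yields the addresses 0x1000, 0x1010, and 0x1020.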
[0025]
On the other hand, the main memory writing unit 128 is activated by an activation signal (main memory write request) transmitted from the header analysis unit 107 via the signal line L33 when a data message is received. The main memory writing unit 128 has a counter 129 and a data register 130 inside. When the main memory writing unit 128 is activated, the transmission data length value transmitted from the corresponding area of the header register 108 in the header analysis unit 107 via the signal line L35 is set in the counter 129 as its initial value. Thereafter, each time the main memory writing unit 128 issues a main memory write request, the value of the counter 129 is decremented by one. After being activated, the main memory writing unit 128 repeatedly issues main memory write requests until the value of the counter 129 becomes 0. When the value of the counter 129 becomes 0, the main memory writing for the data message reception is complete. When issuing a main memory write request, the main memory writing unit 128 sends a write command on the main memory access command line L39, simultaneously sends the write address on the main memory write address line L34, and sends on the main memory write data line L40 the value of the data register 130 that the message receiving unit 106 has set via the signal line L10. The main memory write address line L34 carries the signal supplied by the address addition unit 112.
[0026]
Similarly to the address addition unit 116, the address addition unit 112 includes an adder 111, a selector 110, and an address register 109. The adder 111 adds ((transmission stride width value transmitted from the corresponding area of the header register 108 in the header analysis unit 107 via the signal line L32) × (byte size of one unit of transmission data)) to the value of the address register 109 and outputs the result to the signal line L30. The selector 110 selects either the value of the signal line L30 or the transmission destination (write destination) address value transmitted from the corresponding area of the header register 108 in the header analysis unit 107 via the signal line L31, and sets the selected value in the address register 109. The selector 110 selects the value of the signal line L31 only when the main memory write activation signal is transmitted via the signal line L33; otherwise it selects the value of the signal line L30. As a result, the address addition unit 112, which supplies the value of the address register 109 as the main memory write address via the signal line L34, supplies the start address of the transfer destination (write destination) data area for the data message reception when main memory writing starts, and thereafter supplies addresses advanced by the stride.
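The receiving side can be sketched in C as a loop that scatters the payload of a data message into main storage. This is a simplified model under assumptions: the unit data size of 4 bytes is chosen arbitrarily, the payload is taken to be already fully received in a buffer rather than arriving one item at a time, and the name `write_payload` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNIT_BYTES 4u   /* assumed byte size of one unit of transmission data */

/* Write `length` payload items into main storage, starting at byte offset
 * dst_offset and advancing by stride * UNIT_BYTES per item.
 * Returns the number of items written. */
static size_t write_payload(uint8_t *main_storage, size_t storage_bytes,
                            uint32_t dst_offset, uint32_t length,
                            uint32_t stride, const uint8_t *payload)
{
    uint32_t counter  = length;      /* counter 129, initial value from L35 */
    uint32_t addr_reg = dst_offset;  /* address register 109, loaded from L31 */
    size_t written = 0;
    while (counter != 0) {
        assert((size_t)addr_reg + UNIT_BYTES <= storage_bytes);
        /* The item held in data register 130 goes to the current address. */
        memcpy(main_storage + addr_reg, payload + written * UNIT_BYTES, UNIT_BYTES);
        addr_reg += stride * UNIT_BYTES;  /* adder 111 output (L30) */
        counter--;
        written++;
    }
    return written;
}
```

The payload itself is dense; only the placement in main storage is strided, which is why the header of a data message carries the stride width along with the destination address.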
[0027]
This is the end of the description of the configuration and operation of each part of the network interface 208 based on the present invention shown in FIG. Next, the flow of data transfer processing based on the data transfer method according to the present invention will be described.
A data transfer request is issued from the instruction processor 202. The instruction processor 202 issues a data transfer request to the request arbitration unit 122 in the network interface 208 and waits for permission from the request arbitration unit 122. At this time, the request arbitration unit 122 may also have received a data message transmission request from the message receiving unit side (specifically, the header analysis unit 107) in the same network interface 208, in which case priority control may result in permission being given to the message receiving unit side.
When the instruction processor 202 obtains permission from the request arbitration unit 122, it sets the data transfer parameters, namely “src-adr”, that is, the leading global address of the series of data to be transferred, “dst-adr”, that is, the leading global address of the write destination of the transfer data group, “length”, that is, the amount of transfer data, and “stride”, that is, the interval in the memory area of the data to be transferred, in the transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120, in that order. When this parameter setting is completed, a data transfer start signal is transmitted to the message sending unit 103. Data transfer is thereby started, and the role of the instruction processor 202 in the data transfer ends.
[0028]
For a data transfer started from the instruction processor 202, when the PU number field value of the parameter “src-adr” set in the transmission source address register 117, that is, of the leading global address of the series of data to be transferred, is the own processor number, the main memory reading unit 125 reads the data from its own main storage device 207 under the control of the message sending unit 103, and transmission of the data message starts. That is, the data transfer actually starts. This data message is sent to the element processor 201 indicated by the PU number field value of the parameter “dst-adr” set in the transmission destination address register 118, that is, of the leading global address of the write destination of the transfer data group.
When the element processor 201 that is the transmission destination of the data message receives the message at the message receiving unit 106 in the network interface 208 and recognizes that it is a data message, under the control of the message receiving unit 106 and the header analyzing unit 107, The main memory writing unit 128 writes the received data to its own main storage device 207. This data transfer is completed when all the data has been written.
[0029]
On the other hand, regarding the data transfer started from the instruction processor 202, the parameter “src-adr” set in the source address register 117, that is, the PU number field value of the leading global address of the series of data to be transferred is not the own processor number. In this case, the message sending unit 103 sends a request message to the element processor 201 indicated by the PU number field value.
When the element processor 201 to which the request message was sent receives the message at the message receiving unit 106 in its network interface 208 and recognizes that it is a request message, a data message transmission request is issued from the header analysis unit 107. The header analysis unit 107 issues the data message transmission request to the request arbitration unit 122 in the network interface 208 and waits for permission from the request arbitration unit 122. At this time, the request arbitration unit 122 may also have received a data transfer request from the instruction processor 202 in the same element processor 201, in which case priority control may result in permission being given to the instruction processor 202.
[0030]
Upon obtaining permission from the request arbitration unit 122, the header analysis unit 107 sets the information of the transmission source address 904, the transmission destination address 905, the transmission data length 906, and the transmission stride width 907 stored in the header register 108 in the transmission source address register 117, the transmission destination address register 118, the transmission data length register 119, and the transmission stride width register 120, respectively, in that order. When this setting is completed, a data message transmission start signal is transmitted to the message transmission unit 103. At this time, the PU number field value of the global address set in the transmission source address register 117 is always the own processor number. Data message transmission is thereby started.
Thereafter, under the control of the message sending unit 103, the main memory reading unit 125 reads the data from its own main storage device 207 and sends the data message. This data message is sent to the element processor 201 indicated by the PU number field value of the global address set in the destination address register 118.
When the element processor 201 that is the transmission destination of the data message receives the message at the message receiving unit 106 in the network interface 208 and recognizes that the data message is a data message, the element processor 201 is under the control of the message receiving unit 106 and the header analyzing unit 107. The main memory writing unit 128 writes the data received in its own main storage device 207. This data transfer is completed when all the data has been written.
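The two flows described above (a direct data message when the source data is local, and a request message followed by a data message otherwise) can be sketched as a small C simulation. Everything here is a simplified model under assumptions: the number of element processors, the memory size, word-indexed addresses, and the names `gaddr_t`, `send_data_message`, and `start_transfer` are hypothetical, and arbitration and network timing are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PU    4    /* assumed number of element processors */
#define MEM_WORDS 64   /* assumed words of main storage per element processor */

typedef struct { int pu; int addr; } gaddr_t;   /* word-indexed global address */

static uint32_t main_memory[NUM_PU][MEM_WORDS];
static int request_messages_sent;
static int data_messages_sent;

/* Data message: the sender reads its own main storage and the receiving
 * element processor writes the data into its own main storage. */
static void send_data_message(int sender, gaddr_t src, gaddr_t dst,
                              int length, int stride)
{
    assert(src.pu == sender);   /* a data message always carries local data */
    data_messages_sent++;
    for (int i = 0; i < length; i++)
        main_memory[dst.pu][dst.addr + i * stride] =
            main_memory[sender][src.addr + i * stride];
}

/* Transfer started by the instruction processor of element processor
 * `initiator`, which may be neither the source nor the destination. */
static void start_transfer(int initiator, gaddr_t src, gaddr_t dst,
                           int length, int stride)
{
    assert(stride >= 1 && length >= 0);
    if (src.pu == initiator) {
        send_data_message(initiator, src, dst, length, stride);
    } else {
        /* A request message goes to src.pu, whose header analysis unit
         * then starts the data message on the initiator's behalf. */
        request_messages_sent++;
        send_data_message(src.pu, src, dst, length, stride);
    }
}
```

For example, element processor 0 can move data from the main storage of element processor 1 to that of element processor 2 at the cost of one request message and one data message, without either of those two instruction processors taking part.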
The above is an embodiment of the present invention.
The following modifications of this embodiment are also conceivable.
[0031]
(Modification 1)
The data transfer interface shown in FIG. 3 is changed to the interface shown in FIG. 4. In FIG. 4, “src-adr” and “dst-adr” are not global addresses; they are addresses within the main storage device of the transmission source element processor 201 and within the main storage device of the transmission destination element processor 201, respectively. Because “src-adr” and “dst-adr” are no longer global addresses, the interface of FIG. 4 defines new parameters “src-PU #” and “dst-PU #” to specify the data transfer source and destination, respectively. The remaining parameters “length” and “stride” are the same as in FIG. 3.
With the interface of FIG. 4, one advantage of the interface of FIG. 3 is lost, namely that the initiator need not be aware of which element processors are involved in the data transfer. The feature that any element processor can initiate a data transfer between any element processors (main storage devices) is retained unchanged.
[0032]
The main changes in the mechanism from the embodiment when the interface shown in FIG. 4 is adopted are the following two points.
(1) The transmission source address register 117 and the transmission destination address register 118, which were configured as shown in FIG. 7, are instead configured as shown in FIG. 8, with the PU number register 801 and the intra-PU address register 802 used in concatenation. When concatenated, they emulate the register 701, with the PU number register 801 serving as the PU number field 702 and the intra-PU address register 802 serving as the intra-PU address field 703.
(2) The message headers shown in FIGS. 9 and 10 are changed as shown in FIGS. 11 and 12, respectively. More specifically, the transmission source address 904 in FIG. 9 is replaced by the transmission source intra-PU address 1104 in FIG. 11, and the transmission destination address 905 in FIG. 9 is replaced by the transmission destination PU number 1105 and the transmission destination intra-PU address 1106 in FIG. 11. Further, the transmission destination address 905 in FIG. 10 is replaced by the transmission destination intra-PU address 1205 in FIG. 12.
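To illustrate change (1), the Python sketch below concatenates a PU number and an intra-PU address into a value laid out like the global address register of FIG. 7. The 8-bit and 24-bit field widths and the function names are hypothetical, chosen only for the example; the text here does not give the actual widths.

```python
# Hypothetical field widths; the actual widths are not given here.
PU_BITS = 8        # assumed width of PU number register 801
ADDR_BITS = 24     # assumed width of intra-PU address register 802


def concat_registers(pu_number, intra_pu_addr):
    """Concatenate the two registers into a pseudo global address.

    The PU number occupies the upper bits (PU number field 702) and the
    intra-PU address the lower bits (intra-PU address field 703).
    """
    if not 0 <= pu_number < (1 << PU_BITS):
        raise ValueError("PU number out of range")
    if not 0 <= intra_pu_addr < (1 << ADDR_BITS):
        raise ValueError("intra-PU address out of range")
    return (pu_number << ADDR_BITS) | intra_pu_addr


def split_registers(global_addr):
    """Inverse operation: recover (PU number, intra-PU address)."""
    return global_addr >> ADDR_BITS, global_addr & ((1 << ADDR_BITS) - 1)


pseudo_global = concat_registers(3, 0x001000)
```

Because the concatenated pair behaves like a single global address, the rest of the transfer mechanism can remain as in the embodiment.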
[0033]
(Modification 2)
The memory access interface 203 and the network interface 208 are no longer connected via the bus 209 but are connected directly to each other. In this case, a new interface processing unit is required in the network interface 208 in place of the bus interface unit 101.
[0034]
[Effects of the invention]
According to the present invention, a distributed memory parallel computer inherits the ease of programming offered by the distributed shared memory method, namely that the initiator of a data transfer need not be particularly aware of which processor the data or data group belongs to. At the same time, it becomes possible to transfer data groups of several hundred to several thousand words, which data transfer methods built on the conventional distributed shared memory method could not achieve.
Furthermore, according to the present invention, data transfer between arbitrary element processors (arbitrary main storage devices) is possible, and the data transfer initiator is not limited to being either the data transfer source or the data transfer destination. That is, an element processor A that is neither element processor B nor element processor C can instruct a data transfer from element processor B to element processor C. This is a multi-directional interface that surpasses both the conventional message passing interface, which was one-way, and the data transfer interface realized on the conventional distributed shared memory method, which was at most two-way. This feature further improves the ease of program description.
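The three-party case can be made concrete with a small Python sketch that lists the messages exchanged for a given initiator, source, and destination. The function name and tuple layout are illustrative assumptions; the sequence itself follows the request message and data message of the embodiment.

```python
def message_route(initiator, src_pu, dst_pu):
    """Return the (kind, sender, receiver) messages for one transfer.

    If the initiator owns the source data it sends the data message
    itself; otherwise it first sends a request message to the source.
    """
    route = []
    if initiator != src_pu:
        route.append(("request", initiator, src_pu))
    route.append(("data", src_pu, dst_pu))
    return route


# Element processor A (0) instructs a transfer from B (1) to C (2).
third_party = message_route(0, 1, 2)
# The initiator is also the source: no request message is needed.
initiator_is_source = message_route(1, 1, 2)
```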
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a network interface that is the basis of a distributed memory parallel computer that implements a data transfer method according to an embodiment.
FIG. 2 is a diagram illustrating a configuration example of an element processor included in the parallel computer in the embodiment.
FIG. 3 is a diagram illustrating a data transfer interface in the embodiment.
FIG. 4 is a diagram illustrating a data transfer interface in a first modification.
FIG. 5 illustrates a conventional message passing interface.
FIG. 6 is a diagram illustrating a global address space in the embodiment.
FIG. 7 is a diagram illustrating a register for storing a global address in the embodiment.
FIG. 8 is a diagram illustrating a register group that stores a set of values for expressing an arbitrary main memory address in a parallel computer according to Modification 1.
FIG. 9 is a diagram illustrating a request message header in the embodiment.
FIG. 10 is a diagram illustrating a data message header in the embodiment.
FIG. 11 is a diagram showing a request message header in Modification 1.
FIG. 12 is a diagram showing a data message header in Modification 1.
FIG. 13 is a diagram illustrating a format of a global address in the embodiment.
FIG. 14 is a diagram for explaining a stride value at the time of data transfer in the embodiment.
FIG. 15 is a diagram illustrating a data packet in the embodiment.
[Explanation of symbols]
109 Register for address
113 Address register
127 Data register
130 Data register
209 Bus
701 Global address register
801 PU number register
802 Address register in PU
901 Request message header
1001 Data message header
1101 Request message header
1201 Data message header
1301 Processor number field
1302 Offset field

Claims

A data transfer method for transferring data between element processors in a parallel computer comprising a plurality of element processors, each including an instruction processor, a main storage device, and a network interface unit, and a network connecting the plurality of element processors, wherein divided regions of a unique global address space are respectively assigned to the main storage devices of the plurality of element processors, the method comprising:
the instruction processor of any one of the plurality of element processors specifying a transfer source global address, a transfer destination global address, and a transfer data amount for the data to be transferred, and notifying the network interface unit of its own element processor of the specified values; and
the network interface unit that has received the notification:
determining whether the specified transfer source global address is within the region assigned to the main storage device of its own element processor;
if the specified transfer source global address is within the region assigned to the main storage device of its own element processor, reading the data to be transferred from the main storage device of its own element processor on the basis of the transfer source global address and the specified transfer data amount, creating a data message that includes at least the specified transfer destination global address and the read data and carries information indicating that it is a data transfer message, and transmitting the data message via the network to the element processor having the main storage device to which the region containing the transfer destination global address is assigned; and
if the specified transfer source global address is not within the region assigned to the main storage device of its own element processor, creating a transfer request message that includes at least the specified transfer source global address, the specified transfer destination global address, and the specified transfer data amount and carries information indicating that it is a data transfer request message, and transmitting the transfer request message via the network to the element processor having the main storage device to which the region containing the specified transfer source global address is assigned.

The data transfer method according to claim 1, wherein predetermined upper bits of a global address indicate the number of the element processor having the main storage device to which the region specified by those upper bits is assigned, and the determination of whether the specified transfer source global address is within the region assigned to the main storage device of the own element processor is made on the basis of whether the predetermined upper bits of the specified transfer source global address match the number of the own element processor.
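The determination by upper bits, together with the two branches it selects between in claim 1, might look as follows in Python. This is a sketch under stated assumptions: the 24-bit offset width, the dictionary message layout, and the function names are hypothetical and serve only to illustrate the comparison.

```python
OFFSET_BITS = 24  # assumed number of low-order bits for the intra-PU offset


def owner_pu(global_addr):
    """PU number encoded in the upper bits of a global address."""
    return global_addr >> OFFSET_BITS


def start_transfer(own_pu, memory, src, dst, length):
    """Choose between sending a data message and a request message."""
    if owner_pu(src) == own_pu:
        # Source is local: read the data and send it to the destination.
        offset = src & ((1 << OFFSET_BITS) - 1)
        payload = memory[offset:offset + length]
        return {"type": "data", "to": owner_pu(dst),
                "dst": dst, "payload": payload}
    # Source is remote: ask the owning element processor to send it.
    return {"type": "request", "to": owner_pu(src),
            "src": src, "dst": dst, "length": length}


local_memory = [10, 20, 30, 40]
local_case = start_transfer(0, local_memory, (0 << OFFSET_BITS) | 1,
                            (2 << OFFSET_BITS) | 8, 2)
remote_case = start_transfer(0, local_memory, (1 << OFFSET_BITS) | 1,
                             (2 << OFFSET_BITS) | 8, 2)
```

Comparing only the upper bits with the own processor number avoids any range lookup: a single shift and an equality test decide which message is built.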

The data transfer method according to claim 1, wherein the network interface unit of the element processor that has received the transfer request message reads the data to be transferred from the main storage device of that element processor on the basis of the transfer source global address and the data amount information included in the transfer request message, creates a data message that includes at least the transfer destination global address included in the transfer request message and the read data and carries information indicating that it is a data transfer message, and transmits the data message via the network to the element processor having the main storage device to which the region containing the transfer destination global address is assigned.