JPH08212178A

JPH08212178A - Parallel computer

Info

Publication number: JPH08212178A
Application number: JP7020137A
Authority: JP
Inventors: Toshiaki Tarui; 俊明垂井; Hideya Akashi; 英也明石; Naonobu Sukegawa; 直伸助川; Keimei Fujii; 啓明藤井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-02-08
Filing date: 1995-02-08
Publication date: 1996-08-20

Abstract

PURPOSE: To reduce access overhead by requesting access to a plurality of data items at non-contiguous addresses in another processing unit (PU) with a single network command, and by having hardware perform the accesses automatically. CONSTITUTION: For a write command, the addresses held in the even-numbered words are output to an address bus 160 and the data held in the odd-numbered words are output to a data bus 161, and the data are written to a main memory 120. Writes to non-contiguous addresses of the main memory 120 can thus be requested with a single network command. For a read command, the addresses held in the even-numbered words are output to the address bus 160 and the main memory 120 is accessed. Each value read is then paired with the return destination address held in the corresponding odd-numbered word of the request command, and a selector 141 and a header assembly circuit 140 assemble a remote write command. This write command is sent back to the requesting PU, which processes it as an ordinary write command.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a data transfer method for a parallel computer composed of a plurality of processing units.

[0002]

[Prior Art] Parallel computers, in which a large number of processing units (hereinafter, PUs) operate in parallel, are regarded as a promising way to dramatically improve computer performance. In a parallel computer it is important to communicate data efficiently among many PUs. Large-scale numerical computation in particular requires an architecture that can transfer the large amounts of data needed for the calculation between PUs in bulk and at high speed.

[0003] As shown in Japanese Patent Application Laid-Open No. 6-19856, the data transfer mechanism of conventional parallel computers transfers data at contiguous addresses in bulk. Each PU has a mechanism for performing DMA to and from the main memory of another PU; when the region of data to be transferred is specified, the DMA hardware transfers that region automatically.

[0004]

[Problems to Be Solved by the Invention] The conventional technique above is effective when all of the data to be transferred lie at contiguous addresses (or follow a regular pattern such as stride access), but it is inefficient when the addresses of the data to be transferred are random and non-contiguous.

[0005] For example, consider writing to non-contiguous regions of a remote PU. A conventional DMA data transfer mechanism, which can write only to contiguous addresses, cannot write a plurality of data items into the remote memory in one operation. Consequently, one of the following methods has been used:
(1) Issue a remote write command for each word.
(2) Transfer two arrays, one holding the data to be written and one holding the write addresses, to separate regions of the destination PU in two transfers, and then ask the destination PU to perform the actual writes to the intended locations.

[0006] Method (1) requires sending a large number of single-word write commands, which not only increases execution time but also puts a large number of packets on the network and so increases the network load.

[0007] Method (2) reduces the network load, but it requires an additional region in which the addresses and data are temporarily placed, which lowers memory efficiency. It also creates extra work for the destination PU, which increases program execution time.

[0008] Reading data at non-contiguous addresses of a remote PU likewise requires either (1) reading one word at a time, or (2) asking the destination CPU to gather the data into a contiguous region and then transferring it in bulk. As with writing, processing efficiency drops substantially.

[0009] In recent years in particular, many programs use list vectors as a data structure in order to handle irregular data. In a list vector access the target addresses are given by an array of pointers, so the data accesses generally go to non-contiguous addresses. Executing list vector programs at high speed therefore requires a mechanism that transfers data at non-contiguous addresses in bulk and at high speed.

[0010]

[Means for Solving the Problems] To achieve the above object, the network command used to transfer a plurality of data items is given a structure in which the remote address to be accessed can be specified for each data item.

[0011] For a write to non-contiguous addresses of another PU, the network command sent to the destination PU carries an arbitrary number of pairs, each consisting of a write address and the data to be written. The PU that receives the command routes the value in the address part of each pair to the address bus and the value in the data part to the data bus, and writes the data to the location indicated by the address, repeating this for the length of the command. A single command can thus request writes to a plurality of words at non-contiguous addresses in another PU.
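The receiving side of this write can be illustrated with a minimal sketch. It is not part of the patent disclosure: the function name `apply_multiword_write`, the use of a Python dictionary as main memory, and the example addresses are all assumptions made for illustration.

```python
def apply_multiword_write(memory, payload):
    """Write each (address, data) pair of a command's data section to memory.

    `memory` is a dict standing in for main memory (an assumption of this
    sketch); `payload` is the flat list [addr0, data0, addr1, data1, ...].
    """
    if len(payload) % 2 != 0:
        raise ValueError("data section must hold whole (address, data) pairs")
    # Each even-indexed word is an address; the word after it is the value
    # to store at that address.
    for i in range(0, len(payload), 2):
        address, data = payload[i], payload[i + 1]
        memory[address] = data
    return memory


memory = {}
apply_multiword_write(memory, [0x100, 11, 0x2A4, 22, 0x010, 33])
```

Three scattered locations are updated by what would be a single command on the network, which is the point of the pair-based command structure.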

[0012] For a read from non-contiguous addresses, the request command on the network carries an arbitrary number of pairs, each consisting of a read address and the address in the requesting PU's main memory to which the data read should be written (hereinafter, the return destination address). The PU that receives the command outputs the value in the address part of each pair to the address bus, reads the value from main memory, and then arranges for it to be written to the return destination address. Because writing the values read to their return destination addresses is itself a write to a plurality of non-contiguous addresses, the write command for non-contiguous addresses described above is used to request it. In this way, data in a plurality of words at non-contiguous addresses in another PU can be read and written into regions of the requesting PU.
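The read round trip can be sketched in the same spirit. This is only an illustrative model under assumptions: the names `serve_multiword_read` and `apply_write`, the dictionary memories, and the addresses are hypothetical and do not come from the patent.

```python
def serve_multiword_read(memory, request):
    """Responder side: map (read address, return address) pairs to the
    (return address, data) pairs of a reply write command."""
    reply = []
    for i in range(0, len(request), 2):
        read_address, return_address = request[i], request[i + 1]
        reply += [return_address, memory[read_address]]
    return reply


def apply_write(memory, payload):
    """Requester side: treat the reply as an ordinary pair-based write."""
    for i in range(0, len(payload), 2):
        memory[payload[i]] = payload[i + 1]


remote = {0x40: 7, 0x9C: 8}   # main memory of the responding PU
local = {}                    # main memory of the requesting PU
reply = serve_multiword_read(remote, [0x40, 0x500, 0x9C, 0x123])
apply_write(local, reply)
```

Because the reply has the same shape as a write command, the requester needs no separate receive path for read results.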

[0013]

[Operation] According to the present invention, a main memory access command carried on the network specifies the access address for each data item, and the responding PU is provided with hardware that decomposes the command and accesses main memory. A plurality of data items at non-contiguous addresses in the main memory of another PU can thereby be accessed at high speed with a single network command.

[0014] FIG. 1 is a block diagram of the parallel computer of the present invention. In the figure, 130 is a circuit that decomposes the header of a request packet from another PU, and 131 is a circuit that sorts the words of the request packet's data section into addresses and data. A counter 150 counts the words of the data section. For a write command, the addresses in the even-numbered words are output to the address bus 160 and the data in the odd-numbered words are output to the data bus 161, and the main memory is written. Writes to non-contiguous addresses of the main memory can thus be requested with a single network command.

[0015] For a read command, the addresses in the even-numbered words are output to the address bus 160 and the main memory is accessed. Each value read is then paired with the return destination address in the corresponding odd-numbered word of the request command, and the selector 141 and the header assembly circuit 140 assemble a remote write command, which is returned to the requesting PU. The requesting PU processes it as the write command described above, so that data at non-contiguous addresses in another PU are transferred to non-contiguous addresses in the requesting PU.

[0016] The comparator 151 in the figure compares the number of access words given in the command with the value of the counter 150 and detects the end of command processing. The number of words accessed by a packet can therefore be specified arbitrarily, allowing flexible remote access.

[0017]

[Embodiment] FIGS. 1 to 4 show an embodiment of the present invention. FIG. 1 is a block diagram of the parallel computer of the present invention. FIGS. 2 to 4 show the formats of command packets on the inter-PU network. FIG. 2 shows the command that specifies writing a plurality of data items to non-contiguous addresses (hereinafter, multi-word write), and FIG. 3 shows the command that specifies reading a plurality of data items from non-contiguous addresses (hereinafter, multi-word read). FIG. 4, by contrast, shows the conventional DMA write command (hereinafter, DMA write).

[0018] In FIG. 1, 100 and 200 are PUs and 900 is the inter-PU network. Only the interior of PU 100 is described in detail below; the other PUs have exactly the same configuration. Inside the PU, 190 is a CPU, 120 is a main memory, 160 is an address bus, 161 is a data bus, and 110 is a DMA controller for performing conventional DMA transfers between PUs. Further, 130 is a request command header decomposition circuit that interprets the header of a multi-word read or multi-word write command from another PU, and 131 is a switch that separates, in the data section of the request command, the access address from the write data (for a write) or the return destination address (for a read). 150 is a counter that counts the words of the data section, 150a0 is the least significant bit of the counter, 151 is a comparator that determines the end of the command packet, and 132 and 133 are circuits that output main memory access commands. 134 is a switch that switches between the multi-word read and multi-word write commands. 140 is a circuit that assembles the header of the multi-word write command used to reply to a multi-word read, and 141 is a selector that assembles the pairs of return destination address and data in the data section of that multi-word write command.

[0019] The present invention is characterized in that, to execute multi-word commands, each PU has a mechanism that separates out the address information in the packet with the switch 131 and accesses main memory.

[0020] First, the configuration of the overall system is described. The system consists of PUs (100, 200) that execute programs and are connected by a network. Each PU has a CPU 190 and a main memory 120, forming a distributed-main-memory multiprocessor system. Communication between PUs is performed by packet communication over the network.

[0021] Ordinary (conventional) communication between PUs is performed by the DMA communication mechanism 110, which transfers a block of main memory in bulk.

[0022] FIG. 4 shows the format of the DMA write command on the network. A network command begins with a header containing a command name 1001, a destination PU number 1002, a command length (the number of words in the data section) 1003, and a source PU number 1004. The data section follows the header. In the data section, the DMA destination address 1300a is followed by the data 1300d to 1306d sent by DMA. One word of data in this embodiment is 4 bytes. On the sending side, the CPU builds the packet shown in FIG. 4 in main memory, and the DMA controller transfers the packet from main memory to the network. The DMA controller of the PU that receives the DMA write command writes the data DATA0 to DATAn-1 in order, starting at the location indicated by the start address. DMA controllers are a known technology, so their details are omitted here.
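As a rough illustration of the DMA write layout, the following sketch packs and unpacks such a packet. Several details are assumptions not stated in the patent: that each header field occupies one 4-byte word, that words are big-endian, that addresses are byte addresses, and the opcode value `CMD_DMA_WRITE`. The function names are hypothetical.

```python
import struct

WORD = 4            # the embodiment states that one word is 4 bytes
CMD_DMA_WRITE = 1   # hypothetical opcode; the patent gives no numeric value


def build_dma_write(dest_pu, src_pu, start_address, data_words):
    """Header (command, destination PU, length, source PU) + data section."""
    body = [start_address, *data_words]          # one address, then the data
    header = [CMD_DMA_WRITE, dest_pu, len(body), src_pu]
    words = header + body
    return struct.pack(f">{len(words)}I", *words)


def receive_dma_write(memory, packet):
    """Write the data words to consecutive locations from the start address."""
    words = struct.unpack(f">{len(packet) // WORD}I", packet)
    _cmd, _dest, length, _src = words[:4]
    start_address, *data = words[4:4 + length]
    for offset, value in enumerate(data):
        memory[start_address + offset * WORD] = value


memory = {}
pkt = build_dma_write(dest_pu=2, src_pu=1, start_address=0x1000,
                      data_words=[10, 20, 30])
receive_dma_write(memory, pkt)
```

The data section carries a single address, so the receiver can only place the data words back to back; this is the limitation the multi-word commands remove.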

[0023] Next, the operation of the multi-word write command is described. FIG. 2 shows the format of the multi-word write command. The header is the same as for a DMA transfer, but the format of the data section differs. In a DMA transfer the data section specifies a single destination address, whereas in a multi-word write an address is specified for each word of data. In the example in the figure, Data0 is written to the address indicated by Addr0, Data1 to Addr1, and so on, so that each data item can be written to a different address.
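The sender's view of this format can be sketched as follows. The word-list representation, the one-word-per-header-field layout, the opcode value `CMD_MULTIWORD_WRITE`, and the function name are assumptions for illustration only.

```python
CMD_MULTIWORD_WRITE = 2   # hypothetical opcode; not given in the patent


def build_multiword_write(dest_pu, src_pu, writes):
    """Return the packet as a list of words: header, then Addr/Data pairs.

    `writes` is a list of (address, value) tuples in any address order.
    """
    data_section = []
    for address, value in writes:
        data_section += [address, value]      # Addr_i followed by Data_i
    # The command length counts the words of the data section, so it is
    # twice the number of pairs.
    header = [CMD_MULTIWORD_WRITE, dest_pu, len(data_section), src_pu]
    return header + data_section


packet = build_multiword_write(dest_pu=2, src_pu=1,
                               writes=[(0x100, 11), (0x2A4, 22)])
```

Compared with a DMA write, the header is unchanged and only the data section differs: every data word is preceded by its own address.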

[0024] In the PU that requests a multi-word write, the CPU builds the packet of FIG. 2 in main memory beforehand, and the DMA controller 110 is used to transfer the packet from main memory to the network (this part is exactly the same as for a DMA write).

[0025] A PU that receives a multi-word write sends the packet header to the request command header decomposition circuit 130 and the data section to the switch 131. The request command header decomposition circuit decomposes the header and, according to the command type, outputs the signal 130d for a multi-word write, together with the data section length 130b and the source PU number 130e. It also sends a reset/start signal 130f to the word counter 150. The comparator 151 compares the output of the counter 150 with the length field 130b of the packet and outputs the signal 151a while the two values differ (that is, while the counter value is less than the packet length). The signal 151a enables the counter 150 and, at the same time, causes the gate 133 to pass the write command 133a to the main memory 120.

[0026] According to the value of the least significant bit 150a0 of the counter, that is, according to whether the current word of the packet's data section is even-numbered or odd-numbered, the switch 131 routes the data section value 131a to the address 131c (even word) or the data 131d (odd word). (For a multi-word write, the switch 134 is connected to the data bus 161.) The address and data pairs in the packet are thereby output to the address bus 160 and the data bus 161 and written to the main memory 120.

[0027] When the value of the counter 150 becomes equal to the data length 130b of the packet (the end of the packet), the signal 151a is no longer output. This stops the write signal 133a to the main memory and stops the counter 150, and processing ends.
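The interplay of the counter 150, the comparator 151, and the switch 131 on the write path can be approximated by a word-per-step software model. This is a simplification under assumptions: the hardware asserts the write command 133a for as long as 151a is active, whereas the sketch performs the memory write only when the data word of a pair arrives. The function name and the dictionary memory are hypothetical.

```python
def run_multiword_write(memory, length, data_section):
    """Step through the data section as the receiving hardware might.

    `length` plays the role of the length field 130b; words beyond it are
    never touched, mirroring the comparator stopping the counter.
    """
    counter = 0           # word counter 150, cleared by reset/start 130f
    address_bus = None    # value last driven onto address bus 160
    trace = []
    while counter != length:            # comparator 151: 151a stays active
        word = data_section[counter]    # data section value 131a
        if counter & 1 == 0:            # LSB 150a0 = 0: even word, address
            address_bus = word
        else:                           # LSB 150a0 = 1: odd word, data
            memory[address_bus] = word
            trace.append((address_bus, word))
        counter += 1                    # counter enabled by 151a
    return trace


memory = {}
trace = run_multiword_write(memory, 4, [0x100, 11, 0x2A4, 22, 0xDEAD, 99])
```

In the usage above the length field is 4, so the trailing two words are ignored even though they are present in the list.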

[0028] By the above processing, each address and data pair in the multi-word write command is written to main memory.

[0029] Next, the operation of multi-word read is described. FIG. 3 shows the format of the multi-word read command. The header is the same as for a DMA transfer and the other commands. Multi-word read is a command for reading the values at arbitrary addresses of a destination PU and writing them to arbitrary addresses of the requesting PU. The data section holds a plurality of pairs, each consisting of an address to read in the destination PU and the address in the requesting PU (the return destination address) to which the data read is written.

[0030] In the example of FIG. 3, the data at the address indicated by Addr0 in the destination PU is read and written to the address indicated by Dest0 in the requesting PU, the data at Addr1 is written to Dest1, and so on, so that data at separate addresses in the destination PU can be read and written to separate addresses in the requesting PU.

[0031] In the PU that requests a multi-word read, the CPU builds the packet of FIG. 3 in main memory beforehand, and the DMA controller 110 is used to transfer the packet from main memory to the network.

[0032] A PU that receives a multi-word read sends the packet header to the request command header decomposition circuit 130 and the data section to the switch 131. The request command header decomposition circuit decomposes the header and, according to the command type, outputs the signal 130c for a multi-word read, together with the data section length 130b and the source PU number 130e. It also sends a reset/start signal 130f to the word counter 150. The comparator 151 compares the output of the counter 150 with the length field 130b of the packet and outputs the signal 151a while the two values differ (that is, while the counter value is less than the packet length). The signal 151a enables the counter 150 and, at the same time, causes the gate 132 to pass the read command 132a to the main memory 120.

[0033] According to the value of the least significant bit 150a0 of the counter, that is, according to whether the current word of the packet's data section is even-numbered or odd-numbered, the switch 131 routes the data section value 131a to the access address 131c (even word) or the return destination address 141b (odd word). (For a multi-word read, the switch 134 is connected to the selector 141.) The value in main memory is then read using the address on the address bus 160, and the data read is input to the selector 141 through the data bus 161.

[0034] The reply command header assembly circuit 140 and the selector 141 are then used to output a multi-word write command that returns the values read to the source PU. This command writes the values that were stored at Addr0 to Addrn-1 to Dest0 to Destn-1 in the source PU.

[0035] First, the reply command header assembly circuit 140 sends out the header of the reply multi-word write command, built from the source PU number 130e (that is, the destination PU of the reply command) and the command length 130b passed from the request command header decomposition circuit 130. For the data section 141a of the packet, the selector 141 outputs either the return destination address 141b (even word) or the data read 141c (odd word), according to the value of the least significant bit 150a0 of the counter, that is, according to whether the word of the data section is even-numbered or odd-numbered. (In the reply circuit this bit is delayed by one cycle with the delay latch 152 to wait for the main memory access.) The pairs of return destination address sent from the source PU and data read from main memory can thereby be returned to the source PU as the data section of a multi-word write command.

[0036] When the value of the counter 150 becomes equal to the data length 130b of the packet (the end of the packet), output of the signal 151a stops. This stops the read signal 132a to the main memory, stops the counter 150, and ends the sending of the reply command. By the above processing, the value at each address in the multi-word read command is read and written to the return destination address in the source PU.
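One possible reading of the reply path, including the one-cycle delay of the counter's least significant bit, is modelled below. The timing is an interpretation rather than something the patent spells out: the sketch assumes that a main memory read started in one cycle is available in a later cycle, and that one extra cycle after the counter stops is needed to emit the last value read. The function name and the dictionary memory are hypothetical.

```python
def run_multiword_read(memory, length, data_section):
    """Return the data section (141a) of the reply multi-word write command."""
    if length % 2 != 0:
        raise ValueError("length must cover whole (address, return) pairs")
    reply = []
    read_data = None     # value on data bus 161 from the most recent access
    delayed_lsb = None   # output of delay latch 152; nothing latched yet
    # One cycle beyond `length` lets the latch drain the last value read.
    for counter in range(length + 1):
        # Selector 141 is steered by the counter LSB of the previous cycle.
        if delayed_lsb == 0:
            reply.append(data_section[counter])   # return address 141b
        elif delayed_lsb == 1:
            reply.append(read_data)               # read data 141c
        if counter < length:
            if counter & 1 == 0:                  # even word: access address
                read_data = memory[data_section[counter]]
            delayed_lsb = counter & 1             # latch 152 captures the LSB
    return reply


remote = {0x40: 7, 0x9C: 8}
reply = run_multiword_read(remote, 4, [0x40, 0x500, 0x9C, 0x123])
```

The reply interleaves each return destination address with the value read for it, which is exactly the data section layout of a multi-word write.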

[0037] With the above scheme, the multi-word write and multi-word read commands make it possible to write to and read from a plurality of words at non-contiguous addresses in the main memory of another PU in a single batch with one network command.

[0038]

[Effects of the Invention] According to the present invention, in a distributed-memory parallel computer, accesses to a plurality of data items at non-contiguous addresses in another PU are requested with a single network command and performed automatically by hardware. Compared with using a conventional DMA mechanism that can access only contiguous addresses, access overhead can be reduced substantially.
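The size of this reduction can be estimated with a small calculation. The figures depend on assumptions the patent does not state: that the header occupies four words (one per header field of FIG. 4), and that a conventional single-word remote write packet consists of that header plus one address word and one data word.

```python
HEADER_WORDS = 4   # assumption: one word per header field of FIG. 4


def per_word_writes(n):
    """n separate single-word remote writes (header + address + data each)."""
    return {"packets": n, "words": n * (HEADER_WORDS + 2)}


def multiword_write(n):
    """One multi-word write carrying n (address, data) pairs."""
    return {"packets": 1, "words": HEADER_WORDS + 2 * n}


separate = per_word_writes(100)
combined = multiword_write(100)
```

Under these assumptions, writing 100 scattered words takes 100 packets and 600 words on the network with per-word commands, against 1 packet and 204 words with a multi-word write.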

[Brief description of drawings]

FIG. 1 is a block diagram of a parallel computer having a remote access mechanism according to an embodiment of the present invention.

FIG. 2 is a diagram showing the format of the multi-word write command on the network.

FIG. 3 is a diagram showing the format of the multi-word read command on the network.

FIG. 4 is a diagram showing the format of the conventional DMA write command.

[Explanation of symbols]

100, 200 ... Processing unit, 110 ... DMA controller, 110a ... DMA command, 120 ... Main memory, 130 ... Decomposition circuit, 130a ... Request command header, 130b ... Data length, 130c ... Multi-word read signal, 130d ... Multi-word write signal, 130e ... Source PU number, 130f ... Counter control signal, 131 ... Switch, 131a ... Request command data, 131c ... Main memory address, 131d ... Return destination address, 132 ... Signal output gate, 133 ... Signal output gate, 134 ... Changeover switch, 140 ... Assembly circuit, 140a ... Reply command header, 141 ... Selector, 141a ... Data, 141b ... Return destination address, 141c ... Read data, 150 ... Word counter, 150a ... Counter output, 150a0 ... Least significant bit of counter output, 151 ... Comparator, 151a ... Command enable signal, 152 ... Latch, 160 ... Address bus, 161 ... Data bus, 190 ... CPU, 900 ... Network.

Continuation of front page: (72) Inventor Hiroaki Fujii, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo

Claims

1. A parallel computer having a plurality of processing units, each processing unit having an independent main memory and the processing units being connected by a network, characterized in that access to a plurality of data items at non-contiguous addresses in the main memory of another processing unit is specified with a single network command.

2. The parallel computer according to claim 1, wherein writes to a plurality of data items at arbitrary addresses are requested with one and the same network command.

3. The parallel computer according to claim 2, wherein the data write command on the network can hold an arbitrary number of pairs of write address and write data.

4. The parallel computer according to claim 3, comprising a switch that routes the write address and the write data in a write command received from another processing unit to the address lines and the data lines of the main memory, respectively, and writing to the main memory.

5. The parallel computer according to claim 1, wherein reads of a plurality of data items at arbitrary addresses are requested with one and the same network command.

6. The parallel computer according to claim 5, wherein the plurality of data items read from another processing unit can be placed at arbitrary locations in the main memory of its own processing unit.

7. The parallel computer according to claim 6, wherein the data read command on the network can hold an arbitrary number of pairs, each consisting of a read address in the main memory of another processing unit and an address in its own processing unit at which the data read is stored.

8. The parallel computer according to claim 7, comprising a switch that routes the read address and the address for storing the read data, both contained in a read command received from another processing unit, to the address lines of the main memory and to the return destination address of the read data, respectively, wherein a plurality of pairs of return destination address and read data are combined into a single write command and output to the network.