JPH05173991A - Parallel processing system and data transferring method

Info

Publication number: JPH05173991A
Application number: JP4044399A
Authority: JP
Inventors: Ichiro Okabayashi; 一郎岡林
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-03-19
Filing date: 1992-03-02
Publication date: 1993-07-13
Anticipated expiration: 2018-03-24
Also published as: JP3389610B2

Abstract

PURPOSE: To enable PE-to-PE communication and improve processor performance. CONSTITUTION: The parallel processing system consists of a plurality of PEs 1a, 1c, and 1d and a network 2 that interconnects the PEs. The PE 1a is formed by connecting a processor 3a, a memory 4a, and a data transfer device 5a to a common bus. The data transfer device 5 has three buffers, and the data relay device 6 has two buffers. Data from PE 1a to PE 1d is transferred through memory 4a, buffer 7a, buffer 10a, buffer 8c, buffer 11e, buffer 9d, and memory 4d, in that order, as shown by the dotted line. That is, PE 1c relays the data using the buffer 8c. Consequently, no memory write or read is performed at the relaying PE during communication between arbitrary PEs, so the overhead at the relaying PE is reduced and transfer performance improves. Furthermore, because the data transfer device performs no bus access, more bus bandwidth is available and processor performance improves.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to parallel processing systems, a promising area of the computer field, and in particular to communication between processor elements.

[0002]

[Prior Art] In general, a parallel processing system consists of processor elements (hereinafter PEs) that perform computation and a network that transfers data between the PEs.

[0003] An example of PE-to-PE communication in a conventional parallel processing system is described below with reference to the drawings. FIG. 10 is a block diagram of a first conventional parallel processing system, FIG. 11 is a block diagram of a conventional data transfer device, and FIG. 12 is a block diagram of a conventional data relay device. These are described in detail in ICD89-152, Institute of Electronics, Information and Communication Engineers, Integrated Circuit Research Group. Only some of the PEs and part of the network are shown here, and the description is limited to data flowing in one direction.

[0004] First, the conventional parallel processing system (FIG. 10) is described. It basically consists of PEs 1a, 1c, and 1d and a network 2 that interconnects the PEs. All PEs 1 have the same configuration; taking PE 1a as an example, a processor 3a, a memory 4a, and a data transfer device 5a are connected to a common bus. The data transfer device 5a has two buffers 7a and 9a. Data relay devices 6a and 6e are provided inside the network 2 and have buffers 10a and 10e, respectively. The network 2 is one in which communication between any two PEs is possible by passing through a third PE once (the inter-PE distance is 2). In this conventional system, data from PE 1a to PE 1d flows through memory 4a, buffer 7a, buffer 10a, buffer 9c, memory 4c, buffer 7c, buffer 10e, buffer 9d, and memory 4d, in that order. This path is shown by the dotted line in FIG. 10.

[0005] Next, the conventional data transfer device (FIG. 11) is described. The input/output port 17a is connected to the memory 4, and the input/output ports 17b and 17c are connected to the network 2. Data flows from input/output port 17a to 17b as follows. The memory address generation unit 12a outputs the address 50a via the selector 18a to perform a memory read, and the data 51a is taken from the input/output port 17a into the buffer 7. Next, the relay address generation unit 15a outputs the address 50b, and the data 51b is output from the input/output port 17b.

[0006] Data flows from input/output port 17c to 17a as follows. The relay address generation unit 15b outputs the address 50c, and the data 51c is input from the input/output port 17c and taken into the buffer 9. Next, a memory write is performed through the input/output port 17a: the memory address generation unit 12b outputs the address 50a via the selector 18a, and the buffer 9 outputs the data 51a. The control units 16a and 16b monitor the buffer states 52a and 52b.

[0007] Next, the conventional data relay device (FIG. 12) is described. The data 51b is stored in the buffer 10. The control unit 31a controls reads and writes of the buffer 10. The decoders 30a and 30b monitor the addresses 50b and 50c and, when their own device is accessed, enable the tri-state buffers 32a and 32c so that the buffer states 52a and 52b pass to the outside. The buffer state here refers to buffer-full on the write side and buffer-empty on the read side.

[0008] FIG. 13 shows a conventional data transfer method for the case where the network 2 is a full crossbar network. The order in which each PE sends data is indicated inside the data relay devices 6a-6p. In the first step, PE 1a sends data to data relay device 6a, PE 1b to 6e, PE 1c to 6i, and PE 1d to 6m, all at the same time. In the next step, PE 1a sends to 6b, PE 1b to 6f, PE 1c to 6j, and PE 1d to 6n, again simultaneously. The following steps proceed in the same way, and once the last device has been reached the sequence returns to the first. After the first step is completed, PE 1a receives data from data relay device 6a.
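
The sending order described for FIG. 13 can be sketched in a few lines of Python. The sketch assumes the relay devices 6a-6p are laid out row by row, one row per sending PE and one column per receiving PE; that layout and the function names are illustrative assumptions, not something the text states.

```python
from string import ascii_lowercase

N = 4  # number of PEs in the FIG. 13 example


def relay_name(sender: int, column: int) -> str:
    """Relay device in row `sender`, column `column` (0-based, row-major 6a..6p)."""
    return "6" + ascii_lowercase[sender * N + column]


def conventional_step(step: int) -> dict[str, str]:
    """Map each sending PE to the relay device it writes in `step` (0-based)."""
    column = step % N  # every PE uses the same column in the same step
    return {"1" + ascii_lowercase[s]: relay_name(s, column) for s in range(N)}


def columns_in_use(step: int) -> set[int]:
    """Columns (hypothetical receiver indices) that hold new data after `step`."""
    return {step % N}
```

In this model every PE writes into the same column in a given step, so only the PE attached to that column has anything to receive.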

[0009] Finally, FIG. 14 is a block diagram of a second conventional parallel processing system. It is described in detail in CPSY89-1, Institute of Electronics, Information and Communication Engineers, Computer Systems Research Group. Here, the PUs (processing units) are connected in a mesh as shown in FIG. 14(a). As shown in FIG. 14(b), each PU consists of a CPU 71, a local memory 72, and a peripheral LSI 73 connected to a common bus. Each PU also has four ports 75a-d and communicates with other PUs via connection memories 74a-b, which are two-port RAMs.

[0010]

[Problems to Be Solved by the Invention] In the first parallel processing system described above, however, the relaying PE writes the data to memory once and then reads it back, so the overhead at the relay is large. In addition, because these memory accesses use the bus, the bus becomes a bottleneck and processor performance also drops.

[0011] In the data transfer method described above, after the first step only PE 1a, which is connected to the data relay devices 6a, 6e, 6i, and 6m, can receive data. The load concentrates on this one path, and the transfer performance of the whole system drops.

[0012] In the second parallel processing system described above, communication is very fast when all PUs communicate synchronously with their neighbors, but communication with distant PUs is slow. In an N x N system, the distance between arbitrary PEs is N at most and N/2 on average. The system is also at a disadvantage in two further respects: handling communication requests that arise randomly at individual PUs, and extending to other networks.
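
The N and N/2 figures can be checked numerically. The sketch below assumes the mesh has wrap-around links (a torus) and that the average is taken over all destinations, including the source itself. Neither assumption is stated in the text, but together they reproduce the quoted figures for even N.

```python
from itertools import product


def ring_distance(a: int, b: int, n: int) -> int:
    """Shortest hop count between positions a and b on a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)


def torus_distances(n: int) -> tuple[int, float]:
    """Return (max, mean) hop count from node (0, 0) to every node of an n x n torus."""
    hops = [
        ring_distance(0, x, n) + ring_distance(0, y, n)
        for x, y in product(range(n), repeat=2)
    ]
    return max(hops), sum(hops) / len(hops)
```

By symmetry the result is the same from any starting node, so measuring from (0, 0) is sufficient under these assumptions.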

[0013] In view of these problems, an object of the present invention is to provide a flexible parallel processing system that achieves high processor performance and high-speed PE-to-PE communication.

[0014]

[Means for Solving the Problems] To solve the above problems, the parallel processing system of the present invention comprises a plurality of processor elements, each having a processor, a memory, and a data transfer device with three buffers (first, second, and third), and a network that connects the processor elements so that data can be transferred between them either directly or indirectly by relaying through one or more processor elements. During a data transfer, in the sending processor element the data transfer device takes data from the memory or the processor into the first buffer and then sends it to the network; in the receiving processor element the data transfer device takes the data from the network into the second buffer and then stores it in the memory or the processor; and in a relaying processor element the data transfer device takes the data from the network into the third buffer and then sends it to the network again.

[0015] The data transfer method of the present invention applies to a parallel processing system whose basic unit comprises N processor elements (N is an integer of 2 or more), each having at least two ports, and a network having N x N grid points (K, L) (K and L are integers from 1 to N), with a buffer (K, L) having at least two ports placed at each grid point. One end of the K-th processor element is connected in common to one end of each buffer (K, L); the other ends of the buffers (K, L) are connected in common for each L; and each of these common lines is either connected to a processor element or used as an external port. In this system, when the K-th processor element sends data, it sends the data sequentially starting from buffer (K, K).

[0016]

[Operation] With the above configuration, when PEs communicate in the parallel processing system of the present invention, the data transfer device in the sending processor element takes data from the memory or the processor into the first buffer and then sends it to the network. In the receiving processor element, the data transfer device takes the data from the network into the second buffer and then stores it in the memory or the processor. In a relaying processor element, the data transfer device takes the data from the network into the third buffer and then sends it to the network again. Consequently, no memory write or read takes place at the relaying PE, so the overhead there is reduced and transfer performance improves. In addition, because the data transfer device makes no bus access when relaying, more bus bandwidth is left for the processor and its performance also improves.

[0017] In the data transfer method of the present invention, for a crossbar network having N x N grid points (N is an integer of 2 or more), the K-th processor element sends data sequentially starting from grid point (K, K). Traffic therefore does not concentrate on a particular path at reception, and transfer performance improves.
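
A minimal sketch of this staggered sending order follows. It assumes that "sequentially" means columns K, K+1, ... with wrap-around to column 1 after column N; the wrap-around and the function names are assumptions made for illustration.

```python
def staggered_column(k: int, step: int, n: int) -> int:
    """Column L (1-based) that the K-th PE writes in `step` (0-based)."""
    return (k - 1 + step) % n + 1


def columns_per_step(n: int, step: int) -> list[int]:
    """Columns written by PEs 1..n in the given step."""
    return [staggered_column(k, step, n) for k in range(1, n + 1)]
```

Under these assumptions each step uses every column exactly once, so all receiving lines carry data in every step instead of a single one.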

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0019] To simplify the description and drawings, these embodiments limit the data flow to one direction. The network is one in which communication between any two PEs is possible by passing through a third PE once (the inter-PE distance is 2).

[0020] The overall configuration of the parallel processing system in the first embodiment of the present invention (FIG. 6) is described first. This is a parallel processing system with four PEs, and the configuration of FIG. 6 achieves an inter-PE distance of 2. FIG. 1 can be regarded as an excerpt of it, showing that PE 1a, data relay device 6a, PE 1c, data relay device 6e, and PE 1d are connected in that order.

[0021] FIG. 1 is a block diagram of the parallel processing system in the first embodiment of the present invention. It basically consists of PEs 1a, 1c, and 1d and a network 2 that interconnects the PEs. The overall connections, including the control lines, are described later with reference to FIG. 7.

[0022] All PEs have the same configuration; taking PE 1a as an example, a processor 3a, a memory 4a, and a data transfer device 5a are connected to a common bus. The data transfer device 5a has three buffers 7a, 8a, and 9a. Data relay devices 6a and 6e are provided inside the network 2; the data relay device 6a has buffers 10a and 11a, and 6e has buffers 10e and 11e. In this system, data from PE 1a to PE 1d flows through memory 4a, buffer 7a, buffer 10a, buffer 8c, buffer 11e, buffer 9d, and memory 4d, in that order. This path is shown by the dotted line in FIG. 1.

[0023] In PE 1c, which relays the data, the data transfer device 5c sends the data received from the data relay device 6a to the data relay device 6e via its internal buffer 8c. Because this involves no access to the memory 4c, the transfer speed increases and, at the same time, more bus bandwidth is left available.
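
The difference between the conventional path and the path of this embodiment can be made concrete by listing both and counting how often the relay PE's memory is touched. The hop lists are taken from the text; the helper function and the convention of counting one write plus one read per visit are illustrative assumptions.

```python
CONVENTIONAL_PATH = [
    "memory 4a", "buffer 7a", "buffer 10a", "buffer 9c", "memory 4c",
    "buffer 7c", "buffer 10e", "buffer 9d", "memory 4d",
]
EMBODIMENT_PATH = [
    "memory 4a", "buffer 7a", "buffer 10a", "buffer 8c",
    "buffer 11e", "buffer 9d", "memory 4d",
]


def relay_memory_accesses(path: list[str], relay_memory: str = "memory 4c") -> int:
    """Bus accesses to the relay PE's memory: one write and one read per visit."""
    return 2 * path.count(relay_memory)
```

With these lists the conventional path costs two bus accesses at PE 1c and passes through two more stages than the path of the embodiment, which never touches memory 4c.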

[0024] The elemental technologies used to realize this embodiment, namely the data transfer device, the data relay device, and so on, are described in turn below.

[0025] First, the first data transfer device is described. FIG. 2 is a block diagram of the data transfer device in the first embodiment of the present invention. The input/output port 17a is connected to the memory 4, and the ports 17b and 17c are connected to the data relay devices 6 of the network 2.

[0026] Data flows from input/output port 17a to 17b as follows. The memory address generation unit 12a outputs the address 50 via the selector 18a to perform a memory read. The data 51 is taken in from the input/output port 17a, the output of the tag generation unit 13 is attached to it as a tag, and the result is taken into the buffer 7. Next, the address 50b from the relay address generation unit 15a (via the selector 18c) and the data 51b from the buffer 7 (via the selector 18b) are output from the input/output port 17b. The memory address generation unit 12a also counts the number of reads. The tag is described later with reference to FIG. 4.

[0027] Data flows from input/output port 17c to 17a as follows. The relay address generation unit 15b outputs the address 50c, and the data 51c is input from the input/output port 17c and taken into the buffer 9. Next, a memory write is performed through the input/output port 17a: one part of the output of the buffer 9 is output as the address 50 via the selector 18a, and another part is output as the data 51. The counter 14 counts the number of writes.

[0028] Finally, data flows from input/output port 17c to 17b as follows. The relay address generation unit 15b outputs the address 50c, the data 51c is input from the input/output port 17c, and the tag conversion unit 131 converts the tag portion before the data is taken into the buffer 8. Next, one part of the output of the buffer 8 is output as the address 50b via the selector 18c, and another part is output as the data 51b via the selector 18b, both from the input/output port 17b. In other words, the relay address is the output of the relay address generation unit 15a for a memory-to-network transfer and part of the data for a network-to-network transfer. Likewise, the memory address is the output of the memory address generation unit 12a for a read and part of the data for a write.
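
The three flows of the FIG. 2 device differ mainly in where the outgoing relay address and the memory address come from. The lookup below restates that rule as a sketch; the enum, the function names, and the returned strings are illustrative labels, not part of the described hardware.

```python
from enum import Enum
from typing import Optional


class Transfer(Enum):
    MEMORY_TO_NETWORK = "17a->17b"   # sending PE
    NETWORK_TO_MEMORY = "17c->17a"   # receiving PE
    NETWORK_TO_NETWORK = "17c->17b"  # relaying PE


def outgoing_relay_address_source(t: Transfer) -> Optional[str]:
    """Where the address 50b driven onto port 17b comes from, if any."""
    if t is Transfer.MEMORY_TO_NETWORK:
        return "relay address generation unit 15a"
    if t is Transfer.NETWORK_TO_NETWORK:
        return "part of the data held in buffer 8"
    return None  # nothing is sent to the network


def memory_address_source(t: Transfer) -> Optional[str]:
    """Where the memory address 50 comes from, if memory is accessed at all."""
    if t is Transfer.MEMORY_TO_NETWORK:
        return "memory address generation unit"  # memory read
    if t is Transfer.NETWORK_TO_MEMORY:
        return "part of the data held in buffer 9"  # memory write
    return None  # the relay path never touches memory
```

The `None` result for the relay flow is the point of the design: a network-to-network transfer needs no memory address at all.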

[0029] The control units 16a and 16b monitor the buffer states 52a and 52b of the data relay devices. The input/output ports 17b and 17c are unidirectional here, but they may be bidirectional. FIG. 15 shows this case: it is a block diagram of the data transfer device of the first embodiment made bidirectional. In this case the buffers 7, 8, and 9 and the internal lines are made bidirectional, and selectors 18e and 18d are provided on the input/output port 17c side. Because there is only one tag conversion unit 131, the tag is converted on input to the buffer 8 when data flows from input/output port 17c to 17b, and on output from the buffer 8 when data flows from input/output port 17b to 17c.

[0030] Next, the second data transfer device is described. FIG. 3 is a block diagram of the data transfer device in the second embodiment of the present invention. It is basically the same as FIG. 2, but the data flow from input/output port 17a to 17b differs slightly, as follows. The memory address generation unit 12a outputs the address 50 via the selector 18a to perform a memory read. The data 51 is taken in from the input/output port 17a, the output of the tag and relay address generation unit 130 is attached to it as a tag, and the result is taken into the buffer 7. Next, one part of the output of the buffer 7 is output as the address 50b via the selector 18c, and another part is output as the data 51b via the selector 18b, both from the input/output port 17b. In other words, part of the data is used as the relay address both for memory-to-network and for network-to-network transfers. The address that the relay address generation unit 15a generates in FIG. 2 is generated by the tag and relay address generation unit 130 in FIG. 3.

[0031] The input/output ports 17b and 17c are unidirectional here, but they may be bidirectional. FIG. 16 shows this case: it is a block diagram of the data transfer device of the second embodiment made bidirectional. In this case the buffers 7, 8, and 9 and the internal lines are made bidirectional, and selectors 18e and 18d are provided on the input/output port 17c side. Because there is only one tag conversion unit 131, the tag is converted on input to the buffer 8 when data flows from input/output port 17c to 17b, and on output from the buffer 8 when data flows from input/output port 17b to 17c.

[0032] Next, the tag generated by the tag generation unit 13 is described with reference to FIG. 4, which shows the data format in this embodiment. The data consists of a data section 24 and a tag section. The tag comprises a control information section 20 indicating the type of the data (attributes, format, and so on); a count section 21 indicating which step the current transfer is; a plurality of relay address sections 22a and 22b indicating, in relay order, the addresses of the relay sections that relay the data from the second time onward; and a memory address section 23 indicating the final storage address of the data. FIG. 4(a) is the data format between the sending PE and the relaying PE, and FIG. 4(b) is the format between the relaying PE and the receiving PE. In the example of FIG. 1, FIG. 4(a) applies between PEs 1a and 1c and FIG. 4(b) between PEs 1c and 1d.

[0033] The relationship between the tag and PE-to-PE communication is described with reference to FIG. 1, with the data transfer device configured as in FIG. 3. The tag and relay address generation unit 130 of the data transfer device 5a generates and attaches a tag holding control information in the control information section 20, the count in the count section 21, the address of the data relay device 6a in the relay address section 22a, the address of the data relay device 6e in the relay address section 22b, and the address in the memory 4d in the memory address section 23. This is FIG. 4(a). As an example, the control information section 20 is set to 3, indicating the data length; the count section 21 is set to 1, indicating the first transfer; the relay address section 22a is set to 0 because the data relay device 6a is on the near side; and the relay address section 22b is also set to 0. The data transfer device 5a outputs the relay address section 22a, "0", and sends the data to the data relay device 6a. The value 3 in the control information section 20 conveys, among other things, that the data length is 2 to the power 3, that is, 8 words. Next, the tag conversion unit 131 of the data transfer device 5c takes in the data from the data relay device 6a, sets the count section 21 to 2, indicating the second transfer, deletes the relay address section 22a to convert the data to the format of FIG. 4(b), and then outputs the relay address section 22b, "0", and sends the data to the data relay device 6e. Finally, the data transfer device 5d takes in the data from the data relay device 6e and then outputs the memory address section 23 to write the data to the memory 4d. Because the data section is 8 words long, 8 words are written sequentially starting from the address given in the memory address section 23.
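
The worked example above can be traced with a small model of the tag. This is a sketch only: the field names, the use of a tuple for the relay address sections, the word-addressed memory, and the start address 0x100 used in the usage note are assumptions made for illustration; the text does not give field widths or concrete addresses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tag:
    control: int                   # control information section 20 (log2 of data length here)
    count: int                     # count section 21: which transfer this is
    relay_addrs: tuple[int, ...]   # relay address sections 22a, 22b, in relay order
    mem_addr: int                  # memory address section 23


def data_length_words(tag: Tag) -> int:
    """Data length encoded as a power of two, as in the example (3 -> 8 words)."""
    return 2 ** tag.control


def convert_tag(tag: Tag) -> Tag:
    """Tag conversion at the relaying PE: bump the count, drop the used relay address."""
    return replace(tag, count=tag.count + 1, relay_addrs=tag.relay_addrs[1:])


def write_addresses(tag: Tag) -> list[int]:
    """Word addresses written at the receiving PE, starting at the memory address section."""
    return list(range(tag.mem_addr, tag.mem_addr + data_length_words(tag)))
```

For instance, `convert_tag(Tag(3, 1, (0, 0), 0x100))` yields a tag with count 2 and a single remaining relay address, which corresponds to the change from the format of FIG. 4(a) to that of FIG. 4(b).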

[0034] When there are several candidate data relay devices from which data may be taken in, they can be scanned in turn and data taken from whichever has data ready. In this series of operations, the data transfer device must decide whether to send data received from the network back to the network or to write it to memory. This can be decided, for example, by looking at the count section 21. Alternatively, if the data relay device is given as many buffers as the inter-PE distance, and data from one particular buffer is defined as going to memory and data from the others as going to the network, the count section 21 becomes unnecessary. This is described later with reference to FIG. 5.
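
The scan and the count-based decision might look like the following sketch. The text only says that the count section 21 can be consulted; the specific rule used here (data whose count equals the inter-PE distance is written to memory, anything earlier is forwarded) is an assumption that matches the two-hop example, and the names are hypothetical.

```python
from collections import deque
from typing import Optional, Sequence

INTER_PE_DISTANCE = 2  # network assumed in these embodiments


def next_ready(relays: Sequence[deque]) -> Optional[int]:
    """Scan candidate relay devices in order; return the first index holding data."""
    for index, queue in enumerate(relays):
        if queue:
            return index
    return None


def destination(count: int) -> str:
    """Decide from the incoming count section whether to forward or to store."""
    return "memory" if count >= INTER_PE_DISTANCE else "network"
```

In the example of FIG. 1, PE 1c would see count 1 and forward the data, while PE 1d would see count 2 and write it to memory 4d.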

[0035] When the data transfer device has the configuration of FIG. 2, the relay address section 22a of FIG. 4 is absent, and the relay address generation unit 15a generates the first relay address instead. As an alternative tag format, an end code may be appended after the last of the relay address sections in place of the count section 21.

[0036] As yet another method, the tag can be left unconverted in the format of FIG. 4(a), with 22a sent as the relay address the first time and 22b the second time. In this case the tag conversion unit 131 of FIG. 2 and the other figures is unnecessary, but the selector 18c must select a different bit position on the first and second transfers. The number of tag bits also increases slightly.

[0037] With the above configuration, in which address information is attached to each piece of data, transfers are carried out reliably without complex control even when there are multiple sending and receiving PEs, multiple data items, and a random order of flow.

[0038] Next, the data relay device is described. FIG. 5 is a block diagram of the data relay device in the embodiments of the present invention. The data relay device has two modes.

[0039] In the first mode, the device operates as a single buffer: the input selector 35 selects the output of the buffer 10, and the output selector 34 selects the output of the buffer 11, so that the buffers 10 and 11 are used as one continuous buffer. Operation is the same as in the conventional example (FIG. 12). That is, the control unit 31a controls reads and writes of the buffers 10 and 11. The decoders 30a and 30b monitor the addresses 50b and 50c and, when their own device is accessed, enable the tri-state buffers 32a and 32b so that the buffer states 52a and 52b pass to the outside. In this mode the control unit 31b and the tri-state buffers 32c and 32d are disabled.

[0040] In the second mode, the two buffers operate in parallel: the input selector 35 selects the input/output port 36a, and the output selector 34 selects the output of the buffer 10 or 11 as required. The buffers 10 and 11 are used as two independent buffers.

【００４１】制御部31aはバッファ10のリード／ライト
を、制御部31bはバッファ11のリード／ライトを制御す
る。デコーダ30a・30bはアドレス50b・50cを監視し、自分
がアクセスされた際に、トライステートバッファ32a-32
dをイネーブルとして、バッファ状態52a-52d を外部へ
通過させる。ここでのバッファ状態とは、書き込み側は
バッファフル、読みだし側はバッファエンプティに関す
るものであり、バッファ状態52a・52bがバッファ10に、
バッファ状態52c・52dがバッファ11にそれぞれ対応す
る。The control unit 31a controls reading and writing of buffer 10, and the control unit 31b controls reading and writing of buffer 11. The decoders 30a and 30b monitor the addresses 50b and 50c and, when their own device is accessed, enable the tri-state buffers 32a-32d so that the buffer states 52a-52d pass to the outside. Here, the buffer state means buffer-full on the writing side and buffer-empty on the reading side; buffer states 52a and 52b correspond to buffer 10, and buffer states 52c and 52d correspond to buffer 11.
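The two modes described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions, not the circuit of FIG. 5: the class name `DataRelay`, the `depth` parameter, and the `target`/`source` arguments are hypothetical, and the selectors are modeled simply as a choice of queue.

```python
from collections import deque


class DataRelay:
    """Toy model of the two-mode data relay device (buffers 10 and 11)."""

    def __init__(self, mode, depth=1):
        self.mode = mode          # 1 = one continuous buffer, 2 = two independent buffers
        self.depth = depth        # stages per buffer (assumed value)
        self.buf10 = deque()
        self.buf11 = deque()

    def _shift(self):
        # Mode 1 only: the output of buffer 10 feeds the input of buffer 11.
        while self.buf10 and len(self.buf11) < self.depth:
            self.buf11.append(self.buf10.popleft())

    def write(self, word, target=10):
        """Write a word; returns False when the addressed buffer is full."""
        buf = self.buf10 if (self.mode == 1 or target == 10) else self.buf11
        if len(buf) >= self.depth:
            return False
        buf.append(word)
        if self.mode == 1:
            self._shift()
        return True

    def read(self, source=11):
        """Read a word; returns None when the addressed buffer is empty."""
        buf = self.buf11 if (self.mode == 1 or source == 11) else self.buf10
        if not buf:
            return None
        word = buf.popleft()
        if self.mode == 1:
            self._shift()
        return word

    def status(self):
        """Full flag (writing side) and empty flag (reading side) per buffer."""
        return {
            10: {"full": len(self.buf10) >= self.depth, "empty": not self.buf10},
            11: {"full": len(self.buf11) >= self.depth, "empty": not self.buf11},
        }
```

In mode 1 the model behaves as one FIFO with the combined capacity of both buffers; in mode 2 a full buffer 10 does not prevent writes to or reads from buffer 11.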

【００４２】第２のモードでは、データ中継装置6にＰ
Ｅ間距離２に相当する２本のバッファが存在することに
なる。データ中継装置6a・6eでは、バッファ10a・10eが１
回目、バッファ11a・11eが２回目のデータを格納する。
従って、データは前記した様に図１の点線の流れとな
る。In the second mode, the data relay device 6 contains two buffers, corresponding to a PE-to-PE distance of 2. In the data relay devices 6a and 6e, buffers 10a and 10e store first-transfer data, and buffers 11a and 11e store second-transfer data. Accordingly, the data flows along the dotted line in FIG. 1, as described above.

【００４３】データ転送装置5 は、データ中継装置の２
つのバッファ状態を監視して、送受可能な方とデータの
やりとりを行なう。ここで、データ転送装置5cがバッフ
ァ10a から取り込んだ時はバッファ8c に、バッファ11a
から取り込んだ時はバッファ9cにそれぞれ格納する制御
を行なうことで、図４のタグの回数部21は不要となる。The data transfer device 5 monitors the states of the two buffers in the data relay device and exchanges data with whichever one is ready to send or receive. If the data transfer device 5c is controlled so that data fetched from buffer 10a is stored in buffer 8c and data fetched from buffer 11a is stored in buffer 9c, the count portion 21 of the tag in FIG. 4 becomes unnecessary.
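The selection rule in this paragraph can be written as a lookup from the source buffer to the destination buffer. The sketch below is illustrative only: the function name `next_fetch`, the flag dictionaries, and the priority given to buffer 11 are assumptions, since the text does not specify how the two buffers are arbitrated.

```python
# Source buffer in the data relay device -> destination buffer in the data transfer device.
# Data in buffer 10 is on its first transfer and must be relayed (buffer 8);
# data in buffer 11 is on its second transfer and is received (buffer 9).
DEST = {10: 8, 11: 9}


def next_fetch(relay_empty, local_full):
    """Pick a relay buffer to fetch from, or return None if nothing can move.

    relay_empty: {10: bool, 11: bool}, empty flags reported by the data relay device
    local_full:  {8: bool, 9: bool}, full flags of the local buffers
    """
    for src in (11, 10):  # assumed priority: second-transfer data first
        dst = DEST[src]
        if not relay_empty[src] and not local_full[dst]:
            return src, dst
    return None
```

Because the source buffer alone determines the destination, the data itself no longer needs to carry a hop count, which is why the count portion of the tag can be dropped.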

【００４４】第３のＰＥを介するＰＥ間通信時、本デー
タ中継装置の第１のモードを用いるデータ中継装置を用
いた場合はデッドロックが発生するが、本データ中継装
置の第２のモードを用いることで、１回目と２回目のデ
ータが独立に扱えるので、デッドロックが回避できる
が、ここの事情について説明する。In PE-to-PE communication that passes through a third PE, a deadlock can occur when the data relay device is used in the first mode. In the second mode, first-transfer and second-transfer data are handled independently, so the deadlock is avoided. The reason is explained below.

【００４５】図17は本発明のデータ中継装置（第１のモ
ード）を用いた場合の転送の様子を示す図、図18は本発
明のデータ中継装置（第２のモード）を用いた場合の転
送の様子を示す図である。ここでは、ＰＥ1b,1a,1c間で
データが流れる場合について考える。また簡単化のため
バッファ7,8,9は１段、図17でバッファ10d,10a,10fは２
段、図18でバッファ10d,10a,10f,11d,11a,11fは１段と
する。FIG. 17 shows the transfer when the data relay device of the present invention is used in the first mode, and FIG. 18 shows the transfer when it is used in the second mode. Consider the case where data flows among PEs 1b, 1a, and 1c. For simplicity, buffers 7, 8, and 9 are assumed to have one stage each; buffers 10d, 10a, and 10f in FIG. 17 have two stages each; and buffers 10d, 10a, 10f, 11d, 11a, and 11f in FIG. 18 have one stage each.

【００４６】送信、中継、受信ＰＥとデータの関係は次
の様になる。送信ＰＥ1b->中継ＰＥ1a->受信ＰＥ1c ：データc1,c
2,c3,c4 送信ＰＥ1a->中継ＰＥ1c->受信ＰＥ1b ：データb1,b
2,b3,b4 送信ＰＥ1c->中継ＰＥ1b->受信ＰＥ1a ：データa1,a
2,a3,a4 データ中継装置（第１のモード）を用いた場合では、図
17に示す様な状態に陥った場合にデッドロックとなる。
ここで例えばＰＥ1bはデータc4またはa2を送出したい
が、データ中継装置6dのバッファ10dがフルであるので
送れない。データ中継装置6dはデータc3を吐き出したい
がc3が入るべきバッファ8aがフルであるので転送できな
い。バッファ8aに空きが生じるためにはバッファ10ａに
空きが生じる必要があるが、データb3が入るべきバッフ
ァ8cがフルであるのでバッファ10aは空かない。バッフ
ァ8cが空くためにはバッファ10fが空く必要があるが、
このためにはバッファ8bが空く必要がある。そのために
はバッファ10dが空く必要があり、結局どのバッファも
空くことはない、つまりデッドロックとなる。この様に
複数のＰＥで閉じたループを構成する場合にデッドロッ
クが発生する可能性が高い。The sending, relaying, and receiving PEs and the data they handle are related as follows.
Sending PE 1b -> relaying PE 1a -> receiving PE 1c: data c1, c2, c3, c4
Sending PE 1a -> relaying PE 1c -> receiving PE 1b: data b1, b2, b3, b4
Sending PE 1c -> relaying PE 1b -> receiving PE 1a: data a1, a2, a3, a4
When the data relay device is used in the first mode, a deadlock results once the state shown in FIG. 17 is reached. Here, for example, PE 1b wants to send data c4 or a2 but cannot, because buffer 10d of the data relay device 6d is full. The data relay device 6d wants to output data c3 but cannot, because buffer 8a, which c3 must enter, is full. For buffer 8a to gain space, buffer 10a must gain space, but buffer 10a cannot empty because buffer 8c, which data b3 must enter, is full. For buffer 8c to empty, buffer 10f must empty, which in turn requires buffer 8b to empty. That requires buffer 10d to empty, so in the end no buffer can ever empty; that is, the system is deadlocked. A deadlock is thus likely whenever several PEs form a closed loop.
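The chain of dependencies described for FIG. 17 forms a cycle in a wait-for graph, which is the usual way to recognize a deadlock. The following sketch encodes the dependencies stated in the text (each buffer waits for the next one to empty) and walks them; the names `WAITS_FOR` and `find_cycle` are hypothetical and serve only to illustrate the argument.

```python
# "X waits for Y": buffer X cannot empty until buffer Y has space (from the text).
WAITS_FOR = {
    "10d": "8a",
    "8a": "10a",
    "10a": "8c",
    "8c": "10f",
    "10f": "8b",
    "8b": "10d",
}


def find_cycle(waits_for, start):
    """Follow the wait-for chain from `start`; return the cycle found, or None."""
    seen = []
    node = start
    while node in waits_for and node not in seen:
        seen.append(node)
        node = waits_for[node]
    if node in seen:
        return seen[seen.index(node):]
    return None
```

Removing any single edge of the graph, for example by giving second-transfer data its own buffer, breaks the cycle, and `find_cycle` then returns `None`.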

【００４７】データ中継装置（第２のモード）を用いた
場合では、図17に相当する状態が図18(a)である。例え
ばデータ中継装置6dについてみればバッファ10dに１回
目の転送途中のデータc3、バッファ11dに２回目の転送
途中のデータa1が格納される。When the data relay device is used in the second mode, the state corresponding to FIG. 17 is the one shown in FIG. 18(a). In the data relay device 6d, for example, buffer 10d holds data c3, which is in its first transfer, and buffer 11d holds data a1, which is in its second transfer.

【００４８】次のサイクルではデータa1がバッファ9a、
データc1がバッファ9c、データb1がバッファ9bに転送さ
れる。この状態を示したのが図18(b)である。バッファ9
a,9c,9bのデータはメモリにライトされるのでこれらの
バッファにはすぐ空きが生じる。こうなると例えばＰＥ
1bはデータa2をバッファ11d経由でバッファ9aに送れ
る。バッファ8bに空きが生じるのでバッファ10fのデー
タa3がバッファ8bに転送可能となる。以下同様に順次デ
ータが流れデッドロックは生じない。In the next cycle, data a1 is transferred to buffer 9a, data c1 to buffer 9c, and data b1 to buffer 9b. FIG. 18(b) shows this state. Because the data in buffers 9a, 9c, and 9b is written to memory, these buffers free up immediately. PE 1b can then, for example, send data a2 to buffer 9a via buffer 11d. Since buffer 8b now has space, data a3 in buffer 10f can be transferred to buffer 8b. Data continues to flow in the same way, and no deadlock occurs.
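The contrast between the two modes can be reproduced with a small simulation of three PEs in a ring, each sending four words to the PE two hops ahead. This is a sketch under several assumptions that are not in the source: the function name `simulate`, the one-move-at-a-time scheduling, and in particular the arbitration order (new sends are preferred over relayed data), which is chosen to drive the first mode into a congested state like the one in FIG. 17. Buffers 7 and 9 are omitted because they drain to and from memory immediately.

```python
from collections import deque


def simulate(mode, n_pe=3, items=4):
    """Return (delivered, deadlocked) for a ring of PEs sending two hops ahead."""
    # Mode 1: one shared 2-stage FIFO per relay device.
    # Mode 2: two independent 1-stage FIFOs (index 0 = first hop, 1 = second hop).
    cap = 2 if mode == 1 else 1
    link = [[deque() for _ in range(1 if mode == 1 else 2)] for _ in range(n_pe)]
    relay_buf = [None] * n_pe      # buffer 8 of each PE, one stage
    pending = [items] * n_pe       # words each PE still has to send
    delivered = 0

    def queue(i, hop):
        return link[i][0] if mode == 1 else link[i][hop - 1]

    def step():
        nonlocal delivered
        # 1. A second-hop word at the head of a FIFO goes to the receiver's memory.
        for i in range(n_pe):
            for fifo in link[i]:
                if fifo and fifo[0] == 2:
                    fifo.popleft()
                    delivered += 1
                    return True
        # 2. A sender injects a new first-hop word.
        for i in range(n_pe):
            if pending[i] and len(queue(i, 1)) < cap:
                queue(i, 1).append(1)
                pending[i] -= 1
                return True
        # 3. A first-hop word at the head of a FIFO moves to buffer 8 of the next PE.
        for i in range(n_pe):
            nxt = (i + 1) % n_pe
            for fifo in link[i]:
                if fifo and fifo[0] == 1 and relay_buf[nxt] is None:
                    relay_buf[nxt] = fifo.popleft()
                    return True
        # 4. Buffer 8 forwards its word as a second-hop word.
        for i in range(n_pe):
            if relay_buf[i] is not None and len(queue(i, 2)) < cap:
                queue(i, 2).append(2)
                relay_buf[i] = None
                return True
        return False               # no move possible

    while step():
        pass
    return delivered, delivered < n_pe * items
```

With these assumptions, the first mode stops with every FIFO full of first-hop words and every buffer 8 occupied, while the second mode delivers all twelve words. Both configurations use the same total of two stages per relay device, so the difference comes from separating the hops rather than from extra capacity.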

【００４９】以上により、ランダムな通信要求が発生し
た場合でも確実に動作できる。また１回目と２回目のデ
ータの優先度付けを適切に行なうことで転送性能も向上
する。また直接ＰＥ間で転送する場合は、第１のモード
で大きなバッファリングが可能となる。As a result, the system operates reliably even when random communication requests occur. Transfer performance also improves if first-transfer and second-transfer data are prioritized appropriately. For direct PE-to-PE transfers, the first mode provides a larger buffering capacity.

【００５０】データ転送装置とデータ中継装置間の制御
線を含めた信号線の接続の様子を図７に示す。データ中
継装置6aと6b、6aと6cの信号線が共通に接続される。中
継アドレスによりアクティブなデータ中継装置が選択さ
れてデータ・アドレスが受け渡される。またバッファ状
態は選択されたデータ中継装置のみが出力し、他のデー
タ中継装置はハイインピーダンスである。FIG. 7 shows how the signal lines, including the control lines, are connected between the data transfer devices and the data relay devices. The signal lines of the data relay devices 6a and 6b, and of 6a and 6c, are connected in common. The relay address selects the active data relay device, and data and addresses are passed to it. Only the selected data relay device outputs its buffer state; the other data relay devices are in the high-impedance state.
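The shared status line can be modeled by letting every unselected device drive "nothing". In the sketch below, `None` stands for the high-impedance state; the function names and the use of a dictionary keyed by device label are assumptions made for illustration.

```python
def resolve_bus(drivers):
    """Resolve a shared line; None in `drivers` means high impedance."""
    active = [d for d in drivers if d is not None]
    if len(active) > 1:
        raise ValueError("bus contention: more than one device is driving the line")
    return active[0] if active else None


def status_line(relay_address, devices):
    """devices: {label: buffer_full flag}. Only the addressed device drives the line."""
    outputs = [full if label == relay_address else None
               for label, full in devices.items()]
    return resolve_bus(outputs)
```

For example, with devices 6a and 6b on a common line, addressing 6b returns the buffer state of 6b alone, and an address that matches no device leaves the line undriven.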

【００５１】次にネットワークでのアドレス・データの
形式を、本発明の実施例におけるアドレス・データ構成
図である図８を用いて説明する。ここでは、図８
（ａ）,（ｂ）２つの例を示す。一般に並列処理システ
ムではメモリとネットワークでのバス幅が異なり、ネッ
トワーク側が狭くなる。そのため、ネットワークとのイ
ンターフェースでデータ幅の変換が必要となる。Next, the format of addresses and data on the network is described with reference to FIG. 8, which is an address/data configuration diagram for the embodiment of the present invention. Two examples are shown, FIG. 8(a) and FIG. 8(b). In a parallel processing system, the bus widths of the memory and of the network generally differ, with the network side being narrower. The data width therefore has to be converted at the interface to the network.

【００５２】図８（ａ）では、データ転送装置1aからデ
ータ中継装置6a間のデコーダ30aへアドレス50bが入力さ
れる。データ51b（アドレス以外）は、データ転送装置1
aの出力ラッチ40に格納された後、セレクタ41で分解さ
れて、データ中継装置6aへ入力される。図８（ｂ）で
は、アドレス50b及びデータ51bはデータ転送装置1aの出
力ラッチ40に格納された後、セレクタ41で分解されて、
データ中継装置6aへ入力され、アドレス50bはデコーダ3
0aへ、データ51bは内部へ入力される。図８（ｂ）は、
データにアドレス情報が含まれ、データ中継装置は入力
データを常に監視し、自分のアドレスに対応するものが
出現した場合にデータを取り込むデーターフロー的な制
御となる。図８（ａ）に比べて、制御ロジックは複雑に
なり、転送量も多いが、データ転送装置とデータ中継装
置間の配線数は少なくなる。In FIG. 8(a), the address 50b is input from the data transfer device 1a to the decoder 30a of the data relay device 6a. The data 51b (everything other than the address) is stored in the output latch 40 of the data transfer device 1a, split by the selector 41, and input to the data relay device 6a. In FIG. 8(b), both the address 50b and the data 51b are stored in the output latch 40 of the data transfer device 1a, split by the selector 41, and input to the data relay device 6a, where the address 50b goes to the decoder 30a and the data 51b goes to the internal circuits. In FIG. 8(b), the data contains the address information, and control is dataflow-like: the data relay device constantly monitors the input data and takes it in when an item matching its own address appears. Compared with FIG. 8(a), the control logic is more complex and the transfer volume is larger, but fewer wires are needed between the data transfer device and the data relay device.
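The width conversion and the two framing variants can be illustrated as follows. The 32-bit memory word, the 8-bit network width, and the function names are assumptions chosen for the example; the source gives no concrete widths.

```python
def split_word(word, word_bits=32, net_bits=8):
    """Split a memory word into network-width slices, most significant first."""
    n = word_bits // net_bits
    mask = (1 << net_bits) - 1
    return [(word >> (net_bits * (n - 1 - i))) & mask for i in range(n)]


def join_slices(slices, net_bits=8):
    """Reassemble network-width slices into one word."""
    word = 0
    for s in slices:
        word = (word << net_bits) | s
    return word


def frame_a(address, word):
    """FIG. 8(a) style: the address travels on separate lines from the data."""
    return address, split_word(word)


def frame_b(address, word):
    """FIG. 8(b) style: the address is sent in-band, ahead of the data slices."""
    return [address] + split_word(word)
```

The in-band variant sends one more slice per word than the separate-address variant, which corresponds to the larger transfer volume and smaller wire count noted for FIG. 8(b).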

【００５３】最後に、本発明の実施例におけるデータ転
送方法について図９、図１９で説明する。図９は本発明
の実施例におけるデータ転送方法を示す図、図１９は同
実施例における時間と転送レートの関連図である。図９
は、ネットワーク2が完全クロスバ網の例である。ＰＥ
からのデータ送出順序をデータ中継装置6a-6p内に示
す。即ち、最初のステップでＰＥ1aはデータ中継装置6a
に、1bは6f、1cは6k、1dは6pに一斉にデータを送出す
る。次のステップでは、ＰＥ1aはデータ中継装置6bに、
1bは6g、1cは6l、1dは6mに一斉にデータを送出する。以
下同様で、端まで送出し終わると最初に戻る。これによ
り最初のステップ終了後、全てのＰＥで受信が可能とな
る。即ち、ＰＥ1aはデータ中継装置6a、1bは6f、1cは6
k、1dは6pよりそれぞれデータを受信できる。従って特
定のネットワークの負荷が偏らないので、システム全体
の転送効率が向上する。これを図１９に示す。従来は、
時間１で１つのチャネルでのみ送受が行なわれるので転
送レートは１である。時間２で転送レート２、３で３、
４で４と増えて行きその後減少し、時間７で終了する。
これを点線で示す。本実施例では実線で示した様に時間
１−４で全チャネルが動作つまり転送レートが４であ
り、時間４で転送は終了する。Finally, the data transfer method in the embodiment of the present invention is described with reference to FIGS. 9 and 19. FIG. 9 shows the data transfer method in the embodiment, and FIG. 19 shows the relationship between time and transfer rate in the same embodiment. FIG. 9 is an example in which the network 2 is a full crossbar network. The order in which each PE sends data is shown inside the data relay devices 6a-6p. In the first step, PE 1a sends data to the data relay device 6a, 1b to 6f, 1c to 6k, and 1d to 6p, all at the same time. In the next step, PE 1a sends data to the data relay device 6b, 1b to 6g, 1c to 6l, and 1d to 6m, again all at the same time. The following steps proceed in the same way, and a PE that has reached the end of its row wraps around to the beginning. As a result, every PE can receive data once the first step has finished: PE 1a can receive from the data relay device 6a, 1b from 6f, 1c from 6k, and 1d from 6p. Since the load is not concentrated on any particular part of the network, the transfer efficiency of the whole system improves. This is shown in FIG. 19. In the conventional method, only one channel sends and receives at time 1, so the transfer rate is 1. The transfer rate rises to 2 at time 2, 3 at time 3, and 4 at time 4, then falls, and the transfer ends at time 7. This is shown by the dotted line. In the present embodiment, as shown by the solid line, all channels operate during times 1 to 4, that is, the transfer rate is 4 throughout, and the transfer ends at time 4.
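The sending order described for FIG. 9 amounts to a skewed (diagonal) schedule: at step s, the K-th PE sends to column (K + s) mod N of its row. The sketch below generates that schedule for N = 4; the function names and the row-major assignment of the labels 6a-6p are assumptions, although the labels they produce agree with the two steps spelled out in the text.

```python
import string


def skewed_schedule(n):
    """steps[s][k] = column of the relay device that PE k sends to at step s."""
    return [[(k + s) % n for k in range(n)] for s in range(n)]


def relay_label(row, col, n):
    """Label of the relay device at (row, col), assuming row-major lettering 6a, 6b, ..."""
    return "6" + string.ascii_lowercase[row * n + col]


def labels_for_step(step, n):
    """Relay devices addressed by PEs 1a, 1b, ... at the given step (0-based)."""
    cols = skewed_schedule(n)[step]
    return [relay_label(k, cols[k], n) for k in range(n)]


# Transfer rates per time step as given in the text for a 4x4 crossbar.
CONVENTIONAL_RATES = [1, 2, 3, 4, 3, 2, 1]
SKEWED_RATES = [4, 4, 4, 4]
```

In every step the targeted columns form a permutation, so no two PEs address the same receiver at once. Both rate profiles move the same sixteen transfers, but the skewed order finishes in four time steps instead of seven.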

【００５４】なお、本例は既に説明した図６の様に部分
的にクロスバを有するシステムに適用可能である。ま
た、最初に述べた様に、ここではデータの流れを一方向
に限定したが、バッファの双方向化、セレクタなど一部
の回路を２つ持つことで、二方向の流れにも容易に対応
できる。This example is also applicable to a system that has a partial crossbar, such as the one in FIG. 6 described earlier. In addition, as noted at the outset, the data flow has been limited to one direction here, but flow in both directions can easily be supported by making the buffers bidirectional and duplicating some circuits such as the selectors.

【００５５】また、ＰＥ間距離が２の並列処理システム
について述べたが、タグの中継アドレス、データ中継装
置のバッファ本数をＮとすることで、ＰＥ間距離がＮの
並列処理システムに拡張可能である。またこれらを組み
合わせて、各種の形態のネットワークを実現することが
可能となる。The description above covers a parallel processing system with a PE-to-PE distance of 2, but the system can be extended to a PE-to-PE distance of N by providing N relay addresses in the tag and N buffers in the data relay device. Combining these makes it possible to realize networks of various forms.

【００５６】[0056]

【発明の効果】以上述べてきた様に、本発明の並列処理
システムでは、任意ＰＥ間通信時、中継ＰＥでのメモリ
ライト／リードがないので、ここでのオーバーヘッドが
軽減され、転送性能が向上する。また、データ転送装置
がバスアクセスをしないので、バス幅も広がりプロセサ
の性能も向上する。As described above, in the parallel processing system of the present invention, no memory write or read takes place at the relaying PE during communication between arbitrary PEs, so the overhead there is reduced and transfer performance improves. In addition, since the data transfer device does not access the bus, the effective bus width increases and processor performance also improves.

【００５７】また本発明のデータ転送方式では、ネット
ワークの負荷が分散する、つまり受信側のＰＥが均等に
動作できるので、システム全体の転送性能が向上する。Further, in the data transfer system of the present invention, the load of the network is dispersed, that is, the PEs on the receiving side can operate uniformly, so that the transfer performance of the entire system is improved.

【００５８】さらに、本発明のデータ転送装置、データ
中継装置を用いることで、各種のネットワークを有する
並列処理システムが構成できる。Further, by using the data transfer device and the data relay device of the present invention, a parallel processing system having various networks can be constructed.

【００５９】単体プロセサの計算機性能及び半導体技術
の限界が見えてきた現在、並列処理システムへの期待は
非常に大きく、本発明は極めて有用なものである。Now that the limitations of computer performance and semiconductor technology of a single processor have become apparent, expectations for parallel processing systems are extremely high, and the present invention is extremely useful.

[Brief description of drawings]

【図１】本発明の第１の実施例における並列処理システ
ムの構成図FIG. 1 is a configuration diagram of a parallel processing system according to a first embodiment of the present invention.

【図２】本発明の第１の実施例におけるデータ転送装置
の構成図FIG. 2 is a configuration diagram of a data transfer device according to the first embodiment of the present invention.

【図３】本発明の第２の実施例におけるデータ転送装置
の構成図FIG. 3 is a configuration diagram of a data transfer device according to a second embodiment of the present invention.

【図４】本発明の実施例におけるデータ形式の構成図FIG. 4 is a configuration diagram of a data format in the embodiment of the present invention.

【図５】本発明の実施例におけるデータ中継装置の構成
図FIG. 5 is a configuration diagram of a data relay device according to an embodiment of the present invention.

【図６】本発明の第１の実施例における並列処理システ
ムの全体構成図FIG. 6 is an overall configuration diagram of a parallel processing system according to the first embodiment of the present invention.

【図７】同実施例における接続詳細図FIG. 7 is a detailed connection diagram in the embodiment.

【図８】本発明の実施例におけるアドレス・データ構成
図FIG. 8 is an address / data configuration diagram according to an embodiment of the present invention.

【図９】本発明の実施例におけるデータ転送方法を示す
図FIG. 9 is a diagram showing a data transfer method according to an embodiment of the present invention.

【図１０】従来の第１の並列処理システムの構成図FIG. 10 is a configuration diagram of a conventional first parallel processing system.

【図１１】従来のデータ転送装置の構成図FIG. 11 is a block diagram of a conventional data transfer device.

【図１２】従来のデ−タ中継装置の構成図を示す図FIG. 12 is a diagram showing a configuration diagram of a conventional data relay device.

【図１３】従来のデータ転送方法を示す図FIG. 13 is a diagram showing a conventional data transfer method.

【図１４】従来の第２の並列処理システムの構成図FIG. 14 is a configuration diagram of a conventional second parallel processing system.

【図１５】第１の実施例におけるデータ転送装置を双方
向にした場合の構成図FIG. 15 is a configuration diagram when the data transfer device in the first embodiment is bidirectional.

【図１６】第１の実施例におけるデータ転送装置を双方
向にした場合の構成図FIG. 16 is a configuration diagram when the data transfer device in the first embodiment is bidirectional.

【図１７】データ中継装置（第１のモード）を用いた場
合の転送の様子を示す図FIG. 17 is a diagram showing a state of transfer when a data relay device (first mode) is used.

【図１８】データ中継装置（第２のモード）を用いた場
合の転送の様子を示す図FIG. 18 is a diagram showing a state of transfer when a data relay device (second mode) is used.

【図１９】本発明の実施例におけるデータ転送方法にお
ける時間と転送レートの関連図FIG. 19 is a diagram showing the relationship between time and transfer rate in the data transfer method according to the embodiment of the present invention.

[Explanation of symbols]

１ＰＥ（プロセサエレメント）２ネットワーク３プロセサ４メモリ５データ転送装置６データ中継装置７−１１バッファ１２メモリアドレス生成部１３タグ生成部１４カウンタ１５中継アドレス生成部１６制御部１７入出力ポート１８セレクタ２０制御情報部２１回数部２２中継アドレス部２３メモリアドレス部２４データ部３０デコーダ３１バッファ制御部３２トライステートバッファ３４出力セレクタ３５入力セレクタ３６入出力ポート４０出力ラッチ４１セレクタ５０アドレス５１データ５２バッファ状態７０ＰＵ（プロセッシングユニット）７１ＣＰＵ７２ローカルメモリ７３周辺ＬＳＩ７４コネクションメモリ７５ポート１３０タグ及び中継アドレス生成部１３１タグ変換部
1: PE (processor element)
2: Network
3: Processor
4: Memory
5: Data transfer device
6: Data relay device
7-11: Buffers
12: Memory address generation unit
13: Tag generation unit
14: Counter
15: Relay address generation unit
16: Control unit
17: Input/output port
18: Selector
20: Control information portion
21: Count portion
22: Relay address portion
23: Memory address portion
24: Data portion
30: Decoder
31: Buffer control unit
32: Tri-state buffer
34: Output selector
35: Input selector
36: Input/output port
40: Output latch
41: Selector
50: Address
51: Data
52: Buffer state
70: PU (processing unit)
71: CPU
72: Local memory
73: Peripheral LSI
74: Connection memory
75: Port
130: Tag and relay address generation unit
131: Tag conversion unit

Claims

[Claims]

1. A parallel processing system comprising: a plurality of processor elements, each comprising a processor, a memory, and a data transfer device having first, second, and third buffers; and a network that connects the plurality of processor elements either directly or indirectly by relaying through at least one processor element, wherein, at the time of data transfer, in the processor element serving as the sender, the data transfer device takes the data from the memory or the processor into the first buffer and then sends it to the network; in the processor element serving as the receiver, the data transfer device takes the data from the network into the second buffer and then stores it in the memory or the processor; and in a processor element relaying the data, the data transfer device takes the data from the network into the third buffer and then sends it to the network again.

2. A data transfer device comprising: first, second, and third input/output ports; a first buffer connected between the first input/output port and the second input/output port; a second buffer connected between the first input/output port and the third input/output port; a third buffer connected between the second input/output port and the third input/output port; a tag generation unit that adds a tag to data taken in from the first input/output port; a memory address generation unit that generates an address when data is input from the outside to the first input/output port and counts the number of inputs; a counter that counts the number of outputs when data is output from the first input/output port to the outside; a first selector that takes out part of the data output from the first input/output port as an address, selects it at the time of data output, selects the address of the memory address generation unit at the time of data input, and outputs the selected address to the outside; a first relay address generation unit that generates an external address when data is output from the second input/output port to the outside; a second relay address generation unit that generates an external address when data is input from the outside to the third input/output port; a second selector that selects between part of the output of the first buffer and part of the output of the third buffer when data is output from the second input/output port to the outside; and a third selector that selects the output of the first relay address generation unit when data is output from the first buffer, and selects the other part of the output of the third buffer when data is output from the third buffer.

3. A data transfer device comprising: first, second, and third input/output ports; a first buffer connected between the first input/output port and the second input/output port; a second buffer connected between the first input/output port and the third input/output port; a third buffer connected between the second input/output port and the third input/output port; a tag generation unit that adds a tag to data taken in from the first input/output port; a memory address generation unit that generates an address when data is input from the outside to the first input/output port and counts the number of inputs; a counter that counts the number of outputs when data is output from the first input/output port to the outside; a first selector that takes out part of the data output from the first input/output port as an address, selects it at the time of data output, selects the address of the memory address generation unit at the time of data input, and outputs the selected address to the outside; a second selector that selects between part of the output of the first buffer and part of the output of the third buffer when data is output from the second input/output port to the outside; and a third selector that selects between the other part of the output of the first buffer and the other part of the output of the third buffer.

4. A data transfer device comprising: first, second, and third input/output ports; a first buffer connected between the first input/output port and the second input/output port; a second buffer connected between the first input/output port and the third input/output port; a third buffer connected between the second input/output port and the third input/output port; a tag generation unit that adds a tag to data taken in from the first input/output port; a memory address generation unit that generates an address when data is input from the outside to the first input/output port and counts the number of inputs; a counter that counts the number of outputs when data is output from the first input/output port to the outside; a first selector that takes out part of the data output from the first input/output port as an address, selects it at the time of data output, selects the address of the memory address generation unit at the time of data input, and outputs the selected address to the outside; a first relay address generation unit that generates an external address when data is output from the second input/output port to the outside; a second relay address generation unit that generates an external address when data is output from the third input/output port to the outside; a second selector that selects between part of the output of the first buffer and part of the output of the third buffer when data is output from the second input/output port to the outside; a third selector that selects the output of the first relay address generation unit when data is output from the first buffer, and selects the other part of the output of the third buffer when data is output from the third buffer; a fourth selector that selects between part of the output of the second buffer and part of the output of the third buffer when data is output from the third input/output port to the outside; and a fifth selector that selects the output of the second relay address generation unit when data is output from the second buffer, and selects the other part of the output of the third buffer when data is output from the third buffer.

5. A data transfer device comprising: at least three input/output ports; a first buffer connected between the first input/output port and the second input/output port; a second buffer connected between the first input/output port and the third input/output port; a third buffer connected between the second input/output port and the third input/output port; a tag generation unit that adds a tag to data taken in from the first input/output port; a memory address generation unit that generates an address when data is input from the outside to the first input/output port and counts the number of inputs; a counter that counts the number of outputs when data is output from the first input/output port to the outside; a first selector that takes out part of the data output from the first input/output port as an address, selects it at the time of data output, selects the address of the memory address generation unit at the time of data input, and outputs the selected address to the outside; a second selector that selects between part of the output of the first buffer and part of the output of the third buffer when data is output from the second input/output port to the outside; a third selector that selects between the other part of the output of the first buffer and the other part of the output of the third buffer; a fourth selector that selects between part of the output of the second buffer and part of the output of the third buffer when data is output from the third input/output port to the outside; and a fifth selector that selects between the other part of the output of the second buffer and the other part of the output of the third buffer.

6. The data transfer device according to claim 2 or 4, wherein the tag generation unit adds, as the tag, information comprising: a control information portion that identifies the type of data; a plurality of relay address portions that indicate, in relay order, the addresses of the relay units to be passed through from the second relay onward; and a memory address portion that indicates the final storage address of the data.

7. The data transfer device according to claim 3 or 5, wherein the tag generation unit adds, as the tag, information comprising: a control information portion that identifies the type of data; a plurality of relay address portions that indicate, in relay order, the addresses of the relay units to be passed through; and a memory address portion that indicates the final storage address of the data.

8. A data relay device comprising: first and second input/output ports; N buffers, where N is an integer of 2 or more; an output selector having N inputs and one output; and N−1 input selectors each having two inputs and one output, wherein the outputs of the N buffers are connected to the inputs of the output selector; the output of the output selector is connected to the first input/output port; the second input/output port is connected to the input of the first buffer and to one input of each of the N−1 input selectors; the outputs of the input selectors are connected to the inputs of the second to N-th buffers, respectively; and, where L is an integer from 1 to N−1, the output of the L-th buffer is connected to the other input of the L-th input selector.

9. A parallel processing system comprising: a plurality of processor elements, each configured by connecting a processor, a memory, and the data transfer device according to claim 2 to a common bus; and a network in which the data relay device according to claim 8 is arranged at each grid point, two processor elements communicate by passing data through that data relay device, and communication between arbitrary processor elements is possible by relaying through processor elements N−1 times, where N is an integer of 2 or more, wherein, at the time of data transfer, in the processor element serving as the sender, the data transfer device takes the data from the memory or the processor into the first buffer and then sends it to the network; in a processor element relaying the data, the data transfer device takes the data from the network into the third buffer and then sends it to the network again; in the processor element serving as the receiver, the data transfer device takes the data from the network into the second buffer and then stores it in the memory or the processor; and, where L is an integer of 1 or more and N−1 or less, data transferred from the L-th processor element on the transfer path to the (L+1)-th processor element passes through the L-th buffer of the data relay device.

10. A data transfer method for a parallel processing system that includes, as a basic unit, a configuration comprising: N processor elements each having at least two ports, where N is an integer of 2 or more; and a network having N×N grid points, where K and L are integers of 1 or more and N or less and each grid point is denoted (K, L), with a buffer (K, L) having at least two ports arranged at each grid point, one end of the K-th processor element being connected in common to one end of each buffer (K, L), the other ends of the buffers (K, L) being connected in common for each value of L, and each such connection line being connected to a processor element or used as an external port, the method being characterized in that the K-th processor element sends data sequentially, starting from the buffer (K, K).