JP2585318B2

JP2585318B2 - Interconnecting network and cross bus switch therefor

Info

Publication number: JP2585318B2
Application number: JP62289323A
Authority: JP
Inventors: 晃村松; 郁夫吉原; 和夫中尾; 林　　剛久; 輝雄田中; 重夫長島
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-11-18
Filing date: 1987-11-18
Publication date: 1997-02-26
Anticipated expiration: 2012-02-26
Also published as: JPH01131950A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention relates to a scheme for interconnecting the element processors of a parallel computer, and in particular to a switch configuration suited to cases where high connectivity is required but the processors are too numerous to be connected all together by a full crossbar switch.

[Conventional technology]

Typical conventional systems include, in addition to the common method of connecting each element processor to one or several buses: a scheme that connects adjacent element processors arranged in a grid, as described in JP-A-58-181168; a scheme that connects all element processors through a (full) crossbar switch, as described in JP-A-59-109966; a scheme that connects all element processors through a multistage switch, as described in JP-A-62-2353; and a scheme that uses a hypercube connection, as described in Reference 1.

Reference 1: C. L. Seitz, "The Cosmic Cube", Communications of the ACM, vol. 28, no. 1, pp. 22-33, 1985.

[Problems to be Solved by the Invention]

Among the above conventional techniques, bus connection has the advantage of requiring little hardware, but when many element processors are connected, bus contention degrades performance; the practical limit is considered to be a dozen or so processors.

Lattice connection (also called mesh connection) likewise requires little hardware and can connect a large number of element processors. However, because each processor can communicate only with its neighbors, communication performance depends heavily on the nature of the problem being handled. It works well for problems suited to neighborhood computation, such as solving partial differential equations and image processing, but communication overhead becomes severe for the finite element method, the fast Fourier transform (FFT), logic/circuit simulation, and the like.

Full crossbar connection uses a matrix switch to connect all element processors completely. Its performance is therefore the highest of any connection scheme, but because the amount of hardware is proportional to the square of the number of element processors, the limit is generally considered to be a few tens of processors.

With a multistage switch, the amount of hardware is held to about L log₂ L, where L is the number of element processors, and complete connection is still possible, so it has been regarded as the connection scheme suited to massively parallel computers containing many element processors. However, several problems have been pointed out. The path length (number of relay stages) is about log₂ L, so transfer delay is large. When many element processors try to access the same shared variable, multiple access paths compete for intermediate links and cause what is called hot-spot contention, a paralysis of the entire network that spreads to all accesses. Even short of hot-spot contention, access conflicts are heavy and performance suffers.
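
The scaling figures quoted above for the full crossbar and the multistage switch can be compared numerically. The following is a minimal sketch, assuming the hardware amounts are exactly L² and L log₂ L with all constant factors ignored; the function names are hypothetical and do not come from the patent.

```python
import math


def crosspoints_full_crossbar(num_processors: int) -> int:
    """Hardware amount of a full crossbar, taken as L squared."""
    return num_processors ** 2


def switch_elements_multistage(num_processors: int) -> float:
    """Order-of-magnitude hardware amount of a multistage switch: L * log2(L)."""
    return num_processors * math.log2(num_processors)


def relay_stages_multistage(num_processors: int) -> float:
    """Approximate path length (number of relay stages) of a multistage switch."""
    return math.log2(num_processors)


# Tabulate both estimates for a few machine sizes.
for L in (16, 64, 1024, 4096):
    print(
        L,
        crosspoints_full_crossbar(L),
        round(switch_elements_multistage(L)),
        round(relay_stages_multistage(L)),
    )
```

The gap between the two hardware columns widens quickly with L, while the multistage path length grows only logarithmically; this is the trade-off the text describes.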

The hypercube connection is known as a connection that allows relatively efficient communication, but the communication partner must be specified in the program, which complicates programming. If an automatic relay mechanism is provided for each element processor to avoid this, the amount of hardware increases. In addition, the wiring crosses over itself, which makes implementation troublesome.

In parallel processing of large-scale numerical computations, it is known that certain interprocessor communication patterns appear frequently. Typical ones are the lattice, ring, and butterfly connections. A network that can handle communication in these particular patterns at high speed is therefore highly effective. Of the conventional techniques above, only the full crossbar switch and the hypercube connection contain the lattice, ring, and butterfly connections within their own connection topology and can communicate in these patterns without any relay operation. None of bus connection, lattice connection, or the multistage switch can handle all of these patterns at high speed. As a special case, Reference 2 introduces the Spanning Bus Hypercube, which extends the hypercube connection from a configuration built on connections between two processors to one built on connections among several processors. However, because those several processors are joined by a bus, only two of them can communicate at any one time, so it cannot be regarded as containing the above connection topologies.

Reference 2: Dharma P. Agrawal et al., "Evaluating the Performance of Multicomputer Configurations", IEEE Computer, May 1986, pp. 28-29.

Of the problems above, the limit on the number of processors in bus connection cannot be resolved when there are many element processors. The strong dependence of lattice-connection performance on the nature of the problem, and hot-spot contention in multistage switches, are both fundamental and intrinsic problems that remain unsolved at present. Furthermore, these connections, together with the Spanning Bus Hypercube, suffer reduced performance in major applications because they do not contain all of the lattice, ring, and butterfly connections.

Of the two remaining networks that are free of such fundamental difficulties, the (full) crossbar switch and the hypercube connection, the former requires too much hardware to connect many element processors, while the latter can connect many processors but is hard to program and implement, and its performance falls as the number of connected processors grows. In a hypercube connection, moreover, two element processors that are not directly connected must communicate by relaying through other element processors. This method of communication, in which an information packet is first taken into one element processor and then forwarded to another, is called store-and-forward. With store-and-forward, and not only in hypercube connections, when several element processors P₁, P₂, P₃, … form a loop-shaped communication path and attempt to relay along it, P₁ cannot finish sending until P₂ has finished sending and can receive, P₂ cannot finish sending until P₃ has finished sending and can receive, and so on. The processors lock one another and may fall into a deadlock in which none can proceed.

Performance is evaluated by the number of basic switching elements (crosspoints) that one unit of transmitted information passes through before reaching its final destination, and hardware amount by the total number of crosspoints making up the network. Hardware amount and performance are normally in a trade-off: increasing the total number of crosspoints reduces the number of crosspoints that one unit of transmitted information traverses.

An object of the present invention is to provide an interconnection network that is free of fundamental difficulties in the above sense and that, given an arbitrary upper limit (technical or economic) on hardware amount and an arbitrary number of element processors, connects those processors with high connectivity (few switching stages) close to that of a full crossbar switch, in a system configuration that gives an optimal connection with respect to communication performance and hardware amount. In particular, the object is to provide a technique for variably configuring a network with the minimum or optimal amount of switch hardware.

That is, within the conventional art there was no choice but to build the network with a full crossbar switch while the number of processors was small and with a hypercube connection beyond a certain number. According to the present invention, many kinds of interconnection network can be configured whose performance lies between those two schemes and which, when provided with a deadlock-free automatic relay function, also require less switch hardware than a hypercube connection. As for implementation, each unit switch can be packaged as a whole within a chip, a module, a board, or a cabinet, or between cabinets, which is favorable for performance balance and maintenance.

[Means for solving the problem]

The above object is basically achieved as follows. In a parallel computer whose number of element processors L can be factored as L = n₁ × n₂ × … × n_N, each factor is taken as the number of lattice points along one side of a hyperrectangle in an N-dimensional lattice space, and the coordinates (i₁, i₂, …, i_N) of the lattice points within it, where 0 ≤ i₁ ≤ n₁ − 1, 0 ≤ i₂ ≤ n₂ − 1, …, 0 ≤ i_N ≤ n_N − 1, are assigned to the element processors as their processor numbers. For any k, a group of element processors whose processor numbers differ only in the k-th coordinate, that is, a group of n_k element processors, is interconnected by one crossbar switch with n_k inputs and n_k outputs. This is done for every value (L/n_k combinations) of the coordinates (i₁, …, i_{k−1}, i_{k+1}, …, i_N) of the (N − 1)-dimensional subspace that excludes the k-th dimension, and for every k (1 ≤ k ≤ N). The element processors are thus connected by a total of L × (1/n₁ + 1/n₂ + … + 1/n_N) crossbar switches.

In this connection, the relay means attached to the sending processor selects one dimension k in which its own processor number (i₁, i₂, …, i_N) and the destination processor number (j₁, j₂, …, j_N) disagree (i_k ≠ j_k). Of the N crossbar switches attached to the relay means of the sending processor (hereinafter called coordinate-transformation crossbar switches), it selects the one that connects the group of element processors whose processor numbers differ only in the k-th coordinate (the k-th dimension coordinate-transformation crossbar switch), and inputs to it a communication information packet consisting of the destination processor number paired with the transmission data. Each coordinate-transformation crossbar switch decodes the k-th coordinate field of the destination processor number and sends the packet to the processor whose k-th coordinate equals that of the destination processor number, that is, either the destination processor itself or a processor on the route to it, which then relays the packet. In the latter case this is repeated until no mismatched coordinate remains, so that the information packet reaches the destination processor.

Furthermore, using a crossbar switch as the relay means attached to each element processor removes contention with other relay paths during relaying, so the risk of deadlock can be eliminated completely.
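
The grouping rule above can be checked by enumerating the processor numbers and collecting, for each dimension, the processors that share one crossbar switch. This is a minimal sketch rather than the patent's implementation: the function names are hypothetical, dimensions are indexed from 0, and each n-input n-output crossbar is assumed to cost n² crosspoints, in line with the square law stated for the full crossbar.

```python
from itertools import product
from math import prod


def build_crossbar_groups(factors):
    """Map (k, other coordinates) -> processors sharing the k-th dimension switch.

    `factors` is the tuple (n_1, ..., n_N); dimension k is 0-based here.
    """
    groups = {}
    for coords in product(*(range(n) for n in factors)):
        for k in range(len(factors)):
            # Key on every coordinate except the k-th one.
            rest = coords[:k] + coords[k + 1:]
            groups.setdefault((k, rest), []).append(coords)
    return groups


def hardware_summary(factors):
    """Count switches and crosspoints, assuming n*n crosspoints per n-port switch."""
    total = prod(factors)
    groups = build_crossbar_groups(factors)
    return {
        "processors": total,
        "switches": len(groups),
        "crosspoints": sum(len(members) ** 2 for members in groups.values()),
        "full_crossbar_crosspoints": total ** 2,
        "max_path_length": len(factors),
    }


print(hardware_summary((4, 4, 4)))
print(hardware_summary((8, 8, 16)))
```

The switch count agrees with L × (1/n₁ + … + 1/n_N); for factors (4, 4, 4) it is 48 switches and 768 crosspoints, against 4096 crosspoints for a full crossbar over the same 64 processors.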

[Action]

The following shows that the interconnection scheme of the present invention allows communication between any two element processors. Consider communication from a source processor with processor number (i₁, i₂, …, i_N) to a destination processor with processor number (j₁, j₂, …, j_N). If the first coordinate i₁ of the source differs from the first coordinate j₁ of the destination, then, because the element processors whose other coordinates are all equal are connected to a single crossbar switch (the first-dimension coordinate-transformation crossbar switch), that switch can deliver the information to the element processor with processor number (j₁, i₂, …, i_N), or to the relay crossbar switch attached to it. The element processor that has just received the information, or its attached relay crossbar switch, is in turn connected by one coordinate-transformation crossbar switch to the element processors whose coordinates are all equal except the second. So if i₂ ≠ j₂, that switch can deliver the information to the element processor with processor number (j₁, j₂, i₃, …, i_N), or to its attached relay crossbar switch. By choosing such a route and forwarding, one crossbar switch after another, to the element processor (or its attached relay crossbar switch) whose corresponding coordinate has been replaced, the information finally reaches the element processor with processor number (j₁, j₂, …, j_N).
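
The coordinate-by-coordinate argument above amounts to a simple routing rule. The sketch below lists the processors a packet visits when mismatched coordinates are corrected in ascending dimension order; the function name `route` and the use of tuples for processor numbers are assumptions made for illustration.

```python
def route(source, destination):
    """Return the processor numbers visited from source to destination.

    Each step replaces one mismatched coordinate with the destination's value,
    which corresponds to one pass through that dimension's crossbar switch.
    """
    if len(source) != len(destination):
        raise ValueError("processor numbers must have the same dimension")
    path = [tuple(source)]
    current = list(source)
    for k, target in enumerate(destination):
        if current[k] != target:
            current[k] = target  # transform the k-th coordinate
            path.append(tuple(current))
    return path


# Three mismatched coordinates: three crossbar passes.
print(route((2, 3, 1), (0, 0, 0)))
# A coordinate that already matches is skipped: two passes.
print(route((2, 0, 1), (0, 0, 0)))
```

The number of passes equals the number of mismatched coordinates, so it never exceeds the number of dimensions N.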

In many cases, moreover, a suitable factorization of L keeps the number of element processors connected in each dimension within a given range. This makes it possible to fit the coordinate-transformation crossbar switch of each dimension within a prescribed packaging unit, for example within a chip, a module, a board, or a cabinet, or between cabinets. This property cannot be adequately satisfied under the condition that all factors take the same value, L = m^N; the decomposition of the present invention, L = n₁ × n₂ × … × n_N, is required.
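
As an illustration of choosing the factorization, the sketch below enumerates every way of writing L as a product of factors that each fit within a port limit. The port limit, the function names, and the tie-breaking rule (fewest dimensions first, then the smallest sum of factors) are assumptions made for this example; the patent does not prescribe a selection procedure.

```python
def bounded_factorizations(total, max_factor, min_factor=2):
    """Yield non-decreasing factor tuples within [min_factor, max_factor] whose product is total."""
    if total == 1:
        yield ()
        return
    for n in range(min_factor, min(total, max_factor) + 1):
        if total % n == 0:
            # Later factors are at least n, so each factorization appears once.
            for rest in bounded_factorizations(total // n, max_factor, n):
                yield (n,) + rest


def pick_factorization(total, max_factor):
    """Pick a candidate with the fewest dimensions, then the smallest factor sum."""
    candidates = list(bounded_factorizations(total, max_factor))
    if not candidates:
        return None
    return min(candidates, key=lambda f: (len(f), sum(f)))


# 96 processors with crossbar switches limited to 8 ports each.
print(sorted(bounded_factorizations(96, 8), key=len)[:3])
print(pick_factorization(96, 8))
```

Since 96 is not a perfect power, no decomposition of the form m^N exists for it, yet unequal factors such as (4, 4, 6) keep every switch within the 8-port limit.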

When no relay crossbar switch is used, deadlock occurs if element processor P₁ tries to relay a packet to element processor P₂ while P₂ simultaneously tries to relay a packet to P₁. With a relay crossbar switch, however, the packet flow from P₁ to P₂ can be set up independently of the flow from P₂ to P₁, so deadlock does not occur.

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

First Embodiment

(1) Configuration of the Interconnection Network

FIG. 1 illustrates the connection scheme of the first embodiment of the present invention using a three-dimensional example; the extension to N dimensions is analogous. As shown in FIG. 2, an arbitrary element processor P(i, j, k), having processor number (i, j, k), is logically placed at a lattice point within an L × M × K cuboid in a three-dimensional lattice space and, as shown in FIG. 1, is connected to three crossbar switches 9-1, 9-2, and 9-3. Crossbar switch 9-1 fully connects the element processors P(0, j, k), P(1, j, k), …, P(L−1, j, k), which differ from P(i, j, k) only in the first-dimension coordinate. Likewise, crossbar switch 9-2 fully connects the element processors P(i, 0, k), P(i, 1, k), …, P(i, M−1, k), which differ only in the second-dimension coordinate, and crossbar switch 9-3 fully connects the element processors P(i, j, 0), P(i, j, 1), …, P(i, j, K−1), which differ only in the third-dimension coordinate.

Each crossbar switch provides communication with the element processors whose numbers are obtained by replacing the value of one particular coordinate of the three-dimensional processor number with another value. These crossbar switches are therefore called coordinate-transformation crossbar switches below, and the switch that transforms the coordinate of a particular dimension k is called the k-th dimension coordinate-transformation crossbar switch. As shown later, relaying through the three coordinate-transformation crossbar switches makes it possible to communicate with an element processor of any number.

(2) Structure of the Element Processor

FIG. 3 shows the structure of an element processor. Element processor P(i, j, k) consists of a relay unit 1 and a processing unit 2, the latter being an ordinary computer that has a program counter and executes instructions sequentially. The relay unit 1 consists of: a communication controller 3, which contains a microprogram, decodes the destination processor number of a communication packet received from the processing unit 2 or the input port register 5, and on that basis selects a particular coordinate-transformation crossbar switch 9-1 to 9-3 or the processing unit 2 and sends the packet there; an input port register 5 and an output port register 6, which temporarily hold communication packets; a selector 7, which selects one of the input channels from the three coordinate-transformation crossbar switches; and a distributor 8, which selects one of the three coordinate-transformation crossbar switches as the destination of the packet in the output port. A communication packet consists of a destination processor number and transmission data.

(3) Communication Method

Next, the mechanism for sending from processor P(i, j, k), the source processor, to processor P(0, 0, 0), the destination processor, in the three-dimensional example of FIG. 1 is described with reference to FIGS. 1, 3, and 4. First, the processing unit 2 of the source processor P(i, j, k) passes a communication packet to the communication controller 3 and instructs it to send. The destination information (processor number) of the packet consists of three coordinates (0, 0, 0). Starting from the first coordinate, these values are compared in order with the three coordinate values (i, j, k) of the processor's own number, held in the microprogram of the communication controller 3. For the first coordinate i, which is the first to mismatch, the corresponding first-dimension coordinate-transformation crossbar switch 9-1 is selected in order to communicate with the element processor P(0, j, k), whose coordinates are obtained by replacing that coordinate value with 0. The packet is placed in the output port register 6, and the number "1" of the selected crossbar switch 9-1 is passed to the distributor 8 over signal line 12. Using this number, the distributor 8 connects data line 13 and control signal line 12 to one input channel 101, 102 of the first-dimension coordinate-transformation crossbar switch 9-1. The communication controller 3 then sends the packet in the output port register 6 to the first-dimension coordinate-transformation crossbar switch 9-1 using control signal lines 12 and 101 and data lines 13 and 102. The structure and operation of the coordinate-transformation crossbar switch are described later.

The element processor P(0, j, k), to which the packet is sent via crossbar switch 9-1, proceeds as follows. When its selector 7-2 selects crossbar switch 9-1 from among the crossbar switches that are asserting the transmission request signal OREQ (described later; the selection logic is not claimed in the present invention), the output channel lines 103 and 104 of that crossbar switch are connected to control signal line 10-2 and data line 11-2, and the packet is taken into the communication controller 3-2 through the input port register 5-2. At this time the selector 7-2 also reports, over control signal line 10-2, the number "1" of the receiving crossbar switch 9-1 chosen by its selection logic. From this switch number "1" the communication controller 3-2 of element processor P(0, j, k) learns which coordinate has been transformed, the first coordinate in this example. Starting from the coordinate after it, the controller again compares the coordinate values of the destination processor (*, 0, 0), where * marks a coordinate already transformed, in order with its own coordinate values (0, j, k). For the second-dimension coordinate j, which is the first to mismatch, it selects the second-dimension coordinate-transformation crossbar switch 9-4 in order to send to the element processor P(0, 0, k), whose coordinates are obtained by replacing that coordinate value with 0, and sends the packet to input channel lines 201 and 202.

Processor P(0, 0, k), which receives the packet from output channel lines 203 and 204, relays it in the same way, sending it to input channel lines 301 and 302 of the third-dimension coordinate-transformation crossbar switch 9-5; the packet is thus delivered to the destination processor P(0, 0, 0) via output channel lines 303 and 304. At the destination processor P(0, 0, 0), the communication controller 3-4 decodes the destination (processor number) (0, 0, 0) of the packet stored in input port 5-4 via selector 7-4. Since it matches the processor's own number (0, 0, 0) held in the microprogram, the controller notifies the processing unit 2-4 that the packet has arrived.

In the general N-dimensional case the procedure is the same: the packet is relayed from one coordinate-transformation crossbar switch to the next, each time to the element processor whose mismatched coordinate has been replaced with that of the destination processor, until the information reaches the destination. Since at most N mismatched coordinates need to be transformed, the maximum path length of this connection scheme is N; in the three-dimensional example of FIG. 1 it is 3. For lattice, ring, and butterfly connections, however, a packet can be transferred to the destination processor in a single transmission without any relay operation, so communication performance equals that of a full crossbar switch. FIG. 4 shows the relay operation logic of the communication controller 3 described above.
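
The relay decision described in the text, comparing coordinates in order and choosing either a crossbar switch or the local processing unit, can be sketched as a single function. This follows the prose description only and is not a transcription of FIG. 4; the function name, the return convention, and the `arrived_via` parameter are hypothetical.

```python
def relay_decision(own, destination, arrived_via=None):
    """Decide what a communication controller does with a packet.

    own, destination: processor numbers as tuples of coordinates.
    arrived_via: 1-based number of the crossbar switch the packet came in on,
                 or None if it came from the local processing unit.
    Returns ("forward", switch_number) or ("deliver", None).
    """
    # Coordinates up to the one just transformed are treated as already
    # matching, so the comparison resumes with the coordinate after it.
    start = 0 if arrived_via is None else arrived_via
    for k in range(start, len(own)):
        if own[k] != destination[k]:
            return ("forward", k + 1)  # switch numbers are 1-based
    return ("deliver", None)


# Source P(2, 3, 1) sending to P(0, 0, 0): the first coordinate mismatches.
print(relay_decision((2, 3, 1), (0, 0, 0)))
# Relay at P(0, 3, 1) after arriving through switch 1.
print(relay_decision((0, 3, 1), (0, 0, 0), arrived_via=1))
# Destination reached: hand the packet to the processing unit.
print(relay_decision((0, 0, 0), (0, 0, 0), arrived_via=3))
```

Resuming the comparison after the arrival dimension mirrors the description of the relay at P(0, j, k), which treats the first coordinate as already transformed.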

(4) Structure and Operation of the Coordinate-Transformation Crossbar Switch

FIG. 5 shows the external interface of one crossbar switch 9 with L inputs and L outputs. Each input channel consists of two control signal lines, IREQ and IACK, and a data line, IDATA. IREQ carries the signal by which the sending element processor notifies the crossbar switch that it has stored data to be sent in its output port register 6 and is waiting to send. IACK carries the signal by which the crossbar switch notifies the element processor that the next transmission data may be written to the output port register 6. IDATA carries the transmission data. Similarly, each output channel consists of two control signal lines, OREQ and OACK, and an output data line, ODATA. OREQ carries the signal by which the crossbar switch requests transfer of the transmission data into the input port register 5 of the receiving element processor. OACK carries the signal by which the receiving element processor indicates that the transfer is complete. ODATA carries the transmission data.

In this interface, as shown in FIG. 3, control signal lines IREQ and IACK are connected to the communication controller 3 of the element processor through the distributor 8, and control signal lines OREQ and OACK are connected to it through the selector 7. Data line IDATA is connected to the output port register 6 of the element processor through the distributor 8, and data line ODATA to the input port register 5 through the selector 7. The crossbar switch also has a mask register write control signal line W and a mask pattern signal line MASK, used to set the contents (mask pattern) of a mask register that masks the processor number, as described later.

クロスバスイツチの構造の一例を第６図に示す。この
例では、３入力６出力のクロスバスイツチをとりあげて
いるが、一般のＬ入力Ｌ出力の場合でも全く同様であ
る。送信を行なおうとする要素プロセツサ、例えば第３
図でｉ＝２とした場合プロセツサＰ（2,j,k）の通信制
御装置３は出力ポートレジスタ６に送信データを格納し
た後、分配器８にクロスバスイツチの番号“1"を送つて
特定のクロスバスイツチ９−１を選択・接続し、信号線
12を経由して入力チヤネル２の制御信号線101すなわちI
REQ2に送信要求信号を出力する。当該クロスバスイツチ
はプロセツサＰ（0,j,k）,P（1,j,k）,P（2,j,k）をそ
れぞれ入出力チヤネル0,1,2により接続している。線IRE
Q₂上の要求信号が当該クロスバスイツチのデコーダ20−
３に入力されると、送信先を求めるために出力ポート６
上にある送信情報パケツト中の送信先プロセツサ番号の
第１次元座標相当部分がデコードされ、送信先チヤネ
ル、例えば目的地プロセツサＰ（0,0,0）に対応するチ
ヤネル（出力チヤネル０）に対しては１が、それ以外に
対しては０が信号線26−３に出力されて全ての優先順位
制御回路21−１〜21−３に伝えられる。デコードされる
プロセツサ番号のビツト列はその一部だけがデコーダ20
−３に入力されれば良い。従つて、プロセツサ番号は、
３次元格子空間の座標を表す３個のフイールドよりな
り、各次元の座標変換クロスバスイツチではこれらのフ
イールドの当該次元に対応する一つのフイールドを取り
出す機構が必要である。各次元の座標値の範囲は一般に
は等しくなく、これに伴い各次元の座標フイールドの位
置や長さはまちまちである。本実施例では、このフイー
ルドを可変的に選択できるよう、各デコーダにはマスク
レジスタ24−１〜24−３を用意し、プロセツサ番号の一
部マスクして残りのビツト列だけをデコードする仕組に
している。FIG. 6 shows an example of the structure of the crossbar switch. This example takes up a three-input, six-output crossbar switch, but the general L-input, L-output case is exactly the same. An element processor that wants to transmit, for example processor P(2,j,k) (the case i = 2 in FIG. 3), has its communication control device 3 store the transmission data in the output port register 6, send the crossbar switch number "1" to the distributor 8 to select and connect the specific crossbar switch 9-1, and then output a transmission request signal via signal line 12 onto control signal line 101 of input channel 2, that is, IREQ₂. This crossbar switch connects processors P(0,j,k), P(1,j,k), and P(2,j,k) through input/output channels 0, 1, and 2, respectively. When the request signal on line IREQ₂ is input to decoder 20-3 of the crossbar switch, the part of the destination processor number that corresponds to the first-dimension coordinate, contained in the transmission information packet held in output port 6, is decoded to determine the destination. A 1 is output on signal line 26-3 for the destination channel, for example the channel corresponding to destination processor P(0,0,0) (output channel 0), and a 0 for every other channel, and these values are conveyed to all of the priority control circuits 21-1 to 21-3. Only part of the bit string of the processor number needs to be input to decoder 20-3. The processor number consists of three fields representing the coordinates in the three-dimensional lattice space, so the coordinate-conversion crossbar switch of each dimension needs a mechanism for extracting the one field that corresponds to its own dimension. In general the ranges of the coordinate values differ from dimension to dimension, and the positions and lengths of the coordinate fields vary accordingly. In this embodiment, so that the field can be selected flexibly, each decoder is provided with a mask register (24-1 to 24-3) that masks part of the processor number so that only the remaining bit string is decoded.

第11図にマスクレジスタの機能と構成を述べる。図で
は16×16のクロスバスイツチを想定しているが、他の規
模のクロスバスイツチでも同様である。マスクレジスタ
24は一種のマトリクススイツチとして働き、IDATA中の
送信先プロセツサ番号を示すビツト列d₀,d₁,…d₃₁のう
ち特定次元の座標を示す部分ビツト列d_id_jd_kd_lを選択し
てデコーダ（ROM）20への入力（アドレス）A₁A₂A₃A₄と
して出力する。そのためには、図に示すようにデータ線
d_id_jd_kd_lと出力線A₁A₂A₃A₄の交点に対応するフイールド
に０を、他のフイールドに１を書き込めば良い。出力線
A₁A₂A₃A₄上の信号はデコーダ20により優先順位制御回路
番号を表す４ビツトの２進数としてデコードされ、優先
順位制御回路０〜15への送信要求信号r₀〜r₁₅に変換さ
れる。倒えば、d_id_jd_kd_lの内容が‘0000'であれば、優
先順位制御回路０を選択するべく‘1000…0'なる信号に
デコードされる。このスイツチを16×16より小模式のク
ロスバスイツチ、例えば４×４のクロスバスイツチとし
て用いる場合は、マスクレジスタ24は２ビツトだけを選
択し、デコーダROM20への入力アドレスA₁A₂として用い
る。A₃A₄は０となる。従つて、この場合にはデコーダRO
Mはその一部だけが使用される。マスクレジスタ24−１
〜24−３の内容は外部（要素プロセツサまたはホスト計
算機等）からマスクレジスタ書き込み制御回路25に指示
して設定可能である（ＷおよびMASK信号線を用いる）。FIG. 11 illustrates the function and configuration of the mask register. The figure assumes a 16 × 16 crossbar switch, but the same applies to crossbar switches of other sizes. The mask register 24 acts as a kind of matrix switch: from the bit string d₀, d₁, …, d₃₁ that gives the destination processor number in IDATA, it selects the partial bit string d_i d_j d_k d_l that gives the coordinate of a particular dimension and outputs it as the input (address) A₁A₂A₃A₄ to the decoder (ROM) 20. To do this, as shown in the figure, a 0 is written in the fields at the intersections of data lines d_i, d_j, d_k, d_l with output lines A₁, A₂, A₃, A₄, and a 1 in all other fields. The decoder 20 interprets the signal on output lines A₁A₂A₃A₄ as a 4-bit binary number giving a priority control circuit number and converts it into the transmission request signals r₀ to r₁₅ for priority control circuits 0 to 15. For example, if d_i d_j d_k d_l is '0000', it is decoded into the signal '1000…0', which selects priority control circuit 0. When this switch is used as a crossbar switch smaller than 16 × 16, for example a 4 × 4 crossbar switch, the mask register 24 selects only 2 bits, which are used as the input address A₁A₂ to the decoder ROM 20; A₃A₄ are 0. In this case, therefore, only part of the decoder ROM is used. The contents of mask registers 24-1 to 24-3 can be set from outside (by an element processor, a host computer, or the like) by instructing the mask register write control circuit 25, using the W and MASK signal lines.
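The field selection and decoding described for FIG. 11 can be sketched as below. This is an illustrative model only: the function names, the bit positions chosen in the usage lines, and the treatment of A₁ as the most significant address bit are assumptions not stated in the description.

```python
def select_bits(dest_bits, selected, width=4):
    """Mask register as a matrix switch: route the chosen data-line bits
    to address lines A1..A4; unused address lines read 0."""
    address = [dest_bits[pos] for pos in selected]
    return address + [0] * (width - len(address))


def decode(address):
    """Decoder ROM: turn the address (A1 assumed most significant) into
    one-hot request signals r0..r15."""
    index = int("".join(str(bit) for bit in address), 2)
    return [1 if channel == index else 0 for channel in range(2 ** len(address))]


if __name__ == "__main__":
    dest = [0] * 32                  # destination processor number d0..d31
    dest[9] = 1
    requests = decode(select_bits(dest, [8, 9, 10, 11]))   # hypothetical field position
    print(requests.index(1))         # channel whose priority control circuit is requested
```

With only two selected bits, as in the 4 × 4 case, the remaining address lines stay at 0, so only some ROM entries are ever addressed.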

優先順位制御回路21−１には各入力チヤネルから送信
要求が伝えられ、後述するように、予め定められた論理
に従つてそのうちの一つが選択される。その後、優先順
位制御回路21−１は選択転送制御回路22−１中のバツフ
ア23−１が空きであることを確認し、信号線27−１によ
り選択転送制御回路22−１に選択された入力チヤネル番
号（チヤネル２）を伝える。その結果、プロセツサＰ
（2,j,k）の出力ポート６上にある送信情報パケツトが
（分配器８および入力チヤネル２のデータ線102（すな
わちIDATA₂）を経由して）選択転送制御回路22−１中の
バツフア23−１に転送される。この間優先順位制御回路
21−１はビジー状態にあり、転送が完了すると次の選択
動作に入る。Transmission requests from the input channels are conveyed to the priority control circuit 21-1, which selects one of them according to predetermined logic, as described later. The priority control circuit 21-1 then confirms that the buffer 23-1 in the selective transfer control circuit 22-1 is empty and conveys the selected input channel number (channel 2) to the selective transfer control circuit 22-1 over signal line 27-1. As a result, the transmission information packet in output port 6 of processor P(2,j,k) is transferred, via the distributor 8 and data line 102 of input channel 2 (that is, IDATA₂), to the buffer 23-1 in the selective transfer control circuit 22-1. During this time the priority control circuit 21-1 is busy; when the transfer completes, it begins its next selection operation.

選択転送制御回路22−１は、バツフア23−１にデータ
が転送されると、出力先プロセツサ（プロセツサＰ（0,
j,k））に対して送信を要求する信号線OREQ₀,すなわち
制御信号103上に出力する。プロセツサＰ（0,j,k）のセ
レクタ７−２には複数個の座標変換クロスバスイツチか
らの送信要求信号が入力され、予め定められた論理に従
つてそのうちの一つが選択され通信制御装置３−２に伝
えられる。通信制御装置３−２は、入力ポートレジスタ
５−２が空であるばあい、データ線ODATA₀、すなわちデ
ータ線104上のデータをセレクタ７−２を経由して入力
ポートレジスタ５−２に書き込み、書き込みが終了する
と、書き込み完了信号を制御信号線OACK₀上に出力す
る。選択転送制御回路22−１は線OACK₀から書き込み完
了信号を入力すると線OREQ₀上の送信要求信号をネゲー
トし、プロセツサＰ（0,j,k）の通信制御装置３−２は
これを検知して線OACK₀上の受信完了信号をネゲートす
る。選択転送制御回路22−１のバツフア23−１は再び転
送可能状態になり、プロセツサＰ（0,j,k）のセレクタ
７−２も他のクロスバスイツチを選択可能になる。When data has been transferred to the buffer 23-1, the selective transfer control circuit 22-1 outputs a transmission request to the destination processor (processor P(0,j,k)) on signal line OREQ₀, that is, control signal line 103. The selector 7-2 of processor P(0,j,k) receives transmission request signals from a plurality of coordinate-conversion crossbar switches, selects one of them according to predetermined logic, and conveys it to the communication control device 3-2. If the input port register 5-2 is empty, the communication control device 3-2 writes the data on data line ODATA₀, that is, data line 104, into the input port register 5-2 via the selector 7-2 and, when the write finishes, outputs a write completion signal on control signal line OACK₀. On receiving the write completion signal from line OACK₀, the selective transfer control circuit 22-1 negates the transmission request signal on line OREQ₀; the communication control device 3-2 of processor P(0,j,k) detects this and negates the reception completion signal on line OACK₀. The buffer 23-1 of the selective transfer control circuit 22-1 is then ready for another transfer, and the selector 7-2 of processor P(0,j,k) is free to select another crossbar switch.
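The order of events on the OREQ/OACK pair described above can be traced with a small model. This is a sketch under stated assumptions: the function `transfer` and its event log are hypothetical, timing is ignored, and a full input port register is modelled simply as the receiver not answering yet.

```python
def transfer(buffer_packet, input_port):
    """Step through one OREQ/OACK exchange.
    Returns (event log, contents of the input port register)."""
    lines = {"OREQ": 0, "OACK": 0}
    log = []

    def drive(name, value):
        lines[name] = value
        log.append((name, value))

    drive("OREQ", 1)             # switch side: the buffer holds a packet
    if input_port is not None:   # input port register still full: no answer yet
        return log, input_port
    input_port = buffer_packet   # receiver side: write ODATA into the input port register
    drive("OACK", 1)             # receiver side: write complete
    drive("OREQ", 0)             # switch side: negate the request
    drive("OACK", 0)             # receiver side: negate the completion signal
    return log, input_port


if __name__ == "__main__":
    events, port = transfer("packet", None)
    for name, value in events:
        print(name, value)
```

The same request/acknowledge ordering applies on the input side with IREQ and IACK, where the acknowledging party is the crossbar switch.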

一方、優先順位制御回路21−１がビジー状態から抜け
出ると、信号線26−３によりこれを検知したデコーダ20
−３は、プロセツサＰ（2,j,k）に対しIACK₂上に転送完
了信号を送る。IACK₂上の転送完了信号を受信したプロ
セツサＰ（2,j,k）の通信制御装置３はIREQ₂上の転送要
求信号をネゲートし、次の送信データを出力ポート６上
に載せることが可能となる。Meanwhile, when the priority control circuit 21-1 leaves the busy state, the decoder 20-3 detects this via signal line 26-3 and sends a transfer completion signal to processor P(2,j,k) on IACK₂. On receiving the transfer completion signal on IACK₂, the communication control device 3 of processor P(2,j,k) negates the transfer request signal on IREQ₂ and can then place the next transmission data in output port 6.

優先順位制御回路21−１〜21−３の論理の一例を第７
図に示す。この例では、３つのデコーダからの入力が３
ビツトの情報、すなわち０〜７となることに着目し、８
エントリのメモリ（RAM）15に各入力に対応した許可信
号のパタンを記憶させておく方式をとる。しかし、優先
順位制御回路の論理はこれにとどまるものではない。FIG. 7 shows an example of the logic of the priority control circuits 21-1 to 21-3. This example exploits the fact that the inputs from the three decoders form 3 bits of information, that is, a value from 0 to 7, and stores the grant-signal pattern for each input value in an 8-entry memory (RAM) 15. The logic of the priority control circuit is not, however, limited to this.
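A table-driven arbiter of the kind described for FIG. 7 might look like the following sketch. The description does not give the RAM contents, so the fixed-priority policy used here (the lowest-numbered requesting input wins) and the names `build_grant_ram` and `arbitrate` are assumptions for illustration.

```python
def build_grant_ram(n_inputs=3):
    """Build the lookup table: entry r holds the grant pattern for request
    pattern r, where bit i of r is the request from input channel i."""
    ram = []
    for requests in range(2 ** n_inputs):
        # Assumed policy: keep only the lowest set bit (0 when nothing is requested).
        ram.append(requests & -requests)
    return ram


GRANT_RAM = build_grant_ram()


def arbitrate(requests):
    """Look up the grant pattern for a 3-bit request pattern."""
    return GRANT_RAM[requests]


if __name__ == "__main__":
    print(format(arbitrate(0b110), "03b"))   # inputs 1 and 2 request; one grant bit is set
```

Because the policy lives entirely in the table, a different priority rule only requires rewriting the eight entries.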

第二実施例第8,9図には第二実施例の結合方式の説明図を示す。
第一実施例と異なるのは、要素プロセツサ内の通信制御
装置３で中継動作を行う代わりに、各要素プロセツサ毎
に設置された中継クロスバスイツチ14がこれを行う点に
ある。Second Embodiment. FIGS. 8 and 9 illustrate the interconnection scheme of the second embodiment. It differs from the first embodiment in that the relay operation is performed not by the communication control device 3 inside the element processor but by a relay crossbar switch 14 provided for each element processor.

（１）中継クロスバスイツチの構造と動作中継クロスバスイツチの構造は基本的には座標変換ク
ロスバスイツチの構造と同じであるが、デコーダに入力
される宛先プロセツサ番号は一つの座標フイールドでな
く、３個の座標フイールド全てである。中継クロスバス
イツチの宛先デコード部の詳細説明図を第12図を用いて
行なう。データ線IDATAより入力された送信先プロセツ
サ番号と、中継クロスバスイツチ内に用意された自プロ
セツサ番号を格納した自プロセツサ番号レジスタ50の内
容は比較器51に入力され、ビツトワイズにEXORがとられ
て一致すれば１が、不一致であれば０が信号線52−１〜
52−32上に出力される。この出力はマスクレジスタ24に
入力され、マスクレジスタ24の交点フイールドに０が書
かれている場合、すなわちマスクされていない場合は、
この入力は出力線A₁A₂A₃上にそのまま出力され、ここで
ワイアツドANDがとられる。その結果、マスクされずに
出力線につながれた比較器出力が全て１の場合にのみ、
出力線には１が出力される。各出力線A₁A₂A₃に対しプロ
セツサ番号の第１〜第３各座標を割り付ければ、ある座
標フイールドが全て自プロセツサ番号の対応するフイー
ルドと等しければ出力線には１が、そうでなければ（不
一致座標の場合には）０が出力されることになる。出力
線上の信号は反転されてデコーダ20に入力される。例え
ば、図で第１座標がデータ線IDATAより入力された送信
先プロセツサ番号のビツト0,1,2で表されるとする。対
応する自プロセツサ番号レジスタ50のビツト0,1,2とと
もに比較器51に入力され、全てのビツトが一致すれば信
号線52−1,52−2,52−３には１が出力される。第１座標
フイールドとマスクレジスタの第１出力線A₁の交点フイ
ールド53−1,53−2,53−３には０が、他の出力線A₂A₃と
の交点フイールドには１が書き込まれているから、第１
出力線A₁に対してのみ比較結果が送られる。従つて、信
号線52−1,52−2,52−３に全て１が出力された場合にの
み、マスクレジスタの第１出力線A₁に１が出力される。(1) Structure and operation of the relay crossbar switch. The structure of the relay crossbar switch is basically the same as that of the coordinate-conversion crossbar switch, except that the destination processor number input to the decoder is not a single coordinate field but all three coordinate fields. The destination decoding section of the relay crossbar switch is described in detail with reference to FIG. 12. The destination processor number input from data line IDATA and the contents of the own-processor-number register 50, which is provided in the relay crossbar switch and holds the number of its own processor, are input to the comparator 51 and compared bit by bit (EXOR); a 1 is output on signal lines 52-1 to 52-32 where the bits match and a 0 where they do not. These outputs are input to the mask register 24. Where a 0 is written in an intersection field of the mask register 24, that is, where the input is not masked, the input is passed unchanged to output lines A₁A₂A₃, where a wired AND is formed. Consequently an output line carries a 1 only when all of the unmasked comparator outputs connected to it are 1. If the first to third coordinates of the processor number are assigned to output lines A₁, A₂, and A₃ respectively, an output line carries a 1 when its coordinate field is entirely equal to the corresponding field of the own processor number and a 0 otherwise (that is, for a mismatched coordinate). The signals on the output lines are inverted and input to the decoder 20. For example, suppose that in the figure the first coordinate is represented by bits 0, 1, and 2 of the destination processor number input from data line IDATA. These bits are input to the comparator 51 together with the corresponding bits 0, 1, and 2 of the own-processor-number register 50, and if all of the bits match, a 1 is output on signal lines 52-1, 52-2, and 52-3. A 0 is written in the intersection fields 53-1, 53-2, and 53-3 between the first coordinate field and the first output line A₁ of the mask register, and a 1 in the intersection fields with the other output lines A₂ and A₃, so the comparison result is sent only to the first output line A₁. Therefore a 1 is output on the first output line A₁ of the mask register only when a 1 is output on all of signal lines 52-1, 52-2, and 52-3.

デコーダは出力線A₁A₂A₃上の信号を２進アドレスとし
てチヤネル番号に変換し、該チヤネルの優先順位制御回
路に１を、他には０を送る。例えば、出力線A₁A₂A₃上の
信号が全て１であつた場合、すなわち反転されたデコー
ダ入力アドレス‘000'に対してはチヤネル０、すなわ
ち、自プロセツサへのチヤネルを選択する。The decoder treats the signals on output lines A₁A₂A₃ as a binary address, converts it to a channel number, and sends a 1 to the priority control circuit of that channel and a 0 to the others. For example, when the signals on output lines A₁A₂A₃ are all 1, that is, for the inverted decoder input address '000', it selects channel 0, the channel to the own processor.
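The destination decoding of the relay crossbar switch can be modelled as below. This is a hedged sketch: only the placement of the first coordinate in bits 0 to 2 comes from the description; the other field positions, the function names, and the decoder ROM contents for non-zero addresses (here, channel k for the first mismatched dimension k) are assumptions.

```python
# Bit positions of coordinates 1..3; only the first field follows the description.
FIELDS = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]


def decoder_address(dest_bits, own_bits, fields=FIELDS):
    """Comparator, mask register, and inverter: one address bit per coordinate,
    1 where the coordinate differs from the own processor's."""
    equal = [1 if d == o else 0 for d, o in zip(dest_bits, own_bits)]   # comparator 51
    lines = [min(equal[b] for b in field) for field in fields]          # wired AND on A1..A3
    return [1 - a for a in lines]                                       # inverted for decoder 20


def channel_for(address):
    """Assumed ROM contents: '000' selects channel 0 (own processor);
    otherwise the channel of the first mismatched dimension."""
    if not any(address):
        return 0
    return address.index(1) + 1


if __name__ == "__main__":
    own = [0] * 9
    dest = list(own)
    dest[4] = 1                                        # differs in the second coordinate only
    print(channel_for(decoder_address(own, own)))      # packet addressed to this processor
    print(channel_for(decoder_address(dest, own)))     # packet to be passed on
```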

（２）通信方法第８図において、一つの座標変換クロスバスイツチ９
から中継クロスバスイツチ14に通信用パケツトが入力さ
れるとその宛先がデコードされ、もし、このプロセツサ
宛であればスイツチを要素プロセツサの入力ポート５へ
接続しパケツトを入力する。もし、それ以外の宛先であ
れば、不一致座標を変換する座標変換クロスバスイツチ
９を選択してそれに接続する。中継クロスバスイツチ14
の外部インタフエースは座標変換クロスバスイツチ９と
同じである。(2) Communication method. In FIG. 8, when a communication packet is input from one of the coordinate-conversion crossbar switches 9 to the relay crossbar switch 14, its destination is decoded. If the packet is addressed to this processor, the switch connects to input port 5 of the element processor and delivers the packet. If it is addressed elsewhere, the switch selects and connects to the coordinate-conversion crossbar switch 9 that converts a mismatched coordinate. The external interface of the relay crossbar switch 14 is the same as that of the coordinate-conversion crossbar switch 9.

第９図には要素プロセツサＰ（i,j,k）から要素プロ
セツサＰ（0,j,k）,P（0,0,k）の中継クロスバスイツチ
を経由して要素プロセツサＰ（0,0,0）にパケツトを転
送する例を破線で示してある。In FIG. 9, the broken line shows an example in which a packet is transferred from element processor P(i,j,k) to element processor P(0,0,0) via the relay crossbar switches of element processors P(0,j,k) and P(0,0,k).

第二実施例においては、通信制御装置３は第一実施例
にて述べたような通信用パケツトの宛先情報（通信先プ
ロセツサ番号）の解読、その結果に基づく特定の座標変
換クロスバスイツチ、または処理装置２の選択と通信用
パケツトの送出機能は持たず、単に中継クロスバスイツ
チ14とのインタフエースをとる機能だけを持つ。In the second embodiment, the communication control device 3 has none of the functions described for the first embodiment, namely decoding the destination information (destination processor number) of a communication packet, selecting a specific coordinate-conversion crossbar switch or the processing device 2 on the basis of the result, and sending out the communication packet; it only provides the interface to the relay crossbar switch 14.

評価第９図の例では座標変換クロスバスイツチを三つ（９
−1,9−4,9−５）と、中継クロスバスイツチを四つ（14
−1,14−2,14−3,14−４）経由しているので計７回スイ
ツチング動作が必要である。第一実施例ではクロスポイ
ントの通過数、すなわち通信用パケツトを一つの入出力
ポート／バツフアからの次のバツフア／入出力ポートへ
転送する単位スイツチング動作を３回と数えているが、
要素プロセツサの制御装置３での判定・選択処理を考慮
すれば、転送に要する時間は結局同じことになる。第一
実施例（プロセツサ自身が中継する方式）では最大Ｎ回
の送信動作が必要であることから、このスイツチの最大
通信路長はＮであり、ハードウエアとしてはクロスポイ
ントの数でみるとn₁×n₂×…×n_N×（n₁＋n₂＋…＋n_N）
となる。また、n_K ²:k＝1,…Ｎの最大値がクロスバスイ
ツチの最大結合能力である。また、第二実施例（プロセ
ツサ対応に中継クロスバスイツチを持つ方式）では、中
継クロスバスイツチ14で中継動作を行うこと自体を一回
の送信動作とみなすと最大2N＋１回の送信動作が必要と
なる。すなわちこのスイツチの最大通信路長は2N＋１で
ある。また、ハードウエア量は n₁×n₂×…×n_N×｛（Ｎ＋１）^２＋n₁＋n₂＋…＋n_N｝で表される。Evaluation. In the example of FIG. 9 the packet passes through three coordinate-conversion crossbar switches (9-1, 9-4, 9-5) and four relay crossbar switches (14-1, 14-2, 14-3, 14-4), so seven switching operations are required in total. In the first embodiment the number of crosspoints traversed, that is, the number of unit switching operations that move a communication packet from one input/output port or buffer to the next buffer or input/output port, was counted as three; once the decision and selection processing in the control device 3 of each element processor is taken into account, however, the time required for the transfer comes out the same. In the first embodiment (where the processors themselves relay), at most N transmission operations are needed, so the maximum communication path length of the switch is N, and the hardware amount, measured in crosspoints, is n₁ × n₂ × … × n_N × (n₁ + n₂ + … + n_N). The largest of n_k² (k = 1, …, N) is the maximum connection capability of the crossbar switch. In the second embodiment (where each processor has its own relay crossbar switch), if a relay by the relay crossbar switch 14 is itself counted as one transmission operation, at most 2N + 1 transmission operations are needed; that is, the maximum communication path length of the switch is 2N + 1. The hardware amount is n₁ × n₂ × … × n_N × {(N + 1)² + n₁ + n₂ + … + n_N}.
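The two hardware-amount expressions and path lengths above can be evaluated directly. The sketch below simply restates them; the function names are hypothetical. The first expression follows from L/n_k crossbar switches of n_k² crosspoints along each dimension k, and the extra (N + 1)² term is one (N + 1) × (N + 1) relay crossbar switch per processor.

```python
from math import prod


def crosspoints_first(dims):
    """First embodiment: L x (n1 + n2 + ... + nN) crosspoints."""
    return prod(dims) * sum(dims)


def crosspoints_second(dims):
    """Second embodiment: L x ((N + 1)^2 + n1 + ... + nN) crosspoints."""
    n = len(dims)
    return prod(dims) * ((n + 1) ** 2 + sum(dims))


def max_path_length(dims, relay_switches=False):
    """N when the processors relay, 2N + 1 with relay crossbar switches."""
    n = len(dims)
    return 2 * n + 1 if relay_switches else n


if __name__ == "__main__":
    shape = (4, 4, 4)                      # 64 processors in three dimensions
    print(crosspoints_first(shape), crosspoints_second(shape))
    print(max_path_length(shape), max_path_length(shape, relay_switches=True))
```

For a three-dimensional arrangement the second call to `max_path_length` gives 7, the same count as the three coordinate-conversion and four relay switches in the FIG. 9 example.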

次に、本発明の相互結合方式により、一つのクロスバ
スイツチの最大結合能力が与えられたときに、最高の性
能が出せる構成と最小のハードウエア量で済む構成を容
易に求めることが出来ることを示す。Next, we show that with the interconnection scheme of the present invention, given the maximum connection capability of a single crossbar switch, both the configuration that gives the highest performance and the configuration that needs the least hardware can easily be found.

性能は、クロスバスイツチの各プロセツサ間結合用信
号線の本数を一定とすると、通信路長Ｎまたは2N＋１で
決まる。すなわち、プロセツサを論理的に配置する空間
の次元を出来るだけ小さくする方が高い性能が得られ
る。例えば第一実施例に示すプロセツサが中継する方式
の場合、プロセツサ台数をＬ、一つのクロスバスイツチ
の最大結合可能プロセツサ台数をｎとするとき、ｑ＝［logL/logn］＋１が該クロスバスイツチを用いたときの最小通信路長であ
る。ここに［］は商の整数部分をとる記号である。この
ときの構成は、要素プロセツサをｑまたはｑ＋１次元の
格子空間の超立方体状領域に配置し、その中の一次元部
分領域を構成する全ての要素プロセツサを結合可能プロ
セツサ台数が上記最大値ｎであるクロスバスイツチを用
いて結合したものである。If the number of signal lines each crossbar switch uses to connect processors is held constant, performance is determined by the communication path length, N or 2N + 1. That is, the lower the dimension of the space in which the processors are logically arranged, the higher the performance. For example, in the scheme of the first embodiment, in which the processors relay, let L be the number of processors and n the maximum number of processors that one crossbar switch can connect; then q = [log L / log n] + 1 is the minimum communication path length attainable with that crossbar switch, where [ ] denotes the integer part of the quotient. The corresponding configuration arranges the element processors in a hypercube-shaped region of a q- or (q + 1)-dimensional lattice space and connects all the element processors of each one-dimensional subregion with a crossbar switch whose connectable processor count is the maximum value n.
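The expression q = [log L / log n] + 1 can be evaluated without floating-point logarithms, which avoids rounding trouble near exact powers. The sketch below implements the formula exactly as printed; the function name is hypothetical, and n is assumed to be at least 2.

```python
def min_path_length(num_processors, max_per_switch):
    """q = [log L / log n] + 1, with [ ] the integer part, computed with
    integer arithmetic. Note that for L an exact power of n the printed
    formula yields the exponent plus one."""
    exponent = 0
    power = max_per_switch
    while power <= num_processors:     # largest e with n**e <= L
        exponent += 1
        power *= max_per_switch
    return exponent + 1


if __name__ == "__main__":
    print(min_path_length(256, 8))     # 256 processors, 8-way crossbar switches
    print(min_path_length(1000, 32))   # 1000 processors, 32-way crossbar switches
```

For 256 processors and 8-way switches the result is 3, consistent with a three-dimensional arrangement such as 8 × 8 × 4.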

一方、ハードウエア量はn₁×n₂×…×n_N×（n₁＋n₂＋
…＋n_N）となるから、明らかにn_i＝２の場合が最小ハー
ドウエア量となる。しかし、第二実施例に示す中継クロ
スバスイツチを用いる方式では、第10図に示すように、
別の構成を取るときにハード量が最小と成る。例えば25
6台構成では８×８×４台の３次元に、4096台構成では
８×８×８×８台の４次元に配置する場合がハード量が
最小となる。また、ある程度ハード量が多くても性能が
出た方が良いとする立場に立てば、要素プロセツサ64台
〜1024台構成では８×８〜32×32の２次元構成が、2048
〜32768台構成では４×８×８〜32×32×32の３次元構
成が適当であろう。On the other hand, the hardware amount is n₁ × n₂ × … × n_N × (n₁ + n₂ + … + n_N), so it is clearly smallest when n_i = 2. In the scheme of the second embodiment, which uses relay crossbar switches, however, the hardware amount is smallest for a different configuration, as shown in FIG. 10. For example, with 256 processors the hardware amount is smallest for a three-dimensional 8 × 8 × 4 arrangement, and with 4096 processors for a four-dimensional 8 × 8 × 8 × 8 arrangement. If performance is valued even at the cost of somewhat more hardware, a two-dimensional configuration of 8 × 8 to 32 × 32 would be appropriate for 64 to 1024 element processors, and a three-dimensional configuration of 4 × 8 × 8 to 32 × 32 × 32 for 2048 to 32768 element processors.
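The minimum-hardware configurations quoted for the second embodiment can be checked by enumerating every way of factoring the processor count into dimensions. This is a brute-force sketch under the crosspoint formula L × {(N + 1)² + n₁ + … + n_N}; the function names are hypothetical and the search is not part of the described apparatus.

```python
from math import prod


def factorizations(total, smallest=2):
    """Yield every non-decreasing tuple of integers >= smallest whose product is total."""
    if total == 1:
        yield ()
        return
    for f in range(smallest, total + 1):
        if total % f == 0:
            for rest in factorizations(total // f, f):
                yield (f,) + rest


def relay_crosspoints(dims):
    """Crosspoints with relay crossbar switches: L x ((N + 1)^2 + sum of n_k)."""
    n = len(dims)
    return prod(dims) * ((n + 1) ** 2 + sum(dims))


def best_shape(total):
    """Arrangement of `total` processors with the fewest crosspoints."""
    return min(factorizations(total), key=relay_crosspoints)


if __name__ == "__main__":
    for count in (256, 4096):
        shape = best_shape(count)
        print(count, shape, relay_crosspoints(shape))
```

Dimensions are reported in non-decreasing order, so the 8 × 8 × 4 arrangement appears as (4, 8, 8).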

[Effect of the Invention]

本発明により、一つのクロスバスイツチ（フルクロス
バスイツチ）では接続出来ないような多数の要素プロセ
ツサを、プロセツサ台数の如何にかかわらずフルクロス
バスイツチに近い結合能力で接続するスイツチを構成す
ることができる。ここにフルクロスバスイツチに近い結
合能力とは、通信性能が高い（クロスポイント通過数が
少ない）こと、応用上重要なプロセツサ間結合トポロジ
ー（格子，リング，バタフライ）を内包していて、この
ようなプロセツサ間通信パタンに対しては最小のクロス
ポイント通過数で通信できること、を意味する。従来技
術の範囲では、上記結合トポロジーを内包し、かつ、多
数台のプロセツサを結合できるネツトワークとしてはハ
イパーキユーブが公知であるが、本発明の結合方式は上
記特定の結合トポロジー以外の通信パタンにおける通信
性能が、ハイパーキユーブよりはるかに優れている。と
くに中継クロスバイスイツチを用いることにより、デツ
トロツクを完全に防止することができる。また、本発明
により、座標変換用クロスバスイツチ規模（クロスバス
イツチの入出力チヤネル数）の（技術的・経済的な）上
限値と要素プロセツサ台数が任意に与えられたとき、最
適（通信性能最大、ハードウエア量最小、または両者の
中間）な結合方式を構成する方法が与えられ、フルクロ
スバスイツチとハイパーキユーブの間隙を埋めることが
可能となる。The present invention makes it possible to build a switch that connects a large number of element processors, more than a single crossbar switch (full crossbar switch) can connect, with a connection capability close to that of a full crossbar switch regardless of the number of processors. Here, a connection capability close to that of a full crossbar switch means that communication performance is high (few crosspoints are traversed) and that the inter-processor connection topologies important in applications (lattice, ring, butterfly) are contained in the network, so that communication patterns of those kinds are served with the minimum number of crosspoint traversals. In the prior art, the hypercube is known as a network that contains these connection topologies and can connect a large number of processors, but for communication patterns other than these specific topologies the interconnection scheme of the present invention performs far better than the hypercube. In particular, the use of relay crossbar switches makes it possible to prevent deadlock completely. Furthermore, given any (technical or economic) upper limit on the size of a coordinate-conversion crossbar switch (its number of input/output channels) and any number of element processors, the invention provides a way to construct the optimal interconnection scheme (maximum communication performance, minimum hardware amount, or a compromise between the two), which fills the gap between the full crossbar switch and the hypercube.

さらに、チツプ，モジユール，ボード，筺体等の実装
単位ごとに各次元の座標変換スイツチを収納できるよ
う、要素プロセツサの結合関係を定めることが可能であ
る。Furthermore, the connection relationships among the element processors can be defined so that the coordinate-conversion switches of each dimension fit within a single packaging unit such as a chip, module, board, or cabinet.

[Brief description of the drawings]

第１図は本発明の第１実施例の構成図、第２図はプロセ
ツサの超直方体状配置図、第３図は要素プロセツサの構
成図、第４図は通信制御装置３の中継動作を示す説明
図、第５図はクロスバスイツチのインタフエース説明
図、第６図はクロスバスイツチの構成図、第７図は優先
順位制御回路の一例を示す説明図、第８図は第二実施例
の構成図、第９図は第二実施例の動作説明図、第10図は
中継クロスバスイツチを含む場合のハードウエア量を示
す説明図、第11図はマスクレジスタの説明図、第12図は
中継用クロスバスイツチのデコード部説明図である。１……中継装置、２〜２−４……処理装置、３〜３−４
……通信制御装置、５〜５−４……入力ポートレジス
タ、６〜６−４……出力ポートレジスタ、７〜７−４…
…セレクタ、８〜８−４……分配器、９−１〜９−５…
…座標変換クロスバスイツチ、20,20−１〜20−３……
プロセツサ番号デコーダ、21−１〜21−３……優先順位
制御回路、22−１〜22−３……選択転送制御回路、24,2
4−１〜24−３……マスクレジスタ、14−１〜14−４…
…中継クロスバスイツチ、15……メモリ（RAM）、50…
…自プロセツサ番号レジスタ、51……比較器FIG. 1 is a block diagram of the first embodiment of the present invention; FIG. 2 shows the hyper-rectangular arrangement of the processors; FIG. 3 is a block diagram of an element processor; FIG. 4 illustrates the relay operation of the communication control device 3; FIG. 5 illustrates the interface of the crossbar switch; FIG. 6 is a block diagram of the crossbar switch; FIG. 7 illustrates an example of the priority control circuit; FIG. 8 is a block diagram of the second embodiment; FIG. 9 illustrates the operation of the second embodiment; FIG. 10 shows the hardware amount when relay crossbar switches are included; FIG. 11 illustrates the mask register; and FIG. 12 illustrates the decoding section of the relay crossbar switch.
1: relay device; 2 to 2-4: processing device; 3 to 3-4: communication control device; 5 to 5-4: input port register; 6 to 6-4: output port register; 7 to 7-4: selector; 8 to 8-4: distributor; 9-1 to 9-5: coordinate-conversion crossbar switch; 20, 20-1 to 20-3: processor number decoder; 21-1 to 21-3: priority control circuit; 22-1 to 22-3: selective transfer control circuit; 24, 24-1 to 24-3: mask register; 14-1 to 14-4: relay crossbar switch; 15: memory (RAM); 50: own-processor-number register; 51: comparator.

フロントページの続き (72)発明者中尾和夫神奈川県川崎市麻生区王禅寺1099番地株式会社日立製作所システム開発研究所内 (72)発明者林剛久東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内 (72)発明者田中輝雄東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内 (72)発明者長島重夫東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内 (56)参考文献特開昭64−4856（ＪＰ，Ａ) 情報処理学会第35回（昭和62年後期) 全国大会（昭和62年９月28日〜30日）講演論文集Ｐ．135−136（３Ｃ−５）鈴岡節中村定雄藤田純一小柳滋「超並列ＡＩマシンの構想」
Continued from the front page
(72) Inventor: Kazuo Nakao, Systems Development Laboratory, Hitachi, Ltd., 1099 Ozenji, Asao-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Takehisa Hayashi, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Teruo Tanaka, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Shigeo Nagashima, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(56) References: JP-A-64-4856 (JP, A); Proceedings of the 35th National Convention of the Information Processing Society of Japan (second half of 1987, September 28 to 30, 1987), pp. 135-136 (3C-5), Setsu Suzuoka, Sadao Nakamura, Junichi Fujita, Shigeru Koyanagi, "The concept of massively parallel AI machine"

Claims

(57) [Claims]

1. A parallel computer constituted by L = n₁ × n₂ × … × n_N element processors arranged in an N-dimensional lattice, each including a processor, and by an interconnection network that connects the element processors to one another and includes L × (1/n₁ + 1/n₂ + … + 1/n_N) crossbar switches, the parallel computer comprising: a) means for identifying each element processor by an element processor number expressed as N-dimensional lattice coordinates (i₁, i₂, …, i_N), 0 ≤ i_k ≤ n_k - 1 (k = 1, 2, …, N); b) means for selecting, on the basis of the element processor numbers, the number n_k of element processors in the k-th dimension (k = 1, 2, …, N), and the positional relationship between two element processors specified by the identifying means, the information transmission destination of whichever of the two element processors is the information transmission source; and c) means for operating the crossbar switches so that information is transmitted to the selected element processor.

2. The parallel computer according to claim 1, wherein the interconnection network comprises, for each element processor, information packet relay means for relaying an information packet from a source element processor to a final destination element processor.

3. The parallel computer according to claim 2, wherein the information packet relay means comprises: d) means for taking in an information packet of a source element processor, the information packet comprising transmission data and the element processor number of a final destination element processor; e) means for comparing, coordinate by coordinate, a first element processor number (i₁, i₂, …, i_N), which is the element processor number of the own element processor, with a second element processor number (j₁, j₂, …, j_N), which is the element processor number of the final destination element processor in the information packet taken in, and detecting one dimension k in which the coordinates i_k and j_k (k = 1, 2, …, N) do not match (i_k ≠ j_k); f) means for selecting, from among the N crossbar switches connected to the own element processor, the crossbar switch to which are connected a plurality of element processors whose element processor numbers differ in the coordinate of the detected dimension and have the same coordinates in the other dimensions; g) means for inputting the information packet to the selected crossbar switch; and h) means for determining, when the first element processor number (i₁, i₂, …, i_N) and the second element processor number (j₁, j₂, …, j_N) match, that the own element processor is the final destination element processor, and inputting the information packet to the processing device of the own element processor.

4. The parallel computer according to claim 1, further comprising: i) a plurality of relay crossbar switches, each having N + 1 inputs and N + 1 outputs, provided one for each of the element processors; j) each relay crossbar switch being connected to the input port register and the output port register of the source element processor and relaying the information packet to the final destination element processor; and k) N crossbar switches connected to the source element processor via the relay crossbar switch.