JPH01131950A

JPH01131950A - Interconnecting network and its cross bar switch

Info

Publication number: JPH01131950A
Application number: JP62289323A
Authority: JP
Inventors: Akira Muramatsu; 晃村松; Ikuo Yoshihara; 郁夫吉原; Kazuo Nakao; 中尾　和夫; Takehisa Hayashi; 剛久林; Teruo Tanaka; 輝雄田中; Shigeo Nagashima; 長島　重夫
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-11-18
Filing date: 1987-11-18
Publication date: 1989-05-24
Anticipated expiration: 2012-02-26
Also published as: JP2585318B2

Abstract

PURPOSE: To prevent contention with other relay paths during relaying, and thereby completely eliminate the risk of deadlock, by using a crossbar switch as the relay means attached to each element processor. CONSTITUTION: An arbitrary element processor P(i, j, k), logically arranged at each lattice point corresponding to an interior point of an L × M × K rectangular parallelepiped in a three-dimensional lattice space, is connected to three crossbar switches 9-1 to 9-3. Each crossbar switch can communicate with the element processors whose numbers are obtained by replacing the coordinate value of one specific dimension of the three-dimensional processor number with another coordinate value. A packet can therefore reach an element processor with any number by being relayed through up to three coordinate transformation crossbar switches, and the use of relay crossbar switches completely prevents deadlock.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a scheme for interconnecting the element processors of a parallel computer, and in particular to a switch configuration suited to cases where high connectivity is required but the processors are too numerous to be connected by a single full crossbar switch.

[Conventional technology]

Conventional devices include, besides the common method of connecting each element processor to one or several buses, the following representative schemes: connecting adjacent element processors arranged in a lattice, as described in JP-A-58-181168; connecting all element processors through a (full) crossbar switch, as described in JP-A-59-109966; connecting all element processors through a multistage switch, as described in JP-A-62-2353; and hypercube connection, as described in Reference 1.

Reference 1: C. L. Seitz, "The Cosmic Cube", Communications of the ACM, vol. 28, no. 1, pp. 22-33, 1985.

[Problems to be Solved by the Invention]

Among the conventional techniques above, bus connection has the advantage of requiring little hardware, but when many element processors are connected, bus contention degrades performance; about a dozen processors is considered the limit.

Lattice connection (also called mesh connection) likewise requires little hardware and can connect a large number of element processors, but because each processor can communicate only with its neighbors, communication performance depends strongly on the nature of the problem. It works well for solving partial differential equations, image processing, and other problems suited to neighborhood computation, but communication overhead becomes severe for the finite element method, the fast Fourier transform (FFT), logic and circuit simulation, and the like.

Full crossbar connection completely connects all element processors through a matrix switch. Its performance is therefore the highest of all connection schemes, but because the amount of hardware is proportional to the square of the number of element processors, the practical limit is generally taken to be a few dozen processors.

A multistage switch keeps the amount of hardware to about L log2 L, where L is the number of element processors, while still allowing complete connection, and has therefore been regarded as the connection scheme suited to massively parallel computers with many element processors. However, several problems have been pointed out: the communication path length (number of relay stages) is about log2 L, so the transfer delay is large; when many element processors try to access the same shared variable, multiple access paths compete for intermediate communication paths and cause a network-wide paralysis called hot-spot contention (the paralysis spreads to all accesses); and even short of hot-spot contention, access conflicts are heavy and performance suffers.
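The scaling figures quoted above can be compared numerically. The following is only a rough sketch: the function names are hypothetical, constant factors are ignored, and L is assumed to be a power of two so that log2 L is an integer.

```python
import math


def full_crossbar_hardware(num_processors: int) -> int:
    """Hardware proportional to the square of the processor count (constant factor omitted)."""
    return num_processors ** 2


def multistage_hardware(num_processors: int) -> float:
    """Hardware of about L * log2(L) for a multistage switch (constant factor omitted)."""
    return num_processors * math.log2(num_processors)


def multistage_path_length(num_processors: int) -> float:
    """Number of relay stages, about log2(L)."""
    return math.log2(num_processors)


if __name__ == "__main__":
    for L in (16, 64, 1024):
        print(L, full_crossbar_hardware(L), multistage_hardware(L), multistage_path_length(L))
```

The gap between the two hardware figures widens quickly with L, which is why the full crossbar is limited to small systems while the multistage switch pays instead with a longer path.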

Hypercube connection is known to allow relatively efficient communication, but the communication partner must be specified in the program, which makes programming cumbersome. Providing an automatic relay mechanism for each element processor avoids this but increases the amount of hardware. In addition, the wiring crosses over itself, which makes physical implementation troublesome.

In the parallel processing of large-scale numerical computations, certain interprocessor communication patterns are known to appear frequently. Representative examples are the lattice, ring, and butterfly patterns.

A network that can handle these specific communication patterns at high speed can therefore be considered highly effective. Among the conventional techniques above, only the full crossbar switch and hypercube connection contain the lattice, ring, and butterfly patterns within their own connection topology and can communicate in these patterns without relaying. Bus connection, lattice connection, and multistage switches cannot handle all of these specific patterns at high speed. As a special case, Reference 2 introduces the Spanning Bus Hypercube, which extends the hypercube, whose basic unit is a connection between two processors, to a configuration whose basic unit is a connection among several processors. However, because those several processors are connected by a bus, only two of them can communicate at any one time, so it cannot be regarded as containing the above connection topologies.

Reference 2: Dharma P. Agrawal et al., "Evaluating the Performance of Multicomputer Configurations", IEEE Computer, May 1986, pp. 28-29.

Of the problems above, the limit on the number of processors in bus connection cannot be resolved when there are many element processors. The strong dependence of lattice-connection performance on the nature of the problem, and hot-spot contention in multistage switches, are both fundamental and intrinsic problems that remain unsolved. Furthermore, these schemes, together with the Spanning Bus Hypercube, do not contain all of the lattice, ring, and butterfly patterns, and therefore suffer degraded performance on major application problems.

Of the two remaining networks that are free of such fundamental difficulties, the (full) crossbar switch and hypercube connection, the former requires too much hardware to connect a large number of element processors, while the latter can connect many processors but is hard to program and implement, and its performance also drops as the number of connected processors grows. Moreover, in hypercube connection, two element processors that are not directly connected must communicate by having other element processors relay the data. This method of communication, in which an information packet is first taken into one element processor and then forwarded to another, is called store-and-forward. In store-and-forward schemes, not only in hypercube connection, when several element processors P1, P2, P3, ... form a communication path in a loop and attempt to relay along it, P1 cannot finish sending until P2 has finished sending and become able to receive, P2 cannot finish sending until P3 has finished sending and become able to receive, and so on. The processors can thus lock against one another and fall into a deadlock in which none can proceed.

Performance is evaluated by the number of basic switching elements (crosspoints) that one unit of transmitted information passes through before reaching its final destination, and the amount of hardware by the total number of crosspoints that make up the network. Hardware amount and performance are normally in a trade-off relationship: increasing the total number of crosspoints reduces the number of crosspoints that one unit of transmitted information passes through.

An object of the present invention is to provide an interconnection network free of the fundamental difficulties described above, together with a system configuration that, given an arbitrary upper limit (technical or economic) on the amount of hardware and an arbitrary number of element processors, connects those processors with high connectivity close to that of a full crossbar switch (few switching stages) and gives the optimum connection with respect to communication performance and hardware amount. In particular, the object is to provide a technique for variably configuring a network with the minimum or optimum amount of switch hardware.

In the prior art, a network had to be built with a full crossbar switch while the number of processors was small and with hypercube connection once it exceeded a certain number. The present invention makes it possible to build many kinds of interconnection networks whose performance lies between those of these two schemes and which, when equipped with an automatic relay function free of deadlock risk, also require less switch hardware than hypercube connection. As for implementation, each unit switch can be packaged as a whole within a chip, a module, a board, or a cabinet, or between cabinets, which is favorable for performance balance and maintenance.

[Means for solving problems]

The above object is basically achieved as follows. In a parallel computer whose number of element processors L can be factorized as L = n_1 × n_2 × ... × n_N, each element processor is given as its processor number the coordinates (i_1, i_2, ..., i_N), with 0 ≤ i_1 ≤ n_1 - 1, 0 ≤ i_2 ≤ n_2 - 1, ..., 0 ≤ i_N ≤ n_N - 1, of an interior point of a hyper-rectangular solid in an N-dimensional lattice space whose edge in each dimension has as many lattice points as the corresponding factor.

For any k, the group of n_k element processors whose processor numbers differ only in the k-th coordinate, that is, the processors numbered

(i_1, i_2, ..., 0, ..., i_N), (i_1, i_2, ..., 1, ..., i_N), ..., (i_1, i_2, ..., n_k - 1, ..., i_N),

is interconnected by one n_k-input, n_k-output crossbar switch. This connection is made for every coordinate (i_1, i_2, ..., i_{k-1}, i_{k+1}, ..., i_N) of the (N - 1)-dimensional subspace that excludes the k-th dimension (L/n_k groups), and for every k (1 ≤ k ≤ N). The element processors are thus connected by a total of L × (1/n_1 + 1/n_2 + ... + 1/n_N) crossbar switches.

In this network, the relay means attached to the sending processor chooses one dimension k in which its own processor number (i_1, i_2, ..., i_N) and the destination processor number (j_1, j_2, ..., j_N) disagree (i_k ≠ j_k). Of the N crossbar switches attached to the relay means of the sending processor (hereinafter called coordinate transformation crossbar switches), it selects the one connecting the element processors whose numbers differ only in the k-th coordinate (the k-th dimension coordinate transformation crossbar switch), and inputs to it a communication information packet consisting of the destination processor number paired with the transmission data. Each coordinate transformation crossbar switch decodes the k-th coordinate field of the destination processor number and sends the packet to the processor whose k-th coordinate equals that of the destination, that is, either the destination processor itself or a processor on the route to it, which then relays the packet. In the latter case, this is repeated until no disagreeing coordinate remains, at which point the packet has reached the destination processor.

Furthermore, by using a crossbar switch as the relay means attached to each element processor, a relay never contends with another relay path, so the risk of deadlock can be completely eliminated.
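The construction above can be checked on a small case. The sketch below is illustrative only: `build_switch_groups` is a hypothetical helper, dimensions are indexed from 0 rather than 1, and the factors (4, 3, 2) are an arbitrary example. It groups processor numbers by crossbar switch and compares the number of switches with L × (1/n_1 + ... + 1/n_N).

```python
from itertools import product
from math import prod


def build_switch_groups(factors):
    """Map (dimension k, coordinates of all other dimensions) to the processors on that switch.

    Dimensions are 0-based here; the text numbers them from 1.
    """
    groups = {}
    for coords in product(*(range(n) for n in factors)):
        for k in range(len(factors)):
            rest = coords[:k] + coords[k + 1:]
            groups.setdefault((k, rest), []).append(coords)
    return groups


def expected_switch_count(factors):
    """L * (1/n_1 + ... + 1/n_N), computed with exact integer division."""
    total = prod(factors)
    return sum(total // n for n in factors)


if __name__ == "__main__":
    factors = (4, 3, 2)  # example: L = 24 processors
    groups = build_switch_groups(factors)
    print(len(groups), expected_switch_count(factors))
```

Each group produced for dimension k contains exactly n_k processors, which is the port count of the corresponding n_k-input, n_k-output crossbar switch.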

[Effect]

It is now shown that the interconnection scheme of the present invention allows communication between arbitrary element processors. Consider communication from a source processor with processor number (i_1, i_2, ..., i_N) to a destination processor with processor number (j_1, j_2, ..., j_N). If the first coordinate i_1 of the source differs from the first coordinate j_1 of the destination, then, because the element processors whose other coordinates are all equal are connected to one crossbar switch (the first-dimension coordinate transformation crossbar switch), that switch can deliver the information to the element processor with processor number (j_1, i_2, ..., i_N), or to the relay crossbar switch attached to that element processor.

Next, the element processor that has just received the information, or the relay crossbar switch attached to it, is connected by one coordinate transformation crossbar switch to the element processors whose coordinates other than the second are all equal. So if i_2 ≠ j_2, this switch can deliver the information to the element processor with processor number (j_1, j_2, i_3, ..., i_N), or to the relay crossbar switch attached to it. By following such a route, forwarding the information through successive crossbar switches to the element processor whose corresponding coordinate has been replaced (or to its attached relay crossbar switch), the information finally reaches the element processor with processor number (j_1, j_2, ..., j_N).
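The route described above can be traced with a few lines of code. This is a minimal sketch, assuming coordinates are corrected in ascending dimension order as in the argument above; the function name `route` is hypothetical.

```python
def route(source, destination):
    """Return the processor numbers visited, from source to destination inclusive.

    Each hop replaces one disagreeing coordinate with the destination's value,
    which corresponds to passing through one coordinate transformation crossbar switch.
    """
    current = list(source)
    path = [tuple(current)]
    for k, target in enumerate(destination):
        if current[k] != target:
            current[k] = target
            path.append(tuple(current))
    return path


if __name__ == "__main__":
    print(route((2, 1, 3), (0, 1, 0)))
    # [(2, 1, 3), (0, 1, 3), (0, 1, 0)]
```

The number of hops equals the number of coordinates in which source and destination differ, so it never exceeds N.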

In many cases, moreover, a suitable factorization of L makes it possible to keep the number of element processors connected in each dimension within a given range. This allows the coordinate transformation crossbar switches of each dimension to fit within a prescribed packaging unit, for example within a chip, a module, a board, or a cabinet, or between cabinets. This property cannot be adequately satisfied under the condition that all factors take the same value, L = m^N; the factorization of the present invention, L = n_1 × n_2 × ... × n_N, is a necessary condition.
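To illustrate why unequal factors matter, the sketch below enumerates the factorizations of L whose factors all fit a given switch size. The helper names, the port limit, and the crosspoint estimate are assumptions made for this example: an n-input, n-output crossbar is counted as n × n crosspoints, in line with the statement that full crossbar hardware grows with the square of the number of connected processors.

```python
def factorizations(L, max_ports, min_factor=2):
    """Yield non-decreasing tuples of factors of L, each between min_factor and max_ports."""
    if L == 1:
        yield ()
        return
    for n in range(min_factor, min(L, max_ports) + 1):
        if L % n == 0:
            for rest in factorizations(L // n, max_ports, n):
                yield (n,) + rest


def total_crosspoints(factors):
    """Crosspoints over all switches, assuming n*n crosspoints per n-port crossbar."""
    L = 1
    for n in factors:
        L *= n
    # Dimension with factor n has L // n switches of n * n crosspoints each.
    return sum((L // n) * n * n for n in factors)


if __name__ == "__main__":
    for f in factorizations(24, max_ports=4):
        print(f, total_crosspoints(f))
```

With L = 24 and at most 4 ports per switch, no factorization of the form m^N exists, but (2, 3, 4) and (2, 2, 2, 3) both fit; under the stated assumption each needs 216 crosspoints, against 576 for a single 24-port full crossbar.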

Without relay crossbar switches, a deadlock arises if element processor P1 tries to relay a packet to element processor P2 while P2 is simultaneously trying to relay a packet to P1. With relay crossbar switches, however, the packet flow from P1 to P2 can be set up independently of the packet flow from P2 to P1, so deadlock does not occur.

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

First Embodiment

(1) Configuration of the interconnection network

Fig. 1 illustrates the connection scheme of the first embodiment of the present invention using three dimensions as an example; the extension to N dimensions is analogous. As shown in Fig. 2, an arbitrary element processor P(i, j, k) (having processor number (i, j, k)), logically placed at a lattice point corresponding to an interior point of an L × M × K rectangular parallelepiped in a three-dimensional lattice space, is connected to three crossbar switches 9-1, 9-2, and 9-3, as shown in Fig. 1. Crossbar switch 9-1 completely connects the element processors P(0, j, k), P(1, j, k), ..., P(L-1, j, k), which differ from processor P(i, j, k) only in the first-dimension coordinate.

Similarly, crossbar switch 9-2 completely connects the element processors P(i, 0, k), P(i, 1, k), ..., P(i, M-1, k), which differ only in the second-dimension coordinate, and crossbar switch 9-3 completely connects the element processors P(i, j, 0), P(i, j, 1), ..., P(i, j, K-1), which differ only in the third-dimension coordinate.

Each crossbar switch can communicate with the element processors whose numbers are obtained by replacing the coordinate value of one specific dimension of the three-dimensional processor number with another coordinate value. For this reason these crossbar switches are hereinafter called coordinate transformation crossbar switches, and the switch that transforms the coordinate of a particular dimension k is called the k-th dimension coordinate transformation crossbar switch. As shown later, a processor can communicate with an element processor of any number by relaying through up to three coordinate transformation crossbar switches.
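For the three-dimensional embodiment, the processors sharing each of the three switches with a given P(i, j, k) can be listed directly. The sketch below is illustrative: `switch_members` is a hypothetical name, and the shape (L, M, K) = (4, 3, 2) is an arbitrary example.

```python
def switch_members(processor, shape):
    """For each dimension, list the processors on the same crossbar switch as `processor`.

    Entry 0 corresponds to switch 9-1, entry 1 to 9-2, entry 2 to 9-3.
    """
    members = []
    for dim, size in enumerate(shape):
        group = [processor[:dim] + (value,) + processor[dim + 1:] for value in range(size)]
        members.append(group)
    return members


if __name__ == "__main__":
    shape = (4, 3, 2)  # (L, M, K), example values
    for label, group in zip(("9-1", "9-2", "9-3"), switch_members((1, 2, 0), shape)):
        print(label, group)
```

Every processor in a group can be reached in one transmission, so P(i, j, k) has (L - 1) + (M - 1) + (K - 1) partners that need no relaying.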

(2) Structure of the element processor

Fig. 3 shows the structure of an element processor. Element processor P(i, j, k) consists of a relay device 1 and a processing device 2, the latter being an ordinary computer that has a program counter and executes instructions sequentially. The relay device 1 consists of: a communication control device 3, which contains a microprogram, decodes the destination processor number of a communication packet received from the processing device 2 or from the input port register 5, and on that basis selects a particular coordinate transformation crossbar switch 9-1 to 9-3, or the processing device 2, and sends the packet there; an input port register 5 and an output port register 6, which temporarily hold communication packets; a selector 7, which selects one of the input communication paths from the three coordinate transformation crossbar switches; and a distributor 8, which selects one of the N coordinate transformation crossbar switches as the destination of the communication packet in the output port. A communication packet consists of a destination processor number and transmission data.

(3) Communication method

Next, using the three-dimensional example of Fig. 1, the mechanism for sending from processor P(i, j, k), the source processor, to processor P(0, 0, 0), the destination processor, is described with reference to Figs. 1, 3, and 4. First, the processing device 2 of the source processor P(i, j, k) inputs a communication packet to the communication control device 3 and instructs it to send the packet. The destination information (processor number) of the packet consists of the three coordinates (0, 0, 0). Starting from the first coordinate, these values are compared in order with the three coordinate values (i, j, k) of the processor's own number, held in the microprogram of the communication control device 3. For the first coordinate found to disagree, here the first coordinate i, the corresponding first-dimension coordinate transformation crossbar switch 9-1 is selected in order to communicate with the element processor P(0, j, k), whose number has this coordinate replaced by 0. The packet is placed in the output port register 6, and the number "1" of the selected crossbar switch 9-1 is input to the distributor 8 over signal line 12. Using this number "1", the distributor 8 connects data line 13 and control signal line 12 to one input channel 101, 102 of the first-dimension coordinate transformation crossbar switch 9-1. The communication control device 3 then sends the packet in the output port register 6 to crossbar switch 9-1 using control signal lines 12 and 101 and data lines 13 and 102. The structure and operation of the coordinate transformation crossbar switch are described later.

At the element processor P(0, j, k) to which the packet is sent via crossbar switch 9-1, the selector 7-2 selects crossbar switch 9-1 from among the crossbar switches that are asserting the transmission request signal OREQ (described later); the selection logic itself is not claimed in the present invention. The output channel lines 103 and 104 of the crossbar switch are then connected to control signal line 10-2 and data line 11-2, and the packet is taken into the communication control device 3-2 through the input port register 5-2. At this time the selector 7-2 also reports, over control signal line 10-2, the number "1" of the receiving crossbar switch 9-1 that its selection logic chose. From this switch number "1", the communication control device 3-2 of element processor P(0, j, k) learns which coordinate has been transformed (the first coordinate in this example). Starting from the coordinates after it, the device again compares the destination coordinate values (*, 0, 0), where * marks an already transformed coordinate, with its own coordinate values (0, j, k) in order. For the first coordinate found to disagree, the second-dimension coordinate j, it selects the second-dimension coordinate transformation crossbar switch 9-4 in order to send to the element processor P(0, 0, k), whose number has this coordinate replaced by 0, and sends the packet out on input channel lines 201 and 202. Processor P(0, 0, k), which receives the packet from output channel lines 203 and 204, relays it in the same way, sending it out on input channel lines 301 and 302 of the third-dimension coordinate transformation crossbar switch 9-5, and the packet is delivered to the destination processor P(0, 0, 0) via output channel lines 303 and 304. At the destination processor P(0, 0, 0), the communication control device 3-4 decodes the destination (processor number) (0, 0, 0) of the packet stored in input port 5-4 via selector 7-4; since it matches the processor's own number (0, 0, 0) held in the microprogram, the device notifies the processing device 2-4 that the packet has arrived.
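The per-hop decision made by the communication control device can be sketched as follows. This is not the microprogram itself: the function names are hypothetical, switch numbers are assumed to be 1 to N matching the dimension they transform, and the comparison is assumed to resume at the coordinate after the one just transformed, as in the walk-through above.

```python
def relay_decision(own_number, destination, arrived_via=None):
    """Return the 1-based number of the switch to use next, or None to deliver locally.

    arrived_via is the 1-based number of the switch the packet came through
    (None at the source); coordinates up to that dimension are already transformed.
    """
    start = 0 if arrived_via is None else arrived_via
    for dim in range(start, len(own_number)):
        if own_number[dim] != destination[dim]:
            return dim + 1
    return None


def send(source, destination):
    """Trace a packet hop by hop; return a list of (switch number, receiving processor)."""
    hops = []
    current, arrived_via = tuple(source), None
    while True:
        switch = relay_decision(current, destination, arrived_via)
        if switch is None:
            return hops  # packet is handed to the processing device here
        current = current[:switch - 1] + (destination[switch - 1],) + current[switch:]
        hops.append((switch, current))
        arrived_via = switch


if __name__ == "__main__":
    print(send((2, 1, 3), (0, 0, 0)))
    # [(1, (0, 1, 3)), (2, (0, 0, 3)), (3, (0, 0, 0))]
```

Resuming after the transformed coordinate means each relay inspects only the remaining coordinates, so a packet never revisits a dimension.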

The same holds in the general N-dimensional case: by relaying in this way through successive coordinate transformation crossbar switches, each time to the element processor whose disagreeing coordinate has been replaced by that of the destination processor, the information finally reaches the destination processor. Because the transformation of disagreeing coordinates completes in at most N steps, the maximum communication path length of this connection scheme is N.

In the three-dimensional example of Fig. 1, the maximum communication path length is 3. For the lattice, ring, and butterfly patterns, however, a packet can be transferred to the destination processor in a single transmission with no relaying, so communication performance is equivalent to that of a full crossbar switch. Fig. 4 shows the relay logic of the communication control device 3 described above.

(4) Structure and operation of the coordinate transformation crossbar switch

FIG. 5 shows the external interface of one L-input, L-output crossbar switch 9. One set of input channel lines consists of two control signal lines, IREQ and IACK, and a data line IDATA. IREQ carries the signal by which the transmitting element processor notifies the crossbar switch that it has stored the data to be sent in output port register 6 and is waiting to transmit. IACK carries the signal by which the crossbar switch notifies the element processor that the next transmission data may be written into output port register 6. IDATA carries the transmission data. Likewise, one set of output channel lines consists of two control signal lines, OREQ and OACK, and an output data line ODATA. OREQ carries the signal by which the crossbar switch requests transfer of the transmission data into input port register 5 of the receiving element processor, and OACK carries the signal by which the receiving element processor indicates that the transfer has completed. ODATA carries the transmission data. In this interface, the control signal lines IREQ and IACK are connected to the communication control device 3 of the element processor via distributor 8, as shown in FIG. 3, and the control signal lines OREQ and OACK are connected to the communication control device 3 of the element processor via selector 7. The data line IDATA is connected to output port register 6 of the element processor via distributor 8, and the data line ODATA is connected to input port register 5 of the element processor via selector 7.

The crossbar switch also has a mask register write control signal line W and a mask pattern signal line MASK, which are used to set the contents (mask pattern) of the mask register that masks the processor number, as described later.

FIG. 6 shows an example of the structure of the crossbar switch. This example takes a 3-input, 3-output crossbar switch, but a general L-input, L-output switch is exactly the same.

The communication control device 3 of an element processor that wishes to transmit, for example processor P(2, j, k) (i = 2 in FIG. 3), stores the transmission data in output port register 6, then sends the crossbar switch number "1" to distributor 8 to select and connect the specific crossbar switch 9-1, and outputs a transmission request signal via signal line 12 onto control signal line 101 of input channel 2, that is, IREQ2. This crossbar switch connects processors P(0, j, k), P(1, j, k), and P(2, j, k) through input/output channels 0, 1, and 2, respectively. When the request signal on line IREQ2 is input to decoder 20-3 of the crossbar switch, the portion corresponding to the first-dimension coordinate of the destination processor number in the transmission information packet held in output port 6 is decoded to determine the destination channel. For example, 1 is output on signal line 26-3 for the channel corresponding to the destination processor P(0, 0, 0) (output channel 0) and 0 for the other channels, and these values are conveyed to all priority control circuits 21-1 to 21-3. Only part of the bit string of the processor number needs to be input to decoder 20-3. Since the processor number consists of three fields representing the coordinates in the three-dimensional lattice space, the coordinate transformation crossbar switch for each dimension needs a mechanism for extracting the one field that corresponds to its dimension. The ranges of the coordinate values generally differ from dimension to dimension, so the position and length of the coordinate field vary as well. In this embodiment, so that this field can be selected variably, each decoder is provided with a mask register (24-1 to 24-3) that masks part of the processor number so that only the remaining bit string is decoded.

FIG. 11 illustrates the function and configuration of the mask register. The figure assumes a 16×16 crossbar switch, but the same applies to crossbar switches of other sizes. The mask register 24 works as a kind of matrix switch: from the bit string d_0, d_1, ... that represents the destination processor number in IDATA, it selects the partial bit string d_i d_j d_k d_l that represents the coordinate of a specific dimension and outputs it as the input (address) A_1 A_2 A_3 A_4 to the decoder (ROM) 20. To do this, as shown in the figure, 0 is written in the fields at the intersections of the data lines d_i, d_j, d_k, d_l with the output lines A_1, A_2, A_3, A_4, and 1 is written in all other fields. The signal on the output lines A_1 A_2 A_3 A_4 is decoded by decoder 20 as a 4-bit binary number giving a priority control circuit number and is converted into the transmission request signals r_0 to r_15 for priority control circuits 0 to 15. For example, if the content of d_i d_j d_k d_l is '0000', it is decoded into the signal '1000...0', which selects priority control circuit 0. When this switch is used as a crossbar switch smaller than 16×16, for example 4×4, the mask register 24 selects only 2 bits, which are used as the input address A_1 A_2 to the decoder ROM 20, and A_3 A_4 become 0. In this case, therefore, only part of the decoder ROM is used. The contents of mask registers 24-1 to 24-3 can be set from outside (by an element processor, a host computer, or the like) by instructing the mask register write control circuit 25 over the W and MASK signal lines.
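The mask register and decoder ROM described above can be modeled with a short sketch. The names `select_field` and `decode_one_hot` are hypothetical, A_1 is assumed to be the most significant address bit, and an 8-bit processor number is used only to keep the example small.

```python
def select_field(bits, mask):
    """Model the mask register as a matrix switch.

    mask[m][b] == 0 connects data bit b to output line A_(m+1);
    1 means masked. An output line with no connection reads 0.
    """
    address = []
    for row in mask:
        connected = [bits[b] for b, cell in enumerate(row) if cell == 0]
        address.append(connected[0] if connected else 0)
    return address


def decode_one_hot(address, channels=16):
    """Model the decoder ROM: binary address -> request lines r_0..r_15.

    Assumption: A_1 is the most significant bit of the address.
    """
    index = int("".join(str(a) for a in address), 2)
    return [1 if ch == index else 0 for ch in range(channels)]


# Hypothetical 8-bit processor number whose coordinate field is bits 2..5.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
mask = [[0 if b == 2 + m else 1 for b in range(8)] for m in range(4)]

address = select_field(bits, mask)
requests = decode_one_hot(address)
print(address, requests.index(1))
```

Changing only the mask contents moves or shortens the selected field, which is why one switch design can serve dimensions with different coordinate ranges.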

Transmission requests from each input channel are conveyed to priority control circuit 21-1, and one of them is selected according to predetermined logic, as described later. Priority control circuit 21-1 then confirms that buffer 23-1 in selective transfer control circuit 22-1 is empty and conveys the selected input channel number (channel 2) to selective transfer control circuit 22-1 over signal line 27-1. As a result, the transmission information packet in output port 6 of processor P(2, j, k) is transferred, via distributor 8 and data line 102 of input channel 2 (that is, IDATA2), to buffer 23-1 in selective transfer control circuit 22-1. During this time priority control circuit 21-1 is busy; when the transfer completes, it begins the next selection operation.

When data has been transferred to buffer 23-1, selective transfer control circuit 22-1 outputs a signal requesting transmission to the destination processor (processor P(0, j, k)) on line OREQ0, that is, on control signal line 103. Selector 7-2 of processor P(0, j, k) receives transmission request signals from several coordinate transformation crossbar switches; one of them is selected according to predetermined logic and conveyed to communication control device 3-2. If input port register 5-2 is empty, communication control device 3-2 writes the data on data line ODATA0, that is, data line 104, into input port register 5-2 via selector 7-2, and when the write has finished it outputs a write completion signal on control signal line OACK0.

When selective transfer control circuit 22-1 receives the write completion signal on line OACK0, it negates the transmission request signal on line OREQ0. Communication control device 3-2 of processor P(0, j, k) detects this and negates the reception completion signal on line OACK0. Buffer 23-1 of selective transfer control circuit 22-1 then becomes available for transfer again, and selector 7-2 of processor P(0, j, k) can again select another crossbar switch.
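The request/acknowledge exchange on the output channel can be traced with a small model. This is a sketch only: the function name `output_handshake`, the one-element lists standing in for buffer 23-1 and the input port register, and the strictly sequential ordering of events are all assumptions made for illustration.

```python
def output_handshake(buffer, input_port, log):
    """Move one packet from the switch buffer to the receiver's input port.

    buffer and input_port are one-element lists; None means empty.
    Returns False when there is nothing to send or the port is occupied.
    """
    if buffer[0] is None or input_port[0] is not None:
        return False
    log.append("OREQ=1")       # switch requests the transfer
    input_port[0] = buffer[0]  # receiver latches the data on ODATA
    log.append("OACK=1")       # receiver signals write completion
    log.append("OREQ=0")       # switch negates its request
    buffer[0] = None           # buffer can accept the next packet
    log.append("OACK=0")       # receiver negates the acknowledge
    return True


buffer, input_port, log = ["packet"], [None], []
done = output_handshake(buffer, input_port, log)
print(done, log)
```

A second call while the input port is still occupied returns False, mirroring the condition that the receiver writes only when its input port register is empty.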

Meanwhile, when priority control circuit 21-1 leaves the busy state, decoder 20-3 detects this over signal line 26-3 and sends a transfer completion signal on IACK2 to processor P(2, j, k). On receiving the transfer completion signal on IACK2, the communication control device 3 of processor P(2, j, k) negates the transfer request signal on IREQ2 and can then place the next transmission data in output port 6.

FIG. 7 shows an example of the logic of priority control circuits 21-1 to 21-3. This example exploits the fact that the inputs from the three decoders form 3 bits of information, that is, a value from 0 to 7, and stores the grant signal pattern for each input value in an 8-entry memory (RAM) 15. The logic of the priority control circuit is not limited to this, however.
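A table-driven arbiter of this kind can be sketched as follows. The grant patterns themselves are not given in the text, so the fixed priority used here (the lowest-numbered requesting channel wins) is purely an assumption, as are the names `build_grant_ram` and `arbitrate`.

```python
def build_grant_ram(num_inputs=3):
    """Fill a 2**num_inputs entry table with one grant pattern per request pattern.

    Assumed policy: the lowest-numbered requesting channel is granted.
    """
    ram = []
    for requests in range(1 << num_inputs):
        grant = 0
        for ch in range(num_inputs):
            if requests & (1 << ch):
                grant = 1 << ch
                break
        ram.append(grant)
    return ram


GRANT_RAM = build_grant_ram()


def arbitrate(request_bits):
    """request_bits: 0..7, with bit ch set when input channel ch is requesting."""
    return GRANT_RAM[request_bits]


print(GRANT_RAM)
print(bin(arbitrate(0b110)))
```

Because the policy lives entirely in the table contents, a different priority rule only requires rewriting the eight entries.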

Second embodiment

FIGS. 8 and 9 illustrate the interconnection scheme of the second embodiment.

The difference from the first embodiment is that the relay operation is performed not by the communication control device 3 inside the element processor but by a relay crossbar switch 14 provided for each element processor.

(1) Structure and operation of the relay crossbar switch

The structure of the relay crossbar switch is basically the same as that of the coordinate transformation crossbar switch, except that the destination processor number input to the decoder is not a single coordinate field but all three coordinate fields. The destination decoding section of the relay crossbar switch is described in detail with reference to FIG. 12. The destination processor number input from data line IDATA and the contents of own-processor-number register 50, which is provided in the relay crossbar switch and holds the processor's own number, are input to comparator 51 and compared bitwise by EXOR; 1 is output on signal lines 52-1 to 52-32 for each bit that matches and 0 for each bit that does not. These outputs are input to mask register 24. Where 0 is written in an intersection field of mask register 24, that is, where the input is not masked, the input is passed unchanged onto the output lines A_1, A_2, A_3, where a wired AND is formed. As a result, an output line carries 1 only when all the comparator outputs connected to it without masking are 1. If the first to third coordinates of the processor number are assigned to the output lines A_1, A_2, A_3 respectively, an output line carries 1 when every bit of its coordinate field equals the corresponding field of the own processor number, and 0 otherwise (that is, for a mismatched coordinate). The signals on the output lines are inverted and input to decoder 20.

For example, suppose that in the figure the first coordinate is represented by bits 0, 1, and 2 of the destination processor number input from data line IDATA. These bits are input to comparator 51 together with the corresponding bits 0, 1, and 2 of own-processor-number register 50, and if all the bits match, 1 is output on signal lines 52-1, 52-2, and 52-3. Since 0 is written in the intersection fields 53-1, 53-2, and 53-3 between the first coordinate field and the first output line A_1 of the mask register, and 1 is written in the intersection fields with the other output lines A_2 and A_3, the comparison result is passed only to the first output line A_1. Therefore 1 is output on the first output line A_1 of the mask register only when 1 is output on all of signal lines 52-1, 52-2, and 52-3.

The decoder treats the signals on output lines A_1 A_2 A_3 as a binary address, converts it into a channel number, and sends 1 to the priority control circuit of that channel and 0 to the others. For example, when the signals on output lines A_1 A_2 A_3 are all 1, that is, for the inverted decoder input address '000', it selects channel 0, the channel to the own processor.
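The destination decoding of the relay crossbar switch can be sketched as follows. The text only fixes the case in which all fields match (channel 0, the own processor); the rule used here for the other addresses, namely routing to the lowest-numbered mismatched dimension, is an assumption, as are the function names and the 6-bit processor number.

```python
def field_match_lines(dest_bits, own_bits, mask):
    """Return the value on each output line A_1..A_N.

    A line is the wired AND of the comparator outputs that are not
    masked on it (mask cell 0 = connected, 1 = masked).
    """
    equal = [1 if d == o else 0 for d, o in zip(dest_bits, own_bits)]
    lines = []
    for row in mask:
        unmasked = [equal[b] for b, cell in enumerate(row) if cell == 0]
        lines.append(1 if all(unmasked) else 0)
    return lines


def relay_channel(lines):
    """Channel 0 is the own processor; channel d is assumed to lead to the
    coordinate transformation crossbar switch of dimension d."""
    for dim, matched in enumerate(lines, start=1):
        if not matched:
            return dim
    return 0


# Hypothetical layout: three coordinates of 2 bits each.
mask = [[0 if b // 2 == m else 1 for b in range(6)] for m in range(3)]
own = [0, 0, 1, 0, 0, 1]
dest = [0, 0, 0, 0, 0, 0]

lines = field_match_lines(dest, own, mask)
print(lines, relay_channel(lines))
```

With this assumed rule the relay switch reproduces the same dimension order that the communication control device follows in the first embodiment.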

(2) Communication method

In FIG. 8, when a communication packet is input from one coordinate transformation crossbar switch 9 to relay crossbar switch 14, its destination is decoded. If the packet is addressed to this processor, the switch connects to input port 5 of the element processor and the packet is input there. If it is addressed elsewhere, the coordinate transformation crossbar switch 9 that converts the mismatched coordinate is selected and connected. The external interface of relay crossbar switch 14 is the same as that of coordinate transformation crossbar switch 9.

FIG. 9 shows, by broken lines, an example in which a packet is transferred from element processor P(i, j, k) to element processor P(0, 0, 0) via the relay crossbar switches of element processors P(0, j, k) and P(0, 0, k).

In the second embodiment, the communication control device 3 does not have the functions described for the first embodiment, namely decoding the destination information (destination processor number) of a communication packet, selecting a specific coordinate transformation crossbar switch or the processing device 2 on the basis of the result, and sending out the communication packet. It has only the function of interfacing with relay crossbar switch 14.

Evaluation

In the example of FIG. 9, the packet passes through three coordinate transformation crossbar switches (9-1, 9-4, 9-5) and four relay crossbar switches (14-1, 14-2, 14-3, 14-4), so a total of seven switching operations are required. In the first embodiment, the number of crosspoints traversed, that is, the number of unit switching operations that transfer a communication packet from one input/output port or buffer to the next, is counted as three; but once the decision and selection processing in the control device 3 of the element processor is taken into account, the time required for the transfer is ultimately the same. In the first embodiment (in which the processor itself relays), at most N transmission operations are required, so the maximum communication path length of this switch is N, and the amount of hardware, measured as the number of crosspoints, is n_1×n_2×...×n_N×(n_1+n_2+...+n_N).

The maximum of n_k (k = 1, ..., N) is the maximum coupling capacity required of a crossbar switch.

In the second embodiment (in which each processor has its own relay crossbar switch), if a relay operation by relay crossbar switch 14 is itself counted as one transmission operation, at most 2N+1 transmission operations are required. That is, the maximum communication path length of this switch is 2N+1. The amount of hardware is n_1×n_2×...×n_N×{(N+1)^2+n_1+n_2+...+n_N}.
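The two hardware formulas and path lengths can be evaluated directly. The function names below are hypothetical, and `dims` stands for the tuple (n_1, ..., n_N).

```python
from math import prod


def crosspoints_first(dims):
    """First embodiment: n_1 x ... x n_N x (n_1 + ... + n_N)."""
    return prod(dims) * sum(dims)


def crosspoints_second(dims):
    """Second embodiment: n_1 x ... x n_N x {(N+1)^2 + n_1 + ... + n_N}."""
    n = len(dims)
    return prod(dims) * ((n + 1) ** 2 + sum(dims))


def max_path_length(dims, relay_switch=False):
    """N when processors relay, 2N+1 when relay crossbar switches are used."""
    n = len(dims)
    return 2 * n + 1 if relay_switch else n


dims = (8, 8, 4)  # 256 processors in three dimensions
print(crosspoints_first(dims), max_path_length(dims))
print(crosspoints_second(dims), max_path_length(dims, relay_switch=True))
```

The extra (N+1)^2 term is the crosspoint count of one (N+1)-input, (N+1)-output relay crossbar switch per processor.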

Next, it is shown that with the interconnection scheme of the present invention, given the maximum coupling capacity of a single crossbar switch, both the configuration that gives the highest performance and the configuration that requires the least hardware can be found easily.

Assuming that the number of signal lines used for coupling between processors in the crossbar switch is fixed, performance is determined by the communication path length, N or 2N+1. That is, higher performance is obtained by making the dimension of the space in which the processors are logically arranged as small as possible. For example, in the processor-relay scheme of the first embodiment, if the number of processors is L and the maximum number of processors that one crossbar switch can couple is n, then q = [log L / log n] + 1 is the minimum communication path length when that crossbar switch is used, where [ ] denotes the integer part of the quotient. The corresponding configuration places the element processors in a hypercube-like region of a q- or (q+1)-dimensional lattice space and couples all the element processors that form each one-dimensional subregion by a crossbar switch whose coupling capacity is the maximum value n above.

The amount of hardware, on the other hand, is n_1×n_2×...×n_N×(n_1+n_2+...+n_N), so the minimum is clearly obtained when n_i = 2. In the scheme using the relay crossbar switch shown in the second embodiment, however, the amount of hardware is minimized by a different configuration, as shown in FIG. 10. For example, with 256 processors the hardware is minimized by a three-dimensional 8×8×4 arrangement, and with 4096 processors by a four-dimensional 8×8×8×8 arrangement. If performance is preferred even at the cost of somewhat more hardware, a two-dimensional configuration of 8×8 to 32×32 would be appropriate for 64 to 1024 element processors, and a three-dimensional configuration of 4×8×8 to 32×32×32 for 2048 to 32768 element processors.
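The minimum-hardware configurations quoted for the relay crossbar scheme can be checked by brute force. This is a sketch: `max_factor` is a hypothetical parameter for the largest crossbar switch size allowed, and the search simply enumerates every factorization of the processor count.

```python
from math import prod


def factorizations(total, max_factor, smallest=2):
    """Yield non-decreasing factor tuples (each factor between smallest
    and max_factor) whose product is total."""
    if total == 1:
        yield ()
        return
    for f in range(smallest, min(total, max_factor) + 1):
        if total % f == 0:
            for rest in factorizations(total // f, max_factor, f):
                yield (f,) + rest


def relay_hardware(dims):
    """Crosspoints with relay crossbar switches: L x {(N+1)^2 + sum of n_k}."""
    return prod(dims) * ((len(dims) + 1) ** 2 + sum(dims))


def cheapest_layout(total, max_factor):
    """Factor tuple with the fewest crosspoints (raises ValueError if none exists)."""
    return min(factorizations(total, max_factor), key=relay_hardware)


print(cheapest_layout(256, 32))
print(cheapest_layout(4096, 32))
```

For 256 and 4096 processors the search returns the 8×8×4 and 8×8×8×8 arrangements named in the text, listed with factors in ascending order.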

[Effects of the Invention]

According to the present invention, a large number of element processors that could not be connected by a single crossbar switch (full crossbar switch) can be connected, regardless of their number, by a switch whose coupling capability is close to that of a full crossbar switch. Here, coupling capability close to that of a full crossbar switch means that communication performance is high (few crosspoints are traversed) and that the interprocessor coupling topologies important in applications (lattice, ring, butterfly) are embedded, so that communication following such interprocessor communication patterns takes the minimum number of crosspoint traversals. Within the prior art, the hypercube is known as a network that embeds these coupling topologies and can couple a large number of processors, but the interconnection scheme of the present invention has far better communication performance than the hypercube for communication patterns other than these specific coupling topologies.

In particular, using relay crossbar switches completely prevents deadlock.

In addition, the present invention provides a method of constructing the optimum interconnection scheme (maximum communication performance, minimum hardware, or a compromise between the two) for any given (technical or economic) upper limit on the size of the coordinate transformation crossbar switch (its number of input/output channels) and any given number of element processors, making it possible to fill the gap between the full crossbar switch and the hypercube.

Furthermore, the coupling relationships of the element processors can be defined so that the coordinate transformation switches of each dimension can be housed within each packaging unit, such as a chip, module, board, or cabinet.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of the first embodiment of the present invention. FIG. 2 shows the hyper-rectangular arrangement of the processors. FIG. 3 is a configuration diagram of an element processor. FIG. 4 is an explanatory diagram of the relay operation of communication control device 3. FIG. 5 is an explanatory diagram of the crossbar switch interface. FIG. 6 is a configuration diagram of the crossbar switch. FIG. 7 is an explanatory diagram showing an example of the priority control circuit. FIG. 8 is a configuration diagram of the second embodiment. FIG. 9 is an explanatory diagram of the operation of the second embodiment. FIG. 10 is an explanatory diagram showing the amount of hardware when relay crossbar switches are included. FIG. 11 is an explanatory diagram of the mask register. FIG. 12 is an explanatory diagram of the decoding section of the relay crossbar switch.

1: relay device; 2 to 2-4: processing device; 3 to 3-4: communication control device; 5 to 5-4: input port register; 6 to 6-4: output port register; 7 to 7-4: selector; 8 to 8-4: distributor; 9-1 to 9-5: coordinate transformation crossbar switch; 20, 20-1 to 20-3: processor number decoder; 21-1 to 21-3: priority control circuit; 22-1 to 22-3: selective transfer control circuit; 24, 24-1 to 24-3: mask register; 14-1 to 14-4: relay crossbar switch; 15: memory (RAM); 50: own-processor-number register; 51: comparator.

Claims

[Claims]

1. In a parallel computer composed of L=n_1×n_2×...×n_N element processors or external devices (hereinafter referred to as element processors), an interconnection network of element processors in which: N-dimensional lattice coordinates (i_1, i_2, ..., i_N), 0≦i_1≦n_1-1, 0≦i_2≦n_2-1, ..., 0≦i_N≦n_N-1, are given as the processor number of each element processor; crossbar switches are provided, each of which decodes the dimension field in the processor number, whose unique position and length are determined by the number of lattice points of a specific dimension, and performs a switching operation for that dimension; for any k, a group of element processors whose processor numbers differ only in the k-th dimension coordinate, that is, the n_k element processors with processor numbers (i_1, i_2, ..., 0, ..., i_N), (i_1, i_2, ..., 1, ..., i_N), ..., (i_1, i_2, ..., n_k-1, ..., i_N), is mutually coupled by one such n_k-input, n_k-output crossbar switch; this coupling is performed for every coordinate (i_1, i_2, ..., i_k-1, i_k+1, ..., i_N) of the (N-1)-dimensional subspace excluding the k-th dimension (L/n_k groups); and this is done for all k (1≦k≦N), using a total of L×(1/n_1+...+1/n_N) crossbar switches.

2. The interconnection network of element processors according to claim 1, characterized by having, for each element processor, information packet relay means that: sequentially takes in information packets in which the processor number of the final destination is attached to the transmitted data as an address; compares the number of its own element processor (i_1, i_2, ..., i_N) with the number of the destination element processor (j_1, j_2, ..., j_N); if there is a mismatch, takes one of the mismatched dimensions (i_k≠j_k), selects, from among the N crossbar switches connected to the element processor, the crossbar switch that couples the group of element processors whose processor numbers differ from its own only in the k-th dimension coordinate and that can therefore send to the processor whose k-th dimension coordinate equals that of the destination processor number, that is, to the destination processor itself or a processor on the path to it, and inputs the information packet to that crossbar switch; and, if there are no mismatched coordinates, inputs the information packet to the processing device of its own processor.

3. The interconnection network according to claim 1, characterized by having, for each element processor, a relay crossbar switch with N+1 inputs and N+1 outputs that is connected to the input/output port registers of the processor and to the N crossbar switches connected to the processor, and thereby relays information packets to their destinations.

4. The interconnection network according to claim 3, in which, when the amount of hardware is evaluated by the number of basic changeover switches (crosspoints) through which information packets pass, the dimension N and the factors n_1, n_2, ..., n_N of each dimension are chosen so that the amount of hardware is minimized for the given number L of processors, that is, so that L×{(N+1)^2+n_1+n_2+...+n_N} is minimized.

5. A crossbar switch having the function of receiving an information packet with a destination address expressed in multidimensional coordinates and performing switching and forwarding with respect to a specific dimension of the destination address, characterized by having means for variably selecting, according to an external instruction, and decoding the field in the processor number that has a unique position and length.