JPH07253954A

JPH07253954A - Parallel computer

Info

Publication number: JPH07253954A
Application number: JP6042322A
Authority: JP
Inventors: Tatsuyuki Ootsuka; 竜志大塚; Hideki Yoshizawa; 英樹吉沢; Katsuto Fujimoto; 克仁藤本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-03-14
Filing date: 1994-03-14
Publication date: 1995-10-03
Anticipated expiration: 2017-06-10
Also published as: US5715471A; JP3290798B2

Abstract

PURPOSE: To perform processing at high speed by eliminating the need to reallocate processing programs to processors and to rearrange data at execution time, even when the network configuration is altered. CONSTITUTION: Each of the nodes 1 to n is provided with two or more processors PE1 and PE2. The processing programs are allocated to the processors PE1 and PE2 in node order, independently of the data transfer direction, and the data are arranged in correspondence with the processors PE1 and PE2. When data are transferred to the processors PE1 and PE2, each processor operates on the data arranged on it and on the transferred data to perform processing such as finding a matrix product. Because the processors are allocated in node order, the network configuration can easily be altered. Even when the network configuration is altered, the data need not be rearranged at execution time, and the processing is performed at high speed.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel computer for performing processing efficiently on a ring network in which a forward path and a return path pass through each node, the processing being carried out by transferring the data held by each node.

[0002]

[Prior Art] FIG. 11 shows the configuration of a system that includes the parallel computer on which the present invention is premised. In the figure, CPU1 denotes a parallel computer with a ring network, CPU2 denotes a host computer, 1 denotes the processor of the host computer, 2 denotes the main memory of the host computer CPU2, and 3 denotes an interface.

[0003] The processor elements PE1 and PE2 of the parallel computer CPU1 each consist of a processor and a memory, and each serves as one computation unit of the parallel operation. Each of the nodes Node-1, ..., Node-x consists of two processor elements PE1 and PE2 and serves as the data transfer unit of the ring network. The parallel computer CPU1 connects the nodes Node-1, ..., Node-x in a ring network and obtains the final result by circulating data around the ring RING while the processor elements PE1 and PE2 perform arithmetic processing.

[0004] The host computer CPU2 has a host bus 4 connected directly to each of the processor elements PE1 and PE2 of the parallel computer CPU1. Through it, the host computer sets data in the parallel computer CPU1, reads data from it, and controls its arithmetic processing. The host computer CPU2 accepts the problem to be computed and loads the program and data into the memory mem of each processor element PE1, PE2 in the parallel computer CPU1, after which the parallel computer CPU1 executes the program.

[0005] After the computation finishes, the results are read from the memory mem in each processor element into the memory 2 on the host computer CPU2 side and then output. FIG. 12 shows the configuration of the ring network, through which a forward path and a return path pass, in the parallel computer CPU1 described above, on which the present invention is premised. In the figure, Node-1, ..., Node-n are the nodes of the ring network. Each node Node-1, ..., Node-n has two processor elements, and the processor elements of the nodes are connected in a ring as shown in the figure. The processor elements are numbered PE1 to PE2n according to the physical order of the network, and the data are distributed over the processor elements in that order.

[0006] FIGS. 13 and 14 are flowcharts showing the processing that arranges data on the processor elements when the matrix operation of equation (1) is performed. To simplify the flowcharts, they show the case in which n and m are both even. When n or m is odd, exception handling that identifies the idle processor element is added, but the procedure is otherwise essentially the same as in these flowcharts.

[0007]

[Equation 1]

[0008] The conventional arrangement of data on the processor elements is described with reference to the flowcharts of FIGS. 13 and 14. For ease of understanding, the description below uses a matrix with m = 6 and n = 4 as an example, but m and n are not limited to 6 and 4. In step S1 of FIG. 13, i = 1 and mn = max(m, n) are set; that is, mn is set to the larger of m and n, which is 6 in this case. Step S2 determines whether i <= mn/2. If i <= mn/2, step S3 sets n1 = i and p1 = 1. If i > mn/2, that is, when i becomes greater than 3, step S4 sets n1 = mn - i + 1 and p1 = 2.
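The node and processor-element selection of steps S1 to S4 can be sketched as follows. This is a minimal illustration, not code from the patent: the function name `conventional_slot` and the tuple return value are assumptions, and integer division stands in for mn/2 because the flowchart only covers even values.

```python
def conventional_slot(i, m, n):
    """Return (n1, p1) for loop index i (1-based), following steps S1 to S4."""
    mn = max(m, n)            # step S1: the larger of m and n
    if i <= mn // 2:          # step S2
        return i, 1           # step S3: PE1 of node i
    return mn - i + 1, 2      # step S4: PE2 of node mn - i + 1


# The m = 6, n = 4 example used in the description.
slots = [conventional_slot(i, 6, 4) for i in range(1, 7)]
print(slots)
```

For i = 1 to 3 the sketch selects PE1 of Node-1 to Node-3, and for i = 4 to 6 it selects PE2 of Node-3 back down to Node-1, which is the order in which the ring visits the processor elements.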

[0009] Next, step S5 determines whether i <= n. If i > n, the process goes to step S12 of FIG. 14 and performs the processing described later. If i <= n, the process goes to step S6 and sets j = 1, then goes to step S7 and determines whether i + j - 1 <= m. If i + j - 1 <= m, the process goes to step S8 and sets NODE(n1), PE(p1), mem(j) = a(i, i+j-1). That is, the data a1,1 is first placed in the first area of the memory mem of processor element PE1 of node Node-1. When this is done, the process goes to step S10 of FIG. 14 and sets j = j + 1, and step S11 determines whether j <= m. If j <= m, the process returns to step S7 and repeats the above.

[0010] As a result, a1,2 is placed in the second area of the memory mem of processor element PE1 of node Node-1. The same processing is repeated, and when j = 6, a1,6 is placed in the sixth area of the memory mem of processor element PE1 of node Node-1. Through the above processing, the data a1,1, a1,2, a1,3, a1,4, a1,5, and a1,6 are placed in the first to sixth areas of the memory mem of processor element PE1 of node Node-1.

[0011] When j = j + 1 is set in step S10, j exceeds 6, so the process goes from step S11 to step S12 and determines whether i <= m. Here i <= m, so the process goes to step S13 and sets NODE(n1), PE(p1), mem(x) = x(i). That is, the data x1 is placed in the x area of the memory mem of processor element PE1 of node Node-1.

[0012] Next, step S14 sets i = i + 1, and step S15 determines whether i <= mn. If i <= mn, the process returns to step S2 and repeats the above, placing data in each area of the memory mem of processor element PE1 of node Node-2 in the same way. Here, when i = 2 and j = 6, i + j - 1 > m holds in step S7, so the process goes to step S9 and sets NODE(n1), PE(p1), mem(j) = a(i, i+j-m-1). That is, the data a2,1 is placed in the sixth area of the memory mem of processor element PE1 of node Node-2.

[0013] Through the above processing, a2,2, a2,3, a2,4, a2,5, a2,6, and a2,1 are placed in the first to sixth areas of the memory mem of processor element PE1 of node Node-2, and the data x2 is placed in the x area of the memory mem. By similar processing, a3,3, a3,4, a3,5, a3,6, a3,1, and a3,2 are placed in the first to sixth areas of the memory mem of processor element PE1 of node Node-3, and the data x3 is placed in the x area of the memory mem.

[0014] Then, when i > mn/2 holds in step S2, that is, when i = 4, the process goes to step S4 and sets n1 = mn - i + 1, that is, n1 = 3, and p1 = 2. Steps S5 to S11 then repeat the above processing from j = 1 to j = 6. As a result, a4,4, a4,5, a4,6, a4,1, a4,2, and a4,3 are placed in the first to sixth areas of the memory mem of processor element PE2 of node Node-3, and the data x4 is placed in the x area of the memory mem.

[0015] After the above processing, step S14 increments i by 1 (i becomes 5), and n1 = mn - i + 1 = 2 and p1 = 2 are set. On reaching step S5, i > n holds (n is 4 here), so no data is placed in the first to sixth areas of the memory mem of processor element PE2 of node Node-2. The process goes to step S12, and the data x5 is placed in the x area of the memory mem of processor element PE2 of node Node-2.

[0016] Similarly, when i = 6, the data x6 is placed in the x area of the memory mem of processor element PE2 of node Node-1. After the next increment i exceeds mn, so the process goes from step S15 to the end and terminates. Through the above processing, the data a1,1 to a1,6, a2,2 to a2,1, a3,3 to a3,2, and a4,4 to a4,3 are placed in the first to sixth areas of the memory mem of the processor elements PE1 of nodes Node-1 to Node-3 and of the processor element PE2 of node Node-3, respectively. The data x1 to x6 are placed in the x areas of the memory mem of the processor elements PE1 of nodes Node-1 to Node-3 and of the processor elements PE2 of nodes Node-3 to Node-1, respectively.
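The whole conventional placement procedure of FIGS. 13 and 14, as described in the preceding paragraphs, can be sketched as below. It is only an illustration under stated assumptions: the function name `conventional_layout`, the dictionary layout, and the use of (row, column) index pairs in place of the actual matrix values are all hypothetical, and only even m and n are handled, as in the flowcharts.

```python
def conventional_layout(m, n):
    """Place a(i, j) and x(i) as described for FIGS. 13 and 14.

    Returns {(node, pe): {"mem": [(row, col), ...], "x": index or None}}.
    """
    mn = max(m, n)                             # step S1
    layout = {}
    for i in range(1, mn + 1):                 # steps S1, S14, S15
        if i <= mn // 2:                       # step S2
            node, pe = i, 1                    # step S3
        else:
            node, pe = mn - i + 1, 2           # step S4
        cell = {"mem": [], "x": None}
        if i <= n:                             # step S5
            for j in range(1, m + 1):          # steps S6, S10, S11
                col = i + j - 1                # step S7
                if col > m:
                    col -= m                   # step S9: wrap to the row start
                cell["mem"].append((i, col))   # steps S8 and S9
        if i <= m:                             # step S12
            cell["x"] = i                      # step S13
        layout[(node, pe)] = cell
    return layout


layout = conventional_layout(6, 4)
for key in sorted(layout):
    print(key, layout[key])
```

Each row is stored starting from its diagonal element, so that a processor element's first stored coefficient pairs with the x element that initially sits in front of it on the ring.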

[0017] Next, as an example of parallel computation on the ring network, the product-sum computation Wx of a matrix W with 2n rows and 2n columns and a vector x with 2n elements is described. First, as shown in FIG. 15, each processor element holds the elements W_ij (i = 1, ..., 2n, j = 1, ..., 2n) of one row of W, the elements x_j (j = 1, ..., 2n) of x are placed on the ring, and the computation proceeds as follows. Each processor element takes in the data on the ring as shown in FIG. 16, multiplies it by the W_ij it holds, and writes the product to its internal work area. The data on the ring are then shifted by one stage as a whole, so that the data arrangement on the ring becomes that shown in FIG. 17. Each processor element again takes in the data on the ring, multiplies it by the next W_ij element it holds as shown in FIG. 18, and adds the product to the previous result in its internal work area. The same processing is then repeated over all the data.
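The multiply, accumulate, and shift cycle described above can be simulated sequentially as in the following sketch. It is a hedged illustration rather than the patented hardware behaviour: the name `ring_matvec`, the shift direction, and the rotated storage of each row (starting at the diagonal element, as in the data arrangement described earlier) are assumptions made so that the example is self-contained.

```python
def ring_matvec(W, x):
    """Compute W x by simulating processor elements on a shifting ring."""
    size = len(x)
    # Processor element i holds row i of W, rotated to start at W[i][i],
    # so that its k-th stored coefficient matches the k-th value it sees.
    rows = [[W[i][(i + k) % size] for k in range(size)] for i in range(size)]
    ring = list(x)        # ring[i] is the value currently in front of PE i
    work = [0] * size     # one work area per processor element
    for k in range(size):
        for i in range(size):
            work[i] += rows[i][k] * ring[i]   # multiply and accumulate
        ring = ring[1:] + ring[:1]            # shift the ring by one stage
    return work


W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, 2]
result = ring_matvec(W, x)
print(result)
```

After as many steps as there are processor elements, every element of x has passed every processor element once, and each work area holds one component of Wx.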

[0018]

[Problems to be Solved by the Invention] In FIG. 12, a ring network passing through n nodes is configured, and processing is performed by 2n processor elements. A characteristic of a ring network such as that of FIG. 12, in which the forward path and the return path pass through each node, is that the number of nodes the ring passes through can be changed. This makes it possible, for example, to compute matrices of different sizes one after another.

[0019] When the number of nodes in the ring network of FIG. 12 is changed to form a ring passing through m nodes, the processor elements are allocated as shown in FIG. 19. As the figure shows, the processor elements are then numbered PE1 to PE2m, and the processor numbers from PEn+1 to PE2m differ before and after the network change.
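The renumbering effect can be illustrated with a short sketch. It assumes, consistently with the conventional flowchart, that the forward path numbers the first processor element of each node in ascending node order and the return path numbers the second processor element in descending node order; the function name `ring_order_numbers` and the node counts 4 and 3 are hypothetical choices for the example.

```python
def ring_order_numbers(num_nodes):
    """Map (node, slot) to the PE number counted along the ring."""
    numbers = {}
    for node in range(1, num_nodes + 1):
        numbers[(node, 1)] = node                      # forward path
        numbers[(node, 2)] = 2 * num_nodes - node + 1  # return path
    return numbers


before = ring_order_numbers(4)   # ring through 4 nodes
after = ring_order_numbers(3)    # ring shortened to 3 nodes
changed = sorted(key for key in after if before[key] != after[key])
print(changed)
```

In this sketch the forward-path elements keep their numbers, while every return-path element that remains on the ring receives a different number, which is why data keyed to the processor number no longer sits where the computation expects it.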

[0020] Therefore, if processing programs are allocated to the processor elements shown in FIG. 15 and the data are arranged, and the network configuration is then changed as shown in FIG. 19, the data must be rearranged at that point to follow the shift in processor numbers. As described above, in the prior art, when the scale of the computation changes and the network configuration is changed, the order in which data travelling over the network reach the processor elements changes, so the data must be rearranged. Only the host computer CPU2, which holds all the data, can do this. The computation on the parallel computer CPU1 must therefore be suspended while one processor, or a small number of processors, in the host computer update the memories of the processor elements, and the high-speed performance of the parallel computer is sacrificed.

[0021] One way to avoid this problem is to fix the network configuration so that it passes through the number of processor elements needed for the largest computation in the series. In that case, however, the data must pass through all the processor elements even when the computation is small, which wastes transfer time. Therefore, in the ring network on which the present invention is premised, where the forward path and the return path pass through each node, it is desirable, in order to exploit that characteristic, to allocate processing programs to the processors in such a way that no data rearrangement is needed even if the processor configuration changes during the computation.

[0022] The above problem does not arise in an ordinary one-way ring whose network configuration cannot be varied, such as a ring network in which each node has only one processor element and the return path does not pass through the processor elements. The present invention has been made in view of the above problems of the prior art. Its object is to provide a parallel computer that can perform processing at high speed by making data rearrangement at execution time unnecessary even when the network configuration is changed.

[0023]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the present invention. In the figure, Node-1, ..., Node-n are nodes in a ring network through which a forward path and a return path pass, and PE1 and PE2 are processor elements provided in each node. Each node Node-1, ..., Node-n is provided with at least two processor elements PE1 and PE2.

[0024] To solve the above problem, the invention of claim 1 is a parallel computer that has at least two processor elements PE1 and PE2 in each node Node-1, ..., Node-n, is configured as a ring network in which a forward path and a return path pass through each node, and performs processing in the processor elements PE1 and PE2 by transferring the data held by the processor elements PE1 and PE2 of each node among the processor elements. In this parallel computer, the processing programs that process the data transferred over the network are allocated in the order of the nodes Node-1, ..., Node-n, starting in turn from the processor elements PE1 and PE2 provided in each node.

[0025] The invention of claim 2 is a parallel computer that has at least two processor elements PE1 and PE2 in each node Node-1, ..., Node-n, is configured as a ring network in which a forward path and a return path pass through each node, and performs processing in the processor elements PE1 and PE2 by transferring the data held by the processor elements PE1 and PE2 of each node among the processor elements. In this parallel computer, the processing programs that process the data transferred over the network are allocated in the order of the nodes Node-1, ..., Node-n, starting in turn from the processor elements PE1 and PE2 provided in each node, and the data are arranged in the order of the nodes Node-1, ..., Node-n, starting in turn from each of the processor elements PE1 and PE2 provided in each node.

[0026]

[Operation] In the ring network shown in FIG. 1, each node is provided with at least two processor elements PE1 and PE2. When the data transfer direction is the direction indicated by the arrows in the figure, the processing programs that process the data transferred over the network are allocated in the order of the nodes Node-1, ..., Node-n, starting in turn from the processor elements PE1 and PE2 provided in each node.

[0027] When data are transferred to the processor elements PE1 and PE2 in the transfer direction shown in the figure, each processor element operates on the data arranged on it and on the transferred data, and performs predetermined processing such as computing a matrix product. Allocating the processing programs in node order as described above makes the processor numbers discontinuous as seen from the ring, so the transferred data no longer arrive in numerical order. This can be resolved by rearranging in advance the data held inside the processor elements PE1 and PE2 and the order of computation. Because this is not a rearrangement of data at execution time as in the prior art, its effect on processing time is negligible.

[0028] In the invention of claim 1, the processing programs that process the data transferred over the network are allocated in the order of the nodes Node-1, ..., Node-n, starting in turn from the processor elements PE1 and PE2 provided in each node, as described above. The allocation of processors in the ring network can therefore be determined uniquely, regardless of the network configuration.

[0029] In the invention of claim 2, the processor elements PE1, PE2, ... are allocated in the order of the nodes Node-1 to Node-n; the processing programs that process the data transferred over the network are allocated in node order, starting in turn from the processor elements PE1 and PE2 provided in each node; and the data are arranged in node order, starting in turn from each of the processor elements PE1 and PE2 provided in each node. Consequently, even if the network configuration is changed, no rearrangement of data at execution time is needed, and processing can be performed at high speed.

[0030]

[Embodiment] FIG. 2 shows the allocation of processing programs in a first embodiment of the present invention. In the figure, two processor elements are allocated to each of the nodes Node-1 to Node-n, and the processing programs are allocated to the processors PE1 to PE2n in node order: processor elements PE1 and PE2 to node Node-1, processor elements PE3 and PE4 to node Node-2, and so on.
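A small sketch of this node-order numbering shows why it is independent of the ring length. The function name `node_order_numbers` and the node counts 4 and 3 are assumptions for illustration; the formula simply encodes "Node-1 gets PE1 and PE2, Node-2 gets PE3 and PE4, and so on".

```python
def node_order_numbers(num_nodes):
    """Map (node, slot) to its PE number when numbering follows node order."""
    return {
        (node, slot): 2 * (node - 1) + slot
        for node in range(1, num_nodes + 1)
        for slot in (1, 2)
    }


before = node_order_numbers(4)   # ring through 4 nodes
after = node_order_numbers(3)    # ring shortened to 3 nodes
unchanged = all(before[key] == after[key] for key in after)
print(unchanged)
```

Because a processor element's number depends only on its own node and position within that node, every element that stays on the ring keeps its number when nodes are added or removed.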

[0031] FIGS. 3, 4, and 5 are flowcharts showing the processing that arranges data on the processor elements of each node in this embodiment in order to perform the matrix operation of equation (1). As with the conventional example, the flowcharts are simplified by showing the case in which n and m are both even. The arrangement of data on the processor elements in this embodiment is described with reference to the flowcharts of FIGS. 3 to 5. Here too, for ease of understanding, the case m = 6, n = 4 is described.

[0032] In step S1 of FIG. 3, i1 = 1, i2 = max(m, n), and j = 1 are set, and in step S2, table(i1) = j and table(i2) = j + 1 are set. Next, step S3 sets i1 = i1 + 1, i2 = i2 - 1, and j = j + 2, and step S4 determines whether i1 < i2. If i1 < i2, the process returns to step S2 and repeats the above.

[0033] As a result, for m = 6 and n = 4, table is set as follows: table(1) = 1, table(2) = 3, table(3) = 5, table(4) = 6, table(5) = 4, table(6) = 2. Next, the process goes to step S5 and sets i = 1 and mn = max(m, n), and step S6 determines whether i <= mn/2. If i <= mn/2, the process goes to step S7 and sets i1 = table(i), n1 = i, and p1 = 1. As described later, when i increases so that i > mn/2, that is, when i becomes 4 or more, i1 = table(i), n1 = mn - i + 1, and p1 = 2 are set. Initially, therefore, i1 = 1, n1 = 1, and p1 = 1 are set.
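The table construction of steps S1 to S4 can be sketched as follows. The function name `build_table` and the 1-based list with an unused index 0 are assumptions for the example, and the loop mirrors the flowchart only for even max(m, n), which is the case the flowchart covers.

```python
def build_table(m, n):
    """Build table(1..mn) as in steps S1 to S4 of FIG. 3 (index 0 unused)."""
    mn = max(m, n)
    table = [0] * (mn + 1)
    i1, i2, j = 1, mn, 1                      # step S1
    while True:
        table[i1] = j                         # step S2
        table[i2] = j + 1
        i1, i2, j = i1 + 1, i2 - 1, j + 2     # step S3
        if not i1 < i2:                       # step S4
            break
    return table


table = build_table(6, 4)
print(table[1:])
```

Read this way, table(i) gives the node-order processor number of the i-th processor element reached along the ring: odd numbers going out along the forward path, even numbers coming back along the return path.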

[0034] Next, the process goes to step S9 of FIG. 4 and determines whether i1 <= n. If i1 > n, the process goes to step S16 of FIG. 5, as described later. If i1 <= n, the process goes to step S10 and sets j = 1, and step S11 determines whether i + j - 1 <= m. If i + j - 1 <= m, the process goes to step S12 and sets NODE(n1), PE(p1), mem(j) = a(i1, table(i+j-1)). Since table(i+j-1) = table(1) = 1, the data a1,1 is first placed in the first area of the memory mem of processor element PE1 of node Node-1.

[0035] Next, the process goes to step S14 and sets j = j + 1, and step S15 determines whether j <= m. If j <= m, the process returns to step S11 and repeats the above up to j = 6. As a result, the data a1,1, a1,3, a1,5, a1,6, a1,4, and a1,2 are placed in the first to sixth areas, respectively, of the memory mem of processor element PE1 of node Node-1.

[0036] When j = j + 1 is set in step S14, j exceeds 6, so the process goes from step S15 to step S16 of FIG. 5 and determines whether i <= m. Here i <= m, so the process goes to step S17 and sets NODE(n1), PE(p1), mem(x) = x(i). That is, the data x1 is placed in the x area of the memory mem of processor element PE1 of node Node-1.

[0037] Through the above processing, the data a1,1 to a1,2 listed above are placed in the first to sixth areas of the memory mem of processor element PE1 of node Node-1, and the data x1 is placed in the x area of the memory mem. Next, step S18 sets i = i + 1, and step S19 determines whether i <= mn. If i <= mn, the process returns to step S6.

[0038] This time i = 2, so i1 = 3, n1 = 2, and p1 = 1 are set, and data are placed in the areas of the memory mem of processor element PE1 of node Node-2. That is, when i + j - 1 <= m, the process goes from step S11 to step S12 and sets NODE(n1), PE(p1), mem(j) = a(i1, table(i+j-1)). Since table(i+j-1) = table(2) = 3, the data a3,3 is placed in the first area of the memory mem of processor element PE1 of node Node-2.

[0039] Next, the process goes to step S14 and sets j = j + 1, and step S15 determines whether j <= m. If j <= m, the process returns to step S11 and repeats the above up to j = 5. As a result, the data a3,5, a3,6, a3,4, and a3,2 are placed in the second to fifth areas of the memory mem of processor element PE1 of node Node-2.

[0040] When j = 6, i + j - 1 > m holds, so the process goes to step S13 and sets NODE(n1), PE(p1), mem(j) = a(i1, table(i+j-1-m)). Since table(i+j-1-m) = table(1) = 1, the data a3,1 is placed in the sixth area of the memory mem of processor element PE1 of node Node-2. When j = j + 1 is set in step S14, j exceeds 6, so the process goes from step S15 to step S16 of FIG. 5 and determines whether i <= m. Here i <= m, so the process goes to step S17 and sets NODE(n1), PE(p1), mem(x) = x(i). That is, the data x3 is placed in the x area of the memory mem of processor element PE1 of node Node-2.

[0041] Through the above processing, the data a3,3 to a3,1 are placed, as described above, in the first to sixth areas of the memory mem of processor element PE1 of node Node-2, and the data x3 is placed in the x area of the memory mem. After this processing, i is incremented by 1 in step S18 (i becomes 3), it is determined in step S19 whether i < mn, and the process returns to step S6.
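The column order traced in steps S11 to S15 can be reproduced with a short Python sketch. It assumes the values used in this walkthrough: m = 6 and the ring-order table 1, 3, 5, 6, 4, 2, where table(5) = 4 and table(6) = 2 are inferred from the placements described for i = 5 and i = 6. The function name `row_layout` is illustrative and does not appear in the flowchart.

```python
# Ring-order table from the walkthrough: table(1..6) = 1, 3, 5, 6, 4, 2.
TABLE = [1, 3, 5, 6, 4, 2]


def row_layout(i, table=TABLE):
    """Return the (row, column) pairs stored in mem(1..m) of the PE at ring position i.

    Mirrors steps S11-S13: the column index is table(i+j-1), wrapping
    around by m once i+j-1 exceeds m.
    """
    m = len(table)
    i1 = table[i - 1]                 # PE number, used as the row index of a
    layout = []
    for j in range(1, m + 1):
        k = i + j - 1
        if k > m:                     # step S13: wrap around the table
            k -= m
        layout.append((i1, table[k - 1]))
    return layout


# Ring position i = 2 is PE1 of Node-2 (i1 = 3).
print(row_layout(2))  # [(3, 3), (3, 5), (3, 6), (3, 4), (3, 2), (3, 1)]
```

Each processor element therefore stores its row of a in the order in which the elements of x will arrive over the ring, starting with its own element.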

[0042] Then, in step S7, i1 = table(3) = 5, n1 = 3, and p1 = 1 are set, and the process goes to step S9. This time i1 = 5, which exceeds n (4 in this case), so in step S16 it is determined whether i1 < m; if i1 < m, the process goes to step S17 and sets NODE(n1),PE(p1),mem(x)=x(i). That is, no data is placed in the first to sixth areas of the memory mem of processor element PE1 of node Node-3, and the data x5 is placed in the x area of that memory.

[0043] Then, in step S18, i is incremented by 1 (i becomes 4), it is determined in step S19 whether i < mn, and the process returns to step S6. When i <= mn/2 is tested in step S6, i is 4, which exceeds mn/2 = 3, so the process goes to step S8 and sets i1 = table(4) = 6, n1 = mn - i + 1 = 3, and p1 = 2. Next, the process goes to step S9 and tests whether i1 <= n. Since i1 exceeds n (n is 4 in this case), the process goes to step S16 and, as before, no data is placed in the first to sixth areas of the memory mem of processor element PE2 of node Node-3, and the data x6 is placed in the x area of the memory mem of processor element PE2 of node Node-3.

[0044] Next, in step S18, i = i + 1 is set, and in step S19 it is determined whether i <= mn; if so, the process returns to step S6. This time i = 5, so in step S8 i1 = 4, n1 = 2, and p1 = 2 are set, and data is placed in the memory mem of processor element PE2 of node Node-2.

[0045] As before, a4,4, a4,2, a4,1, a4,3, a4,5, and a4,6 are placed in the first to sixth areas of the memory mem of processor element PE2 of node Node-2, and the data x4 is placed in the x area of the memory mem. Then, when i = 6, a2,2, a2,1, a2,3, a2,5, a2,6, and a2,4 are placed in the first to sixth areas of the memory mem of processor element PE2 of node Node-1, and the data x2 is placed in the x area of the memory mem.

[0046] When i is incremented by 1 in step S18, i > m, so the process goes from step S19 to end and terminates. Through the above processing, the data a1,1 to a1,2, a2,2 to a2,4, a3,3 to a3,1, and a4,4 to a4,6 are placed, as described above, in the first to sixth areas of the memories mem of processor elements PE1 and PE2 of node Node-1 and processor elements PE1 and PE2 of node Node-2, respectively. The data x1 to x6 are placed in the x areas of the memories mem of processor element PE1 of nodes Node-1 to Node-3 and processor element PE2 of nodes Node-3 to Node-1.
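Putting steps S6 to S19 together, the whole placement for the walkthrough (n = 4 rows, ring-order table 1, 3, 5, 6, 4, 2) can be sketched in Python as follows. The function name and the dictionary layout are illustrative. Two simplifications are assumed: m equals the number of processor elements, as in the walkthrough, so the test in step S16 always passes; and the x element stored is x(i1), which is what the walkthrough's results (x3 at i = 2, x5 at i = 3) indicate.

```python
def place_data(n_rows, table):
    """Sketch of the placement loop (steps S6-S19) for a ring with two PEs per node.

    Returns {(node, pe): {"a": [(row, col), ...], "x": index}}.
    """
    mn = len(table)            # number of processor elements on the ring
    m = mn                     # assumption: number of columns equals mn
    placement = {}
    for i in range(1, mn + 1):
        i1 = table[i - 1]
        if i <= mn // 2:       # step S7: forward path, PE1 of node i
            n1, p1 = i, 1
        else:                  # step S8: return path, PE2 of node mn-i+1
            n1, p1 = mn - i + 1, 2
        rows = []
        if i1 <= n_rows:       # step S9: only PE numbers 1..n hold a row of a
            for j in range(1, m + 1):
                k = i + j - 1
                if k > m:      # step S13: wrap around the table
                    k -= m
                rows.append((i1, table[k - 1]))
        # Assumption: the stored x element is x(i1), per the walkthrough's results.
        placement[(n1, p1)] = {"a": rows, "x": i1}
    return placement


placement = place_data(4, [1, 3, 5, 6, 4, 2])
for slot in sorted(placement):
    print(slot, placement[slot])
```

Both processor elements of Node-3 end up with an empty row list and only an x element, matching the description for i = 3 and i = 4.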

[0047] FIG. 6 shows the arrangement, produced by the above processing, of the data of the aforementioned matrix W with 2n rows and 2n columns and the vector x with 2n elements, in the ring network in which processing programs are allocated to the processors as shown in FIG. 2. As shown in the figure, in this embodiment the elements W_ij (i = 1, ..., 2n; j = 1, ..., 2n) of each row of the matrix W_ij are placed in processor elements PE1 to PE2n according to the number of each processor element, and each element x_j (j = 1, ..., 2n) of x on the ring is likewise placed according to the numbers of processor elements PE1 to PE2n.

[0048] To compute the product-sum Wx of the matrix W with 2n rows and 2n columns and the vector x with 2n elements, each processor element, as in the case shown in FIGS. 15 to 18, takes in the data on the ring, multiplies it by the W_ij it holds internally, and writes the result to its internal work area; the sums are obtained while the data on the ring is shifted as a whole. With this processing, the order of operations differs slightly from that of the conventional example shown in FIGS. 15 to 18, but the same result is obtained.
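The multiply, accumulate, and shift cycle can be illustrated with a small Python simulation. This is a sketch under stated assumptions rather than the patented implementation: the ring order is given as a list of PE numbers (1, 3, 5, 6, 4, 2 for three nodes), each ring position takes over the value of the next position on every shift, and the function name `ring_matvec` is made up for the example.

```python
def ring_matvec(W, x, table):
    """Simulate the ring product-sum y = W x with one row of W per PE.

    W is a list of rows and x a list, both indexed by PE number - 1.
    table lists the PE numbers in ring order.
    """
    size = len(table)
    # Each ring position starts with the x element of its own PE,
    # tagged with the element's index so the matching W_ij can be found.
    ring = [(table[pos], x[table[pos] - 1]) for pos in range(size)]
    work = [0] * size                      # per-PE work area, by PE number - 1
    for _ in range(size):
        for pos in range(size):
            pe = table[pos]
            col, value = ring[pos]
            work[pe - 1] += W[pe - 1][col - 1] * value
        # Shift the whole ring: every position receives its successor's value.
        ring = ring[1:] + ring[:1]
    return work


table = [1, 3, 5, 6, 4, 2]
W = [[r * 6 + c + 1 for c in range(6)] for r in range(6)]
x = [1, 2, 3, 4, 5, 6]
print(ring_matvec(W, x, table))
```

After as many shifts as there are processor elements, each work area holds one element of Wx; only the order in which the partial products are added depends on the ring order.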

[0049] FIG. 7 shows the case where the network configuration of the above embodiment is changed so that the number of nodes goes from n to m. When the number of nodes is changed, the allocation of processing programs for nodes Node-1 to Node-n is the same as in FIG. 2, and processor elements are newly allocated, in node order, for nodes Node-n+1 to Node-m.

[0050] Therefore, when the product-sum of a matrix and a vector of a different size is to be computed after the product-sum Wx of the matrix W and the vector x shown in FIG. 6, processor elements need only be allocated for nodes Node-n+1 to Node-m, and data need only be newly placed in the processor elements of nodes Node-n+1 to Node-m.
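Why existing nodes are unaffected can be seen by comparing two numbering schemes in Python. The sketch assumes two processor elements per node. `node_order_numbering` follows this embodiment (Node-k holds PE(2k-1) and PE(2k)), while `ring_order_numbering` models the conventional allocation along the data transfer direction as assumed here (forward path first, then the return path). All function names are illustrative.

```python
def node_order_numbering(num_nodes):
    """PE number for each (node, pe) when PEs are numbered in node order."""
    return {(k, p): 2 * (k - 1) + p
            for k in range(1, num_nodes + 1) for p in (1, 2)}


def ring_order_numbering(num_nodes):
    """PE number for each (node, pe) when PEs are numbered along the ring."""
    numbering = {}
    for k in range(1, num_nodes + 1):
        numbering[(k, 1)] = k                      # forward path
        numbering[(k, 2)] = 2 * num_nodes - k + 1  # return path
    return numbering


def unchanged(numbering, old_nodes, new_nodes):
    """Slots of the old network whose PE number survives the resize."""
    old, new = numbering(old_nodes), numbering(new_nodes)
    return [slot for slot in old if old[slot] == new[slot]]


# Growing the ring from 3 to 5 nodes.
print(unchanged(node_order_numbering, 3, 5))  # all six existing slots
print(unchanged(ring_order_numbering, 3, 5))  # only the PE1 slots
```

With node-order numbering every existing slot keeps its number, so its program and data can stay in place; with ring-order numbering every PE2 on the return path is renumbered.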

[0051] That is, nodes Node-1 to Node-n are the same as in FIG. 2, and only nodes Node-n+1 to Node-m need to have processing programs allocated to their processors and data placed. Unlike the conventional method, there is no need to reallocate processing programs to the processors of the entire network or to rearrange the data. FIG. 8 shows the allocation of processing programs and the arrangement of data in the conventional method and in the method of this embodiment. The figure shows the allocation and arrangement for the matrix operation of the following formula (2), with six input data x1, x2, ..., x6 and four output data y1, y2, ..., y4; (a) shows the conventional method and (b) shows the method of this embodiment.

【００５２】[0052]

【数２】 [Equation 2]
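Only the placeholder for this equation survives in the text. Based on the surrounding description (six inputs x1 to x6, four outputs y1 to y4, and matrix elements a_{i,j} as used in the data-arrangement walkthrough), formula (2) presumably has the following form; this is a reconstruction under those assumptions, not the original figure.

```latex
\begin{equation}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}
=
\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,6} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,6} \\
a_{3,1} & a_{3,2} & \cdots & a_{3,6} \\
a_{4,1} & a_{4,2} & \cdots & a_{4,6}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_6 \end{pmatrix}
\tag{2}
\end{equation}
```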

[0053] As far as the arrangement of data before the start of the operation is concerned, neither method is superior to the other. However, a decisive difference arises when the operation of the following formula (3) is performed on the results y1, y2, ..., y4.

【００５４】[0054]

【数３】 [Equation 3]
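Only the placeholder for this equation survives as well. The text states that formula (3) is applied to y1 to y4 and uses the matrix data b11, ..., b24, which suggests a 2 x 4 matrix. A form consistent with that description is sketched below; the output names z_1 and z_2 are assumed for illustration and do not come from the source.

```latex
\begin{equation}
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
=
\begin{pmatrix}
b_{11} & b_{12} & b_{13} & b_{14} \\
b_{21} & b_{22} & b_{23} & b_{24}
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}
\tag{3}
\end{equation}
```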

[0055] FIG. 9 shows the result of the operation of formula (2); (a) shows the conventional example and (b) shows this embodiment. As shown in the figure, with the conventional method the output data is located in PE1 of nodes Node-1 to Node-3 and PE2 of node Node-3, whereas with the method of this embodiment the output data is located in PE1 and PE2 of nodes Node-1 and Node-2.

[0056] As shown in FIG. 9, in the conventional example y3 and y4 are left in node Node-3. Consequently, even a computation that could be performed by the four processor elements up to node Node-2 must either be processed by six processor elements, including those of node Node-3, or else the data must first be read out by the host processor and then rearranged onto nodes Node-1 and Node-2, together with the matrix data b11, ..., b24 of formula (3), according to the flowcharts of FIGS. 13 and 14 described above.
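The difference in where y1 to y4 end up can be checked with a few lines of Python. The sketch assumes a three-node ring with two processor elements per node and that output y_k is held by the processor element numbered k; the conventional numbering is modeled as running along the data transfer direction. The function name `locate` and the scheme labels are illustrative.

```python
def locate(k, num_nodes, scheme):
    """Return the (node, pe) slot holding the element numbered k (1-based)."""
    if scheme == "node_order":          # this embodiment: Node-j holds PE(2j-1), PE(2j)
        return ((k + 1) // 2, 1 if k % 2 else 2)
    if k <= num_nodes:                  # conventional numbering, forward path
        return (k, 1)
    return (2 * num_nodes - k + 1, 2)   # conventional numbering, return path


for scheme in ("ring_order", "node_order"):
    slots = [locate(k, 3, scheme) for k in range(1, 5)]
    print(scheme, slots, "nodes used:", sorted({node for node, _ in slots}))
```

For the three-node ring this places y3 and y4 on Node-3 under ring-order numbering and keeps all four outputs on Node-1 and Node-2 under node-order numbering, in line with the description of FIG. 9.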

[0057] In contrast, in this embodiment y1, y2, ..., y4 are arranged in essentially the same way as the initial data x1, x2, ..., x6, so the computation can be shortened by placing the matrix data b11, ..., b24 in advance. FIG. 10 shows a second embodiment of the present invention: a ring network in which each node has four processor elements and up to eight nodes are connected by a ring.

[0058] In this embodiment too, processor elements PE1 to PE32 are allocated to nodes Node-1 to Node-8 in node order, as in the first embodiment shown in FIG. 2. To compute a matrix-vector product-sum like that of FIG. 6, the elements of each row of the matrix W_ij are placed in processor elements PE1 to PE32 according to the number of each processor element, as in FIG. 6, and each element of x on the ring is likewise placed according to the numbers of processor elements PE1 to PE32.
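Node-order allocation with four processor elements per node reduces to simple index arithmetic. The helper below is a hypothetical illustration of that numbering only; how the ring passes through the four processor elements of each node is shown in FIG. 10 and is not modeled here.

```python
def pe_numbers(node, pes_per_node=4):
    """PE numbers assigned to a node (1-based) under node-order allocation."""
    first = (node - 1) * pes_per_node + 1
    return list(range(first, first + pes_per_node))


print(pe_numbers(1))  # [1, 2, 3, 4]
print(pe_numbers(8))  # [29, 30, 31, 32]
```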

[0059] The matrix-vector product-sum can then be computed by the same procedure as described above. Thus the present invention is applicable, as in the first embodiment, whenever two or more processor elements are provided in each node.

【００６０】[0060]

[Effects of the Invention] As described above, in the present invention processing programs are allocated to the processor elements in node order, without regard to the data transfer direction, so the configuration of a ring network can be changed easily. In addition, because data is placed in each processor element in node order, no rearrangement of data at execution time is needed even when the network configuration is changed, and processing can be performed at high speed.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a diagram showing the allocation of processing programs in the first embodiment of the present invention.

FIG. 3 is a flowchart of the data arrangement process in this embodiment.

FIG. 4 is a flowchart (continued) of the data arrangement process in this embodiment.

FIG. 5 is a flowchart (continued) of the data arrangement process in this embodiment.

FIG. 6 is a diagram showing the data arrangement in this embodiment.

FIG. 7 is a diagram showing a ring network whose configuration has been changed in this embodiment.

FIG. 8 is a diagram showing the data arrangement in the conventional method and in the method of this embodiment.

FIG. 9 is a diagram showing the results of the operation in the conventional method and in the method of this embodiment.

FIG. 10 is a diagram showing a second embodiment of the present invention.

FIG. 11 is a diagram showing the configuration of a system on which the present invention is premised.

FIG. 12 is a diagram showing the configuration of a ring network on which the present invention is premised.

FIG. 13 is a flowchart of the data arrangement process in the conventional example.

FIG. 14 is a flowchart (continued) of the data arrangement process in the conventional example.

FIG. 15 is a diagram showing the data arrangement in a conventional ring network.

FIG. 16 is a diagram showing the multiplication process in a conventional ring network.

FIG. 17 is a diagram showing the transfer process in a conventional ring network.

FIG. 18 is a diagram showing the next-stage multiplication process in a conventional ring network.

FIG. 19 is a diagram showing a ring network whose configuration has been changed in the conventional example.

[Explanation of symbols]

Node-1 to Node-n: nodes
PE1 to PE2m, PE2n: processor elements
CPU1: parallel computer
CPU2: host computer
1: processor
2: main memory
3: interface

Claims

[Claims]

1. A parallel computer constituted by a ring network in which each node (Node-1, ..., Node-n) is provided with at least two processor elements (PE1, PE2) and a forward path and a return path pass through each node, the data held by the processor elements (PE1, PE2) of each node (Node-1, ..., Node-n) being transferred to each processor element (PE1, PE2) and processed by each processor element (PE1, PE2), characterized in that processing programs for processing the data sequentially transferred on the network are allocated, in the order of the nodes (Node-1, ..., Node-n), starting from the processor elements (PE1, PE2, ...) provided in each node.

2. A parallel computer constituted by a ring network in which each node (Node-1, ..., Node-n) is provided with at least two processor elements (PE1, PE2) and a forward path and a return path pass through each node, the data held by the processor elements (PE1, PE2) of each node (Node-1, ..., Node-n) being transferred to each processor element (PE1, PE2) and processed by each processor element (PE1, PE2), characterized in that processing programs for processing the data sequentially transferred on the network are allocated, in the order of the nodes (Node-1, ..., Node-n), starting from the processor elements (PE1, PE2) provided in each node, and data is arranged sequentially, in the order of the nodes (Node-1, ..., Node-n), starting from the processor elements (PE1, PE2) provided in each node.