JPS6174058A

JPS6174058A - Parallel processing system

Info

Publication number: JPS6174058A
Application number: JP59195296A
Authority: JP
Inventors: Toshio Komatsu; 小松　俊雄; Atsushi Ishikawa; 篤石川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1984-09-18
Filing date: 1984-09-18
Publication date: 1986-04-16

Abstract

PURPOSE: To speed up processing after inter-PE transfer without reducing the number of effective PEs, by providing each processor element (PE) with registers that hold boundary values, a control register, and selectors. CONSTITUTION: A PE 77 is provided with registers 78-81, selectors 82-85, and a control register 86. The registers 78-81 hold boundary values, and the selectors 82-85 select between inter-PE transfer data and the boundary values. The control register 86 holds decision bits indicating whether a boundary value is required after an inter-PE transfer. Interfaces 87-94 connect the PE to other PEs.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a parallel processing method that achieves a high degree of parallelism by connecting identical processor elements (PEs) in a network and distributing the processing among them. In particular, it relates to a method for speeding up the processing that follows data transfer between PEs.

[Prior art]

With the remarkable development of VLSI and microprocessors, various devices have been proposed and built that connect identical PEs in a network to perform highly parallel processing. FIG. 3 shows the conventional PE configuration for such devices. The PE 100 consists of an arithmetic unit 2 that executes predetermined operations, a transfer unit 3 that transfers data between PEs, a memory unit 5 that stores, via a selector 4, either the result from the arithmetic unit 2 or the transfer data from the transfer unit 3, and a program memory (omitted from FIG. 3) that stores the program to be executed.

Such PE networks may be grid-shaped, tree-shaped, cube-shaped, and so on. The network form is chosen according to the pattern of inter-PE transfers in the problem to be solved.

As an example, consider solving the two-dimensional Laplace equation with given boundary conditions

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \tag{1}$$

on a grid-shaped device (a grid-shaped network suits this kind of problem).

Conventionally, the solution first approximates (1) by the following difference equation:

$$-u_{i,j-1} - u_{i-1,j} + 4u_{i,j} - u_{i+1,j} - u_{i,j+1} = 0 \tag{2}$$

Here the grid point that is i-th from the left and j-th from the bottom is denoted (i, j), and the value of u(x, y) at that grid point is written $u_{i,j}$. Substituting approximate values for each u, the new $u_{i,j}$ is, in the Gauss-Seidel method,

$$u_{i,j}^{\mathrm{new}} = \tfrac{1}{4}\left(u_{i,j-1} + u_{i-1,j} + u_{i+1,j} + u_{i,j+1}\right) \tag{3}$$

and, in the SOR method,

$$u_{i,j}^{\mathrm{new}} = u_{i,j}^{\mathrm{old}} + w\left(\left(\text{the new } u_{i,j} \text{ given by (3)}\right) - u_{i,j}^{\mathrm{old}}\right) \tag{4}$$

In either case, the conventional device allocates one or more grid points to each PE, performs inter-PE transfers, and computes the new $u_{i,j}$ once $u_{i,j-1}$, $u_{i-1,j}$, $u_{i+1,j}$, and $u_{i,j+1}$ are all available. Each PE repeats the inter-PE transfer and the computation of the new $u_{i,j}$ until a given condition is satisfied.
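The update rules (3) and (4) can be illustrated with a short sequential sketch. This is not the patent's parallel device: it runs on a single processor, the function names (`relax_point`, `sor_point`, `sweep`, `solve`) are hypothetical, and the tolerance-based stopping test is only one assumed form of the "given condition" mentioned above.

```python
def relax_point(up, down, left, right):
    """Equation (3): the new value is the mean of the four neighbours."""
    return 0.25 * (up + down + left + right)


def sor_point(old, up, down, left, right, w):
    """Equation (4): move from the old value toward the equation (3) value by factor w."""
    return old + w * (relax_point(up, down, left, right) - old)


def sweep(u, w=1.0):
    """Update every interior point of grid u (a list of lists) in place.

    The outermost rows and columns hold fixed boundary values.
    Returns the largest change made during the sweep.
    """
    max_change = 0.0
    for i in range(1, len(u) - 1):
        for j in range(1, len(u[0]) - 1):
            new = sor_point(u[i][j], u[i - 1][j], u[i + 1][j],
                            u[i][j - 1], u[i][j + 1], w)
            max_change = max(max_change, abs(new - u[i][j]))
            u[i][j] = new
    return max_change


def solve(u, w=1.0, tol=1e-9, max_iter=10000):
    """Repeat sweeps until the largest change drops below tol (assumed stopping rule)."""
    for k in range(1, max_iter + 1):
        if sweep(u, w) < tol:
            return k
    return max_iter
```

With `w = 1.0`, `sor_point` reduces to equation (3); other values of `w` give the SOR update of equation (4).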

FIG. 4 shows an example of a device in which 4 x 4 PEs are connected in a grid. Reference numeral 5 denotes a control unit (CU), and 6 to 21 denote PEs. The CU 5 is connected to each PE by a control line 22 and a status line 23. It obtains the status of each PE over the status line 23 and controls each PE over the control line 22. The PEs 6 to 21 are coupled by inter-PE interface lines 24, over which data is transferred between PEs.

Two methods can be considered for allocating grid points to PEs, as shown in FIG. 5. FIG. 5 corresponds to the device example of FIG. 4, with one grid point allocated to each PE and a square boundary. The problems described below are the same, however, when these assumptions do not hold.

In FIG. 5(A), there are 4 x 4 grid points. The PEs 6 to 21 of FIG. 4 are mapped to grid numbers 25 to 40 respectively, boundary values are allocated to the outermost PEs, and initial values are allocated to the remaining inner PEs. Because the boundary values are constant under this allocation, each outer PE replaces its computed new $u_{i,j}$ with the boundary value, so the computation in the outer PEs is wasted. Therefore, if the grid has M rows and N columns, the number of PEs that actually perform a useful computation of the new $u_{i,j}$ is (M-2) x (N-2), and the number of effective PEs is reduced by 2(M+N)-4. In the example of FIG. 5(A), the total number of PEs is 4 x 4 but the number of effective PEs is 2 x 2. Furthermore, if the grid is rectangular rather than square with N = 4, fewer than half of the PEs are effective.
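The PE counts for the FIG. 5(A) allocation can be checked with a few lines of arithmetic. The function name `effective_pes_a` is hypothetical and exists only for this illustration.

```python
def effective_pes_a(m, n):
    """PE counts for the FIG. 5(A) style allocation on an m x n PE grid.

    Returns (total PEs, effective PEs, PEs lost to holding boundary values).
    """
    total = m * n
    effective = max(m - 2, 0) * max(n - 2, 0)
    return total, effective, total - effective


# 4 x 4 grid: 16 PEs in total, 4 effective, 12 lost (2*(4+4) - 4 = 12).
print(effective_pes_a(4, 4))
# 8 x 4 grid: 32 PEs in total, 12 effective, which is less than half.
print(effective_pes_a(8, 4))
```

The lost count always equals 2(M+N)-4 when both M and N are at least 2, which matches the expression in the text.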

In FIG. 5(B), there are 6 x 6 grid points. Initial values are allocated to all of the PEs 6 to 21 of FIG. 4, so that the number of effective PEs equals the total number of PEs, 4 x 4 (M x N in general). In FIG. 5(B), grid numbers 41 to 56 correspond to the PEs 6 to 21 of FIG. 4.

FIG. 6 shows the conventional processing flow when the allocation of FIG. 5(B) is used. First, initial values are set in all PEs, and boundary values are set only in the outer PEs. Next, after the program has been broadcast to all PEs, each PE performs inter-PE transfers and computes the new $u_{i,j}$. Because every PE runs the same program (if the programs differed, the initial-setup overhead would be large), each outer PE must replace the data received by inter-PE transfer with its preset boundary value before computing. Consequently, on every iteration each PE must decide, in software, whether to substitute a boundary value before computing the new $u_{i,j}$ and, if so, perform the substitution. This overhead is the problem.

[Purpose of the invention]

An object of the present invention is to overcome the above drawbacks of the prior art and to provide a parallel processing method that speeds up the processing after inter-PE transfer without reducing the number of effective PEs.

[Structure and operation of the invention]

The present invention provides each PE with registers that hold boundary values, a control register that holds decision bits indicating whether a boundary value is required after an inter-PE transfer, and selectors that select either the boundary value or the transfer data. This speeds up the processing after inter-PE transfer without reducing the number of effective PEs.

FIG. 1 shows an embodiment of the present invention. Reference numeral 77 denotes a PE with extended hardware, 78 to 81 denote registers that hold boundary values, 82 to 85 denote selectors that select between inter-PE transfer data and the boundary values, 86 denotes a control register that holds the decision bits indicating whether to substitute a boundary value, 87 to 94 denote interfaces to other PEs, and 100 denotes the conventional PE components shown in FIG. 3.

FIG. 2 shows the processing flow for FIG. 1. The detailed sequence of operations is described below for the PE allocation method of FIG. 5(B), taking the device configuration of FIG. 4 as an example.

First, the control unit 5 sets an initial value in each PE. That is, it sets the initial value of grid point 41 in PE 6, 42 in PE 7, 43 in PE 8, 44 in PE 9, 45 in PE 10, 46 in PE 11, 47 in PE 12, 48 in PE 13, 49 in PE 14, 50 in PE 15, 51 in PE 16, 52 in PE 17, 53 in PE 18, 54 in PE 19, 55 in PE 20, and 56 in PE 21.

Next, boundary values are set in the outer PEs. That is, the boundary value of grid point 76 is set in register 78 of PE 6 and that of 58 in its register 80; 59 in register 80 of PE 7; 60 in register 80 of PE 8; 63 in register 79 of PE 9 and 61 in its register 80; 75 in register 78 of PE 10; 64 in register 79 of PE 13; 74 in register 78 of PE 14; 65 in register 79 of PE 17; 73 in register 78 of PE 18 and 71 in its register 81; 70 in register 81 of PE 19; 69 in register 81 of PE 20; and 66 in register 79 of PE 21 and 68 in its register 81. The four corner grid points 57, 62, 67, and 72 of FIG. 5(B) are not used.

Next, control information is set in the control register 86. Let the bits of the control register 86 correspond, from left to right, to the left PE, the right PE, the upper PE, and the lower PE, with "0" meaning that the transfer data is used and "1" meaning that it is replaced by the boundary value. Then "1010" is set in register 86 of PE 6; "0010" in register 86 of PE 7 and PE 8; "0110" in register 86 of PE 9; "1000" in register 86 of PE 10 and PE 14; "0000" in register 86 of PE 11, 12, 15, and 16; "0100" in register 86 of PE 13 and PE 17; "1001" in register 86 of PE 18; "0001" in register 86 of PE 19 and PE 20; and "0101" in register 86 of PE 21.
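The control register settings listed above follow directly from each PE's position in the grid. A minimal sketch, assuming the PEs are numbered row by row from PE 6 at the top left as the listed values imply; the function names `control_bits` and `control_table` are hypothetical.

```python
def control_bits(row, col, rows, cols):
    """Return the 4-bit string (left, right, up, down) for the PE at (row, col).

    '1' means the boundary-value register is used for that neighbour,
    '0' means the data received by inter-PE transfer is used.
    """
    left = "1" if col == 0 else "0"
    right = "1" if col == cols - 1 else "0"
    up = "1" if row == 0 else "0"
    down = "1" if row == rows - 1 else "0"
    return left + right + up + down


def control_table(rows=4, cols=4, first_pe=6):
    """Map each PE number to its control bits, numbering PEs row by row."""
    return {
        first_pe + r * cols + c: control_bits(r, c, rows, cols)
        for r in range(rows)
        for c in range(cols)
    }


table = control_table()
print(table[6], table[9], table[18], table[21])
print(len(set(table.values())))                  # distinct patterns on a 4 x 4 grid
print(len(set(control_table(10, 10).values())))  # distinct patterns on a 10 x 10 grid
```

For any grid with at least three rows and three columns the number of distinct patterns stays at nine (four corners, four edges, and the interior), regardless of how many PEs there are.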

Next, after the program has been broadcast to each PE, the inter-PE transfers are executed. That is, all of the PEs 6 to 21 transfer simultaneously in the left, right, upward, and downward directions. Some outer PEs do not need a particular inter-PE transfer, but the transfers are executed between all PEs in order to keep the inter-PE transfer simple and to keep the transfer-related program identical in all PEs so that it can be broadcast.

For a leftward inter-PE transfer, PEs 9, 13, 17, and 21 pass the boundary value held in register 79 to interface 92, and the other PEs pass the transfer data to interface 92. For a rightward inter-PE transfer, PEs 6, 10, 14, and 18 pass the value held in register 78 to interface 91, and the other PEs pass the transfer data. For an upward inter-PE transfer, PEs 18, 19, 20, and 21 pass the value held in register 81 to interface 94, and the other PEs pass the transfer data. For a downward inter-PE transfer, PEs 6, 7, 8, and 9 pass the value held in register 80 to interface 93, and the other PEs pass the transfer data.
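The selection between transfer data and boundary values can be modelled in software to see its effect. The following is a rough simulation under stated assumptions, not the patent's hardware: the class `PE` and the functions `make_grid` and `step` are hypothetical, directions are keyed by the side of the neighbour that supplies the value, every boundary register holds the same value, and each step uses the four-neighbour average of equation (3).

```python
DIRS = ("left", "right", "up", "down")  # assumed bit order of control register 86
OFFSETS = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}


class PE:
    def __init__(self, value, control, boundary):
        self.value = value        # current u value held by this PE
        self.control = control    # models control register 86, e.g. "1010"
        self.boundary = boundary  # models registers 78-81, keyed by direction

    def select(self, direction, transferred):
        """Model of selectors 82-85: choose the boundary register or the transfer data."""
        if self.control[DIRS.index(direction)] == "1":
            return self.boundary[direction]
        return transferred


def make_grid(rows, cols, initial, boundary_value):
    """Build a rows x cols grid of PEs with control bits derived from position."""
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            on_edge = {"left": c == 0, "right": c == cols - 1,
                       "up": r == 0, "down": r == rows - 1}
            control = "".join("1" if on_edge[d] else "0" for d in DIRS)
            boundary = {d: boundary_value for d in DIRS if on_edge[d]}
            row.append(PE(initial, control, boundary))
        grid.append(row)
    return grid


def step(grid):
    """One iteration: all PEs exchange values, then all compute the new value."""
    rows, cols = len(grid), len(grid[0])
    new_values = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for d in DIRS:
                dr, dc = OFFSETS[d]
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                transferred = grid[rr][cc].value if inside else None
                total += grid[r][c].select(d, transferred)
            new_values[r][c] = 0.25 * total
    for r in range(rows):
        for c in range(cols):
            grid[r][c].value = new_values[r][c]
```

In this model the per-PE program contains no test for "am I an outer PE": the same `step` logic runs everywhere, and only the register contents differ, which is the property the text relies on to allow broadcasting one program.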

After the inter-PE transfers are complete, each PE computes the new $u_{i,j}$. The inter-PE transfers and the computation of the new $u_{i,j}$ are repeated until a given condition (for example, a residual condition or an iteration count) is satisfied.

With this structure, the allocation of FIG. 5(B), in which all PEs are effective, can be chosen instead of the allocation of FIG. 5(A), which uses the PEs inefficiently. Moreover, the boundary-value substitution that was conventionally done in software is done in hardware, so the processing time can be shortened. Writing the transfer-related software also becomes easier.

Let $T_{\mathrm{conv}}$ be the processing time of the conventional device (with the allocation of FIG. 5(B)). It can be expressed as

$$T_{\mathrm{conv}} = T_{\mathrm{init}} + T_{\mathrm{prog}} + k\left(T_{\mathrm{xfer}} + T_{\mathrm{judge1}} + T_{\alpha} + T_{u} + T_{\mathrm{judge2}}\right)$$

where $T_{\mathrm{init}}$ is the initial-setup time, $T_{\mathrm{prog}}$ is the program-setup time, $k$ is the number of iterations, $T_{\mathrm{xfer}}$ is the inter-PE transfer time, $T_{\mathrm{judge1}}$ is the time to decide whether to use the transfer data, $T_{\alpha}$ is the time to substitute the boundary value, $T_{u}$ is the time to compute $u_{i,j}$, and $T_{\mathrm{judge2}}$ is the time to test the iteration condition.

The processing time $T_{\mathrm{inv}}$ of the present invention, on the other hand, is

$$T_{\mathrm{inv}} = T_{\mathrm{init}} + T_{\mathrm{ctl}} + T_{\mathrm{prog}} + k\left(T_{\mathrm{xfer}} + T_{u} + T_{\mathrm{judge2}}\right)$$

where $T_{\mathrm{ctl}}$ is the time to set the control information. Even as the number of PEs grows there are at most nine control patterns, so the control information can be set by broadcasting each pattern to the relevant PEs, and $T_{\mathrm{ctl}}$ is negligible when $k$ is large. Compared with the conventional PE, the PE of the present invention therefore shortens the processing time by approximately $k\left(T_{\alpha} + T_{\mathrm{judge1}}\right)$.
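The two timing expressions can be compared numerically. The sketch below simply evaluates them; the function names and every time value in the example are made-up placeholders in arbitrary units, not measurements from the patent.

```python
def t_conventional(t_init, t_prog, k, t_xfer, t_judge1, t_alpha, t_u, t_judge2):
    """Processing time of the conventional device with the FIG. 5(B) allocation."""
    return t_init + t_prog + k * (t_xfer + t_judge1 + t_alpha + t_u + t_judge2)


def t_invention(t_init, t_ctl, t_prog, k, t_xfer, t_u, t_judge2):
    """Processing time with the boundary registers, control register, and selectors."""
    return t_init + t_ctl + t_prog + k * (t_xfer + t_u + t_judge2)


# Hypothetical values in arbitrary time units, for illustration only.
k = 100
conv = t_conventional(t_init=10, t_prog=5, k=k, t_xfer=4,
                      t_judge1=1, t_alpha=1, t_u=8, t_judge2=2)
inv = t_invention(t_init=10, t_ctl=3, t_prog=5, k=k, t_xfer=4, t_u=8, t_judge2=2)

# The exact saving is k * (t_judge1 + t_alpha) - t_ctl.
print(conv, inv, conv - inv)
```

The one-time cost of setting the control information is subtracted from the saving once, while the per-iteration saving is multiplied by `k`, which is why the setup cost becomes negligible for large iteration counts.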

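The embodiment has been described using a grid network as an example, but the invention is also effective for other network forms, such as a cube or a tree, when a networked device is applied to a problem domain that involves boundary conditions. The method of the present invention is also effective for boundaries of various shapes (for example, convex or concave). Furthermore, even when several grid points rather than one are allocated to each PE, the method can be applied by increasing the capacity of the registers that hold boundary values and the number of bits in the control register.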
However, even if the network is in the form of a cube or tree, it is effective when applying a network-like device to a problem field that deals with boundary conditions. Furthermore, the method of the present invention is also effective even if the boundaries have various shapes (for example, convex, concave, etc.). Furthermore, even when multiple grids are allocated to the IPH instead of one grid, the method of the present invention can be applied by increasing the register capacity for holding boundary values and the number of bits of the control register. .

[Effect of the invention]

As explained above, the present invention has the advantage that, by adding simple hardware to a conventional PE, an allocation method can be used that makes efficient use of the PEs and gives fast data processing after inter-PE transfer.

[Brief explanation of drawings]

FIG. 1 is a PE configuration diagram showing an embodiment of the present invention. FIG. 2 shows the processing flow of FIG. 1. FIG. 3 shows an example of a conventional PE configuration. FIG. 4 shows an example of a device in which PEs are arranged in a grid. FIGS. 5(A) and 5(B) show methods of allocating grid points to PEs. FIG. 6 shows the processing flow of the conventional PE configuration.

5: control unit (CU); 6-21: processor elements (PEs); 77: PE of the present invention; 78-81: boundary-value holding registers; 82-85: selectors; 86: control register; 100: conventional PE.

Claims

[Claims]

(1) A parallel processing method for a parallel processing system comprising a plurality of processor elements connected in a network and a control unit that controls each processor element, each processor element having a program memory that stores a program to be executed, a data memory that stores data, an arithmetic unit that executes its own program, and a transfer unit that performs transfers between processor elements, characterized in that each processor element is provided with a register that holds a boundary value, a control register that indicates whether or not the boundary value is required after a transfer between processor elements, and a selector that selects between the boundary value and inter-PE transfer data.