JPH0438017B2

Info

Publication number: JPH0438017B2
Application number: JP59270898A
Authority: JP
Inventors: Fumio Takahasi; Yukio Nagaoka; Iwao Harada
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1984-12-24
Filing date: 1984-12-24
Publication date: 1992-06-23
Also published as: JPS61148564A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Application of the Invention]

The present invention relates to a parallel processing computer. In particular, it relates to a multiple instruction stream, multiple data stream (MIMD) parallel processing computer suited to obtaining numerical solutions of partial differential equations at high speed by parallel processing.

[Background of the invention]

Parallel processing computers, in which multiple processors work in parallel, have been developed to obtain numerical solutions quickly in scientific and engineering computation, particularly for partial differential equations. A representative example is described in the paper "PACS: A Parallel Microprocessor Array for Scientific Calculations", ACM Transactions on Computer Systems, Vol. 1, No. 3, August 1983, pp. 195-221.

This computer is of the so-called nearest-neighbor connection type, in which adjacent processors are connected to form a one- or two-dimensional array processor. The connections between processors are simple, but data transfer between distant processors takes a long time.

Furthermore, because the processor arrangement is fixed in hardware, computational efficiency drops for some problems, as the following examples show.

(1) When a one-dimensional problem (for example, $\partial\phi/\partial t = \partial^2\phi/\partial x^2$) is processed on a two-dimensional processor array, it is common either to use the processors of only one row or one column, as shown in Fig. 2a, or to allocate the computational grid points as shown in Fig. 2b. In the former case, each processor handles more grid points, so the computation takes longer. In the latter case, the data transfer direction depends on the location of the processor, so extra processing time is needed to determine it.

(2) For a two-dimensional computation (for example, solving $\partial\phi/\partial t = \partial^2\phi/\partial x^2 + \partial^2\phi/\partial y^2$ by an explicit finite-difference method), a one-dimensional processor array must transfer more data than a two-dimensional array, so processing takes longer. For example, when a 16 × 16 grid is processed either by 16 processors in a one-dimensional array or by a 4 × 4 two-dimensional array, each processor handles 16 grid points in both cases, but the number of values each processor transfers to its adjacent processors is 16 for the two-dimensional array and 32 for the one-dimensional array.

As described above, even for the same problem, the processor arrangement can increase either the amount of data transferred or the amount of processing per processor. A computer with a fixed processor arrangement therefore loses efficiency on such problems.

[Purpose of the invention]

An object of the present invention is to provide a parallel processing computer that can form the processor arrangement best suited to the problem being processed, so that parallel computation is executed with high efficiency.

[Summary of the invention]

A feature of the present invention is that a parallel processing computer, in which a plurality of processors are arranged in a two-dimensional array and connected to one another along each row and each column of that array, is provided with a bus connection mechanism capable of connecting the processors of a plurality of rows (or columns) in series, row by row (or column by column).

The present invention is based on the following considerations.

When a partial differential equation is solved numerically by parallel processing on multiple processors, data must be transferred between the processors. Reducing the time spent on data transfer therefore reduces the overall computation time. Data transfer time can be reduced either by raising the transfer speed or by reducing the amount of data transferred; the present invention focuses on the latter.

Fig. 3 shows the number of values transferred between processors, for various processor arrays and computational grids, when the two-dimensional Poisson equation $\partial\phi/\partial t = \partial^2\phi/\partial x^2 + \partial^2\phi/\partial y^2$ ($t$: time; $x$, $y$: position in the row and column directions; $\phi$: the unknown) is solved by an explicit finite-difference method. In the entries shown in parentheses, the number of processor columns exceeds the number of grid columns, so processors in the same row may be responsible for grid points in different rows; extra processing is then needed to determine the destination of each transfer. In other words, Fig. 3 shows that, even for the same problem, the processor arrangement changes the amount of data transferred and can add processing beyond the computation itself, and that efficiency improves as $M/N$ approaches $m/n$ ($M$, $N$: numbers of processor rows and columns; $m$, $n$: numbers of grid rows and columns).
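The trend summarized for Fig. 3 can be illustrated with a small cost model. The Python sketch below is not from the patent: it assumes the grid divides evenly into equal blocks and counts, for an interior processor, one block edge sent to each neighbour that exists along each axis. The helper names `transfers_per_processor` and `best_shapes` are hypothetical. Under these assumptions it reproduces the 32 versus 16 values of background example (2), and for a 16 × 256 grid on 256 processors the cheapest shape is the one with M/N = m/n.

```python
"""Sketch: per-processor transfer counts for an explicit 5-point stencil.

Assumptions (not from the patent): the m x n grid divides evenly into
(m/M) x (n/N) blocks, and an interior processor sends one block edge to
each neighbour that exists along each axis.
"""


def transfers_per_processor(m: int, n: int, M: int, N: int) -> int:
    """Edge values an interior processor sends per time step."""
    if m % M or n % N:
        raise ValueError("grid must divide evenly into processor blocks")
    rows, cols = m // M, n // N  # block owned by each processor
    total = 0
    if N > 1:  # left and right neighbours exist
        total += 2 * rows
    if M > 1:  # neighbours above and below exist
        total += 2 * cols
    return total


def best_shapes(m: int, n: int, processors: int) -> list[tuple[int, int, int]]:
    """All evenly dividing (M, N, transfers) shapes, cheapest first."""
    shapes = []
    for M in range(1, processors + 1):
        if processors % M:
            continue
        N = processors // M
        if m % M == 0 and n % N == 0:
            shapes.append((M, N, transfers_per_processor(m, n, M, N)))
    return sorted(shapes, key=lambda s: s[2])


if __name__ == "__main__":
    # Background example (2): 16 x 16 grid on 16 processors.
    print(transfers_per_processor(16, 16, 16, 1))  # 1-D array: 32
    print(transfers_per_processor(16, 16, 4, 4))   # 2-D array: 16
    # 16 x 256 grid on 256 processors, as in the embodiment.
    for M, N, t in best_shapes(16, 256, 256):
        print(f"{M:3d} x {N:3d}: {t}")
```

The model ignores the parenthesized cases of Fig. 3, in which one grid row is split across several processor rows.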

Accordingly, in the present invention, a two-dimensional processor array is given a function for connecting processors in different rows or columns one-dimensionally, so that the processor arrangement can be changed to minimize the number of values transferred between processors for the grid of the problem at hand.

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to the drawings.

Fig. 1 is a block diagram showing the configuration of a parallel processing computer according to the present invention. In Fig. 1, $P_{i,j}$ ($i = 1, \dots, M$; $j = 1, \dots, N$) are processors, $S_i$ ($i = 1, \dots, M-1$) are bus connection mechanisms, and CU is a control processor. The processors $P_{i,j}$ are arranged two-dimensionally, and adjacent processors along each row and each column are connected by row-direction and column-direction data transfer buses, respectively. Bus connection mechanism $S_i$ sits between the processors of row $i$ and those of row $i+1$, and is connected by data transfer buses to the last processor of row $i$, $P_{i,N}$, and the first processor of row $i+1$, $P_{i+1,1}$. Adjacent bus connection mechanisms ($S_i$ and $S_{i-1}$, and $S_i$ and $S_{i+1}$) are also connected to each other by data transfer buses.

Bus connection mechanism $S_i$ connects and disconnects its four bus input/output ports, port1 to port4; its block diagram is shown in Fig. 4. In Fig. 4, 11 is a gate control circuit and 21 to 24 are gate circuits. Each of gate circuits 21 to 24 electrically connects its input and output lines when its gate control signal (gc1 or gc2) is ON, and electrically isolates them when the signal is OFF. In response to the bus connection control signal $bc_i$ from the control processor CU, gate control circuit 11 turns ON one of the gate control signals gc1 and gc2.

When gate control signal gc1 is ON, gate circuits 21 and 22 conduct, so port1 is connected to port3 and port2 to port4. Similarly, when gate control signal gc2 is ON, gate circuits 23 and 24 conduct, so port1 is connected to port2 and port3 to port4.
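The port pairings of $S_i$ can be written down as a small software model. This is a hedged sketch for illustration only: gates and ports are plain Python values, `connected_ports` is a hypothetical name, and rejecting the case where gc1 and gc2 are both ON or both OFF is an assumption based on gate control circuit 11 turning ON one of the two signals.

```python
"""Sketch of the port switching in bus connection mechanism S_i (Fig. 4).

Pairings follow the text: gc1 ON -> gates 21, 22 conduct (port1-port3,
port2-port4); gc2 ON -> gates 23, 24 conduct (port1-port2, port3-port4).
Assumption: one and only one of gc1/gc2 is ON at a time.
"""

GATE_PAIRS = {
    21: ("port1", "port3"),
    22: ("port2", "port4"),
    23: ("port1", "port2"),
    24: ("port3", "port4"),
}


def conducting_gates(gc1: bool, gc2: bool) -> list[int]:
    """Gate circuits that conduct for the given gate control signals."""
    if gc1 == gc2:
        raise ValueError("assumed: gate control circuit 11 turns ON one signal")
    return [21, 22] if gc1 else [23, 24]


def connected_ports(gc1: bool, gc2: bool) -> set[frozenset[str]]:
    """Port pairs that are electrically connected for the given signals."""
    return {frozenset(GATE_PAIRS[g]) for g in conducting_gates(gc1, gc2)}


if __name__ == "__main__":
    # port1: last processor of row i; port2: first processor of row i+1;
    # port3/port4: neighbouring mechanisms S_{i-1} and S_{i+1}.
    joined = frozenset({"port1", "port2"}) in connected_ports(False, True)
    print("rows i and i+1 joined:", joined)
```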

As Fig. 1 shows, port1 of bus connection mechanism $S_i$ is connected to the data transfer bus of the last-column processor of row $i$, and port2 to that of the first-column processor of row $i+1$; port3 and port4 are connected to port4 of the adjacent mechanism $S_{i-1}$ and to port3 of $S_{i+1}$, respectively. Consequently, when gate control signal gc1 is ON, the processors of rows $i$ and $i+1$ are disconnected from each other; when gc2 is ON, the processors of row $i+1$ are connected after the last-column processor of row $i$, and rows $i$ and $i+1$ form a single row.

Therefore, if gate control signal gc1 is turned ON in all the bus connection mechanisms $S_1$ to $S_{M-1}$, no rows are connected and the processors form a two-dimensional array of $M$ rows and $N$ columns; conversely, if gc2 is ON in all of them, all rows are connected and a one-dimensional array of $M \times N$ processors is formed. Moreover, by connecting or disconnecting each bus connection mechanism $S_i$ individually, a variety of processor arrays can be formed.

Fig. 5 shows examples of processor array configurations. In this example, the basic configuration is a two-dimensional array of 16 processors in 4 rows and 4 columns, with three bus connection mechanisms $S_1$ to $S_3$ placed between the rows. If a bus connection control signal $bc_i$ of "0" given to $S_i$ disconnects the adjacent rows and a "1" connects them, then the combination of signals $bc_i$ given to the mechanisms $S_i$ can form processor arrays of square, rectangular, and complex uneven shapes, as shown in Fig. 5.
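The way the control signals $bc_i$ partition the physical rows into logical rows can be sketched as follows. The sketch assumes the convention just stated ($bc_i$ = 1 joins row $i$ to row $i+1$) and lists each logical row's processors in the order the data transfer bus visits them; `logical_rows` and `shape` are hypothetical helpers using 1-based indices.

```python
"""Sketch: logical processor array formed by the bus connection mechanisms.

bc[i-1] is the control value for S_i (1 = connect row i to row i+1,
0 = disconnect). Each logical row is a list of physical (row, col)
indices, 1-based, in the order the data transfer bus visits them.
"""


def logical_rows(M: int, N: int, bc: list[int]) -> list[list[tuple[int, int]]]:
    if len(bc) != M - 1:
        raise ValueError("need one control bit per mechanism S_1..S_{M-1}")
    rows: list[list[tuple[int, int]]] = [[]]
    for i in range(1, M + 1):
        rows[-1].extend((i, j) for j in range(1, N + 1))
        # After physical row i, S_i either continues the logical row or ends it.
        if i < M and bc[i - 1] == 0:
            rows.append([])
    return rows


def shape(rows: list[list[tuple[int, int]]]) -> list[int]:
    """Length of each logical row (equal lengths mean a rectangle)."""
    return [len(r) for r in rows]


if __name__ == "__main__":
    print(shape(logical_rows(4, 4, [0, 0, 0])))  # all gc1: [4, 4, 4, 4]
    print(shape(logical_rows(4, 4, [1, 1, 1])))  # all gc2: [16]
    print(shape(logical_rows(4, 4, [1, 0, 1])))  # 2 rows x 8 columns: [8, 8]
    print(shape(logical_rows(4, 4, [1, 1, 0])))  # uneven: [12, 4]
```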

However, the column-direction data transfer buses always remain connected as in the basic configuration, $P_{1,j} \to P_{2,j} \to \cdots \to P_{M,j} \to P_{1,j}$, so changing the processor arrangement lengthens the column-direction transfer distance. For example, in the 2-row, 8-column configuration shown in Fig. 5f, the processor in the next row and the same column as $P_{1,1}$ is $P_{3,1}$, but the actual data transfer follows the route $P_{1,1} \to P_{2,1} \to P_{3,1}$. Nevertheless, if the system supports easy, fast data transfer between distant processors, this causes almost no problem in programming or in execution time, and such transfers can be treated essentially the same as transfers between adjacent processors.
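The longer column-direction route can be traced with the sketch below. It assumes, as in the text, that the column buses keep their fixed ring and that each logical row is built from consecutive physical rows; `physical_position` and `column_route` are hypothetical helpers using 1-based indices. For the 2-row, 8-column configuration of Fig. 5f it recovers the route from $P_{1,1}$ to $P_{3,1}$ through $P_{2,1}$.

```python
"""Sketch: physical route of a column-direction transfer after re-arrangement.

Assumptions: a logical row is built from `rows_per_group` consecutive
physical rows, and the column buses keep their fixed ring
P_{1,j} -> P_{2,j} -> ... -> P_{M,j} -> P_{1,j}.
"""


def physical_position(logical_row: int, logical_col: int,
                      rows_per_group: int, N: int) -> tuple[int, int]:
    """Map a 1-based logical (row, col) to the physical (i, j)."""
    offset, j = divmod(logical_col - 1, N)
    i = (logical_row - 1) * rows_per_group + offset + 1
    return i, j + 1


def column_route(src: tuple[int, int], dst: tuple[int, int],
                 M: int) -> list[tuple[int, int]]:
    """Processors visited going down the fixed column ring from src to dst."""
    (i, j), (di, dj) = src, dst
    if j != dj:
        raise ValueError("column buses only connect processors in one column")
    route = [src]
    while i != di:
        i = i % M + 1  # next processor on the ring, wrapping M -> 1
        route.append((i, j))
    return route


if __name__ == "__main__":
    # Fig. 5f: 4 x 4 processors joined two rows at a time -> 2 rows x 8 columns.
    a = physical_position(1, 1, rows_per_group=2, N=4)  # (1, 1)
    b = physical_position(2, 1, rows_per_group=2, N=4)  # (3, 1)
    print(column_route(a, b, M=4))  # [(1, 1), (2, 1), (3, 1)]
```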

Fig. 6 shows a concrete logic-circuit implementation of the bus connection mechanism $S_i$ of Fig. 4. In Fig. 6, the bus connection control signal bc consists of a connection signal cnc, which specifies connection ("1") or disconnection ("0") between rows, and a trigger signal trg, which makes the connection or disconnection take effect. FF is a D-type flip-flop, and G1 to G4 are logic elements commonly called three-state buffers. The gate control circuit 11 of Fig. 4 can thus be realized very simply with a single D-type flip-flop, and gate circuits 21 to 24 with the three-state buffer sets G1 to G4, respectively.
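The behaviour of the Fig. 6 circuit can be approximated in software. The text does not give the exact wiring, so this sketch assumes that FF latches cnc on a rising edge of trg, that its output Q drives gc2 (connect) and its complement drives gc1 (disconnect), and that the flip-flop starts in the disconnected state. The class name `BusConnectionControl` is hypothetical, and the model is behavioural, not a netlist.

```python
"""Behavioural sketch of the Fig. 6 control logic (not a netlist).

Assumptions (wiring not given in the text): the D flip-flop FF samples
cnc on a rising edge of trg; Q drives gc2 (connect) and its complement
drives gc1 (disconnect), so exactly one of them is ON.
"""
from dataclasses import dataclass


@dataclass
class BusConnectionControl:
    q: int = 0          # FF state; 0 = rows disconnected at start (assumed)
    _last_trg: int = 0  # previous trg level, for edge detection

    def clock(self, cnc: int, trg: int) -> None:
        """Apply one sample of (cnc, trg); latch cnc on trg 0 -> 1."""
        if trg == 1 and self._last_trg == 0:
            self.q = cnc
        self._last_trg = trg

    @property
    def gc1(self) -> bool:  # enables buffers for port1-port3, port2-port4
        return self.q == 0

    @property
    def gc2(self) -> bool:  # enables buffers for port1-port2, port3-port4
        return self.q == 1


if __name__ == "__main__":
    ctl = BusConnectionControl()
    ctl.clock(cnc=1, trg=0)  # cnc alone changes nothing
    print(ctl.gc1, ctl.gc2)  # True False
    ctl.clock(cnc=1, trg=1)  # trigger edge latches "connect"
    print(ctl.gc1, ctl.gc2)  # False True
    ctl.clock(cnc=0, trg=1)  # no new edge: state held
    print(ctl.gc1, ctl.gc2)  # False True
```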

As described above, the processor array of the parallel processing computer of the present invention is variable, so the most efficient processor array can be formed to match the shape of the grid of the problem being processed.

Although the embodiment above provides the bus connection mechanisms between rows, the same function is clearly obtained by providing them between columns. It is also possible to provide bus connection mechanisms for both rows and columns and to use whichever suits the problem.

The control processor CU supplies a bus connection control signal $bc_i$ to each bus connection mechanism $S_i$. The simplest implementation provides a switch for each bus connection control signal $bc_i$ and controls each bus connection mechanism $S_i$ manually. Alternatively, a processor and a digital signal output circuit can be used: the processor determines the optimal processor array from the computational grid, and the result is applied as the bus connection control signals $bc_i$ to the bus connection mechanisms $S_i$ through the digital signal output circuit.

Fig. 7 shows one example of a method for determining the optimal processor array from the computational grid. In this example, the numbers of grid rows and columns are $m$ and $n$, the numbers of processor rows and columns are $M$ and $N$, and the number of grid points is at least the number of processors. The procedure is as follows.

(1) As initial values, compute the column-to-row ratio of the computational grid, $a = n/m$, and that of the processor array, $\alpha = N/M$, and set the number of arrangement changes $I = 0$.

(2) Compare the grid ratio $a$ with the processor ratio $\alpha$. If $a > \alpha$, increment the number of arrangement changes $I$ to change the processor array, and update the processor ratio $\alpha$ accordingly.

(3) Repeat (2) until $a = \alpha$ or $a < \alpha$; stop early if $\alpha$ reaches or exceeds the number of processors.

(4) Create the bus connection control signals from the number of arrangement changes $I$ obtained above. Consider an $(M-1)$-bit binary value BC whose bits, from the least significant upward, are the bus connection control signals for $S_1, S_2, \dots, S_{M-1}$; a "1" connects the corresponding row to the next row.

The binary value BC is formed by setting $K = 2^I$, setting bits $i \cdot K$ ($i = 1, 2, \dots$), counted from the least significant bit, to "0", and setting all other bits to "1".
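A minimal sketch of the Fig. 7 procedure and the BC construction follows. The text does not say how $\alpha$ changes on each arrangement change; this sketch assumes, from the worked example in which 16 × 16 processors become 4 × 64 after $I = 2$ changes, that each change halves the processor rows and doubles the columns, multiplying $\alpha$ by 4. It also assumes $M$ is a power of two. The function names are hypothetical.

```python
"""Sketch of the Fig. 7 procedure and the BC bit pattern.

Assumption (inferred from the worked example, where 16 x 16 processors
become 4 x 64 after I = 2 changes): each arrangement change halves the
number of processor rows and doubles the columns, so alpha grows by 4.
"""


def arrangement_changes(m: int, n: int, M: int, N: int) -> int:
    """Steps (1)-(3): number of arrangement changes I."""
    a = n / m           # column-to-row ratio of the grid
    alpha = N / M       # column-to-row ratio of the processor array
    processors = M * N
    I = 0
    while a > alpha and alpha < processors:
        I += 1
        alpha *= 4      # assumed: rows / 2, columns * 2
    return I


def bus_control_word(I: int, M: int) -> int:
    """Step (4): (M-1)-bit BC; bit p (p = 1 is the LSB) drives S_p, 1 = connect."""
    K = 2 ** I
    bc = 0
    for p in range(1, M):
        if p % K != 0:  # bits K, 2K, 3K, ... stay 0 (disconnect)
            bc |= 1 << (p - 1)
    return bc


if __name__ == "__main__":
    I = arrangement_changes(m=16, n=256, M=16, N=16)
    bc = bus_control_word(I, M=16)
    print(I, format(bc, "015b"))  # 2 111011101110111 (leftmost bit is S_15)
    print(format(bus_control_word(4, M=16), "015b"))  # 111111111111111
```

With $I = 2$ and $M = 16$, bits 4, 8 and 12 are 0, so $S_4$, $S_8$ and $S_{12}$ disconnect and the 16 physical rows are joined four at a time into 4 rows of 64 processors; $I = 4$ sets every bit and yields a 1 × 256 array.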

Next, examples of computation on the parallel processing computer of the present invention are shown.

(1) Two-dimensional problem

Consider solving the two-dimensional Poisson equation $\partial\phi/\partial t = \partial^2\phi/\partial x^2 + \partial^2\phi/\partial y^2$ ($t$: time; $x$, $y$: position in the row and column directions; $\phi$: the unknown) by the finite-difference method. That is, using the explicit scheme

$$
\phi^{(k+1)}_{i,j} = \lambda\phi^{(k)}_{i-1,j} + (1 - 4\lambda)\phi^{(k)}_{i,j} + \lambda\phi^{(k)}_{i+1,j} + \lambda\phi^{(k)}_{i,j-1} + \lambda\phi^{(k)}_{i,j+1}
$$

($\lambda = \Delta t/\Delta x^2 = \Delta t/\Delta y^2$; $\Delta t$: time step size; $\Delta x$, $\Delta y$: grid spacing; $i$, $j$: subscripts giving the position in the two-dimensional grid; $k$: time step) with the boundary conditions

$$
\phi^{(k)}_{0,j} = C^{(k)}_{0,j}, \quad \phi^{(k)}_{m+1,j} = C^{(k)}_{m+1,j}, \quad \phi^{(k)}_{i,0} = C^{(k)}_{i,0}, \quad \phi^{(k)}_{i,n+1} = C^{(k)}_{i,n+1},
$$

compute $\phi_{i,j}$ at each grid point $(i, j)$.
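For reference, the explicit scheme above can be written as a serial Python sketch. It illustrates the formula only and is not the patent's program: the boundary values $C$ are held fixed in time for simplicity (the text allows them to vary with $k$), and the function name `explicit_step` is hypothetical. The usual stability limit for this five-point explicit scheme with $\Delta x = \Delta y$ is $\lambda \le 1/4$.

```python
"""Serial reference sketch of one explicit time step.

phi is an (m+2) x (n+2) list of lists: rows 0 and m+1 and columns 0 and
n+1 hold the boundary values C (assumed constant in time here); interior
points are updated with
    phi_new = lam * (up + down + left + right) + (1 - 4 * lam) * phi.
"""


def explicit_step(phi: list[list[float]], lam: float) -> list[list[float]]:
    m, n = len(phi) - 2, len(phi[0]) - 2
    new = [row[:] for row in phi]  # boundary rows/columns copied unchanged
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            new[i][j] = (lam * (phi[i - 1][j] + phi[i + 1][j]
                                + phi[i][j - 1] + phi[i][j + 1])
                         + (1 - 4 * lam) * phi[i][j])
    return new


if __name__ == "__main__":
    # 3 x 3 interior, top boundary held at C = 1, all other values 0.
    m, n, lam = 3, 3, 0.25
    phi = [[0.0] * (n + 2) for _ in range(m + 2)]
    phi[0] = [1.0] * (n + 2)
    for k in range(200):
        phi = explicit_step(phi, lam)
    print([round(v, 3) for v in phi[1][1:-1]])
```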

Here, the number of processors is 256, with a basic configuration of $M \times N$ = 16 rows × 16 columns, and the computational grid has 16 rows and 256 columns. The computation proceeds as follows.

(i) Before computing $\phi_{i,j}$, the control processor CU changes the processor array using the procedure described above. The grid ratio is 256/16 = 16 and the processor ratio is 1, so by that procedure the number of arrangement changes is 2, and the processor array becomes 4 rows × 64 columns. Each processor is then responsible for a 4 × 4 block of grid points.

(ii) Each processor sends the values on the edges of its 4 × 4 block, 16 values in total, to its adjacent processors. Writing the local grid subscripts as $i'$, $j'$, it sends $\phi^{(k)}_{i',1}$ ($i' = 1, \dots, 4$) to the left processor, $\phi^{(k)}_{i',4}$ ($i' = 1, \dots, 4$) to the right, $\phi^{(k)}_{1,j'}$ ($j' = 1, \dots, 4$) to the processor above, and $\phi^{(k)}_{4,j'}$ ($j' = 1, \dots, 4$) to the processor below.

(iii) Each processor computes $\phi^{(k+1)}_{i',j'}$ from its own values $\phi^{(k)}_{i',j'}$ ($i', j' = 1, \dots, 4$) and the data received from the processors above, below, left, and right.

(iv) Increment $k$ and repeat (ii) and (iii).

If this problem were run without changing the processor array, each processor would compute a block of 1 row × 16 columns of grid points. The number of values each processor transfers would then be 1 × 2 + 16 × 2 = 34, more than twice the 16 required with the present invention.
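Steps (i) and (ii) can be sketched as follows: each logical processor owns a contiguous block of the grid, and in step (ii) it sends its edge rows and columns to the neighbours that exist. Indices are 0-based here, unlike the 1-based $i'$, $j'$ of the text, and `owned_block`, `outgoing_edges` and `values_sent` are hypothetical names. For an interior processor the counts reproduce the 16 values with the rearranged 4 × 64 array and the 34 values with the unchanged 16 × 16 array.

```python
"""Sketch of steps (i)-(ii): block ownership and the edge values sent.

Assumptions: the m x n grid divides evenly over an M x N logical array;
indices are 0-based here (the text uses 1-based i', j').
"""


def owned_block(p_row: int, p_col: int, m: int, n: int, M: int, N: int):
    """Global (row, col) ranges owned by logical processor (p_row, p_col)."""
    brows, bcols = m // M, n // N
    return (range(p_row * brows, (p_row + 1) * brows),
            range(p_col * bcols, (p_col + 1) * bcols))


def outgoing_edges(block: list[list[float]], p_row: int, p_col: int,
                   M: int, N: int) -> dict[str, list[float]]:
    """Edge values a processor sends in step (ii), keyed by destination."""
    out = {}
    if p_col > 0:
        out["left"] = [r[0] for r in block]    # first column of the block
    if p_col < N - 1:
        out["right"] = [r[-1] for r in block]  # last column of the block
    if p_row > 0:
        out["up"] = list(block[0])             # first row of the block
    if p_row < M - 1:
        out["down"] = list(block[-1])          # last row of the block
    return out


def values_sent(m: int, n: int, M: int, N: int) -> int:
    """Values sent by interior processor (1, 1) per time step (M, N >= 3)."""
    rows, cols = owned_block(1, 1, m, n, M, N)
    block = [[0.0] * len(cols) for _ in rows]
    return sum(len(v) for v in outgoing_edges(block, 1, 1, M, N).values())


if __name__ == "__main__":
    print(owned_block(0, 1, 16, 256, 4, 64))  # (range(0, 4), range(4, 8))
    print(values_sent(16, 256, 4, 64))        # rearranged 4 x 64: 16
    print(values_sent(16, 256, 16, 16))       # unchanged 16 x 16: 34
```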

(2) One-dimensional problem

Consider the one-dimensional version of the same equation, $\partial\phi/\partial t = \partial^2\phi/\partial x^2$. There are 256 processors in 16 rows × 16 columns, and the number of grid points is also 256, equal to the number of processors.

The computation proceeds exactly as in the two-dimensional problem:

(i) Change the processor array to 1 row × 256 columns. Each processor is responsible for one grid point.

(ii) Each processor sends its value $\phi^{(k)}$ to the processors on its left and right.

(iii) Each processor computes $\phi^{(k+1)}$ from its own value $\phi^{(k)}$ and the values received from the left and right.

(iv) Increment $k$ and repeat (ii) and (iii).

If this problem is computed without changing the processor array, grid points are assigned to processors as shown in Fig. 2. If only one row of processors is used, as in Fig. 2a, each processor computes 16 grid points, so the computation clearly takes much longer than with the method of the present invention. If grid points are assigned as in Fig. 2b, each processor handles one grid point, as in the present invention, but the data transfer directions differ between the middle and the ends of each row: processors in the middle send data to their left and right neighbors, whereas the processors at the left and right ends send data left or right and up or down, depending on their position. Each processor must therefore determine its own position during data transfer and choose the transfer directions accordingly. This extra processing makes the computation slower than with the method of the present invention.
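The extra position test that the Fig. 2b assignment requires can be illustrated as below. Fig. 2 is not reproduced here, so the sketch assumes that Fig. 2b lays the 256 grid points out in serpentine (boustrophedon) order over the fixed 16 × 16 array; the function names are hypothetical. Middle processors send only left and right, while row-end processors must work out a vertical direction; on the 1 × 256 array of the invention every processor simply sends left and right.

```python
"""Sketch of the per-processor direction test needed with Fig. 2b.

Assumption: Fig. 2b is taken to be a serpentine (boustrophedon) mapping
of the 1-D grid onto the fixed M x N array. Indices are 0-based.
"""


def position(g: int, N: int) -> tuple[int, int]:
    """Physical (row, col) of 1-D grid point g in serpentine order."""
    r, c = divmod(g, N)
    return r, (c if r % 2 == 0 else N - 1 - c)


def direction(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Bus direction from processor a to an adjacent processor b."""
    (ra, ca), (rb, cb) = a, b
    if ra == rb:
        return "right" if cb > ca else "left"
    return "down" if rb > ra else "up"


def transfer_directions(g: int, M: int, N: int) -> list[str]:
    """Directions the processor holding point g must send to."""
    here = position(g, N)
    return [direction(here, position(h, N))
            for h in (g - 1, g + 1) if 0 <= h < M * N]


if __name__ == "__main__":
    M = N = 16
    print(transfer_directions(5, M, N))   # middle of row 0: ['left', 'right']
    print(transfer_directions(15, M, N))  # right end of row 0: ['left', 'down']
    print(transfer_directions(16, M, N))  # right end of row 1: ['up', 'left']
```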

In the embodiment above, a plurality of bus connection mechanisms is provided, but by increasing the number of switching circuits in a bus connection mechanism, a single mechanism can also connect the processors of any number of rows or columns one-dimensionally.

[Effect of the invention]

As described above, the present invention allows the processor arrangement to be changed to match the grid shape of the problem, minimizing the number of values transferred between processors. This shortens the time required for data transfer and thus improves computational efficiency.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing the configuration of the parallel processing computer of the present invention; Fig. 2 shows examples of grid point assignment when a one-dimensional problem is processed on a conventional parallel processing computer; Fig. 3 is a table showing how the number of transferred values depends on the processor array and the computational grid; Fig. 4 is a block diagram showing the configuration of the bus connection mechanism of Fig. 1; Fig. 5 shows examples of processor array configurations; Fig. 6 shows an example logic circuit for the bus connection mechanism; and Fig. 7 is a flowchart showing the procedure for computing the processor array.

$P_{i,j}$: processor; $S_i$: bus connection mechanism; CU: control processor; 11: gate control circuit; 21 to 24: gate circuits; $bc_i$: bus connection control signal; gc1, gc2: gate control signals.

Claims

[Scope of Claims]

1. A parallel processing computer in which a plurality of processors are arranged in a two-dimensional array and the processors are connected to one another along each row and each column of the two-dimensional array, characterized by a bus connection mechanism having a function of connecting the processors of a plurality of rows or columns in series, row by row or column by column.

2. The parallel processing computer according to claim 1, wherein a plurality of the bus connection mechanisms are provided, one for each row or column, and each bus connection mechanism comprises four switching circuits whose input/output terminals are connected to a processor bus or to another bus connection mechanism, and a control circuit that controls the opening and closing of each switching circuit.

3. The parallel processing computer according to claim 2, wherein the control circuit in each bus connection mechanism opens and closes the switching circuits simultaneously in response to a bus connection control signal from the control processor.