JPH04247570A

JPH04247570A - Parallel processing arithmetic device for simultaneous linear equations

Info

Publication number: JPH04247570A
Application number: JP3364891A
Authority: JP
Inventors: Hiroo Kaneko; 金子　博夫; Koji Ueyama; 植山　高次
Original assignee: Nippon Steel Corp
Current assignee: Nippon Steel Corp
Priority date: 1991-02-01
Filing date: 1991-02-01
Publication date: 1992-09-03

Abstract

PURPOSE: To execute the entire processing efficiently and at high speed by using a secondary storage device to store the coefficient matrix of simultaneous linear equations and transferring data between the secondary storage device and the memory of a processor group while the processor group executes calculations in parallel. CONSTITUTION: The first H rows of the coefficient matrix are transmitted from a secondary storage device 3 through a main processor 1 to a processor group 2. When the number of processors is n, each processor takes charge of H/n rows. On receiving the data, the processor group 2 starts calculation. Once the first row has been calculated, the processor in charge of that row calculates its next assigned row and simultaneously transmits the data of the first row, for which LU decomposition is complete, through a link 5 and the main processor 1 to the secondary storage device 3. The main processor 1 receives the data of the (H+1)th row from the secondary storage device 3 and transmits it to the processor group 2, and the LU decomposition proceeds.

Description

[Detailed description of the invention]

[0001]

[Industrial Application Field] The present invention relates to a parallel processing arithmetic device that uses parallel processing to solve, at high speed and with a simple configuration, the large-scale simultaneous linear equations that arise in the finite element method and the finite difference method.

[0002]

[Prior Art] When the simultaneous linear equations of the finite element method or the finite difference method are solved on a parallel processing arithmetic device, the conjugate gradient method has often been used as the solution algorithm (for example, Japanese Patent Laid-Open Nos. 63-95568 and 63-127366). This is because the conjugate gradient method needs only the non-zero elements of the coefficient matrix, so all the necessary data could be held in the memory of the processor group that performs the parallel processing. However, the conjugate gradient method is an iterative method, so it may take a very long time to reach a solution, or it may fail to converge and yield no solution at all.

[0003] On the other hand, there are also examples in which simultaneous linear equations were solved on a parallel processing arithmetic device using the LU decomposition method. Because LU decomposition is a direct method, the time needed to solve the problem depends only on the dimension and the bandwidth (described later) of the coefficient matrix, and a solution is always obtained.

[0004]

[Problem to be Solved by the Invention] However, the LU decomposition method has one problem: when the dimension of the coefficient matrix is large, it becomes difficult to hold all the data of the coefficient matrix in the memory of the processor group. For this reason the coefficient matrix is often stored on a secondary storage device, but in that case the data I/O between the secondary storage device and the processor group that performs the parallel processing takes time and degrades the parallel processing performance (see, for example, Hayato Togawa, "Matrix Numerical Computation Method" (Ohmsha), 1971, pp. 52-53; Tsutomu Hoshino, "PAX Computer" (Ohmsha), 1985, chapters 3-4).

[0005] Accordingly, an object of the present invention is to provide a parallel processing arithmetic device that, when the LU decomposition method is used as the solution algorithm for simultaneous linear equations, can solve them at high speed without degrading the parallel processing performance, even when the coefficient matrix data does not fit in the memory of the processor group and a secondary storage device is used.

[0006]

[Means for Solving the Problems] To solve the above problem, the present invention provides a parallel processing arithmetic device that solves simultaneous linear equations by the LU decomposition method, comprising: a secondary storage device that stores the coefficient matrix of the simultaneous linear equations; a main processor connected to the secondary storage device; a processor group consisting of n processors, each connected to the main processor via a link, in which each processor is in charge of calculating every n-th row of the coefficient matrix sent from the secondary storage device through the main processor; and a memory that is connected to the processor group and transfers data to and from the secondary storage device while the processor group performs the calculation.

[0007] Each processor of the processor group may also be configured to be in charge of calculating rows at intervals of ln rows (l = 1, 2, 3, ...) of the coefficient matrix sent from the secondary storage device through the main processor.

[0008] Alternatively, each processor of the processor group may be configured to be in charge of calculating every n-th column of the coefficient matrix sent from the secondary storage device through the main processor.

[0009] Furthermore, each processor of the processor group may be configured to be in charge of calculating columns at intervals of ln columns (l = 1, 2, 3, ...) of the coefficient matrix sent from the secondary storage device through the main processor.

[0010]

[Operation] The present invention uses a secondary storage device that stores the coefficient matrix of the simultaneous linear equations, and data is transferred between this secondary storage device and the memory of the processor group while the processor group performs calculations in parallel. The overall processing can therefore be carried out efficiently and at high speed.

[0011]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0012] First, the simultaneous linear equations to be solved are given by equation (1).

[0013] $Ax = b$    (1)

[0014] Here A is the coefficient matrix, b is the right-hand-side vector, and x is the solution vector. The coefficient matrix A is shown in FIG. 2. In the finite element method and the finite difference method, the non-zero elements of A often lie only near the diagonal. A matrix whose non-zero elements lie only within a certain distance of the diagonal is called a band matrix, and the width over which non-zero elements exist is called the bandwidth. The matrix in FIG. 2 is a band matrix, where M is the dimension of the matrix and W is the bandwidth. H is the distance from a diagonal element to the non-zero element farthest from the diagonal and is called the half-bandwidth; the two are related by $W = 2H - 1$. The solution is obtained by applying the LU decomposition method to this band matrix. The LU decomposition method decomposes the coefficient matrix A into an upper triangular matrix U and a lower triangular matrix L, so that equation (1) is rewritten as follows.
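As a small illustration of these definitions, the following Python sketch measures the half-bandwidth H of a matrix and derives W from it. The function name `half_bandwidth` and the example matrix are assumptions made for illustration and do not appear in the patent; H is taken to count the diagonal itself, which is what the relation W = 2H - 1 implies.

```python
def half_bandwidth(a):
    """Return H: the largest |i - j| over the non-zero entries, plus one for the diagonal."""
    h = 1
    for i, row in enumerate(a):
        for j, value in enumerate(row):
            if value != 0:
                h = max(h, abs(i - j) + 1)
    return h


# Tridiagonal example: non-zeros only on the diagonal and its two neighbours.
A = [
    [4, 1, 0, 0],
    [1, 4, 1, 0],
    [0, 1, 4, 1],
    [0, 0, 1, 4],
]
H = half_bandwidth(A)   # 2
W = 2 * H - 1           # 3
print(H, W)
```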

[0015] $LUx = b$    (2)

[0016] Then, by introducing an intermediate vector y, the solution vector x can be calculated as follows.

[0017] $Ly = b$    (3)

[0018] $Ux = y$    (4)

[0019] Since L is a lower triangular matrix, y is obtained as follows.

[0020]

    for i = 1 to M
      y_i = b_i
      for j = 1 to i-1
        y_i = y_i - y_j * L_ij
      next j
      y_i = y_i / L_ii
    next i                              (5)
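The forward substitution loop can be sketched in Python as follows. This is a minimal illustration using dense lists of rows; the function name `forward_substitution` and the sample data are assumptions, not part of the patent. Note that the inner loop subtracts the already computed components of y.

```python
def forward_substitution(L, b):
    """Solve L y = b for a lower triangular matrix L given as a list of rows."""
    M = len(b)
    y = [0.0] * M
    for i in range(M):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * y[j]   # uses the components of y found so far
        y[i] = s / L[i][i]
    return y


L = [[1.0, 0.0, 0.0],
     [2.0, 1.0, 0.0],
     [3.0, 4.0, 1.0]]
b = [1.0, 4.0, 14.0]
print(forward_substitution(L, b))   # [1.0, 2.0, 3.0]
```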

[0021] Likewise, since U is an upper triangular matrix, the solution vector x is obtained as follows.

[0022]

    for i = M to 1
      x_i = y_i
      for j = i+1 to M
        x_i = x_i - x_j * U_ij
      next j
      x_i = x_i / U_ii
    next i                              (6)
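The matching backward substitution, again as a hedged Python sketch with an assumed function name and made-up sample data, runs the row index from the last row up to the first and subtracts the already computed components of x.

```python
def backward_substitution(U, y):
    """Solve U x = y for an upper triangular matrix U given as a list of rows."""
    M = len(y)
    x = [0.0] * M
    for i in range(M - 1, -1, -1):
        s = y[i]
        for j in range(i + 1, M):
            s -= U[i][j] * x[j]   # uses the components of x found so far
        x[i] = s / U[i][i]
    return x


U = [[2.0, 1.0, 0.0],
     [0.0, 4.0, 2.0],
     [0.0, 0.0, 5.0]]
y = [4.0, 14.0, 15.0]
print(backward_substitution(U, y))   # [1.0, 2.0, 3.0]
```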

[0023] Equation (5) is called forward substitution and equation (6) is called backward substitution. Equations (5) and (6) are written in Basic, but only in order to show the calculation method; the present invention does not use this program.

[0024] The upper triangular matrix U and the lower triangular matrix L are calculated by equation (7) below. Equation (7) is also written in Basic, but again this program is not what the present invention uses. The initial values of U and L are the upper triangular part and the lower triangular part of A, respectively (the diagonal elements are included in U).

[0025]

    for i = 1 to M
      L_i,i = 1
      for j = 1 to H-1
        L_i+j,i = L_i+j,i / U_i,i
        for k = 1 to j-1
          L_i+j,i+k = L_i+j,i+k - U_i,i+k * L_i+j,i
        next k
        for k = j to H-1
          U_i+j,i+k = U_i+j,i+k - U_i,i+k * L_i+j,i
        next k
      next j
    next i                              (7)
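A compact Python rendering of equation (7) is sketched below. It is an illustration under stated assumptions, not the patent's program: L and U are stored in one array (strict lower triangle for L, the rest for U), so the two k loops of equation (7) merge into one. Indices are 0-based, rows and columns beyond the matrix edge are skipped, and no pivoting is done. The name `band_lu` is hypothetical.

```python
def band_lu(a, H):
    """In-place LU decomposition of a band matrix with half-bandwidth H.

    After the call the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle, including the diagonal, holds U.
    """
    M = len(a)
    for i in range(M):
        for j in range(1, H):
            r = i + j
            if r >= M:
                break
            a[r][i] /= a[i][i]                 # multiplier L[r][i]
            for k in range(1, H):
                c = i + k
                if c >= M:
                    break
                a[r][c] -= a[i][c] * a[r][i]   # update of row r inside the band
    return a


A = [[2.0, 1.0, 0.0],
     [4.0, 5.0, 3.0],
     [0.0, 6.0, 10.0]]
band_lu(A, 2)
print(A)   # [[2.0, 1.0, 0.0], [2.0, 3.0, 3.0], [0.0, 2.0, 4.0]]
```

Multiplying the unit lower triangular factor by the upper triangular factor read from the result reproduces the original matrix, which is a quick way to check such a routine.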

[0026] The calculation of equation (7) takes far longer than the calculations of equations (5) and (6) and accounts for most of the time needed to solve the simultaneous linear equations.

[0027] In this embodiment, the calculation of equation (7) is parallelized.

[0028] FIG. 1 shows the hardware system of this embodiment. In the figure, 1 is a main processor, 2 is a processor group consisting of n processors, 3 is a secondary storage device, 4 is a memory, and 5 is a link. The processors of the processor group 2 can exchange data with the outside via the link 5 at the same time as they perform calculations.

[0029] In this embodiment, each processor of the processor group 2 calculates every n-th row of the coefficient matrices U and L. FIG. 3 shows an example of the correspondence between the processors (processor numbers 0, 1, 2 and 3) and the rows when there are four processors. Each processor performs the calculation of equation (7) only for the elements of the rows it is in charge of.
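The every-n-th-row assignment can be sketched as a simple modulo rule. The exact numbering is defined by FIG. 3, which is not reproduced here, so the convention below (1-based rows, row 1 on processor 0) is an assumption, as is the function name `owner_of_row`.

```python
def owner_of_row(row, n):
    """Processor number (0-based) in charge of a 1-based row under cyclic assignment."""
    return (row - 1) % n


n = 4
# Rows 1..12 grouped by the processor that owns them.
assignment = {p: [r for r in range(1, 13) if owner_of_row(r, n) == p]
              for p in range(n)}
print(assignment[0])   # [1, 5, 9]
print(assignment[3])   # [4, 8, 12]
```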

[0030] The arithmetic processing method of this embodiment is described with reference to the flowchart of FIG. 5.

[0031] The data needed to calculate one step of the outermost loop of equation (7) (loop counter i) is the shaded part shown in FIG. 4. The part above and to the left of the shaded part has already been LU-decomposed. The processor group 2 therefore holds in memory only the rows that contain the shaded part and performs one step of the calculation of equation (7).

[0032] For example, in the i-th step, rows i through i+H-1 are calculated. The elements that the processor in charge of row i+j (1 ≤ j < H) must calculate are $L_{i+j,\,i+k}$ (0 ≤ k < j) and $U_{i+j,\,i+k}$ (j ≤ k < H). From equation (7), the data needed for this are $L_{i+j,\,i+k}$ (0 ≤ k < j), $U_{i+j,\,i+k}$ (j ≤ k < H) and $U_{i,\,i+k}$ (0 ≤ k < H). Hence the processor in charge of row i+j can carry out the calculation of equation (7) as long as it holds the data of row i+j and the data of row i. In the i-th step, therefore, the processor that holds the data of row i (the processor in charge of row i) transfers that data to all processors, and each processor that receives it calculates the rows it is in charge of (FIG. 5, ■). Once the i-th step ends, row i is no longer needed for the calculation of equation (7), so it is sent to the secondary storage device 3 (FIG. 5, ■), row i+H arrives from the secondary storage device 3, and the (i+1)-th step begins. If i+H is greater than M, there is of course no data to transfer (FIG. 5, ■).
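The step described above can be simulated sequentially to see which rows must be resident at any time. In the sketch below a Python list stands in for the secondary storage device and a dictionary for the rows currently held in processor memory; both, and the name `out_of_core_band_lu`, are assumptions for illustration. Indices are 0-based and each row is stored at full length for simplicity, whereas only the band would be needed.

```python
def out_of_core_band_lu(storage, H):
    """LU-decompose a band matrix while keeping at most H rows in the window.

    `storage` is a list of rows standing in for secondary storage; it is
    overwritten row by row with the combined L/U result.
    """
    M = len(storage)
    # Load the first H rows into the working window (row index -> row data).
    window = {r: list(storage[r]) for r in range(min(H, M))}
    for i in range(M):
        pivot_row = window[i]                      # row i, sent to every owner
        for r in range(i + 1, min(i + H, M)):      # rows updated in this step
            row = window[r]
            row[i] /= pivot_row[i]
            for c in range(i + 1, min(i + H, M)):
                row[c] -= pivot_row[c] * row[i]
        storage[i] = window.pop(i)                 # row i is finished: write back
        if i + H < M:
            window[i + H] = list(storage[i + H])   # fetch the row for the next step
    return storage


disk = [[2.0, 1.0, 0.0],
        [4.0, 5.0, 3.0],
        [0.0, 6.0, 10.0]]
print(out_of_core_band_lu(disk, 2))
# [[2.0, 1.0, 0.0], [2.0, 3.0, 3.0], [0.0, 2.0, 4.0]]
```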

[0033] With the method described above, the amount of data held in the memory 4 of the processor group 2 stays on the order of $H^2$, so it fits in the memory 4 unless the coefficient matrix is extremely large. Moreover, in the i-th step row i is only referenced and never updated. Therefore, if row i is sent to the secondary storage device 3 and row i+H, which is needed for the next step, is received while the processor group 2 is computing, then when a disk is used as the secondary storage device 3, for example, the disk I/O and the computation proceed in parallel and the disk I/O time is hidden behind the computation time (see FIG. 4). In other words, the row whose LU decomposition finishes in this step is calculated first, so it can be sent to the disk while the subsequent rows are being calculated. The processes marked ■, ■ and ■ in FIG. 5 are performed simultaneously.
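The overlap of I/O and computation can be sketched with a thread pool standing in for the link and main processor. This is only an analogy in software, under the assumption that a list models the disk and that `read_row` and `write_row` (hypothetical helpers) model the transfers. The write of row i and the read of row i+H are started before the row updates of step i and are collected afterwards, which is safe because row i is read-only during the step and row i+H is not used until the next one.

```python
from concurrent.futures import ThreadPoolExecutor


def read_row(storage, r):
    """Stand-in for reading one row from secondary storage."""
    return list(storage[r])


def write_row(storage, r, row):
    """Stand-in for writing one finished row to secondary storage."""
    storage[r] = list(row)


def overlapped_band_lu(storage, H):
    M = len(storage)
    window = {r: read_row(storage, r) for r in range(min(H, M))}
    with ThreadPoolExecutor(max_workers=2) as io:
        for i in range(M):
            pivot_row = window[i]
            # Start both transfers, then compute while they are in flight.
            write_done = io.submit(write_row, storage, i, pivot_row)
            read_next = io.submit(read_row, storage, i + H) if i + H < M else None
            for r in range(i + 1, min(i + H, M)):
                row = window[r]
                row[i] /= pivot_row[i]
                for c in range(i + 1, min(i + H, M)):
                    row[c] -= pivot_row[c] * row[i]
            write_done.result()                    # row i is now on "disk"
            del window[i]
            if read_next is not None:
                window[i + H] = read_next.result() # ready for step i+1
    return storage


disk = [[2.0, 1.0, 0.0],
        [4.0, 5.0, 3.0],
        [0.0, 6.0, 10.0]]
print(overlapped_band_lu(disk, 2))
```

At most H rows plus the one row being fetched are resident at a time, which matches the order-of-H-squared memory estimate when only the band of each row is stored.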

[0034] When simultaneous linear equations are solved by the LU decomposition method on the system shown in FIG. 1, the first H rows of the coefficient matrix are first sent from the secondary storage device 3 to the processor group 2 through the main processor 1. If the number of processors is n, each processor is in charge of H/n rows (rounded up). On receiving the data, the processor group 2 immediately starts the calculation of the first step of equation (7). Once the first row has been calculated, the processor in charge of that row calculates its next assigned row and, at the same time, sends the data of the first row, whose LU decomposition is complete, to the secondary storage device 3 via the link 5 and the main processor 1 (in fact, the first row is not updated by the calculation; it is only referenced when rows 2 through H are calculated). Meanwhile, the main processor 1 receives the data of row H+1 from the secondary storage device 3 and sends it to the processor group 2 via the link 5. The LU decomposition proceeds in this way: in step i, the processor group 2 finishes decomposing row i and sends it to the secondary storage device 3, and receives row i+H from the secondary storage device 3. When i > M-H, no more data is received; the processor group only performs the calculation and sends the data of the rows whose decomposition is complete to the secondary storage device 3.
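The H/n (rounded up) figure can be checked by counting how many of the H resident rows fall on each processor. The values H = 10 and n = 4 below are hypothetical, and the cyclic rule `(row - 1) % n` is the same assumed convention as for the row assignment, not something the text fixes.

```python
import math
from collections import Counter


def rows_per_processor(first_row, H, n):
    """Count how many of the H rows starting at `first_row` each processor owns."""
    return Counter((r - 1) % n for r in range(first_row, first_row + H))


H, n = 10, 4                      # hypothetical sizes for illustration
counts = rows_per_processor(1, H, n)
print(dict(counts))               # {0: 3, 1: 3, 2: 2, 3: 2}
print(math.ceil(H / n))           # 3, the largest share held by any one processor
```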

[0035] In the embodiment above, each processor is in charge of one row in every n rows. However, as shown in FIG. 6, each processor may instead be in charge of two rows in every 2n rows or, more generally, l rows in every ln rows (l = 1, 2, 3, ...). The algorithm is almost unchanged; only the order of the processors that perform data I/O with the secondary storage device 3 and the order of the processors that transfer data to the other processors change.
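One way to express the generalized assignment is a block-cyclic rule. The sketch assumes that the l rows given to a processor within each group of ln rows are adjacent, which is how FIG. 6 is read here but is not stated in the text; the parameter `block` plays the role of l, and the function name is hypothetical.

```python
def owner_of_row_block_cyclic(row, n, block):
    """Processor (0-based) for a 1-based row when each processor takes
    `block` adjacent rows out of every block * n rows."""
    return ((row - 1) // block) % n


n, block = 4, 2
owners = [owner_of_row_block_cyclic(r, n, block) for r in range(1, 17)]
print(owners)   # [0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1, 2, 2, 3, 3]
```

With `block = 1` the rule reduces to the plain every-n-th-row assignment.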

[0036] In the embodiment above, parallelization was achieved by assigning rows to the processors, but it can also be achieved by assigning columns. From equation (7), the data needed to calculate each element in the i-th step of the LU decomposition lies in the column that contains the element and in column i. Therefore, in the i-th step, equation (7) can be parallelized if the processor that holds column i (the processor in charge of column i) transfers the data of column i to all processors and each processor that receives it calculates the columns it is in charge of. When parallelization is applied to columns, the correspondence between processors and columns is l columns in every ln columns (l = 1, 2, 3, ...).
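The column-oriented variant can be simulated as below. The loop over `p` runs the processors one after another rather than in parallel, and the ownership rule `c % n` (0-based columns, one column in every n) is an assumption for illustration, as is the name `column_parallel_band_lu`. Each update of an element in column c uses only column c and the transferred column i.

```python
def column_parallel_band_lu(a, H, n):
    """Band LU in which each simulated processor updates only its own columns."""
    M = len(a)
    for i in range(M):
        last = min(i + H, M)
        # The owner of column i finalises the multipliers in that column ...
        for r in range(i + 1, last):
            a[r][i] /= a[i][i]
        # ... and transfers column i to every processor.
        col_i = [a[r][i] for r in range(M)]
        for p in range(n):                      # simulated processors
            for c in range(i + 1, last):
                if c % n != p:
                    continue                    # column c belongs to another processor
                for r in range(i + 1, last):
                    a[r][c] -= col_i[r] * a[i][c]
    return a


A = [[2.0, 1.0, 0.0],
     [4.0, 5.0, 3.0],
     [0.0, 6.0, 10.0]]
print(column_parallel_band_lu(A, 2, 2))
# [[2.0, 1.0, 0.0], [2.0, 3.0, 3.0], [0.0, 2.0, 4.0]]
```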

[0037]

[Table 1]

[0038]

[Table 2]

[0039] Table 1 shows the relationship between the number of processors and the calculation time when simultaneous linear equations were actually solved using the present invention. Table 2 shows the same relationship when the LU decomposition calculation and the I/O between the secondary storage device and the processor group were performed sequentially. As the number of CPUs increases, the CPU efficiency (speed ratio / number of CPUs) is higher in Table 1 than in Table 2.

[0040]

[Effects of the Invention] According to the present invention, the processor group of the parallel processing arithmetic device can perform LU decomposition with only enough memory to hold part of the coefficient matrix of the simultaneous linear equations. In addition, communication between the secondary storage device that stores the coefficient matrix and the processor group takes place at the same time as the LU decomposition calculation, so the I/O time of the secondary storage device does not show up in the total and the overall processing can be performed at high speed.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of a parallel processing arithmetic device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the dimension, bandwidth and other properties of the coefficient matrix.

FIG. 3 is an explanatory diagram showing the correspondence between the rows of the coefficient matrix and the processors of the processor group.

FIG. 4 is an explanatory diagram showing the exchange of matrix data with the secondary storage device during one step of LU decomposition.

FIG. 5 is a flowchart explaining the LU decomposition processing.

FIG. 6 is an explanatory diagram showing another correspondence between the rows of the coefficient matrix and the processors of the processor group.

[Explanation of symbols]

1 Main processor
2 Processor group
3 Secondary storage device
4 Memory
5 Link

Claims

[Claims]

1. A parallel processing arithmetic device for simultaneous linear equations, which solves simultaneous linear equations by the LU decomposition method, comprising: a secondary storage device that stores the coefficient matrix of the simultaneous linear equations; a main processor connected to the secondary storage device; a processor group consisting of n processors each connected to the main processor via a link, each processor being in charge of calculating every n-th row of the coefficient matrix sent from the secondary storage device through the main processor; and a memory connected to the processor group and configured to transfer data to and from the secondary storage device while the processor group performs the calculation.

2. A parallel processing arithmetic device for simultaneous linear equations, which solves simultaneous linear equations by the LU decomposition method, comprising: a secondary storage device that stores the coefficient matrix of the simultaneous linear equations; a main processor connected to the secondary storage device; a processor group consisting of n processors each connected to the main processor via a link, each processor being in charge of calculating rows at intervals of ln rows (l = 1, 2, 3, ...) of the coefficient matrix sent from the secondary storage device through the main processor; and a memory connected to the processor group and configured to transfer data to and from the secondary storage device while the processor group performs the calculation.

3. A parallel processing arithmetic device for simultaneous linear equations, which solves simultaneous linear equations by the LU decomposition method, comprising: a secondary storage device that stores the coefficient matrix of the simultaneous linear equations; a main processor connected to the secondary storage device; a processor group consisting of n processors each connected to the main processor via a link, each processor being in charge of calculating every n-th column of the coefficient matrix sent from the secondary storage device through the main processor; and a memory connected to the processor group and configured to transfer data to and from the secondary storage device while the processor group performs the calculation.

4. A parallel processing arithmetic device for simultaneous linear equations, which solves simultaneous linear equations by the LU decomposition method, comprising: a secondary storage device that stores the coefficient matrix of the simultaneous linear equations; a main processor connected to the secondary storage device; a processor group consisting of n processors each connected to the main processor via a link, each processor being in charge of calculating columns at intervals of ln columns (l = 1, 2, 3, ...) of the coefficient matrix sent from the secondary storage device through the main processor; and a memory connected to the processor group and configured to transfer data to and from the secondary storage device while the processor group performs the calculation.