JP2643110B2

JP2643110B2 - A fast method for solving simultaneous linear equations using an auxiliary storage device

Info

Publication number: JP2643110B2
Application number: JP3157487A
Authority: JP
Inventors: 日▲じゅん▼ 車
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1991-06-03
Filing date: 1991-06-03
Publication date: 1997-08-20
Anticipated expiration: 2012-08-20
Also published as: JPH04355878A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

FIELD OF THE INVENTION
The present invention relates to a high-speed method for solving large-scale systems of simultaneous linear equations in numerical computation. Supercomputers and extended storage, a fast semiconductor storage device, have made it possible to solve such systems at very high speed even when the coefficient matrix is dense and too large to fit in main storage.

[0002]

DESCRIPTION OF THE RELATED ART
Tsuda et al. describe a method for solving simultaneous linear equations using extended storage (T. Tsuda and Y. Okabe, "Use of Semiconductor Extended Storage as Extended Main Storage for Large-Scale Supercomputing," Proceedings of the Second International Conference on Supercomputing (Supercomputing '87), Santa Clara, May 1987, Vol. 1, pp. 176-183, International Supercomputing Institute). This method exchanges data between extended storage and main storage while it LU-decomposes the coefficient matrix with the inner-product form of Gaussian elimination, and then obtains the solution of the system. The method works as follows.

Consider the system of simultaneous linear equations 1 shown in FIG. 6. The coefficients a_ij (i = 1, 2, ..., n; j = 1, 2, ..., n) and b_i (i = 1, 2, ..., n) are given, and the solution x_i (i = 1, 2, ..., n) is to be computed on the computer shown in FIG. 7. This computer has a central processing unit 2, a main storage device 3 that the central processing unit 2 accesses directly, and an auxiliary storage device 4 that can exchange data with the main storage device 3. In particular, the auxiliary storage device 4 is an extended storage device 12, a fast auxiliary storage device built from semiconductors.

The main part of the method is the LU decomposition step. It starts with the coefficients of system 1 stored in the extended storage device 12 in the layout shown in FIG. 8. It then exchanges data between the extended storage device 12 and the main storage device 3 until the transformed data shown in FIG. 9 are on the extended storage device 12. In FIG. 9, l_ij and u_ij are the corresponding elements of an n×n lower triangular matrix L and an n×n upper triangular matrix U. The product LU equals the coefficient matrix 11, whose elements are the coefficients of system 1. All diagonal elements of L are 1. Once the coefficient matrix 11 has been decomposed into the product of the two triangular matrices L and U, the solution of system 1 is easily computed by forward and backward substitution.

The transformation of FIG. 9 is obtained by the procedure shown in FIG. 10. If a_ij starts with the elements of the coefficient matrix 11 and the computation follows FIG. 10, a_ij ends up holding the elements of the LU-decomposed matrix 15 shown in FIG. 9. This procedure is called the inner-product form of Gaussian elimination. From the loop order of its subscripts, it is also called the jki-form GAXPY Gaussian method. For simplicity, FIG. 10 omits the unrolling speed-up technique described later. The procedure is written partly in the notation of the programming language PASCAL.
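The following minimal in-core Python sketch of the inner-product (jki) form makes the loop order concrete. It is not a transcription of FIG. 10, which is not reproduced here. It assumes no pivoting, a matrix whose leading principal minors are nonzero, and a list-of-lists layout. The names `lu_jki` and `lu_solve` are hypothetical. After `lu_jki` runs, the strict lower triangle holds L (its unit diagonal is implicit) and the upper triangle holds U, as in FIG. 9. `lu_solve` then performs the forward and backward substitution.

```python
from copy import deepcopy


def lu_jki(a):
    """In-place LU without pivoting, inner-product (jki, GAXPY) loop order.

    Afterwards a[i][j] (i > j) holds l_ij, with L's unit diagonal implicit,
    and a[i][j] (i <= j) holds u_ij, matching the layout of FIG. 9.
    """
    n = len(a)
    for j in range(n):                  # finish column j ...
        for k in range(j):              # ... using columns 0..j-1 to its left
            ukj = a[k][j]               # already final: u_kj
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * ukj
        for i in range(j + 1, n):       # scale below the diagonal: l_ij
            a[i][j] /= a[j][j]
    return a


def lu_solve(lu, b):
    """Solve (LU)x = b: forward substitution Ly = b, then back substitution Ux = y."""
    n = len(lu)
    x = list(b)
    for i in range(n):                  # L has a unit diagonal: no division
        for k in range(i):
            x[i] -= lu[i][k] * x[k]
    for i in reversed(range(n)):
        for k in range(i + 1, n):
            x[i] -= lu[i][k] * x[k]
        x[i] /= lu[i][i]
    return x


A = [[4.0, 3.0, 2.0], [2.0, 4.0, 1.0], [2.0, 1.0, 3.0]]
x = lu_solve(lu_jki(deepcopy(A)), [16.0, 13.0, 13.0])   # b = A @ [1, 2, 3]
# x is approximately [1.0, 2.0, 3.0]
```

Column j is finished using only the columns to its left. This is the property the paged variant relies on.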

[0003] FIG. 11 shows which data are referenced when the k-th column is computed with the inner-product form of Gaussian elimination. The characteristic of the inner-product form is that the final result for the k-th column requires only two things: the lower-triangular parts of columns 1, 2, ..., k-1 to its left, and the k-th column itself.

Now consider two kinds of storage area in the main storage 3, as shown in FIG. 7: a reference data storage area 5 and an update data storage area 6. Data read from the auxiliary storage device 4 into the reference data storage area 5 are only referenced during processing and are never modified, so they are simply overwritten by the next read. Data read into the update data storage area 6 are modified by the processing, and the result is written back to its associated location on the auxiliary storage device 4 before the next read. The embodiments described later change how the reference data storage area 5 and the update data storage area 6 are used.

To make data transfer between the main storage device 3 and the auxiliary storage device 4 efficient, the coefficient matrix 11 of system 1 is stored on the auxiliary storage device 4 (here, the extended storage device 12) divided into pages with a fixed number of columns, as shown in FIG. 8. These pages are called the first page 7, the second page 8, the third page 9, ..., and the N-th page 10, and data are transferred one page at a time. In FIG. 8, m is the number of columns in one page. The last page, the N-th page 10, contains between 1 and m columns, depending on the dimension of the coefficient matrix 11. The first page 7, the second page 8, ..., the N-th page 10 of the coefficient matrix 11 are read in turn into the update data storage area 6 of FIG. 7, and the transformation of FIG. 9 is applied to the data of each page. Because of the characteristic of the inner-product form described above, this can be done with the procedure of FIG. 12.
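The paged use of the inner-product form can be sketched the same way. This is an illustration, not the procedure of FIG. 12 itself, which is not reproduced here. A Python list of column pages stands in for the auxiliary storage device 4. Each page is copied into an update buffer, and every earlier page is copied into a read-only reference buffer and applied to it. The page is then finished with its own columns and written back. The names `to_pages` and `lu_paged_inner` are hypothetical, and no pivoting is done.

```python
def to_pages(a, m):
    """Split square matrix a into pages of m columns; the last page may be narrower.

    pages[q][c][i] is element (i, q*m + c), i.e. each page is stored by column.
    """
    n = len(a)
    return [[[a[i][j] for i in range(n)] for j in range(s, min(s + m, n))]
            for s in range(0, n, m)]


def lu_paged_inner(store, m):
    """Inner-product (left-looking) LU over column pages, no pivoting.

    `store` plays the auxiliary storage device: a page is "read" by copying it
    and "written back" by assigning the copy to store[q].
    """
    for q in range(len(store)):
        upd = [col[:] for col in store[q]]            # read page q into the update area
        start = q * m
        for p in range(q):                            # every earlier page, in order
            ref = [col[:] for col in store[p]]        # read page p into the reference area
            for c_ref, lcol in enumerate(ref):
                k = p * m + c_ref
                for col in upd:                       # apply column k of L to page q
                    ukj = col[k]
                    for i in range(k + 1, len(col)):
                        col[i] -= lcol[i] * ukj
        for c, col in enumerate(upd):                 # finish page q with its own columns
            for c_k in range(c):
                k = start + c_k
                ukj = col[k]
                for i in range(k + 1, len(col)):
                    col[i] -= upd[c_k][i] * ukj
            j = start + c
            for i in range(j + 1, len(col)):          # scale below the diagonal: l_ij
                col[i] /= col[j]
        store[q] = upd                                # write page q back
    return store
```

For a 4×4 matrix with m = 3, `to_pages` yields two pages. The second page holds the single leftover column, like the N-th page 10 of FIG. 8. In this sketch, every page re-reads all earlier pages as reference data.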

[0004] A method based on the inner-product form of Gaussian elimination, as described above, is considered advantageous with the page division of FIG. 8 because it reduces the number of input/output operations.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION
Supercomputers, meanwhile, use a technique called unrolling to speed up computation. For example, unrolling transforms computation (a) of FIG. 4 into computation (b). This reduces the number of stores to the storage area denoted a_i from 2n to n and so increases speed. The example of FIG. 4 is two-stage unrolling.
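FIG. 4 is not reproduced here, so the following is a hypothetical reconstruction of the kind of loop it describes. It assumes that computation (a) adds two scaled vectors into a in two separate passes, and that computation (b) is its two-stage unrolled form. The counter stands in for stores to the a_i area: (a) makes 2n stores and (b) makes n, with the same result. The function names are hypothetical.

```python
def update_plain(a, s, v):
    """Hypothetical form (a): one pass per vector, so each a[i] is stored len(v) times."""
    stores = 0
    for j in range(len(v)):
        for i in range(len(a)):
            a[i] = a[i] + s[j] * v[j][i]
            stores += 1
    return stores


def update_unrolled2(a, s, v):
    """Hypothetical form (b): both vectors folded into one pass, one store per a[i]."""
    if len(v) != 2:
        raise ValueError("two-stage unrolling expects exactly two vectors")
    stores = 0
    for i in range(len(a)):
        a[i] = a[i] + s[0] * v[0][i] + s[1] * v[1][i]
        stores += 1
    return stores
```

Both functions leave the same values in a. On a vector machine, the gain comes from keeping the partial sum in registers rather than storing a_i and reloading it between passes.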

[0006] When such unrolling is applied, the inner-product form of Gaussian elimination is not necessarily the fastest method. Robert et al. have shown this experimentally (Y. Robert and P. Sguazzero, "The LU Decomposition Algorithm and Its Efficient FORTRAN Implementation on IBM 3090 Vector Multiprocessor," IBM Technical Report ICE-0006, March 1987). The inventor carried out similar experiments on the SX-2A supercomputer. With unrolling, the inner-product form was much slower than the outer-product form of Gaussian elimination described later. For example, a system of 1000 unknowns solved in main storage reaches only about 500 to 600 MFLOPS with the inner-product form, even with unrolling (1 MFLOPS is one million floating-point operations per second). The outer-product form exceeds 900 MFLOPS on the same problem. The reason is that, with unrolling, the outer-product form reduces the number of data stores more than the inner-product form can. A method that extends the inner-product form to swap data between main storage and extended storage is therefore not necessarily optimal in speed.

Extended storage is fast, but its capacity is harder to increase than that of slower disk devices, so the available capacity is limited. Some coefficient matrices are too large for the available extended storage. Solving those systems requires another auxiliary storage device that is slower than extended storage, for example a high-speed disk device. In that case, as much as possible of the data transfer between main storage and auxiliary storage should take place between extended storage and main storage, and only the rest between the other auxiliary storage device and main storage. The conventional method does not take this problem into account. In addition, the conventional method did not anticipate being combined with other operations, such as multiplying large matrices, so it lacks generality.

The present invention was made to solve these problems. Its object is to provide a high-speed method for solving simultaneous linear equations using an auxiliary storage device. The method uses the storage devices efficiently by applying the outer-product form of Gaussian elimination with unrolling.
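The sketch below makes the store-count argument concrete. It compares the plain outer-product (kji) loop with a two-stage unrolled version that fuses elimination steps k and k+1 into one rank-2 update of the trailing submatrix. It is an illustration under stated assumptions: no pivoting, an in-core list-of-lists matrix, and a simple counter of stores into the trailing part as a proxy for memory traffic. It is not the code used for the SX-2A measurements, and the function names are hypothetical.

```python
def lu_kji(a):
    """Plain outer-product (kji, SAXPY) LU, no pivoting; returns trailing-update stores."""
    n = len(a)
    stores = 0
    for k in range(n - 1):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                       # multipliers l_ik
        for j in range(k + 1, n):
            ukj = a[k][j]
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * ukj             # rank-1 update: one store per element per step
                stores += 1
    return stores


def lu_kji_unrolled2(a):
    """Same factorization with steps k and k+1 fused (two-stage unrolling)."""
    n = len(a)
    stores = 0
    for k in range(0, n - 1, 2):
        k1 = k + 1
        for i in range(k1, n):
            a[i][k] /= a[k][k]                       # multipliers l_ik
        for j in range(k1, n):                       # step k applied to row k+1
            a[k1][j] -= a[k1][k] * a[k][j]
            stores += 1
        for i in range(k + 2, n):                    # step k applied to column k+1,
            a[i][k1] -= a[i][k] * a[k][k1]
            stores += 1
            a[i][k1] /= a[k1][k1]                    # then multipliers l_i,k+1
        for j in range(k + 2, n):                    # fused rank-2 update: one store per element
            ukj, uk1j = a[k][j], a[k1][j]
            for i in range(k + 2, n):
                a[i][j] -= a[i][k] * ukj + a[i][k1] * uk1j
                stores += 1
    return stores
```

For a 4×4 matrix, the counters give 14 stores for `lu_kji` and 10 for `lu_kji_unrolled2`. Each fused pair saves (n-k-2)^2 stores. For large n, the unrolled loop therefore makes roughly half as many stores into the trailing submatrix.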

[0007]

MEANS FOR SOLVING THE PROBLEMS
The high-speed method of the present invention for solving simultaneous linear equations is intended for a computer with three components: a central processing unit, a main storage device that the central processing unit accesses directly, and an auxiliary storage device that can exchange data with the main storage device. The auxiliary storage device is in particular an extended storage device, a fast auxiliary storage device built from semiconductors. The numerical data of the coefficient matrix of the system are given on the auxiliary storage device. The method first decomposes the coefficient matrix into the product of two triangular matrices and then computes the solution of the system. For this decomposition, the coefficient matrix is divided into units that each contain a fixed number of columns. Each unit is called a page, the pages are called the first page, the second page, ..., in order, and the page is the unit of processing. The last page need not be the same size as the others. It contains whatever columns are left over, depending on the size of the coefficient matrix.

[0008] One or more pages of the coefficient matrix are read from the auxiliary storage device into the main storage device. The data in main storage are processed, and the results are returned to the corresponding pages of the auxiliary storage device.

[0009] The storage in the main storage device is divided into two areas, a reference data storage area and an update data storage area. Data read from the auxiliary storage device into the reference data storage area are updated using only their own data. They are then referenced throughout the processing to update the data in the update data storage area, described next. They are therefore returned to the corresponding page of the auxiliary storage device only after all the related data have been updated. Data read into the update data storage area are updated with reference to the data in the reference data storage area and then returned to the corresponding page of the auxiliary storage device.

[0010] Let the coefficient matrix consist of N pages, called in order the first page, the second page, ..., the N-th page. Data are exchanged between the auxiliary storage device and the main storage device in the following order. First, the first page is read from the auxiliary storage device into the reference data storage area and processed. Then the second page, the third page, ..., the N-th page are read in that order into the update data storage area, processed, and returned to their corresponding pages of the auxiliary storage device.

[0011] The first page is returned from the reference data storage area to its corresponding page of the auxiliary storage device.

[0012] The second page is read from the auxiliary storage device into the reference data storage area and processed. The third page, the fourth page, ..., the N-th page are read in that order into the update data storage area, processed, and returned to their corresponding pages of the auxiliary storage device.

[0013] The second page is returned from the reference data storage area to its corresponding page of the auxiliary storage device.

[0014] … The (N-1)-th page is read from the auxiliary storage device into the reference data storage area and processed.

[0015] The N-th page is read from the auxiliary storage device into the update data storage area, processed, and returned to its corresponding page of the auxiliary storage device.

[0016] The (N-1)-th page is returned from the reference data storage area to its corresponding page of the auxiliary storage device. This completes the processing order. When the auxiliary storage device includes an extended storage device, the method keeps on it as many pages of the coefficient matrix as will fit, starting from the N-th page and working backward. Partway through the processing, the method can also take out only the submatrix that the remaining processing will reference and update. It treats this submatrix as a new matrix, applies the processing described in (1) and (2) above, and finally puts the submatrix back. The result expresses the coefficient matrix as the product of two triangular matrices, from which the solution of the system is computed. Finally, the method treats the data on the auxiliary storage device as a direct-organization file whose record length is exactly the size of one column of the coefficient matrix.
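A small Python generator can enumerate the transfer order of paragraphs [0010] to [0016]. It only lists transfers and does no arithmetic. The function name `page_schedule` and the (action, page, area) encoding are hypothetical. The one-page case is an assumption, because the text does not cover it.

```python
def page_schedule(num_pages):
    """List the page transfers of [0010]-[0016] for pages numbered 1..N.

    Each entry is (action, page, area) with action "read" or "write" and
    area "ref" (reference data area) or "upd" (update data area).
    """
    if num_pages == 1:
        # Assumption: the text does not cover N = 1; read, process, write back.
        return [("read", 1, "upd"), ("write", 1, "upd")]
    ops = []
    for p in range(1, num_pages):                 # pages 1 .. N-1 serve as reference data in turn
        ops.append(("read", p, "ref"))            # read and process page p
        for q in range(p + 1, num_pages + 1):     # pages p+1 .. N are updated against it
            ops.append(("read", q, "upd"))
            ops.append(("write", q, "upd"))
        ops.append(("write", p, "ref"))           # page p goes back only after all updates
    return ops
```

For N = 3, the sequence is:

1. Read page 1 into the reference area.
2. Read and write back page 2.
3. Read and write back page 3.
4. Write page 1.
5. Read page 2 into the reference area.
6. Read and write back page 3.
7. Write page 2.

In general the schedule has (N-1)(N+2) transfers.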

[0017]

OPERATION
The high-speed method of the present invention uses the outer-product form of Gaussian elimination and gives the reference data storage area and the update data storage area separate roles. It exchanges data between the main storage device and the extended auxiliary storage device, so the usable storage extends beyond main storage. This makes high-speed computation possible.

[0018]

EMBODIMENTS
A first embodiment of the present method is described here. The method starts from the outer-product form of Gaussian elimination proposed by Robert et al. (Y. Robert and P. Sguazzero, "The LU Decomposition Algorithm and Its Efficient FORTRAN Implementation on IBM 3090 Vector Multiprocessor," IBM Technical Report ICE-0006, March 1987). It adapts that method to large matrices so that it runs on a computer with an auxiliary storage device 4, as shown in FIG. 7.

The outer-product form is described first. Consider again the system of simultaneous linear equations 1 of FIG. 6. The coefficients a_ij (i = 1, 2, ..., n; j = 1, 2, ..., n) and b_i (i = 1, 2, ..., n) are given, and the solution x_i (i = 1, 2, ..., n) is to be computed. The main part of the method is again the LU decomposition step. It starts with the coefficients of system 1 stored on the extended storage device 12 in the layout of FIG. 8. It then exchanges data between the extended storage device 12 and the main storage device 3 until the transformed data of FIG. 9 are on the extended storage device 12. In FIG. 9, l_ij and u_ij are the corresponding elements of the n×n lower triangular matrix L and the n×n upper triangular matrix U. Their product LU is the coefficient matrix 11, and all diagonal elements of L are 1. Once the coefficient matrix 11 has been decomposed into L and U, the solution of system 1 is easily computed by forward and backward substitution.

The transformation of FIG. 9 is obtained by the procedure of FIG. 2. If a_ij starts with the elements of the coefficient matrix 11 and the computation follows FIG. 2, a_ij ends up holding the elements of the LU-decomposed matrix 15 of FIG. 9. This procedure is called the outer-product form of Gaussian elimination. From the loop order of its subscripts, it is also called the kji-form SAXPY Gaussian method. For simplicity, the procedure is again shown without the unrolling speed-up.

FIG. 3 shows which data are referenced in the k-th step of the outer-product form. As FIG. 3 shows, the k-th step needs only the lower-right square submatrix of order n-k+1, and it updates only the lower-right square submatrix 16 of order n-k. This is the characteristic of the outer-product form. The property still holds, in general, when the unrolling technique described earlier is used: the part being processed shrinks as the steps proceed, and it can always be represented as a lower-right square submatrix. The coefficient matrix 11 of system 1 is divided into pages as in FIG. 8 and held on the extended storage device 12, and data are swapped between the extended storage device 12 and the main storage device 3 as in FIG. 7. Under this arrangement, executing the procedure of FIG. 1 performs the transformation of FIG. 9.

The two areas in main storage, the reference data storage area 5 and the update data storage area 6, are used as follows. Data read from the auxiliary storage device 4 into the reference data storage area 5 are updated using only their own data. They are then referenced throughout the processing to update the data in the update data storage area 6. They are therefore returned to the corresponding page of the auxiliary storage device 4 only after all the related data have been updated. Data read into the update data storage area 6 are updated with reference to the data in the reference data storage area 5 and then returned to the corresponding page of the auxiliary storage device 4.

The procedure of FIG. 1 was implemented on the SX-2A, a pipelined vector supercomputer with extended storage, chosen from several such models. It solved a system of 5000 unknowns at a rate exceeding 1 GFLOPS (one billion floating-point operations per second). The computation used about 25 Mbytes of main storage. This is roughly 1/8 of the main storage needed to solve the same system with the main storage device 3 alone. The theoretical peak performance of the SX-2A is 1.3 GFLOPS, so this performance is very close to the theoretical peak. This embodiment corresponds to the case in which only the extended storage device 12 is used as the auxiliary storage device 4.
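Below is a numerical sketch of the paged outer-product procedure, in the spirit of FIG. 1 (which is not reproduced here). A list of column pages stands in for the extended storage device 12. Copies stand in for reads into the reference data storage area 5 and the update data storage area 6. The page order follows paragraphs [0010] to [0016], and the last page is finished while it sits in the update area. No pivoting or unrolling is done, and all names are hypothetical.

```python
def to_pages(a, m):
    """Split square matrix a into column pages; pages[q][c][i] is element (i, q*m + c)."""
    n = len(a)
    return [[[a[i][j] for i in range(n)] for j in range(s, min(s + m, n))]
            for s in range(0, n, m)]


def factor_page(page, start):
    """LU-factor one column page in place using only its own columns (no pivoting)."""
    n = len(page[0])
    for c, lcol in enumerate(page):
        k = start + c
        for i in range(k + 1, n):
            lcol[i] /= lcol[k]                    # multipliers l_ik
        for col in page[c + 1:]:                  # outer-product update of later columns
            ukj = col[k]
            for i in range(k + 1, n):
                col[i] -= lcol[i] * ukj


def update_page(upd, ref, ref_start):
    """Apply every elimination step held in reference page `ref` to page `upd`."""
    n = len(upd[0])
    for c, lcol in enumerate(ref):
        k = ref_start + c
        for col in upd:
            ukj = col[k]
            for i in range(k + 1, n):
                col[i] -= lcol[i] * ukj


def lu_paged_outer(store, m):
    """Right-looking (outer-product) LU over column pages, in the order of [0010]-[0016]."""
    num_pages = len(store)
    if num_pages == 1:
        factor_page(store[0], 0)
        return store
    for p in range(num_pages - 1):
        ref = [col[:] for col in store[p]]        # read page p into the reference area
        factor_page(ref, p * m)                   # updated with its own data only
        for q in range(p + 1, num_pages):
            upd = [col[:] for col in store[q]]    # read page q into the update area
            update_page(upd, ref, p * m)
            if p == num_pages - 2:                # last pass: the N-th page finishes here
                factor_page(upd, q * m)
            store[q] = upd                        # write page q back
        store[p] = ref                            # write page p back after all updates
    return store
```

Only two page buffers are live at any time. That is why the main storage needed is a small fraction of the whole matrix. The actual fraction depends on the page size m.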

[0019] The second embodiment covers the case in which the coefficient matrix 11 is too large to fit in the extended storage device 12. A slower disk device must then also serve as the auxiliary storage device 4. The data transfer speeds of the extended storage 12 and the disk device differ greatly, so the auxiliary storage devices 4 should be assigned so that as many transfers as possible occur on the extended storage device 12 side. Because of the characteristic of the outer-product form of Gaussian elimination described earlier, the number of data transfers per page grows from the first page toward the last page (the N-th page). Fast processing is therefore obtained by keeping on the extended storage 12 as many pages as fit, starting from the N-th page and working backward, and keeping the rest on disk.

This scheme is easily realized by rewriting the procedure of the first embodiment so that some exchanges between the extended storage device 12 and the main storage device 3 become exchanges between the disk device and the main storage device 3. For example, suppose the first to (p-1)-th pages of the coefficient matrix 11 start on the disk device and the p-th to N-th pages start on the extended storage device. To obtain the transformation of FIG. 9 in the same page arrangement as the initial state, the procedure of FIG. 1 may be rewritten as shown in FIG. 5. The procedure is also easily modified for two other cases: the coefficient matrix 11 starts entirely on the disk device, or the final result must be placed entirely on the disk device.
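Counting the transfers per page under the schedule of [0010] to [0016] shows why the second embodiment keeps the last pages in extended storage. This is a minimal sketch that assumes N ≥ 2 pages, equal cost for every page transfer, and an extended-storage capacity given in whole pages. The names `transfers_per_page` and `place_pages` are hypothetical.

```python
def transfers_per_page(num_pages):
    """Reads plus writes of each page 1..N under the [0010]-[0016] order (N >= 2)."""
    counts = [0] * (num_pages + 1)                # index 0 unused
    for p in range(1, num_pages):
        counts[p] += 2                            # into and out of the reference area
        for q in range(p + 1, num_pages + 1):
            counts[q] += 2                        # into and out of the update area
    return counts[1:]


def place_pages(num_pages, es_capacity_pages):
    """Put pages N, N-1, ... in extended storage ("es") while they fit; the rest on disk."""
    first_es = max(1, num_pages - es_capacity_pages + 1)
    return {page: ("es" if page >= first_es else "disk") for page in range(1, num_pages + 1)}
```

For N = 4, the counts are 2, 4, 6, 6. Page q (q < N) is moved 2q times and the N-th page 2(N-1) times. Filling extended storage from the N-th page backward therefore captures the most traffic.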

[0020] The third embodiment covers the case in which data transfer between the auxiliary storage device 4 and the main storage device 3 takes a large share of the total computation time. Reducing the amount of data transferred between them then becomes important. As noted earlier, the outer-product form of Gaussian elimination references and updates only a lower-right square submatrix, which shrinks as processing proceeds. The rest of each page is not needed for subsequent processing. The amount of data transferred can therefore be reduced by extracting only the lower-right submatrix that the remaining processing needs, treating it as one matrix, and applying the method of the first or second embodiment. This is effective, for example, on a computer configuration whose auxiliary storage device 4 has no extended storage device 12.
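A simple cost model can estimate the trade-off in the third embodiment. The sketch below is an illustration built on assumptions, not a formula from the text. It counts the words moved by the paged schedule when every page transfer moves whole columns of the matrix currently being processed. Optionally, the trailing submatrix is extracted after a given number of passes. Copying it out and copying it back are each assumed to cost one read plus one write of the submatrix. The name `words_moved` is hypothetical.

```python
def words_moved(n, m, extract_after=None):
    """Words moved by the paged schedule under a simple, assumed cost model.

    Every page transfer (read or write) moves whole columns of the matrix the
    schedule is currently working on.  If extract_after is given, the trailing
    submatrix is copied out after that many passes and copied back at the end,
    each copy assumed to cost one read plus one write of the submatrix.
    """
    widths = [min(m, n - s) for s in range(0, n, m)]
    num_pages = len(widths)
    stop = num_pages - 1 if extract_after is None else extract_after
    total = 0
    for p in range(stop):                         # passes on the full n-row pages
        total += 2 * n * sum(widths[p:])          # page p (reference) and pages after it (update)
    if extract_after is not None:
        sub = sum(widths[extract_after:])         # order of the extracted submatrix
        total += 2 * (2 * sub * sub)              # copy out + copy back
        for p in range(extract_after, num_pages - 1):
            total += 2 * sub * sum(widths[p:])    # same passes, shorter columns
    return total
```

With n = 1000 and m = 100, the model gives 10,800,000 words without extraction and 10,400,000 with extraction after five passes. Under this model, whether extraction pays off depends on how many passes remain after it.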

[0021]

Finally, as a fourth embodiment: the above embodiments assume that the data on the auxiliary storage device 4 are stored in a direct-organization file. In a direct-organization file, a unit called a record serves as the unit of storage, and any record can be read or written. Conventionally, the record size has been set to the size of one page obtained by dividing the coefficient matrix 11 of the simultaneous linear equations 1 into pages, but this causes a problem when another computation is combined with solving the system. For example, when the product of two large matrices is computed and a system of linear equations with that product as its coefficient matrix is then solved, the optimal page size for the matrix multiplication generally differs from the optimal page size for solving the system. However, once the record length of a direct-organization file has been fixed, it cannot be changed later, so setting the record length to the page size is undesirable. Here, this problem is solved by taking the size of one column of the coefficient matrix as the record length and, when reading or writing one page of data, reading or writing in succession the multiple records that make up that page. This makes it possible to set an optimal page size for each process even when the solver is combined with other processing.
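A direct-organization file is a mainframe access method; the sketch below only emulates its fixed-length records with an ordinary binary file, which is an assumption for illustration. Each record holds one column (n float64 values, column-major layout assumed), and a page of any width is read or written as a run of consecutive records, so a matrix-multiplication step and the solver can use different page sizes on the same file. The function names and file layout are hypothetical.

```python
import array
import os
import tempfile

ITEM_BYTES = array.array("d").itemsize   # 8 bytes for an IEEE double


def write_matrix(path, A):
    """Store A column by column: record j is column j, so every record has
    the same length (n items) regardless of any later page size."""
    n = len(A)
    with open(path, "wb") as f:
        for j in range(n):
            array.array("d", [A[i][j] for i in range(n)]).tofile(f)


def read_page(path, n, first_col, width):
    """Read `width` consecutive column records starting at column `first_col`."""
    with open(path, "rb") as f:
        f.seek(first_col * n * ITEM_BYTES)   # fixed-length records: offset is computable
        buf = array.array("d")
        buf.fromfile(f, n * width)           # one contiguous read covers the whole page
    return [list(buf[k * n:(k + 1) * n]) for k in range(width)]


def write_page(path, n, first_col, page):
    """Overwrite the records for columns first_col .. first_col+len(page)-1."""
    buf = array.array("d", [x for col in page for x in col])
    with open(path, "r+b") as f:
        f.seek(first_col * n * ITEM_BYTES)
        buf.tofile(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
    A = [[10 * i + j for j in range(4)] for i in range(4)]
    write_matrix(path, A)
    print(read_page(path, 4, 2, 2))   # a two-column page
    print(read_page(path, 4, 0, 3))   # a three-column page from the same file
```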

[0022]

Although not described in the present embodiments, a similar calculation method can also be adopted when performing partial pivoting, which is normally used when solving simultaneous linear equations.
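The paragraph does not say how pivoting fits into the page scheme. One plausible arrangement, sketched below under that assumption, follows the approach of LAPACK's blocked LU: pivots are chosen while the reference page is factored, and the page's recorded row interchanges are applied to each update page before that page is eliminated with the reference page's L columns. A Python list of pages stands in for the auxiliary storage device, and copying a page stands in for reading it into main storage; the function name is illustrative.

```python
def paged_lu_partial_pivot(A, w):
    """LU factorization with partial pivoting, organized by pages of w columns.

    `pages` stands in for the auxiliary storage device; copying a page into
    `ref` or `upd` stands in for reading it into the reference or update data
    storage area. Returns (columns of the packed LU result, perm), where row i
    of the result corresponds to row perm[i] of A.
    """
    n = len(A)
    cols = [[float(A[i][j]) for i in range(n)] for j in range(n)]
    pages = [cols[s:s + w] for s in range(0, n, w)]  # last page may be narrower
    perm = list(range(n))

    for p in range(len(pages)):
        ref = [c[:] for c in pages[p]]               # read reference page
        s = p * w
        swaps = []                                   # row interchanges chosen in this page
        for jj, colj in enumerate(ref):
            j = s + jj
            r = max(range(j, n), key=lambda i: abs(colj[i]))  # partial pivot search
            if colj[r] == 0.0:
                raise ZeroDivisionError("matrix is singular")
            swaps.append((j, r))
            for c in ref:                            # swap rows j and r across the page
                c[j], c[r] = c[r], c[j]
            for i in range(j + 1, n):
                colj[i] /= colj[j]                   # multipliers: column j of L
            for c in ref[jj + 1:]:                   # update later columns of this page
                for i in range(j + 1, n):
                    c[i] -= colj[i] * c[j]

        for q in range(p + 1, len(pages)):
            upd = [c[:] for c in pages[q]]           # read update page
            for c in upd:
                for (j, r) in swaps:                 # apply this page's interchanges first
                    c[j], c[r] = c[r], c[j]
                for jj, colj in enumerate(ref):      # then eliminate with the final L columns
                    j = s + jj
                    for i in range(j + 1, n):
                        c[i] -= colj[i] * c[j]
            pages[q] = upd                           # write update page back

        # L columns in earlier pages must follow the same interchanges. For
        # brevity they are permuted in place here; how an out-of-core code
        # would schedule this is not described in the patent text.
        for q in range(p):
            for (j, r) in swaps:
                for c in pages[q]:
                    c[j], c[r] = c[r], c[j]
        for (j, r) in swaps:
            perm[j], perm[r] = perm[r], perm[j]
        pages[p] = ref                               # write reference page back

    return [c for page in pages for c in page], perm
```

Applying all of a page's interchanges to an update page before eliminating it keeps that page's rows in the same order as the reference page's final L columns, which is what makes the column-at-a-time elimination consistent.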

[0023]

EFFECT OF THE INVENTION With the present calculation method, large-scale systems of simultaneous linear equations can be solved at high speed. As a result, the method can contribute to faster numerical computation not only in engineering fields such as structural analysis, fluid dynamics, electromagnetics, and nuclear engineering but also in many other application fields.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram of the high-speed method of the present invention for solving simultaneous linear equations using an auxiliary storage device.

FIG. 2 is a diagram showing how each element is computed in the outer-product-form Gaussian method of the present invention.

FIG. 3 is a diagram showing the data reference relationships at the k-th step of the outer-product-form Gaussian method of the present invention.

FIG. 4 is an explanatory diagram of the unrolling technique.

FIG. 5 is an explanatory diagram of a second embodiment of the present invention.

FIG. 6 is a diagram showing the simultaneous linear equations to be solved.

FIG. 7 is a diagram showing the computer configuration.

FIG. 8 is an explanatory diagram of the page division of the simultaneous linear equations.

FIG. 9 is a diagram showing the correspondence between the elements of the coefficient matrix of the simultaneous linear equations and the elements of the LU-decomposed matrices.

FIG. 10 is a diagram showing how each element is computed in the conventional inner-product-form Gaussian method.

FIG. 11 is a diagram showing how each element is computed when calculating the k-th column in the conventional inner-product-form Gaussian method.

FIG. 12 is a diagram illustrating the conventional calculation method.

[Explanation of symbols]

1 ... simultaneous linear equations, 2 ... central processing unit, 3 ... main storage device, 4 ... auxiliary storage device, 5 ... storage area for reference data, 6 ... storage area for update data, 7 ... first page of the coefficient matrix divided into pages, 8 ... second page of the same, 9 ... third page of the same, 10 ... N-th page (last page) of the same, 11 ... coefficient matrix of the simultaneous linear equations, 12 ... extended storage device, 13 ... elements of the n×n upper triangular matrix U, 14 ... elements of the n×n lower triangular matrix L, 15 ... arrangement of the elements after LU decomposition, 16 ... area updated at the k-th step, 17 ... area referenced at the k-th step.

Claims

(57) [Claims]

1. A high-speed method for solving simultaneous linear equations using an auxiliary storage device, used in a computer having a central processing unit, a main storage device directly accessed by the central processing unit, and an auxiliary storage device capable of exchanging data with the main storage device, in particular an extended storage device, which is a high-speed auxiliary storage device built from semiconductors; the method is given the numerical data of the coefficient matrix corresponding to the coefficients of a system of simultaneous linear equations on the auxiliary storage device, first decomposes the coefficient matrix into the product of two triangular matrices, and then computes the solution of the system, wherein:

in decomposing the coefficient matrix into the product of two triangular matrices, the coefficient matrix is divided into units each containing a fixed number of columns, each unit being called a page and the pages being called the first page, the second page, ..., except that the last page need not be the same size as the other pages and instead contains the remaining elements, according to the size of the coefficient matrix;

one or more pages of the coefficient matrix are read from the auxiliary storage device into the main storage device, the data read into the main storage device are processed, and the results are returned to the corresponding pages of the auxiliary storage device;

the storage area on the main storage device is divided into two areas, a reference data storage area and an update data storage area; data read from the auxiliary storage device into the reference data storage area are first updated using only their own data, are then referred to throughout the processing when updating the data read into the update data storage area described below, and, after all related data have been updated, are returned to the corresponding page of the auxiliary storage device; data read into the update data storage area are updated by referring to the data in the reference data storage area and are then returned to the corresponding page of the auxiliary storage device; and

when the coefficient matrix consists of N pages, called the first page, the second page, ..., the N-th page, data are exchanged with the main storage device in the following order: the first page is read from the auxiliary storage device into the reference data storage area and processed; the second page, the third page, ..., the N-th page are read in order from the auxiliary storage device into the update data storage area, processed, and returned to the corresponding pages of the auxiliary storage device; the first page is returned from the reference data storage area to the corresponding page of the auxiliary storage device; the second page is read from the auxiliary storage device into the reference data storage area and processed; the third page, the fourth page, ..., the N-th page are read in order from the auxiliary storage device into the update data storage area, processed, and returned to the corresponding pages of the auxiliary storage device; the second page is returned from the reference data storage area to the corresponding page of the auxiliary storage device; ...; the (N-1)-th page is read from the auxiliary storage device into the reference data storage area and processed; the N-th page is read from the auxiliary storage device into the update data storage area, processed, and returned to the corresponding page of the auxiliary storage device; and the (N-1)-th page is returned from the reference data storage area to the corresponding page of the auxiliary storage device.
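The transfer order in claim 1 is easy to lose in the prose. The sketch below simply generates it as a list of (operation, page, area) tuples, with pages numbered from 1 as in the claim; it models the ordering only, not the arithmetic, and the names are illustrative. Counting the tuples gives N(N+1) - 2 page transfers in total, and main storage never needs to hold more than two pages at once (one reference page, one update page).

```python
from typing import List, Tuple

Transfer = Tuple[str, int, str]   # (operation, page number, main-storage area)


def claim1_schedule(n_pages: int) -> List[Transfer]:
    """Page transfers in the order listed in claim 1 (pages numbered 1..N)."""
    ops: List[Transfer] = []
    for ref in range(1, n_pages):                    # reference pages 1 .. N-1
        ops.append(("read", ref, "reference"))       # processed using only its own data
        for upd in range(ref + 1, n_pages + 1):      # every later page, in order
            ops.append(("read", upd, "update"))      # updated by referring to the reference page
            ops.append(("write", upd, "update"))     # returned to auxiliary storage
        ops.append(("write", ref, "reference"))      # all related pages done; write back
    return ops


if __name__ == "__main__":
    for op in claim1_schedule(3):
        print(op)
```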

2. The high-speed method for solving simultaneous linear equations using an auxiliary storage device according to claim 1, wherein, when the auxiliary storage device includes an extended storage device, processing proceeds while holding on the extended storage device only as many pages as fit in it, taken in reverse order starting from the N-th page of the coefficient matrix.

3. The high-speed method for solving simultaneous linear equations using an auxiliary storage device according to claim 1, wherein, during processing, only the small matrix corresponding to the part to be referenced or updated in subsequent processing is extracted and regarded as a new matrix on which processing continues, whereby the coefficient matrix is expressed as the product of two triangular matrices and the solution of the simultaneous linear equations is computed.

4. The high-speed method for solving simultaneous linear equations using an auxiliary storage device according to claim 1, 2, or 3, wherein the data on the auxiliary storage device are treated as a direct-organization file whose record length is set to just the size needed to hold the elements of one column of the coefficient matrix.