JPS63127366A

JPS63127366A - Analyzer for simultaneous linear equations system

Info

Publication number: JPS63127366A
Application number: JP27471886A
Authority: JP
Inventors: Takashi Doi; 俊土肥
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-11-17
Filing date: 1986-11-17
Publication date: 1988-05-31

Abstract

PURPOSE: To make effective use of a parallel computer by performing, with parallel processing, the forward/backward substitution calculation for the inverse matrix operation with an approximate matrix. CONSTITUTION: The analyzer comprises a coefficient matrix decomposition circuit 1, which receives the coefficient matrix and coefficient vector of a system of simultaneous linear equations obtained by discrete approximation of a partial differential equation and decomposes the coefficient matrix into an approximate matrix on which the inverse matrix operation can be performed; a forward/backward substitution circuit 2, which applies the inverse matrix operation of the approximate matrix generated by circuit 1 to vector data from an iterative calculation circuit 3 and outputs the result to circuit 3; and the circuit 3 itself. When the equation Au = f is written in block form (expression I) corresponding to the partition u = [u1, ..., um] of the unknown vector u, each of the diagonal blocks A11, A22, ..., Amm is incompletely LU-decomposed. The incomplete LU decomposition of Aii (i = 1, ..., m) is written LiiUii = Mii, and the approximate matrix M used for the inverse matrix calculation is defined by expression II. The inverse matrix operation of the approximate matrix M can then be performed separately for each diagonal block Mii.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a simultaneous linear equation analysis device, and in particular to a simultaneous linear equation analysis device based on the finite element analysis method.

[Conventional technology]

A known conventional technique performs incomplete Choleski decomposition or incomplete LU decomposition (hereinafter collectively called incomplete LU decomposition) on all nonzero elements of the coefficient matrix and uses the resulting matrix as the approximate matrix for the inverse matrix operation. In terms of the configuration of the present invention (FIG. 1), the coefficient matrix decomposition circuit performs incomplete LU decomposition of all nonzero elements of the coefficient matrix, and the forward/backward substitution circuit performs the inverse matrix operation using the approximate matrix obtained by that decomposition.

LU decomposition of a matrix $A$ factors $A$ into the product $LU$ ($= A$) of a lower triangular matrix $L$ and an upper triangular matrix $U$. With it, the unknown vector $u$ of an equation $Au = f$, for example, is obtained by $v = L^{-1} f$ (forward substitution) followed by $u = U^{-1} v$ (backward substitution). In simultaneous linear equations obtained by discrete approximation of a partial differential equation, the coefficient matrix $A$ is sparse (nonzero elements make up a small fraction of all matrix elements), whereas the LU-decomposed matrices contain more nonzero elements than $A$. The appearance of these additional nonzero elements is usually called fill-in. In large-scale problems, the growth in memory use and operation count caused by fill-in becomes the bottleneck and makes the computation practically infeasible. The incomplete LU decomposition method therefore performs the LU decomposition while ignoring the fill-in (approximating it by zero). Consequently, the matrices $L$ and $U$ obtained by incomplete LU decomposition of all nonzero elements of the coefficient matrix have the same structure as the lower triangular part and the upper triangular part of $A$, respectively.
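The difference between complete and incomplete LU decomposition can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration and not part of the patent: the function names `lu_no_pivot`, `ilu0`, and `laplacian_2d`, the use of dense NumPy arrays, and the five-point finite-difference test matrix. The incomplete variant shown is the zero fill-in form, which keeps an update only where the original matrix already has a nonzero element.

```python
import numpy as np


def lu_no_pivot(a):
    """Complete LU decomposition without pivoting; L has a unit diagonal."""
    n = a.shape[0]
    w = a.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            if w[i, k] != 0.0:
                w[i, k] /= w[k, k]
                w[i, k + 1:] -= w[i, k] * w[k, k + 1:]
    return np.tril(w, -1) + np.eye(n), np.triu(w)


def ilu0(a):
    """Incomplete LU: the same elimination, but fill-in outside A's pattern is dropped."""
    n = a.shape[0]
    pattern = a != 0
    w = a.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            if pattern[i, k]:
                w[i, k] /= w[k, k]
                for j in range(k + 1, n):
                    if pattern[i, j]:  # positions that are zero in A stay zero
                        w[i, j] -= w[i, k] * w[k, j]
    return np.tril(w, -1) + np.eye(n), np.triu(w)


def laplacian_2d(nx, ny):
    """Five-point finite-difference matrix on an nx-by-ny grid (illustrative test matrix)."""
    n = nx * ny
    a = np.zeros((n, n))
    for iy in range(ny):
        for ix in range(nx):
            k = iy * nx + ix
            a[k, k] = 4.0
            if ix > 0:
                a[k, k - 1] = -1.0
            if ix < nx - 1:
                a[k, k + 1] = -1.0
            if iy > 0:
                a[k, k - nx] = -1.0
            if iy < ny - 1:
                a[k, k + nx] = -1.0
    return a


a = laplacian_2d(4, 4)
l_full, u_full = lu_no_pivot(a)
l_inc, u_inc = ilu0(a)

# Nonzero elements that the complete factors have and the incomplete factors do not.
fill_in = (np.count_nonzero(l_full) + np.count_nonzero(u_full)
           - np.count_nonzero(l_inc) - np.count_nonzero(u_inc))
```

For this test matrix `fill_in` is positive, while `l_inc` and `u_inc` have nonzero elements only where the lower and upper triangular parts of `a` do. Dense arrays are used only to keep the sketch short; a practical implementation would use a sparse storage format.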

This conventional technique is described in detail in Kenro Murata et al., "Supercomputers: Applications to Scientific and Technical Computation" (Maruzen).

[Problems to Be Solved by the Invention]

Suppose, for example, that the simultaneous linear equations $Au = f$ are to be solved with a plurality of processors, denoted $\mathrm{PU}_i$ ($i = 1, \ldots, m$). The unknown vector is partitioned into $m$ blocks $u_i$ ($i = 1, \ldots, m$) as $u = [u_1, \ldots, u_m]$, and each $u_i$ is computed by $\mathrm{PU}_i$.

In forward and backward substitution, a nonzero element in the off-diagonal part of the matrix means that an earlier result is fed back into the calculation. For example, if the matrix elements adjacent to the diagonal are nonzero, the substitution becomes a sequential calculation in which the next row can be started only after the previous row has been finished. In the conventional method, which performs incomplete LU decomposition on all nonzero elements of the coefficient matrix, the forward and backward substitutions with $L$ and $U$ therefore proceed sequentially from one processor $\mathrm{PU}_i$ ($i = 1, \ldots, m$) to the next, which prevents parallel processing from speeding up the calculation.
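The row-to-row dependency described above can be made concrete with a small Python sketch. The function names `forward_substitution` and `longest_chain` and the bidiagonal test matrix are assumptions chosen for illustration. `longest_chain` counts how many rows lie on the longest sequence of rows that must be solved one after another.

```python
import numpy as np


def forward_substitution(l, f):
    """Solve L v = f for a lower triangular matrix L, one row at a time."""
    n = len(f)
    v = np.zeros(n)
    for i in range(n):
        # Row i uses every earlier v[j] whose coefficient l[i, j] is nonzero.
        v[i] = (f[i] - l[i, :i] @ v[:i]) / l[i, i]
    return v


def longest_chain(l):
    """Number of rows on the longest chain of rows that must be solved in sequence."""
    n = l.shape[0]
    depth = np.ones(n, dtype=int)
    for i in range(n):
        for j in range(i):
            if l[i, j] != 0:
                depth[i] = max(depth[i], depth[j] + 1)
    return int(depth.max())


n = 8
# Lower bidiagonal factor: every row is coupled to the row just above it.
l_bidiagonal = 2.0 * np.eye(n) - np.eye(n, k=-1)
f = np.ones(n)
v = forward_substitution(l_bidiagonal, f)

chain_bidiagonal = longest_chain(l_bidiagonal)   # every row waits for the previous one
chain_diagonal = longest_chain(2.0 * np.eye(n))  # no off-diagonal coupling at all
```

With the bidiagonal factor the chain spans all `n` rows, so no two rows can be processed at the same time, whereas a factor without off-diagonal elements has a chain of length one.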

Accordingly, the problem to be solved by the present invention, in other words its object, is to provide a simultaneous linear equation analysis device that remedies the above problem by applying a procedure that enables parallel processing by a plurality of processors.

[Means for Solving the Problems]

The simultaneous linear equation analysis device of the present invention takes as input data the coefficient matrix and coefficient vector of simultaneous linear equations obtained by discrete approximation of a partial differential equation, and outputs an approximate solution of those equations. It comprises a coefficient matrix decomposition circuit that decomposes the coefficient matrix into an approximate matrix on which the inverse matrix operation can be performed; a forward/backward substitution circuit that applies the inverse matrix operation of the approximate matrix generated by the coefficient matrix decomposition circuit to vector data input from an iterative calculation circuit and outputs the result to the iterative calculation circuit; and the iterative calculation circuit connected to it. The coefficient matrix decomposition circuit partitions the coefficient matrix into block matrices and performs incomplete Choleski decomposition or incomplete LU decomposition on each diagonal block matrix. The forward/backward substitution circuit performs the inverse matrix operation of the approximate matrix by applying forward and backward substitution with each incompletely decomposed diagonal block matrix to the vector data input from the iterative calculation circuit.

[Operation]

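In the present invention, when the equation $Au = f$ is written, corresponding to the partition $u = [u_1, \ldots, u_m]$ of the unknown vector $u$, as

$$
\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1m} \\ A_{21} & A_{22} & \cdots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mm} \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix}
=
\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{bmatrix},
$$

each of the diagonal blocks $A_{11}, A_{22}, \ldots, A_{mm}$ is incompletely LU-decomposed. The incomplete LU decomposition of $A_{ii}$ is written $L_{ii} U_{ii} = M_{ii}$, and the approximate matrix for the inverse matrix calculation, denoted $M$, is defined as

$$
M = \begin{bmatrix} M_{11} & & & \\ & M_{22} & & \\ & & \ddots & \\ & & & M_{mm} \end{bmatrix}.
$$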

The inverse matrix operation of the approximate matrix $M$ can then be performed independently for each diagonal block $M_{ii}$, that is,

$$
M^{-1} = \begin{bmatrix} M_{11}^{-1} & & & \\ & M_{22}^{-1} & & \\ & & \ddots & \\ & & & M_{mm}^{-1} \end{bmatrix}.
$$

The forward/backward substitution with $M$, $v = M^{-1} g$, is therefore replaced by independent forward/backward substitutions $v_i = M_{ii}^{-1} g_i$ ($i = 1, \ldots, m$), one on each processor $\mathrm{PU}_i$.
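The block-by-block inverse operation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented circuit: the function names are hypothetical, a thread pool stands in for the processors, and NumPy's complete Cholesky factorization stands in for the incomplete decomposition. For the tridiagonal test matrix used here the two coincide, because factoring a tridiagonal block produces no fill-in.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def forward(l, g):
    """Forward substitution: solve L v = g for lower triangular L."""
    v = np.zeros(len(g))
    for i in range(len(g)):
        v[i] = (g[i] - l[i, :i] @ v[:i]) / l[i, i]
    return v


def backward(u, v):
    """Backward substitution: solve U x = v for upper triangular U."""
    x = np.zeros(len(v))
    for i in reversed(range(len(v))):
        x[i] = (v[i] - u[i, i + 1:] @ x[i + 1:]) / u[i, i]
    return x


def factor_blocks(a, sizes):
    """Factor each diagonal block A_ii = L_ii L_ii^T; off-diagonal blocks are ignored."""
    factors, start = [], 0
    for s in sizes:
        factors.append(np.linalg.cholesky(a[start:start + s, start:start + s]))
        start += s
    return factors


def apply_m_inverse(factors, g):
    """Compute v = M^{-1} g block by block; no block needs data from another block."""
    offsets = np.cumsum([0] + [l.shape[0] for l in factors])

    def solve(i):
        g_i = g[offsets[i]:offsets[i + 1]]
        return backward(factors[i].T, forward(factors[i], g_i))

    # Each task plays the role of one processor working on its own block.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(solve, range(len(factors))))
    return np.concatenate(parts)


n, sizes = 6, [2, 2, 2]
a = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
factors = factor_blocks(a, sizes)
g = np.arange(1.0, n + 1.0)
v = apply_m_inverse(factors, g)
```

Because each call to `solve` touches only its own factor and its own slice of `g`, the tasks can run in any order or at the same time, which is the property the text relies on.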

[Example]

The present invention is described below with reference to the drawings.

FIG. 1 is a basic configuration diagram showing an embodiment of the present invention, and FIG. 2 is a diagram of the device configuration. In FIG. 2, the simultaneous linear equation analysis device consists of $m$ processors $\mathrm{PU}_1, \ldots, \mathrm{PU}_m$. Each $\mathrm{PU}_i$ can access its local memory $\mathrm{LM}_i$ and the shared memories $\mathrm{SM}_{i-1}$ and $\mathrm{SM}_i$. A control processor CP can access all the shared memories $\mathrm{SM}_i$ ($i = 1, \ldots, m$). The coefficient matrix decomposition circuit 1, the forward/backward substitution circuit 2, and the iterative calculation circuit 3 of FIG. 1 are each implemented independently within every $\mathrm{PU}_i$.

In the simultaneous linear equations $Au = f$, $A$ is assumed to be a symmetric positive definite matrix, and the iterative calculation circuit uses the conjugate gradient method. The equations are assumed to have the block tridiagonal structure

$$
\begin{bmatrix} A_{11} & A_{12} & & \\ A_{21} & A_{22} & \ddots & \\ & \ddots & \ddots & A_{m-1,m} \\ & & A_{m,m-1} & A_{mm} \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix}
=
\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{bmatrix}.
$$

Equations obtained by the finite element method, the finite difference method, and similar methods have this structure. To solve these equations with the device of FIG. 2, the submatrix $A_i = [A_{i,i-1}\ A_{ii}\ A_{i,i+1}]$ and the subvector $f_i$ are assigned to the local memory $\mathrm{LM}_i$, the approximate solution vector $u_i$ is assigned to the shared memories $\mathrm{SM}_{i-1}$ and $\mathrm{SM}_i$, and each $u_i$ is computed by the corresponding $\mathrm{PU}_i$.

$\mathrm{PU}_i$ uses its coefficient matrix decomposition circuit to read $A_{ii}$ from $\mathrm{LM}_i$, incompletely decompose it into $L_{ii} L_{ii}^{T}$, and write the result back to $\mathrm{LM}_i$. Its forward/backward substitution circuit takes a vector input from the iterative calculation circuit, for example $g_i$, and outputs $v_i = (L_{ii} L_{ii}^{T})^{-1} g_i$.

FIG. 3 shows the flowchart of the iterative calculation circuit of $\mathrm{PU}_i$. The basic flow follows the preconditioned conjugate gradient method. In step 301, $u_{i-1}$, $u_i$, and $u_{i+1}$ denote the respective initial values, and $\mathrm{PU}_i$ can read $u_{i-1}$ and $u_{i+1}$ from $\mathrm{SM}_{i-1}$ and $\mathrm{SM}_i$, respectively.

Similarly, by writing $p_i$ to $\mathrm{SM}_{i-1}$ and $\mathrm{SM}_i$ in steps 302 and 312, each $\mathrm{PU}_i$ can read $p_{i-1}$ and $p_{i+1}$ directly in step 305. In steps 302 and 308, a vector is sent to the forward/backward substitution circuit, which returns the vector multiplied by $M^{-1}$. As stated above, this calculation can be performed in parallel on all PUs. In steps 303 and 304, the inner product $\gamma = (p, r) = \sum_i (p_i, r_i) = \sum_i \gamma_i$ is calculated by having each $\mathrm{PU}_i$ compute $\gamma_i$ and the CP compute their sum. The same applies to steps 307, 308 and 309, 310. The CP tests for convergence using $\varepsilon = \sum_i \varepsilon_i$ and, when the iteration has converged, instructs each $\mathrm{PU}_i$ to terminate (step 311).
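The overall iteration can be sketched in Python as a preconditioned conjugate gradient loop with a block-diagonal preconditioner. The sketch is written under several assumptions and does not reproduce the step numbering of FIG. 3: the function name `pcg_block_diagonal` is hypothetical, `np.linalg.cholesky` and `np.linalg.solve` stand in for the incomplete decomposition and the forward/backward substitution, the loops over blocks run serially where the device would use one processor per block, and the stopping rule (relative residual norm) is a common choice rather than the test defined in the patent.

```python
import numpy as np


def pcg_block_diagonal(a, f, sizes, tol=1e-10, max_iter=200):
    """Preconditioned CG with M = blockdiag(A_11, ..., A_mm), each block factored as L L^T."""
    offsets = np.cumsum([0] + list(sizes))
    blocks = [slice(offsets[i], offsets[i + 1]) for i in range(len(sizes))]
    chol = [np.linalg.cholesky(a[s, s]) for s in blocks]

    def apply_m_inverse(g):
        v = np.empty_like(g)
        for s, l in zip(blocks, chol):  # one independent solve per block
            v[s] = np.linalg.solve(l.T, np.linalg.solve(l, g[s]))
        return v

    def inner(x, y):
        partial = [float(x[s] @ y[s]) for s in blocks]  # one partial sum per block
        return sum(partial)                               # total, as formed by the CP

    def a_times(p):
        q = np.empty_like(p)
        for s in blocks:  # each block row is multiplied on its own
            q[s] = a[s, :] @ p
        return q

    u = np.zeros_like(f)
    r = f.copy()  # residual for the zero initial guess
    z = apply_m_inverse(r)
    p = z.copy()
    rz = inner(r, z)
    f_norm = np.sqrt(inner(f, f))
    for iteration in range(max_iter):
        if np.sqrt(inner(r, r)) <= tol * f_norm:
            return u, iteration
        q = a_times(p)
        alpha = rz / inner(p, q)
        u = u + alpha * p
        r = r - alpha * q
        z = apply_m_inverse(r)
        rz_next = inner(r, z)
        p = z + (rz_next / rz) * p
        rz = rz_next
    return u, max_iter


n = 12
a = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
u, iterations = pcg_block_diagonal(a, f, sizes=[4, 4, 4])
```

Only the inner products need a value collected from every block, which matches the role given to the CP in the text. Calling the function with different `sizes` gives a rough feel for how the iteration count depends on the block partition.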

[Effect of the Invention]

As is clear from the above description, the present invention allows the forward/backward substitution for the inverse matrix operation with the approximate matrix to be processed in parallel, so that a parallel computer consisting of a plurality of processors can be used effectively. Because incomplete LU decomposition is restricted to the diagonal blocks, convergence is expected to be poorer than with the conventional method, which decomposes the entire coefficient matrix, and to worsen as the block partition becomes finer. However, experiments have confirmed that even in the worst case, where the number of processors equals the number of unknowns of the equations so that each unknown corresponds to one block, the number of iterations increases only by a factor of about three. The number of processors is normally much smaller than this and the blocks are correspondingly large, so in that range the loss of convergence is hardly a problem.

[Brief Description of the Drawings]

FIG. 1 is a basic configuration diagram showing an embodiment of the present invention, FIG. 2 is a structural diagram of the parallel computing device, and FIG. 3 is a flow diagram showing the processing procedure of the iterative calculation circuit.

1: coefficient matrix decomposition circuit
2: forward/backward substitution circuit
3: iterative calculation circuit
11: input data (matrix A, vector f)
12: incompletely LU-decomposed matrices of the approximate matrix M of A
13: vector (denoted g)
14: vector v ($= M^{-1} g$ or $M_{ii}^{-1} g_i$)
15: approximate solution vector of the equation Au = f

Labels in FIG. 2: CP, control processor; SM, shared memory; PU, processor unit; LM, local memory.

Claims

[Claims]

A simultaneous linear equation analysis device that takes as input data the coefficient matrix and coefficient vector of simultaneous linear equations obtained by discrete approximation of a partial differential equation and outputs an approximate solution of the simultaneous linear equations, the device comprising: a coefficient matrix decomposition circuit that decomposes the coefficient matrix into an approximate matrix on which an inverse matrix operation can be performed; a forward/backward substitution circuit that applies the inverse matrix operation of the approximate matrix generated by the coefficient matrix decomposition circuit to vector data input from an iterative calculation circuit described below and outputs the result to the iterative calculation circuit; and the iterative calculation circuit connected thereto, wherein the coefficient matrix decomposition circuit partitions the coefficient matrix into block matrices and performs incomplete Choleski decomposition or incomplete LU decomposition on each diagonal block matrix, and the forward/backward substitution circuit performs the inverse matrix operation of the approximate matrix by applying, to the vector data input from the iterative calculation circuit, forward and backward substitution with each incompletely decomposed diagonal block matrix.