JPS63127366A - Analyzer for simultaneous linear equations system - Google Patents
Analyzer for simultaneous linear equations system
Info
- Publication number
- JPS63127366A
- Authority
- JP
- Japan
- Prior art keywords
- matrix
- circuit
- approximate
- coefficient
- decomposition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Complex Calculations (AREA)
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION
[Field of Industrial Application]
The present invention relates to an apparatus for analyzing systems of simultaneous linear equations, and in particular to such an apparatus based on the finite element analysis method.
As a conventional technique, a method is known in which incomplete Cholesky decomposition or incomplete LU decomposition (hereinafter collectively called incomplete LU decomposition) is applied to all nonzero elements of the coefficient matrix, and the resulting matrix is used as an approximate matrix for the inverse-matrix operation. In terms of the configuration of the present invention (FIG. 1), the coefficient matrix decomposition circuit performs incomplete LU decomposition on all nonzero elements of the coefficient matrix, and the forward/backward substitution circuit performs the inverse-matrix operation using the approximate matrix obtained by that incomplete LU decomposition.
The LU decomposition of a matrix $A$ factors $A$ into the product $LU$ ($= A$) of a lower triangular matrix $L$ and an upper triangular matrix $U$. With it, the unknown vector $u$ of an equation $Au = f$, for example, is obtained by $v = L^{-1} f$ (forward substitution) followed by $u = U^{-1} v$ (backward substitution). In simultaneous linear equations obtained by discrete approximation of partial differential equations, however, the coefficient matrix $A$ is sparse (the proportion of nonzero elements among all matrix elements is low), whereas the LU-decomposed matrices contain more nonzero elements than $A$. The appearance of these additional nonzero elements is usually called fill-in. In large-scale problems the growth in memory usage and operation count caused by fill-in becomes a bottleneck and makes the computation practically infeasible. The incomplete LU decomposition method therefore performs the LU decomposition while ignoring the fill-in part (approximating it by zero). Consequently, the matrices $L$ and $U$ obtained by applying incomplete LU decomposition to all nonzero elements of the coefficient matrix have the same structure as the lower triangular part and the upper triangular part of $A$, respectively.
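As a rough illustration of how fill-in is ignored, the following Python sketch computes an incomplete Cholesky factor that keeps only the positions where the lower triangle of $A$ is nonzero, and then solves with it by forward and backward substitution. It is not taken from the patent: the function names `incomplete_cholesky` and `forward_backward`, the dense list-of-lists storage, and the 4x4 example matrix are assumptions made for illustration, and the sketch assumes a symmetric positive definite matrix for which the incomplete factorization does not break down.

```python
import math


def incomplete_cholesky(A):
    """Return a lower triangular L with the sparsity pattern of tril(A).

    Positions where A is zero are never filled, so L * L^T only
    approximates A whenever an exact factorization would create fill-in.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry of column j.
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            if A[i][j] == 0.0:
                continue  # fill-in position: approximated by zero
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_backward(L, f):
    """Solve (L L^T) u = f by forward, then backward substitution."""
    n = len(L)
    v = [0.0] * n
    for i in range(n):  # forward substitution: L v = f
        v[i] = (f[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i]
    u = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T u = v
        u[i] = (v[i] - sum(L[k][i] * u[k] for k in range(i + 1, n))) / L[i][i]
    return u


# Hypothetical example: 5-point Laplacian on a 2x2 grid.
A = [
    [4.0, -1.0, -1.0, 0.0],
    [-1.0, 4.0, 0.0, -1.0],
    [-1.0, 0.0, 4.0, -1.0],
    [0.0, -1.0, -1.0, 4.0],
]
L = incomplete_cholesky(A)
u_approx = forward_backward(L, [1.0, 1.0, 1.0, 1.0])
```

In this example the entry of $A$ in row 2, column 1 (zero-based) is zero, so the sketch leaves the corresponding entry of `L` at zero, although an exact Cholesky factorization would produce a nonzero value there.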
This conventional technique is described in detail in Kenro Murata et al., "Supercomputers: Applications to Scientific and Technical Calculations" (Maruzen).
[Problems to Be Solved by the Invention]
For example, suppose that a plurality of processors (denoted $PU_i$, $i = 1, \ldots, m$) are used to solve the simultaneous linear equations $Au = f$, that the unknown vector is divided into $m$ blocks $u_i$ ($i = 1, \ldots, m$) as $u = [u_1, \ldots, u_m]$, and that each $u_i$ is computed by $PU_i$.
In forward and backward substitution, a nonzero element located in the off-diagonal part of the matrix implies feedback of an earlier result. For example, if the matrix elements adjacent to the diagonal are nonzero, the substitution becomes a sequential computation in which the calculation for one row can start only after the calculation for the preceding row has finished. Therefore, in the conventional method that applies incomplete LU decomposition to all nonzero elements of the coefficient matrix, the forward and backward substitutions with $L$ and $U$ are processed sequentially, one processor $PU_i$ ($i = 1, \ldots, m$) after another, which prevents speedup by parallel processing.
Accordingly, the problem to be solved by the present invention, in other words its object, is to provide an apparatus for analyzing simultaneous linear equations that remedies the above problem by adopting a procedure that enables parallel processing by a plurality of processors.
[Means for Solving the Problems]
The apparatus for analyzing simultaneous linear equations according to the present invention takes as input data the coefficient matrix and coefficient vector of simultaneous linear equations obtained by discrete approximation of a partial differential equation, and outputs an approximate solution of those equations. It comprises a coefficient matrix decomposition circuit that decomposes the coefficient matrix into an approximate matrix on which the inverse-matrix operation can be performed; a forward/backward substitution circuit that applies the inverse-matrix operation of the approximate matrix generated by the coefficient matrix decomposition circuit to vector data received from the iterative calculation circuit described below and returns the result to the iterative calculation circuit; and the iterative calculation circuit connected to it. The coefficient matrix decomposition circuit divides the coefficient matrix into block matrices and performs incomplete Cholesky decomposition or incomplete LU decomposition on each diagonal block matrix. The forward/backward substitution circuit performs the inverse-matrix operation of the approximate matrix by applying forward and backward substitution with each incompletely decomposed diagonal block matrix to the vector data received from the iterative calculation circuit.
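In the present invention, the equation $Au = f$ is written in block form corresponding to the partition $u = [u_1, \ldots, u_m]$ of the unknown vector,

$$\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1m} \\ A_{21} & A_{22} & \cdots & A_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mm} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{bmatrix},$$

and each of the diagonal blocks $A_{11}, A_{22}, \ldots, A_{mm}$ is incompletely LU-decomposed. The incompletely LU-decomposed matrix of $A_{ii}$ is written $L_{ii} U_{ii} = M_{ii}$, and the approximate matrix for the inverse-matrix computation (denoted $M$) is defined as

$$M = \begin{bmatrix} M_{11} & & & \\ & M_{22} & & \\ & & \ddots & \\ & & & M_{mm} \end{bmatrix}.$$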
With this definition, the inverse-matrix operation of the approximate matrix $M$ can be carried out independently for each diagonal block $M_{ii}$. That is,

$$M^{-1} = \begin{bmatrix} M_{11}^{-1} & & & \\ & M_{22}^{-1} & & \\ & & \ddots & \\ & & & M_{mm}^{-1} \end{bmatrix}.$$

Therefore the forward and backward substitution with $M$, $v = M^{-1} g$, is replaced by independent forward and backward substitutions $v_i = M_{ii}^{-1} g_i$ ($i = 1, \ldots, m$), one on each processor $PU_i$.
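The independence of the per-block solves can be sketched in Python as follows. This is only an illustration under stated assumptions, not the circuit described in the patent: the helper names (`ic0`, `fb_solve`, `diagonal_blocks`, `apply_block_inverse`), the dense list storage, the symmetric positive definite example matrix, and the use of a thread pool as a stand-in for the processors $PU_i$ are all hypothetical choices.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def ic0(B):
    """Incomplete Cholesky factor of one diagonal block (no fill-in)."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(B[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            if B[i][j] != 0.0:  # zero positions stay zero
                s = sum(L[i][k] * L[j][k] for k in range(j))
                L[i][j] = (B[i][j] - s) / L[j][j]
    return L


def fb_solve(L, g):
    """Solve (L L^T) v = g by forward and backward substitution."""
    n = len(L)
    w = [0.0] * n
    for i in range(n):
        w[i] = (g[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (w[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def diagonal_blocks(A, sizes):
    """Extract the diagonal blocks A_ii for the given block sizes."""
    blocks, start = [], 0
    for s in sizes:
        blocks.append([row[start:start + s] for row in A[start:start + s]])
        start += s
    return blocks


def apply_block_inverse(factors, g, sizes):
    """Compute v = M^{-1} g, where M is block diagonal with M_ii = L_ii L_ii^T.

    Each block solve uses only its own g_i, so the solves are submitted
    to a thread pool to emphasise that they do not depend on each other.
    """
    parts, start = [], 0
    for s in sizes:
        parts.append(g[start:start + s])
        start += s
    with ThreadPoolExecutor() as pool:
        solved = list(pool.map(fb_solve, factors, parts))
    return [x for part in solved for x in part]


# Hypothetical example: 4x4 matrix split into two 2x2 blocks.
A = [
    [4.0, -1.0, -1.0, 0.0],
    [-1.0, 4.0, 0.0, -1.0],
    [-1.0, 0.0, 4.0, -1.0],
    [0.0, -1.0, -1.0, 4.0],
]
sizes = [2, 2]
factors = [ic0(block) for block in diagonal_blocks(A, sizes)]
g = [1.0, 0.0, 0.0, 1.0]
v = apply_block_inverse(factors, g, sizes)
```

The off-diagonal blocks of `A` are deliberately not used when forming `factors`; that omission is what removes the dependence between blocks during substitution.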
The present invention is described below with reference to the drawings.
FIG. 1 is a basic configuration diagram showing an embodiment of the present invention, and FIG. 2 is a diagram of the apparatus configuration. In FIG. 2, the apparatus for analyzing simultaneous linear equations consists of $m$ processors $PU_1, \ldots, PU_m$, and each $PU_i$ can access its local memory $LM_i$ and the shared memories $SM_{i-1}$ and $SM_i$. The control processor CP can access all shared memories $SM_i$ ($i = 1, \ldots, m$). The coefficient matrix decomposition circuit 1, the forward/backward substitution circuit 2 and the iterative calculation circuit 3 of FIG. 1 are configured independently within each $PU_i$.
In the simultaneous linear equations $Au = f$, $A$ is assumed to be a symmetric positive definite matrix, and the iterative calculation circuit is assumed to use the conjugate gradient method. The equations are assumed to have the block tridiagonal structure

$$\begin{bmatrix} A_{11} & A_{12} & & \\ A_{21} & A_{22} & A_{23} & \\ & \ddots & \ddots & \ddots \\ & & A_{m,m-1} & A_{mm} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{bmatrix}.$$

Equations obtained by the finite element method, the finite difference method and similar methods have this structure. To solve such equations with the apparatus of FIG. 2, the submatrix $A_i = [A_{i,i-1} \; A_{ii} \; A_{i,i+1}]$ and the subvector $f_i$ are assigned to the local memory $LM_i$, the approximate solution vector $u_i$ is assigned to the shared memories $SM_{i-1}$ and $SM_i$, and each $u_i$ is computed by $PU_i$.
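The data layout above implies that a processor needs only its own block row and the vectors of its two neighbours to form its part of a matrix-vector product. The following Python sketch shows this under stated assumptions: the function names `matvec` and `block_row_product`, the use of `None` for a missing neighbour at the first and last block, and the 4x4 example are hypothetical and not part of the patent.

```python
def matvec(B, x):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(b * xj for b, xj in zip(row, x)) for row in B]


def block_row_product(A_left, A_diag, A_right, u_prev, u_own, u_next):
    """Return (A u)_i from the block row A_i = [A_left  A_diag  A_right].

    A_left / A_right are None for the first / last block row, which has
    no neighbour on that side.
    """
    y = matvec(A_diag, u_own)
    if A_left is not None:
        y = [a + b for a, b in zip(y, matvec(A_left, u_prev))]
    if A_right is not None:
        y = [a + b for a, b in zip(y, matvec(A_right, u_next))]
    return y


# Hypothetical example with m = 2 blocks of size 2.
A11 = [[4.0, -1.0], [-1.0, 4.0]]
A12 = [[-1.0, 0.0], [0.0, -1.0]]
A21 = [[-1.0, 0.0], [0.0, -1.0]]
A22 = [[4.0, -1.0], [-1.0, 4.0]]
u1, u2 = [1.0, 2.0], [3.0, 4.0]

# What the first and second processor would each compute.
y1 = block_row_product(None, A11, A12, None, u1, u2)
y2 = block_row_product(A21, A22, None, u1, u2, None)
```

Here `u_prev` and `u_next` play the role of the neighbour vectors that $PU_i$ reads from the shared memories, while the block row itself would reside in local memory.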
In its coefficient matrix decomposition circuit, $PU_i$ reads $A_{ii}$ from $LM_i$, incompletely decomposes it into $L_{ii} L_{ii}^{T}$, and writes the result back to $LM_i$. In the forward/backward substitution circuit, for a vector received from the iterative calculation circuit, for example $g_i$, it outputs $v_i = (L_{ii} L_{ii}^{T})^{-1} g_i$.
FIG. 3 shows the flowchart of the iterative calculation circuit of $PU_i$. The basic flow follows the preconditioned conjugate gradient method. In step 301, $u_{i-1}$, $u_i$ and $u_{i+1}$ denote the respective initial values, and $PU_i$ can read $u_{i-1}$ and $u_{i+1}$ from $SM_{i-1}$ and $SM_i$, respectively. Similarly, because $p_i$ is written to $SM_{i-1}$ and $SM_i$ in steps 302 and 312, each $PU_i$ can read $p_{i-1}$ and $p_{i+1}$ directly in step 305. In steps 302 and 308 a vector is sent to the forward/backward substitution circuit, and the result of multiplying it by $M^{-1}$ is obtained. As stated earlier, this computation can be performed in parallel on every PU. In steps 303 and 304, the inner product $(p, r) = \sum_i (p_i, r_i)$ is computed by having each $PU_i$ compute its partial inner product $(p_i, r_i)$ and having the CP compute the sum. The same applies to steps 307, 308 and to steps 309, 310. The CP judges convergence from $\varepsilon = \sum_i \varepsilon_i$ and, on convergence, instructs every $PU_i$ to terminate (311).
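The overall iteration can be sketched in Python as a textbook preconditioned conjugate gradient loop in which every inner product is formed as a sum of per-block partial sums, mirroring the division of work between the $PU_i$ and the CP. This is a minimal sketch under stated assumptions and does not reproduce the numbered steps of FIG. 3: the names `pcg`, `block_dot` and `apply_Minv`, the tolerance, and the test problem are hypothetical, and the demonstration uses a diagonal preconditioner (each unknown forming its own block) purely to keep the example short.

```python
def matvec(A, x):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def block_dot(x, y, sizes):
    """Inner product formed as a sum of per-block partial inner products."""
    partial, start = [], 0
    for s in sizes:
        xs, ys = x[start:start + s], y[start:start + s]
        partial.append(sum(a * b for a, b in zip(xs, ys)))  # one PU's share
        start += s
    return sum(partial)  # the summation a control processor would do


def pcg(A, f, apply_Minv, sizes, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for a symmetric positive definite A.

    apply_Minv(r) must return M^{-1} r for the chosen approximate matrix M.
    Assumes f is not the zero vector.
    """
    u = [0.0] * len(f)
    r = list(f)  # residual f - A u for the initial guess u = 0
    z = apply_Minv(r)
    p = list(z)
    rz = block_dot(r, z, sizes)
    for it in range(1, max_iter + 1):
        q = matvec(A, p)
        alpha = rz / block_dot(p, q, sizes)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        if block_dot(r, r, sizes) <= tol * tol:  # convergence test
            return u, it
        z = apply_Minv(r)
        rz_new = block_dot(r, z, sizes)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u, max_iter


# Hypothetical test problem: 1D Laplacian of order 8 with exact solution 1.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
f = matvec(A, [1.0] * n)
sizes = [1] * n  # one unknown per block


def apply_Minv(r):
    """Diagonal approximate matrix: M_ii is the single entry a_ii."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


u, iters = pcg(A, f, apply_Minv, sizes)
```

Replacing `apply_Minv` with a routine that solves with the incompletely factored diagonal blocks, and `sizes` with the corresponding block sizes, would give the block variant; only the preconditioner and the partition change, not the loop.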
As is clear from the above description, the present invention makes it possible to process in parallel the forward and backward substitution used for the inverse-matrix operation with the approximate matrix, so that a parallel computer composed of a plurality of processors can be used effectively. Because the range over which incomplete LU decomposition is performed is limited to the diagonal blocks, convergence is expected to be poorer than in the conventional method, which decomposes the entire coefficient matrix, and the finer the block division, the poorer the convergence. However, it has been confirmed by experiment that even in the worst case, in which the number of processors equals the order of the equations so that one unknown corresponds to one block, the number of iterations increases only by a factor of about three. The number of processors is normally much smaller, and the blocks correspondingly larger, so within that range the loss of convergence is hardly a problem.
FIG. 1 is a basic configuration diagram showing an embodiment of the present invention, FIG. 2 is a structural diagram of the parallel computing apparatus, and FIG. 3 is a flow diagram showing the processing procedure of the iterative calculation circuit.
1: coefficient matrix decomposition circuit, 2: forward/backward substitution circuit, 3: iterative calculation circuit, 11: input data (matrix $A$, vector $f$), 12: decomposed matrices of the approximate matrix $M$ of $A$, 13: vector (tentatively denoted $g$), 14: vector $v$ ($= M^{-1} g$), 15: approximate solution vector $u$ of the equation $Au = f$.
FIG. 2 legend: CP: control processor, SM: shared memory, PU: processor unit, LM: local memory.
Claims (1)
An apparatus for analyzing simultaneous linear equations, which takes as input data the coefficient matrix and coefficient vector of simultaneous linear equations obtained by discrete approximation of a partial differential equation and outputs an approximate solution of the simultaneous linear equations, the apparatus comprising: a coefficient matrix decomposition circuit that decomposes the coefficient matrix into an approximate matrix on which an inverse-matrix operation can be performed; a forward/backward substitution circuit that applies the inverse-matrix operation of the approximate matrix generated by the coefficient matrix decomposition circuit to vector data received from an iterative calculation circuit and outputs the result to the iterative calculation circuit; and the iterative calculation circuit connected thereto, characterized in that the coefficient matrix decomposition circuit divides the coefficient matrix into block matrices and performs incomplete Cholesky decomposition or incomplete LU decomposition on each diagonal block matrix, and the forward/backward substitution circuit performs the inverse-matrix operation of the approximate matrix by applying forward and backward substitution with each incompletely decomposed diagonal block matrix to the vector data received from the iterative calculation circuit.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP27471886A JPS63127366A (en) | 1986-11-17 | 1986-11-17 | Analyzer for simultaneous linear equations system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP27471886A JPS63127366A (en) | 1986-11-17 | 1986-11-17 | Analyzer for simultaneous linear equations system |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS63127366A true JPS63127366A (en) | 1988-05-31 |
Family
ID=17545602
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP27471886A Pending JPS63127366A (en) | 1986-11-17 | 1986-11-17 | Analyzer for simultaneous linear equations system |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS63127366A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0444165A (en) * | 1990-06-12 | 1992-02-13 | Nec Corp | Solution making system for symmetrical linear equations |
CN107423259A (en) * | 2017-06-22 | 2017-12-01 | 东南大学 | A kind of GPU of domino optimization accelerates trigonometric equation group back substitution method on electric power |
-
1986
- 1986-11-17 JP JP27471886A patent/JPS63127366A/en active Pending