JPH03265066A

JPH03265066A - Determinant solving processing system

Info

Publication number: JPH03265066A
Application number: JP6475990A
Authority: JP
Inventors: Hiroyuki Sato; 弘幸佐藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-03-15
Filing date: 1990-03-15
Publication date: 1991-11-26

Abstract

PURPOSE: To execute parallel processing efficiently by having a plurality of processors, each assigned the elements at positions interleaved in units of rows, carry out the processing in parallel. CONSTITUTION: Processor 2-0 is assigned, for instance, the elements of rows 1, 4, 7, and 10 of a matrix; processor 2-1 is assigned the elements of rows 2, 5, 8, and 11; and processor 2-2 is assigned rows 3, 6, 9, and 12. The computation proceeds in such a way that processor 2-0 first performs elimination on element (c), processor 2-1 performs elimination on element (a), processor 2-2 performs elimination on element (b), and so on. As a result, each processor computes using the elements initially assigned to it and only needs to receive data for a prescribed number of elements from other processors as necessary.

Description

[Detailed Description of the Invention]

[Summary]

The invention relates to a determinant solution processing method that efficiently executes computations reducible to simultaneous linear equations whose coefficient matrix is a band matrix.

Its purpose is to enable efficient parallel processing in, for example, a distributed-memory parallel processor system in which each processor holds its own share of the memory.

Each of a plurality of processors is assigned the elements at interleaved positions in the row direction of the matrix, or at interleaved positions in both the row and column directions, and executes processing on them, communicating with other processors as necessary.

[Industrial application field]

The present invention relates to a determinant solution processing method, and in particular to one that efficiently executes computations reducible to simultaneous linear equations whose coefficient matrix is a band matrix.

Simultaneous linear equations appear frequently in engineering and scientific computation, and their range of application is extremely wide. In particular, problems in which the coefficient matrix is banded arise often in the solution of partial differential equations, in structural analysis, and the like. Demand for solving large systems of simultaneous linear equations at high speed has grown in recent years, but systems with a sparse coefficient matrix are generally difficult to accelerate on vector computers or parallel computers.

[Prior art]

Simultaneous linear equations are generally solved by algorithms known as direct methods or iterative methods. A direct method performs the elimination (sweep-out) calculation on the coefficient matrix or an equivalent procedure; an iterative method starts from an initial value and iterates until it converges to the solution. Each has advantages and drawbacks, and the method suited to each problem is chosen.

Direct methods give more accurate solutions than iterative methods and are relatively simple, so they are often used, especially for problems that demand accuracy. Direct methods have traditionally been run on sequential computers, but vector computers have recently come into use as well.

In recent years there has been demand for solving ever larger systems of simultaneous linear equations at high speed. Because the elimination procedure of a direct method is fundamentally sequential and hard to parallelize, however, it has been difficult to speed it up with a parallel computer.

For simultaneous linear equations whose coefficient matrix has a special form, several parallel solution algorithms exist.

One parallel solution method for tridiagonal matrices is the method of Wang (1). Its distinguishing feature is that the coefficient matrix is divided into blocks and the coefficients are eliminated in parallel.

(1) H. H. Wang: A parallel method for tridiagonal equations, ACM Trans. Math. Softw., Vol. 7, No. 2, pp. 170-183

When this method is used to compute in parallel on a plurality of processors, the elements are allocated to the processors (four in the illustrated case) as shown in Fig. 10.

Fig. 10 shows how elements are allocated in the conventional case. A and b in the figure correspond to A and b in the equation, and PE1 to PE4 denote the respective processors.

Figs. 11 and 12 are explanatory diagrams of Wang's method. Figs. 12(A), (B), (C), and (D) together form a single flowchart.

Suppose that simultaneous linear equations are given whose coefficient matrix is a band matrix of dimension 12 and bandwidth 3, as shown in Fig. 11, and that they are to be solved.

Following the processing flow shown in Figs. 12(A) to 12(D), the coefficients are eliminated step by step until the matrix takes the form (4-5) shown in Fig. 12(D).

A. (Process a) When the matrix (4-1) shown in Fig. 12(A) is to be processed by three processors, the elements are allocated to the processors as indicated by the dotted lines in matrix (4-1).

B. (Processes b.1 to b.3) Of the three processors, the first eliminates elements e2, e3, e4, the second eliminates elements e6, e7, e8, and the third eliminates elements e10, e11, e12. That is, the first processor executes

(row 2) − (row 1) × e2 / d1

to eliminate element e2, the second processor executes

(row 6) − (row 5) × e6 / d5

to eliminate element e6, the third processor executes

(row 10) − (row 9) × e10 / d9

to eliminate element e10, and so on.

Matrix (4-2) in Fig. 12(B) shows the state in which elements e2, e3, e4, e6, e7, e8, e10, e11, e12 have been eliminated. The g elements that appear in matrix (4-2) are produced as a side effect of this elimination.

C. (Processes c.1 to c.3) Next, as shown in Fig. 12(B), elements f2, f6, f10, f1, f5, f9, f4, f8 are eliminated, giving matrix (4-3).

D. (Processes d.1 to d.2) Next, as shown in Fig. 12(C), elements e5, g6, g7, g8, e9, g10, g11, g12 are eliminated, giving matrix (4-4).

E. (Processes e.1 to e.3) Next, as shown in Figs. 12(C) and 12(D), elements h8, h9, h10, f11, h4, h5, h6, f7, h1, h2, f3 are eliminated, giving matrix (4-5).

[Problem to be solved by the invention]

Processing proceeds as shown in Figs. 12(A) to 12(D), but in that case the following problem exists.

For processes (b.1), (b.2), (b.3), (c.1), (c.2), and (c.3), the three processors can work in parallel. For the processing from matrix (4-3) to matrix (4-5), however, processing in one processor must finish before processing in another can begin, so the processing becomes sequential.

When a plurality of processors work while accessing a shared memory, parallel processing can be made possible by, for example, changing the data allocation. In a distributed-memory system, however, the communication cost of reallocating the data grows, and this is not efficient.

The object of the present invention is to enable efficient parallel processing in, for example, a distributed-memory parallel processor system in which each processor holds its own share of the memory.

[Means for solving the problem]

Fig. 1 shows the principle of the present invention. In the figure, reference numeral 1 denotes a host processor, each 2-i a processor (element processor), and 3 a communication network; A and b denote the known quantities of the simultaneous linear equations.

Processor 2-0 is assigned, for example, the elements of rows 1, 4, 7, and 10 of the matrix; processor 2-1 is assigned the elements of rows 2, 5, 8, and 11; and processor 2-2 is assigned rows 3, 6, 9, and 12. That is, in the present invention each processor is assigned elements at interleaved positions.

[Operation]

In the present invention, taking the example shown in Fig. 1, the computation proceeds as follows: first processor 2-0 performs elimination on the element marked c in the figure, processor 2-1 performs elimination on the element marked a, processor 2-2 performs elimination on the element marked b, and so on.

As a result, each processor computes using the elements initially allocated to it and only needs to receive data for a prescribed number of elements from other processors as necessary.

[Example]

Figs. 2(A) to 2(O) illustrate the processing steps of the present invention. The illustrated case is a matrix of dimension 20 and bandwidth 5.

In the figures, ○ marks an element with a nonzero value, a separate mark identifies a pivot element (an element used to eliminate other elements), △ marks a fill-in element (an element with a nonzero value produced during elimination), and a further mark identifies an eliminated element. P denotes a pivot row, E a row being eliminated, and PNO the processor number.

(1) Fig. 2(A): As indicated by PNO in the figure, each of the four processors is allocated the elements at interleaved row positions.

(2) Fig. 2(B): The four processors, in parallel, each receive the data of a pivot row and then use the pivot element to eliminate the elements marked as eliminated in the figure.

(3) Figs. 2(C) to 2(O): Each of the four processors receives the data of the pivot row shown and then uses the marked pivot element to eliminate the marked elements. In this way a matrix in which the elements have been eliminated is obtained, as shown in Fig. 2(O).
Each of the processors receives the data of the illustrated pivot row, and then erases the element indicated by the symbol , using the pivot element indicated by the symbol . As a result, a matrix in which elements have been deleted can be obtained as shown in FIG. 2(0).

Figs. 3(A) to 3(F) show a concrete processing example. For simplicity, the illustrated case has dimension 12 and bandwidth 3.

Fig. 3(A) shows the elements allocated to three processors.

In general, if the number of processors is P, the K-th processor (K = 0, 1, 2, ..., P−1) is assigned the elements of the rows given by K + P × j (j = 0, 1, 2, ...).
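The row-to-processor rule K + P × j can be sketched in a few lines of Python. The function names `cyclic_rows` and `block_rows` are illustrative and do not come from the patent; `block_rows` models the contiguous allocation of the conventional method (Fig. 10) for comparison, under the assumption that the dimension is divisible by the number of processors.

```python
def cyclic_rows(k, num_procs, n):
    """Rows assigned to processor k under the interleaved rule k + P*j."""
    return list(range(k, n, num_procs))


def block_rows(k, num_procs, n):
    """Contiguous rows for processor k, as in the conventional block allocation."""
    size = n // num_procs  # assumes n is divisible by num_procs
    return list(range(k * size, (k + 1) * size))


if __name__ == "__main__":
    # Dimension 12 on three processors, as in the Fig. 3 example (0-based rows).
    for k in range(3):
        print(k, cyclic_rows(k, 3, 12), block_rows(k, 3, 12))
```

With dimension 12 and three processors, processor 0 holds rows 0, 3, 6, 9 under the interleaved rule, whereas the block rule would give it rows 0 to 3.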

Fig. 3(B) shows the same matrix as Fig. 3(A). Starting from the state shown in Fig. 3(B), the following processes are performed.

(Elimination of e1, e5, e9)

Processor 0 sends row 0 to processor 1. Processor 1 eliminates e1 using the coefficients of row 0.

Processor 1 sends row 4 to processor 2. Processor 2 eliminates e5 using the coefficients of row 4.

Processor 2 sends row 8 to processor 0. Processor 0 eliminates e9 using the coefficients of row 8.

(Elimination of e2, e6, e10)

Processor 1 sends row 1 to processor 2. Processor 2 eliminates e2 using the coefficients of row 1.

Processor 2 sends row 5 to processor 0. Processor 0 eliminates e6 using the coefficients of row 5.

Processor 0 sends row 9 to processor 1. Processor 1 eliminates e10 using the coefficients of row 9.

(Elimination of e3, e7, e11)

Processor 2 sends row 2 to processor 0. Processor 0 eliminates e3 using the coefficients of row 2.

Processor 0 sends row 6 to processor 1. Processor 1 eliminates e7 using the coefficients of row 6.

Processor 1 sends row 10 to processor 2. Processor 2 eliminates e11 using the coefficients of row 10.

These processes yield the state shown in Fig. 3(C). The following processes come next.

(Elimination of f1, f5, f9)

Processor 2 sends row 2 to processor 1. Processor 1 eliminates f1 using the coefficients of row 2.

Processor 0 sends row 6 to processor 2. Processor 2 eliminates f5 using the coefficients of row 6.

Processor 1 sends row 10 to processor 0. Processor 0 eliminates f9 using the coefficients of row 10.

(Elimination of f0, f4, f8)

Processor 1 sends row 1 to processor 0. Processor 0 eliminates f0 using the coefficients of row 1.

Processor 2 sends row 5 to processor 1. Processor 1 eliminates f4 using the coefficients of row 5.

Processor 0 sends row 9 to processor 2. Processor 2 eliminates f8 using the coefficients of row 9.

(Elimination of f3, f7)

Processor 1 sends row 4 to processor 0. Processor 0 eliminates f3 using the coefficients of row 4.

Processor 2 sends row 8 to processor 1. Processor 1 eliminates f7 using the coefficients of row 8.

These processes yield the state shown in Fig. 3(D). The following processes come next.

(Elimination of e4, g5, g6, g7)

Processor 0 sends row 3 to processor 1 and processor 2.

Processor 1 eliminates e4 using the coefficients of row 3.

Processor 2 eliminates g5 using the coefficients of row 3.

Processor 0 eliminates g6 using the coefficients of row 3.

Processor 1 eliminates g7 using the coefficients of row 3.

(Elimination of e8, g9, g10, g11)

Processor 1 sends row 7 to processor 2 and processor 0.

Processor 2 eliminates e8 using the coefficients of row 7.

Processor 0 eliminates g9 using the coefficients of row 7.

Processor 1 eliminates g10 using the coefficients of row 7.

Processor 2 eliminates g11 using the coefficients of row 7.

These processes yield the state shown in Fig. 3(E). The following processes come next.

(Elimination of f10, h9, h8, h7)

Processor 2 sends row 11 to processor 1 and processor 0.

Processor 1 eliminates f10 using the coefficients of row 11.

Processor 0 eliminates h9 using the coefficients of row 11.

Processor 2 eliminates h8 using the coefficients of row 11.

Processor 1 eliminates h7 using the coefficients of row 11.

(Elimination of f6, h5, h4, h3)

Processor 1 sends row 7 to processor 0 and processor 2.

Processor 0 eliminates f6 using the coefficients of row 7.

Processor 2 eliminates h5 using the coefficients of row 7.

Processor 1 eliminates h4 using the coefficients of row 7.

Processor 0 eliminates h3 using the coefficients of row 7.

(Elimination of f2, h1, h0)

Processor 0 sends row 3 to processor 2 and processor 1.

Processor 2 eliminates f2 using the coefficients of row 3.

Processor 1 eliminates h1 using the coefficients of row 3.

Processor 0 eliminates h0 using the coefficients of row 3.

These processes leave the matrix diagonalized, as shown in Fig. 3(F). The solution is then obtained from

x_i = b_i / d_i

which is computed by the processor in charge of each row.
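The whole walk-through for Fig. 3 can be checked with a small serial simulation. The sketch below is only an illustration under stated assumptions: the names `owner`, `schedule`, and `solve_interleaved` are hypothetical, the block size of four rows is inferred from the pivot rows in the walk-through, the matrix is stored densely for readability, and recording a tuple stands in for an actual inter-processor message. It is not the patent's implementation.

```python
def owner(row, num_procs):
    """Interleaved allocation: row r is held by processor r mod P."""
    return row % num_procs


def schedule(n, m):
    """Return the elimination steps as lists of (pivot_row, target_rows).

    m is the assumed number of rows per block (4 in the Fig. 3 walk-through).
    Entries inside one step touch different rows and could run in parallel.
    """
    blocks = range(n // m)
    steps = []
    for s in range(m - 1):                 # e elements inside each block
        steps.append([(B * m + s, [B * m + s + 1]) for B in blocks])
    for s in range(m - 2, 0, -1):          # f elements inside each block
        steps.append([(B * m + s, [B * m + s - 1]) for B in blocks])
    steps.append([(B * m, [B * m - 1]) for B in blocks if B > 0])
    for B in blocks[:-1]:                  # forward sweep over block boundaries
        p = B * m + m - 1
        steps.append([(p, list(range(p + 1, p + m + 1)))])
    for B in reversed(blocks):             # backward sweep over block boundaries
        p = B * m + m - 1
        steps.append([(p, [r for r in range(p - 1, p - m - 1, -1) if r >= 0])])
    return steps


def solve_interleaved(A, b, num_procs, m):
    """Diagonalize a tridiagonal system and log (pivot_row, sender, receivers)."""
    n = len(b)
    A = [row[:] for row in A]              # work on copies
    b = b[:]
    messages = []
    for step in schedule(n, m):
        for pivot, targets in step:
            sender = owner(pivot, num_procs)
            receivers = sorted({owner(t, num_procs) for t in targets} - {sender})
            messages.append((pivot, sender, receivers))
            for t in targets:
                factor = A[t][pivot] / A[pivot][pivot]
                for c in range(n):
                    A[t][c] -= factor * A[pivot][c]
                A[t][pivot] = 0.0          # eliminated element
                b[t] -= factor * b[pivot]
    x = [b[i] / A[i][i] for i in range(n)] # x_i = b_i / d_i on the owning processor
    return x, messages
```

Calling `solve_interleaved` on a 12 × 12 tridiagonal matrix with three processors and `m = 4` logs the first transfer as row 0 from processor 0 to processor 1, matching the first line of the walk-through. Every step of the first two phases involves three different receiving processors, which is the point of the interleaved allocation.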

As described above, a solution of simultaneous linear equations whose coefficient matrix is a band matrix can be obtained while a plurality of processors execute the processing efficiently in parallel. This determinant solution processing method can be applied, for example, to solving partial differential equations by the finite-difference method. In that case, a further development of the solution method described with reference to Figs. 2 and 3 is used.

The following describes solution by the ADI (Alternate Direction Implicit) method on parallel processors.

First, the principle of the ADI method is explained using the two-dimensional Laplace equation as an example.

Laplace's equation is given by

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \tag{1}$$

and is solved under given boundary conditions. Let the value on the boundary be, for example, 0:

$$u(x, y) = 0, \quad (x, y) \in C \quad (C \text{ denotes the boundary}) \tag{2}$$

Taking differences of equation (1) turns it into a difference equation. With the central difference taken at $u_{i,j}$,

$$u_{i-1,j} - 2u_{i,j} + u_{i+1,j} + u_{i,j-1} - 2u_{i,j} + u_{i,j+1} = 0 \tag{3}$$

Rearranging gives

$$u_{i,j} = \tfrac{1}{4}\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right) \tag{4}$$

The ADI method solves equation (4) by splitting it into the following two steps.

Step 1:

$$u_{i,j}^{k+1/2} = \tfrac{1}{4}\left(u_{i-1,j}^{k+1/2} + u_{i+1,j}^{k+1/2} + u_{i,j-1}^{k} + u_{i,j+1}^{k}\right) \tag{5}$$

Step 2:

$$u_{i,j}^{k+1} = \tfrac{1}{4}\left(u_{i-1,j}^{k+1/2} + u_{i+1,j}^{k+1/2} + u_{i,j-1}^{k+1} + u_{i,j+1}^{k+1}\right) \tag{6}$$

Steps 1 and 2 are repeated until the change in the values $u_{i,j}$ at all grid points becomes sufficiently small.

Moving the unknown terms of equations (5) and (6) to the left-hand side and the constant terms to the right-hand side, and multiplying both sides by 4, gives equations (7) and (8) respectively (the superscript k is omitted).

$$-u_{i-1,j} + 4u_{i,j} - u_{i+1,j} = a_{i,j}, \qquad a_{i,j} = u_{i,j-1} + u_{i,j+1} \tag{7}$$

$$-u_{i,j-1} + 4u_{i,j} - u_{i,j+1} = b_{i,j}, \qquad b_{i,j} = u_{i-1,j} + u_{i+1,j} \tag{8}$$

From equation (7), simultaneous linear equations coupled in the i direction, with a tridiagonal coefficient matrix, must be solved for each index j. Likewise, from equation (8), the equations are coupled in the j direction and solved for each index i.

The k appearing in the Step 1 and Step 2 equations denotes the iteration count. For example, $u_{i,j-1}^{k}$ is the value at grid point $u_{i,j-1}$ obtained in the k-th iteration. Likewise, $u_{i,j}^{k+1/2}$ is the value at grid point $u_{i,j}$ obtained in Step 1 of the (k+1)-th iteration, which follows the k-th, and $u_{i,j}^{k+1}$ is the value obtained in Step 2 of the (k+1)-th iteration, that is, the value obtained by the (k+1)-th iteration.
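To make the two ADI steps concrete, here is a minimal serial sketch in Python. It assumes the zero boundary value used in the text, stores the grid as a list of lists that includes the boundary ring, and solves each tridiagonal system with the standard Thomas algorithm rather than the parallel elimination described earlier. The names `solve_tridiagonal` and `adi_iteration` are illustrative only.

```python
def solve_tridiagonal(rhs):
    """Solve -x[i-1] + 4*x[i] - x[i+1] = rhs[i], with x = 0 outside the range."""
    n = len(rhs)
    c = [0.0] * n                      # modified super-diagonal
    d = [0.0] * n                      # modified right-hand side
    c[0] = -1.0 / 4.0
    d[0] = rhs[0] / 4.0
    for i in range(1, n):
        denom = 4.0 + c[i - 1]         # 4 - (-1) * c[i-1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def adi_iteration(u):
    """One ADI iteration (Step 1, then Step 2) on a square grid with zero boundary."""
    n = len(u)                         # indices 0 and n-1 are boundary points
    half = [row[:] for row in u]
    for j in range(1, n - 1):          # Step 1: implicit in i, one system per j
        rhs = [u[i][j - 1] + u[i][j + 1] for i in range(1, n - 1)]
        for i, value in enumerate(solve_tridiagonal(rhs), start=1):
            half[i][j] = value
    new = [row[:] for row in half]
    for i in range(1, n - 1):          # Step 2: implicit in j, one system per i
        rhs = [half[i - 1][j] + half[i + 1][j] for j in range(1, n - 1)]
        for j, value in enumerate(solve_tridiagonal(rhs), start=1):
            new[i][j] = value
    return new
```

Each call to `solve_tridiagonal` inside Step 1 or Step 2 is an independent banded system, which is where the interleaved parallel elimination would be applied.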

Figs. 4(A) and 4(B) show how the computational grid region is divided.

Fig. 4(A) shows how processors PE0 to PE3 share the work when solving equation (7), and Fig. 4(B) shows how they share the work when solving equation (8).

The divisions shown in Figs. 4(A) and 4(B) each correspond to the conventional case described with reference to Fig. 10.

With this in mind, the interleaved allocation described with reference to Fig. 2 and elsewhere can be used instead.

Figs. 5(A) and 5(B) each show an interleaved allocation.

For example, allocating as shown in Fig. 5(A) allows the processing for equation (7) to be performed, and allocating as shown in Fig. 5(B) allows the processing for equation (8) to be performed.

Fig. 6 shows one embodiment of two-dimensional interleaved allocation. The illustrated example allocates a matrix of dimension 16 to 16 processors (00), (01), (02), ..., (30), (31), (32), (33).

Taking into account the interleaved allocations of Figs. 5(A) and 5(B), if the elements in the same positional relationship that are allocated to processor PEi and processor PEj in Figs. 5(A) and 5(B) are grouped and allocated to a single processor, then in the case shown in Figs. 5(A) and 5(B) the allocation can be made to nine processors.

Fig. 6 shows this idea extended to a matrix of dimension 16.
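A two-dimensional interleaved allocation of this kind can be sketched as follows. The mapping (i mod P, j mod P) is an assumption made here by analogy with the one-dimensional rule K + P × j; the text does not spell out the exact layout of Fig. 6, and the names `owner_2d` and `points_per_processor` are illustrative.

```python
from collections import Counter


def owner_2d(i, j, p):
    """Processor (k, l) assumed to hold grid point (i, j) on a P x P processor grid."""
    return (i % p, j % p)


def points_per_processor(n, p):
    """Count how many of the n x n grid points each processor receives."""
    return Counter(owner_2d(i, j, p) for i in range(n) for j in range(n))


if __name__ == "__main__":
    counts = points_per_processor(16, 4)
    print(len(counts), set(counts.values()))
```

Under this assumed mapping a 16 × 16 grid on a 4 × 4 processor array gives every processor the same number of points, and a 3 × 3 array yields the nine processors mentioned for Fig. 5.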

The processing for solving equations (7) and (8) above is described below.

Let the number of processors be P × P, the number of grid points N × N, and the number of grid points handled by one processor M × M (where M = N/P). Let the grid-point data handled by each processor be uu(i, j) (i, j = 0, 1, ..., M−1), and prepare us(i, j), up(i, j), and aa(i, j) (i, j = 0, 1, ..., M−1) as work areas for the data of adjacent grid points.

Step 1: solving equation (7), the tridiagonal system in the i direction

Every processor (k, l) transfers the grid-point data uu(*, *) it is responsible for to processor (k, (l+1)%P). Here i%j is the remainder when integer i is divided by integer j.

Every processor (k, l) stores the data it receives in us(*, *).

Every processor (k, l) transfers the grid-point data uu(*, *) it is responsible for to processor (k, (l−1)%P).

Every processor (k, l) stores the data it receives in up(*, *).

Every processor (k, l) performs the following calculation.

a(i, j) = us(i, j) + up(i, j)   (i, j = 0, 1, 2, ..., M−1)

This gives the right-hand side a(i, j) of equation (7). The simultaneous linear equations for row i are as shown in Fig. 7.
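The two transfers and the sum in Step 1 can be sketched as below. This is a serial stand-in, not a message-passing program: a dictionary keyed by processor coordinates (k, l) plays the role of the processor array, and a dictionary assignment plays the role of a transfer. The function name `step1_exchange` is hypothetical. Python's `%` returns a non-negative result for a negative left operand, so `(l - 1) % p` wraps around to P−1 as the text intends; a language whose remainder keeps the sign of the dividend would need an explicit adjustment.

```python
def step1_exchange(uu, p):
    """Simulate the Step 1 transfers; uu maps (k, l) to that processor's M x M block."""
    us, up, a = {}, {}, {}
    for (k, l), block in uu.items():
        us[(k, (l + 1) % p)] = block   # first transfer: (k, l) -> (k, (l+1)%P)
        up[(k, (l - 1) % p)] = block   # second transfer: (k, l) -> (k, (l-1)%P)
    for key in uu:
        # a(i, j) = us(i, j) + up(i, j), element by element
        a[key] = [[s + t for s, t in zip(row_s, row_p)]
                  for row_s, row_p in zip(us[key], up[key])]
    return us, up, a
```

Step 2 follows the same pattern with the roles of k and l exchanged and the buffers named left and right.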

These simultaneous linear equations are solved by the method described with reference to Fig. 2 and elsewhere. The right-hand side Ai,j is handled cyclically in j by the horizontal series of processors (k, l) (k = 0, 1, 2, ..., M−1), so the parallel solution method described above applies as it is.

The solution obtained is stored in uu(*, *).

Step 2: solving equation (8), the tridiagonal system in the j direction

Every processor (k, l) transfers the grid-point data uu(*, *) it is responsible for to processor ((k+1)%P, l).

Every processor (k, l) stores the data it receives in left(*, *).

Every processor (k, l) transfers the grid-point data uu(*, *) it is responsible for to processor ((k−1)%P, l).

Every processor (k, l) stores the data it receives in right(*, *).

Every processor (k, l) performs the following calculation.

a(i, j) = left(i, j) + right(i, j)   (i, j = 0, 1, 2, ..., M−1)

This gives the right-hand side a(i, j) of equation (8). The simultaneous linear equations for column j are as shown in Fig. 8.

These simultaneous linear equations are solved in the same way. The right-hand side Ai,j is handled cyclically in i by the vertical series of processors (k, l) (l = 0, 1, 2, ..., M−1), so the parallel solution method described above applies as it is.

The solution obtained is stored in uu(*, *).

Fig. 9 shows an example of a parallel processor system in which 16 processors operate. Each box in the figure is a processor, and (00), (01), ..., (33) in the boxes are the processor numbers. The lines between processors represent communication links.

[Effect of the invention]

As described above, according to the present invention, a situation does not arise in which, for example, other processors can compute only after waiting for one processor to finish a series of processes. A further advantage is that the volume of data exchanged between processors does not become very large.

In the figures, 1 denotes a host processor, 2-i the respective processors, and 3 a communication network.

Claims

[Claims]

(1) A determinant solution processing method for the parallel solution of simultaneous linear equations with a band-matrix coefficient matrix, Ax = b ... (1), that is, ▲mathematical expression▼ ... (2) [where each ○ denotes an element a_ij and the number of elements in a row is arbitrary], in which the elements of equation (2) are assigned to a plurality of processors and processed in parallel, characterized in that: processors (2-i) are provided, each of which is assigned the elements of equation (2) at positions interleaved in units of rows; the plurality of processors (2-i) are configured to execute, in parallel, processing on the elements assigned to them; and each processor (2-i) receives the elements of a pivot row from another processor as necessary and executes the elimination calculation on the coefficient matrix.

(2) A determinant solution processing method used in alternate direction implicit method processing, which executes the alternate direction implicit method to solve a partial differential equation by the difference method, wherein the solution by the alternate direction implicit method is reduced to simultaneous linear equations whose coefficient matrix is a band matrix, Ax = b ... (1), that is, ▲mathematical expression▼ ... (2) [where each ○ denotes an element a_ij and the number of elements in a row is arbitrary], characterized in that: processors (2-ij) are provided, each of which is assigned a plurality of elements of equation (2) located at interleaved positions in the row direction and at interleaved positions in the column direction; the plurality of processors (2-ij) are configured to execute, in parallel, processing on the elements assigned to them; and each processor (2-ij) receives the elements of a pivot row from another processor as necessary and executes the elimination calculation on the coefficient matrix.