JPH06332871A - Parallel processing system

Info

Publication number: JPH06332871A
Application number: JP5120299A
Authority: JP
Inventors: Kazuki Shigeta; 一樹重田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-05-24
Filing date: 1993-05-24
Publication date: 1994-12-02

Abstract

PURPOSE: To improve processing efficiency by eliminating the need to suspend processing for synchronization after a transfer, by having a transfer end notification means report the end of the transfer to the destination processor together with the data transfer. CONSTITUTION: Each processor determines whether it is in charge of the pivot row (S1). The processor determined in S1 to be in charge of the pivot row executes Process 1 (S2). The row data of the k-th row and a transfer end flag are transferred from the processor in charge of the k-th row to the other processors (S3). Each destination processor checks whether the transfer end flag is on or off (S4). If the flag is on, processing advances to S5; if it is off, the check is repeated. For rows k+1 through n, the processor in charge of each row executes Process 2 (S5). The pivot is set to a_{k+1,k+1} and processing returns to S1. This loop is repeated while the pivot row is changed from the 1st row to the n-th row, generating an upper triangular matrix (S6). The solution of the simultaneous linear equations is then calculated by backward substitution.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel processing system, and more particularly to a parallel processing system in which matrix operations incorporated in a semiconductor design apparatus are processed by a parallel computer composed of a plurality of parallel processors.

[0002]

[Prior Art] When a physical phenomenon is simulated on a computer, the target space is finely divided, and the partial differential equation representing the phenomenon is approximated by linear finite differences at the grid points. This yields the coefficient matrix of a system of simultaneous linear equations, which is then solved by a suitable matrix solution method to obtain the physical quantity at each grid point. The coefficient matrix is usually large, so the matrix computation consumes a great deal of time. One way to speed up the matrix computation is to process it in parallel on a parallel computer.
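As an illustration of how a finite-difference approximation on grid points produces a coefficient matrix, the following is a minimal sketch for one specific case chosen here as an assumption: the one-dimensional equation -u'' = f with zero boundary values. The function name `poisson_1d_system` and its parameters are hypothetical and do not come from the patent, which does not name a particular equation.

```python
def poisson_1d_system(n, f, length=1.0):
    """Build (A, b) for -u'' = f on (0, length) with u = 0 at both ends.

    n is the number of interior grid points. The second derivative at grid
    point i is approximated by (-u[i-1] + 2*u[i] - u[i+1]) / h**2, so each
    grid point contributes one row of the coefficient matrix.
    """
    h = length / (n + 1)                  # grid spacing
    a = [[0.0] * n for _ in range(n)]     # coefficient matrix
    b = [0.0] * n                         # right-hand side
    for i in range(n):
        a[i][i] = 2.0
        if i > 0:
            a[i][i - 1] = -1.0
        if i < n - 1:
            a[i][i + 1] = -1.0
        x_i = (i + 1) * h                 # coordinate of interior point i
        b[i] = h * h * f(x_i)
    return a, b
```

Even this small case shows why the matrix grows quickly: the number of rows equals the number of grid points, so a fine two- or three-dimensional mesh leads to a very large system.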

[0003] One matrix solution method is Gaussian elimination. As described in Information Processing Society of Japan (ed.), Information Processing Handbook, pp. 150-151, Ohmsha, 1989, Gaussian elimination consists of forward elimination and backward substitution, and forward elimination accounts for most of the computation time. Gaussian elimination is explained with reference to FIG. 4. Let the element in row i, column j be a_{ij}, and let the current pivot be a_{kk}. First, to make the pivot equal to 1, each element a_{kj} of the k-th row is replaced with a_{kj} / a_{kk} (Process 1). Next, rows k+1 through n are processed: for example, element a_{ij} is replaced with a_{ij} - a_{ik} * a_{kj} (Process 2). Process 2 is the part that is targeted for parallel processing.
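Process 1 and Process 2 can be sketched sequentially as follows. This is a minimal illustration, not the patent's implementation: it assumes the right-hand side is stored as an extra column of an augmented matrix, that every pivot is nonzero (no row exchange is performed), and the function names are hypothetical.

```python
def forward_eliminate(aug):
    """Reduce an augmented matrix [A | b] in place to unit upper triangular form."""
    n = len(aug)
    for k in range(n):
        pivot = aug[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row exchange")
        # Process 1: replace a_kj with a_kj / a_kk so the pivot becomes 1.
        aug[k] = [v / pivot for v in aug[k]]
        # Process 2: replace a_ij with a_ij - a_ik * a_kj for rows k+1 .. n.
        for i in range(k + 1, n):
            factor = aug[i][k]
            aug[i] = [a_ij - factor * a_kj for a_ij, a_kj in zip(aug[i], aug[k])]
    return aug


def back_substitute(aug):
    """Solve the unit upper triangular system produced by forward_eliminate."""
    n = len(aug)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # The diagonal is already 1, so no division is needed here.
        x[i] = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
    return x


def solve(a, b):
    """Solve a x = b by forward elimination followed by backward substitution."""
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    return back_substitute(forward_eliminate(aug))
```

For example, `solve([[2, 1], [1, 3]], [3, 5])` solves 2x + y = 3 and x + 3y = 5. The inner loop over i in Process 2 touches each row independently of the others, which is why that step is the natural candidate for distribution across processors.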

[0004] When the Gaussian elimination described above is computed on a parallel computer, the processors constituting the parallel computer must be synchronized between Process 1 and Process 2 to ensure that the output data of Process 1 has been transferred to all the processors that execute Process 2.

[0005] FIG. 5 is a flowchart of the conventional parallel processing system in which the rows of an n-th order coefficient matrix are computed by a parallel computer composed of m processors. First, in step P1, each processor determines whether it is in charge of the pivot row. In step P2, the processor determined in step P1 to be in charge of the pivot row executes Process 1. In step P3, the row data of the k-th row is transferred from the processor in charge of the k-th row to all the other processors. Here, all the processors constituting the parallel computer are synchronized, and processing is suspended until the data transfer among all the processors has finished. In step P5, the processor in charge of each of rows k+1 through n executes Process 2 on that row. In step P6, the pivot is set to a_{k+1,k+1} and processing returns to step P1. This loop is repeated while the pivot row is changed from the 1st row to the n-th row, generating an upper triangular matrix. In step 7, the solution of the simultaneous linear equations is obtained by backward substitution.
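The conventional flow of steps P1 to P6 can be sketched with threads standing in for processors. This is only an illustration under several assumptions that are not in the patent: rows are assigned cyclically (row i belongs to processor i mod m), a shared buffer stands in for the row transfer, `threading.Barrier` stands in for the all-processor synchronization, and every pivot is nonzero. The function name is hypothetical.

```python
import threading


def eliminate_with_barrier(aug, m):
    """Forward elimination on [A | b] with m worker threads and a global barrier."""
    n = len(aug)
    barrier = threading.Barrier(m)   # models the synchronization of all processors
    pivot_rows = [None] * n          # shared buffer that models the row transfer

    def worker(pe):
        for k in range(n):
            if k % m == pe:                               # P1: in charge of pivot row k?
                pivot = aug[k][k]
                aug[k] = [v / pivot for v in aug[k]]      # P2: Process 1
                pivot_rows[k] = list(aug[k])              # P3: "transfer" row k
            barrier.wait()                                # every processor stalls here
            row_k = pivot_rows[k]
            for i in range(k + 1, n):                     # P5: Process 2 on owned rows
                if i % m == pe:
                    factor = aug[i][k]
                    aug[i] = [a - factor * p for a, p in zip(aug[i], row_k)]
            # P6: the next loop iteration moves the pivot to a_{k+1,k+1}

    threads = [threading.Thread(target=worker, args=(pe,)) for pe in range(m)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return aug
```

In this sketch no worker can pass `barrier.wait()` until all m workers have arrived, so the slowest arrival determines when every processor resumes.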

[0006] FIG. 6 is a time chart for the case in which the Gaussian elimination described above is processed by the conventional parallel processing system on a parallel computer having four processors PE0 to PE3. After Process 1 is executed and the row data of the pivot row is transferred, a synchronization process synchronizes all the processors PE0 to PE3, and the processors PE0 to PE3 suspend processing until all the transfers have finished.

[0007]

[Problems to Be Solved by the Invention] In the conventional parallel processing system described above, all the parallel processors are synchronized after the row data of the pivot row is transferred following Process 1, so every processor suspends processing until the transfer has finished. As the number of processors increases, the transfer time increases and the suspension time of each processor grows, so that processing efficiency drops markedly.

[0008]

[Means for Solving the Problems] The parallel processing system of the present invention is a parallel processing system for large-scale matrix operations on the coefficient matrix of simultaneous linear equations, executed by a parallel computer composed of a plurality of processors. It is characterized in that each of the plurality of processors is provided with transfer end notification means that indicates the end of the transfer of the row data of a predetermined target row when that row data is transferred to another processor; a first processor, which is the transfer source, notifies a second processor, which is the transfer destination, of the end of the transfer through the transfer end notification means simultaneously with the row data transfer; and the second processor confirms the end of the transfer from the notification and starts the next process, whereby each of the plurality of processors is controlled asynchronously.

[0009]

[Embodiment] An embodiment of the present invention will now be described with reference to the drawings.

[0010] FIG. 1 is a flowchart showing one embodiment of the parallel processing system of the present invention. First, in step S1, each processor determines whether it is in charge of the pivot row. In step S2, the processor determined in step S1 to be in charge of the pivot row executes Process 1. In step S3, the row data of the k-th row and a transfer end flag are transferred from the processor in charge of the k-th row to the other processors. In step S4, each destination processor checks whether the transfer end flag is on or off. If the flag is on, processing advances to step S5; if it is off, the check is repeated. In step S5, the processor in charge of each of rows k+1 through n executes Process 2 on that row. In step S6, the pivot is set to a_{k+1,k+1} and processing returns to step S1. This loop is repeated while the pivot row is changed from the 1st row to the n-th row, generating an upper triangular matrix. In step 6, the solution of the simultaneous linear equations is obtained by backward substitution.
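Steps S1 to S6 can be sketched with threads standing in for processors and a per-destination flag that each destination polls on its own. This is an illustration only, under assumptions that are not in the patent: rows are assigned cyclically (row i belongs to processor i mod m), in-memory inboxes model the transfer, every pivot is nonzero, and the sending processor also delivers the row to its own inbox to keep the loop uniform. All names are hypothetical.

```python
import threading
import time


def eliminate_with_flags(aug, m):
    """Forward elimination on [A | b] with m worker threads and transfer end flags."""
    n = len(aug)
    inbox = [[None] * n for _ in range(m)]   # inbox[pe][k]: row k as received by pe
    flag = [[False] * n for _ in range(m)]   # flag[pe][k]: transfer end flag for row k

    def worker(pe):
        for k in range(n):
            if k % m == pe:                               # S1: in charge of pivot row k?
                pivot = aug[k][k]
                aug[k] = [v / pivot for v in aug[k]]      # S2: Process 1
                for dest in range(m):                     # S3: send row data with its flag
                    inbox[dest][k] = list(aug[k])         # data is written first
                    flag[dest][k] = True                  # then the flag is turned on
            while not flag[pe][k]:                        # S4: check only this PE's flag
                time.sleep(0)                             # yield, then check again
            row_k = inbox[pe][k]
            for i in range(k + 1, n):                     # S5: Process 2 on owned rows
                if i % m == pe:
                    factor = aug[i][k]
                    aug[i] = [a - factor * p for a, p in zip(aug[i], row_k)]
            # S6: the next loop iteration moves the pivot to a_{k+1,k+1}

    threads = [threading.Thread(target=worker, args=(pe,)) for pe in range(m)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return aug
```

The design point this sketch tries to show is that a destination waits only for its own flag. Once that flag is on it starts Process 2 immediately, regardless of whether the other destinations have received their copies yet.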

[0011] FIG. 2 is a time chart for the case in which, as in the conventional example, the Gaussian elimination described above is processed on a parallel computer having four processors PE0 to PE3, here using the parallel processing system of this embodiment. After Process 1 ends, the transfer source processor PE0 transfers the data and the transfer end flag simultaneously to each of the destination processors PE1 to PE3. Each of the processors PE1 to PE3 checks the flag and, if the check shows that the transfer has ended, immediately executes Process 2.

[0012] FIG. 3 is a block diagram of a configuration example in which this embodiment is applied to a semiconductor design apparatus. The semiconductor design apparatus comprises an input device 1; an arithmetic device 2 that includes an analysis mesh generation device 21 and a matrix arithmetic device 22 composed of a plurality of processors using the parallel processing system of this embodiment; a storage device 3 such as a magnetic disk; and an output device 4 that includes a printer 41 and a CRT display 42.

[0013] In operation, the input device 1 inputs the structure and physical parameters of the semiconductor device to be designed. The analysis mesh generation device 21 finely divides the interior of the semiconductor device, linearly approximates the partial differential equation representing the physical phenomenon under analysis in the neighborhood of each grid point, forms a system of simultaneous linear equations, and generates its coefficient matrix. The matrix arithmetic device 22 solves the simultaneous linear equations and obtains the physical quantity at each grid point. The analysis results are supplied to the storage device 3 and are also output to the CRT display 42 and the printer 41.

[0014]

[Effects of the Invention] As described above, in the parallel processing system of the present invention, the transfer end notification means notifies the destination processor of the end of the transfer together with the data transfer. Suspending processing for synchronization after the transfer therefore becomes unnecessary, and processing efficiency improves.

[Brief description of drawings]

FIG. 1 is a flowchart showing one embodiment of the parallel processing system of the present invention.

FIG. 2 is a time chart showing an example of operation in the parallel processing system of this embodiment.

FIG. 3 is a block diagram of a semiconductor design apparatus to which the parallel processing system of this embodiment is applied.

FIG. 4 is an explanatory diagram of Gaussian elimination.

FIG. 5 is a flowchart showing an example of a conventional parallel processing system.

FIG. 6 is a time chart showing an example of operation in a conventional parallel processing system.

[Explanation of symbols]

1 Input device
2 Arithmetic device
3 Storage device
4 Output device
21 Analysis mesh generation device
22 Matrix arithmetic device
41 Printer
42 CRT display

Claims

[Claims]

1. A parallel processing system for large-scale matrix operations on the coefficient matrix of simultaneous linear equations, executed by a parallel computer composed of a plurality of processors, wherein each of the plurality of processors is provided with transfer end notification means that indicates the end of the transfer of the row data of a predetermined target row when that row data is transferred to another processor; a first processor, which is the transfer source, notifies a second processor, which is the transfer destination, of the end of the transfer through the transfer end notification means simultaneously with the transfer of the row data; and the second processor confirms the end of the transfer from the notification and starts the next process, whereby each of the plurality of processors is controlled asynchronously.

2. The parallel processing system according to claim 1, wherein the transfer end notification means is a flag that is set when the transfer ends; the first processor, which is the transfer source, transfers the flag to the second processor, which is the transfer destination, simultaneously with the row data; and the second processor starts the next process when the flag is in the set state.