JPS63265365A

JPS63265365A - Analysis of physical phenomenon

Info

Publication number: JPS63265365A
Application number: JP63022960A
Authority: JP
Inventors: ステイーヴン・ウオーレン・ハモンド; ゲイリイ・エヌエムエヌ・ベドロジアン
Original assignee: General Electric Co
Current assignee: General Electric Co
Priority date: 1987-02-04
Filing date: 1988-02-04
Publication date: 1988-11-01
Also published as: DE3803183A1; IT8819296A0; SE8800341D0; AU1113888A; FR2610429A1; GB8802490D0; GB2205183A

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to an improved method for analyzing physical phenomena.

[Conventional technology]

Many physical systems can be described mathematically by systems of linear equations, and those systems are solved by matrix methods. Finite element analysis is concerned with describing various physical phenomena in terms of systems of equations and with developing methodologies for solving those systems. In this specification, the term "physical system" means a structure, a device (apparatus), a body of matter (solid, liquid, or gas), or simply a region of space in which a particular physical, chemical, or other phenomenon occurs. Finite element analysis originated as a method of structural analysis, but today it is routinely used in the design of electric motors, generators, magnetic resonance imaging systems, aircraft engine ignition systems, circuit breakers, and transformers, to name a few.

Its techniques are used to analyze stress, temperature, molecular structure, electromagnetic fields, electric currents, physical forces, and the like in all kinds of physical systems. Finite element analysis has become a standard part of the design cycle for many products that cannot easily be analyzed by other methods. The present invention is particularly useful in the analysis and design of such products.

The systems of linear equations that finite element analysis must solve are often so large that they are difficult to solve even by computer. For example, the system of equations from a large, though not typical, two-dimensional finite element analysis may have 25,000 unknowns. When such a system is based on a finite element mesh with contributions from many nodes, there is no choice but to apply tremendous computational power to obtain a solution. In some cases, however, the equations are both large and sparse, which provides an opportunity to preprocess, that is, transform, the equations so that solving them does not require such powerful computing resources. As used in this specification, "sparse" refers to the characteristic that only a very small percentage of the elements of a matrix have nonzero values. When extreme sparsity exists in a very large system, several techniques are available for transforming the system into one that is more easily handled computationally. Even after such transformations, however, the size and other characteristics of the resulting matrix equation can make standard computational techniques impractical or very inefficient.
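The definition of "sparse" given above can be made concrete with a small sketch. The function name `nonzero_fraction` and the 5 x 5 tridiagonal test matrix are illustrative assumptions, not part of the specification.

```python
def nonzero_fraction(matrix):
    """Return the fraction of entries of a dense (list-of-lists) matrix that are nonzero."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total


def tridiagonal(n):
    """Build an n x n matrix with 2 on the diagonal and -1 on the two adjacent diagonals."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]


# For a tridiagonal matrix the nonzero count grows like 3n while the
# total entry count grows like n*n, so the fraction shrinks as n grows.
small = nonzero_fraction(tridiagonal(5))      # 13 nonzero entries out of 25
large = nonzero_fraction(tridiagonal(1000))   # 2998 nonzero entries out of 1,000,000
```

In this sketch the larger matrix is far sparser than the smaller one, which is the situation the text describes for very large finite element systems.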

As can be appreciated from the foregoing, the field of finite element analysis has advanced greatly because larger, more powerful computing machines have become available for solving such large systems of equations. Today there are various high-performance special-purpose computer systems designed to carry out specialized application computations of the kind commonly run on general-purpose computers. One example of such a special-purpose computer system is based on the concept of the systolic architecture, which provides a general methodology for mapping high-level computations onto hardware structures. In a systolic computer system, data flows rhythmically from the computer's memory, passes through a number of processing elements arranged as a chain or pipeline, and returns to memory, so that many computations can be performed for each memory access. As a result, the execution speed of problems requiring many computations is greatly increased without a corresponding increase in input/output requirements. Several methodologies for constructing systolic architectures for matrix operations are discussed in the paper by H. T. Kung and C. E. Leiserson entitled "Systolic Arrays (for VLSI)", Society for Industrial and Applied Mathematics, 1979, Sparse Matrix Proc., 1979, pages 256-282. Another analysis of this problem and a suggested solution are described in "An Efficient Parallel Algorithm for the Solution of Large Sparse Linear Matrix Equations", IEEE Transactions on Computers, Vol. C-32, No. 3, March 1983.

U.S. Patent Application Nos. 870,467 and 870,566 disclose techniques and apparatus that use parallel processors to perform, at increased computation speed, the various matrix operations that are common to finite element analysis, among other analysis methods.

This is accomplished primarily by arranging the solution so that it is carried out as a large number of repeated, nearly identical steps executed in parallel by a series of parallel processors.

For example, U.S. Patent Application No. 870,566 discloses a method of storing a large sparse matrix in a multiprocessor architecture in a manner that makes the elements of the matrix readily available for various matrix operations. One such operation used in solving systems of linear equations is back substitution, although other operations will be apparent to those skilled in the art.

The solution of the linear equations generated as a result of the techniques of the present invention can be implemented on a variety of parallel multiprocessor architectures, as discussed below. For purposes of illustration, however, the solution of the equations generated by the method disclosed herein is discussed in terms of the systolic array computer generally described at pages 232-235 of the paper by E. Arnould et al. entitled "A Systolic Array Computer", presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing held March 26-29, 1985. The method of the present invention is carried out by the apparatus disclosed in U.S. Patent No. 4,493,048, "Systolic Array Apparatuses for Matrix Computations", which is referred to repeatedly in this specification.

[Problem that the invention seeks to solve]

The principal drawback of the method and apparatus disclosed in the aforementioned U.S. Patent Application No. 870,566 for solving systems of linear equations by back substitution or forward elimination arises from data dependencies that force the unknowns to be computed sequentially rather than simultaneously, that is, in parallel.

[Purpose of the invention]

Accordingly, an object of the present invention is to provide an improved apparatus and machine-implemented method for analyzing physical systems represented by systems of linear equations.

Another object of the invention is to operate a multiprocessor in such a way that manufactured products, and physical systems represented by systems of linear equations, can be analyzed with higher efficiency.

Another object of the invention is to provide a method and apparatus that increase the speed and efficiency with which systems of linear equations are solved on parallel processors, in particular on parallel, pipelined processors.

Another object of the invention is to provide a method of processing large sparse matrices on a systolic array computer system that can perform many computations simultaneously and that uses only simple, regular communication and control so as to permit a highly efficient implementation.

Another object of the invention is to provide a methodology that increases the speed of computing solutions to systems of linear equations by applying back substitution or forward elimination to a "bandwidth-maximized" triangularized matrix, that is, a triangularized matrix transformed so as to have a band of zero elements, of a preselected minimum width, separating the nonzero elements on the main diagonal from the nonzero elements in the upper right or lower left corner of the matrix.

Another object of the invention is to provide a technique for renumbering the nodes of the finite element mesh used in performing finite element analysis so that a decomposed matrix characterized by the above-described band is generated.

Yet another object of the invention is to provide a parallel processor, and a methodology for operating that parallel processor, so as to solve in a new and more efficient manner systems of linear equations characterized by system matrices of the kind described above.

[Means for solving the problems]

These and other objects of the invention are achieved by a method and technique for performing finite element analysis of a physical system, comprising the steps of generating a triangularized matrix for the physical system in a form having a band of zero-valued elements, of a predetermined minimum bandwidth, adjacent to the main diagonal of the matrix, and solving the system of equations associated with that matrix on a parallel processor that operates on the data elements of the matrix stored in the memories of a plurality of processors. Because a band of zero-valued data exists in the matrix so decomposed, many components of the system's unknown vector can be computed simultaneously in a novel and highly efficient manner: the values of a plurality of components of the unknown vector are computed by back substitution independently of previously solved values of other components of that vector. Since back substitution for a given vector component no longer has to be delayed until the computations for other components of the vector have been completed, the overall speed of solving the system is substantially increased.

[Example]

The present invention is described in detail below with reference to the drawings.

As noted above, the method of the invention relates to improvements in the finite element analysis used to analyze physical systems. In such systems a field variable (for example pressure, temperature, displacement, stress, or some other quantity) is a function of every geometric point in the body under investigation, that is, the solution region, so the field variable has infinitely many component values. In the first step of the method, the solution region is divided into elements and the unknown field variable is expressed in terms of approximating functions within each element, so that the problem is discretized into one involving a finite (though large) number of unknowns.

In this way a complex problem is reduced to solving a series of greatly simplified problems. The approximating functions are defined in terms of the unknown values of the field variable at specified points called nodes. Matrix equations are then written that express the properties of the individual elements.

In general, each node is characterized by a linear equation relating a resultant, or forcing, variable R to the field variable Y at a given node and a "stiffness" constant K. More specifically, for a solution region under investigation that has many such nodes, the resultant or forcing vector [R], with components $R_1, R_2, \ldots, R_N$ (which are known or zero-valued at boundary nodes), equals the field variable vector [Y], with components $Y_1, Y_2, \ldots, Y_N$, multiplied by the stiffness matrix [K] made up of the constants or coefficients of the field variables at those nodes. This is expressed by the matrix equation

$$[K][Y] = [R].$$

For example, in the analysis of a linear spring system, the resultant variable components $R_1, R_2, \ldots$ can be the values of the forces applied to selected nodes of the system. The field variable components $Y_1, Y_2, \ldots$ can be the displacement values at the nodes, and the coefficients can be the spring stiffness values of the spring elements under investigation, which relate the forces to the displacements at the nodes. These constants form the coefficient matrix, or stiffness matrix, [K].
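The linear spring example can be sketched in a few lines. This is a minimal illustration that assumes springs connected in series, with spring e joining node e to node e+1; the function names `assemble_spring_chain` and `matvec` and the numerical values are hypothetical and do not come from the specification.

```python
def assemble_spring_chain(stiffnesses):
    """Assemble the stiffness matrix [K] for springs in series.

    Assumption for this sketch: spring e connects node e and node e + 1.
    """
    n = len(stiffnesses) + 1
    K = [[0.0] * n for _ in range(n)]
    for e, k in enumerate(stiffnesses):
        # Each spring adds the 2 x 2 element matrix [[k, -k], [-k, k]].
        K[e][e] += k
        K[e][e + 1] -= k
        K[e + 1][e] -= k
        K[e + 1][e + 1] += k
    return K


def matvec(K, Y):
    """Compute [R] = [K][Y] for a dense matrix and a vector."""
    return [sum(a * y for a, y in zip(row, Y)) for row in K]


K = assemble_spring_chain([100.0, 200.0])  # two springs, three nodes
Y = [0.0, 1.0, 1.5]                        # nodal displacements
R = matvec(K, Y)                           # nodal forces
```

With these values `K` is `[[100, -100, 0], [-100, 300, -200], [0, -200, 200]]` and `R` is `[-100.0, 0.0, 100.0]`: a force of 100 applied at the free end, no load at the middle node, and an equal and opposite reaction at the fixed node. In a real analysis [R] is known and [Y] is the unknown, which is why the text next turns to solving the matrix equation.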

The next step of finite element analysis is to solve the above matrix equation. The subject of this application is a machine technique for manipulating matrix equations so as to obtain automatically, quickly, and with high efficiency the solution of the system of equations representing a complex physical system analyzed by finite element analysis. A more detailed explanation of the principles of finite element analysis is given in K. H. Huebner, "The Finite Element Method for Engineers", and in Klaus-Jurgen Bathe, "Finite Element Procedures in Engineering Analysis".

Many techniques are available for solving the above system of equations. Such techniques are often quite complex, and they involve the use of decompositions or transformations that yield system equations derived from, and either equivalent or approximate to, the original system equations.

Thus, for example, the original system of equations can be transformed into the form

$$[A][X] = [Q],$$

where [A] is a known decomposed matrix of dimension N x N, [Q] is a known transformed N-vector, and [X] is an unknown transformed N-vector.

The purpose of such transformations and decompositions of the original system of equations is to make it possible to apply various known matrix techniques to the transformed system in order to arrive at the final solution.

Back substitution and forward elimination are two such numerical techniques used to solve the system equations obtained by finite element analysis.

Back substitution is used when the decomposed or transformed matrix [A] is triangular (upper or lower) with all diagonal elements nonzero. It can be applied once to reach a definitive final result, or repeatedly as part of a more complex approximation technique for reaching the final solution. Techniques for transforming a linear system of general form into a triangular system are well known.

Such a triangular system has the following form:

$$
\begin{aligned}
A_{11}X_1 + A_{12}X_2 + \cdots + A_{1,N-1}X_{N-1} + A_{1N}X_N &= Q_1 \\
A_{22}X_2 + \cdots + A_{2,N-1}X_{N-1} + A_{2N}X_N &= Q_2 \\
&\;\;\vdots \\
A_{N-1,N-1}X_{N-1} + A_{N-1,N}X_N &= Q_{N-1} \\
A_{NN}X_N &= Q_N
\end{aligned}
$$

To solve this system by back substitution, note first that the last equation can be solved immediately for $X_N$, since $A_{NN}$ (a diagonal element of the stiffness matrix) and $Q_N$ (a component of the forcing vector) are known. Once $X_N$ is known, the next-to-last equation can be solved, because it then contains only one unknown, namely $X_{N-1}$.

In particular,

$$X_{N-1} = \frac{Q_{N-1} - A_{N-1,N}X_N}{A_{N-1,N-1}}.$$

With $X_N$ and $X_{N-1}$ known, the third-from-last equation contains only one true unknown, namely $X_{N-2}$, and can therefore be solved. Thus, in general, for $i = N, N-1, \ldots, 1$,

$$X_i = \frac{Q_i - \sum_{j=i+1}^{N} A_{ij}X_j}{A_{ii}}.$$

Note that for $i = N$ the sum runs from $j = N+1$ to $N$. This is interpreted as a sum over no terms, which by convention has the value 0.
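The general back substitution rule described above can be sketched as follows. This is a minimal dense, row-oriented version for an upper triangular system; the function name `back_substitute` and the 3 x 3 example are assumptions made for illustration, and indices run from 0 rather than 1.

```python
def back_substitute(A, Q):
    """Solve the upper triangular system [A][X] = [Q] by back substitution.

    A is a dense list-of-lists matrix with nonzero diagonal entries.
    """
    n = len(Q)
    X = [0.0] * n
    for i in range(n - 1, -1, -1):
        # For the last row the range is empty, so the sum is 0 by convention.
        partial = sum(A[i][j] * X[j] for j in range(i + 1, n))
        X[i] = (Q[i] - partial) / A[i][i]
    return X


A = [[2.0, 1.0, 1.0],
     [0.0, 4.0, 2.0],
     [0.0, 0.0, 5.0]]
Q = [9.0, 14.0, 15.0]
X = back_substitute(A, Q)  # [2.0, 2.0, 3.0]
```

Each pass of the loop needs every previously computed `X[j]`, which is the data dependency the specification goes on to discuss.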

Several features of back substitution offer opportunities for a large number of concurrent operations, and these are realized by the multiprocessor apparatus described herein. First, once $X_N$ has been computed, it must be multiplied by each element of the N-th column of the matrix in the course of solving for the remaining components of [X]. Thus, in computing $X_{N-1}$, $X_N$ must be multiplied by $A_{N-1,N}$ (the coefficient in row N-1, column N). Similarly, in computing $X_{N-2}, \ldots, X_1$, $X_N$ must be multiplied by $A_{N-2,N}, \ldots, A_{1,N}$ (the remaining coefficients of column N).

Likewise, once the next term $X_{N-1}$ has been computed, it is multiplied by each coefficient of column N-1.
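The column-wise view described above can be sketched as a column-oriented back substitution. It is an illustrative rearrangement of the standard algorithm, not the apparatus of the specification; the name `back_substitute_by_columns` and the example system are assumptions.

```python
def back_substitute_by_columns(A, Q):
    """Solve upper triangular [A][X] = [Q], sweeping one column at a time."""
    n = len(Q)
    residual = list(Q)   # running value of Q[i] minus the products applied so far
    X = [0.0] * n
    for j in range(n - 1, -1, -1):
        X[j] = residual[j] / A[j][j]
        # X[j] multiplies every coefficient above the diagonal in column j.
        # These updates are independent of one another, so a parallel
        # machine could perform them at the same time.
        for i in range(j):
            residual[i] -= A[i][j] * X[j]
    return X


A = [[2.0, 1.0, 1.0],
     [0.0, 4.0, 2.0],
     [0.0, 0.0, 5.0]]
Q = [9.0, 14.0, 15.0]
X = back_substitute_by_columns(A, Q)  # [2.0, 2.0, 3.0]
```

The inner loop is where the concurrency lies; the outer loop over `j` remains sequential because each `X[j]` must wait for all column updates to its right.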

An apparatus and method for carrying out the above process efficiently on parallel processors are disclosed in the aforementioned U.S. Patent Application No. 870,566. However, the apparatus and method disclosed there are inherently limited, in that the data dependencies in the required operations restrict the parallelism available when solving finite element equations, and this lowers the processing speed of the multiprocessor architecture. In particular, it follows from the above equation that the process of obtaining $X_{N-1}, X_{N-2}, \ldots$ is inherently serial, because $X_i$ cannot be computed until all of the preceding values $X_{i+1}, X_{i+2}, \ldots, X_N$ have been obtained, their associated column-wise multiplications have been performed, and the partial results have been accumulated.

When the physical system being analyzed is characterized by system equations that are large and sparse (having a high proportion of zero coefficients), it has become common practice to apply decompositions and transformations to them.

Bandedness means that the nonzero elements are confined to a particular portion, or band, of the decomposed matrix, and it is best understood with reference to FIGS. 5A and 5B. FIGS. 5A and 5B show the coefficient structure, or footprint, of two matrices. The solid diagonal line represents nonzero elements, cross-hatched regions represent regions containing both zero and nonzero data elements, and blank regions contain only zero elements.

Thus, in FIG. 5A, the nonzero elements of the matrix are confined (along with some number of zero elements) to a band of width b on either side of the diagonal. Only zero elements lie in the upper right and lower left corners of the matrix. Because the nonzero elements are confined to a narrow band adjacent to the diagonal, this kind of banding is called "bandwidth minimization". As those skilled in the art will readily appreciate, such a bandwidth-minimized matrix reduces the number of computations needed to perform back substitution, because a known number of computations (those involving all the elements in the blank regions) inevitably yield zero results. Nevertheless, the unknown components of the field variable must still be computed serially, so bandwidth minimization does not eliminate the data dependency problem described above.

A bandwidth-minimized system of equations, and its corresponding bandwidth-minimized decomposed matrix, are obtained during finite element analysis by known techniques for numbering or renumbering the nodes of the finite element mesh so that the difference between the numbers assigned to adjacent nodes in the mesh is minimized. The node numbering techniques currently used to achieve bandwidth minimization (including suitable programs) are discussed in detail in the following publications: R. Rosen, "Matrix Bandwidth Minimization", Proceedings, National Conference A.C.M., 1968, pages 585-595; and E. H. Cuthill and J. M. McKee, "Reducing the Bandwidth of Sparse Symmetric Matrices", Proceedings, National Conference A.C.M., 1969, pages 151-172.

To remove the above data dependencies in solving finite element equations, so that the equations can be solved more efficiently on parallel processors, this specification employs a technique of "bandwidth maximization", that is, the use of a decomposed matrix having the coefficient structure shown in FIG. 5B. Such a matrix (and its corresponding system of equations) is characterized solely by nonzero data on its main diagonal, a band containing only zero data elements adjacent to the diagonal, and all other data (including all other nonzero data) gathered in the upper right and lower left corners of the matrix. A matrix with a bandwidth-maximized footprint is readily produced by numbering or renumbering the nodes of the finite element mesh so that the numbers of adjacent nodes differ by at least some predetermined minimum value. Consequently, only zero data elements exist within a band of width K on either side of the matrix diagonal.
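The effect of node numbering on the footprint can be illustrated on a toy mesh. The sketch below assumes a one-dimensional chain of eight nodes and an interleaved renumbering chosen only for demonstration; it is not the renumbering procedure of the specification, and `numbering_gaps` is a hypothetical helper.

```python
def numbering_gaps(edges, number):
    """Return (smallest, largest) difference between numbers of adjacent nodes.

    The largest gap bounds how far nonzeros lie from the diagonal (what
    bandwidth minimization reduces); the smallest gap minus one is the
    number of all-zero diagonals next to the main diagonal (what
    bandwidth maximization increases).
    """
    gaps = [abs(number[a] - number[b]) for a, b in edges]
    return min(gaps), max(gaps)


n = 8
edges = [(p, p + 1) for p in range(n - 1)]   # chain mesh: node p touches node p + 1

natural = list(range(n))                     # 0, 1, 2, ... along the chain
half = n // 2
# Illustrative assumption: even positions take the low numbers and odd
# positions the high numbers, so neighbours are numbered far apart.
interleaved = [p // 2 if p % 2 == 0 else half + p // 2 for p in range(n)]

natural_gaps = numbering_gaps(edges, natural)          # (1, 1)
interleaved_gaps = numbering_gaps(edges, interleaved)  # (3, 4)
```

With the natural numbering every coupling sits on the first off-diagonal. With the interleaved numbering `[0, 4, 1, 5, 2, 6, 3, 7]` adjacent nodes differ by at least 3, so the two diagonals nearest the main diagonal of the assembled matrix contain only zeros.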

When a system of equations characterized by a decomposed matrix of the form shown in FIG. 5B is solved by back substitution on a parallel processor apparatus operating according to the methodology described in detail below, higher efficiency is obtained than with conventional methods and apparatus. These efficiencies result from the altered data dependencies that arise when back substitution or forward elimination is carried out on a parallel architecture in accordance with the method of operation of the invention.

Back substitution to solve for the first K components of the unknown field variable now becomes

$$X_N = \frac{Q_N}{A_{NN}}, \qquad X_{N-1} = \frac{Q_{N-1}}{A_{N-1,N-1}},$$

and similarly

$$X_{N-2} = \frac{Q_{N-2}}{A_{N-2,N-2}},$$

and so on.

This is because, as shown in FIG. 5B, the generalized bandwidth-maximized form of the solution matrix makes $A_{N-1,N}$, $A_{N-2,N}$, and $A_{N-2,N-1}$ all zero. In general, when a bandwidth-maximized matrix is employed, back substitution computes, for $i = N, N-1, \ldots, 1$,

$$X_i = \frac{Q_i - \sum_{j=i+K+1}^{N} A_{i,j}X_j}{A_{i,i}}.$$

Thus any given $X_i$ can be computed independently of the K preceding values $X_j$, that is, those with $j - i \le K$. For example, in carrying out back substitution with a bandwidth-maximized matrix as described below, K is chosen to correspond to the longest time required to bring together all of the data on which a given $X_i$ depends, as explained in detail in connection with the operation of the multiprocessor of FIG. 4.
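The independence that the zero band provides can be sketched as a blocked back substitution. This is a sequential illustration under stated assumptions, not the systolic implementation of the specification: `back_substitute_zero_band` is a hypothetical name, the matrix is upper triangular with zeros on the K diagonals just above the main diagonal, and indices run from 0.

```python
def back_substitute_zero_band(A, Q, K):
    """Solve upper triangular [A][X] = [Q] where A[i][j] == 0 for 0 < j - i <= K.

    Components are solved in blocks of K + 1 consecutive indices. Within a
    block, no component needs another component of the same block, so on
    a parallel machine the whole block could be computed at once.
    """
    n = len(Q)
    X = [0.0] * n
    hi = n - 1
    while hi >= 0:
        lo = max(hi - K, 0)
        block = {}
        for i in range(lo, hi + 1):
            # The sum starts at j = i + K + 1 > hi, so it only reads
            # values of X finished in earlier blocks.
            partial = sum(A[i][j] * X[j] for j in range(i + K + 1, n))
            block[i] = (Q[i] - partial) / A[i][i]
        for i, value in block.items():
            X[i] = value
        hi = lo - 1
    return X


# K = 1: the first diagonal above the main diagonal is all zero.
A = [[2.0, 0.0, 1.0, 3.0],
     [0.0, 4.0, 0.0, 2.0],
     [0.0, 0.0, 5.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
Q = [17.0, 16.0, 15.0, 4.0]
X = back_substitute_zero_band(A, Q, K=1)  # [1.0, 2.0, 3.0, 4.0]
```

Here the last two components are found together from the diagonal alone, and the first two are then found together from those results, so four unknowns take two dependent stages instead of four.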

A brief review of matrix algebra as a computational tool for solving systems of linear equations may be useful as background for understanding the invention.

The most useful form of matrix is a rectangular array of scalar quantities arranged in m rows and n columns at right angles to one another. The order of a matrix is defined by its number of rows times its number of columns, so the matrix shown below is called an "m x n" matrix. The usual compact notation for a matrix is shown below.

$$
[A] = [A_{ij}] =
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots &        & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{bmatrix}
$$

A row of a matrix is a horizontal line, or one-dimensional array, of quantities, and a column is a vertical line, or one-dimensional array, of quantities. The quantities $A_{11}$, $A_{12}$, and so on are called the elements of the matrix [A]. A matrix with n = 1 is a column matrix, or column vector, and a matrix with m = 1 is a row matrix, or row vector. More generally, a vector is defined as an ordered n-tuple of values. A line of a matrix is either a row or a column.

A square matrix is one in which the number of rows m equals the number of columns n. The diagonal of a square matrix consists of the elements $A_{11}, A_{22}, A_{33}, \ldots, A_{NN}$.

A triangular matrix is one that has all of its nonzero elements on its main diagonal and either above or below it. An upper triangular matrix has all of its nonzero elements on or above its main diagonal, and a lower triangular matrix has all of its nonzero elements on or below its main diagonal.

In the operation [A] x [X] = [Q], where [A] is of order m x n and [X] is of order n x p, [Q] is a matrix of order m x p. Therefore, if [X] is a column vector, [Q] is a column vector having the same number of rows, m, as [A]. Note that matrix multiplication is defined only when, as in the example above, the matrix [X] has the same number of rows as the matrix [A] has columns. In the above multiplication, [Q] is obtained by multiplying the rows of [A] into the columns of [X], so that in general $Q_{ij} = \sum_{k=1}^{n} A_{ik} X_{kj}$.

In accordance with the above, multiplication of the vector [X] by the matrix [A] is ordinarily carried out as follows.
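For the 3 x 3 case suggested by the element names used in the surrounding text (an assumption; the original display is not reproduced here), the conventional row-by-row product can be written as:

```latex
\begin{aligned}
Q_1 &= A_{11}X_1 + A_{12}X_2 + A_{13}X_3 \\
Q_2 &= A_{21}X_1 + A_{22}X_2 + A_{23}X_3 \\
Q_3 &= A_{31}X_1 + A_{32}X_2 + A_{33}X_3
\end{aligned}
```

Each row of the display is one element of [Q], and each column of the display groups the products that share one vector element.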

From the standpoint of required processing time, it is most efficient to perform all multiplications involving a particular vector element at the same time, since this reduces intermediate memory accesses. It would therefore be more efficient, from an input/output standpoint, to carry out the above multiplication by performing the operations A11 X1, A21 X1 and A31 X1 simultaneously, that is, in parallel. This approach, however, scatters the various contributing elements of [Q].

Those elements must later be gathered, that is, recombined.

A second feature of the above multiplication is that each element Q1, Q2, Q3 of the product vector is the result of an accumulation, or summation, based on the original row (or row index) of the matrix elements. Referring specifically to the example above, note that Q1 is the sum of the partial results obtained by multiplying the various elements of row 1 of the matrix by the associated, or corresponding, elements of the vector [X].

Thus, the main observation used in the methodology described in this specification is that, to increase the concurrency of operations and to minimize the number of input/output requests in the above multiplication, the multiplication should proceed in time as follows.

First operation:  X1 * (A11, A21, A31)
Second operation: X2 * (A12, A22, A32)
Third operation:  X3 * (A13, A23, A33)

Following these operations, the partial products are summed according to the row indices of the matrix elements they contain, as follows.
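The column-at-a-time ordering above, followed by accumulation on row index, can be sketched in Python. This is a minimal illustration rather than the patented implementation: the function name `matvec_by_columns` and the list of (row index, partial product) pairs are assumptions made for the example, and the loops run sequentially where the text envisages parallel cells.

```python
def matvec_by_columns(A, x):
    """Multiply matrix A (a list of rows) by vector x, one column at a time."""
    m = len(A)
    # Phase 1: each vector element x[j] multiplies a whole column of A.
    # Every partial product is tagged with the row index it came from.
    partials = []
    for j, xj in enumerate(x):
        for i in range(m):
            partials.append((i, A[i][j] * xj))
    # Phase 2: gather the scattered partial products by row index.
    q = [0] * m
    for i, p in partials:
        q[i] += p
    return q


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 0, 2]
print(matvec_by_columns(A, x))  # [7, 16, 25]
```

The result matches the ordinary row-by-row product; only the order in which the partial products are formed differs.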

A more general observation that follows from the above is that many matrix operations can be carried out in a highly parallel manner by performing a first, common operation across a series of elements taken from one line of the matrix, followed by a second operation based on the origin of those elements within the matrix. Matrix multiplication is one such operation.

More specifically, from the above description, the first operation in the multiplication process is to multiply the first element X1 of the multiplier vector by a second vector containing the elements A11, A21, A31, which are the elements of the first column of the matrix.

In this way, the multiplication is carried out by multiplying each element of the multiplier vector by a transpose vector made up of the elements of a column of the matrix. The vector element X2 is likewise multiplied by the transpose of the second column of the matrix, which contains the elements A12, A22, A32. The result of each such multiplication comprises partial results. If those partial results are accumulated according to the original rows of the stored matrix elements, the result vector is obtained. This follows from the earlier explanation that Q1 is generated by accumulating the partial results A11 X1, A12 X2, A13 X3 according to their row of origin.

The same is true of the remaining elements of the result vector.

Having seen that the matrix multiplication above, among other operations, can proceed concurrently in the manner described, there remain the following observations, which apply to the large, sparse matrices encountered in finite element analysis.

For such large, sparse matrices, the zero elements do not contribute to the final result, so they can be discarded or ignored in the procedure for mapping the matrix into storage. Zero elements are always ignored or discarded when the matrix to be processed is mapped onto the parallel processor, but it is equally possible to ignore or discard other elements that are not exactly zero but are computationally insignificant.

For example, once it has been determined that including elements that fall close to zero does not change the final result, those elements can be ignored. The term "zero" should therefore be understood in that sense.
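As a small illustration of treating near-zero elements as zero, the sketch below keeps only the entries whose magnitude exceeds a tolerance. The function name `significant_entries` and the parameter `tol` are hypothetical, and the code does not itself establish that dropping an entry leaves the final result unchanged; the text requires that determination to be made first.

```python
def significant_entries(A, tol=0.0):
    """Return (row, col, value) triples for entries with |value| > tol."""
    return [(i, j, v)
            for i, row in enumerate(A)
            for j, v in enumerate(row)
            if abs(v) > tol]


A = [[4.0, 0.0, 1e-14],
     [0.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]
print(len(significant_entries(A)))             # 5: only exact zeros dropped
print(len(significant_entries(A, tol=1e-12)))  # 4: near-zero entry dropped too
```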

Once it is recognized that matrix operations have the potential to be carried out in a highly parallel manner, what remains is a technique for mapping the matrix onto a parallel processor in a way that allows the parallel processing to be optimized. Such a mapping method is the subject of the aforementioned U.S. Patent Application No. 870,467.

A parallel processing apparatus, shown generally in FIG. 1, will now be described. It has a network of processing cells, each with an associated memory, interconnected so as to communicate with one another.

Reference is made to FIG. 1, which shows a basic systolic apparatus. In that apparatus, a host 10 receives data and outputs results to a systolic array of pipelined processing cells 15 via an interface unit 12.

Of course, the host 10 may comprise a computer, a memory, a real-time device, and the like. As shown in FIG. 1, both input and output are coupled to the host 10. The input may come from one physical host, and the output may be directed to another physical host. An important goal of a systolic architecture is to perform many computations for each input/output access of the array by the host. Those computations are performed in parallel by arranging the cells to continue processing a received input in pipeline fashion. Thus, while one cell is receiving and processing new data, other cells continue to process data, or partial results derived from data, received in a previous input access. The challenge in using a systolic apparatus is to decompose the problem being solved into subprocesses that can be processed repetitively and in parallel by the various cells of the apparatus.

A procedure for loading and storing a large, sparse matrix into a series of parallel processing cells is disclosed in the aforementioned U.S. Patent Application No. 870,457. For a complete description of the present invention, that procedure is repeated here with reference to FIG. 2, which is a flow chart for the systolic array architecture shown in FIG. 1. Loading and storing of the matrix take place under the control of the interface unit 12 shown in FIG. 1, which communicates with the host 10 during the operation. The matrix is already stored in the host as an array or other suitable data structure.

As stated above, the purpose of the mapping procedure shown in FIG. 2 is to store the matrix in the memories of the multiprocessor in a way that permits a high degree of concurrency in matrix operations generally, and in back-substitution operations in particular. This is done by storing the elements of one line of the matrix so that they can be operated on concurrently by a common operator. More specifically, as explained later, in the back-substitution operation the elements of a matrix column are stored for successive or concurrent multiplication by a common operator, which is the associated, previously computed component of the field variable vector being solved for.

A second feature of the storage method is the generation, for each stored matrix element, of an index that identifies the element's origin (with respect to the other line of the matrix). In particular, each element from a column of the matrix is paired with an index identifying the row of the matrix from which it came. In addition, the stored elements from each column are formed into a transposed vector. A common operator can be associated with each transposed vector so that operations proceed in a regular, concurrent manner. A number of transposed vectors are formed in this way. Because each transposed vector is located at a given memory location across the plurality of cells, the common operator can act on its elements simultaneously. The origin and character of the common operator depend largely on the particular matrix operation being performed. Invalid elements, zero elements and computationally insignificant elements are discarded during the storing operation, so that matrix operations involving the stored elements can be carried out more quickly. Finally, the diagonal elements contained in the transposed vectors are gathered for storage in a single processor cell, so that they are readily available for periodic, repeated processing during certain matrix operations.

The above objectives are achieved by converting the elements of each matrix column into a transposed vector laid out across the processing cells. Because zero elements of the matrix are ignored during the storing operation, the number of processing cells needed to carry out the method is quite reasonable. Many physical systems are represented by matrices of order 25,000 x 25,000 in which the number of nonzero elements in any given column is of order 20 or fewer. As a result of the storing operation, the nonzero elements of a matrix column form a transposed vector, or array, stored at the same memory location in each processing cell. In addition, all diagonal elements are stored in one processor cell for ease of access. With the nonzero elements of a column made simultaneously accessible by this procedure, a common operator can be applied to the transposed vector in a highly parallel manner, as explained in detail later.
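To put the quoted figures in proportion, the short calculation below compares the number of entries in a dense 25,000 x 25,000 matrix with the number of nonzero entries to be stored. It is a rough estimate under stated assumptions: "of order 20" is taken as exactly 20 per column, and the stored row indices and zero padding are not counted. The function name `storage_counts` is made up for the example.

```python
def storage_counts(order, nonzeros_per_column):
    """Return (dense entry count, stored nonzero count) for a square matrix."""
    dense = order * order
    stored = order * nonzeros_per_column
    return dense, stored


dense, stored = storage_counts(25_000, 20)
print(dense)           # 625000000
print(stored)          # 500000
print(stored / dense)  # 0.0008
```

Under these assumptions fewer than one entry in a thousand needs to be stored, which is why a modest number of cells suffices.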

A method of mapping a matrix onto a multiprocessor array to achieve these objectives will now be described with reference to FIG. 2. Starting at step 200, the constant C is set equal to the total number of processing elements in the parallel processor array to be used. The number of processing cells may vary, but efficiency is greatest when it is approximately equal to the maximum number of nonzero elements in any one column of the large, sparse matrix to be mapped. The procedure variables ROW (the row index of the matrix element being processed) and COL (the column index of the matrix element being processed) are set equal to 1, so that processing starts at the matrix element in row 1, column 1 and proceeds from there. The variable PROC (the next processing cell to be loaded with a nonzero, off-diagonal element) is set equal to 2. PROC could of course range from 1 to C, but, as explained in detail later, cell 1 is reserved to hold only the diagonal elements of the matrix. The variable MEMORY (the memory location within PROC being loaded) is set to MEM, the first memory location to be filled in each processor cell.

The host also sets, or has already set in earlier processing, the variables M (the number of rows in the matrix) and N (the number of columns in the matrix). Finally, in step 200 a table, denoted TABLE(COL) and maintained in the host, is initialized to all ones. This is a preliminary step toward correlating each column of the matrix (that is, each particular transposed vector to be created) with the particular operator to be used later in the matrix operation. The initialization is explained in more detail below.

Next, in step 201, a determination is made whether the matrix element A[ROW][COL] (which at the start of the procedure is A[1][1]) is zero. If it is zero, the element is discarded without being loaded into any processor cell. Processing then proceeds to step 202, where ROW is incremented, and on to step 203, where it is determined whether ROW is greater than M. If it is not, processing returns to step 201 to test the new matrix element.

If ROW is greater than M, a complete column of the matrix has been tested, and processing proceeds to step 205.

In step 205, zeros are stored at memory location MEM of any remaining processor cells, from PROC through C. The purpose of this step, as noted above, is to ensure that each particular memory location across the processors contains stored elements associated with one, and only one, column of the matrix. In this way the elements associated with a given column form a transposed vector (stored at a known memory location across the plurality of cells), and a plurality of such transposed vectors is created, one for each column of the matrix.

After all processing cells at a particular MEM have been filled, processing continues the scan of the matrix starting at the top of the next column. Accordingly, in step 207, ROW is reset to 1 and COL is incremented in preparation for testing the elements of the next column of the matrix. If it is determined in step 209 that COL is greater than N (the number of columns in the matrix), loading of the matrix ends (step 210). If COL is not greater than N, processing returns to step 201, where the new matrix element is tested.

If it is determined in step 201 that the matrix element is not zero, processing proceeds to step 230, where it is determined whether the element is a diagonal element, that is, whether ROW = COL.

If the element is a diagonal element, it is stored in PROC = 1 (processor cell 1), and an index is generated and stored with the diagonal element to identify the row of the matrix from which it came. After the diagonal element has been stored in cell 1, processing proceeds to step 202 and continues to the subsequent steps in the manner described above.

If it is determined in step 230 that the matrix element is not a diagonal element, it is stored at memory location MEM of processor cell PROC (step 220). An index equal to ROW is then generated and stored in association with the newly stored element at the same memory location MEM (step 221). After the nonzero element has been stored, PROC is incremented (step 225), so that the next nonzero element will be stored in the next processor cell, and a determination is made at step 227 whether PROC is greater than C, the total number of processor cells. An affirmative determination indicates that memory location MEM has been filled in all of the processor cells. Processing then proceeds to step 228, where PROC is reset to 2 and MEMORY is incremented to a new value. At the same time, the entry for the current column in the previously created table is incremented, so that during later processing the host can correlate each column of the matrix with the length of its transposed vector. This entry tells the host or interface unit that the nonzero elements of the current COL are associated with a transposed vector covering two or more memory locations (the exact number is held in the table). When more than one memory location is required, the host takes appropriate action later, in the matrix operation procedure, so that the corresponding operator is supplied to the processors the correct number of times. Processing then proceeds to step 202, as before.

If it is determined in step 227 that PROC is not greater than C, memory location MEM is still available in further processing cells, and processing proceeds directly to step 202.

From step 202, processing continues as described above.
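The flow chart steps described above can be collected into a short Python sketch. It is an illustration under assumptions, not the original implementation: the name `map_matrix`, the dictionary layout of the cells, and the 0-based memory offset (0 standing for MEM) are invented for the example. Advancing to the next memory location and resetting PROC to 2 at the start of each new column are inferred from the worked example in the text rather than stated in the step descriptions.

```python
def map_matrix(A, C):
    """Map matrix A (a list of rows) onto C cells following the FIG. 2 steps.

    Returns (cells, table). cells[p][mem] is a (value, row_index) pair for
    cell p (1-based) at memory offset mem; table[col] counts the memory
    locations used by the vector built from that column.
    """
    M, N = len(A), len(A[0])
    cells = {p: {} for p in range(1, C + 1)}
    table = [1] * N                      # step 200: table initialized to ones
    mem = 0                              # offset from the base location MEM
    for col in range(N):
        proc = 2                         # cell 1 is reserved for diagonals
        for row in range(M):
            value = A[row][col]
            if value == 0:               # step 201: discard zero elements
                continue
            if row == col:               # step 230: diagonal element?
                cells[1][mem] = (value, row + 1)     # step 231
            else:
                cells[proc][mem] = (value, row + 1)  # steps 220 and 221
                proc += 1                            # step 225
                if proc > C:                         # step 227
                    proc = 2                         # step 228
                    mem += 1
                    table[col] += 1
        for p in range(proc, C + 1):     # step 205: pad remaining cells
            cells[p][mem] = (0, None)
        mem += 1                         # assumed: next column, next location
    return cells, table


A = [[1, 0, 0, 5],
     [0, 2, 0, 6],
     [0, 0, 3, 7],
     [0, 0, 0, 4]]
cells, table = map_matrix(A, C=3)
print(table)        # [1, 1, 1, 2]
print(cells[1][0])  # (1, 1)
print(cells[2][3])  # (5, 1)
```

In this small example the fourth column has three off-diagonal nonzero elements but only two cells (2 and 3) to receive them, so its vector spills into a second memory location and its table entry becomes 2.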

Referring now to FIG. 3, an example of the operation of the procedure shown in FIG. 2 will be described, in which a particular upper-triangular factored matrix [A] (shown at the top of FIG. 3), whose bandwidth has been maximized in accordance with the present invention, is mapped into exemplary memory locations of processor cells 1 to 4 of a multiprocessor similar to the one shown in FIG. 1. As can be seen from the figure, the factored matrix [A] is of order 12 x 12 and is upper triangular. In particular, the matrix has only zero values below the diagonal, and all of its diagonal elements are nonzero. In addition, all nonzero off-diagonal data in the matrix [A] are clustered in the upper right-hand corner, leaving a band of zero elements of width K between the diagonal and the data clustered in the upper right-hand corner. Thus the nonzero data in the matrix lie either on the diagonal or in the upper right-hand corner, with only zero elements between them.

Techniques for producing such a bandwidth-maximized matrix and for selecting the value K are described in detail later. In mapping this sample, it should be remembered that it is meant to be an example of the bandwidth-maximized, upper-triangular form of the large, sparse matrices commonly encountered when physical systems are characterized by the finite element method. A typical order for such a matrix may be 25,000 x 25,000, with only 0.1% of the elements in a row or column being nonzero.

Starting at step 200 of the flow chart shown in FIG. 2, the constants and variables are initialized to match the properties of the matrix [A] to be mapped, as follows: C = 4, ROW = 1, COL = 1, N = 12, M = 12, MEMORY = MEM. PROC is set to 2 in order to set aside the first processor cell to hold the diagonal elements, as explained in detail later. In step 201 it is determined whether the first element A[ROW][COL] is zero. Since its value is not zero, it is next determined in step 230 whether it is a diagonal element of the matrix, that is, whether ROW = COL. Since element A is a diagonal element, it is stored at location MEM of processor cell 1 together with the index 1, indicating its origin in row 1 (step 231). Processing then proceeds to step 202, where ROW is incremented. Since ROW is less than M, processing returns from step 203 to step 201 to test the next element, A[2][1]. Since there are no other nonzero elements in the first column of the matrix, processing loops through steps 201, 202 and 203 until ROW exceeds 12, whereupon it passes from step 203 to step 205. In step 205, zeros are stored at memory location MEM of processor cells 2 to 4, indicating that column 1 of the matrix has no nonzero elements other than the diagonal element A stored in cell 1.

The loading of those zeros into memory location MEM of cells 2 to 4 is shown in FIG. 3.

When scanning and loading of column 1 of the matrix is complete, COL is incremented to 2 and ROW is reset to 1, and scanning of column 2 begins, provided COL is not greater than N (step 209).

Columns 2 through 7 are scanned in the same way as column 1. Each of those columns has only one nonzero element (B through G, respectively), and since those elements are diagonal elements, they are all stored at memory locations MEM+1 through MEM+6 of cell 1, with zeros stored at the corresponding locations of cells 2 to 4.

Processing then returns to step 201 to scan the eighth column of the matrix. Accordingly, element A[1][8] is tested in step 201.

Since its value (H) is nonzero and it is not a diagonal element (step 230), it is stored at memory location MEM+7 of cell 2 in step 220.

An index is also stored with H at the same memory location of cell 2, indicating its origin in the first row of the matrix. PROC is then incremented and tested in steps 225 and 227.

Since PROC is not greater than C, ROW is incremented and tested, and the new element A[2][8] is processed. This element is zero, so processing returns again to step 201 to test element A[3][8]. Since zeros occupy rows 2 through 7 of column 8, processing loops repeatedly without storing anything. Element A[8][8] is nonzero and is a diagonal element, so it is stored at MEM+7 of cell 1, as shown in FIG. 3, in the same manner in which the earlier data element A was stored in cell 1. Since all other elements of the matrix in column 8 are zero, zeros are stored at memory location MEM+7 of cells 3 and 4 in step 205 in preparation for scanning column 9 of the matrix.

Processing then continues repetitively as described above, storing each nonzero diagonal element of the matrix at successive memory locations of cell 1, storing the nonzero off-diagonal elements of each column in cells 2 to 4, discarding zero elements, and finally storing zeros at the given memory location of any remaining cells before a new column is scanned. As a result, all nonzero elements of the first column of matrix [A] are transformed into a transposed vector at the memory location identified as MEM across the processor cells. Furthermore, all diagonal elements contained in such transposed vectors are stored in the particular processor cell 1. In effect, memory location MEM acts as an identifier for the newly created, that is, transposed, vector containing all the nonzero elements of column 1 of the matrix, with the diagonal element stored in cell 1. In the case of column 1 of the matrix [A] of FIG. 3, the vector contains only one element. Since that element is on the diagonal, it is stored in cell 1; all other elements are zero, and the remaining memory locations contain zeros, as shown in FIG. 3. The significance of the newly created transposed vector MEM, and of the special handling of diagonal elements, in permitting concurrent processing of matrix elements is described in the aforementioned U.S. Patent Application No. 870,566.

As mentioned earlier, the number of processing cells is chosen equal to the maximum number of nonzero elements in any given column of matrix [A], but it may be any number equal to or greater than 2 without departing from the spirit of the invention. Regardless of the number of processing cells used, however, the number of nonzero elements in a particular column of the matrix may exceed the number of processor cells.

To deal with this situation, steps 227 and 228 are included in the mapping process shown in the flowchart of FIG. 2. For example, if the first column of matrix [A] has more nonzero elements than there are processor cells available, i.e. if PROC becomes greater than C before all elements of the column have been stored (step 227), PROC is reset to 2 and MEM is incremented to MEM+1 (step 228), so that the permutation vector identified as "MEM" continues to be stored in, i.e. is extended into, memory location MEM+1.

As a result, column 1 of the matrix is converted into a permutation vector occupying memory locations MEM and MEM+1.

In that case, the entry of the table (COL) for column 1 is "2" to indicate this.

After all elements of a given column of the matrix have been examined, any remaining processor cell in which no nonzero element has been stored at the given memory location stores a zero in accordance with step 205. For example, column 10 of matrix [A] has only three nonzero elements, M, N and P, so a zero is left stored at MEM+9 of processor cell 4, completing the permutation vector at MEM+9. A similar situation is shown in FIG. 3, in which column 11 of the matrix has only three nonzero elements (S, R, T), so that step 205 results in a zero being stored at memory location MEM+10 of processor cell 4.
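The mapping described above can be sketched in a few lines of Python. This is a minimal illustration and not the flowchart of FIG. 2: the function name `map_columns`, the `(0, 0)` padding pair, the order in which off-diagonal elements are dealt to cells 2 onward, and the zero written to cell 1 in an overflow location are all assumptions made for the sketch.

```python
def map_columns(a, n_cells):
    """Map each column of an upper triangular matrix onto n_cells cells.

    Returns (cells, col). cells[c] is the memory of cell c+1: one
    (value, row_of_origin) pair per memory location, with (0, 0) as padding.
    col[j] is the number of memory locations used by column j+1 (the COL table).
    """
    n = len(a)
    cells = [[] for _ in range(n_cells)]
    col = []
    for j in range(n):
        # Nonzero elements above the diagonal, with 1-based row-of-origin index.
        pending = [(a[i][j], i + 1) for i in range(j) if a[i][j] != 0]
        locations = 0
        while locations == 0 or pending:
            if locations == 0:
                cells[0].append((a[j][j], j + 1))   # diagonal always goes to cell 1
            else:
                cells[0].append((0, 0))             # assumed: overflow location holds no diagonal
            for c in range(1, n_cells):
                # Remaining cells take off-diagonal elements, then zeros (step 205).
                cells[c].append(pending.pop() if pending else (0, 0))
            locations += 1
        col.append(locations)
    return cells, col


# Hypothetical 4 x 4 example with only 3 cells, so column 4 overflows.
example = [[2, 0, 1, 3],
           [0, 4, 0, 5],
           [0, 0, 6, 7],
           [0, 0, 0, 8]]
cells, col = map_columns(example, 3)
```

In this example the last column has three off-diagonal nonzeros but only two cells are available for them, so its permutation vector is extended into a second memory location and its COL entry becomes 2.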

The function of the portion of the memory locations designated q in FIG. 3 will be described below in connection with the back substitution operation performed to solve the system of linear equations containing matrix A of FIG. 3.

The execution of a back substitution operation on a factored system of equations, characterized by a factored matrix A whose bandwidth has been maximized in accordance with the invention, will now be described with reference to FIG. 4. To recall the earlier discussion briefly, it should be understood that, before the back substitution operation begins, the system of equations describing the product or physical system under investigation is generated in the form [K][Y] = [R]. As explained earlier, [Y] is a vector representing the unknown field variable that describes an attribute of the physical system under investigation, such as the displacements of a spring system, and consists of components Y1, ..., YN.

The system of equations is solved for those components. The vector [R], on the other hand, has known components R1, ..., RN representing the resultant vector variable, such as the forces in the spring system being analyzed. [K] is the stiffness matrix and, as described above, contains a plurality of values, i.e. constants, that relate the known resultant vector to the unknown field vector variable at particular nodes of the finite element mesh.

In accordance with the invention, the above system of equations is transformed, i.e. factored, into another system of equations [A][X] = [Q], where [A] is a bandwidth-maximized matrix as described above, [Q] is a known vector, and [X] is an unknown vector.

Once the system has been transformed into this new bandwidth-maximized form, the solution method of the invention can be carried out (at least for some portions of the solution) on a parallel processor having the structure shown in FIG. 4, which performs back substitution or forward substitution on a systolic array. The object of this solution method is therefore, given that [A] and [Q] are known, to compute the values X1 ... XN by back substitution. The factored matrix A, in its basic form and as noted above, is usually large and sparse. In particular, the matrix may be on the order of 25,000 x 25,000 elements, of which the nonzero elements make up only a very small fraction. As shown in FIG. 5B, the matrix is in upper triangular bandwidth-maximized form: all diagonal elements of the matrix are nonzero, all other nonzero elements (together with some zero elements) are gathered in the upper right corner of the matrix, and a band of width K containing only zero-valued data lies above the diagonal. Matrix [A] of FIG. 3 is just such an upper triangular bandwidth-maximized matrix.
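The upper triangular bandwidth-maximized form can be checked mechanically. The sketch below is illustrative only: the helper names are hypothetical, the matrix values are placeholders, and the off-diagonal positions in `OFF_DIAGONAL` are an assumed reading of the 12 x 12 example of FIG. 3 rather than data taken from the figure itself.

```python
N = 12
# Assumed (row, column) positions of the off-diagonal nonzeros, 1-based.
OFF_DIAGONAL = [(1, 8), (1, 9), (2, 9), (2, 10), (2, 12),
                (3, 10), (3, 11), (4, 11), (4, 12), (5, 12)]


def build_pattern(n, off_diagonal):
    """Build an n x n matrix with placeholder value 1.0 at every nonzero position."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    for r, c in off_diagonal:
        a[r - 1][c - 1] = 1.0
    return a


def zero_band_width(a):
    """Narrowest run of zeros between the diagonal and the first nonzero to its right."""
    n = len(a)
    gaps = []
    for i in range(n):
        for j in range(i + 1, n):
            if a[i][j] != 0:
                gaps.append(j - i - 1)
                break
    return min(gaps) if gaps else None


def is_upper_bandwidth_maximized(a, k):
    """True if a is upper triangular, has a nonzero diagonal and a zero band of width >= k."""
    n = len(a)
    for i in range(n):
        if a[i][i] == 0:
            return False
        if any(a[i][j] != 0 for j in range(i)):
            return False
    width = zero_band_width(a)
    return width is None or width >= k


pattern = build_pattern(N, OFF_DIAGONAL)
```

Under the assumed pattern, every row that has off-diagonal nonzeros keeps six zeros between its diagonal element and its first nonzero, so the check passes for K = 6 and fails for K = 7.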

Once the factored matrix in upper triangular bandwidth-maximized form has been provided, the first step of the solution method is to load and store matrix [A], as described with reference to FIGS. 2 and 3. The result of loading and storing matrix [A] of FIG. 3 is shown in FIG. 4. In FIG. 4, memory 56 of cell 1 stores the diagonal elements A ... Y in memory locations MEM through MEM+11, respectively.

Each stored diagonal value is accompanied by an index, stored in memory portion 59, that indicates the row of origin of the associated diagonal element. A partial-result store 57 is also provided in cell 1 for storing the computed values X1 ... X12 of vector X. The operation of the partial-result store is explained in detail later.

Similarly, memories 56 of cells 2 through 4 are shown storing the remaining nonzero values of the matrix. Each stored nonzero element, held in memory portion 58, is accompanied by an index stored in portion 59 that identifies the row of the matrix from which that nonzero element originates.

Each of cells 2-4 also includes a store 57 which, as explained in detail later, holds the partial results accumulated according to the stored indices that accompany the stored matrix elements.

It should also be noted that the elements of each column of matrix A are reordered during the mapping process to form what may be called a "permutation vector" at each of memory locations MEM through MEM+11. The diagonal elements are collected in cell 1. For example, column 9 of matrix A is converted into a permutation vector at memory location MEM+8, with the diagonal element from row 9 of the matrix placed in cell 1 and the remaining nonzero elements (J, K) of that column held in cells 2 and 3, each together with an index (1, 2) indicating the row of the matrix from which it originated. As noted above, zero data values are stored in any locations of a permutation vector that remain unfilled after all nonzero elements of the associated column have been stored.

Given the upper triangular matrix [A] of FIG. 3, the system of equations to be solved for [X] is as follows.

\[
\begin{aligned}
(1)\quad  & A X_1 + \cdots + H X_8 + J X_9 &&= Q_1 \\
(2)\quad  & B X_2 + \cdots + K X_9 + M X_{10} + U X_{12} &&= Q_2 \\
(3)\quad  & C X_3 + \cdots + N X_{10} + R X_{11} &&= Q_3 \\
(4)\quad  & D X_4 + \cdots + S X_{11} + V X_{12} &&= Q_4 \\
(5)\quad  & E X_5 + \cdots + W X_{12} &&= Q_5 \\
(6)\quad  & F X_6 &&= Q_6 \\
(7)\quad  & G X_7 &&= Q_7 \\
(8)\quad  & I X_8 &&= Q_8 \\
(9)\quad  & L X_9 &&= Q_9 \\
(10)\quad & P X_{10} &&= Q_{10} \\
(11)\quad & T X_{11} &&= Q_{11} \\
(12)\quad & Y X_{12} &&= Q_{12}
\end{aligned}
\]

As noted above, to solve the above system of equations by back substitution, X12 is computed first by solving equation (12). In like manner, X11, X10, X9, X8, X7 and X6 can be solved immediately, simply by dividing the associated known components of [Q] by the corresponding diagonal elements T, P, L, I, G and F. It should be noted that the power of the invention lies in the fact that several components of the unknown field variable [X] can be computed simultaneously, without regard to any other value of the field variable. As explained in detail later, this greatly increases the speed of computation on a parallel processor. This is in clear contrast to the conventional methods disclosed in the prior art and in the aforementioned pending U.S. patent application, in which, apart from the first component of the field variable, each remaining value can be computed only once the values that follow it are known. As those skilled in the art will readily appreciate, this data dependency problem significantly reduces the speed at which such equations can be solved by back substitution on a parallel processor.
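The absence of data dependency for rows that contain only a diagonal element can be illustrated with a short sketch. The function name `independent_unknowns` and the 5 x 5 matrix and right-hand side below are hypothetical and chosen only for illustration.

```python
def independent_unknowns(a, q):
    """Return {row: x} for every row whose only nonzero element is its diagonal.

    Each such unknown is just q[i] / a[i][i]; none depends on any other
    unknown, so all of them could be computed concurrently.
    """
    n = len(a)
    result = {}
    for i in range(n):
        if all(a[i][j] == 0 for j in range(n) if j != i):
            result[i + 1] = q[i] / a[i][i]
    return result


# Hypothetical upper triangular system with a zero band of width 2.
a = [[2, 0, 0, 1, 3],
     [0, 4, 0, 0, 2],
     [0, 0, 5, 0, 0],
     [0, 0, 0, 8, 0],
     [0, 0, 0, 0, 10]]
q = [9, 6, 10, 16, 20]
free = independent_unknowns(a, q)
```

Here rows 3, 4 and 5 are solved by division alone, while rows 1 and 2 must wait for products involving X4 and X5.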

As noted above, once the values X12 through X6 have been computed by simple division, those known values, and the results of operations involving them, can be used to compute the remaining unknown components of [X], namely X5 through X1 in equations (5) through (1). It should again be noted that the above system of equations is only an example of the very large, sparse systems encountered in typical physical problems studied by finite element analysis. The system is given here purely by way of illustration, to explain the advantage of the invention that results from the reduced data dependency obtained by using a bandwidth-maximized factored matrix of the form shown in FIG. 5B.

Next, the actual method of solving the above system of equations by back substitution on a systolic multiprocessor apparatus will be described in detail with reference to FIG. 4 and FIGS. 4A-4L. The multiprocessor apparatus consists of four processor cells 1-4. Each processor cell is assigned to perform particular operations that contribute to the solution. Each operation performed by a processor cell is represented in FIG. 4 by a functional unit. It should be borne in mind that the functional units illustrated in FIG. 4 do not denote distinct hardware components of the machine but, as those skilled in the art will readily understand, the various data processing operations that each cell of the apparatus can perform. The operations performed by each cell are arranged to be regular and repetitive during each cycle of the machine. The flow of information through the processor is described with reference to FIGS. 4A-4L, which show the operands and results flowing through the machine during the first 12 cycles of a typical back substitution for the example system of equations above.

It can be seen that the multiprocessor apparatus shown in FIG. 4 includes a memory 56 associated with each of cells 1-4. Memory 56 may be regarded as having a first array 58 that stores the values of the elements of the solution matrix A, as described above. Associated with each element stored in memory array 58 is an index, stored in array 59, that indicates the row of the matrix from which the associated stored value originates. Thus, looking at cell 1, matrix element A originates from row 1 of the matrix, matrix element L originates from row 9 of the matrix, and so on. Storage in cell 1 means that the stored element is a diagonal element, by virtue of the storage technique shown in FIG. 2.

Each cell also includes a separate store 57. Store 57 holds the results of the indexed computations performed by the apparatus, as explained in detail in the description of the computation process below. Store 57 of cell 1 serves as the store for the final component values X1 through X12 of the unknown field variable. Stores 57 of cells 2, 3 and 4, rather than accepting and delivering values sequentially, store results according to index and, in response to an indexed input, provide an indexed output, i.e. the stored value associated with that index. To that end, stores 57 of cells 2-4 each produce two different outputs. The first output, provided on line 57', is supplied to accumulator 53. In the accumulators, the various values associated with the same index are added together before being returned to the store in the position determined by that index.

The output on line 57' represents the value stored in the store under the relevant index, which is to be processed at the same time in accumulator 53. For this reason, the output on line 57' is controlled by the index input supplied from memory 59. A second output from each store 57 of cells 2-4 is applied to adder 65. This second, indexed output represents the accumulation QN' of the computations performed previously. These accumulations are added together in carrying out the back substitution. The second output is selected by the output that counter 61 supplies simultaneously to each store 57 of cells 2-4; the counter counts down from index 11 during cycle 1 to index 1 during cycle 11 (FIGS. 4A-4L).

The values stored in stores 58 of cells 1-4 are supplied to the functional units in the remainder of the apparatus associated with each cell. The values stored in store 58 are supplied in staggered fashion, starting from the topmost, i.e. highest-numbered, address (FIG. 4), via line 58' to functional units 52 and 51. "Staggered" here means that the value stored at memory location MEM+11 of store 58 of cell 1 is supplied to divider 52 in a given machine cycle that is a few cycles earlier than the cycle in which the corresponding values at the same memory location in stores 58 of cells 2, 3 and 4 are supplied via lines 58' to the associated multipliers 51. This is explained in detail later. Cell 1 includes a subtractor 60 and a divider 52. Subtractor 60 is supplied with a first operand from adder 65 via line 65'. The second input to subtractor 60 is the value of one of the known vector components Q1 through Q12 of the physical system. Divider 52 divides the result of the subtraction performed in subtractor 60 by the appropriate value from the store of cell 1, namely the selected diagonal element of the stored matrix.

Each result produced by divider 52 of cell 1 therefore represents the solution for one component of the unknown field variable of the system of equations being solved. These results are stored in store 57 of cell 1 in accordance with the back substitution method described above. In addition, each result of the division in divider 52 must be multiplied by the permutation vector representing the corresponding column of the matrix. For example, X12 must be multiplied by each nonzero element from column 12 of the matrix. The results of these operations must be collected and later added together according to row index.
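The arithmetic carried out by the cells amounts to a column-oriented back substitution in which partial products are accumulated by row index. The following sequential sketch shows only that arithmetic, not the pipelined hardware; the function name, the 0-based data layout and the 4 x 4 example are assumptions made for illustration, and the reference numerals in the comments indicate which functional unit each step loosely corresponds to.

```python
def back_substitute(diag, columns, q):
    """Solve an upper triangular system stored column by column.

    diag[j]    : diagonal element of column j (0-based)
    columns[j] : list of (value, row) pairs for the off-diagonal nonzeros of column j
    q          : known right-hand side
    """
    n = len(diag)
    acc = [0.0] * n   # accumulated partial products Q', indexed by row
    x = [0.0] * n
    for j in range(n - 1, -1, -1):
        # Subtract the accumulated products for this row, then divide by the
        # diagonal element (subtractor 60 and divider 52).
        x[j] = (q[j] - acc[j]) / diag[j]
        # Multiply the new unknown by every off-diagonal element of its column
        # and accumulate each product under the element's row of origin
        # (multipliers 51, accumulators 53, stores 57).
        for value, row in columns[j]:
            acc[row] += value * x[j]
    return x


# Hypothetical system [[2,0,1,3],[0,4,0,5],[0,0,6,7],[0,0,0,8]] x = q.
diag = [2, 4, 6, 8]
columns = [[], [], [(1, 0)], [(3, 0), (5, 1), (7, 2)]]
q = [17, 28, 46, 32]
x = back_substitute(diag, columns, q)
```

Because each product is filed under its row index as soon as it is formed, the only operation left when a row's turn comes is one subtraction and one division.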

To this end, each result from divider 52 is supplied in pipelined fashion to multipliers 51 of cells 2-4, where the indexed results are computed. The output of each multiplier 51 is accumulated according to row index in accumulator 53; the result of the accumulation is stored back in store 57 according to row index and, under the control of counter 61, is output to adder 65.

The apparatus shown in FIG. 4 will now be explained in detail by describing its operation in solving the above simultaneous equations by back substitution. The description refers to FIGS. 4A-4L, which show the flow of data through the apparatus as the back substitution is carried out.

In general, the processing apparatus shown in FIG. 4 operates in pipelined fashion. At the start of the computation, the values stored in memory 56 and the known values Q1 through Q12 are available. The values Q1 through Q12 are stored in an associated processor, such as host 1 (FIG. 1), so that they can be input to subtractor 60 at the appropriate times. In each successive machine cycle, a selected one of those values is made available to the multiprocessor cells for generating the resulting operands. Processing proceeds by moving data through the processor generally from left to right in FIG. 4, gradually filling the processor until every cell performs a useful computation in every cycle. The first few machine cycles fill the pipeline, as described below, while at the same time computing the first few values of the components of the unknown field variable [X].

Referring first to FIG. 4A, during the first machine cycle the value Q12 is supplied to one input of subtractor 60 from a suitable store or host computer (not shown). The other input to subtractor 60, Q12' from adder 65, is zero at this point in the process, because no operands to be acted on have yet reached the other functional units of the apparatus at this stage of the back substitution. The inputs to all other functional units are all zero.

During machine cycle 2 (FIG. 4B), equation (12) is solved for X12. This is done by dividing the output Q12 of subtractor 60 by the first stored diagonal element Y of matrix A, producing the first solution X12 at the output of divider 52. At the same time, during cycle 2, the output Q11' of adder 65 is subtracted from the value Q11 in subtractor 60 to provide a new input to divider 52 for the next cycle. Also, the value X12 is stored in store 57 via line 64 and is supplied to multiplier 51 of cell 2 via line 63.

During cycle 3 (FIG. 4C), equation (11) is solved for X11 by dividing the value Q11 by T and storing the result in store 57 of cell 1 via line 64. Also during cycle 3, the value X12 previously computed and placed on line 63 is multiplied in multiplier 51 of cell 2 by the value W (one of the matrix elements from column 12 of the matrix), producing the partial product W·X12 at one input of accumulator 53 of cell 2. Also during cycle 3, the output of adder 65 (still zero) is subtracted in subtractor 60 of cell 1 from the value Q10.

In like manner, during cycle 4, equation (10) is solved for X10 by dividing the value Q10 in divider 52 by the next stored diagonal element P. At the same time, in cell 2, the previously computed X11 is multiplied by S to produce the partial product S·X11, and the previously produced partial product W·X12 is accumulated in accumulator 53 of cell 2 with any other partial product of the same row index (on line 57'), to be stored under that index in the next cycle. In addition, X12, having passed through the one-cycle delay 55, now reaches cell 3, where it is multiplied by matrix element V to produce a partial product. That partial product is accumulated in a subsequent cycle and stored in store 57 of cell 3.

During cycle 5, the output Q8' of adder 65 (still equal to zero) is subtracted from the value Q8 in subtractor 60, and Q9 (the output of subtractor 60 in the previous cycle) is divided by L in divider 52 to produce X9. The previously computed X10 is stored in store 57 of cell 1 and is multiplied by N in cell 2. The previous output S·X11 of multiplier 51 of cell 2 is accumulated in accumulator 53 with the value Q' (if any), on line 57', previously stored in store 57 under the same index, and is stored in store 57 in the next cycle. The previously computed X11 is multiplied by R in multiplier 51 of cell 3, and V·X12 is accumulated in accumulator 53 of cell 3 with the previously stored value Q' (if any) of the same index and is stored in store 57 in the next cycle. Finally, X12 moves on to cell 4, where it is multiplied by U in multiplier 51; the product U·X12 is passed to accumulator 53 of cell 4 to be accumulated and then stored during the next cycle.

The subsequent machine cycles 6 through 13 of the back substitution proceed in a manner similar to that described above.

FIGS. 4F-4M show in detail the flow of data through the machine as the subsequent equations are solved. Briefly summarized, during each subsequent cycle another of the above equations is solved for its unknown component, and the value of that component is stored in store 57 of cell 1 together with the previously computed values. Each newly computed value is also passed successively to the right, as viewed in FIG. 4, to each later stage of the processor, where the partial products QN' are computed and accumulated under an index corresponding to the row of origin of the matrix element involved in the operation. For example, during cycle 6 (FIG. 4F), three partial products are accumulated simultaneously for later addition in adder 65, as follows.

(1) The first partial product N·X10 is accumulated in store 57 of cell 2 (together with any other earlier partial product sharing the same row index as element N). Since element N originates from row 3 of the matrix, N·X10 is stored as Q3'.

(2) The second partial product R·X11 is likewise stored, under index 3, as Q3' in store 57 of cell 3.

(3) At the same time, since matrix element U originates from row 2 of the matrix, the third partial product U·X12 is accumulated as Q2', together with any other partial products sharing the same row of origin.

The first two of the above partial products remain stored in stores 57 until the machine cycle in which counter 61 accesses all the partial products associated with row 3 of the matrix. This occurs in cycle 9 (FIG. 4I). During that cycle, all of the previously accumulated row 3 partial products Q3' are supplied to adder 65. During cycle 10, the sum (N·X10 + R·X11) is available for subtraction from the known value Q3 in subtractor 60, so that during cycle 11 the difference Q3 - Q3' can be divided by the diagonal element C to produce X3. It can thus be seen that equation (3) can be solved during cycle 11. This is possible only because, in the preceding cycles, the products of the previously computed values of the unknown vector with all of the off-diagonal matrix elements of row 3 have been accumulated on the basis of row index.
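The cycle counts in this walk-through can be cross-checked with a small timing model. The model is an assumption read off the cycle-by-cycle description, not a specification of the hardware: Xi is taken to leave the divider in cycle N + 2 - i, to reach the multiplier of cell c after a further c - 1 cycles, to be accumulated and stored two cycles after the multiplication, and to be read out by the counter two cycles before its row is divided. The cell assignments in `ENTRIES` are likewise an assumed reading of the example.

```python
N = 12
# Assumed mapping {(row, column): cell} for the off-diagonal elements of the example.
ENTRIES = {(1, 8): 2, (1, 9): 3, (2, 9): 2, (2, 10): 3, (2, 12): 4,
           (3, 10): 2, (3, 11): 3, (4, 11): 2, (4, 12): 3, (5, 12): 2}


def divide_cycle(i, n=N):
    """Cycle in which X_i is produced by the divider of cell 1."""
    return n + 2 - i


def product_ready_cycle(col, cell, n=N):
    """Cycle in which the product (element of column col) * X_col is stored in the cell."""
    multiply = divide_cycle(col, n) + (cell - 1)
    return multiply + 2


def product_needed_cycle(row, n=N):
    """Cycle in which the counter reads out the accumulated products of a row."""
    return divide_cycle(row, n) - 2


def schedule_is_feasible(entries, n=N):
    """True if every partial product is stored no later than the cycle it is read."""
    return all(product_ready_cycle(col, cell, n) <= product_needed_cycle(row, n)
               for (row, col), cell in entries.items())
```

Under these assumptions X12 is divided in cycle 2, the row 3 products are read out in cycle 9, X3 is divided in cycle 11 and X1 in cycle 13, which agrees with the description. The model also shows why a narrower zero band would stall the pipeline: an element in row 6, column 12 held in cell 4 would be stored one cycle after it is needed.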

Several important features of the methodology of the invention will be apparent to those skilled in the art. The principal feature is that the twelve simultaneous equations above can be solved in thirteen steps. This is made possible only by the combination of two things: the elimination of data dependencies that results from the bandwidth-maximized form of the matrix used in the solution, and the particular pattern, described herein, for storing the matrix values in the multiprocessor and accessing them. More specifically, while the first few values of the unknown vector X are being computed in one cell by simple division steps during the early machine cycles (which the bandwidth-maximized form of the matrix makes possible), matrix operations are carried out simultaneously in the other processor cells to generate partial products (QN') from the previously computed components.

As processing proceeds in this way, partial products that depend on previously computed components of the unknown vector are available for the computation by the time they are needed. The back substitution thus proceeds without any delay spent waiting for those partial products to become available.

As noted above, the minimum bandwidth K of the band of zeros located between the main diagonal of the matrix and the cluster of nonzero elements in the upper right corner is selected on the basis of the design of the multiprocessor apparatus. In general, K must equal the maximum number of cycles, or steps, required to gather and make available the data needed to calculate any given unknown vector component Xi. For the apparatus shown in FIG. 4, for example, once a component of the unknown vector has been calculated during cycle 2, six further steps, that is, six cycles (3 to 8), are needed to complete the calculation of the partial products that depend on it (passing through the accumulator and the adder) and to deliver those partial products to subtractor 60 just in time for use in solving an equation.
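The relationship between the band width K and the pipeline latency can be sketched with a simplified timing model. Everything in the model is an assumption made for illustration: one unknown is divided out per cycle, the first unknown is solved in cycle 2, and a partial product becomes usable `latency` full cycles after the unknown it depends on has been solved. The function name `solve_cycles` and the example nonzero patterns are hypothetical and are not taken from the figures.

```python
def solve_cycles(n, nonzeros, latency, first_cycle=2):
    """Return a dict mapping each unknown (0-based) to the cycle in which
    it is solved under the simplified timing model.

    nonzeros: pairs (i, j) with j > i for which a[i][j] != 0.
    Unknowns are solved in the order n-1, n-2, ..., 0, one per cycle,
    but never before every partial product they need is ready.
    """
    solved = {}
    cycle = first_cycle
    for i in range(n - 1, -1, -1):
        ready = max(
            (solved[j] + latency + 1 for (r, j) in nonzeros if r == i),
            default=first_cycle,
        )
        cycle = max(cycle, ready)  # stall here if a product is not ready
        solved[i] = cycle
        cycle += 1
    return solved


# Zero band of width 6: every off-diagonal nonzero is at least 7 columns away.
wide = solve_cycles(12, {(0, 7), (1, 9), (3, 10), (4, 11)}, latency=6)
# Narrow band: a nonzero directly beside the diagonal forces a stall.
narrow = solve_cycles(12, {(10, 11)}, latency=6)
```

Under these assumptions the wide-band system finishes its twelfth unknown in cycle 13 with no stall, whereas the narrow-band system has to wait for a partial product and finishes later.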

A bandwidth K of 6 is therefore sufficient to provide the space and time needed to make full use of the particular systolic array shown in FIG. 4. The band required for other hardware will of course depend on the characteristics of that hardware. If a narrower band is provided, some efficiency may be sacrificed, depending on the makeup of the data in the system of equations being processed.

As is generally understood in the art, the bandedness and sparsity of the matrix reflect the connectivity of the finite element mesh. It is therefore well within the ability of those skilled in the art to transform the system matrix [K] into an upper triangular, bandwidth-maximized matrix [A] having the sparsity structure shown in FIG. 5B.

Minimizing the bandwidth, as shown in FIG. 5A, involves renumbering the nodes of the finite element mesh so that the difference between the numbers of any two adjacent nodes is as small as possible. Maximizing the bandwidth of the matrix of FIG. 5B, by contrast, is done by renumbering the nodes of the system so that the difference between the numbers of adjacent nodes is as large as possible. In accordance with the invention this can be accomplished, without taking an unduly long time, by a trial-and-error technique in which, for example, a first pass assigns numbers to the nodes using random numbers.
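A random first pass of this kind can be sketched as follows. The sketch is only one possible reading of the trial-and-error technique: the grid mesh, the function names `grid_edges`, `in_band` and `random_renumbering`, the band width of 3, and the choice of keeping the best of 20 random trials are all assumptions made for illustration.

```python
import random


def grid_edges(rows, cols):
    """Edges of a rows x cols grid mesh; nodes are identified 0..rows*cols-1."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))
            if r + 1 < rows:
                edges.append((node, node + cols))
    return edges


def in_band(edges, number, k):
    """Count adjacent node pairs whose assigned numbers differ by at most k."""
    return sum(1 for u, v in edges if abs(number[u] - number[v]) <= k)


def random_renumbering(edges, n, k, trials, seed=0):
    """Try several random numberings and keep the one that leaves the
    fewest adjacent pairs (off-diagonal nonzeros) inside the band."""
    rng = random.Random(seed)
    best, best_count = None, None
    for _ in range(trials):
        number = list(range(n))
        rng.shuffle(number)
        count = in_band(edges, number, k)
        if best_count is None or count < best_count:
            best, best_count = number, count
    return best, best_count


edges = grid_edges(10, 10)
natural_count = in_band(edges, list(range(100)), k=3)
best, best_count = random_renumbering(edges, 100, k=3, trials=20)
```

Each adjacent node pair corresponds to an off-diagonal nonzero of the system matrix, so `best_count` is the number of nonzeros that still fall inside the band after the random pass, to be compared with `natural_count` for the row-by-row numbering.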

Because typical meshes are large, this technique usually removes 90% of the nonzero data from the band near the diagonal. The remaining nonzero data are removed by exchanging numbers between nodes whose nonzero data lie inside the band and nodes whose nonzero data lie outside it.
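The number-exchanging step can be sketched as a greedy refinement. This is a loose interpretation rather than the procedure of the specification: the swap partner is simply chosen at random, a swap is kept only if it reduces the number of adjacent pairs inside the band, and the names `refine_by_swaps` and `in_band` as well as the 12-node ring mesh are hypothetical.

```python
import random


def in_band(edges, number, k):
    """Count adjacent node pairs whose assigned numbers differ by at most k."""
    return sum(1 for u, v in edges if abs(number[u] - number[v]) <= k)


def refine_by_swaps(edges, number, k, attempts, seed=0):
    """Swap the numbers of two nodes whenever that reduces the count of
    adjacent pairs falling inside the band; otherwise undo the swap."""
    rng = random.Random(seed)
    number = list(number)
    count = in_band(edges, number, k)
    for _ in range(attempts):
        if count == 0:
            break
        offenders = [e for e in edges if abs(number[e[0]] - number[e[1]]) <= k]
        u = rng.choice(rng.choice(offenders))  # a node with in-band data
        w = rng.randrange(len(number))         # candidate swap partner
        number[u], number[w] = number[w], number[u]
        new_count = in_band(edges, number, k)
        if new_count < count:
            count = new_count
        else:
            number[u], number[w] = number[w], number[u]  # undo the swap
    return number, count


# Ring mesh of 12 nodes, starting from the natural numbering.
ring = [(i, (i + 1) % 12) for i in range(12)]
start = list(range(12))
start_count = in_band(ring, start, k=2)
refined, refined_count = refine_by_swaps(ring, start, k=2, attempts=200)
```

Because a swap is accepted only when it lowers the count, the refinement can never leave more data inside the band than it started with.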

Continuing this trial-and-error technique a little further yields a band wide enough for the decomposed matrix to be processed efficiently in accordance with the technique of the invention. Moreover, because a relatively small number of processors is intended to be used, the size of the band is generally small compared with the total number of finite elements in the mesh of the system.

As those skilled in the art will appreciate, using a bandwidth-maximized matrix A as in the present invention allows several unknown components XN of the unknown field variable vector to be calculated without reliance on other values of the field variable. If those calculations are performed sequentially in one processor (cell 1 in the example above), the other processors of the apparatus (cells 2 to 4) can, in the same time, calculate and accumulate the partial products QN' that will be needed later to complete the calculation of all the components of XN, in particular those components that depend on previously calculated values. Increasing the number of XN calculations that can be carried out without data dependence therefore gives the multiprocessor apparatus time to perform the intermediate calculations without adding idle time. Instead of being made to wait during some portion of the backward substitution, every processor of the apparatus is able to perform useful calculations during each machine cycle.

Although the matrix computation specifically described here is backward substitution, forward elimination can be performed in a similar manner, and those skilled in the art can implement it using the principles of the invention. Forward elimination, however, requires that the original system matrix be transformed into a lower triangular matrix rather than the upper triangular matrix used in backward substitution. Forward elimination is also described in the references cited above.
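The mirrored lower triangular case can be sketched in the same sequential style. The function name `forward_substitute`, the 0-based indexing, and the 5 x 5 example are assumptions made for illustration; the zero band is taken here to lie immediately to the left of the diagonal.

```python
def forward_substitute(a, b, k):
    """Solve a x = b for a lower triangular matrix `a` whose entries
    a[i][j] are zero for 0 < i - j <= k (the zero band beside the diagonal)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Only columns more than k positions left of the diagonal contribute.
        partial = sum(a[i][j] * x[j] for j in range(0, max(i - k, 0)))
        x[i] = (b[i] - partial) / a[i][i]
    return x


# Illustrative 5 x 5 system with a zero band of width k = 1.
a = [[4.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
for i, j, value in [(2, 0, 1.0), (4, 1, 2.0), (4, 2, 1.0)]:
    a[i][j] = value
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [sum(a[i][j] * x_true[j] for j in range(5)) for i in range(5)]
x = forward_substitute(a, b, k=1)
```

Here the first k + 1 unknowns are obtained by division alone, just as the last k + 1 unknowns are in backward substitution.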

For brevity, the invention has been described with reference to a small matrix and a small system of equations. It is emphasized again, however, that the advantages of the invention are realized most clearly when operating on the large sparse matrices associated with the large sparse systems of equations that represent typical physical systems to be analyzed by the finite element method.

The invention has been described with reference to a preferred embodiment involving its use in backward substitution operations performed on a systolic computer architecture, but the invention is not limited to any particular parallel multiprocessor matrix operation.

[Brief explanation of drawings]

FIG. 1 shows the basic systolic architecture used in accordance with one method of the invention to process large sparse matrices. FIG. 2 is a flow chart showing a method for storing a large sparse matrix in the memories of a parallel multiprocessor. FIG. 3 shows a method of mapping a particular matrix, for example a large, sparse, bandwidth-maximized matrix, onto a parallel multiprocessor array in accordance with one embodiment of the invention. FIG. 4 shows a method of performing backward substitution in solving a triangularized system of linear equations characterized in part by a particular bandwidth-maximized decomposed matrix stored in a parallel multiprocessor apparatus in accordance with FIG. 3. FIGS. 4A to 4M show the flow of data through the multiprocessor apparatus of FIG. 4 during successive machine cycles in which backward substitution is performed on the exemplary stored bandwidth-maximized decomposed matrix shown in FIG. 4. FIG. 5A shows the sparsity structure of a prior-art bandwidth-minimized decomposed matrix, and FIG. 5B shows the sparsity structure of the bandwidth-maximized decomposed matrix of the invention.

10: host; 12: interface unit; 15: cell; 51: multiplier; 52: divider; 53: accumulator; 56, 57, 58, 59: storage devices; 60: subtractor; 61: counter; 65: adder.

Sub-agent: Masaki Yamakawa (and others). Clean copy of drawings (no change in content). Procedural amendment. 1. Indication of the case: Patent Application No. 22960 of 1988 (Showa 63). 2. Title of the invention: Method for analyzing physical phenomena. 3. Person making the amendment.

Claims

[Claims]

(1) A method for analyzing physical phenomena, characterized by comprising the steps of: generating, in order to analyze a physical phenomenon, a first system of linear equations characterized by a large sparse known system matrix K of order N×N that relates an unknown N vector Y to a known N vector R by the vector representation KY=R; transforming said first system of linear equations into a second system of equations characterized by a decomposed triangular matrix that has only nonzero elements on its main diagonal and a band of zero elements extending from said main diagonal by a preselected bandwidth, with no nonzero elements inside said band and all remaining nonzero elements located outside said band; loading the elements of the decomposed matrix into memories associated with a plurality of processors such that the diagonal elements of the decomposed matrix are contained in the memory associated with one of the processors; and solving the second system of equations by calculating values of a plurality of components of the unknown N vector independently of previously calculated values of any components of the unknown N vector.

(2) A finite element analysis method for analyzing physical systems, comprising the steps of dividing a solution region into a large but finite number of elements, approximating the field variable in each element by a function determined by the values of the unknown field variable at specified nodes on the boundaries of the elements, and assembling the element functions into a first system of equations that describes the behavior of the solution region, the first system of equations having the general matrix form KY=R, where Y represents the unknown values of the field variable at the nodes of the system, R represents known boundary values, and K is a large sparse matrix; the method being characterized by an improved step of solving the system of equations by transforming it into a system characterized by a large sparse decomposed triangular matrix that has only nonzero elements on its main diagonal, a cluster of nonzero data located at the right-angle corner of the triangular matrix, and a band of zero data located between the diagonal and the cluster.