JPH08255151A

JPH08255151A - Parallel computer and storing method for matrix

Info

Publication number: JPH08255151A
Application number: JP5725495A
Authority: JP
Inventors: Kazuto Kubota; 和人久保田; Satoshi Ito; 聡伊藤; Yasuyuki Kumagai; 泰幸熊谷; Tatsuo Shimizu; 達雄清水; Hiroyuki Takano; 裕之高野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-03-16
Filing date: 1995-03-16
Publication date: 1996-10-01

Abstract

PURPOSE: To provide a parallel computer that can execute fine-grained matrix computations at a speed comparable to that of a supercomputer and at low cost. CONSTITUTION: A parallel computer that calculates the eigenvalues of a matrix and has equal numbers of memory units 1a-1m and floating-point units 2a-2m comprises: data transmission paths that connect each memory unit 1a-1m to the floating-point unit 2a-2m associated with it in advance; a crosspoint switch 3 that connects the memory units 1a-1m and the floating-point units 2a-2m in pairs according to an externally supplied control signal; and a controller 5 that generates the control signal for driving the crosspoint switch 3 from the element values of the matrix by a predetermined calculation.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel computer that calculates the eigenvalues of a matrix using, for example, the Jacobi method, and to a matrix storage method suited to that parallel computer. In particular, it relates to a parallel computer that can execute fine-grained matrix computations at high speed and low cost, and to a matrix storage method suited to that parallel computer.

[0002]

[Prior Art] Eigenvalue calculation for matrices is required in many fields, including computational materials science and fluid analysis. One method of solution is the Jacobi method, which is described in detail in, for example, Hayato Togawa, "Numerical Computation of Matrices" (戸川隼人著、マトリクスの数値計算). The Jacobi method is accurate but requires a large amount of computation, so solving matrices of large dimension with it is difficult, and doing so has required a fast machine such as a supercomputer.

[0003]

[Problems to Be Solved by the Invention] Not everyone has access to a supercomputer, and using one is costly. Because a supercomputer is shared by many users, its response time is also longer than the actual computation time. The Jacobi method has inherent parallelism, so running it in parallel on a parallel computer can raise the processing speed. General-purpose parallel computers, however, perform well only when the granularity is coarse; they perform poorly on fine-grained parallel computations such as the Jacobi method.

[0004] The present invention has been made in view of the above circumstances. Its object is to provide a parallel computer that can execute fine-grained matrix computations, the Jacobi method among them, at a speed comparable to that of a supercomputer and at low cost, and to provide a matrix storage method suited to that parallel computer.

[0005]

[Means for Solving the Problems] The parallel computer of the present invention calculates the eigenvalues of a matrix and has equal numbers of memory units and floating-point units. It is characterized by comprising: data transmission paths that connect each memory unit to the floating-point unit associated with it in advance; a crosspoint switch that connects the memory units and the floating-point units in pairs according to an externally supplied control signal; and means for generating the control signal that drives the crosspoint switch from the element values of the matrix by a predetermined calculation.

[0006] The parallel computer of the present invention is further characterized in that, when n boards each carrying m memory units with a W-bit data input/output width and n boards each carrying m floating-point units with a W-bit data input/output width are connected by n crosspoint switches of size Wm × Wm with a 1-bit data input/output width, each crosspoint switch is connected to every memory unit and every floating-point unit by W/n bits, so that each of the crosspoint switches operates as an nm × nm crosspoint switch of W/n-bit width.

[0007] The matrix storage method of the present invention applies to a parallel computer that calculates the eigenvalues of a matrix and has m memory units and m floating-point units. When the element in row i, column j of the upper triangular part of an n × n matrix is stored in one of the memory units for the eigenvalue calculation, the element is stored in memory unit number (i + j) mod m if i + j < n + 2 holds, and in memory unit number (i + j - n) mod m if i + j > n + 1 holds.

[0008]

[Operation] A matrix computation using the Jacobi method consists of the following two procedures. The matrix elements are assumed to have been divided among the memory units and stored in advance. Procedure 1 is described first.

[0009] (Procedure 1) The memory controller provided for each memory unit, which controls that unit, reads out the matrix elements in the memory unit one after another and sends them to the floating-point unit associated with it in advance (floating-point unit A in the embodiments). Each floating-point unit finds the largest of the elements sent to it, together with that element's row number and column number. The maxima found by the floating-point units are sent to the control section that controls the whole apparatus (floating-point unit B and the controller in the embodiments), which obtains the largest element value among them and its row and column numbers. The control section then performs a calculation using this maximum element value and sends the result to every floating-point unit.
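Procedure 1 amounts to a two-level maximum search, which can be sketched as follows. This is a minimal illustration in Python, not the hardware described here. The function names `local_max` and `global_max` are hypothetical, and comparing by absolute value follows the first embodiment rather than this paragraph.

```python
def local_max(elements):
    """Per-unit search: elements is a list of (value, row, col) held by one memory unit."""
    # Assumption: "largest" means largest absolute value, as in the first embodiment.
    return max(elements, key=lambda e: abs(e[0]))


def global_max(units):
    """Control-section step: pick the largest of the per-unit maxima."""
    return local_max([local_max(u) for u in units if u])


# Two memory units, each holding a few (value, row, col) entries.
units = [
    [(0.5, 0, 0), (-3.0, 1, 4)],
    [(2.0, 0, 1), (1.0, 2, 4)],
]
value, row, col = global_max(units)
```

Each unit scans only its own elements, so the controller compares one candidate per unit instead of every matrix element.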

[0010] Procedure 2 is described next. (Procedure 2) The way the crosspoint switch connects the memory units to the floating-point units is determined by the row number and column number of the maximum element found in Procedure 1. The memory controllers likewise read matrix element data from the memory units in sequence according to that row number and column number. These element data are sent to the floating-point unit associated in advance with the memory unit and to the floating-point unit connected to it through the crosspoint switch.

[0011] Each floating-point unit performs a multiply-accumulate operation on the element received through the switch and the data received directly, and the results are sent back to the memory units through the network. Each memory controller stores the data arriving from the switch at the designated location. This makes it possible to execute fine-grained matrix computations at high speed and low cost.

[0012]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the schematic configuration of a parallel computer according to the present invention. As shown in FIG. 1, the parallel computer comprises a plurality of memory units 1a to 1m, a plurality of floating-point units A 2a to 2m, an m × m crosspoint switch 3, a floating-point unit B 4, and a controller 5 that controls all of them. Matrix elements are stored in the memory units 1a to 1m. Data input to and output from each memory unit is controlled by a memory controller. The crosspoint switch 3 connects the memory units 1a to 1m and the floating-point units A 2a to 2m one to one. The connection pattern of the crosspoint switch 3 is controlled by the controller 5 via a switch controller.

[0013] The floating-point units A 2a to 2m perform floating-point multiply-accumulate operations and search an input data sequence for the datum with the maximum value. Each of the floating-point units A 2a to 2m has a data transmission path to a predetermined memory unit and is also connected to the crosspoint switch 3. Each is further connected to the controller 5 via a bus 6.

[0014] The floating-point unit B 4 performs floating-point addition, subtraction, multiplication, and division, and calculates tan^-1. The controller 5 controls the whole apparatus. The first embodiment is described first.

[0015] In this embodiment, the floating-point units A 2a to 2m and the controller 5 first find the maximum element among the matrix elements divided among and stored in the memory units 1a to 1m. The floating-point unit B 4 performs a calculation using this maximum value, and the result is sent to each of the floating-point units A 2a to 2m.

[0016] Next, the matrix elements stored in the memory units 1a to 1m are sent through the crosspoint switch 3 to the floating-point units A 2a to 2m, where they are combined in a calculation with the values computed by the floating-point unit B 4. The results are stored in the memory units 1a to 1m via the crosspoint switch 3.

[0017] FIG. 2 shows the algorithm for calculating the eigenvalues of a matrix by the Jacobi method. As an example, a 5 × 5 matrix such as the one shown in FIG. 3 is used, with elements a[0,0], a[0,1], ..., a[0,4], a[1,0], a[1,1], ..., a[4,3], a[4,4]. The number of memory units and of floating-point units A is 10, and the crosspoint switch is 10 × 10.

[0018] The elements of the matrix are distributed among the memory units for storage. For example, as shown in FIG. 4, they are stored as follows:

(1) Memory unit (0): a[0,0], a[1,4], a[2,3], a[3,2], a[4,1]
(2) Memory unit (1): a[0,1], a[1,0], a[2,4], a[3,3], a[4,2]
(3) Memory unit (2): a[0,2], a[1,1], a[2,0], a[3,4], a[4,3]
(4) Memory unit (3): a[0,3], a[1,2], a[2,1], a[3,0], a[4,4]
(5) Memory unit (4): a[0,4], a[1,3], a[2,2], a[3,1], a[4,0]
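The FIG. 4 listing is consistent with placing element a[i,j] in memory unit (i + j) mod 5. The text does not state that rule for this embodiment, so the rule is an inference from the listing, and the function name `distribute_full` is hypothetical. A short Python sketch under that assumption:

```python
def distribute_full(n, m):
    """Assign every element (i, j) of an n x n matrix to memory unit (i + j) mod m.

    Assumption: this rule is inferred from the FIG. 4 listing, not stated in the text.
    """
    units = [[] for _ in range(m)]
    for i in range(n):
        for j in range(n):
            units[(i + j) % m].append((i, j))
    return units


units = distribute_full(5, 5)
for number, held in enumerate(units):
    print(number, held)
```

With n = m = 5, each row and each column contributes exactly one element to every memory unit, which is why a row or column sweep touches all units evenly.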

[0019] Storing the elements this way prevents accesses from concentrating on any one memory unit. In this embodiment, the maximum of the matrix elements is found first (step A1 in FIG. 2). The controller 5 instructs every memory controller to read out all the matrix elements stored in its memory unit 1a to 1m and to send them to the corresponding floating-point unit A 2a to 2m. Each floating-point unit A 2a to 2m finds the element with the largest absolute value in its memory unit 1a to 1m. The controller 5 then collects these elements from the floating-point units A 2a to 2m and finds the one with the largest absolute value among them. This element is denoted a[I,J]. Suppose here that a[1,2] is the largest.

[0020] Next, the controller 5 compares a[1,2] with a predetermined value e (step A2 in FIG. 2) and ends the processing if a[1,2] is smaller than e (Y at step A2 in FIG. 2).

[0021] If a[1,2] is not smaller than e (N at step A2 in FIG. 2), processing continues as follows. First, a[I,I] and a[J,J] are read from the memory units into the controller 5, which calculates theta from a[I,I] and a[J,J] and then calculates s and c (step A3 in FIG. 2). The values s and c are sent over the bus 6 to each of the floating-point units A 2a to 2m. The controller 5 then sends I and J to each memory controller and to the switch controller.
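The text does not give the formula for theta, s, and c. A plausible reading, sketched below in Python, is the textbook Jacobi choice tan(2·theta) = 2·a[I,J] / (a[I,I] - a[J,J]) with s = sin(theta) and c = cos(theta). This is an assumption: it uses a[I,J] as well as the two diagonal elements, and it matches the update equations the document gives for step A5. The function name `rotation_params` is hypothetical.

```python
import math


def rotation_params(a_ii, a_jj, a_ij):
    """Return (theta, s, c) for a Jacobi rotation that zeroes a[I,J].

    Assumption: the textbook formula, with atan2 so that a_ii == a_jj is handled.
    """
    theta = 0.5 * math.atan2(2.0 * a_ij, a_ii - a_jj)
    return theta, math.sin(theta), math.cos(theta)


# Example with a[I,I] = 4, a[J,J] = 1, a[I,J] = 2.
theta, s, c = rotation_params(4.0, 1.0, 2.0)
# Off-diagonal element after rotation: a_ij*(c^2 - s^2) + s*c*(a_jj - a_ii).
residual = 2.0 * (c * c - s * s) + s * c * (1.0 - 4.0)
```

The residual expression is the rotated off-diagonal element, so a value near zero indicates that the chosen angle annihilates a[I,J].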

[0022] In the m × m crosspoint switch 3, input/output terminal l on the side of the memory units 1a to 1m is connected to input/output terminal (l - |I - J|) mod m on the side of the floating-point units A 2a to 2m (step A4 in FIG. 2). For I = 1 and J = 2, the switch controller sets the switch so that memory units (0), (1), (2), (3), and (4) are connected to floating-point units A (4), (0), (1), (2), and (3), respectively.
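The connection rule can be checked with a few lines of Python. The function name `switch_pattern` is hypothetical; the rule itself, (l - |I - J|) mod m, is taken from the text.

```python
def switch_pattern(m, I, J):
    """Return a list where entry l is the floating-point-unit terminal wired to memory terminal l."""
    shift = abs(I - J)
    # Python's % always returns a non-negative result for a positive modulus.
    return [(l - shift) % m for l in range(m)]


pattern = switch_pattern(5, 1, 2)
print(pattern)
```

Because the rule is a cyclic shift, every floating-point unit is paired with exactly one memory unit, which is what a one-to-one crosspoint setting requires.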

[0023] Next, each memory controller determines from I and J which elements to read. As a result, the data transfers shown in FIG. 5 take place.

(1) Memory unit (0)
a[4,1] → floating-point unit A (0): via the data transmission path
a[3,2] → floating-point unit A (4): via the crosspoint switch
(2) Memory unit (1)
a[0,1] → floating-point unit A (1): via the data transmission path
a[4,2] → floating-point unit A (0): via the crosspoint switch
(3) Memory unit (2)
a[1,1] → floating-point unit A (2): via the data transmission path
a[0,2] → floating-point unit A (1): via the crosspoint switch
(4) Memory unit (3)
a[2,1] → floating-point unit A (3): via the data transmission path
a[1,2] → floating-point unit A (2): via the crosspoint switch
(5) Memory unit (4)
a[3,1] → floating-point unit A (4): via the data transmission path
a[2,2] → floating-point unit A (3): via the crosspoint switch

Each floating-point unit that receives these elements performs the following calculation with them and with s and c (step A5 in FIG. 2).

(1) Floating-point unit A (0)
a[4,1] = c × a[4,1] + s × a[4,2]
a[4,2] = -s × a[4,1] + c × a[4,2]
Of the results, a[4,1] is written back to memory unit (0), and a[4,2] is written back to memory unit (1) via the switch.
(2) Floating-point unit A (1)
a[0,1] = c × a[0,1] + s × a[0,2]
a[0,2] = -s × a[0,1] + c × a[0,2]
Of the results, a[0,1] is written back to memory unit (1), and a[0,2] is written back to memory unit (2) via the switch.
(3) Floating-point unit A (2)
a[1,1] = c × a[1,1] + s × a[1,2]
a[1,2] = -s × a[1,1] + c × a[1,2]
Of the results, a[1,1] is written back to memory unit (2), and a[1,2] is written back to memory unit (3) via the switch.
(4) Floating-point unit A (3)
a[2,1] = c × a[2,1] + s × a[2,2]
a[2,2] = -s × a[2,1] + c × a[2,2]
Of the results, a[2,1] is written back to memory unit (3), and a[2,2] is written back to memory unit (4) via the switch.
(5) Floating-point unit A (4)
a[3,1] = c × a[3,1] + s × a[3,2]
a[3,2] = -s × a[3,1] + c × a[3,2]
Of the results, a[3,1] is written back to memory unit (4), and a[3,2] is written back to memory unit (0) via the switch.
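The routing in step A5 can be reproduced with a small Python sketch. It assumes that a[i,j] lives in memory unit (i + j) mod m, which is inferred from the FIG. 4 listing, and that I < J; the function name `route_column_pair` is hypothetical.

```python
def route_column_pair(k, I, J, m):
    """Return (unit A receiving a[k,I] directly, unit A receiving a[k,J] through the switch).

    Assumptions: a[i,j] is stored in memory unit (i + j) mod m, and I < J.
    """
    unit_i = (k + I) % m                  # memory unit holding a[k,I]
    unit_j = (k + J) % m                  # memory unit holding a[k,J]
    direct = unit_i                       # fixed path: memory unit l feeds unit A (l)
    switched = (unit_j - abs(I - J)) % m  # crosspoint path, rule from step A4
    return direct, switched


pairs = [route_column_pair(k, 1, 2, 5) for k in range(5)]
print(pairs)
```

For every row k the two elements of the pair arrive at the same floating-point unit, one over the fixed path and one over the switch, so each unit can apply the rotation without further communication.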

[0024] Next, with the connection pattern of the crosspoint switch 3 left unchanged, the following processing is performed (step A6 in FIG. 2). Each memory controller determines from I and J which elements to read. As a result, the following data transfers take place.

(1) Memory unit (0)
a[1,4] → floating-point unit A (0): via the data transmission path
a[2,3] → floating-point unit A (4): via the crosspoint switch
(2) Memory unit (1)
a[1,0] → floating-point unit A (1): via the data transmission path
a[2,4] → floating-point unit A (0): via the crosspoint switch
(3) Memory unit (2)
a[1,1] → floating-point unit A (2): via the data transmission path
a[2,0] → floating-point unit A (1): via the crosspoint switch
(4) Memory unit (3)
a[1,2] → floating-point unit A (3): via the data transmission path
a[2,1] → floating-point unit A (2): via the crosspoint switch
(5) Memory unit (4)
a[1,3] → floating-point unit A (4): via the data transmission path
a[2,2] → floating-point unit A (3): via the crosspoint switch

Each floating-point unit that receives these elements performs the following calculation with them and with s and c.

(1) Floating-point unit A (0)
a[1,4] = c × a[1,4] + s × a[2,4]
a[2,4] = -s × a[1,4] + c × a[2,4]
Of the results, a[1,4] is written back to memory unit (0), and a[2,4] is written back to memory unit (1) via the switch.
(2) Floating-point unit A (1)
a[1,0] = c × a[1,0] + s × a[2,0]
a[2,0] = -s × a[1,0] + c × a[2,0]
Of the results, a[1,0] is written back to memory unit (1), and a[2,0] is written back to memory unit (2) via the switch.
(3) Floating-point unit A (2)
a[1,1] = c × a[1,1] + s × a[2,1]
a[2,1] = -s × a[1,1] + c × a[2,1]
Of the results, a[1,1] is written back to memory unit (2), and a[2,1] is written back to memory unit (3) via the switch.
(4) Floating-point unit A (3)
a[1,2] = c × a[1,2] + s × a[2,2]
a[2,2] = -s × a[1,2] + c × a[2,2]
Of the results, a[1,2] is written back to memory unit (3), and a[2,2] is written back to memory unit (4) via the switch.
(5) Floating-point unit A (4)
a[1,3] = c × a[1,3] + s × a[2,3]
a[2,3] = -s × a[1,3] + c × a[2,3]
Of the results, a[1,3] is written back to memory unit (4), and a[2,3] is written back to memory unit (0) via the switch.
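Steps A5 and A6 together apply one Jacobi rotation: the column update followed by the row update. The Python sketch below runs both on a small symmetric matrix, serially rather than in parallel. The function name `rotate` is hypothetical, and the angle formula used in the example is the textbook one, which the text does not spell out.

```python
import math


def rotate(a, I, J, s, c):
    """Apply the column update (step A5) and then the row update (step A6) in place."""
    n = len(a)
    for k in range(n):                      # columns I and J of every row k
        aki, akj = a[k][I], a[k][J]
        a[k][I] = c * aki + s * akj
        a[k][J] = -s * aki + c * akj
    for k in range(n):                      # rows I and J of every column k
        aik, ajk = a[I][k], a[J][k]
        a[I][k] = c * aik + s * ajk
        a[J][k] = -s * aik + c * ajk
    return a


a = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.5],
     [2.0, 0.5, 1.0]]
I, J = 0, 2
# Assumption: textbook Jacobi angle, tan(2*theta) = 2*a[I][J] / (a[I][I] - a[J][J]).
theta = 0.5 * math.atan2(2.0 * a[I][J], a[I][I] - a[J][J])
rotate(a, I, J, math.sin(theta), math.cos(theta))
trace = a[0][0] + a[1][1] + a[2][2]
```

After the call, a[I][J] and a[J][I] are zero to rounding error and the trace is unchanged, as expected of a similarity transformation by a rotation.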

[0025] When the above processing is complete, the processing is repeated from the beginning. This makes it possible to execute fine-grained matrix computations at high speed and low cost.

[0026] Next, the second embodiment of the present invention is described. Consider connecting m 8-bit-wide memory units and m 8-bit-wide floating-point units through an 8-bit-wide m × m crosspoint switch.

[0027] Suppose, for example, that there is one memory board carrying four 8-bit-wide memory units, one floating-point unit board carrying four 8-bit-wide floating-point units, and one crosspoint switch board carrying a 1-bit-wide 32 × 32 crosspoint switch. As shown in FIG. 6, the boards are connected so that the crosspoint switch serves as an 8-bit-wide 4 × 4 crosspoint switch. The boards may be connected through a connection board or through connectors.

[0028] If there are two each of the memory board, floating-point unit board, and crosspoint switch board described above, then, as shown in FIG. 7, each of the two crosspoint switches is used as a 4-bit-wide 8 × 8 crosspoint switch. The 8 input/output bits of each memory unit and each floating-point unit are split into two groups of 4 bits, and each group is connected to a different crosspoint switch board.
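The bit-slicing idea can be illustrated in software. In the Python sketch below, each 8-bit word is split into two 4-bit slices, every slice is routed by its own switch using the same permutation, and the destination reassembles the word. The function name `route_bit_sliced` and the list-based model of a switch are illustrative assumptions, not a description of the actual boards.

```python
def route_bit_sliced(words, perm, width=8, slices=2):
    """Route words[src] to position perm[src], one bit slice per crosspoint switch board."""
    bits = width // slices
    mask = (1 << bits) - 1
    out = [0] * len(words)
    for board in range(slices):            # each board carries `bits` bits of every word
        shift = board * bits
        for src, dst in enumerate(perm):
            out[dst] |= ((words[src] >> shift) & mask) << shift
    return out


# Eight memory units (two boards of four), shifted by one position.
perm = [(l - 1) % 8 for l in range(8)]
words = [0xA0 + l for l in range(8)]
out = route_bit_sliced(words, perm)
```

Because both boards apply the same permutation to their own slice, no signal has to pass from one switch board to the other.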

[0029] This eliminates the need for wiring between crosspoint switch boards and makes the hardware easy to expand. Next, the third embodiment of the present invention is described.

[0030] Because the Jacobi method deals with symmetric matrices, only the upper triangular part of an n × n matrix needs to be kept, as shown in FIG. 8. With the storage method described below, the number of elements per memory unit becomes nearly equal, and the elements used in a calculation do not concentrate on a particular memory unit.

[0031] Here a 5 × 5 matrix is used, with elements a[0,0], a[0,1], ..., a[0,4], a[1,0], a[1,1], ..., a[3,4], a[4,4]. The parallel computer of this embodiment has five memory units and five floating-point units, and a 5 × 5 switch.

[0032] First, the elements of the matrix are distributed among the memory units for storage. Only the upper triangular part needs to be stored, so, as shown in FIG. 9, the elements are stored as follows:

(1) Memory unit (0): a[0,0], a[1,4], a[2,3]
(2) Memory unit (1): a[0,1], a[2,4], a[3,3]
(3) Memory unit (2): a[0,2], a[1,1], a[3,4]
(4) Memory unit (3): a[0,3], a[1,2], a[4,4]
(5) Memory unit (4): a[0,4], a[1,3], a[2,2]

Storing the elements this way prevents accesses from concentrating on the same memory unit.

[0033] This storage method is described with reference to FIG. 10. The matrix is n × n, and each element is identified by its row i and column j (i and j range from 0 to n).

[0034] First, i is set to 0 (step B1 in FIG. 10). Then j is set to i (step B2 in FIG. 10). The first element to be stored is therefore the one in row 0, column 0.

[0035] Next, it is determined whether i + j < n + 2 holds (step B3 in FIG. 10). If it does (Y at step B3 in FIG. 10), the element is stored in memory unit number (i + j) mod m (step B4 in FIG. 10). If it does not (N at step B3 in FIG. 10), the element is stored in memory unit number (i + j - n) mod m (step B5 in FIG. 10).

[0036] The value of j is then incremented (step B6 in FIG. 10). If j has not reached n (N at step B7 in FIG. 10), the processing from step B3 is repeated. If j has reached n (Y at step B7 in FIG. 10), i is incremented (step B8 in FIG. 10).

[0037] If i has not reached n as a result (N at step B9 in FIG. 10), the processing from step B2 is repeated. If i has reached n (Y at step B9 in FIG. 10), the processing ends.
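The flowchart of FIG. 10, as described in steps B1 to B9, translates directly into a double loop. The Python sketch below follows those steps; the function name `distribute_upper` is hypothetical.

```python
def distribute_upper(n, m):
    """Distribute the upper triangle of an n x n matrix over m memory units (steps B1 to B9)."""
    units = [[] for _ in range(m)]
    for i in range(n):                      # steps B1, B8, B9
        for j in range(i, n):               # steps B2, B6, B7
            if i + j < n + 2:               # step B3
                unit = (i + j) % m          # step B4
            else:
                unit = (i + j - n) % m      # step B5
            units[unit].append((i, j))
    return units


units = distribute_upper(5, 5)
for number, held in enumerate(units):
    print(number, held)
```

For n = m = 5 this yields three elements per memory unit, matching the FIG. 9 listing.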

[0038] In this way, only the upper triangular part of the matrix is distributed efficiently among the memory units and stored. Next, the processing of the parallel computer when this storage method is applied is described with reference to FIG. 11.

[0039] First, in this embodiment, the maximum of the matrix elements is found (step C1 in FIG. 11), following the procedure described for the first embodiment. Supposing that a[1,2] is found, the controller 5 compares a[1,2] with a predetermined value e (step C2 in FIG. 11) and ends the processing if a[1,2] is smaller than e (Y at step C2 in FIG. 11).

[0040] First, a[I,I] and a[J,J] are read from the memory units into the controller 5, which calculates theta from a[I,I] and a[J,J] and then calculates s and c (step C3 in FIG. 11). The values s and c are sent over the bus 6 to each of the floating-point units A 2a to 2m. The controller 5 then sends I and J to each memory controller and to the switch controller.

[0041] Here, the controller 5 calculates
aa[I,I] = a[I,I]*c*c + 2*a[I,J]*s*c + a[J,J]*s*s and
aa[J,J] = a[I,I]*s*s - 2*a[I,J]*s*c + a[J,J]*c*c,
and sets aa[I,J] = 0 (step C4 in FIG. 11).
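The three assignments of step C4 can be checked numerically. The sketch below applies them to a 2 × 2 symmetric example; the angle formula is the standard Jacobi choice and is an assumption here, as is the helper name `update_pivot`.

```python
import math

def update_pivot(a_ii, a_jj, a_ij, c, s):
    """Apply the step C4 formulas and return (aa[I,I], aa[J,J], aa[I,J])."""
    aa_ii = a_ii * c * c + 2 * a_ij * s * c + a_jj * s * s
    aa_jj = a_ii * s * s - 2 * a_ij * s * c + a_jj * c * c
    aa_ij = 0.0  # the rotation is chosen so that this element vanishes
    return aa_ii, aa_jj, aa_ij

# Example matrix [[2, 1], [1, 2]]; assumed angle: 0.5 * atan2(2*a_ij, a_ii - a_jj).
theta = 0.5 * math.atan2(2 * 1.0, 2.0 - 2.0)
c, s = math.cos(theta), math.sin(theta)
print(update_pivot(2.0, 2.0, 1.0, c, s))
```

For this example the two diagonal results are approximately 3 and 1, which are the eigenvalues of the matrix, and their sum equals the original trace.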

[0042] In the m × m crosspoint switch 3, input/output terminal l on the side of the memory units 1a to 1m is connected to input/output terminal (l - |i - j|) mod m on the side of the floating-point arithmetic units A 2a to 2m (step C5 in FIG. 11). That is, when I = 1 and J = 2, the switch controller sets the switch so that memory units (0), (1), (2), (3), and (4) are connected to floating-point arithmetic units A (4), (0), (1), (2), and (3), respectively.
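The switch setting of step C5 amounts to a cyclic shift by |I - J|. A minimal sketch, with the hypothetical function name `crosspoint_map` and 0-based terminal numbers as in the example above:

```python
def crosspoint_map(m, I, J):
    """For each memory-side terminal l, return the FPU-side terminal it connects to."""
    shift = abs(I - J)
    # Python's % always returns a non-negative result, so (0 - 1) % 5 == 4.
    return [(l - shift) % m for l in range(m)]

print(crosspoint_map(5, 1, 2))  # [4, 0, 1, 2, 3]
```

The printed list reproduces the I = 1, J = 2 connection pattern described in the text for m = 5.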

[0043] Next, each memory controller determines from I and J the elements to be read. Here, when a[i,j] is to be read and i > j, a[j,i] is read instead. As a result, the following data transfers take place.

(1) Memory unit (0)
a[1,4] → floating-point arithmetic unit A (0): via the data transmission line
a[2,3] → floating-point arithmetic unit A (4): via the crosspoint switch

(2) Memory unit (1)
a[0,1] → floating-point arithmetic unit A (1): via the data transmission line
a[2,4] → floating-point arithmetic unit A (0): via the crosspoint switch

(3) Memory unit (2)
a[1,1] → floating-point arithmetic unit A (2): via the data transmission line
a[0,2] → floating-point arithmetic unit A (1): via the crosspoint switch

(4) Memory unit (3)
a[1,2] → floating-point arithmetic unit A (3): via the data transmission line
a[1,2] → floating-point arithmetic unit A (2): via the crosspoint switch

(5) Memory unit (4)
a[1,3] → floating-point arithmetic unit A (4): via the data transmission line
a[2,2] → floating-point arithmetic unit A (3): via the crosspoint switch

The floating-point arithmetic units that receive these elements then perform the following operations using these elements and s and c (step C6 in FIG. 11).

(1) Floating-point arithmetic unit A (0)
a[1,4] = c*a[1,4] + s*a[2,4]
a[2,4] = -s*a[1,4] + c*a[2,4]
Of the calculated results, a[1,4] is written back to memory unit (0), and a[2,4] is written back to memory unit (1) via the switch.

(2) Floating-point arithmetic unit A (1)
a[0,1] = c*a[0,1] + s*a[0,2]
a[0,2] = -s*a[0,1] + c*a[0,2]
Of the calculated results, a[0,1] is written back to memory unit (1), and a[0,2] is written back to memory unit (2) via the switch.

(3) Floating-point arithmetic unit A (2)
a[1,1] = c*a[1,1] + s*a[1,2]
a[1,2] = -s*a[1,1] + c*a[1,2]
Of the calculated results, a[1,1] is written back to memory unit (2), and a[1,2] is written back to memory unit (3) via the switch.

(4) Floating-point arithmetic unit A (3)
a[1,2] = c*a[1,2] + s*a[2,2]
a[2,2] = -s*a[1,2] + c*a[2,2]
Of the calculated results, a[1,2] is written back to memory unit (3), and a[2,2] is written back to memory unit (4) via the switch.

(5) Floating-point arithmetic unit A (4)
a[1,3] = c*a[1,3] + s*a[2,3]
a[2,3] = -s*a[1,3] + c*a[2,3]
Of the calculated results, a[1,3] is written back to memory unit (4), and a[2,3] is written back to memory unit (0) via the switch.
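The transfer pattern listed above can be reproduced with a short sketch. It rests on assumptions that are inferred from the example rather than stated in it: 0-based indices, a matrix of order n = m, I < J, and element a[i,j] (i <= j) held in memory unit (i + j) mod m. The names `canon`, `home_unit`, and `routing_plan` are hypothetical.

```python
def canon(i, j):
    """Upper-triangle address: a[i, j] with i > j is read as a[j, i]."""
    return (i, j) if i <= j else (j, i)

def home_unit(i, j, m):
    """Assumed storage rule inferred from the example: unit (i + j) mod m."""
    return (i + j) % m

def routing_plan(I, J, m):
    """Map each FPU index to the (direct, switched) elements it receives."""
    shift = abs(I - J)
    plan = {}
    for k in range(m):
        direct = canon(I, k)    # travels over the fixed data transmission line
        switched = canon(J, k)  # travels through the crosspoint switch
        fpu = home_unit(*direct, m)
        # The switch setting (l - |I - J|) mod m must reach the same FPU.
        assert (home_unit(*switched, m) - shift) % m == fpu
        plan[fpu] = (direct, switched)
    return plan

for fpu, pair in sorted(routing_plan(1, 2, 5).items()):
    print(fpu, pair)
```

For I = 1, J = 2, m = 5 each floating-point arithmetic unit receives the same pair of elements as in items (1) to (5) above, for example a[1,4] and a[2,4] at unit A (0).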

[0044] Then, aa[I,I], aa[I,J], and aa[J,J] calculated by the controller 5 are taken as a[I,I], a[I,J], and a[J,J], respectively, and are written back to the memory units in which those elements are stored (step C7 in FIG. 11).

[0045] When the above processing is completed, the processing is repeated from the beginning. This makes it possible to execute fine-grained matrix calculations at high speed and at low cost.
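Steps C1 to C7 together form one Jacobi iteration. The following sequential sketch mirrors that loop in software only; it does not model the memory units, the switch, or the parallel arithmetic units. The pivot is taken as the off-diagonal element of largest absolute value and theta uses the standard Jacobi formula, both of which are assumptions, as are the names `jacobi_eigenvalues`, `eps`, and `max_rotations`.

```python
import math

def jacobi_eigenvalues(a, eps=1e-12, max_rotations=10_000):
    """Eigenvalues of a symmetric matrix (list of lists, n >= 2), sorted ascending."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_rotations):
        # C1: find the off-diagonal element of largest magnitude (assumed pivot rule).
        I, J = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda p: abs(a[p[0]][p[1]]))
        # C2: stop once the pivot is below the threshold e.
        if abs(a[I][J]) < eps:
            break
        # C3: rotation angle (assumed standard formula), then s and c.
        theta = 0.5 * math.atan2(2.0 * a[I][J], a[I][I] - a[J][J])
        c, s = math.cos(theta), math.sin(theta)
        # C4: new pivot-block values, computed from the old ones.
        aa_ii = a[I][I] * c * c + 2 * a[I][J] * s * c + a[J][J] * s * s
        aa_jj = a[I][I] * s * s - 2 * a[I][J] * s * c + a[J][J] * c * c
        # C5-C6: rotate the other elements of rows I and J (kept symmetric).
        for k in range(n):
            if k in (I, J):
                continue
            aik, ajk = a[I][k], a[J][k]  # old values, read before overwriting
            a[I][k] = a[k][I] = c * aik + s * ajk
            a[J][k] = a[k][J] = -s * aik + c * ajk
        # C7: write the pivot-block results back.
        a[I][I], a[J][J] = aa_ii, aa_jj
        a[I][J] = a[J][I] = 0.0
    return sorted(a[i][i] for i in range(n))

print(jacobi_eigenvalues([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]))
```

Reading the old values of a[I][k] and a[J][k] before overwriting them matters, because each updated element depends on both.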

[0046]

[Effects of the Invention] As described in detail above, according to the parallel computer and the matrix storage method of the present invention, fine-grained matrix calculations, such as matrix eigenvalue calculation using the Jacobi method, can be executed at high speed and at low cost.

[Brief description of drawings]

FIG. 1 is a diagram showing the schematic configuration of a parallel computer according to the present invention.

FIG. 2 is a flowchart for explaining eigenvalue calculation of a matrix using the Jacobi method according to the first embodiment.

FIG. 3 is a conceptual diagram showing the matrix processed in the first embodiment.

FIG. 4 is a conceptual diagram showing the storage of each matrix element in the memory units in the first embodiment.

FIG. 5 is a conceptual diagram showing the data transmission of each matrix element in the first embodiment.

FIG. 6 is a diagram showing an example in which one memory board, one switch board, and one floating-point arithmetic unit board are connected in the second embodiment.

FIG. 7 is a diagram showing an example in which two memory boards, two switch boards, and two floating-point arithmetic unit boards are connected in the second embodiment.

FIG. 8 is a conceptual diagram showing the matrix processed in the third embodiment.

FIG. 9 is a conceptual diagram showing the storage of each matrix element in the memory units in the third embodiment.

FIG. 10 is a flowchart for explaining the matrix storage method in the third embodiment.

FIG. 11 is a flowchart for explaining eigenvalue calculation of a matrix using the Jacobi method according to the third embodiment.

[Explanation of symbols]

1a to 1m ... memory units, 2a to 2m ... floating-point arithmetic units A, 3 ... crosspoint switch, 4 ... floating-point arithmetic unit B, 5 ... controller, 6 ... bus.

Continuation of front page: (72) Inventor Tatsuo Shimizu, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa. (72) Inventor Hiroyuki Takano, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa.

Claims

[Claims]

1. A parallel computer for calculating eigenvalues of a matrix, provided with the same number of memory units and floating-point arithmetic units, the parallel computer comprising: data transmission lines that connect the memory units and the floating-point arithmetic units that are associated with each other in advance; a crosspoint switch that connects the memory units and the floating-point arithmetic units in pairs according to an externally input control signal; and means for generating the control signal for driving and controlling the crosspoint switch from the element values of the matrix by a predetermined calculation.

2. The parallel computer according to claim 1, wherein, when n boards each carrying m memory units with a data input/output width of W bits and n boards each carrying m floating-point arithmetic units with a data input/output width of W bits are connected by n Wm × Wm crosspoint switches with a data input/output width of 1 bit, each crosspoint switch is connected to each of the memory units and floating-point arithmetic units with a width of W/n bits, so that the crosspoint switches as a whole form an nm × nm crosspoint switch with a width of W/n bits.

3. A matrix storage method for a parallel computer that calculates eigenvalues of a matrix and is provided with m memory units and m floating-point arithmetic units, wherein, when the element in the i-th row and j-th column of the upper triangular part of an n × n matrix is stored in one of the memory units for the eigenvalue calculation, the element is stored in the ((i + j) mod m)-th memory unit if i + j < n + 2 holds, and in the ((i + j - n) mod m)-th memory unit if i + j > n + 1 holds.