JP2023115660A

JP2023115660A - Information processing apparatus, information processing system, addition and subtraction apparatus, multiplication apparatus, information processing method, and information processing program

Info

Publication number: JP2023115660A
Application number: JP2022018009A
Authority: JP
Inventors: 達史大塚; Tatsufumi Otsuka
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2022-02-08
Filing date: 2022-02-08
Publication date: 2023-08-21

Abstract

To provide an information processing apparatus, an information processing system, an addition and subtraction apparatus, a multiplication apparatus, an information processing method, and an information processing program capable of calculating a large-scale matrix at high speed. SOLUTION: An information processing apparatus 10 comprises: an acquisition unit 10a that acquires the addition and subtraction operation performance of one or more addition and subtraction apparatuses 2j that perform addition and subtraction, and the multiplication operation performance of one or more multiplication apparatuses 3i that perform multiplication; a calculation unit 10b that calculates, as a critical matrix size NC, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition and subtraction a predetermined number of times; and a selection unit 10c that selects the number of divisions n of the matrix to be calculated, using the calculated critical matrix size NC. SELECTED DRAWING: Figure 9

Description

The present invention relates to an information processing device, an information processing system, an addition/subtraction device, a multiplication device, an information processing method, and an information processing program.

Patent Document 1 describes a parallel computer having a plurality of processor elements, a control device, a first communication path connecting them, and a second communication path, separate from the first, that connects adjacent processor elements. The parallel computer of Patent Document 1 performs matrix product calculations without requiring expensive semiconductor devices or complicated networks.

Patent Document 2 describes an information processing device in which an accelerator unit can be attached to and detached from a main body unit. The information processing device of Patent Document 2 sets the drive voltage or drive frequency for driving the main-body-side arithmetic unit and the accelerator-side arithmetic unit according to performance information.

Patent Document 3 describes a multiply-accumulate circuit including multipliers and adders that perform matrix calculations. In the circuit of Patent Document 3, the multipliers multiply, in parallel, partial row vectors obtained by dividing the rows of a matrix A by partial column vectors obtained by dividing the columns of a matrix B, and the adders add the multiplication results.

Patent Document 4 describes a method for processing data signals received over a communication channel, in which a matrix having a plurality of elements representing characteristics of the communication channel is inverted and the inverted matrix is used for the processing.

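Patent Document 1: Japanese Unexamined Patent Application Publication No. H09-062656
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2003-015785
Patent Document 3: Japanese Unexamined Patent Application Publication No. 2009-245381
Patent Document 4: Published Japanese Translation of PCT Application No. 2009-527182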

For example, high-speed calculation of the large-scale matrices used for processing data signals in information processing devices is desired.

In view of the above problem, an object of the present disclosure is to provide an information processing device, an information processing system, an addition/subtraction device, a multiplication device, an information processing method, and an information processing program capable of calculating a large-scale matrix at high speed.

An information processing device according to an embodiment includes: an acquisition unit that acquires the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction, and the multiplication operation performance of one or more multiplication devices that perform multiplication; a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and a selection unit that selects, using the calculated critical matrix size, the number of divisions of the matrix to be calculated.

An information processing system according to an embodiment includes one or more addition/subtraction devices that perform addition and subtraction, one or more multiplication devices that perform multiplication, and an information processing device connected to the addition/subtraction devices and the multiplication devices. The information processing device includes: an acquisition unit that acquires the addition/subtraction operation performance of the addition/subtraction devices and the multiplication operation performance of the multiplication devices; a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and a selection unit that selects, using the calculated critical matrix size, the number of divisions of the matrix to be calculated.

An addition/subtraction device according to an embodiment has addition/subtraction operation performance for performing addition and subtraction, and is connected to an information processing device that includes: an acquisition unit that acquires that addition/subtraction operation performance and the multiplication operation performance of one or more multiplication devices that perform multiplication; a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and a selection unit that selects, using the calculated critical matrix size, the number of divisions of the matrix to be calculated.

A multiplication device according to an embodiment has multiplication operation performance for performing multiplication, and is connected to an information processing device that includes: an acquisition unit that acquires that multiplication operation performance and the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction; a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and a selection unit that selects, using the calculated critical matrix size, the number of divisions of the matrix to be calculated.

An information processing method according to an embodiment includes: a step of acquiring the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction, and the multiplication operation performance of one or more multiplication devices that perform multiplication; a step of calculating, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and a step of selecting, using the calculated critical matrix size, the number of divisions of the matrix to be calculated.

An information processing program according to an embodiment causes a computer to execute: a step of acquiring the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction, and the multiplication operation performance of one or more multiplication devices that perform multiplication; a step of calculating, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and a step of selecting, using the calculated critical matrix size, the number of divisions of the matrix to be calculated.

According to the present disclosure, it is possible to provide an information processing device, an information processing system, an addition/subtraction device, a multiplication device, an information processing method, and an information processing program capable of calculating a large-scale matrix at high speed.

FIG. 1 is a diagram illustrating the normal matrix multiplication formulas and the matrix multiplication formulas of the Strassen algorithm according to Embodiment 1.
FIG. 2 is a diagram illustrating the relationship between the normal matrix multiplication formulas and the matrix multiplication formulas of the Strassen algorithm according to Embodiment 1.
FIG. 3 is a diagram illustrating the calculation of the number of divisions for matrix multiplication by the Strassen algorithm according to Embodiment 1.
FIG. 4 is a configuration diagram illustrating the information processing system according to Embodiment 1.
FIG. 5 is a block diagram illustrating the addition/subtraction management device in the information processing system according to Embodiment 1.
FIG. 6 is a block diagram illustrating the addition/subtraction device in the information processing system according to Embodiment 1.
FIG. 7 is a block diagram illustrating the multiplication management device in the information processing system according to Embodiment 1.
FIG. 8 is a block diagram illustrating the multiplication device in the information processing system according to Embodiment 1.
FIG. 9 is a block diagram illustrating the information processing device in the information processing system according to Embodiment 1.
FIG. 10 is a flowchart illustrating the information processing method according to Embodiment 1.
FIG. 11 is a block diagram illustrating the information processing device according to Embodiment 2.
FIG. 12 is a flowchart illustrating the information processing method according to Embodiment 2.

Embodiments are described below with reference to the drawings. For clarity of explanation, the following description and drawings are omitted and simplified as appropriate. In the drawings, the same elements are denoted by the same reference numerals, and redundant description is omitted as necessary.

(Embodiment 1)
An information processing system and an information processing device according to Embodiment 1 are described. The information processing system and information processing device of this embodiment perform high-speed multiplication of large-scale matrices using an algorithm that divides the matrices. Matrix multiplication is one of the basic linear algebra calculations and occupies an important position in numerical computation in many fields of science and technology. As science and technology advance, the multiplication of large matrices is required more and more often. The Strassen algorithm is one example of an algorithm that speeds up the multiplication of large matrices.

To speed up the multiplication of large-scale matrices by the Strassen algorithm, multiple devices can be used. Matrix calculation is sped up by computing the submatrix additions/subtractions or multiplications in parallel, taking into account the data dependencies between the submatrix operations. As the matrices to be multiplied grow larger, multiple machines become necessary both to meet memory requirements and to parallelize the work.

Industrial applications of the Strassen algorithm include methods for processing data signals, data processing units, and computer program products, as in Patent Document 4. Below, <Strassen algorithm> is described as an example of large-scale matrix multiplication using an algorithm that divides the matrices. Next, <Information processing system> is described, followed by the <Addition/subtraction management device>, <Addition/subtraction device>, <Multiplication management device>, <Multiplication device>, and <Information processing device> that constitute the information processing system. After that, <Information processing method> is described.

<Strassen algorithm>
FIG. 1 is a diagram illustrating the normal matrix multiplication formulas and the matrix multiplication formulas of the Strassen algorithm according to Embodiment 1. FIG. 2 is a diagram illustrating the relationship between the two sets of formulas. As shown in FIGS. 1 and 2, normal matrix multiplication performs eight submatrix multiplications and four submatrix additions/subtractions. Specifically, as shown below, the multiplication of matrix A by matrix B in equation (1) involves the two multiplications contained in each of equations (2) to (5) (eight multiplications in total) and the one addition contained in each of equations (2) to (5) (four additions/subtractions in total).

\begin{align}
C &= AB \tag{1}
\end{align}

\begin{align}
C_{11} &= A_{11}B_{11} + A_{12}B_{21} \tag{2} \\
C_{12} &= A_{11}B_{12} + A_{12}B_{22} \tag{3} \\
C_{21} &= A_{21}B_{11} + A_{22}B_{21} \tag{4} \\
C_{22} &= A_{21}B_{12} + A_{22}B_{22} \tag{5}
\end{align}

In contrast, matrix multiplication by the Strassen algorithm performs the one multiplication contained in each of equations (14) to (20) (seven multiplications in total) and the one addition/subtraction contained in each of equations (6) to (13) and (21) to (27) (15 additions/subtractions in total).

\begin{align}
S_1 &= A_{21} + A_{22} \tag{6} \\
S_2 &= S_1 - A_{11} \tag{7} \\
S_3 &= A_{11} - A_{21} \tag{8} \\
S_4 &= A_{12} - S_2 \tag{9} \\
S_5 &= B_{12} - B_{11} \tag{10} \\
S_6 &= B_{22} - S_5 \tag{11} \\
S_7 &= B_{22} - B_{12} \tag{12} \\
S_8 &= S_6 - B_{21} \tag{13} \\
M_1 &= S_2 S_6 \tag{14} \\
M_2 &= A_{11} B_{11} \tag{15} \\
M_3 &= A_{12} B_{21} \tag{16} \\
M_4 &= S_3 S_7 \tag{17} \\
M_5 &= S_1 S_5 \tag{18} \\
M_6 &= S_4 B_{22} \tag{19} \\
M_7 &= A_{22} S_8 \tag{20} \\
V_1 &= M_1 + M_2 \tag{21} \\
V_2 &= V_1 + M_4 \tag{22} \\
V_3 &= M_5 + M_6 \tag{23} \\
C_{11} &= M_2 + M_3 \tag{24} \\
C_{12} &= V_1 + V_3 \tag{25} \\
C_{21} &= V_2 - M_7 \tag{26} \\
C_{22} &= V_2 + M_5 \tag{27}
\end{align}
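The one-level scheme of equations (6) to (27) can be checked numerically. The following is a minimal sketch in pure Python, not an implementation from this publication: the helper names (`mat_add`, `mat_sub`, `mat_mul`, `split`, `join`, `strassen_one_level`) are illustrative, and the sketch assumes the Winograd form of the intermediate sums, S_4 = A_12 - S_2 and S_8 = S_6 - B_21, which is the form for which equations (24) to (27) reproduce equations (2) to (5).

```python
def mat_add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_mul(X, Y):
    """Standard row-by-column matrix product."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def split(X):
    """Split an even-sized square matrix into quadrants X11, X12, X21, X22."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])

def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    return [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]

def strassen_one_level(A, B):
    """One division step: 7 submatrix products and 15 additions/subtractions."""
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Equations (6)-(13): eight additions/subtractions on the inputs.
    S1 = mat_add(A21, A22)
    S2 = mat_sub(S1, A11)
    S3 = mat_sub(A11, A21)
    S4 = mat_sub(A12, S2)   # assumed Winograd form
    S5 = mat_sub(B12, B11)
    S6 = mat_sub(B22, S5)
    S7 = mat_sub(B22, B12)
    S8 = mat_sub(S6, B21)   # assumed Winograd form
    # Equations (14)-(20): seven submatrix multiplications.
    M1 = mat_mul(S2, S6)
    M2 = mat_mul(A11, B11)
    M3 = mat_mul(A12, B21)
    M4 = mat_mul(S3, S7)
    M5 = mat_mul(S1, S5)
    M6 = mat_mul(S4, B22)
    M7 = mat_mul(A22, S8)
    # Equations (21)-(27): seven additions/subtractions on the products.
    V1 = mat_add(M1, M2)
    V2 = mat_add(V1, M4)
    V3 = mat_add(M5, M6)
    C11 = mat_add(M2, M3)
    C12 = mat_add(V1, V3)
    C21 = mat_sub(V2, M7)
    C22 = mat_add(V2, M5)
    return join(C11, C12, C21, C22)

if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    B = [[2, 0, 1, 3], [1, 4, 0, 2], [3, 1, 2, 0], [0, 2, 5, 1]]
    print(strassen_one_level(A, B) == mat_mul(A, B))
```

Integer entries are used so that the comparison with the standard product is exact; with floating-point entries the two results would agree only up to rounding.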

Therefore, compared with normal matrix multiplication, matrix multiplication by the Strassen algorithm reduces the number of submatrix multiplications from 8 to 7, while increasing the number of submatrix additions/subtractions from 4 to 15. In this way, multiplication by the Strassen algorithm reduces the number of submatrix multiplications.

For a square matrix with N rows and N columns, the number of scalar operations in a matrix multiplication grows on the order of N^3, whereas the number of scalar operations in a matrix addition or subtraction grows on the order of N^2. Therefore, the larger the matrix, the faster the Strassen algorithm becomes relative to normal matrix multiplication.
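The gap between the two growth rates can be made concrete by counting scalar operations. The sketch below assumes the usual textbook counts: a standard N x N product takes N^3 scalar multiplications plus N^2(N - 1) scalar additions, that is (2N - 1)N^2 operations in total, and a matrix addition takes N^2 operations. The function names are illustrative.

```python
def mul_ops(n):
    """Scalar operations in a standard n x n matrix product: (2n - 1) * n^2."""
    return (2 * n - 1) * n * n

def add_ops(n):
    """Scalar operations in an n x n matrix addition or subtraction: n^2."""
    return n * n

if __name__ == "__main__":
    # The ratio mul_ops / add_ops equals 2n - 1, so it grows linearly with n.
    for n in (2, 16, 128, 1024):
        print(n, mul_ops(n), add_ops(n), mul_ops(n) // add_ops(n))
```

Because one multiplication costs 2N - 1 times as much as one addition, saving a single submatrix multiplication eventually outweighs any fixed number of extra additions as N grows.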

The Strassen algorithm is a divide-and-conquer method. It can be applied again to each of the submatrix multiplications M_1 to M_7 shown in FIG. 1 and equations (14) to (20) above. It can likewise be applied again to the multiplications of their submatrices, that is, the grandchild submatrices. Repeating the division in this way enables further speedup. When the division is performed n times, the speedup of the Strassen algorithm over normal matrix multiplication is approximately (8/7)^n, so the speedup increases with the number of divisions n. Here, 8 is the number of submatrix multiplications in normal matrix multiplication, and 7 is the number of submatrix multiplications in the Strassen algorithm.
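As a small illustration of the (8/7)^n estimate, the sketch below counts the leaf-level submatrix products after n divisions for both schemes. It deliberately ignores the cost of the additional additions/subtractions, which is why the figure is only an approximation; the function names are illustrative.

```python
def multiplication_counts(n_divisions):
    """Leaf-level submatrix products after n_divisions recursive divisions,
    as (standard, strassen)."""
    return 8 ** n_divisions, 7 ** n_divisions

def approx_speedup(n_divisions):
    """Approximate speedup (8/7)^n, ignoring additions and subtractions."""
    return (8 / 7) ** n_divisions

if __name__ == "__main__":
    for n in range(1, 6):
        standard, strassen = multiplication_counts(n)
        print(n, standard, strassen, round(approx_speedup(n), 3))
```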

However, as the division is repeated, the submatrices become smaller. For small submatrices, the benefit of the Strassen algorithm, which reduces the number of submatrix multiplications at the cost of more additions and subtractions, disappears. Therefore, the number of divisions cannot be increased without limit.

To further speed up the multiplication of large-scale matrices by the Strassen algorithm, multiple machines (also called devices) can be used. Matrix calculation can be sped up by computing the submatrix additions/subtractions or multiplications in parallel on multiple machines. In that case, it is preferable to take into account the data dependencies between the submatrix operations shown in FIGS. 1 and 2 and to run each operation in parallel on machines suited to it.
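One way to read the data dependencies is to group the 22 operations of equations (6) to (27) into stages whose members depend only on earlier stages and can therefore run in parallel. The sketch below is illustrative only and is not the scheduling method of this publication: the `DEPS` table transcribes the operands of each equation, assuming the Winograd-form sums S_4 = A_12 - S_2 and S_8 = S_6 - B_21, and the function names are hypothetical.

```python
INPUTS = {"A11", "A12", "A21", "A22", "B11", "B12", "B21", "B22"}

# Operands of each operation in equations (6)-(27).
DEPS = {
    "S1": ("A21", "A22"), "S2": ("S1", "A11"), "S3": ("A11", "A21"),
    "S4": ("A12", "S2"),  "S5": ("B12", "B11"), "S6": ("B22", "S5"),
    "S7": ("B22", "B12"), "S8": ("S6", "B21"),
    "M1": ("S2", "S6"), "M2": ("A11", "B11"), "M3": ("A12", "B21"),
    "M4": ("S3", "S7"), "M5": ("S1", "S5"), "M6": ("S4", "B22"),
    "M7": ("A22", "S8"),
    "V1": ("M1", "M2"), "V2": ("V1", "M4"), "V3": ("M5", "M6"),
    "C11": ("M2", "M3"), "C12": ("V1", "V3"),
    "C21": ("V2", "M7"), "C22": ("V2", "M5"),
}

def levels(deps):
    """Map each operation to its stage: 0 if it needs only input blocks,
    otherwise one more than the latest stage among its operands."""
    memo = {}

    def level(name):
        if name in INPUTS:
            return -1
        if name not in memo:
            memo[name] = 1 + max(level(d) for d in deps[name])
        return memo[name]

    for name in deps:
        level(name)
    return memo

def stages(deps):
    """List of stages; operations within one stage are mutually independent."""
    by_level = {}
    for name, lv in levels(deps).items():
        by_level.setdefault(lv, []).append(name)
    return [sorted(by_level[lv]) for lv in sorted(by_level)]

if __name__ == "__main__":
    for i, ops in enumerate(stages(DEPS)):
        print(i, ops)
```

Under these assumptions the operations fall into six stages, so even with unlimited machines the one-level scheme has a critical path of six sequential steps.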

In addition, as the matrices to be multiplied grow larger, the demand for memory capacity rises, which also makes parallel calculation on multiple machines necessary. However, no method has existed so far for predicting the optimal number of divisions, the speedup mechanism specific to the Strassen algorithm, when multiple machines are used in this way.

To select the optimal number of divisions, it is necessary to find the submatrix size at which the benefit of the Strassen algorithm disappears, that is, the critical matrix size at which the speedup of multiplication by the Strassen algorithm over normal multiplication becomes 1.0. The critical matrix size can be found, for example, by actual measurement. When multiple machines are used, however, this is very laborious. In particular, when the machines available differ each time depending on machine availability, the critical matrix size differs each time as well. For example, when many machines have high addition/subtraction performance and few have high multiplication performance, the number of divisions must be increased to reduce the cost of multiplication. In the opposite case, the number of divisions must be kept low to limit the increase in the cost of addition and subtraction.

This embodiment determines the optimal number of divisions when multiple machines with different operation performance are used for large-scale matrix multiplication, for example by the Strassen algorithm. This allows large-scale matrices to be calculated at high speed.

As described above, as the division is repeated, the submatrices gradually become smaller, and the benefit of the Strassen algorithm, which reduces the number of submatrix multiplications at the cost of more additions and subtractions, disappears. First, the matrix size at which this benefit just disappears, that is, the critical matrix size N_C above which matrix multiplication by the Strassen algorithm is faster than standard matrix multiplication, is calculated from the operation performance of the machines to be used. The method of selecting the critical matrix size N_C is described concretely below.

Compared with standard matrix multiplication, matrix multiplication by the Strassen algorithm requires one fewer submatrix multiplication and 11 more additions/subtractions. The critical matrix size N_C is therefore selected as the matrix size at which the processing time for one submatrix multiplication equals the processing time for 11 submatrix additions/subtractions.

Therefore, to calculate the processing times for addition/subtraction and multiplication, the operation performance of the arithmetic devices used for addition/subtraction and multiplication is first acquired. At that time, how the arithmetic devices are used in parallel is set for addition/subtraction and for multiplication respectively, along with how many additions/subtractions or multiplications each arithmetic device computes sequentially. An arithmetic device that performs addition and subtraction is called an addition/subtraction device, and an arithmetic device that performs multiplication is called a multiplication device.

Based on these settings, the processing time for one submatrix multiplication and the processing time for 11 submatrix additions/subtractions are expressed as functions of the matrix size N. The size at which the two processing times are equal is then derived as the critical matrix size N_C. The formula used for the calculation is equation (28) below.
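The body of equation (28) is not reproduced in this text. One form consistent with the description of its two sides given in the following paragraphs is shown here as a reconstruction. It assumes that the factor 11 appears explicitly on the right side and that the combined performance of each device group is the sum of R/x over its devices:

\begin{align}
\frac{(2N_C - 1)\,N_C^{2}}{\sum_i R_{mi}/x_i} &= \frac{11\,N_C^{2}}{\sum_j R_{sj}/x_j} \tag{28}
\end{align}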

where:
R_mi is the multiplication operation performance of each multiplication device.
R_sj is the addition/subtraction operation performance of each addition/subtraction device.
x_i and x_j are the numbers of sequential operations of each multiplication device and each addition/subtraction device, respectively.

Compared with normal multiplication, multiplication by the Strassen algorithm reduces the number of submatrix multiplications from 8 to 7, while the number of submatrix additions/subtractions increases by 11. The left side of equation (28) is the computation time saved by reducing the number of square submatrix multiplications from 8 to 7. The right side is the computation time added by performing 11 more additions/subtractions on matrices of the same size. On the left side, (2N_C - 1)N_C^2 is the number of scalar operations in the multiplication of square matrices with N_C rows and N_C columns. On the right side, N_C^2 is the number of scalar operations in one addition or subtraction. On both sides, these counts are divided by the operation performance (operation speed) of the arithmetic devices that carry them out.

When the arithmetic devices used for matrix multiplication or addition/subtraction process in parallel, their operation performances are summed. When one arithmetic device performs matrix multiplication or addition/subtraction sequentially x times, its effective operation performance becomes 1/x of its original value. Equation (28) expresses that, at matrix size N_C, matrix multiplication by the Strassen algorithm and normal matrix multiplication run at the same speed.

Next, using N_C calculated by equation (28), the number of divisions that gives the highest speed is selected by the method shown in FIG. 3. FIG. 3 is a diagram illustrating the calculation of the number of divisions for matrix multiplication by the Strassen algorithm according to Embodiment 1. As shown in FIG. 3, when the initial matrix has N rows and N columns, the submatrix size after n divisions is N/2^n. The optimal number of divisions is the n for which N/2^n is greater than or equal to N_C and closest to N_C. In this way, given the operation performance data of the arithmetic devices to be used, and optionally a decision on which submatrix multiplications or additions/subtractions of the Strassen algorithm each device is assigned to, the optimal number of divisions for that combination of arithmetic devices can be selected.
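The two steps, deriving N_C from device performance and then choosing n, can be sketched as follows. This is a hedged illustration rather than the publication's implementation. It assumes that equation (28) balances (2N_C - 1)N_C^2 scalar operations against 11 N_C^2 scalar operations, each divided by the combined performance of the respective device group, which gives N_C = (11 P_m / P_s + 1) / 2 with P_m and P_s the combined multiplication and addition/subtraction performance. All function and parameter names are hypothetical.

```python
def effective_performance(units):
    """Combined performance of a device group.

    units: iterable of (performance, sequential_count) pairs.
    Devices working in parallel add up; a device that performs x operations
    one after another contributes performance / x.
    """
    return sum(r / x for r, x in units)

def critical_matrix_size(mul_units, add_units, extra_additions=11):
    """Solve (2*Nc - 1) / P_m == extra_additions / P_s for Nc (assumed form)."""
    p_mul = effective_performance(mul_units)
    p_add = effective_performance(add_units)
    return (extra_additions * p_mul / p_add + 1) / 2

def select_divisions(n_size, n_critical):
    """Largest n such that n_size / 2**n >= n_critical (0 if no division helps)."""
    n = 0
    while n_size / 2 ** (n + 1) >= n_critical:
        n += 1
    return n

if __name__ == "__main__":
    # Hypothetical figures: two multiplication devices, three addition devices.
    mul_units = [(4.0e9, 1), (4.0e9, 1)]
    add_units = [(1.0e9, 1), (1.0e9, 1), (1.0e9, 2)]
    n_c = critical_matrix_size(mul_units, add_units)
    print(n_c, select_divisions(16384, n_c))
```

Choosing the largest n with N/2^n >= N_C is the same as choosing the submatrix size that is at least N_C and closest to it, since the size halves with every division. Faster multiplication devices raise N_C and so reduce the number of divisions, while faster addition/subtraction devices lower N_C and increase it.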

<Information processing system>
Next, an information processing system that selects the number of divisions for large-scale matrix multiplication is described. FIG. 4 is a configuration diagram illustrating the information processing system according to Embodiment 1. As shown in FIG. 4, the information processing system 1 includes an information processing device 10, an addition/subtraction management device 20, one or more addition/subtraction devices 21 to 2j, a multiplication management device 30, and one or more multiplication devices 31 to 3i. The one or more addition/subtraction devices 21 to 2j are collectively referred to as addition/subtraction devices 2j. The one or more multiplication devices 31 to 3i are collectively referred to as multiplication devices 3i. The information processing device 10, the addition/subtraction management device 20, the addition/subtraction devices 2j, the multiplication management device 30, and the multiplication devices 3i function as information processing means, addition/subtraction management means, addition/subtraction means, multiplication management means, and multiplication means, respectively.

The information processing device 10 is connected to the addition/subtraction management device 20 by a communication line capable of transmitting information. The addition/subtraction management device 20 is connected to the one or more addition/subtraction devices 2j by a communication line capable of transmitting information. The information processing device 10 is thus connected to the one or more addition/subtraction devices 2j via the addition/subtraction management device 20. Likewise, the information processing device 10 is connected to the multiplication management device 30 by a communication line capable of transmitting information, and the multiplication management device 30 is connected to the one or more multiplication devices 3i by a communication line capable of transmitting information. The information processing device 10 is thus connected to the one or more multiplication devices 3i via the multiplication management device 30. Each component is described below.

<Addition/subtraction management device>
FIG. 5 is a block diagram illustrating the addition/subtraction management device 20 in the information processing system 1 according to the first embodiment. As shown in FIG. 5, the addition/subtraction management device 20 includes an acquisition unit 20a, a storage unit 20b, and a transmission unit 20c. The acquisition unit 20a, the storage unit 20b, and the transmission unit 20c function as acquisition means, storage means, and transmission means, respectively.

The acquisition unit 20a acquires, from each addition/subtraction device 2j, the result of measuring that device's addition/subtraction performance in advance. The storage unit 20b stores an addition/subtraction list file LF20 that lists the addition/subtraction performance of each addition/subtraction device 2j. The list file LF20 records, for each addition/subtraction device 2j, its number, the operation assigned to it, the type of operation (multiplication or addition/subtraction), and its operation performance. The transmission unit 20c transmits the addition/subtraction performance and the operation results of each addition/subtraction device 2j to the information processing device 10.
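The list file described above can be pictured as a small table of per-device records. The following is a minimal sketch only: the class name `ListEntry`, its field names, the string labels for the operation type, and the split of M_1 to M_7 between devices 31 and 32 are assumptions made for illustration and are not taken from the specification. The performance figures reuse the values assumed in Specific example I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListEntry:
    """One row of a list file such as LF20 or LF30 (field names are hypothetical)."""
    device_number: int       # reference number of the device, e.g. 21 or 31
    assigned_operation: str  # operation(s) the device is in charge of
    operation_type: str      # "addsub" or "multiply" (labels are assumptions)
    flops: float             # measured operation performance in FLOPS


def performances(entries, operation_type):
    """Return {device_number: flops} for all entries of the given operation type."""
    return {e.device_number: e.flops
            for e in entries if e.operation_type == operation_type}


# Illustrative contents only; the assignment of products to devices is assumed.
LF20 = [ListEntry(21, "S_1..S_8, V_1..V_3, C_11..C_22", "addsub", 3.75e10)]
LF30 = [ListEntry(31, "M_1..M_4", "multiply", 7.8e12),
        ListEntry(32, "M_5..M_7", "multiply", 7.8e12)]

print(performances(LF20, "addsub"))
print(performances(LF30, "multiply"))
```

A flat record per device keeps the acquisition step simple: the information processing device only needs to filter by operation type to obtain the performance values it uses later.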

<Addition/subtraction device>
FIG. 6 is a block diagram illustrating the addition/subtraction device 2j in the information processing system 1 according to the first embodiment. As shown in FIG. 6, the addition/subtraction device 2j includes an acquisition unit 2ja, a calculation unit 2jb, and a transmission unit 2jc. The acquisition unit 2ja, the calculation unit 2jb, and the transmission unit 2jc function as acquisition means, calculation means, and transmission means, respectively.

The acquisition unit 2ja acquires the submatrices to be operated on from the information processing device 10 via the addition/subtraction management device 20. The calculation unit 2jb performs addition/subtraction of the submatrices. The transmission unit 2jc transmits the addition/subtraction performance and the addition/subtraction results to the information processing device 10 via the addition/subtraction management device 20.

The addition/subtraction device 2j preferably has high addition/subtraction performance. As an example, a single addition/subtraction device 21 may be connected to the addition/subtraction management device 20. In this case, the addition/subtraction device 21 is assigned the calculations of S_1 to S_8, V_1 to V_3, and C_11 to C_22 shown in FIG. 2 and in equations (6) to (13) and (21) to (27). The number of addition/subtraction devices 2j connected to the addition/subtraction management device 20 is not limited to one and may be more than one. The addition/subtraction device 2j may also acquire the submatrices to be operated on directly from the information processing device 10 without going through the addition/subtraction management device 20. Likewise, the addition/subtraction device 2j may transmit its addition/subtraction performance and addition/subtraction results directly to the information processing device 10 without going through the addition/subtraction management device 20.

<Multiplication management device>
FIG. 7 is a block diagram illustrating the multiplication management device 30 in the information processing system 1 according to the first embodiment. As shown in FIG. 7, the multiplication management device 30 includes an acquisition unit 30a, a storage unit 30b, and a transmission unit 30c. The acquisition unit 30a, the storage unit 30b, and the transmission unit 30c function as acquisition means, storage means, and transmission means, respectively.

The acquisition unit 30a acquires, from each multiplication device 3i, the result of measuring that device's multiplication performance in advance. The storage unit 30b stores a multiplication list file LF30 that lists the multiplication performance of each multiplication device 3i. The list file LF30 records, for each multiplication device 3i, its number, the operation assigned to it, the type of operation (multiplication or addition/subtraction), and its operation performance. The transmission unit 30c transmits the multiplication performance and the operation results of each multiplication device 3i to the information processing device 10.

<Multiplication device>
FIG. 8 is a block diagram illustrating the multiplication device 3i in the information processing system 1 according to the first embodiment. As shown in FIG. 8, the multiplication device 3i includes an acquisition unit 3ia, a calculation unit 3ib, and a transmission unit 3ic. The acquisition unit 3ia, the calculation unit 3ib, and the transmission unit 3ic function as acquisition means, calculation means, and transmission means, respectively.

The acquisition unit 3ia acquires the submatrices to be operated on from the information processing device 10 via the multiplication management device 30. The calculation unit 3ib performs multiplication of the submatrices. The transmission unit 3ic transmits the multiplication performance and the multiplication results to the information processing device 10 via the multiplication management device 30.

The multiplication device 3i preferably has high multiplication performance. As an example, seven multiplication devices 31 to 37 may be connected to the multiplication management device 30. In this case, the multiplication devices 31 to 37 are assigned the calculations of M_1 to M_7 shown in FIG. 2. The number of multiplication devices 3i connected to the multiplication management device 30 is not limited to seven. The multiplication device 3i may also acquire the submatrices to be operated on directly from the information processing device 10 without going through the multiplication management device 30. Likewise, the multiplication device 3i may transmit its multiplication performance and multiplication results directly to the information processing device 10 without going through the multiplication management device 30.

<Information processing device>
FIG. 9 is a block diagram illustrating the information processing device 10 in the information processing system 1 according to the first embodiment. As shown in FIG. 9, the information processing device 10 includes an acquisition unit 10a, a calculation unit 10b, and a selection unit 10c. The acquisition unit 10a, the calculation unit 10b, and the selection unit 10c function as acquisition means, calculation means, and selection means, respectively.

The acquisition unit 10a acquires the addition/subtraction performance of the one or more addition/subtraction devices 2j that perform addition/subtraction and the multiplication performance of the one or more multiplication devices 3i that perform multiplication. Specifically, the acquisition unit 10a acquires the operation performance of the addition/subtraction device 2j and the multiplication device 3i from the list file LF20 of the addition/subtraction management device 20 and the list file LF30 of the multiplication management device 30. The acquisition unit 10a may instead acquire the operation performance directly from the addition/subtraction device 2j and the multiplication device 3i.

The calculation unit 10b calculates, as the critical matrix size N_C, the matrix size at which the processing time for one submatrix multiplication is substantially equal to the processing time for performing submatrix addition/subtraction a predetermined number of times. Specifically, the calculation unit 10b calculates the critical matrix size N_C from equation (28) above, using the acquired addition/subtraction performance of the addition/subtraction device 2j and the acquired multiplication performance of the multiplication device 3i.

The selection unit 10c selects the number of divisions of the matrix to be calculated from the calculated critical matrix size N_C. For example, where n is the number of divisions of the N-row, N-column matrix to be calculated, the selection unit 10c selects n such that N/(2^n) is greater than or equal to the critical matrix size N_C and is closest to N_C.
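The selection rule can be sketched in a few lines. Because N/(2^n) only shrinks as n grows, the n for which N/(2^n) is at or above N_C and closest to it is simply the largest n that still satisfies N/(2^n) >= N_C. The function name `select_divisions` and the choice to return 0 when even the undivided matrix is below N_C are assumptions of this sketch. The sample values are the N = 8000 matrix and the critical sizes 263 and 866 quoted in Specific examples I and II.

```python
def select_divisions(matrix_size: int, critical_size: float) -> int:
    """Return the largest n such that matrix_size / 2**n >= critical_size.

    Returns 0 when no division keeps the submatrix size at or above
    critical_size (an assumption of this sketch).
    """
    if matrix_size <= 0 or critical_size <= 0:
        raise ValueError("matrix_size and critical_size must be positive")
    n = 0
    # Divide once more only while the next submatrix size stays >= N_C.
    while matrix_size / 2 ** (n + 1) >= critical_size:
        n += 1
    return n


print(select_divisions(8000, 263))  # 8000/16 = 500 >= 263, 8000/32 = 250 < 263
print(select_divisions(8000, 866))  # 8000/8 = 1000 >= 866, 8000/16 = 500 < 866
```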

The selection unit 10c may set an upper limit when selecting the number of divisions n. When the matrix is divided all the way down to the critical matrix size N_C, the calculations performed by arithmetic devices such as the addition/subtraction device 2j and the multiplication device 3i may include calculations whose speed improvement rate is 1.0. In that case, the addition/subtraction device 2j and the multiplication device 3i perform calculations that bring no benefit. A threshold may therefore be set on the speed improvement rate so that the arithmetic devices do not perform unnecessary calculations. For example, the threshold may be set to 1.01, and the number of divisions may be left unchanged when a further division would yield a speed improvement rate at or below that value. In this way, the selection unit 10c may select the number of divisions n based on the speed improvement rate.
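One way to read the threshold rule is as a cap applied on top of the division count obtained from N_C. The sketch below assumes that an estimate of the speed improvement rate gained by each additional division is available through a caller-supplied function; the text does not say how that estimate is obtained, so `gain_at`, `cap_divisions`, and the sample gains are all hypothetical.

```python
from typing import Callable


def cap_divisions(n_selected: int,
                  gain_at: Callable[[int], float],
                  threshold: float = 1.01) -> int:
    """Limit the division count using a speed improvement threshold.

    gain_at(k) is a hypothetical estimate of the speed improvement rate
    obtained by performing the k-th division. Dividing stops as soon as
    the next division would yield a rate at or below the threshold.
    """
    n = 0
    while n < n_selected and gain_at(n + 1) > threshold:
        n += 1
    return n


# Made-up gains for illustration: the third division barely helps.
sample_gains = {1: 1.14, 2: 1.10, 3: 1.005, 4: 1.20}
print(cap_divisions(4, lambda k: sample_gains[k]))
```

With these sample gains the count stops at two, even though the N_C rule alone would have allowed four divisions.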

The information processing device 10, the addition/subtraction management device 20, the addition/subtraction device 2j, the multiplication management device 30, and the multiplication device 3i described above are, for example, information processing devices that include a computer, such as a server device or a personal computer. Each of these devices has a control unit, a communication unit, a storage unit, and an interface unit. The control unit, the communication unit, the storage unit, and the interface unit function as control means, communication means, storage means, and interface means, respectively.

The control unit includes a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ECU (Electronic Control Unit), an FPGA (Field-Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit). The control unit functions as an arithmetic device that performs control processing, arithmetic processing, and the like. The control unit also controls the operation of the communication unit, the storage unit, the interface unit, and each component that implements the functions of the device.

Each component of each device can be realized, for example, by executing a program under the control of the control unit. More specifically, each component can be realized by the control unit executing a program stored in the storage unit. Each component may also be realized by recording the necessary program on any non-volatile recording medium and installing it as needed. Furthermore, each component is not limited to being realized in software by a program, and may be realized by any combination of hardware, firmware, and software.

The communication unit performs the communication each device needs for information processing. The storage unit is, for example, a ROM (Read Only Memory) or a RAM (Random Access Memory). The storage unit stores the control programs, arithmetic programs, and the like executed by the control unit. The storage unit also temporarily stores processing data and the like.

The interface unit is, for example, a user interface. The interface unit has an input device such as a keyboard, a touch panel, or a mouse, and an output device such as a display or a speaker. The interface unit accepts data input operations from a user (an operator or the like) and outputs information to the user.

<Specific example I>
Specific examples I and II of numerical calculations that derive the optimum number of divisions n are given below. Specific example I is described first. Consider two types of multiplication devices, 31 and 32, as devices for matrix multiplication. Assume that the operation speeds of the multiplication devices 31 and 32 are given by equations (29) and (30) below.

R_m1 = 7.8 × 10^12 FLOPS (29)
R_m2 = 7.8 × 10^12 FLOPS (30)

Consider one type of addition/subtraction device, 21, as a device for matrix addition/subtraction. Assume that the operation speed of the addition/subtraction device 21 is given by equation (31) below.

R_s1 = 3.75 × 10^10 FLOPS (31)

These values are based on the operation speeds of existing devices (for example, the NVIDIA V100 and P100 GPUs). Consider applying the Strassen algorithm to the multiplication of square matrices with 8000 rows and 8000 columns.

The multiplication devices 31 and 32 are used for submatrix multiplication, and the addition/subtraction device 21 is used for submatrix addition/subtraction. Standard matrix multiplication involves eight submatrix multiplications. For the seven submatrix multiplications of the Strassen algorithm, the multiplication devices 31 and 32 run in parallel, sequentially computing four and three multiplications, respectively. The critical matrix size N_C is then obtained from equation (32) below.

(8/(R_m1/4 + R_m2/4) − 7/(R_m1/4 + R_m2/3)) (2N_C − 1) = 11/R_s1 (32)

From the above equation, the critical matrix size N_C = 263 is obtained. Since 8000 × (1/2^4) = 500 > N_C, whereas a fifth division would give 250 < N_C, the N = 8000 matrix can be divided four times while the submatrix size stays at or above the critical matrix size N_C. The optimum number of divisions is therefore four. The speed-up rate is approximately given by equation (33) below.

(8/7)^4 = 1.71 (33)

Note that this speed-up rate considers only the operation speed of the arithmetic devices used. It ignores other factors, such as the time required to allocate and free memory and to transfer data between devices.
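The approximate speed-up follows from counting submatrix multiplications: each division replaces eight multiplications with seven, so n divisions give a ratio of 8^n to 7^n. A minimal sketch of that arithmetic is shown below. The function name `approximate_speedup` is an assumption, and, as noted above, the figure ignores memory handling and data transfer.

```python
def approximate_speedup(n: int) -> float:
    """Ratio of standard (8**n) to Strassen (7**n) submatrix multiplications."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return (8 / 7) ** n


for n in range(1, 5):
    print(n, round(approximate_speedup(n), 2))
```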

<Specific example II>
Next, specific example II will be described. Consider the multiplication devices 31 to 34 as devices for matrix multiplication. Assume that the operation speed of the multiplication devices 31 and 32 is R_m1 and that of the multiplication devices 33 and 34 is R_m2. Also assume that the operation speed of the addition/subtraction device 21, the device for matrix addition/subtraction, is R_s1. Consider applying the Strassen algorithm to the multiplication of square matrices with 8000 rows and 8000 columns.

The multiplication devices 31 to 34 are used for submatrix multiplication, and the addition/subtraction device 21 is used for submatrix addition/subtraction. The critical matrix size N_C is obtained from equation (34) below.

(8/(R_m1/2 + R_m1/2 + R_m2/2 + R_m2/2) − 7/(R_m1/2 + R_m1/2 + R_m2/2 + R_m2/2)) (2N_C − 1) = 11/R_s1 (34)

From the above equation, the critical matrix size N_C = 866 is obtained. Since 8000 × (1/2^3) = 1000 > N_C, whereas a fourth division would give 500 < N_C, the N = 8000 matrix can be divided three times while the submatrix size stays at or above the critical matrix size N_C. The optimum number of divisions is therefore three. The speed-up rate is approximately given by equation (35) below.

(8/7)^3 = 1.49 (35)

Although more multiplication devices 3i are used than in specific example I, the speed-up rate is lower. This is because the standard multiplication used as the baseline also uses the same larger number of multiplication devices 3i. The calculation time itself is shorter in specific example II than in specific example I.

<Information processing method>
Next, an information processing method using the information processing device 10 will be described. FIG. 10 is a flowchart illustrating an information processing method according to the first embodiment. As shown in FIG. 10, the information processing method includes an acquisition step STEP11, a calculation step STEP12, and a selection step STEP13.

First, in the acquisition step STEP11, the addition/subtraction performance of the one or more addition/subtraction devices 2j that perform addition/subtraction and the multiplication performance of the one or more multiplication devices 3i that perform multiplication are acquired. Specifically, the acquisition unit 10a of the information processing device 10 is caused to acquire the operation performance of the addition/subtraction device 2j and the multiplication device 3i from the list file LF20 of the addition/subtraction management device 20 and the list file LF30 of the multiplication management device 30.

Next, in the calculation step STEP12, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times is calculated as the critical matrix size N_C. For example, with the predetermined number of times set to 11, the calculation unit 10b may be caused to calculate the critical matrix size N_C from equation (28) above.

Next, in the selection step STEP13, the number of divisions of the matrix to be calculated is selected using the calculated critical matrix size N_C. Specifically, the selection unit 10c may be caused to select the number of divisions n of the N-row, N-column matrix to be calculated such that N/(2^n) is greater than or equal to the critical matrix size N_C and is closest to N_C. The selection unit 10c may further be caused to select the number of divisions n based on the speed improvement rate. In this way, information processing for calculating large-scale matrices at high speed is performed.

Next, the effects of this embodiment will be described. The information processing device 10 of this embodiment uses the addition/subtraction device 2j and the multiplication device 3i for the multiplication of large-scale matrices by the Strassen algorithm. In doing so, it determines the number of divisions n of the Strassen algorithm based on the operation performance of each device. This allows the addition/subtraction device 2j and the multiplication device 3i to be used in the state best suited to the multiplication of large-scale matrices, so that large-scale matrices can be calculated at high speed. For example, in the multiplication of a large-scale matrix that requires a plurality of arithmetic devices, the optimum number of divisions n can be adopted according to the availability of the arithmetic devices in charge. Large-scale matrix multiplication can therefore be sped up, and the efficiency of numerical calculation can be improved.

Note that the algorithm for dividing a matrix into submatrices is not limited to the Strassen algorithm. Any other algorithm may be used as long as dividing into submatrices changes the numbers of additions/subtractions and multiplications relative to ordinary matrix multiplication. In that case, the critical matrix size N_C may be calculated, from the relationship between the change in the number of additions/subtractions and the change in the number of multiplications relative to ordinary matrix multiplication, as the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times.

(Embodiment 2)
Next, Embodiment 2 will be described. FIG. 11 is a block diagram illustrating an information processing device according to the second embodiment. As shown in FIG. 11, the information processing device 40 of this embodiment further includes a division unit 10d, a transfer unit 10e, a determination unit 10f, and an integration unit 10g. The division unit 10d, the transfer unit 10e, the determination unit 10f, and the integration unit 10g function as division means, transfer means, determination means, and integration means, respectively.

The division unit 10d divides the matrix into submatrices. The transfer unit 10e transfers the divided submatrices to the multiplication device 3i and the addition/subtraction device 2j. The determination unit 10f determines whether to divide again. The integration unit 10g integrates the calculation results of the multiplication device 3i and the addition/subtraction device 2j.

Next, an information processing method using the information processing device 10 will be described. FIG. 12 is a flowchart illustrating an information processing method according to the second embodiment. As shown in FIG. 12, the information processing method of this embodiment includes, in addition to an acquisition step STEP21, a calculation step STEP22, and a selection step STEP23, a division step STEP24, a transfer step STEP25, an operation result acquisition step STEP26, a determination step STEP27, an operation result acquisition step STEP28, and an integration step STEP29.

First, in the acquisition step STEP21, as in the acquisition step STEP11, the acquisition unit 10a is caused to acquire the addition/subtraction performance of the addition/subtraction device 2j and the multiplication performance of the multiplication device 3i.

Next, in the calculation step STEP22, as in the calculation step STEP12, the calculation unit 10b is caused to calculate, as the critical matrix size N_C, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times.

Next, in the selection step STEP23, as in the selection step STEP13, the selection unit 10c is caused to select the number of divisions n of the matrix to be calculated using the calculated critical matrix size N_C. As described above, a threshold may be set on the speed improvement rate to place an upper limit on the number of divisions.

Next, in the division step STEP24, the division unit 10d is caused to divide the matrix to be calculated into submatrices. The division unit 10d may be caused to store the number of divisions performed so far.

Next, in the transfer step STEP25, the transfer unit 10e is caused to transfer the divided submatrices to the addition/subtraction device 2j and the multiplication device 3i. Each arithmetic device that receives the submatrices calculates S_1 to S_8, which prepare the further divisible products M_1 to M_7.

Next, in the operation result acquisition step STEP26, the acquisition unit 10a is caused to acquire the operation results.

Next, in the determination step STEP27, the determination unit 10f is caused to determine whether to divide again. The determination unit 10f makes this determination by comparing the selected number of divisions n with the stored number of divisions. If the determination unit 10f determines that division is to be performed again (YES), the process returns to step STEP24, and steps STEP24 to STEP27 are repeated.

If the determination unit 10f determines in the determination step STEP27 that division is not to be performed again (NO), the acquisition unit 10a is caused to acquire the operation results in step STEP28. These operation results are the results of the calculations of M_1 to M_7, V_1 to V_3, and C_11 to C_22 performed by the arithmetic devices.

Next, in the integration step STEP29, the integration unit 10g is caused to integrate the calculation results of the addition/subtraction device 2j and the multiplication device 3i. Specifically, the integration unit 10g is caused to integrate the calculated submatrix results C_11 to C_22. In this way, information processing for calculating large-scale matrices at high speed is performed.
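The divide, compute, and integrate flow can be illustrated with a depth-limited Strassen multiplication. The sketch below is not the procedure of FIG. 12: it uses the textbook Strassen formulas with the additions written inline rather than the S_1 to S_8 and V_1 to V_3 intermediates of equations (6) to (27), and it runs everything in one process instead of dispatching work to separate devices. All function names are assumptions, and falling back to standard multiplication for odd sizes is a simplification of this sketch.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def standard_multiply(X, Y):
    """Ordinary row-by-column matrix product."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(X):
    """Divide a square matrix of even size into four quadrants."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])


def integrate(C11, C12, C21, C22):
    """Reassemble the four result quadrants into one matrix."""
    top = [a + b for a, b in zip(C11, C12)]
    bottom = [a + b for a, b in zip(C21, C22)]
    return top + bottom


def strassen(A, B, n):
    """Multiply square matrices A and B, dividing at most n more times."""
    if n == 0 or len(A) % 2 != 0:
        return standard_multiply(A, B)  # no further division
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven submatrix products (work for the multiplication devices).
    M1 = strassen(add(A11, A22), add(B11, B22), n - 1)
    M2 = strassen(add(A21, A22), B11, n - 1)
    M3 = strassen(A11, sub(B12, B22), n - 1)
    M4 = strassen(A22, sub(B21, B11), n - 1)
    M5 = strassen(add(A11, A12), B22, n - 1)
    M6 = strassen(sub(A21, A11), add(B11, B12), n - 1)
    M7 = strassen(sub(A12, A22), add(B21, B22), n - 1)
    # Result quadrants (work for the addition/subtraction devices).
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return integrate(C11, C12, C21, C22)


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 3], [1, 4, 0, 2], [3, 1, 2, 0], [0, 2, 5, 1]]
print(strassen(A, B, 2) == standard_multiply(A, B))
```

Passing the selected number of divisions as the depth argument mirrors the role of the determination step: recursion stops once the chosen count is reached, and the remaining products are computed by standard multiplication.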

Next, the effects of this embodiment will be described. In this embodiment, the matrix to be calculated is divided into submatrices using the number of divisions n selected by the information processing device 40. The divided submatrices are then calculated by the addition/subtraction device 2j and the multiplication device 3i. Large-scale matrices can therefore be calculated at high speed. Other configurations and effects are included in the description of Embodiment 1.

The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit. For example, a combination of the configurations of Embodiments 1 and 2 is also included within the scope of the technical idea of the embodiments. An information processing program that causes a computer to execute the information processing method is also included within the scope of the technical idea of the embodiments.

Some or all of the above embodiments can also be described as in the following appendices, but are not limited to them.

(Appendix A1)
An information processing system comprising:
one or more addition/subtraction devices that perform addition and subtraction;
one or more multiplication devices that perform multiplication; and
an information processing device connected to the addition/subtraction devices and the multiplication devices,
wherein the information processing device includes:
an acquisition unit that acquires the addition/subtraction operation performance of the addition/subtraction devices and the multiplication operation performance of the multiplication devices;
a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
a selection unit that selects, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.
(Appendix A2)
The information processing system according to Appendix A1, wherein, when the predetermined number of times is 11, the calculation unit calculates the critical matrix size N_C according to the following formula (A):

where
R_mi denotes the multiplication operation performance of each multiplication device,
R_sj denotes the addition/subtraction operation performance of each addition/subtraction device, and
x_i and x_j denote the number of sequential operations of each multiplication device and each addition/subtraction device, respectively.
(Appendix A3)
The information processing system according to Appendix A1 or A2, wherein the selection unit uses the calculated critical matrix size N_C to select the number of divisions n of the N-row, N-column matrix to be calculated from among the values of n for which N/(2^n) is equal to or greater than N_C and closest to N_C.
(Appendix A4)
The information processing system according to Appendix A3, wherein the selection unit selects n based on a speed improvement rate.
(Appendix A5)
The information processing system according to any one of Appendices A1 to A4, wherein the information processing device further includes:
a dividing unit that divides the matrix into the submatrices;
a transfer unit that transfers the divided submatrices to the addition/subtraction devices and the multiplication devices; and
an integration unit that integrates the calculation results of the addition/subtraction devices and the multiplication devices.
(Appendix B1)
An addition/subtraction device that is connected to an information processing device and has addition/subtraction operation performance for performing addition and subtraction, the information processing device including:
an acquisition unit that acquires the addition/subtraction operation performance and the multiplication operation performance of one or more multiplication devices that perform multiplication;
a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
a selection unit that selects, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.
(Appendix B2)
The addition/subtraction device according to Appendix B1, wherein, when the predetermined number of times is 11, the calculation unit calculates the critical matrix size N_C according to the following formula (A):

where
R_mi denotes the multiplication operation performance of each multiplication device,
R_sj denotes the addition/subtraction operation performance of each addition/subtraction device, and
x_i and x_j denote the number of sequential operations of each multiplication device and each addition/subtraction device, respectively.
(Appendix B3)
The addition/subtraction device according to Appendix B1 or B2, wherein the selection unit uses the calculated critical matrix size N_C to select the number of divisions n of the N-row, N-column matrix to be calculated from among the values of n for which N/(2^n) is equal to or greater than N_C and closest to N_C.
(Appendix B4)
The addition/subtraction device according to Appendix B3, wherein the selection unit selects n based on a speed improvement rate.
(Appendix B5)
The addition/subtraction device according to any one of Appendices B1 to B4, wherein the information processing device further includes:
a dividing unit that divides the matrix into the submatrices;
a transfer unit that transfers the divided submatrices to the addition/subtraction device and the multiplication devices; and
an integration unit that integrates the calculation results of the addition/subtraction device and the multiplication devices.
(Appendix C1)
A multiplication device that is connected to an information processing device and has multiplication operation performance for performing multiplication, the information processing device including:
an acquisition unit that acquires the multiplication operation performance and the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction;
a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
a selection unit that selects, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.
(Appendix C2)
The multiplication device according to Appendix C1, wherein, when the predetermined number of times is 11, the calculation unit calculates the critical matrix size N_C according to the following formula (A):

where
R_mi denotes the multiplication operation performance of each multiplication device,
R_sj denotes the addition/subtraction operation performance of each addition/subtraction device, and
x_i and x_j denote the number of sequential operations of each multiplication device and each addition/subtraction device, respectively.
(Appendix C3)
The multiplication device according to Appendix C1 or C2, wherein the selection unit uses the calculated critical matrix size N_C to select the number of divisions n of the N-row, N-column matrix to be calculated from among the values of n for which N/(2^n) is equal to or greater than N_C and closest to N_C.
(Appendix C4)
The multiplication device according to Appendix C3, wherein the selection unit selects n based on a speed improvement rate.
(Appendix C5)
The multiplication device according to any one of Appendices C1 to C4, wherein the information processing device further includes:
a dividing unit that divides the matrix into the submatrices;
a transfer unit that transfers the divided submatrices to the addition/subtraction devices and the multiplication device; and
an integration unit that integrates the calculation results of the addition/subtraction devices and the multiplication device.
(Appendix D1)
An information processing method comprising:
acquiring the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction and the multiplication operation performance of one or more multiplication devices that perform multiplication;
calculating, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
selecting, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.
(Appendix D2)
The information processing method according to Appendix D1, wherein, in the calculating step, when the predetermined number of times is 11, the critical matrix size N_C is calculated according to the following formula (A):

where
R_mi denotes the multiplication operation performance of each multiplication device,
R_sj denotes the addition/subtraction operation performance of each addition/subtraction device, and
x_i and x_j denote the number of sequential operations of each multiplication device and each addition/subtraction device, respectively.
(Appendix D3)
The information processing method according to Appendix D1 or D2, wherein, in the selecting step, the calculated critical matrix size N_C is used to select the number of divisions n of the N-row, N-column matrix to be calculated from among the values of n for which N/(2^n) is equal to or greater than N_C and closest to N_C.
(Appendix D4)
The information processing method according to Appendix D3, wherein, in the selecting step, n is selected based on a speed improvement rate.
(Appendix D5)
The information processing method according to any one of Appendices D1 to D4, further comprising:
dividing the matrix into the submatrices;
transferring the divided submatrices to the addition/subtraction devices and the multiplication devices; and
integrating the calculation results of the addition/subtraction devices and the multiplication devices.
(Appendix E1)
An information processing program that causes a computer to execute:
acquiring the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction and the multiplication operation performance of one or more multiplication devices that perform multiplication;
calculating, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
selecting, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.
(Appendix E2)
The information processing program according to Appendix E1, wherein, in the calculating step, when the predetermined number of times is 11, the critical matrix size N_C is calculated according to the following formula (A):

where
R_mi denotes the multiplication operation performance of each multiplication device,
R_sj denotes the addition/subtraction operation performance of each addition/subtraction device, and
x_i and x_j denote the number of sequential operations of each multiplication device and each addition/subtraction device, respectively.
(Appendix E3)
The information processing program according to Appendix E1 or E2, wherein, in the selecting step, the calculated critical matrix size N_C is used to select the number of divisions n of the N-row, N-column matrix to be calculated from among the values of n for which N/(2^n) is equal to or greater than N_C and closest to N_C.
(Appendix E4)
The information processing program according to Appendix E3, wherein, in the selecting step, n is selected based on a speed improvement rate.
(Appendix E5)
The information processing program according to any one of Appendices E1 to E2, further causing the computer to execute:
dividing the matrix into the submatrices;
transferring the divided submatrices to the addition/subtraction devices and the multiplication devices; and
integrating the calculation results of the addition/subtraction devices and the multiplication devices.
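The definition of the critical matrix size used throughout the appendices (the size at which one submatrix multiplication takes as long as a predetermined number of submatrix additions/subtractions) can be illustrated numerically. The sketch below is not formula (A), which is not reproduced in this text. It assumes the caller supplies two hypothetical timing models, `mult_time(N)` and `addsub_time(N)`, and that multiplication time grows faster with N than addition/subtraction time, so that a single crossover exists. The function name, the search bound `n_max`, and the example cost models are all assumptions.

```python
def critical_matrix_size(mult_time, addsub_time, count=11, n_max=1 << 20):
    """Smallest integer N in [1, n_max] with mult_time(N) >= count * addsub_time(N).

    mult_time and addsub_time are caller-supplied (hypothetical) models of the
    time for one N-by-N multiplication and one N-by-N addition/subtraction.
    """
    if mult_time(n_max) < count * addsub_time(n_max):
        raise ValueError("no crossover found up to n_max")
    lo, hi = 1, n_max
    # Bisection on the integer size; valid when the condition, once true,
    # stays true for all larger N.
    while lo < hi:
        mid = (lo + hi) // 2
        if mult_time(mid) >= count * addsub_time(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Illustrative single-device cost models (assumptions, not from the document):
# about 2*N**3 operations per multiplication and N**2 per addition/subtraction.
R_M = 3.0e12  # assumed multiplication performance, operations per second
R_S = 1.0e12  # assumed addition/subtraction performance, operations per second
n_c = critical_matrix_size(lambda n: 2 * n**3 / R_M, lambda n: n**2 / R_S)
print(n_c)
```

Under these assumed models the crossover condition reduces to N >= 11 * R_M / (2 * R_S) = 16.5, so the search returns 17. A model that accounts for several devices and their sequential operation counts would give a different value.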

The program includes instructions (or software code) that, when read into a computer, cause the computer to perform one or more of the functions described in the embodiments. The program may be stored in a non-transitory computer-readable medium or a tangible storage medium. By way of example, and not limitation, computer-readable media or tangible storage media include random-access memory (RAM), read-only memory (ROM), flash memory, solid-state drives (SSD) or other memory technology, CD-ROM, digital versatile disc (DVD), Blu-ray (registered trademark) disc or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. The program may also be transmitted on a transitory computer-readable medium or a communication medium. By way of example, and not limitation, transitory computer-readable media or communication media include electrical, optical, acoustic, or other forms of propagated signals.

1 information processing system
10, 40 information processing device
10a acquisition unit
10b calculation unit
10c selection unit
10d dividing unit
10e transfer unit
10f determination unit
10g integration unit
20 addition/subtraction management device
20a acquisition unit
20b storage unit
20c transmission unit
21, 2j addition/subtraction device
2ja acquisition unit
2jb operation unit
2jc transmission unit
30 multiplication management device
30a acquisition unit
30b storage unit
30c transmission unit
31, 3i multiplication device
3ja acquisition unit
3jb operation unit
3jc transmission unit
LF20 addition/subtraction list file
LF30 multiplication list file

Claims

An information processing device comprising:
an acquisition unit that acquires the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction and the multiplication operation performance of one or more multiplication devices that perform multiplication;
a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
a selection unit that selects, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.

The information processing device according to claim 1, wherein, when the predetermined number of times is 11, the calculation unit calculates the critical matrix size N_C according to the following formula (A):

where
R_mi denotes the multiplication operation performance of each multiplication device,
R_sj denotes the addition/subtraction operation performance of each addition/subtraction device, and
x_i and x_j denote the number of sequential operations of each multiplication device and each addition/subtraction device, respectively.

The information processing device according to claim 1 or 2, wherein the selection unit uses the calculated critical matrix size N_C to select the number of divisions n of the N-row, N-column matrix to be calculated from among the values of n for which N/(2^n) is equal to or greater than N_C and closest to N_C.

The information processing device according to claim 3, wherein the selection unit selects n based on a speed improvement rate.

The information processing device according to any one of claims 1 to 4, further comprising:
a dividing unit that divides the matrix into the submatrices;
a transfer unit that transfers the divided submatrices to the addition/subtraction devices and the multiplication devices; and
an integration unit that integrates the calculation results of the addition/subtraction devices and the multiplication devices.

An information processing system comprising:
one or more addition/subtraction devices that perform addition and subtraction;
one or more multiplication devices that perform multiplication; and
an information processing device connected to the addition/subtraction devices and the multiplication devices,
wherein the information processing device includes:
an acquisition unit that acquires the addition/subtraction operation performance of the addition/subtraction devices and the multiplication operation performance of the multiplication devices;
a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
a selection unit that selects, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.

An addition/subtraction device that is connected to an information processing device and has addition/subtraction operation performance for performing addition and subtraction, the information processing device including:
an acquisition unit that acquires the addition/subtraction operation performance and the multiplication operation performance of one or more multiplication devices that perform multiplication;
a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
a selection unit that selects, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.

A multiplication device that is connected to an information processing device and has multiplication operation performance for performing multiplication, the information processing device including:
an acquisition unit that acquires the multiplication operation performance and the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction;
a calculation unit that calculates, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
a selection unit that selects, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.

An information processing method comprising:
acquiring the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction and the multiplication operation performance of one or more multiplication devices that perform multiplication;
calculating, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
selecting, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.

An information processing program that causes a computer to execute:
acquiring the addition/subtraction operation performance of one or more addition/subtraction devices that perform addition and subtraction and the multiplication operation performance of one or more multiplication devices that perform multiplication;
calculating, as a critical matrix size, the matrix size at which the processing time for one submatrix multiplication equals the processing time for performing submatrix addition/subtraction a predetermined number of times; and
selecting, using the calculated critical matrix size, the number of divisions of a matrix to be calculated.