JP2000076224A

JP2000076224A - Vector operation method

Info

Publication number: JP2000076224A
Application number: JP10260835A
Authority: JP
Inventors: Masaru Hashimoto; 賢橋本
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-08-31
Filing date: 1998-08-31
Publication date: 2000-03-14
Anticipated expiration: 2018-08-31
Also published as: JP3289685B2

Abstract

PROBLEM TO BE SOLVED: To reduce the number of copy operations between memory and vector registers and the number of vector operation instructions executed, and thereby improve processing speed, by copying arrays that belong to the same set into a vector register in a single operation and processing them with the same vector operation instruction. SOLUTION: When an addition expression that assigns the element-wise sum of arrays E and F to array D follows an addition expression that assigns the element-wise sum of arrays B and C to array A, the elements of arrays B and E are copied from a contiguous area of memory into vector register B (Vb) (S1). The elements of arrays C and F are copied from a contiguous area of memory into vector register C (Vc) (S2). The vector operation instruction Va=Vb+Vc is executed (S3). The elements of vector register A (Va) are copied to a contiguous area of memory (S4). Processing that conventionally required eight steps thus requires only four.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method of performing operations on arrays by vector operations on a computer having a vector operation function, and more particularly to a vector operation method that executes a plurality of vector operations of the same kind in a single batch.

[0002]

[Prior Art] A computer having a vector operation function includes several vector registers and a vector operation unit for each kind of operation, such as addition and multiplication, and can execute at high speed the repetitive operations on arrays that appear frequently in scientific and engineering computation.

[0003] For example, the array addition

A(:)=B(:)+C(:) ...(1)

(where the elements of arrays A, B, and C range from 1 to n) can be executed with vector operations in the following four steps.
1. Copy each element of array B from memory into vector register B (Vb).
2. Copy each element of array C from memory into vector register C (Vc).
3. Execute the vector operation instruction Va=Vb+Vc.
4. Copy each element of vector register A (Va) to array A in memory.
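The four steps can be sketched in Python as a toy model. This is only an illustration: the flat-list memory, the helper names `vload`, `vadd`, and `vstore`, and the base addresses are assumptions of the sketch, not anything defined in the patent.

```python
# Toy model (an assumption of this sketch): memory is a flat Python list
# and a vector register is an ordinary list of elements.

def vload(memory, base, length):
    """Copy `length` contiguous memory elements starting at `base` into a register."""
    return memory[base:base + length]

def vadd(vb, vc):
    """Element-wise addition, standing in for the instruction Va = Vb + Vc."""
    return [x + y for x, y in zip(vb, vc)]

def vstore(memory, base, register):
    """Copy a register back to contiguous memory starting at `base`."""
    memory[base:base + len(register)] = register

n = 4
A_BASE, B_BASE, C_BASE = 0, n, 2 * n        # hypothetical addresses of A, B, C
memory = [0] * n + [1, 2, 3, 4] + [10, 20, 30, 40]

vb = vload(memory, B_BASE, n)               # step 1: B -> Vb
vc = vload(memory, C_BASE, n)               # step 2: C -> Vc
va = vadd(vb, vc)                           # step 3: Va = Vb + Vc
vstore(memory, A_BASE, va)                  # step 4: Va -> A

result = memory[A_BASE:A_BASE + n]
print(result)
```

Each of steps 1, 2, and 4 is a single slice operation here, mirroring the idea that one copy instruction moves a whole contiguous run of elements.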

[0004] In steps 1, 2, and 4, a copy instruction is used that copies, in a single operation, between n array elements laid out contiguously in memory and a vector register.

[0005] In fields such as scientific and engineering computation, vector operations that involve the same number of arrays and the same kind of operation are sometimes executed in succession. In this case too, the same procedure as for expression (1) has conventionally been repeated for each operation. Several examples in which vector operations of the same kind follow one another are shown below, together with the conventional method of computing them.

[0006] Example 1

A(:)=B(:)+C(:) ...(2-1)
D(:)=E(:)+F(:) ...(2-2)

(where the elements of arrays A, B, and C range from 1 to n, and the elements of arrays D, E, and F range from 1 to m)

In this example, addition expression (2-1), which assigns the element-wise sum of arrays B and C to array A, is followed by addition expression (2-2), which assigns the element-wise sum of arrays E and F to array D.

[0007] FIG. 7 shows arrays A to F as allocated in memory at execution time. As shown in the figure, each of the arrays A to F is allocated to a contiguous area, E1 to E6, of memory.

[0008] FIG. 8 is a flowchart showing the conventional procedure for executing the two expressions (2-1) and (2-2) with vector operations. It consists of the following steps.
Step S21: Copy each element of array B from memory into vector register B (Vb).
Step S22: Copy each element of array C from memory into vector register C (Vc).
Step S23: Execute the vector operation instruction Va=Vb+Vc.
Step S24: Copy each element of vector register A (Va) to array A in memory.
Step S25: Copy each element of array E from memory into vector register B (Vb).
Step S26: Copy each element of array F from memory into vector register C (Vc).
Step S27: Execute the vector operation instruction Va=Vb+Vc.
Step S28: Copy each element of vector register A (Va) to array D in memory.
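As a rough illustration of the conventional eight-step sequence, the following Python sketch evaluates the two expressions one after the other and records one trace entry per step. The function name `run_conventional`, the list-based registers, and the sample values are hypothetical and serve only to make the step count visible.

```python
def vadd(vb, vc):
    """Element-wise addition, standing in for Va = Vb + Vc."""
    return [x + y for x, y in zip(vb, vc)]

def run_conventional(b, c, e, f):
    """Evaluate A = B + C and then D = E + F, one expression at a time."""
    trace = []
    vb = list(b)
    trace.append("S21 copy B to Vb")
    vc = list(c)
    trace.append("S22 copy C to Vc")
    va = vadd(vb, vc)
    trace.append("S23 Va = Vb + Vc")
    a = list(va)
    trace.append("S24 copy Va to A")
    vb = list(e)
    trace.append("S25 copy E to Vb")
    vc = list(f)
    trace.append("S26 copy F to Vc")
    va = vadd(vb, vc)
    trace.append("S27 Va = Vb + Vc")
    d = list(va)
    trace.append("S28 copy Va to D")
    return a, d, trace

# n = 3 elements in A, B, C and m = 2 elements in D, E, F (sample values).
a, d, trace = run_conventional([1, 2, 3], [10, 20, 30], [4, 5], [40, 50])
print(len(trace), a, d)
```

The registers Vb, Vc, and Va are reused for the second expression, so the number of steps grows linearly with the number of expressions.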

[0009] Example 2

DO I=1,l₁,l₂
A(:,I)=B(:,I)+C(:,I) ...(3-1)
D(:,I)=E(:,I)+F(:,I) ...(3-2)
ENDDO

(where, for the two-dimensional arrays A, B, and C, the first-dimension subscript ranges from 1 to n and the second-dimension subscript from 1 to l₁; for the two-dimensional arrays D, E, and F, the first-dimension subscript ranges from 1 to m and the second-dimension subscript from 1 to l₁; and l₁≥l₂≥1)

This DO loop executes expression (3-1), which assigns the sums of the elements of arrays B and C whose second-dimension subscript is 1, 1+l₂, 1+2×l₂, ... to the elements of array A with the same second-dimension subscripts, and expression (3-2), which assigns the sums of the elements of arrays E and F whose second-dimension subscript is 1, 1+l₂, 1+2×l₂, ... to the elements of array D with the same second-dimension subscripts.

[0010] FIG. 9 shows arrays A to F as allocated in memory at execution time. As shown in the figure, each of the arrays A to F is allocated to a contiguous area, E1 to E6, of memory.

[0011] FIG. 10 is a flowchart showing the conventional procedure for executing the above DO loop with vector operations. It consists of the following steps.
Step S31: Assign 1 to the loop control variable I.
Step S32: If I>l₁, end processing; otherwise branch to step S33.
Step S33: Copy elements B(1,I) to B(n,I) of array B from memory into vector register B (Vb).
Step S34: Copy elements C(1,I) to C(n,I) of array C from memory into vector register C (Vc).
Step S35: Execute the vector operation instruction Va=Vb+Vc.
Step S36: Copy the contents of vector register A (Va) to elements A(1,I) to A(n,I) of array A in memory.
Step S37: Copy elements E(1,I) to E(m,I) of array E from memory into vector register B (Vb).
Step S38: Copy elements F(1,I) to F(m,I) of array F from memory into vector register C (Vc).
Step S39: Execute the vector operation instruction Va=Vb+Vc.
Step S3A: Copy the contents of vector register A (Va) to elements D(1,I) to D(m,I) of array D in memory.
Step S3B: Add l₂ to I and return to step S32.
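The loop of steps S31 to S3B can be sketched as follows. This is a hedged illustration only: the column-list representation of the two-dimensional arrays, the function name `run_conventional_loop`, and the sample sizes are assumptions of the sketch.

```python
def vadd(vb, vc):
    """Element-wise addition, standing in for Va = Vb + Vc."""
    return [x + y for x, y in zip(vb, vc)]

def run_conventional_loop(B, C, E, F, l1, l2):
    """Each array is a list of columns; column I (1-based) is X[I - 1]."""
    A = [None] * l1
    D = [None] * l1
    vector_steps = 0
    I = 1                                   # S31
    while I <= l1:                          # S32: finish once I > l1
        vb = list(B[I - 1])                 # S33
        vc = list(C[I - 1])                 # S34
        va = vadd(vb, vc)                   # S35
        A[I - 1] = list(va)                 # S36
        vb = list(E[I - 1])                 # S37
        vc = list(F[I - 1])                 # S38
        va = vadd(vb, vc)                   # S39
        D[I - 1] = list(va)                 # S3A
        vector_steps += 8                   # eight copy/operation steps per pass
        I += l2                             # S3B
    return A, D, vector_steps

l1, l2 = 5, 2                               # the loop visits I = 1, 3, 5
B = [[i, i] for i in range(1, l1 + 1)]      # n = 2 elements per column
C = [[10 * i, 10 * i] for i in range(1, l1 + 1)]
E = [[i] for i in range(1, l1 + 1)]         # m = 1 element per column
F = [[100 * i] for i in range(1, l1 + 1)]

A, D, vector_steps = run_conventional_loop(B, C, E, F, l1, l2)
print(vector_steps)
```

Columns that the stride l2 skips are left as `None`, which makes it easy to see which second-dimension subscripts the loop touched.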

[0012] Computers having a vector operation function are explained in detail in, for example, Part 3, Chapter 6, "Vector Computers", of the "New Edition Information Processing Handbook" (新版情報処理ハンドブック; Ohmsha, Ltd., published November 25, 1995).

[0013]

[Problems to be Solved by the Invention] As described above, when a plurality of calculation expressions with the same number of arrays and the same kind of operation are executed with vector operations, the conventional approach repeats, for each expression, the copying of arrays from memory into vector registers, the execution of a vector operation instruction, and the copying of the vector register contents back to memory. Consequently, particularly for operations on arrays with few elements, the time taken to copy the arrays from memory into the vector registers before the vector operation instruction executes, and the time taken to copy the values it produces from the vector register back to the arrays in memory, become large relative to the time required by the vector operation instruction itself, so that the speed of vector operation cannot be fully exploited.

[0014] Accordingly, an object of the present invention is to improve processing speed by reducing the number of copies between memory and vector registers, and the number of vector operation instruction executions, that are required when a plurality of calculation expressions with the same number of arrays and the same kind of operation are executed with vector operations.

[0015]

[Means for Solving the Problems] The present invention is based on two observations. First, certain sets of calculation expressions can, when executed with vector operations, be processed together by a single set of vector registers. These include (a) a plurality of calculation expressions, each independently computable, in which all the arrays are one-dimensional, the number of arrays and the kind of operation are the same in every expression, and the sum of the element counts of the arrays appearing on the left sides, and likewise of the arrays appearing at the same position (the same term) on the right sides, is no greater than the size of the vector registers of the computer; and (b) a plurality of calculation expressions in a DO loop, each independently computable, in which all the arrays are multidimensional arrays of the same number of dimensions, the number of arrays and the kind of operation are the same in every expression, and the sum of the element counts per loop iteration of the arrays appearing on the left sides, and likewise of the arrays appearing at the same position on the right sides, is no greater than the size of the vector registers of the computer. Second, if the array elements of the expressions that can be handled by the same vector register are laid out contiguously in memory, they can be copied into the vector register, or conversely back to memory, by a single copy instruction.

The invention is accordingly characterized by including: a first step of grouping, for such a plurality of calculation expressions, the arrays appearing on the left sides into a set and the arrays appearing at the same position on the right sides into sets, and allocating the arrays of each set to a contiguous area of memory; a second step of assigning one input vector register to each set of arrays appearing at the same position on the right sides of the expressions, and copying the arrays of each set from the contiguous area of memory into the input vector register by a copy instruction; and a third step of executing the operation on the arrays copied into the input vector registers with the vector operation unit corresponding to the kind of operation in the expressions, and storing the result in an output vector register.

[0016] The invention is further characterized by including a fourth step of copying, by a copy instruction, the operation result stored in the output vector register to the contiguous area of memory allocated to the set of arrays appearing on the left sides of the plurality of calculation expressions.

[0017]

[Embodiments of the Invention] An example embodiment of the present invention is now described in detail with reference to the drawings.

[0018] FIG. 1 is a block diagram showing an example of a computer that carries out the vector operation method of the present invention. The computer in this example comprises a CPU 1, a memory (main storage) 2, a loader 3, storage devices 4 and 5, and a compiler 6.

[0019] The CPU 1 is a CPU having a vector operation function and includes a vector register set 11, a vector adder 12, a vector multiplier 13, a data transfer unit 14, and an instruction interpretation unit 15.

[0020] The vector register set 11 comprises a plurality of vector registers A (Va), B (Vb), C (Vc), and so on. The size (vector length) of each vector register is L.

[0021] The vector adder 12 adds the data supplied from any two vector registers in the vector register set 11 and stores the sum in another vector register in the set. The vector multiplier 13 multiplies the data supplied from any two vector registers in the vector register set 11 and stores the product in another vector register in the set.

[0022] The data transfer unit 14 is a means that, in accordance with a copy instruction, copies data held in a contiguous area of the memory 2 into any vector register in the vector register set 11 or, conversely, copies the contents of any vector register to a contiguous area of the memory 2.

[0023] The instruction interpretation unit 15 is a means that fetches instruction codes such as vector operation instructions and copy instructions from the memory 2, decodes them, and controls the data transfer unit 14, the vector register set 11, the vector adder 12, and the vector multiplier 13.

[0024] The CPU 1 also has a scalar operation function in addition to the vector operation function, but this is omitted from the figure because it is not directly related to the operation of the present invention.

[0025] The memory 2 consists of an instruction code section 21, which stores instruction codes, and a data section 22, which stores data such as arrays. The instruction code section 21 stores an instruction code sequence 211 optimized according to the present invention, and the data section 22 stores a data sequence 221 laid out optimally according to the present invention.

[0026] The storage device 5 is a file on a magnetic disk device and stores a source program 51 in which a plurality of calculation expressions on arrays are written.

[0027] The compiler 6 reads the source program 51 from the storage device 5, performs syntax analysis, semantic analysis, code generation, and so on, and generates an object program 41. When calculation expressions on arrays appear consecutively in the source program 51, the compiler determines whether the optimization of the present invention can be applied to those expressions and, if it can, generates an optimized object program 41.

[0028] The conditions under which the optimization of the present invention can be applied to a plurality of calculation expressions are as follows.
(1) Each expression can be computed independently. That is, there is no dependency between them, such as a later expression referencing an array defined by an earlier expression.
(2) All the expressions contain the same number of arrays and use the same operation.
(3) When all the arrays in the expressions are one-dimensional, the sum of the element counts of the arrays appearing on the left sides of the expressions, and the sum of the element counts of the arrays appearing at the same position on the right sides, are each no greater than the size of a vector register.
(4) When all the arrays in the expressions are multidimensional arrays of the same number of dimensions and the expressions are contained in the same DO loop, the sum of the element counts per loop iteration of the arrays appearing on the left sides of the expressions, and of the arrays appearing at the same position on the right sides, are each no greater than the size of a vector register.
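Conditions (1) to (3) can be sketched as a small Python check for the one-dimensional case. The dictionary representation of an expression, the function name `can_batch`, and the particular independence test (no array written by one expression is read or written by another) are assumptions made for illustration; a real compiler's dependence analysis would be more involved.

```python
def can_batch(exprs, sizes, L):
    """Return True if the expressions meet conditions (1)-(3) for register size L.

    Each expression is a dict such as {"lhs": "A", "rhs": ["B", "C"], "op": "+"}.
    """
    # (1) Independence: conservatively require that an array written by one
    #     expression is neither read nor written by any other expression.
    for i, expr in enumerate(exprs):
        for j, other in enumerate(exprs):
            if i != j and expr["lhs"] in [other["lhs"], *other["rhs"]]:
                return False
    # (2) Same number of arrays and same kind of operation.
    first = exprs[0]
    for expr in exprs:
        if len(expr["rhs"]) != len(first["rhs"]) or expr["op"] != first["op"]:
            return False
    # (3) Left sides together, and each right-side position together, fit in L.
    groups = [[expr["lhs"] for expr in exprs]]
    for k in range(len(first["rhs"])):
        groups.append([expr["rhs"][k] for expr in exprs])
    return all(sum(sizes[name] for name in group) <= L for group in groups)

sizes = {"A": 3, "B": 3, "C": 3, "D": 2, "E": 2, "F": 2}   # n = 3, m = 2
exprs = [
    {"lhs": "A", "rhs": ["B", "C"], "op": "+"},
    {"lhs": "D", "rhs": ["E", "F"], "op": "+"},
]
dependent = [
    {"lhs": "A", "rhs": ["B", "C"], "op": "+"},
    {"lhs": "D", "rhs": ["A", "F"], "op": "+"},   # reads A, defined just above
]
print(can_batch(exprs, sizes, 8), can_batch(exprs, sizes, 4))
```

Condition (4) would apply the same size test to the elements touched in one loop iteration rather than to whole arrays.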

[0029] For a plurality of calculation expressions that satisfy the above conditions, the compiler 6 performs the following optimization.

[0030] First, as the data allocation method, the arrays appearing on the left sides of the expressions are grouped into a set, the arrays appearing at the same position on the right sides are grouped into sets, and a data allocation method is adopted in which the arrays of each set are allocated to a contiguous area of the memory 2.
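A minimal sketch of this grouping and allocation, assuming that each array appears in exactly one set and that areas are simply packed one after another from address 0. The function name `allocate` and the expression dictionaries are hypothetical.

```python
def allocate(exprs, sizes):
    """Group arrays by position and give each group a contiguous address range."""
    positions = len(exprs[0]["rhs"])
    groups = [[expr["lhs"] for expr in exprs]]                 # left-side arrays
    for k in range(positions):
        groups.append([expr["rhs"][k] for expr in exprs])      # k-th right-side term
    base, address = {}, 0
    for group in groups:
        for name in group:          # members of one group are placed back to back
            base[name] = address
            address += sizes[name]
    return groups, base

sizes = {"A": 3, "B": 3, "C": 3, "D": 2, "E": 2, "F": 2}       # n = 3, m = 2
exprs = [
    {"lhs": "A", "rhs": ["B", "C"]},
    {"lhs": "D", "rhs": ["E", "F"]},
]
groups, base = allocate(exprs, sizes)
print(groups)
print(base)
```

With this layout, array D starts exactly where array A ends, so a single contiguous copy of n+m elements covers both.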

[0031] Next, the following instruction codes are generated as the instruction codes corresponding to the plurality of calculation expressions.
(1) One input vector register is assigned to each set of arrays appearing at the same position on the right sides of the expressions, and the instruction code of a copy instruction is generated that copies the arrays of each set from the contiguous area of the memory 2 into the corresponding input vector register. The vector length to be copied equals the total number of elements of the arrays in that set.
(2) The instruction code of a vector operation instruction is generated that performs a vector operation on the arrays copied into the input vector registers, using the vector operation unit corresponding to the kind of operation, and stores the result in an output vector register. The vector length of the operation equals the size of the arrays copied into the input vector registers, that is, the total number of elements of the arrays appearing at the same position on the right sides of the expressions.
(3) The instruction code of a copy instruction is generated that copies the operation result stored in the output vector register to the contiguous area of the memory 2 allocated to the set of arrays appearing on the left sides of the expressions.
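The three code-generation rules can be illustrated with a small Python sketch that emits a list of symbolic instructions. The mnemonics `VLOAD`, `VADD`, and `VSTORE`, the register names, the tuple encoding, and the sample layout are all invented for this sketch; the patent does not specify an instruction format.

```python
# Hypothetical inputs: array sizes, position groups, and base addresses.
sizes = {"A": 3, "B": 3, "C": 3, "D": 2, "E": 2, "F": 2}
groups = [["A", "D"], ["B", "E"], ["C", "F"]]     # left sides first, then each term
base = {"A": 0, "D": 3, "B": 5, "E": 8, "C": 10, "F": 13}

def generate(groups, base, sizes, opcode):
    """Emit one load per right-side group, one operation, and one store."""
    out_group, in_groups = groups[0], groups[1:]
    code, in_regs = [], []
    for k, group in enumerate(in_groups, start=1):
        reg = f"V{k}"
        length = sum(sizes[name] for name in group)       # rule (1): total elements
        code.append(("VLOAD", reg, base[group[0]], length))
        in_regs.append(reg)
    op_length = sum(sizes[name] for name in in_groups[0])  # rule (2)
    code.append((opcode, "V0", tuple(in_regs), op_length))
    out_length = sum(sizes[name] for name in out_group)    # rule (3)
    code.append(("VSTORE", "V0", base[out_group[0]], out_length))
    return code

code = generate(groups, base, sizes, "VADD")
for instruction in code:
    print(instruction)
```

For two expressions with two right-side terms each, the sketch emits four instructions in total, regardless of how many expressions were merged.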

[0032] The storage device 4 in FIG. 1 is a file on a magnetic disk device that stores the object program 41 generated by the compiler 6, and the loader 3 is a means for loading the object program 41 into the memory 2. The instruction code sequence in the object program 41 is loaded into the instruction code section 21 as the instruction code sequence 211, and data such as arrays are loaded into the data section 22 as the data sequence 221. Arrays are loaded into the data section 22 in accordance with the data allocation method determined by the compiler 6.

[0033] The instruction code sequence 211 of the object program 41, loaded into the instruction code section 21 of the memory 2 as described above, is interpreted and executed sequentially by the instruction interpretation unit 15 of the CPU 1.

[0034] Next, a working example of this embodiment is described.

[0035] Suppose that the source program 51 contains, as shown below, addition expression (4-1), which assigns the element-wise sum of arrays B and C to array A, followed by addition expression (4-2), which assigns the element-wise sum of arrays E and F to array D. These are the same as expressions (2-1) and (2-2) given in the description of the prior art.

A(:)=B(:)+C(:) ...(4-1)
D(:)=E(:)+F(:) ...(4-2)

Here the elements of arrays A, B, and C range from 1 to n, the elements of arrays D, E, and F range from 1 to m, and n+m is no greater than the size L of the vector registers in the vector register set 11.

[0036] When the compiler 6 detects a place in the source program 51 where calculation expressions on arrays appear consecutively, it determines whether the optimization of the present invention can be applied to them. In this case, expressions (4-1) and (4-2) can be computed independently, both contain the same number of arrays (three), and both use the same operation, addition. In addition, the sum of the element counts of arrays A and D, which appear on the left sides, and of arrays B and E and arrays C and F, which appear at the same positions on the right sides, is n+m in each case, which is no greater than the vector register size L. The compiler 6 therefore determines that the optimization of the present invention can be applied to expressions (4-1) and (4-2) and performs the following optimization.

[0037] First, as the data allocation method, arrays A and D, which appear on the left sides of expressions (4-1) and (4-2), and arrays B and E and arrays C and F, which appear at the same positions on the right sides, are each grouped into a set, and a data allocation method is adopted in which, as shown in FIG. 2, the arrays of each set are allocated to contiguous areas E1, E2, and E3 of the memory 2.

[0038] Next, the following instruction codes are generated as the instruction codes corresponding to expressions (4-1) and (4-2).
(1) One input vector register (here B (Vb)) is assigned to the set of arrays B and E, which appear in the first term on the right sides, and the instruction code of a copy instruction is generated that copies arrays B and E of that set from the contiguous area E2 of the memory 2 shown in FIG. 2 into vector register B (Vb).
(2) Likewise, one input vector register (here C (Vc)) is assigned to the set of arrays C and F, which appear in the second term on the right sides, and the instruction code of a copy instruction is generated that copies arrays C and F of that set from the contiguous area E3 of the memory 2 shown in FIG. 2 into vector register C (Vc).
(3) The vector addition instruction Va=Vb+Vc is generated. As shown in FIG. 3, this instruction obtains the sums of the elements of vector register B (Vb) and the elements of vector register C (Vc) with the vector adder 12 and stores the result in the output vector register A (Va).
(4) The instruction code of a copy instruction is generated that copies the operation result stored in vector register A (Va) to the contiguous area E1 of the memory 2 shown in FIG. 2, which is allocated to the set of arrays A and D appearing on the left sides of expressions (4-1) and (4-2).

[0039] FIG. 4 shows, in flowchart form, the instruction code sequence generated as described above for expressions (4-1) and (4-2).

[0040] The object program 41 generated by the compiler 6, which contains an instruction code sequence such as that shown in FIG. 4, is stored in the storage device 4. The loader 3 then loads the instruction code sequence 211 into the instruction code section 21 of the memory 2 and loads the data sequence 221, such as arrays, into the data section 22. At this time, the arrays defined and referenced in expressions (4-1) and (4-2) are allocated, set by set, to the contiguous areas E1 to E3 of the memory 2 as shown in FIG. 2. The instruction interpretation unit 15 of the CPU 1 then reads the instruction codes from the instruction code sequence 211 one by one, interprets them, and controls each unit. When execution reaches the instruction code sequence shown in FIG. 4, the following operations are performed.

[0041] Step S1: Under the control of the instruction interpretation unit 15, the data transfer unit 14 copies the elements of arrays B and E from the contiguous area E2 of the memory 2 into vector register B (Vb).
Step S2: Under the control of the instruction interpretation unit 15, the data transfer unit 14 copies the elements of arrays C and F from the contiguous area E3 of the memory 2 into vector register C (Vc).
Step S3: Under the control of the instruction interpretation unit 15, the vector adder 12 executes the vector operation instruction Va=Vb+Vc. That is, it obtains the element-by-element sums of vector register B (Vb) and vector register C (Vc) and stores them in vector register A (Va).
Step S4: Under the control of the instruction interpretation unit 15, the data transfer unit 14 copies the elements of vector register A (Va) to the contiguous area E1 of the memory 2.
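Steps S1 to S4 can be traced with a small Python model. As before, this is a sketch under assumptions: memory is a flat list, the areas E1, E2, and E3 are packed back to back from address 0, and the sample values are arbitrary.

```python
n, m = 3, 2                                  # element counts; n + m <= L is assumed
length = n + m
E1, E2, E3 = 0, length, 2 * length           # hypothetical base addresses of the areas

memory = (
    [0] * length                             # E1: array A followed by array D
    + [1, 2, 3] + [4, 5]                     # E2: array B followed by array E
    + [10, 20, 30] + [40, 50]                # E3: array C followed by array F
)

vb = memory[E2:E2 + length]                  # S1: B and E -> Vb in one copy
vc = memory[E3:E3 + length]                  # S2: C and F -> Vc in one copy
va = [x + y for x, y in zip(vb, vc)]         # S3: Va = Vb + Vc
memory[E1:E1 + length] = va                  # S4: Va -> A and D in one copy

A = memory[E1:E1 + n]
D = memory[E1 + n:E1 + n + m]
print(A, D)
```

Because D is stored directly after A, the single store in S4 writes both results, and the two expressions share every load, the addition, and the store.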

[0042] In this way, processing that conventionally required eight steps, as shown in FIG. 8, requires only four steps in this working example, as shown in FIG. 4.

[0043] The working example above concerned calculation expressions in which the right side has two terms, the left-side array differs from every array on the right side, and the operation is addition. However, there is no restriction on the number of right-side terms, the kind of operation, or whether terms coincide; the method is also applicable, for example, to sets of expressions such as the following.

[0044]
A(:) = -A(:)
B(:) = -B(:)   ... (5)
A(:) = B(:) - C(:) * D(:)
E(:) = F(:) - G(:) * H(:)   ... (6)
A(:) = sin B(:) + cos C(:)
D(:) = sin E(:) + cos F(:)   ... (7)
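To show how the same grouping might carry over to a formula pair with three right-side arrays and two operations, the sketch below applies it to formula set (6). The temporary register `vt`, the function name `packed_sub_mul`, and the sample values are assumptions for illustration only.

```python
def packed_sub_mul(area_bf, area_cg, area_dh):
    """Evaluate A = B - C*D and E = F - G*H in one pass over packed areas."""
    vb, vc, vd = list(area_bf), list(area_cg), list(area_dh)  # three packed loads
    vt = [c * d for c, d in zip(vc, vd)]    # vector multiply (assumed temporary register)
    va = [b - t for b, t in zip(vb, vt)]    # vector subtract
    return va                               # would be stored to the area holding A and E

# Assumed data: A, B, C, D have 2 elements; E, F, G, H have 1 element.
B, F = [10, 20], [7]
C, G = [1, 2], [3]
D, H = [4, 5], [2]
result = packed_sub_mul(B + F, C + G, D + H)
A, E = result[:2], result[2:]
print(A, E)
```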

[0045] The examples above deal with operations on one-dimensional arrays, but the present invention is also applicable to operations on arrays of two or more dimensions. An example follows.

[0046]
DO I = 1, l₁, l₂   ... (8-1)
A(:, I) = B(:, I) + C(:, I)   ... (8-2)
D(:, I) = E(:, I) + F(:, I)   ... (8-3)
END DO   ... (8-4)
Here, the first-dimension subscripts of the two-dimensional arrays A, B, and C range from 1 to n and their second-dimension subscripts range from 1 to l₁; the first-dimension subscripts of the two-dimensional arrays D, E, and F range from 1 to m and their second-dimension subscripts range from 1 to l₁; l₁ ≥ l₂ ≥ 1; and n + m is equal to or smaller than the size L of a vector register in the vector register set 11.

[0047] The DO loop above takes the sum of the elements of arrays B and C whose second-dimension subscript is 1, 1 + l₂, 1 + 2 × l₂, ... as the elements of array A with the same second-dimension subscripts, and takes the sum of the elements of arrays E and F whose second-dimension subscript is 1, 1 + l₂, 1 + 2 × l₂, ... as the elements of array D with the same second-dimension subscripts. It is the same as formulas (3-1) and (3-2) given in the description of the prior art.
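For reference, the meaning of the DO loop can be written out directly. The sketch below is a plain, unoptimized Python rendering; it stores each two-dimensional array as a list of columns (an assumption about representation), and the function name and sample values are hypothetical.

```python
def do_loop_reference(B, C, E, F, final_value, increment):
    """Unoptimized rendering of DO I = 1, final_value, increment.

    Each array is a list of columns, so B[I - 1] is the column B(:, I).
    """
    n, m = len(B[0]), len(E[0])
    A = [[0] * n for _ in range(final_value)]
    D = [[0] * m for _ in range(final_value)]
    for I in range(1, final_value + 1, increment):   # I = 1, 1 + increment, ...
        A[I - 1] = [b + c for b, c in zip(B[I - 1], C[I - 1])]   # formula (8-2)
        D[I - 1] = [e + f for e, f in zip(E[I - 1], F[I - 1])]   # formula (8-3)
    return A, D

# Assumed sizes: n = 2, m = 1, final value 3, increment 2 (columns 1 and 3 are visited).
B = [[1, 2], [3, 4], [5, 6]]
C = [[10, 10], [20, 20], [30, 30]]
E = [[1], [2], [3]]
F = [[100], [200], [300]]
A, D = do_loop_reference(B, C, E, F, 3, 2)
print(A, D)
```

Column 2 is left untouched because the increment of 2 skips it.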

[0048] When calculation formulas such as these exist in the source program 51, the compiler 6 determines whether the optimization according to the present invention can be applied to them. In this case, formulas (8-2) and (8-3) in the DO loop can be calculated independently of each other, both contain the same number of arrays (three), and both use the same operation, addition. In addition, the sum of the numbers of elements per loop iteration is n + m for the arrays A and D appearing on the left sides, and likewise for the arrays B and E and the arrays C and F appearing at the same positions on the right sides, which is equal to or smaller than the vector register size L. The compiler 6 therefore determines that formulas (8-2) and (8-3) in the DO loop can be optimized according to the present invention and performs the following optimization.
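The eligibility test described in this paragraph could be sketched as follows. The dictionary representation of a formula, the function name `can_pack`, and in particular the simplified independence check (no formula writes an array that another formula reads or writes) are assumptions for illustration; a real compiler would need a proper dependence analysis.

```python
def can_pack(formulas, register_size):
    """Return True if the formulas meet the conditions listed above (simplified)."""
    first = formulas[0]
    # Same number of arrays and same type of operation in every formula.
    for f in formulas:
        if len(f["rhs"]) != len(first["rhs"]) or f["op"] != first["op"]:
            return False
    # Simplified independence check (assumption): no formula writes an array
    # that a different formula reads or writes.
    for i, f in enumerate(formulas):
        for j, g in enumerate(formulas):
            touched = {g["lhs"][0]} | {name for name, _ in g["rhs"]}
            if i != j and f["lhs"][0] in touched:
                return False
    # The packed element count per iteration must fit in one vector register.
    if sum(f["lhs"][1] for f in formulas) > register_size:
        return False
    for pos in range(len(first["rhs"])):
        if sum(f["rhs"][pos][1] for f in formulas) > register_size:
            return False
    return True

# Formulas (8-2) and (8-3) with assumed sizes n = 3 and m = 2.
formulas = [
    {"lhs": ("A", 3), "op": "+", "rhs": [("B", 3), ("C", 3)]},
    {"lhs": ("D", 2), "op": "+", "rhs": [("E", 2), ("F", 2)]},
]
print(can_pack(formulas, 8), can_pack(formulas, 4))  # prints "True False"
```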

[0049] First, as the data allocation method, the arrays A and D appearing on the left sides of formulas (8-2) and (8-3) are grouped together, as are the arrays B and E and the arrays C and F appearing at the same positions on the right sides, and each group is allocated to a continuous area on the memory 2. Because formulas (8-2) and (8-3) are inside a DO loop, the elements within each continuous area are arranged so that elements with the same subscript for the loop control variable I are contiguous. That is, as shown in FIG. 5, for arrays A and D, the elements whose second-dimension subscript (the one corresponding to the loop control variable I) is 1, namely A(1, 1), ..., A(n, 1) and D(1, 1), ..., D(m, 1), are allocated contiguously in the continuous area E1 on the memory 2, followed in the same manner by the elements whose subscript is 2, 3, ..., l₁. Arrays B and E and arrays C and F are likewise allocated to the continuous areas E2 and E3 on the memory 2, as shown in FIG. 5.
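The interleaved layout can be expressed as an offset calculation. The sketch below uses 0-based offsets inside one continuous area; the function name `area_offset`, its parameters, and the sizes n = 3 and m = 2 are assumptions chosen for the example.

```python
def area_offset(first_len, second_len, second, i, I):
    """0-based offset of element (i, I) inside a continuous area.

    second=False selects the first array of the group (for example A),
    second=True selects the second array of the group (for example D).
    """
    block_start = (I - 1) * (first_len + second_len)   # start of the block for column I
    return block_start + (first_len if second else 0) + (i - 1)

n, m = 3, 2
layout = {}
for I in (1, 2):
    for i in range(1, n + 1):
        layout[area_offset(n, m, False, i, I)] = f"A({i},{I})"
    for j in range(1, m + 1):
        layout[area_offset(n, m, True, j, I)] = f"D({j},{I})"
order = [layout[k] for k in sorted(layout)]
print(order)
```

With this layout, the n + m elements needed for one loop iteration occupy one contiguous block, so a single copy instruction can move them.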

[0050] Next, the following instruction codes are generated as the instruction codes corresponding to the DO loop.

[0051] First, corresponding to the control statements (8-1) and (8-4) of the DO loop, the compiler generates an instruction code that initializes the loop control variable I to 1, an instruction code that compares the loop control variable I with the final value l₁ and branches accordingly, and an instruction code that adds the increment l₂ to the loop control variable I each time the loop body is executed.

[0052] Next, the following instruction codes are generated as the instruction codes corresponding to formulas (8-2) and (8-3).
(1) One input vector register (here, B (Vb)) is assigned to the group of arrays B and E appearing in the first term on the right sides, and the instruction code of a copy instruction is generated that copies the elements of arrays B and E corresponding to the current value of the loop control variable I from the continuous area E2 on the memory 2 shown in FIG. 5 to the vector register B (Vb).
(2) Similarly, one input vector register (here, C (Vc)) is assigned to the group of arrays C and F appearing in the second term on the right sides, and the instruction code of a copy instruction is generated that copies the elements of arrays C and F corresponding to the current value of the loop control variable I from the continuous area E3 on the memory 2 shown in FIG. 5 to the vector register C (Vc).
(3) The vector addition instruction Va = Vb + Vc is generated. As shown in FIG. 3, this instruction causes the vector adder 12 to add the elements of the vector register B (Vb) and the elements of the vector register C (Vc) and stores the result in the output vector register A (Va).
(4) The instruction code of a copy instruction is generated that copies the operation result stored in the vector register A (Va) to the part, corresponding to the current value of the loop control variable I, of the continuous area E1 on the memory 2 shown in FIG. 5, which is allocated to the group of arrays A and D appearing on the left sides of formulas (8-2) and (8-3).
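The code generation described above (loop control codes followed by the four codes for the loop body) might be sketched as a function that emits a symbolic instruction list. The tuple format, the operation names such as `COPY_TO_REG`, and the register names are all hypothetical; the specification does not define an instruction encoding.

```python
def generate_codes(rhs_areas, lhs_area, op, final_value, increment):
    """Emit a symbolic instruction list for one optimized DO loop (illustrative)."""
    codes = [
        ("INIT", "I", 1),                              # initialize I to 1
        ("BRANCH_IF_GT", "I", final_value, "end"),     # compare I with the final value
    ]
    in_regs = []
    for k, area in enumerate(rhs_areas):
        reg = f"Vin{k}"                                # one input register per group
        in_regs.append(reg)
        codes.append(("COPY_TO_REG", area, "I", reg))  # items (1) and (2)
    codes.append(("VECTOR_OP", op, tuple(in_regs), "Vout"))   # item (3)
    codes.append(("COPY_TO_MEM", "Vout", lhs_area, "I"))      # item (4)
    codes.append(("ADD", "I", increment))              # add the increment to I
    codes.append(("JUMP", "compare"))                  # back to the comparison
    return codes

codes = generate_codes(["E2", "E3"], "E1", "+", "l1", "l2")
for code in codes:
    print(code)
```

The body contains one copy per right-side group regardless of how many formulas were grouped, which is where the saving in executed instructions comes from.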

[0053] FIG. 6 shows, in flowchart form, the instruction code sequence generated as described above for the DO loop.

[0054] The object program 41 generated by the compiler 6, which includes the instruction code sequence shown in FIG. 6, is stored in the storage device 4. The loader 3 then loads the instruction code sequence 211 into the instruction code part 21 of the memory 2 and the data sequence 221, such as the arrays, into the data part 22. At this time, the arrays defined and referenced in formulas (8-2) and (8-3) are allocated, group by group, to the continuous areas E1 to E3 of the memory 2, as shown in FIG. 5. The instruction interpreting unit 15 of the CPU 1 then reads the instruction codes from the instruction code sequence 211 in order, interprets them, and controls each unit. When execution reaches the instruction code sequence shown in FIG. 6, the following operations are performed.

[0055] Step S11: The instruction interpreting unit 15 assigns 1 to the loop control variable I.
Step S12: If I ≤ l₁, the instruction interpreting unit 15 proceeds to step S13; otherwise (I > l₁), it ends the processing of FIG. 6.
Step S13: Under the control of the instruction interpreting unit 15, the data transfer unit 14 copies the elements B(1, I) to B(n, I) of array B and the elements E(1, I) to E(m, I) of array E from the continuous area E2 on the memory 2 to the vector register B (Vb).
Step S14: Under the control of the instruction interpreting unit 15, the data transfer unit 14 copies the elements C(1, I) to C(n, I) of array C and the elements F(1, I) to F(m, I) of array F from the continuous area E3 on the memory 2 to the vector register C (Vc).
Step S15: Under the control of the instruction interpreting unit 15, the vector adder 12 executes the vector operation instruction Va = Vb + Vc. That is, it computes the element-by-element sum of the vector register B (Vb) and the vector register C (Vc) and stores the result in the vector register A (Va).
Step S16: Under the control of the instruction interpreting unit 15, the data transfer unit 14 copies each element of the vector register A (Va) to the part of the continuous area E1 on the memory 2 that corresponds to the current value of I.
Step S17: The instruction interpreting unit 15 adds l₂ to the loop control variable I and returns to step S12.
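The flow of steps S11 to S17 can be simulated in a few lines. The sketch below assumes the interleaved layout of FIG. 5, with each continuous area modeled as a flat Python list; the function name `run_fig6` and the sample sizes (n = 2, m = 1, final value 3, increment 2) are assumptions for illustration.

```python
def run_fig6(area_e2, area_e3, n, m, final_value, increment):
    """Simulate steps S11 to S17 on flat lists laid out as in FIG. 5."""
    width = n + m                                  # elements handled per iteration
    area_e1 = [0] * (width * final_value)
    I = 1                                          # S11
    while I <= final_value:                        # S12: leave the loop once I exceeds the final value
        lo = (I - 1) * width
        vb = area_e2[lo:lo + width]                # S13: B(:, I) and E(:, I) into Vb
        vc = area_e3[lo:lo + width]                # S14: C(:, I) and F(:, I) into Vc
        va = [b + c for b, c in zip(vb, vc)]       # S15: Va = Vb + Vc
        area_e1[lo:lo + width] = va                # S16: Va into the matching part of E1
        I += increment                             # S17
    return area_e1

# Each block of 3 values is one column: two elements of B (or C), then one of E (or F).
area_e2 = [1, 2, 1,   3, 4, 2,   5, 6, 3]
area_e3 = [10, 10, 100,   20, 20, 200,   30, 30, 300]
area_e1 = run_fig6(area_e2, area_e3, n=2, m=1, final_value=3, increment=2)
print(area_e1)
```

Only the blocks for I = 1 and I = 3 are written, and each iteration performs two copies in, one vector addition, and one copy out.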

[0056] In this way, processing that conventionally required many steps, as shown in FIG. 10, is completed in fewer steps in this embodiment, as shown in FIG. 6.

[0057]

[Effects of the Invention] As described above, the present invention improves the processing speed when a plurality of calculation formulas with the same number of arrays and the same type of operation are executed using vector operations. The reason is that the arrays appearing at the same position on the right sides of the formulas are grouped and allocated to a continuous area on the memory, the arrays of each group are copied together into the same vector register, and they are processed by the same vector operation instruction, which reduces both the number of copies from the memory to the vector registers and the number of vector operation instructions executed. Furthermore, the arrays appearing on the left sides of the formulas are grouped and allocated to a continuous area on the memory, and the operation result obtained in the vector register by the vector operation instruction is copied to that continuous area in a single operation, which reduces the number of copies from the vector registers to the memory.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of a computer that implements the vector operation method of the present invention.

FIG. 2 is a diagram showing how arrays are allocated to memory in one embodiment of the present invention.

FIG. 3 is an explanatory diagram of the vector addition instruction Va = Vb + Vc.

FIG. 4 is a diagram showing, in flowchart form, the optimized instruction code sequence in one embodiment of the present invention.

FIG. 5 is a diagram showing how arrays are allocated to memory in another embodiment of the present invention.

FIG. 6 is a diagram showing, in flowchart form, the optimized instruction code sequence in another embodiment of the present invention.

FIG. 7 is a diagram showing a conventional example of how one-dimensional arrays are allocated to memory.

FIG. 8 is a flowchart showing the procedure of a conventional vector operation method.

FIG. 9 is a diagram showing a conventional example of how two-dimensional arrays are allocated to memory.

FIG. 10 is a flowchart showing the procedure of a conventional vector operation method.

[Explanation of symbols]

1 ... CPU
11 ... vector register set
12 ... vector adder
13 ... vector multiplier
14 ... data transfer unit
15 ... instruction interpreting unit
2 ... memory (main memory)
21 ... instruction code part
211 ... instruction code sequence
22 ... data part
221 ... data sequence
3 ... loader
4 ... storage device
41 ... object program
5 ... storage device
51 ... source program
6 ... compiler

Claims

1. A vector operation method for performing operations on arrays by vector operations on a computer having a vector operation function, in which a plurality of calculation formulas of the same type that can be calculated independently are the target of optimization, the method comprising: a first step of grouping together the arrays that appear on the left sides of the plurality of calculation formulas, grouping together the arrays that appear at the same position on the right sides, and allocating the arrays of each group to a continuous area on the memory; a second step of assigning one input vector register to each group of arrays that appear at the same position on the right sides of the plurality of calculation formulas, and copying the arrays of each group from the continuous area on the memory to the input vector register by a copy instruction; and a third step of executing the operation on the arrays copied to the input vector registers with a vector operation unit corresponding to the type of operation of the calculation formulas, and storing the operation result in an output vector register.

2. The vector operation method according to claim 1, further comprising a fourth step of copying, by a copy instruction, the operation result stored in the output vector register to the continuous area on the memory allocated to the group of arrays appearing on the left sides of the plurality of calculation formulas.

3. The vector operation method according to claim 1 or 2, wherein the target of optimization is a plurality of calculation formulas that can each be calculated independently, in which all arrays included in each calculation formula are one-dimensional arrays, the number of arrays and the type of operation are equal, and the sum of the numbers of elements of the arrays appearing on the left sides of the plurality of calculation formulas and the sum of the numbers of elements of the arrays appearing at the same position on the right sides are each equal to or smaller than the size of the vector register held by the computer.

4. The vector operation method according to claim 1, wherein the target of optimization is a plurality of calculation formulas in a DO loop that contains a plurality of calculation formulas that can be calculated independently, in which all arrays included in the calculation formulas are multidimensional arrays having the same number of dimensions, the number of arrays and the type of operation are equal, and the sum of the numbers of elements per loop iteration of the arrays appearing on the left sides of the plurality of calculation formulas and of the arrays appearing at the same position on the right sides is equal to or smaller than the size of the vector register held by the computer.