JPS63285668A

JPS63285668A - Vector load processing method

Info

Publication number: JPS63285668A
Application number: JP12151787A
Authority: JP
Inventors: Hideo Serizawa; 芹澤　英夫; Masaki Aoki; 正樹青木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-05-19
Filing date: 1987-05-19
Publication date: 1988-11-22

Abstract

PURPOSE: To shorten the period during which a vector register is occupied holding loaded vector data, and thereby reduce the number of vector registers required, by modifying each reference instruction so that the vector register it references matches the destination vector register of the corresponding load. CONSTITUTION: A vector load instruction is detected and a reference instruction corresponding to it is retrieved. If optimization is possible, the vector load instruction is moved as close as possible to that reference instruction. Further reference instructions are then searched for in the subsequent intermediate code string; whenever one is found and can be optimized, a vector load instruction equivalent to the first is generated and inserted near it. This processing continues over the intermediate code string within a prescribed range. The optimization that reduces the number of vector registers required is thus effective even when multiple reference instructions corresponding to a single vector load instruction are scattered.

Description

[Detailed Description of the Invention]

[Summary]

This is a vector load processing method, used in compiler processing for a computer, for an optimization that reduces the number of vector registers required.

A vector load instruction is detected in the intermediate code string produced during translation. It is moved to immediately before the first of the instructions that reference the loaded vector data; immediately before each of the other reference instructions, a vector load instruction for the same vector data is generated and inserted; and the reference instructions are modified so that the vector register each one references matches the destination vector register of the corresponding load.

With this method, the period during which a vector register is occupied holding loaded vector data is shortened, and the number of vector registers required can be reduced.

[Industrial application field]

The present invention relates to a processing method of a compiler that translates computer programs, and in particular to a vector load processing method for reducing the number of vector registers required.

[Conventional technology]

FIG. 3 is a block diagram showing an example configuration of a computer. By executing the program of compiler 2, processing device 1 translates source program 4 stored in storage device 3 and outputs target program 5 to storage device 6.

Source program 4 is, for example, a program written in the FORTRAN programming language, and compiler 2 can generate from it a target program vectorized for execution on a so-called vector processor.

To this end, in compiler 2, intermediate code generation unit 7 reads source program 4, analyzes its statements, generates intermediate code string 10, and outputs it to storage device 11.

In generating the intermediate code string, the compiler allocates storage areas to data. It also detects portions, such as those that process vector data, that the vector processor can execute in parallel, performs so-called vectorization so that they correspond to vector instructions executed by the vector processor, and expresses them as a prescribed intermediate code string.

On intermediate code string 10, intermediate code optimization unit 8 performs optimization processing that modifies the program so as to improve, among other things, the execution efficiency of the target program. Target program generation unit 9 processes the result to generate a target program consisting of machine-language scalar instructions and vector instructions, and outputs it as target program 5.

FIG. 4 shows an example of a source program written in the well-known FORTRAN language and of the intermediate code string generated from it.

As is well known, source program 17 in the figure specifies that the two assignment statements between the DO statement and the CONTINUE statement labeled "10" are to be executed repeatedly while the value of subscript "I" is incremented by 1 from 1 to 100.

The data A(I), B(I), C(I), ... appearing as operands of the assignment statements are therefore typical vector data, so this program is vectorized and intermediate code string 18 is generated.

In intermediate code string 18, "VLENG = 100" indicates the vector length (the number of vector elements) of the vector data processed by the intermediate code that follows it, and forms such as B(*) and C(*) denote vector data with the specified number of elements.

vt1, vt2, and so on denote vector registers to be allocated on the vector processor. Thus the intermediate code for a vector load instruction, written for example as "vt1 = B(*)" (hereinafter simply called a vector load instruction, and likewise for other intermediate codes), corresponds to a vector instruction that loads the 100 elements of vector data B into vector register vt1.

An intermediate code such as "vt3 = vt1 + vt2" indicates that the vector data resulting from element-by-element addition of the vector data previously loaded into vector registers vt1 and vt2, by vector loads or other processing, is stored in vector register vt3.
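The intermediate code notation described above can be illustrated with a small Python model. This is only a sketch: FIG. 4 is not reproduced in this text, so the statement `A(I) = B(I) + C(I)`, the `Instr` class, the `run` interpreter, and the operation names `load`, `add`, and `store` are all assumptions made for illustration, not the compiler's actual representation.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    dest: str    # vector register name, or the array name for a store
    op: str      # "load", "add", or "store" (hypothetical operation names)
    srcs: tuple  # array name for a load; register names otherwise

def run(code, memory, vleng):
    """Interpret the toy intermediate code over the first `vleng` elements."""
    regs = {}
    for ins in code:
        if ins.op == "load":       # vt1 = B(*)
            regs[ins.dest] = list(memory[ins.srcs[0]][:vleng])
        elif ins.op == "add":      # vt3 = vt1 + vt2, element by element
            a, b = (regs[s] for s in ins.srcs)
            regs[ins.dest] = [x + y for x, y in zip(a, b)]
        elif ins.op == "store":    # A(*) = vt3
            memory[ins.dest][:vleng] = regs[ins.srcs[0]]
    return memory

VLENG = 100  # vector length, as in "VLENG = 100"
memory = {
    "A": [0.0] * VLENG,
    "B": [float(i) for i in range(VLENG)],
    "C": [1.0] * VLENG,
}
# Assumed vectorized form of: DO 10 I = 1, 100 / A(I) = B(I) + C(I) / 10 CONTINUE
code = [
    Instr("vt1", "load", ("B",)),
    Instr("vt2", "load", ("C",)),
    Instr("vt3", "add", ("vt1", "vt2")),
    Instr("A", "store", ("vt3",)),
]
run(code, memory, VLENG)
print(memory["A"][:3])
```

Each `Instr` stands for one line of intermediate code, and one vector instruction processes all `VLENG` elements at once, which is the point of the `B(*)` notation.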

As is well known, a vector processor achieves high-speed processing by executing many operations of the same kind in parallel on a so-called pipelined arithmetic unit. To exploit this feature effectively, vectorization must be done so that operations are performed on vector data of as large a vector length as possible.

This requires providing the necessary number of vector registers, each able to hold vector data of the number of elements specified by VLENG. Because of the hardware configuration of the vector processor, however, the registers are organized as follows: a storage unit able to hold, for example, 8192 elements is divided into 8 parts to give up to 8 vector registers of 1024 elements each; if 8 are not enough, it is divided into 16 parts to give up to 16 vector registers of 512 elements each; likewise, dividing it into 32 vector registers reduces the number of elements each can hold to 256, and dividing it into 64 reduces it to 128.

In general, as a program comes to require more complex operations, the number of vector registers of the vector length specified by VLENG that are needed at the same time increases. If the available vector length has to be shortened as a result in order to obtain the necessary number of vector registers, processing such as swapping the contents of vector registers partway through a series of operations becomes necessary, and the processing efficiency of the pipelined arithmetic unit unavoidably drops.

It is therefore desirable always to take measures to reduce the number of vector registers required as far as possible. An intermediate code optimization scheme for this purpose is disclosed in the specification of a patent application by the present applicant (Japanese Patent Application No. Sho 61-39673).

Specifically, in the optimization processing shown in FIG. 5, processing step 20 determines how many vector registers are needed at the same time within the effective range of the vector length specified by one VLENG, and processing step 21 scans the intermediate code string and detects one vector load instruction.

Processing step 22 scans the intermediate code following the detected vector load instruction and detects a vector operation instruction that references the vector register loaded by that vector load instruction (hereinafter called a reference instruction). Processing step 23 determines whether the number of vector registers required can be reduced by moving the vector load instruction to a position before the reference instruction and as close to it as possible, usually immediately before it. If so, processing step 24 moves the vector load instruction to that position.

Suppose, as with vector load instruction 13 and reference instruction 14 in the example of intermediate code string 12 in FIG. 6(a), that the vector operation instruction that first references the vector register loaded by a vector load instruction (vt1 in the figure) is located far from the vector load instruction. By this processing, vector load instruction 13 is moved to the position immediately before reference instruction 14, as shown in intermediate code string 15 in FIG. 6(b), which reduces the number of vector registers needed in program section 16 of the intermediate code string.

[Problem that the invention seeks to solve]

When a loop formed by the DO statement in the source program to be vectorized becomes large, it often happens that there are multiple reference instructions corresponding to a single vector load instruction in the intermediate code string and that they are scattered through the program. In such cases, moving the vector load instruction with respect to only one reference instruction, as in the optimization described above, has only a limited effect on the number of vector registers, and a sufficient effect cannot be expected.

[Means for solving problems]

FIG. 1 is a process flowchart showing the configuration of the present invention.

The figure shows the flow of the compiler's optimization processing for reducing the number of vector registers required; 30 to 38 denote processing steps.

[Operation]

For the intermediate code string, the compiler performs the optimization processing for reducing the number of vector registers required as follows. In processing steps 30 to 34 of FIG. 1 it proceeds as in the conventional method: it detects a vector load instruction, searches for a corresponding reference instruction, and, if optimization is possible, moves the vector load instruction as close as possible to the reference instruction.

Next, in processing step 35, it searches the subsequent intermediate code string for further reference instructions. If one is found, then in processing steps 36 to 38, where optimization is possible, it generates a vector load instruction equivalent to the original vector load instruction and inserts it near the reference instruction. This processing continues over the required range of the intermediate code string.

With this processing method, the optimization that reduces the number of vector registers required is effective even when multiple reference instructions corresponding to a single vector load instruction are scattered.

[Example]

In the processing flow of FIG. 1, processing steps 30 to 34 are the same as the conventional processing steps 20 to 24 described above. As a result, when an optimization that reduces the number of vector registers is possible, the vector load instruction is moved to the vicinity of the reference instruction.

Next, according to the present invention, processing step 35 continues the search for further reference instructions in the intermediate code string following that reference instruction.

If the search finds a reference instruction, processing step 36 determines whether a reduction in the number of vector registers required would be obtained by generating a vector load instruction equivalent to the detected vector load instruction and inserting it at a suitable nearby position before that reference instruction, so that the vector register loaded by the detected vector load instruction need not remain occupied after the preceding reference instruction.

If this would reduce the number of vector registers, processing step 37 inserts the vector load instruction as described.

Here, a vector load instruction equivalent to the detected vector load instruction means one that loads the same vector data as the detected instruction but whose destination vector register is not necessarily the same. The destination vector register of the vector load instruction generated for insertion is assigned a suitable vector register, taking register usage into account.

As a result, the reference instruction generally needs to be modified. In processing step 38, the vector register name specified in the relevant operand of the reference instruction is changed to the name of the vector register assigned in the inserted vector load instruction.

Processing then returns to step 35, and steps 35 to 38 are repeated for as long as reference instructions are found. Processing ends when step 35 has searched the remainder of the intermediate code string within the effective range of the VLENG instruction and found no further reference instruction.

Intermediate code string 40 in FIG. 2 shows an example in which multiple reference instructions, 42 to 44, corresponding to vector load instruction 41 are scattered. Intermediate code string 45 is an example of the result of updating intermediate code string 40 by the optimization processing described above.

This example assumes that the optimization was judged effective for every reference instruction. In intermediate code string 45, vector load instruction 41 has been moved to immediately before reference instruction 42, and vector load instructions 46 and 47, equivalent to vector load instruction 41, have been inserted immediately before reference instructions 43 and 44 respectively.

The destination vector registers of the inserted vector load instructions 46 and 47 are the suitably assigned vector registers vtx and vty respectively. Accordingly, one of the reference operands of each of reference instructions 43 and 44 is changed to vtx and vty respectively, as shown by reference instructions 48 and 49.

[Effect of the invention]

As is clear from the above description, the present invention improves, in a compiler that generates a vectorized target program, the effect of the optimization that reduces the number of vector registers needed at the same time. It therefore has the marked industrial effect of making it possible to generate target programs with better execution efficiency.

[Brief explanation of the drawing]

FIG. 1 is a process flowchart showing the configuration of the present invention, FIG. 2 is a diagram explaining the present invention in detail, FIG. 3 is a block diagram of an example configuration of a computer, FIG. 4 is an explanatory diagram of an intermediate code string, FIG. 5 is a flowchart of the conventional processing, and FIG. 6 is a diagram explaining an example of the conventional processing.

In the figures, 1 is a processing device, 2 is a compiler, 3, 6, and 11 are storage devices, 4 and 17 are source programs, 5 is a target program, 7 is an intermediate code generation unit, 8 is an intermediate code optimization unit, 9 is a target program generation unit, 10, 12, 15, 18, 40, and 45 are intermediate code strings, and 20 to 24 and 30 to 38 are processing steps.

Claims

[Scope of Claim] In a compiler that translates a source program to generate a vectorized intermediate code string and generates a target program by optimizing the intermediate code string, a vector load processing method characterized by: detecting, in the intermediate code string, a vector load instruction that loads vector data into a vector register (30, 31); moving the vector load instruction to the vicinity of the first reference instruction among the instructions that reference the vector data through the vector register (32 to 34); inserting a vector load instruction that loads the vector data near each reference instruction other than the first (35 to 37); and changing the vector register referenced by each such reference instruction to match the vector register loaded by the corresponding inserted vector load instruction (38).