JPH056712B2

Info

Publication number: JPH056712B2
Application number: JP59213315A
Authority: JP
Inventors: Masaki Aoki; Hiroshi Nakada; Toshihiro Hirabayashi
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-10-12
Filing date: 1984-10-12
Publication date: 1993-01-27
Also published as: JPS61100862A

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a compiler that creates object modules for a vector computer, and in particular to a method of serializing instructions between multiple vectorized DO loops.

[Conventional technology and problems]

In a vector computer, faster arithmetic units and a data supply capability that matches them are key to improving execution efficiency. For this reason, recent vector computers provide two load/store pipelines that can operate in parallel, which increases their data supply capability. However, because multiple load/store pipelines operate in parallel, memory access instructions now have to be synchronized (used here synonymously with serialized). Such synchronization is difficult to achieve in hardware, so conventional systems that include a vector computer implement it in software.

Vector computer hardware provides the following means of synchronizing memory access instructions.

(a) Pipeline ID: This specifies the pipeline on which a vector memory access instruction runs. Memory access instructions whose relative order must be guaranteed can be synchronized by running them on the same pipeline.

(b) Synchronization instructions (POST/WAIT instructions): The order between memory access instructions is guaranteed by synchronization instructions. With this method, the memory access instructions that precede the POST instruction are synchronized with the memory access instructions that follow the WAIT instruction.
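The two mechanisms can be compared with a small model. The following Python sketch is illustrative only: the stream representation, the function name `order_guaranteed`, and the treatment of POST followed by WAIT as a simple fence are simplifying assumptions, not a description of the actual hardware.

```python
def order_guaranteed(stream, first, second):
    """Return True if `first` is guaranteed to complete before `second`.

    `stream` is a list of (name, pipeline) pairs in program order.
    Memory access instructions carry a pipeline ID (0 or 1); the
    markers "POST" and "WAIT" carry None. Names are assumed unique.
    """
    names = [name for name, _ in stream]
    i, j = names.index(first), names.index(second)
    if i >= j:
        raise ValueError("first must precede second in program order")
    # (a) Same pipeline ID: one pipeline processes its instructions in order.
    if stream[i][1] == stream[j][1]:
        return True
    # (b) Different pipelines: require a POST after `first`
    #     and a WAIT after that POST, both before `second`.
    between = names[i + 1:j]
    if "POST" in between:
        return "WAIT" in between[between.index("POST") + 1:]
    return False


same_pipe = [("store A", 0), ("load A", 0)]
split = [("store A", 0), ("load A", 1)]
fenced = [("store A", 0), ("POST", None), ("WAIT", None), ("load A", 1)]

results = [order_guaranteed(s, "store A", "load A")
           for s in (same_pipe, split, fenced)]
```

In this model only the `split` stream leaves the store and the load unordered, which is the situation the compiler has to repair by one of the two means.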

Synchronization must not only guarantee the order of memory access instructions but also be performed efficiently, so that execution performance does not degrade. Conventional compilers, however, did not consider data dependencies between multiple vectorized DO loops. As a result, serialization was applied to each DO loop individually at the point where the loop ends, which was one cause of reduced execution efficiency in parallel processing.

[Purpose of the invention]

The present invention is based on the above considerations. Its purpose is to improve execution performance by applying optimal instruction serialization across multiple DO loops.

[Means to achieve the purpose]

To this end, the instruction serialization method of the present invention applies to a compiler having a serialization processing unit that serializes the intermediate text produced by vectorization. The serialization processing unit is configured to perform the following processes:

(1) extracting a group of DO loops with a constant flow of control;
(2) examining the data dependencies within the DO loop group by referring to the subscripts that appear in array references;
(3) examining whether a dependency between a vector and a scalar exists between DO loops;
(4) if process (3) finds no such dependency, serializing within the nested DO loops by means of pipeline IDs;
(5) if process (3) finds such a dependency, serializing in units of DO loops;
(6) examining whether the serialization of process (4) is efficient;
(7) if process (6) finds it inefficient, serializing between DO loops;
(8) examining whether the serialization of process (7) is efficient; and
(9) if process (8) finds it inefficient, serializing within each DO loop.
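The order in which these processes are tried can be sketched as a short decision function. This is a minimal Python sketch under stated assumptions: the function name `choose_serialization` and the returned labels are hypothetical, and the two efficiency results are passed in as plain booleans because the source leaves the efficiency criterion to the hardware.

```python
def choose_serialization(vector_scalar_dependency,
                         nested_efficient,
                         inter_loop_efficient):
    """Pick a serialization scope following the fallback order in the text.

    All three arguments are booleans supplied by the caller; how they
    are computed is outside the scope of this sketch.
    """
    if vector_scalar_dependency:
        # A vector/scalar dependency between loops forces per-loop serialization.
        return "per DO loop"
    if nested_efficient:
        # Widest scope: pipeline IDs across the nested DO loops.
        return "within nested DO loops"
    if inter_loop_efficient:
        # First fallback: dependencies between DO loops only.
        return "between DO loops"
    # Last fallback: dependencies closed within a single DO loop.
    return "within each DO loop"


choice = choose_serialization(False, False, True)
```

The sketch tries the widest scope first and narrows it only when the previous scope is judged inefficient.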

[Embodiments of the invention]

The present invention is described below with reference to the drawings. FIG. 1 shows an overview of the compiler of the present invention. This compiler is a VP compiler that generates object modules to be executed on a system that includes a vector computer. In FIG. 1, 1 denotes a source analysis unit, 2 an address allocation unit, 3 a vectorization unit, 4 a serialization processing unit, 5 an intermediate text optimization unit, 6 a register allocation unit, and 7 an instruction generation unit. The source analysis unit 1 detects inconsistencies between the arrays and variables defined in declaration statements and their use in the procedure part of the source program, checks whether any undefined array or variable is defined or referenced, and divides the source program into blocks. The address allocation unit 2 allocates memory areas to data and gives initial values to arrays and variables. The vectorization unit 3 converts DO loops into vector instruction sequences. The serialization processing unit 4 serializes instructions; the present invention concerns this unit. The intermediate text optimization unit 5 performs optimizations after vectorization. The register allocation unit 6 performs processing such as assigning data to registers. The instruction generation unit 7 converts the intermediate text into machine instructions.

In summary, the present invention applies vectorization techniques (analysis of how the subscripts that appear in array references behave) to the intermediate text (instruction sequence) produced by vectorization in order to grasp data dependencies over a wide range. It then applies optimal instruction serialization to those data dependencies (the information needed for serialization) by means of pipeline IDs or synchronization instructions.

FIG. 2 shows the flow of the instruction serialization processing of the present invention.

(1) Extract a group of DO loops with a constant flow of control. FIG. 3 shows examples of such DO loop groups; the arrows A-B, C-D, E-F, and so on indicate DO loop groups with a constant flow of control. A program structure with a constant flow of control is one with no jumps out of it or into it, a notion that is self-evident to writers of optimizing compilers.

(2) Determine the data dependencies within the DO loop group; that is, check the subscripts of each dimension for overlap. If the subscripts of a higher dimension are offset from each other, there is no overlap in the lower dimensions. For example, suppose we have the following program.

      DO 10 J=1,N
      DO 10 I=1,N
      A(I,J)=A(I,J-1)+S
   10 CONTINUE

This program is expanded as follows.

      DO 10 I=1,N
   10 A(I,1)=A(I,0)+S
      DO 10' I=1,N
  10' A(I,2)=A(I,1)+S

In this example, the second-dimension subscripts differ within the inner DO loop, so the memory accesses to A do not overlap there and no serialization is needed. When the outer loop is taken into account, however, the store to A and the load from A overlap, and serialization is required.
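The subscript check in this example can be illustrated with a few lines of Python. This is a simplified sketch, not the compiler's actual analysis: each reference is reduced to its constant offsets from the loop indices (I, J), whole columns are treated as single vector accesses, and the function names are hypothetical.

```python
def inner_loop_overlap(store, load):
    """True if store and load can touch the same column while J is fixed.

    `store` and `load` are (I offset, J offset) pairs, for example
    A(I,J) -> (0, 0) and A(I,J-1) -> (0, -1).
    """
    # A differing offset in the higher (J) dimension means disjoint columns,
    # so the lower (I) dimension does not need to be examined.
    return store[1] == load[1]


def outer_loop_distance(store, load):
    """Number of outer iterations after which the load reads the stored column.

    Returns None when the load never reads a column written earlier.
    """
    distance = store[1] - load[1]
    return distance if distance > 0 else None


store_ref = (0, 0)    # A(I,J)
load_ref = (0, -1)    # A(I,J-1)
inner = inner_loop_overlap(store_ref, load_ref)
outer = outer_loop_distance(store_ref, load_ref)
```

For the program above, `inner` is false (no serialization inside the inner loop) while `outer` is 1: the column stored in one outer iteration is loaded in the next, which is why serialization is needed at the outer level.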

(3) Check whether a dependency between a vector and a scalar exists between DO loops. If yes, perform process (5), serialization in units of DO loops; if no, perform process (4), serialization within the nested DO loops.

(4) Serialize within the nested DO loops; that is, serialize based on the data dependencies carried by iteration of the outer loop.

(5) Serialize in units of DO loops. Serialization is performed by pipeline ID or by synchronization instructions.

(6) Check the efficiency; that is, examine the density of pipeline IDs. If the result is no good (NG), perform process (7), serialization between DO loops. The criterion for whether efficiency is good is determined by execution performance and depends on the characteristics of each hardware implementation.

(7) Serialize between DO loops; that is, serialize based only on the data dependencies of the innermost dimension (from top to bottom only). Serialization is performed by pipeline ID or by synchronization instructions.

(8) Check the efficiency. If the result is NG, perform process (9), serialization within each DO loop. The criterion for whether efficiency is good is determined by execution performance and depends on the characteristics of each hardware implementation.

(9) Serialize within the DO loop; that is, serialize based on the data dependencies that are closed within the DO loop. Serialization is performed by pipeline ID or by synchronization instructions.

Next, the present invention is described using concrete examples. Consider the following DO loop group.

      DO 10 J=2,100
      DO 10 I=2,100
      A(I,J)=A(I-1,J-1)+A(I,J-1)
   10 CONTINUE

In this example, iteration of the inner DO loop carries no data dependency. Iteration of the outer DO loop, however, produces data dependencies from the store to A(I,J) to the load of A(I-1,J-1) and to the load of A(I,J-1). If synchronization is performed over the wide range (on the data dependencies of the outer DO loop), all three memory accesses need the same pipeline ID, and parallel processing efficiency drops sharply. It is therefore better to synchronize over the local range (on the data dependencies of the inner DO loop). The data dependencies of the other ranges are then synchronized with POST/WAIT instructions.

An example in which wide-range synchronization is optimal is described next. Consider the following DO loop group.

      DO 10 J=1,100
      DO 10 I=1,100
      A(I,J)=B(I,J)+A(I,J-1)
   10 CONTINUE

This example has the same structure as the previous one, but the only data dependency carried by iteration of the outer DO loop is from the store to A(I,J) to the load of A(I,J-1). Parallel processing efficiency therefore remains high even when synchronization is performed over the wide range (on the data dependencies of the outer DO loop), and it is best to synchronize over the wide range using pipeline IDs.
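The difference between the two nested-loop examples can be made concrete by counting how many memory accesses remain free to use the second pipeline. The Python sketch below is a crude proxy for the efficiency check and rests on assumptions: dependent accesses are taken to share one pipeline ID, the function name `free_accesses` is hypothetical, and the source does not define the actual efficiency criterion.

```python
def free_accesses(accesses, dependencies):
    """Return the accesses that take part in no dependency.

    `accesses` is a list of access names in program order and
    `dependencies` is a list of (source, destination) name pairs.
    Accesses that appear in a dependency are assumed to be tied to
    the same pipeline ID; the rest may run on the other pipeline.
    """
    tied = {name for pair in dependencies for name in pair}
    return [a for a in accesses if a not in tied]


# A(I,J)=A(I-1,J-1)+A(I,J-1): every access is tied to the store.
example_1 = free_accesses(
    ["store A(I,J)", "load A(I-1,J-1)", "load A(I,J-1)"],
    [("store A(I,J)", "load A(I-1,J-1)"),
     ("store A(I,J)", "load A(I,J-1)")],
)

# A(I,J)=B(I,J)+A(I,J-1): the load of B stays free.
example_2 = free_accesses(
    ["store A(I,J)", "load B(I,J)", "load A(I,J-1)"],
    [("store A(I,J)", "load A(I,J-1)")],
)
```

With no free access in the first example, wide-range synchronization leaves one pipeline idle, whereas in the second example the load of B can proceed in parallel.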

Another example in which wide-range synchronization is optimal is described next. Consider the following DO loop group.

      DO 10 I=1,100
      A(I)=C(I)+B(I-1)
   10 CONTINUE
      DO 20 I=1,100
      B(I)=C(I)*A(I+1)
   20 CONTINUE

In this example, if synchronization is performed over the local range, the DO loops are synchronized with each other by POST/WAIT instructions, and parallel processing efficiency suffers. If synchronization is instead based on the data dependencies between the DO loops, pipeline IDs are needed only for the dependency from the store to A(I) to the load of A(I+1) and for the dependency from the load of B(I-1) to the store to B(I), and parallel processing efficiency remains high.
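The dependencies between the two loops in this example can be found mechanically by comparing the index ranges each access touches. The following Python sketch assumes unit-stride accesses described by inclusive index ranges; the `Access` type and the function `inter_loop_dependencies` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Access(NamedTuple):
    kind: str    # "load" or "store"
    array: str   # array name
    lo: int      # first index touched (inclusive)
    hi: int      # last index touched (inclusive)


def inter_loop_dependencies(first_loop, second_loop):
    """Pairs (a, b) where a in the first loop must be ordered before b."""
    deps = []
    for a in first_loop:
        for b in second_loop:
            same_array = a.array == b.array
            overlap = a.lo <= b.hi and b.lo <= a.hi
            involves_store = "store" in (a.kind, b.kind)
            # Two loads never conflict, so at least one side must be a store.
            if same_array and overlap and involves_store:
                deps.append((a, b))
    return deps


# DO 10: A(I)=C(I)+B(I-1), I=1..100
loop_10 = [Access("store", "A", 1, 100),
           Access("load", "C", 1, 100),
           Access("load", "B", 0, 99)]
# DO 20: B(I)=C(I)*A(I+1), I=1..100
loop_20 = [Access("store", "B", 1, 100),
           Access("load", "C", 1, 100),
           Access("load", "A", 2, 101)]

deps = inter_loop_dependencies(loop_10, loop_20)
```

Only two pairs are reported, one on A and one on B; the two loads of C do not conflict, which matches the observation that pipeline IDs are needed for just two dependencies.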

Another example in which local-range synchronization is optimal is described next. Consider the following DO loop group.

      DO 10 I=1,100
      A(I)=C(I)
   10 CONTINUE
      DO 20 J=1,50
      B(J)=A(50)
   20 CONTINUE

In this example there is a dependency between a vector and a scalar across the DO loops, so serialization is performed in units of DO loops. The vector instruction sequence corresponding to this DO loop group is as follows.

      VL   VR1,C(1:100)
      VST  VR1,A(1:100)
      VPT
      VWT
      VL   VR2,A(50)
      VST  VR2,B(1:50)

Here VL is the vector load instruction, VST the vector store instruction, VPT the POST instruction, VWT the WAIT instruction, and VRX a vector register.
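Per-loop serialization of this kind amounts to placing a POST/WAIT pair at each loop boundary. The Python sketch below reproduces the sequence above from two per-loop instruction lists; the function name `serialize_per_loop` and the representation of instructions as strings are assumptions made for illustration.

```python
def serialize_per_loop(loops):
    """Concatenate per-loop vector code, separating loops with VPT/VWT.

    `loops` is a list of instruction lists, one per vectorized DO loop.
    """
    out = []
    for n, body in enumerate(loops):
        if n > 0:
            # POST after the previous loop, WAIT before the next one.
            out += ["VPT", "VWT"]
        out += body
    return out


loop_10 = ["VL VR1,C(1:100)", "VST VR1,A(1:100)"]
loop_20 = ["VL VR2,A(50)", "VST VR2,B(1:50)"]
sequence = serialize_per_loop([loop_10, loop_20])
```

The resulting `sequence` lists the six instructions in the same order as the listing above.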

[Effect of the invention]

As is clear from the above description, the present invention grasps data dependencies over a wide range and performs optimal serialization. This increases parallelism between vectorized DO loops (vector instruction sequences) and with other ranges (scalar instruction sequences), and thus improves execution efficiency.

[Brief explanation of the drawing]

FIG. 1 shows an overview of the compiler of the present invention, FIG. 2 shows the flow of the instruction serialization processing of the present invention, and FIG. 3 shows examples of DO loop groups with a constant flow of control.

1: source analysis unit, 2: address allocation unit, 3: vectorization unit, 4: serialization processing unit, 5: intermediate text optimization unit, 6: register allocation unit, 7: instruction generation unit.

Claims

[Claims]

1. An instruction serialization method for a compiler having a serialization processing unit that serializes the intermediate text produced by vectorization, characterized in that the serialization processing unit is configured to perform:
(1) a process of extracting a group of DO loops with a constant flow of control;
(2) a process of examining the data dependencies within the DO loop group by referring to the subscripts that appear in array references;
(3) a process of examining whether a dependency between a vector and a scalar exists between DO loops;
(4) a process of serializing within the nested DO loops by means of pipeline IDs, on the condition that process (3) finds no such dependency;
(5) a process of serializing in units of DO loops, on the condition that process (3) finds such a dependency;
(6) a process of examining whether the serialization of process (4) is efficient;
(7) a process of serializing between DO loops, on the condition that process (6) finds it inefficient;
(8) a process of examining whether the serialization of process (7) is efficient; and
(9) a process of serializing within each DO loop, on the condition that process (8) finds it inefficient.