JPH0440742B2


Info

Publication number: JPH0440742B2
Application number: JP61302371A
Authority: JP
Inventors: Toshihiro Hirabayashi; Shinya Miura
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-12-18
Filing date: 1986-12-18
Publication date: 1992-07-06
Also published as: JPS63155264A

Description

[Detailed Description of the Invention]

[Summary]

For a source program from which a compiler generates vector operation instructions, loops whose execution performance would deteriorate if vectorized are detected automatically and in advance, based on execution analysis information for that program. A means is provided that inserts into the program an optimization control line suppressing vectorization of each such loop. This removes the performance losses caused by vectorization and optimally tunes source programs, such as FORTRAN programs, for vector computers.

[Field of Industrial Application]

The present invention relates to a language tuning processing method for vector computers, which optimally tunes source programs, such as FORTRAN programs, for vector computers.

In scientific and engineering computation on data processing equipment, vector computers that process large amounts of data at high speed are used. A source program written in, for example, FORTRAN is automatically vectorized by a compiler, and it is desirable to optimize the execution performance of the resulting object program.

[Prior Art]

Conventionally, to help optimize FORTRAN and similar programs, execution analysis tools have been used that output, for each statement in a program, execution analysis information such as its execution count and execution cost. In addition, compilers capable of generating vector operation instructions perform various optimization processes to promote automatic vectorization and to optimize the vectorized object program.

However, no conventional means existed for detecting and removing the factors that make vectorization degrade performance. To optimize for them, a person had to analyze the execution performance and instruct the compiler, case by case, which optimizations to apply.

[Problem to Be Solved by the Invention]

When a source program written in, for example, FORTRAN is compiled into an object program for a vector computer, it is generally considered that performance improves as more DO loops are automatically vectorized. However, for operations with a short vector length, for example, ordinary scalar instructions may execute faster than vector instructions.

If the compiler could compare the vector cost with the scalar cost during automatic vectorization, it could make this optimization itself. In practice, however, the compiler cannot make the comparison because, for example, (a) when the iteration count of a DO loop is a variable, the vector length is unknown at compile time, and (b) the vector cost varies with the model of vector computer on which the program runs.
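A minimal sketch of why the decision needs run-time information, under a hypothetical linear cost model that is not part of the patent: a vector loop pays a fixed startup cost plus a small per-element cost, while a scalar loop pays a larger per-element cost. The class `MachineModel` and all of its values are invented for illustration. The break-even trip count depends on n, which is unknown when the DO loop bound is a variable (reason (a)), and on the machine parameters (reason (b)).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    """Hypothetical per-machine cost parameters (illustrative values only)."""
    vector_startup: float      # fixed overhead each time the vector loop runs
    vector_per_element: float  # cost per element in vector mode
    scalar_per_element: float  # cost per iteration in scalar mode


def scalar_cost(n: int, m: MachineModel) -> float:
    return n * m.scalar_per_element


def vector_cost(n: int, m: MachineModel) -> float:
    return m.vector_startup + n * m.vector_per_element


def vectorization_pays_off(n: int, m: MachineModel) -> bool:
    # Decidable only once the trip count n is known (reason (a)),
    # and the answer changes with the machine parameters (reason (b)).
    return vector_cost(n, m) < scalar_cost(n, m)


model_a = MachineModel(vector_startup=40.0, vector_per_element=0.5, scalar_per_element=5.0)
model_b = MachineModel(vector_startup=10.0, vector_per_element=0.5, scalar_per_element=5.0)

for n in (4, 8, 100):
    print(n, vectorization_pays_off(n, model_a), vectorization_pays_off(n, model_b))
```

With these made-up numbers, trip counts of 4 and 8 favour scalar code on `model_a` but vector code on `model_b`, while a trip count of 100 favours vector code on both.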

Consequently, vectorization has conventionally had the problem that it can actually degrade performance in parts of a program.

The present invention addresses these problems. Its object is to provide a means of eliminating this source of performance degradation by recognizing operations with short vector lengths, which cause the degradation, and automatically suppressing vectorization of the loops that contain them.

[Means for Solving the Problem]

FIG. 1 is a block diagram showing the principle of the present invention.

In FIG. 1, 10 is an optimal tuning processing unit that performs optimal tuning of a source program with respect to vectorization; 11 is an execution analysis information input unit; 12 is a vector cost analysis unit that analyzes vector costs; 13 is an optimization control line generation unit that generates optimization control lines instructing suppression of vectorization; 14 is a source program input unit that inputs a source program; 15 is a tuned source program output unit that outputs the tuned source program; 16 is the execution analysis information for the program to be tuned; 17 is the source program to be tuned; and 18 is the tuned source program.

The execution analysis information input unit 11 inputs the execution analysis information 16 output by an execution analysis tool. This information 16 lists, for each statement, its execution count and similar data, obtained either by actually running the program or by simulating its execution.

Based on the execution analysis information 16 input by unit 11, the vector cost analysis unit 12 calculates the vector cost and the scalar cost of each loop range to be vectorized.

When the vector cost exceeds the scalar cost, the optimization control line generation unit 13 generates an optimization control line instructing the compiler to suppress vectorization. It incorporates that line into the source program 17 input through the source program input unit 14, and outputs the tuned source program 18 through the tuned source program output unit 15.

[Operation]

According to the present invention, the vector cost analysis unit 12 performs a cost calculation based on the execution analysis information and thereby automatically detects where vectorization would degrade performance. The optimization control line generation unit 13 then incorporates optimization control lines into the source program 17, so that at compile time vectorization of the loops designated by those lines is suppressed.

The compiler therefore generates vector operation instructions only for the parts where vectorization actually improves performance, and ordinary scalar operation instructions for the parts where vectorization would degrade it, so execution performance improves.

[Embodiment]

FIG. 2 shows an example of a system to which the present invention is applied; FIG. 3 illustrates vector cost analysis according to an embodiment of the invention; FIG. 4 illustrates the generation of an optimization control line according to the embodiment; FIG. 5 illustrates the processing of the embodiment; and FIG. 6 illustrates a performance comparison showing the effect of the optimization.

The present invention is applied, for example, to a processing system that compiles FORTRAN programs, as shown in FIG. 2.

In FIG. 2, elements with the same reference numerals as in FIG. 1 correspond to those shown in FIG. 1. 20 is a processing device comprising a CPU, memory, and the like; 21 is a FORTRAN execution analysis unit, which is an execution analysis tool for FORTRAN programs; 22 is a FORTRAN compiler that translates programs written in FORTRAN into an object program consisting of the computer's machine-language instructions and the like; 23 is an optimization control line determination unit that determines whether an optimization control line is present; 24 is a FORTRAN source program; 25 is a tuned FORTRAN source program; and 26 is the object program.

The FORTRAN execution analysis unit 21 outputs execution analysis information 16, including loop iteration counts and the true rates of IF statements, for example by embedding instructions that count executions at the points related to control transfer in the FORTRAN source program 24 being analyzed. A conventional FORTRAN tuning tool can serve as this unit 21.
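The counting approach can be sketched as follows, assuming a simulated DO loop in which hypothetical counters stand in for the counting instructions that unit 21 would embed at control-transfer points. The counter names and the `profile` helper are illustrative, not the interface of any real tuning tool. The IF condition `i % 20 == 0` is chosen only so that the counts match the FIG. 3 example (10 loop entries, 100 iterations each, 5% true rate).

```python
from collections import Counter
from typing import Callable


def run_instrumented(loop_entries: int, trip_count: int,
                     condition: Callable[[int], bool]) -> Counter:
    """Simulate a DO loop whose control-transfer points carry counters."""
    counts: Counter = Counter()
    for _ in range(loop_entries):
        counts["loop_entry"] += 1            # counter at the DO statement
        for i in range(1, trip_count + 1):
            counts["loop_body"] += 1         # counter at the head of the loop body
            if condition(i):                 # the IF statement being profiled
                counts["if_true"] += 1       # counter on the taken branch
    return counts


def profile(counts: Counter) -> dict[str, float]:
    """Derive the average trip count and the IF true rate from the raw counts."""
    return {
        "avg_iterations_per_entry": counts["loop_body"] / counts["loop_entry"],
        "if_true_rate": counts["if_true"] / counts["loop_body"],
    }


counts = run_instrumented(loop_entries=10, trip_count=100,
                          condition=lambda i: i % 20 == 0)
print(counts["loop_body"], counts["if_true"], profile(counts))
```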

The optimal tuning processing unit 10 uses the vector cost analysis unit 12 and the optimization control line generation unit 13 to perform a cost calculation based on the execution analysis information 16, and outputs a tuned FORTRAN source program 25 in which optimization control lines have been incorporated into the FORTRAN source program 24 being tuned.

While translating the tuned FORTRAN source program 25 into machine language, the FORTRAN compiler 22 suppresses vectorization over the designated range whenever its optimization control line determination unit 23 detects an optimization control line that suppresses vectorization.
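The check made by the optimization control line determination unit 23 might look like the following minimal sketch. It assumes that the control line sits on its own line and applies to the next DO statement, and it recognises only simple `DO [label] var =` forms; the function names and the regular expression are hypothetical, since a real compiler parses full Fortran syntax.

```python
import re

CONTROL_LINE = "*VOCL LOOP,SCALAR"
# Hypothetical, simplified DO-statement pattern: optional statement label,
# "DO", optional loop-end label, then "var =". Real Fortran syntax is richer.
DO_STMT = re.compile(r"^\s*(?:\d+\s+)?DO\s+(?:\d+\s*,?\s*)?[A-Z]\w*\s*=", re.IGNORECASE)


def is_control_line(line: str) -> bool:
    """Match the control line ignoring case and blanks."""
    return line.strip().replace(" ", "").upper() == CONTROL_LINE.replace(" ", "")


def scalar_loop_lines(source_lines: list[str]) -> set[int]:
    """Return 0-based indices of DO statements whose vectorization is suppressed.

    A control line marks the next DO statement that follows it.
    """
    suppressed: set[int] = set()
    pending = False
    for idx, line in enumerate(source_lines):
        if is_control_line(line):
            pending = True
        elif pending and DO_STMT.match(line):
            suppressed.add(idx)
            pending = False
    return suppressed


src = ["*VOCL LOOP,SCALAR",
       "      DO 10 I = 1, N",
       "         A(I) = B(I)",
       "   10 CONTINUE",
       "      DO 20 J = 1, M",
       "         C(J) = 0.0",
       "   20 CONTINUE"]
print(scalar_loop_lines(src))
```

Only the first loop is kept scalar; the second DO loop remains a vectorization candidate.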

Next, an example of vector cost analysis by the vector cost analysis unit 12 is described with reference to FIG. 3.

Assume that the scalar cost of each statement i of the FORTRAN source program to be tuned is S_i, and that the execution counts are as shown in FIG. 3. For example, the loop iteration count is 100 and the DO loop itself is executed 10 times, so statement 1 inside the loop is executed 1000 times, and its average number of executions per loop entry, ex_1, is 100. If the true rate of the IF statement in this example, that is, the probability that the IF condition holds, is 5%, the statement controlled by the IF is executed 50 times, and its average number of executions per loop entry, ex_i, is 5.
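The per-entry averages in this example follow from dividing each statement's total execution count by the number of times the DO loop is entered. A minimal sketch using the FIG. 3 numbers (the helper name `avg_per_entry` is illustrative):

```python
def avg_per_entry(total_executions: int, loop_entries: int) -> float:
    """Average number of executions of a statement per entry into its DO loop."""
    return total_executions / loop_entries


LOOP_ENTRIES = 10       # the DO loop is executed 10 times
TRIP_COUNT = 100        # 100 iterations per execution of the loop
IF_TRUE_RATE = 0.05     # the IF condition holds 5% of the time

stmt1_total = TRIP_COUNT * LOOP_ENTRIES                # statement 1: 1000 executions
if_body_total = round(stmt1_total * IF_TRUE_RATE)      # IF-controlled statement: 50

ex_1 = avg_per_entry(stmt1_total, LOOP_ENTRIES)        # 100.0
ex_if = avg_per_entry(if_body_total, LOOP_ENTRIES)     # 5.0
print(ex_1, ex_if)
```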

When this DO loop is vectorized, the vector length corresponds to the loop iteration count, and the part following the IF statement, for example, is processed by a so-called masked operation.

Therefore, when comparing the vector cost with the scalar cost, the scalar cost must be corrected so that it correctly reflects the vector length processed by the masked operation, which is the full loop length rather than only the iterations on which the IF condition holds.

Taking the above into account, the vector cost V-COST of the DO loop is obtained by the following formula:

$$\text{V-COST} = \sum_i S_i \cdot \frac{\mathrm{ex}_L}{\mathrm{ex}_i} \cdot \alpha_L, \qquad \alpha_L = f(\mathrm{ex}_L,\ \mathrm{cpu})$$

where S_i is the scalar cost of each statement i, ex_i is the average number of executions per loop entry of each statement, ex_L is the average number of executions per loop entry of the statement at the head of the loop, cpu is the performance of the vector computer, and α_L is the vector-to-scalar performance ratio determined by ex_L and cpu.
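A minimal sketch of the V-COST formula and of the scalar-cost sum it is compared against. It assumes, as an interpretation rather than something the text states, that S_i is the scalar cost of statement i accumulated over its ex_i executions per loop entry, so that S_i · ex_L / ex_i is the cost of running that statement on all ex_L elements, as a masked vector operation does. α_L is passed in rather than looked up, and the cost values in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatementProfile:
    scalar_cost: float     # S_i: scalar cost of statement i per loop entry (assumed meaning)
    avg_per_entry: float   # ex_i: average executions of statement i per loop entry


def s_cost(stmts: list[StatementProfile]) -> float:
    """Sum of the scalar costs S_i in the loop."""
    return sum(s.scalar_cost for s in stmts)


def v_cost(stmts: list[StatementProfile], ex_L: float, alpha_L: float) -> float:
    """V-COST = sum_i S_i * (ex_L / ex_i) * alpha_L."""
    return sum(s.scalar_cost * (ex_L / s.avg_per_entry) * alpha_L for s in stmts)


# FIG. 3-style loop with made-up costs: statement 1 runs on every iteration,
# and one statement runs only when the IF condition holds.
loop = [StatementProfile(scalar_cost=100.0, avg_per_entry=100.0),
        StatementProfile(scalar_cost=10.0, avg_per_entry=5.0)]
print(s_cost(loop), v_cost(loop, ex_L=100.0, alpha_L=0.1))
```

Here the masked statement contributes 20 of the 30 units of V-COST although it accounts for only 10 of the 110 units of scalar cost, which is the effect the correction is meant to capture.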

α_L is determined in advance from measurements of standard programs on the vector computer and is held, for example, in a table.
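One way to hold α_L as a table is sketched below. It assumes the measured values are stored per machine model as (vector length, ratio) pairs and that linear interpolation between measured lengths is acceptable; the text only says the values are tabulated, so the interpolation, the model name `"model-A"`, and all numbers are hypothetical.

```python
from bisect import bisect_left

# Hypothetical measured values: for each machine model, (vector length, alpha)
# pairs sorted by vector length. Values below 1 make V-COST smaller than the
# corrected scalar cost.
ALPHA_TABLE: dict[str, list[tuple[float, float]]] = {
    "model-A": [(1.0, 4.0), (8.0, 1.0), (32.0, 0.3), (128.0, 0.1)],
}


def lookup_alpha(ex_L: float, cpu: str) -> float:
    """Return alpha_L = f(ex_L, cpu), interpolating linearly between table entries."""
    points = ALPHA_TABLE[cpu]
    lengths = [n for n, _ in points]
    if ex_L <= lengths[0]:
        return points[0][1]          # clamp below the shortest measured length
    if ex_L >= lengths[-1]:
        return points[-1][1]         # clamp above the longest measured length
    k = bisect_left(lengths, ex_L)   # first entry whose length is >= ex_L
    (n0, a0), (n1, a1) = points[k - 1], points[k]
    t = (ex_L - n0) / (n1 - n0)
    return a0 + t * (a1 - a0)


print(lookup_alpha(4.0, "model-A"), lookup_alpha(20.0, "model-A"), lookup_alpha(500.0, "model-A"))
```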

Performance is compared by comparing the vector cost V-COST obtained in this way with the sum of the scalar costs S_i in the loop.

When the scalar cost is the smaller of the two, an optimization control line is generated as shown in FIG. 4. In FIG. 4, 30 is the optimization control line and 31 is the range over which vectorization is suppressed.

That is, an optimization control line 30 such as "*VOCL LOOP,SCALAR" is inserted before the DO statement of the FORTRAN source program 24, producing the tuned FORTRAN source program 25. When the compiler detects this optimization control line 30, it treats the DO loop that follows as the vectorization suppression range 31 and does not emit vector operation instructions for it.
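Inserting the control line can be sketched as a pass over the source lines that places "*VOCL LOOP,SCALAR" before each DO statement selected for suppression. The function name, the line-index interface, and the simplified DO-statement check are assumptions for illustration; only the control-line text comes from the example above.

```python
import re

CONTROL_LINE = "*VOCL LOOP,SCALAR"
# Hypothetical, simplified DO-statement pattern used only as a sanity check.
DO_STMT = re.compile(r"^\s*(?:\d+\s+)?DO\s+(?:\d+\s*,?\s*)?[A-Z]\w*\s*=", re.IGNORECASE)


def insert_control_lines(source_lines: list[str], suppress: set[int]) -> list[str]:
    """Return a copy of source_lines with CONTROL_LINE placed before each
    DO statement whose 0-based index is in `suppress`."""
    tuned: list[str] = []
    for idx, line in enumerate(source_lines):
        if idx in suppress:
            if not DO_STMT.match(line):
                raise ValueError(f"line {idx} is not a DO statement: {line!r}")
            tuned.append(CONTROL_LINE)
        tuned.append(line)
    return tuned


source = ["      DO 10 I = 1, N",
          "         A(I) = B(I) + C(I)",
          "   10 CONTINUE"]
for line in insert_control_lines(source, {0}):
    print(line)
```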

FIG. 5 shows an example of the processing performed by the main parts of the present invention. The processing is described below following steps (1) to (5) of FIG. 5.

(1) Detect vectorizable DO loops in the source program to be tuned.

(2) When a DO loop is detected, calculate the sum S-COST of the scalar costs in the loop based on the execution analysis information.

(3) Next, calculate the vector cost V-COST of the DO loop, as described with reference to FIG. 3, from the execution analysis information and the predetermined vector-to-scalar performance ratio α_L.

(4) Compare S-COST with V-COST. If V-COST is greater than S-COST, execute the next step (5). If the two are equal, either choice is acceptable.

(5) Generate an optimization control line suppressing vectorization, as shown in FIG. 4, and incorporate it into the source program.
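The five steps of FIG. 5 can be strung together as in the following sketch. It assumes step 1 has already produced, for each vectorizable DO loop, the line index of its DO statement and its per-statement profile. `Loop`, `Statement`, `tune`, and the `alpha` callback are hypothetical names; V-COST follows the formula given for FIG. 3, under the assumed reading of S_i as scalar cost accumulated per loop entry. Ties are left vectorized, one of the two choices the text allows.

```python
from dataclasses import dataclass
from typing import Callable

CONTROL_LINE = "*VOCL LOOP,SCALAR"


@dataclass(frozen=True)
class Statement:
    scalar_cost: float    # S_i (assumed: accumulated per loop entry)
    avg_per_entry: float  # ex_i


@dataclass(frozen=True)
class Loop:
    do_line: int                       # 0-based index of the DO statement (from step 1)
    statements: tuple[Statement, ...]
    ex_L: float                        # executions per entry of the loop-head statement


def tune(source_lines: list[str], loops: list[Loop], cpu: str,
         alpha: Callable[[float, str], float]) -> list[str]:
    suppress: set[int] = set()
    for loop in loops:                                          # step 1 output
        s_cost = sum(s.scalar_cost for s in loop.statements)    # step 2
        a = alpha(loop.ex_L, cpu)
        v_cost = sum(s.scalar_cost * loop.ex_L / s.avg_per_entry * a
                     for s in loop.statements)                  # step 3
        if v_cost > s_cost:                                     # step 4 (ties stay vectorized)
            suppress.add(loop.do_line)
    tuned: list[str] = []
    for idx, line in enumerate(source_lines):                   # step 5
        if idx in suppress:
            tuned.append(CONTROL_LINE)
        tuned.append(line)
    return tuned


def toy_alpha(ex_L: float, cpu: str) -> float:
    """Hypothetical ratio: short vectors lose, long vectors win."""
    return 2.0 if ex_L < 10 else 0.1


source = ["      DO 10 I = 1, 4", "         A(I) = 0.0", "   10 CONTINUE",
          "      DO 20 J = 1, N", "         B(J) = 0.0", "   20 CONTINUE"]
loops = [Loop(0, (Statement(4.0, 4.0),), ex_L=4.0),
         Loop(3, (Statement(100.0, 100.0),), ex_L=100.0)]
print("\n".join(tune(source, loops, "model-A", toy_alpha)))
```

With these toy numbers the short loop (V-COST 8 against S-COST 4) receives the control line, and the long loop (V-COST 10 against S-COST 100) is left to be vectorized.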

FIG. 6 compares performance to illustrate the effect of the present invention. In FIG. 6, L1 denotes a scalar loop, L2 a loop whose performance is degraded by vectorization, and L3 a loop whose performance is improved by vectorization.

Now assume that the vectorization rate of the program is 95%, that vectorization speeds up the improved part by a factor of 10 and slows down the degraded part by a factor of 5, and that the degraded part (L2) accounts for 10% of the vectorizable part. Taking as 100 the original execution time shown in FIG. 6(a), that is, the execution time when everything is processed with scalar arithmetic instructions, the execution time T1 with vectorization shown in FIG. 6(b) is as follows.

$$T_1 = 95 \times 0.9 / 10 + 95 \times 0.1 \times 5 + 5 = 61.05$$

This execution time T1 can be regarded as the performance of vectorization under the conventional method.
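The FIG. 6 arithmetic can be reproduced with a short sketch under the stated assumptions (95% vectorizable, L2 is 10% of that part, 10× speedup for L3, 5× slowdown for L2, original time 100). It computes both the conventional time T1, in which L2 is vectorized, and the tuned time T2, in which L2 stays scalar; the function and parameter names are illustrative.

```python
def fig6_times(original: float = 100.0, vector_rate: float = 0.95,
               degraded_share: float = 0.10, speedup: float = 10.0,
               slowdown: float = 5.0) -> tuple[float, float]:
    """Return (T1, T2) for the FIG. 6 timing model."""
    vectorizable = original * vector_rate              # 95
    scalar_only = original - vectorizable              # 5: never vectorized (L1)
    improved = vectorizable * (1 - degraded_share)     # 85.5: helped by vectorization (L3)
    degraded = vectorizable * degraded_share           # 9.5: hurt by vectorization (L2)
    t1 = improved / speedup + degraded * slowdown + scalar_only  # everything vectorized
    t2 = improved / speedup + degraded + scalar_only             # L2 kept scalar
    return t1, t2


t1, t2 = fig6_times()
print(f"T1={t1:.2f} T2={t2:.2f} T1/T2={t1 / t2:.3f} 100/T2={100 / t2:.3f}")
```

The two ratios come out at about 2.649 and 4.338; the text reports them truncated to two decimals as 2.64 and 4.33.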

On the other hand, with the optimal tuning of the present invention, the execution time T2 is as shown in FIG. 6(c):

$$T_2 = 95 \times 0.9 / 10 + 95 \times 0.1 + 5 = 23.05$$

The relative performance ratio and the ratio to the original are, respectively:

Relative performance ratio = T1 / T2 = 61.05 / 23.05 = 2.64 (times)
Ratio to original = 100 / 23.05 = 4.33 (times)

[Effects of the Invention]

As explained above, according to the present invention, the factors by which vectorization degrades performance are removed, making optimal tuning for vector computers possible.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the principle of the present invention; FIG. 2 shows an example of a system to which the invention is applied; FIG. 3 illustrates vector cost analysis according to an embodiment of the invention; FIG. 4 illustrates the generation of an optimization control line according to the embodiment; FIG. 5 illustrates the processing of the embodiment; and FIG. 6 illustrates a performance comparison showing the effect of the optimization.

In the figures, 10 is the optimal tuning processing unit, 11 is the execution analysis information input unit, 12 is the vector cost analysis unit, 13 is the optimization control line generation unit, 14 is the source program input unit, and 15 is the tuned source program output unit.

Claims

[Scope of Claims]

1. A language tuning processing method for a vector computer, for tuning the source program of a program to be run on a processing device having a vector computer, characterized by comprising: vector cost analysis means 12 for analyzing, based on execution analysis information of the program to be tuned, the vector cost relating to execution performance under vectorization for a loop range that is a candidate for vectorization; and optimization control line generation means 13 for incorporating into the source program to be tuned, when the vector cost is larger than the scalar cost without vectorization, an optimization control line instructing suppression of vectorization for that loop range.