JP2000259423A

JP2000259423A - Method for compiling program and recording medium with method recorded thereon

Info

Publication number: JP2000259423A
Application number: JP11059979A
Authority: JP
Inventors: Shizuka Koyama; 静香小山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1999-03-08
Filing date: 1999-03-08
Publication date: 2000-09-22
Anticipated expiration: 2019-03-08
Also published as: JP3627905B2

Abstract

PROBLEM TO BE SOLVED: To speed up programs that make heavy use of either integer operations or floating-point operations, without adding redundancy to the hardware. SOLUTION: The program compiling method determines whether the calculation operations in a certain execution section of the input program are biased toward instructions belonging to a specific functional unit (steps 401 and 402). If they are, it translates part of those instructions into an alternative instruction sequence belonging to another functional unit, or into a library call composed of alternative instruction sequences that include instructions belonging to the other functional unit (step 404).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a technique for speeding up program execution on a computer, such as a superscalar or VLIW machine, that has a plurality of functional units, each of which can execute instructions concurrently and in parallel. In particular, it relates to a program compiling method that makes the workloads of the functional units roughly equal without adding redundancy to the hardware, and to a recording medium on which that program compiling method is recorded.

[0002]

[Prior Art] One way to speed up program execution is to raise the processing capability of the processor. There are two approaches to doing so: improving the performance (processing speed, throughput) of the individual hardware components that make up the processor, and improving performance not through the hardware itself but by executing multiple instructions at the same time.

[0003] The latter approach uses a processor that contains a plurality of functional units, each of which can execute an instruction at the same time (see Hennessy, J. L. and Patterson, D. A., "Computer Architecture: A Quantitative Approach, second edition," pp. 278-289, Morgan Kaufmann Publishers, San Francisco, California, 1996). In such a processor, the instructions that each functional unit can execute are usually fixed.

[0004] For example, suppose a processor contains two functional units, one of which (the integer unit) executes integer operations and the other of which (the floating-point unit) executes floating-point operations. The processor executes integer instructions on the integer unit and floating-point instructions on the floating-point unit, and it can execute one integer instruction and one floating-point instruction at the same time, each on its own unit. To actually execute multiple instructions at the same time, however, the workloads that the functional units are asked to carry out within a given unit of time must be roughly equal. In particular, when the calculation operations in a loop are biased toward instructions belonging to a specific functional unit, multiple instructions cannot be executed at the same time.

[0005] One conventional technique for speeding up program execution by distributing the workload across functional units is to add, to a functional unit that is normally little used, the ability to execute frequently occurring instructions. For example, the method discussed in Proceedings of the ACM SIGPLAN PLDI (1998), pp. 118-129 (Sastry, S. S., Palacharla, S., and Smith, J. E., "Exploiting Idle Floating Point Resources For Integer Execution," Proceedings of the ACM SIGPLAN PLDI 1998, pp. 118-129) targets a processor with two kinds of functional units, an integer unit that executes integer instructions and a floating-point unit that executes floating-point instructions. It adds integer operation capability to the floating-point unit, giving the hardware redundancy, so that programs that make heavy use of integer operations run faster. The compiler for such a machine separates the integer operations in the source program into those to be executed on the integer unit and those to be executed by the integer capability added to the floating-point unit.

[0006]

[Problem to Be Solved by the Invention] The conventional method described above can be realized only on a computer with redundancy, in which integer operation capability has been added to the floating-point unit. Moreover, because many commonly used programs (compilers, editors, operating systems) make heavy use of integer operations, the conventional method of having the floating-point unit take over work from the integer unit is effective for them, but it is not effective for programs that make heavy use of floating-point operations, such as scientific computation. In general, on a computer that has a plurality of functional units, each able to execute an instruction at the same time, the functional units actually execute instructions simultaneously only if their workloads over a given unit of time are roughly equal.

[0007] An object of the present invention is to provide a program compiling method that can speed up a program whether it makes heavy use of integer operations or of floating-point operations, without adding redundancy to the hardware, and a recording medium on which that compiling method is recorded.

[0008]

[Means for Solving the Problem] To achieve the above object, the compiling method of the present invention first determines, at program translation time, whether the calculation operations included in a certain execution section of the input program (source program) are biased toward instructions belonging to a specific functional unit. If they are judged to be biased, then, in order to make the workloads of the functional units roughly equal, part of those instructions is translated into an alternative instruction sequence belonging to another functional unit, or into a library call composed of an alternative instruction sequence that includes instructions belonging to another functional unit.

[0009] The recording medium of the present invention is a computer-readable recording medium on which the processing of each step of the above program compiling method is recorded as program code.

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. Needless to say, the present invention is not limited to these embodiments. FIG. 1 shows an example of a computer system to which the compiling method of the present invention is applied. The computer of FIG. 1 has a processor 101, a main memory 102, a disk device 103, a reading device 104, and a bus 105, and the reading device 104 can read programs and other data stored on a storage medium 106. A compiler that uses the present invention is stored on the disk device 103 or the storage medium 106; it is loaded into the main memory 102 via the bus 105 and is then decoded and executed by the processor 101.

[0011] FIG. 2 shows an example of the functional configuration of the processor 101. In the example of FIG. 2, the processor 101 has a plurality of functional units: a load/store unit 201, an integer unit 202, a floating-point add/subtract/multiply unit 203, and a floating-point divide unit 204. The functional units are assumed to be able to execute instructions concurrently and in parallel.

[0012] FIG. 3 outlines the flow of compiler processing in this embodiment. In the figure, 301 is the input program (source program), 302 is the compiler, 303 is the optimizing unit, 304 is the target program (object program), and 305 is the process to which the compiling method of the present invention is applied. The compiler 302 reads the input program 301, optimizes it in the optimizing unit 303, and translates it into the target program 304. The compiling method of the present invention is applied in process 305 within the optimizing unit 303. Because process 305 is especially effective when applied to loops, the following description uses a loop as the example.

[0013] FIG. 4 is a detailed flowchart of process 305. In this embodiment, process 305 estimates, for the calculation operations included in a certain execution section of the program, the workload of each functional unit of the processor 101. It then calculates the number n of instructions in those calculation operations that, if translated into alternative instruction sequences, would make the workloads of the functional units roughly equal. If n is 0, all instructions are translated into their original instructions; if n is 1 or more, n of them are translated into alternative instruction sequences.

[0014] Next, the flow of process 305 is described in detail with reference to FIG. 4. First, the calculation operations of each loop are recognized as a processing unit to which this optimization applies, and the workload of each functional unit is estimated (step 401). Next, in order to make the workloads of the functional units roughly equal, the number n of instructions to be translated into alternative instruction sequences is calculated; these are instructions belonging to the most heavily loaded functional unit among the calculation operations recognized in step 401 (step 402). If the calculated number n is 0, all instructions are translated into their original instructions (step 403). If n is 1 or more, n instructions are translated into alternative instruction sequences or library calls (step 404).
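The flow of steps 401 to 404 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the `cost` table format (operation name mapped to a functional unit and a cycle count), and the exhaustive search over candidate values of n are all hypothetical choices made for the example.

```python
from collections import Counter

def estimate_workload(instructions, cost):
    """Step 401: cycles each functional unit is busy in one pass of the section."""
    load = Counter()
    for op in instructions:
        unit, cycles = cost[op]
        load[unit] += cycles
    return load

def count_to_translate(load, instructions, cost, heavy_op, alt_unit, alt_cycles):
    """Step 402: choose n, the number of heavy_op instructions to replace.

    Assumption: try every candidate n and keep the one that minimizes the
    busiest unit's cycle count.
    """
    heavy_unit, heavy_cycles = cost[heavy_op]
    best_n, best_span = 0, max(load.values(), default=0)
    for n in range(1, instructions.count(heavy_op) + 1):
        trial = Counter(load)
        trial[heavy_unit] -= n * heavy_cycles   # work removed from the heavy unit
        trial[alt_unit] += n * alt_cycles       # work added to the other unit
        span = max(trial.values())
        if span < best_span:
            best_n, best_span = n, span
    return best_n

def plan_section(instructions, cost, heavy_op, alt_unit, alt_cycles):
    """Steps 401-404 for one execution section."""
    load = estimate_workload(instructions, cost)
    n = count_to_translate(load, instructions, cost, heavy_op, alt_unit, alt_cycles)
    total = instructions.count(heavy_op)
    # Step 403 (n == 0): every instruction keeps its original form.
    # Step 404 (n >= 1): n instructions become alternative sequences or library calls.
    return {"original": total - n, "alternative": n}
```

For instance, with a hypothetical cost table `{"fdiv": ("div", 20)}` and a section of four `fdiv` operations, `plan_section(["fdiv"] * 4, cost, "fdiv", "addmul", 10)` selects three of the four for replacement.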

[0015] Many kinds of calculation operations are candidates for the present invention, but in practice it is most effective to replace long-running instructions in a loop, such as division or square root, with a combination of short-running instructions such as multiplication and addition. As the embodiment here, we take the case of replacing the reciprocal operation f(x) = 1/x, a kind of division, with multiply-add operations. As premises, a division instruction is assumed to be executable once every 20 cycles and a multiply-add instruction once every cycle, and floating-point numbers are assumed to be represented in a form divided into a sign part, an exponent part, and a mantissa part, as in the IEEE 754 format.

[0016] The algorithm used here is the following approximation by Newton's method:

X_{n+1} ≈ X_n + (1 - x × X_n) × X_n

Before the calculation begins, the exponent part of the argument x is reduced to a certain range. The zeroth-order approximation X_0 is obtained by looking up a table prepared in advance, using the reduced value of x. Because Newton's method converges quadratically, a suitable zeroth-order approximation with a few bits of precision is enough for the iteration to converge fully for an 8-byte floating-point number at X_3 or X_4. Alternatively, a verification step can be applied to the approximate value so that the result matches the result computed by a division instruction.
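The iteration described above can be sketched in Python as follows. The table size (16 entries), the reduction of the argument to the range [0.5, 1) with `math.frexp`, and the name `reciprocal` are assumptions made for illustration; the source does not specify the table or the reduction range, and the optional verification step is not shown.

```python
import math

TABLE_BITS = 4  # assumed table size: 2**4 = 16 entries
# Entry i approximates 1/m for m in the i-th slice of [0.5, 1),
# using the reciprocal of the slice midpoint (about 5 bits of precision).
RECIP_TABLE = [1.0 / (0.5 + (i + 0.5) / (2 << TABLE_BITS))
               for i in range(1 << TABLE_BITS)]

def reciprocal(x, steps=4):
    """Approximate 1/x with X_{n+1} = X_n + (1 - x*X_n) * X_n."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("sketch handles finite, nonzero inputs only")
    # Exponent reduction: abs(x) = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(x))
    # Zeroth-order approximation X_0 from the table.
    X = RECIP_TABLE[int((m - 0.5) * (2 << TABLE_BITS))]
    for _ in range(steps):
        r = 1.0 - m * X   # one multiply-add
        X = X + r * X     # one multiply-add
    # Undo the exponent reduction and restore the sign.
    return math.copysign(math.ldexp(X, -e), x)
```

Each pass through the loop roughly doubles the number of correct bits, which is why three or four passes are enough for a double-precision value starting from a table entry that is accurate to a few bits.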

[0017] One approximation step requires two multiplication instructions and two add/subtract instructions; on a computer with a multiply-add instruction, it can be done in two instructions. If the iteration converges at X_4, the whole calculation could therefore be done in 8 instructions, but here we assume that it takes 10. The exponent manipulation needed for the reduction is done with integer operations, but its workload is relatively light and is ignored in this example. Consequently, computing a reciprocal with a division instruction occupies the divide unit (204) for 20 cycles, while computing it with the alternative instruction sequence occupies the add/subtract/multiply unit (203) for 10 cycles.

[0018] Under the above conditions, let p be the number of division instructions and q the number of add/subtract/multiply instructions in a certain execution section, and let n be the number of division instructions to be translated into alternative instruction sequences of multiply-add instructions. When q is less than 20p, the balance condition is (p - n) × 20 = q + 10n, that is, n = (20p - q) / 30. When q is 20p or more, n = 0. FIG. 5 shows an example program for calculating the number n of division instructions whose reciprocals are to be translated into alternative instruction sequences.

[0019] When division instructions are instead replaced with library calls and q is less than 20p, the number of cycles during which only the divide unit would be operating, (20p - q), is the portion turned into library calls. Therefore n = (20p - q) / 20. When q is 20p or more, n = 0. FIG. 6 shows an example program for calculating the number n of division instructions whose reciprocals are to be translated into library calls of the alternative instruction sequence.
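The two formulas for n can be written out as a short sketch. The function names and keyword parameters are hypothetical, and the functions return the real-valued balance point rather than an integer, because the text does not state how a fractional result is rounded; that choice is left to the caller here.

```python
def alt_sequence_count(p, q, div_cycles=20, alt_cycles=10):
    """n for inline alternative sequences: solves (p - n)*20 = q + 10*n."""
    if q >= div_cycles * p:
        return 0.0  # the add/subtract/multiply unit is already at least as busy
    return (div_cycles * p - q) / (div_cycles + alt_cycles)

def library_call_count(p, q, div_cycles=20):
    """n for library calls: n = (20p - q) / 20."""
    if q >= div_cycles * p:
        return 0.0
    return (div_cycles * p - q) / div_cycles
```

With p = 3 and q = 0, `alt_sequence_count` gives 2.0: one remaining division occupies the divide unit for 20 cycles while two alternative sequences occupy the add/subtract/multiply unit for 20 cycles, so both sides of the balance condition agree.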

[0020] So that the workloads of the functional units are equal, this library consists of code that performs reciprocal operations in a ratio of one division instruction to two alternative instruction sequences of multiply-add instructions. Calling the library requires passing an array of arguments, an array of results, and the number of operations, so a temporary array may have to be generated at the call. As a concrete example, when the source program of FIG. 7, B(n) = 1/(5*A(n)), is converted to the library call described above, the result is as shown in FIG. 8. In FIG. 8, LIB(TMP, B, N) is the library, and TMP is a temporary array generated by the compiler (n = 1 to N). In either case, it is often effective to combine the method with suitable loop unrolling in order to achieve the best workload balance.
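The rewrite with a temporary array can be sketched as follows. FIG. 7 and FIG. 8 are not reproduced in this text, so the Python below is only an illustration of the transformation the paragraph describes; `lib_reciprocal` is a hypothetical stand-in for LIB(TMP, B, N), and its body uses plain division where a real library would mix division instructions and multiply-add sequences.

```python
def lib_reciprocal(src, dst, count):
    """Hypothetical stand-in for LIB(TMP, B, N): dst[i] = 1/src[i]."""
    for i in range(count):
        dst[i] = 1.0 / src[i]

def original_loop(a):
    """Source form: B(n) = 1 / (5 * A(n)) for n = 1..N."""
    return [1.0 / (5.0 * x) for x in a]

def rewritten_loop(a):
    """Form after the rewrite: fill a temporary array, then call the library."""
    n = len(a)
    tmp = [5.0 * x for x in a]   # TMP(n) = 5 * A(n), generated by the compiler
    b = [0.0] * n
    lib_reciprocal(tmp, b, n)    # LIB(TMP, B, N)
    return b
```

The temporary array is needed because the library takes whole arrays: the argument of the reciprocal, 5*A(n), is not itself an array in the source program, so it has to be materialized first.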

[0021] Next, the process of translating the FORTRAN program fragment of FIG. 9 is described, following the flowchart of process 305 in FIG. 4. The compiler recognizes the do loop of the FORTRAN program in FIG. 9 as an independent optimization unit and estimates the workload of each functional unit (step 401). When alternative instruction sequences are used to make the loads of the functional units roughly equal, the number n of reciprocal operations to be translated into alternative instruction sequences is calculated by the method of FIG. 5. When the library is used, the number n of reciprocal operations to be computed by the library is calculated by the method of FIG. 6. In either case, n is 1 or more (step 402).

[0022] In FIG. 9, one loop iteration, that is, the computation of one reciprocal, takes 20 cycles when everything is translated into the original division instruction and 10 cycles when everything is translated into the alternative instruction sequence. If the loop of FIG. 9 is unrolled four times and three of the four reciprocals are translated into alternative instruction sequences, the time falls to 7.5 cycles per reciprocal. When the loop is translated into a library call, it is about 6.7 cycles per reciprocal (step 404). Under the assumptions of this embodiment, either approach achieves a speedup of more than a factor of two over using division instructions alone. In general, library code has the advantage that it can be tuned to be ideal in terms of performance.
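The cycle counts quoted above can be checked with a short calculation. The sketch assumes that the divide unit and the add/subtract/multiply unit overlap completely and that nothing else in the loop costs time; the function name and parameters are hypothetical.

```python
def cycles_per_reciprocal(total, by_alt, div_cycles=20, alt_cycles=10):
    """Cycles per reciprocal when `by_alt` of `total` use the alternative sequence."""
    div_busy = (total - by_alt) * div_cycles  # time on the divide unit (204)
    alt_busy = by_alt * alt_cycles            # time on the add/sub/mul unit (203)
    # The two units run in parallel, so the busier one sets the elapsed time.
    return max(div_busy, alt_busy) / total
```

Under these assumptions, `cycles_per_reciprocal(4, 3)` is 7.5 (one division in 20 cycles alongside three alternative sequences in 30 cycles), and `cycles_per_reciprocal(3, 2)`, the library's ratio of one division to two alternative sequences, is 20/3, or about 6.7.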

[0023] The program compiling method of the present invention has been described in detail above. If each process of the compiling method is written as program code, recorded on a recording medium such as a CD-ROM or a flexible disk (FD), and distributed, a user who obtains and uses the recording medium can obtain a target program that is best suited to the processing performance of the user's own computer system (including hardware and OS).

[0024] Furthermore, a software vendor can use this program compiling method to generate target programs suited to various computer systems (including hardware and OS), record them on recording media such as CD-ROMs or flexible disks (FD), and distribute them as software dedicated to each kind of computer system. In that case, a user who obtains and uses the recording medium that matches the user's own computer system can run that system with a target program best suited to its processing performance (including hardware and OS).

[0025]

[Effects of the Invention] According to the program compiling method of the present invention, the workloads of the functional units can be made roughly equal without adding redundancy to the hardware. A plurality of functional units can then execute instructions at the same time, which raises the processing capability of the processor and speeds up program execution. In addition, by obtaining and using the recording medium of the present invention, a user can obtain, and run, a target program best suited to the performance of the user's own computer system.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example of a computer system to which the compiling method of the present invention is applied.

FIG. 2 is a diagram showing an example of the functional configuration of the processor 101.

FIG. 3 is a diagram outlining the flow of compiler processing in this embodiment.

FIG. 4 is a detailed flowchart of process 305.

FIG. 5 is an example program for calculating the number n of division instructions whose reciprocals are to be translated into alternative instruction sequences.

FIG. 6 is an example program for calculating the number n of division instructions whose reciprocals are to be translated into library calls of the alternative instruction sequence.

FIG. 7 is an example of a source program that requires a temporary array when the library is called.

FIG. 8 is an example in which the source program of FIG. 7 is rewritten using a temporary array.

FIG. 9 is an example of a source program (FORTRAN) to which the present invention is applied.

[Explanation of symbols]

101: processor, 102: main memory, 103: disk device, 104: reading device, 105: bus, 106: storage medium, 201: load/store unit, 202: integer unit, 203: floating-point add/subtract/multiply unit, 204: floating-point divide unit, 301: input program, 302: compiler, 303: optimizing unit, 304: target program, 305: process (compiling method of the present invention).

Claims

[Claims]

1. A program compiling method for a computer having a plurality of functional units, each of which can execute instructions concurrently and in parallel, the method comprising: a first step of determining whether the calculation operations included in a certain execution section of an input program are biased toward instructions belonging to a specific functional unit; and a step of, when the first step determines that they are biased, translating a part of those instructions into an alternative instruction sequence belonging to another functional unit, or into a library call composed of an alternative instruction sequence including instructions belonging to another functional unit.

2. The program compiling method according to claim 1, wherein the determination in the first step is made by estimating the workload of each functional unit for the calculation operations included in the execution section, and wherein the part of the instructions is the number of instructions, among the instructions in the calculation operations, for which the workloads of the functional units become roughly equal when those instructions are translated into an alternative instruction sequence belonging to another functional unit, or the number of instructions for which the workloads of the functional units become roughly equal when those instructions are translated into a library call composed of an alternative instruction sequence including instructions belonging to another functional unit.

3. A computer-readable recording medium on which the processing of each step of the program compiling method according to claim 1 is recorded as program code.