JPH11120003A - Parallel executing method for loop including loop jump-out and parallel program generating method

Info

Publication number: JPH11120003A
Application number: JP28491397A
Authority: JP
Inventors: Makoto Sato; 真琴佐藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-10-17
Filing date: 1997-10-17
Publication date: 1999-04-30

Abstract

PROBLEM TO BE SOLVED: To speed up execution of a loop that includes a loop jump-out by overlapping the computation of its iterations across processors. SOLUTION: In a program generation part 120 of a loop parallelizing part 110, an overlap calculating process generation part 122 generates processing that computes the data updated in the loop and used after the jump-out, overlapping the loop iterations across processors while saving the value of that data each time the loop has been repeated a specified number of times. A loop jump-out decision time process generation part 123 generates processing that, when the loop jump-out condition is met, uses the most recently saved value to perform the calculation that guarantees the value of the data after the jump-out. Once the condition is met, the processor that detected it informs another processor, and each informed processor in turn informs another, one after another.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a compiler that takes as input a high-level language program containing a loop jump-out and generates a parallel program that includes inter-process communication.

[0002]

2. Description of the Related Art Conventionally, a compiler that takes as input a high-level language program containing a loop jump-out and generates a vectorized program works as discussed in Michael Wolfe, Optimizing Supercompilers for Supercomputers, MIT Press, 1989, pp. 151-155. First, the statements of the loop other than the loop jump-out test are computed for all loop iterations or for a specified number of iterations. Next, only the jump-out test is computed for that number of iterations. Next, the first iteration at which the jump-out condition holds is determined from the test results. Finally, only the values computed up to that iteration are stored into the data, so that the meaning of the original program is unchanged.

[0003] Two program examples from the above book are cited below.

[0004] First, as a first conventional technique, an example is given in which the computation is performed for all loop iterations.

[0005] (Example 1)

    do I=1,N
      A(I)=B(I)*C(I)+4/D(I)
      if(A(I)=S(I)) goto label
    enddo

The above program is vectorized as follows:

    S1: ATEMP(1:N)=B(1:N)*C(1:N)+4/D(1:N)
    S2: bit(1:N)=ATEMP(1:N)=S(1:N)
    S3: I=First1(bit(1:N))
    S4: A(1:I)=ATEMP(1:I)
    S5: if(I>0) goto label

Here ATEMP is a temporary array. Values are stored in it temporarily so that no extra values are stored into A.

[0006] S1 computes, by vector calculation, the statement ATEMP(I)=B(I)*C(I)+4/D(I), which is the loop body without the loop jump-out test, for all values of I from 1 to N. Vector calculation is described in detail in Hans Zima and Barbara Chapman, Supercompilers for Parallel and Vector Computers, Addison-Wesley Publishing Company, Inc.

[0007] S2 executes only the loop jump-out test A(I)=S(I) for all values of I from 1 to N, and assigns TRUE to bit(I) if the test holds and FALSE if it does not.

[0008] S3 assigns to I the first subscript, among the jump-out test results bit(1) to bit(N), for which bit(I) is TRUE, that is, the loop iteration at which the jump-out condition first holds.

[0009] S4 stores into A the values computed up to the point where the loop jump-out occurs. The program thereby obtains the same result as when it is not vectorized.

[0010] S5 transfers program control to the designated location only when the loop jump-out condition holds.
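
The steps S1 to S5 above can be mirrored in a short Python sketch. It is only an illustration under stated assumptions, not code from the cited book: plain lists stand in for vector operations, `first1` is a hypothetical helper that returns the 1-based position of the first TRUE entry (0 if there is none), and the handling of the case where no test succeeds (store all N values) is an assumption added here.

```python
def first1(bits):
    """Hypothetical First1: 1-based index of the first True entry, 0 if none."""
    for idx, flag in enumerate(bits, start=1):
        if flag:
            return idx
    return 0


def sequential(a, b, c, d, s):
    """Original loop of Example 1: returns (A, jumped_out)."""
    a = list(a)
    for i in range(len(a)):
        a[i] = b[i] * c[i] + 4 / d[i]
        if a[i] == s[i]:
            return a, True
    return a, False


def vectorized(a, b, c, d, s):
    """S1 to S5, with list comprehensions standing in for vector operations."""
    n = len(a)
    a = list(a)
    atemp = [b[i] * c[i] + 4 / d[i] for i in range(n)]  # S1: compute every element
    bit = [atemp[i] == s[i] for i in range(n)]          # S2: evaluate every test
    i = first1(bit)                                     # S3: first satisfied test
    upto = i if i > 0 else n   # assumption: with no jump-out, keep all n values
    a[:upto] = atemp[:upto]                             # S4: store only needed values
    return a, i > 0                                     # S5: jump only if a test held
```

Note that S1 evaluates all N elements even when the jump-out happens early, which is the cost the first conventional technique pays.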

[0011] Next, as a second conventional technique, an example is given in which a loop that has a loop-carried dependence and an indefinite number of iterations is computed a specified number of iterations at a time.

[0012] (Example 2)

    while(X>EPS) do
      X=F(X)
    endwhile

The above program is vectorized as follows:

    S1: while(X>EPS) do
    S2:   XTEMP(0)=X
    S3:   do I=1,64
    S4:     XTEMP(I)=F(XTEMP(I-1))
    S5:   enddo
    S6:   bit(1:64)=XTEMP(1:64)>EPS
    S7:   J=First1(bit(1:64))
    S8:   X=ATEMP(J)
    S9: endwhile

Because the number of iterations of a while loop is indefinite, the vectorized program executes 64 iterations, checks whether the loop jump-out condition has been satisfied, and, if not, executes another 64 iterations. Repeating this handles the case where the number of iterations is indefinite. The value 64 is the number of vector registers and differs from machine to machine.

[0013] The loop formed by S3, S4, and S5 may or may not be vectorized, depending on the machine. If it is not vectorized, this part executes sequentially, so the whole while loop runs slower than the original program.

[0014] S6 and S7 are vectorized, but S8 is executed as scalar code, and its result is used at S1 to decide whether to jump out of the loop. That is, a synchronization that waits for vector execution to finish is inserted immediately before S8.
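
The block-at-a-time scheme of Example 2 can be sketched in Python as follows. This is an illustration under assumptions, not the book's code: the strip length is a parameter (`strip`, 64 in the text), `f` is assumed to be safe to apply speculatively past the point where the original loop would stop, and the selection step is written here as "take the first element that no longer exceeds EPS, or the last element of the strip if none does", which is our reading of what S6 to S8 are meant to achieve.

```python
def sequential(x, eps, f):
    """Original loop of Example 2."""
    while x > eps:
        x = f(x)
    return x


def strip_mined(x, eps, f, strip=64):
    """Run `strip` iterations at a time, then pick the value at the exit point."""
    while x > eps:                                  # S1
        xtemp = [x]                                 # S2: XTEMP(0) = X
        for i in range(1, strip + 1):               # S3 to S5: serial recurrence
            xtemp.append(f(xtemp[i - 1]))
        # S6 to S7 (assumption): first position whose value fails X > EPS,
        # falling back to the end of the strip when every value still exceeds EPS.
        j = next((i for i in range(1, strip + 1) if not xtemp[i] > eps), strip)
        x = xtemp[j]                                # S8: scalar step after the strip
    return x
```

The sketch also makes the two drawbacks visible: `xtemp` needs as many elements as the strip length, and the selection at the end of each strip is a serialization point.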

[0015]

SUMMARY OF THE INVENTION The first prior art has the problem that, even when the loop jump-out occurs after only a few iterations, all loop iterations are executed in advance, so execution can actually become slower.

[0016] The second prior art has the problem that it requires a temporary array with as many elements as the specified number of iterations, which increases memory usage.

[0017] The second prior art also has the problem that, when the loop contains an inner loop that executes sequentially, that loop is not sped up.

[0018] The second prior art further has the problem that a synchronization is inserted every specified number of iterations. Execution time increases by the cost of each synchronization, and the parallelism between loop iterations that could otherwise be exploited by raising the specified number is lost to the synchronization, which increases execution time further.

[0019] An object of the present invention is to reduce the time spent on extra loop execution after the loop jump-out condition has been satisfied.

[0020] Another object of the present invention is to limit the increase in memory usage.

[0021] Another object of the present invention is to detect parallelism in the outer loop and speed up execution even when the loop contains an inner loop that executes sequentially.

[0022] Another object of the present invention is to avoid serializing the processing every specified number of iterations, even when the number of loop iterations is indefinite.

[0023]

Means for Solving the Problems To achieve the above objects, the invention provides, for a loop that includes a loop jump-out: overlap calculation means that computes the data whose value is updated in the loop and used after the jump-out, overlapping the loop iterations across a plurality of processors while saving the value of that data each time the loop has been repeated a specified number of times; and loop jump-out determination-time means that, when the loop jump-out condition is satisfied, uses the most recently saved value of the data to perform the calculation that guarantees the value of the data after the jump-out.
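
The save-and-recover idea stated above can be illustrated with a small sequential simulation. This is a sketch of one reading of the paragraph, not the patent's generated code: `step`, `exit_at`, `interval`, and `overshoot` are hypothetical names, the overlap across processors is modeled only as the loop running `overshoot` iterations past the point where the jump-out condition first holds, and two saved values are kept (an assumption that fits the two save arrays a1 and a2 introduced later in the description).

```python
def sequential(a0, step, exit_at):
    """Value of the data when the loop jumps out at iteration exit_at."""
    a = a0
    for k in range(1, exit_at + 1):
        a = step(a, k)
    return a


def overlapped(a0, step, exit_at, interval, overshoot):
    """Run past the exit, then rebuild the exit-time value from a saved copy."""
    assert 0 <= overshoot < interval
    saved = [(0, a0), (0, a0)]          # two most recent (iteration, value) saves
    a = a0
    for k in range(1, exit_at + overshoot + 1):
        a = step(a, k)                  # iterations beyond exit_at are wasted work
        if k % interval == 0:
            saved = [saved[1], (k, a)]  # save every `interval` iterations
    # Recovery: newest save taken at or before the exit, then replay up to the exit.
    k0, a = max((c for c in saved if c[0] <= exit_at), key=lambda c: c[0])
    for k in range(k0 + 1, exit_at + 1):
        a = step(a, k)
    return a
```

Because the overshoot is assumed to be shorter than the save interval, one of the two saved copies always predates the exit, so at most `interval` iterations have to be replayed.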

[0024] Further, to achieve the above objects, the overlap calculation means includes save-and-calculate means that replaces every store to the data in the original loop with a store to value-saving data and, in the next loop iteration, replaces every use of the data in the original loop with a use of the value-saving data.

[0025] Further, to achieve the above objects, the loop jump-out condition determination means includes means by which one of the plurality of processors evaluates the loop jump-out condition of the original loop, notifies the other processors of the result, and the notified result is received.

[0026] Further, to achieve the above objects, a parallelizing compiler that takes a program as input, performs syntax analysis and program analysis, and generates a parallel program or parallel object program for a parallel machine is provided with: an applicability determination unit that detects a loop containing a loop jump-out and detects the data whose value is updated in that loop and used after the jump-out; an overlap calculation processing generation unit for processing that computes the data while overlapping the loop iterations across a plurality of processors and saving the value of the data every designated number of loop iterations; and a loop jump-out determination-time processing generation unit for processing that, when the loop jump-out condition is satisfied, uses the most recently saved value of the data to perform the calculation that guarantees the value of the data after the jump-out.

[0027] Further, to achieve the above objects, the overlap calculation processing generation unit includes a save-and-calculate processing generation unit that replaces every store to the data in the original loop with a store to value-saving data and, in the next loop iteration, replaces every use of the data in the original loop with a use of the value-saving data.

[0028] Further, to achieve the above objects, the loop jump-out determination-time processing generation unit includes a unit that generates processing in which one of the plurality of processors evaluates the loop jump-out condition of the original loop, notifies the other processors of the result, and the notified result is received.

[0029]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS A first embodiment of the present invention will be described below with reference to FIGS. 1 to 14.

[0030] FIG. 1 shows the configuration of a parallelizing compiler according to the present invention.

[0031] Reference numeral 100 denotes the parallelizing compiler. It consists of: a syntax analysis unit 102, which takes a source program 101 as input and outputs a dictionary 104 and an intermediate language 105; a program analysis unit 103, which takes the dictionary 104 and the intermediate language 105 as input, analyzes the program's control flow, data flow, data dependences, and so on, and either reflects the results in the intermediate language 105 or attaches them as additional information; a loop parallelizing unit 110, which takes the dictionary 104 and the intermediate language 105 as input, analyzes the loops in the program, parallelizes loops that include a loop jump-out, and reflects the results in the dictionary 104 and the intermediate language 105; an optimization unit 106, which takes the dictionary 104 and the intermediate language 105 as input, transforms the program so that it executes faster, and reflects the results in the dictionary 104 and the intermediate language 105; and a code generation unit 107, which takes the dictionary 104 and the intermediate language 105 as input and generates a parallelized program 108.

[0032] In this embodiment, the parallelized program 108 that is generated is a message-passing parallel program for a distributed memory machine.

[0033] The loop parallelizing unit 110 consists of: an applicability determination unit 111, which detects loops that include a loop jump-out and determines whether such a loop satisfies the conditions for parallelization; a program generation unit 120, which, when the loop is determined to be applicable, introduces new variables for parallelizing the loop, creates their entries in the dictionary 104, and creates or modifies the intermediate language 105 for parallel execution; a parallelism determination unit 112, which, when the applicability determination unit 111 determines that the loop is not applicable, determines whether conventional parallelization can be applied to the loop; and a parallelization conversion unit 113, which, when parallelism is found, performs the parallelizing transformation on the dictionary 104 and the intermediate language 105.

[0034] Next, the operation of the parallelizing compiler of the present invention will be described on the basis of FIG. 1, using the concrete examples of FIGS. 2 to 14.

[0035] FIG. 8 shows a concrete example of the source program 101. The loop spanning 801 to 813 contains a loop jump-out test statement 810. When its condition is satisfied, statement 811 transfers control to statement 814 outside the loop. Here eps in 810 is a constant. The variable s used in the jump-out test at 810 is initialized at 802 and updated at 807.

[0036] The parallelizing compiler 100 takes this source program 101 as input, and the syntax analysis unit 102 generates the intermediate language 105.

[0037] FIG. 9 shows an example of the intermediate language. Intermediate language 900 is the result of parsing statement 805, and intermediate language 901 is the result of parsing statement 806.

[0038] 900 consists of the operation nodes "=", "+", and "*", variable nodes such as "a(i+1,j)", and the constant node "-4". Each node is linked to the two nodes immediately to its right by the pointers shown as arrows in FIG. 9.

[0039] Similarly, 901 consists of the operation nodes "=", "+", and "*", variable nodes such as "a(i+1,j)" and "t", and the constant node "omega". Each node is linked by pointers to the two nodes immediately to its right.

[0040] A variable node is also called a reference point. The program analysis unit 103 takes this intermediate language as input, attaches analysis information to it, and outputs the result. Items 902 to 906 in FIG. 9 show part of that information: the direction of the dependences between reference points. In the direction of each arrow in the figure, the following information is attached between the reference points:

902: flow dependence carried across the k loop
903: flow dependence carried across the i loop
904: flow dependence carried across the k loop
905: flow dependence carried across the j loop
906: loop-independent anti-dependence

These terms are explained in detail in Hans Zima and Barbara Chapman, Supercompilers for Parallel and Vector Computers, Addison-Wesley Publishing Company, Inc.

[0041] In addition to the information on statements and reference points described above, the intermediate language contains information that expresses the program's flow of control. FIG. 10 shows the basic blocks and control flow for the input program of FIG. 8.

[0042] A basic block is a set of statements such that control can enter from another basic block only immediately before its first statement and can leave for another basic block only immediately after its last statement. Items 1000 to 1010 are all basic blocks. Solid arrows show the control flow between basic blocks. The labels iend, jend, and kend at the upper left of 1007, 1008, and 1010, respectively, are labels that make branch targets explicit. For example, "goto jend" in the statement of 1003 means that control moves to the basic block labeled "jend" if the condition in 1003 holds. Dotted arrows are pointers that link the basic blocks inside a loop. When one loop encloses another, however, only the basic blocks that belong to the enclosing loop but not to the enclosed loop are linked for the enclosing loop.

[0043] The basic blocks in the k loop, given by statements 801 to 813, are linked by the dotted arrows among 1001, 1002, 1008, and 1009. The basic blocks in the j loop, given by statements 803 to 809, are linked by the dotted arrows among 1003, 1004, and 1007. The basic blocks in the i loop, given by statements 804 to 808, are linked by the dotted arrow between 1005 and 1006.

[0044] The basic blocks and the control flow between them are the result of analysis by the syntax analysis unit 102 or the program analysis unit 103. The loops and the links between the basic blocks in each loop are the result of analysis by the program analysis unit 103.

[0045] As part of analyzing the intermediate language, the program analysis unit 103 recognizes the loops in the program and creates loop tables.

[0046] FIG. 11 shows an example of the loop tables. 1100, 1110, and 1120 are the loop tables for loop k, loop j, and loop i, respectively.

[0047] 1101 is the loop control variable, 1102 is the first basic block in the loop, 1103 is the last basic block, and 1104, when the loop contains an inner loop, is a pointer to the loop table for that inner loop. For loop k these are, respectively, k, 1001, 1008, and the loop table for loop j. 1001 and 1008 are basic blocks of FIG. 10, namely the first and last of the basic blocks linked by dotted arrows.

[0048] Similarly, in 1110, 1111 is the loop control variable j, 1112 is the first basic block 1003, 1113 is the last basic block 1007, and 1114 is a pointer to the loop table for the inner loop i.

[0049] Likewise, in 1120, 1121 is the loop control variable i, 1122 is the first basic block 1005, 1123 is the last basic block 1006, and 1124 has the value 0 because there is no further inner loop.
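
The three loop tables just described can be pictured as a small linked structure. The Python sketch below is only an illustration: the class and field names are hypothetical, and `None` stands in for the value 0 that marks the absence of an inner loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopTable:
    control_var: str                      # 1101 / 1111 / 1121
    first_block: int                      # 1102 / 1112 / 1122
    last_block: int                       # 1103 / 1113 / 1123
    inner: Optional["LoopTable"] = None   # 1104 / 1114 / 1124 (None for value 0)


# Values taken from the description of FIG. 11.
loop_i = LoopTable("i", 1005, 1006)          # table 1120, no inner loop
loop_j = LoopTable("j", 1003, 1007, loop_i)  # table 1110
loop_k = LoopTable("k", 1001, 1008, loop_j)  # table 1100


def nest(table):
    """Follow the inner-loop pointers from the outermost table inward."""
    names = []
    while table is not None:
        names.append(table.control_var)
        table = table.inner
    return names
```

Following the `inner` pointers from `loop_k` visits the tables in the order k, j, i, which is the traversal the applicability check relies on.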

[0050] Next, the applicability determination unit 111, which is the first process of the loop parallelizing unit 110, will be described with reference to FIG. 2. The first test, 200, is YES, because 1104 in FIG. 11 points to loop table 1110, which shows that the loop contains an inner loop.

[0051] The next test, 201, is whether there is a loop jump-out. This is checked by following the loop tables of FIG. 11 in order, from loop table 1100 for loop k to loop table 1120 for loop i.

[0052] First, the first basic block 1102 of loop table 1100 is 1001. According to FIG. 10, control transfers from this basic block to basic block 1010 and to basic block 1002. Basic block 1001 is a loop header, a special basic block that controls the loop, and the transfer to basic block 1010 is the special one taken when the number of loop iterations reaches the given count, so it is not a loop jump-out. Basic block 1002 is linked from basic block 1001 by a dotted arrow, so it is a basic block inside loop k.

[0053] Next, basic block 1002, which is inside loop k and linked from basic block 1001 by a dotted arrow, is examined. Control transfers from basic block 1002 to basic block 1003. This is the basic block indicated by 1112 in FIG. 11, so it is a basic block in an inner loop of loop k. It is therefore not a loop jump-out.

[0054] Next, basic block 1008, which is inside loop k and linked from basic block 1002 by a dotted arrow, is examined. Control transfers from basic block 1008 to basic block 1009 and to basic block 1010. Basic block 1009 is linked from basic block 1008 by a dotted arrow, so it is a basic block inside loop k. Basic block 1010 is not within the range of basic blocks pointed to by any of the loop tables in FIG. 11, so it is a basic block outside the loop. Loop k therefore has a loop jump-out, and 201 in FIG. 2 is YES.
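
The walk over the basic blocks in the preceding paragraphs amounts to looking for a control-flow edge that leaves the loop from a block other than the loop header. A minimal Python sketch of that check follows. The function and variable names are hypothetical, and only the control-flow edges actually stated in the text are included; the edges of the remaining blocks are not given here and are left out.

```python
LOOP_HEADER = 1001

# Blocks of loop k itself (linked by dotted arrows) and of its inner loops j and i.
OWN_BLOCKS_K = [1001, 1002, 1008, 1009]
ALL_BLOCKS_IN_K = set(OWN_BLOCKS_K) | {1003, 1004, 1007} | {1005, 1006}

# Control-flow edges quoted in the text; other edges are omitted.
SUCCESSORS = {
    1001: [1002, 1010],
    1002: [1003],
    1008: [1009, 1010],
}


def jump_out_edges(header, own_blocks, loop_blocks, successors):
    """Edges that leave the loop from a block other than the loop header."""
    edges = []
    for block in own_blocks:
        for target in successors.get(block, []):
            if target in loop_blocks:
                continue          # stays inside the loop or an inner loop
            if block == header:
                continue          # normal termination when the count is reached
            edges.append((block, target))
    return edges
```

With the edges listed above, the only edge reported is the one from 1008 to 1010, matching the conclusion that loop k has a loop jump-out.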

[0055] At 202, basic block 1008 shows that the variable is s.

[0056] At 203, apart from basic block 1008, s appears in the loop only in "s=0.0" in basic block 1002 and "s=s+abs(t)" in basic block 1006. This shows that s is not used in computing the updated value of any other variable, so the result is YES.

[0057] At 204, besides s, there are the loop-carried flow dependences 902 and 904 with respect to loop k, described for FIG. 9, so the result is YES.

[0058] Accordingly, 205 yields the pairs "a(i,j)" and "a(i+1,j)", and "a(i,j)" and "a(i,j+1)".

[0059] At 206, for the first pair the difference between the subscripts that contain the loop control variable i of loop i is 1, and for the second pair the difference between the subscripts that contain the loop control variable j of loop j is 1. With the specified value set to 1, both pairs satisfy it, so the result is YES.

[0060] The loop exits referred to in 207 are the loop jump-out basic block 1008 and the loop header 1001. A variable is LIVE at the end of a basic block if its value is updated in or before that basic block and may be used after that basic block. The variable used after the loop exits is the array a in basic block 1010, so a is LIVE. In addition, a is updated by "a(i,j)=a(i,j)+omega*t" in basic block 1006. The array a is therefore registered as a final-value guarantee array. At 208, taking loop j as the inner loop, j appears in only one dimension, the second, at every reference point of array a inside loop j, so the result is YES.

[0061] At 200, j appears in the same dimension, the second, at every reference point of array a, so the result is YES.

[0062] From the above, the loop is determined to be applicable at 210.

[0063] The applicability determination unit 111 thus finds the loop applicable, and processing moves to the overall control processing generation unit 121, which is the first process of the program generation unit 120.

[0064] Next, the overall control processing generation unit 121 will be described with reference to FIGS. 2 to 5 and FIG. 12.

[0065] FIG. 3 illustrates the algorithm of the overall control processing generation unit 121.

[0066] FIG. 12 shows the intermediate language generated by the program generation unit 120, written in source program style.

[0067] At 300, dictionary entries are created for the variables exit, ret, rest, kk, p, and ss. All of them except ss have the integer attribute. ss is an array with the same attribute as the variable s used in the jump-out condition test, and its size is 2*interval. Here interval is a constant given by a compile option or chosen by the compiler, and it denotes the interval at which the values of the final-value guarantee array are saved to one save variable.

[0068] At 301, for the final-value guarantee array a (which has the same array name a in the input program of FIG. 8), dictionary entries are created for the save arrays a1 and a2, which have the same variable attributes as a.

[0069] At 302, "exit=0" and "ret=0" are generated immediately before the loop, and "kk=mod(k,2*interval)" is generated immediately after the loop entry. As a result, 1201, 1202, and 1204 in FIG. 12 are generated.

【００７０】３０３ではｓ(k)をss(kk)に置き換える。
この結果、１２０５の条件文の中身、１２４１の条件文
が得られる。At 303, s(k) is replaced with ss(kk). As a result, the body of conditional statement 1205 and conditional statement 1241 are obtained.
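The index computation of steps 302 and 303 can be pictured with a short Python sketch. It is only an illustration, not the Fortran-style intermediate language of FIG. 12: the function name ring_index and the zero-based iteration counter are assumptions, and interval = 10 is borrowed from the discussion of FIG. 16.

```python
def ring_index(k: int, interval: int) -> int:
    """Slot of ss used by outer iteration k, mirroring kk = mod(k, 2*interval)."""
    return k % (2 * interval)


interval = 10                  # assumed value, taken from the FIG. 16 discussion
ss = [None] * (2 * interval)   # ss holds 2*interval entries

# Each iteration writes its own slot; slots are reused every 2*interval iterations.
for k in range(45):
    ss[ring_index(k, interval)] = k
```

Under these assumptions ss only ever holds the values of the most recent 2*interval iterations, which is why its size can stay fixed even when the trip count of the loop is unknown.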

【００７１】３０４ではループ内にあり内側ループ外に
ある文の内、ループ飛び出し判定文以外をプロセッサ０
だけが実行するように条件文を作成する。条件を満たす
のは図２の８０２だけなので１２０５を得る。At 304, a conditional statement is created so that, among the statements that are inside the loop but outside the inner loop, those other than the loop jump-out decision statement are executed only by processor 0. Only 802 of FIG. 2 meets this condition, so 1205 is obtained.

【００７２】３０５により１２４０と１２４８から成る
条件文を作成する。A conditional statement composed of 1240 and 1248 is created by 305.

【００７３】次にループ飛び出し判定処理生成部３０６
に移る。これは図４を用いて説明する。Next, processing moves to the loop jump-out decision processing generation unit 306. This will be described with reference to FIG. 4.

【００７４】まず、４００において、終値保証配列は
ａ、その保存配列はａ１、ａ２、ループ飛び出しは”go
to１０”なので、１２３０から１２３３を得る。First, at 400, since the final-value guarantee array is a, its save arrays are a1 and a2, and the loop jump-out is "go to 10", 1230 through 1233 are obtained.

【００７５】次に４０１においても同様にして、１２４
２から１２４６を得る。Next, at 401, 1242 through 1246 are obtained in the same way.

【００７６】次に保存兼計算処理生成部３０７に移る。Next, processing moves to the storage and calculation processing generation unit 307.

【００７７】図５は保存兼計算処理生成部３０７のアル
ゴリズムを説明した図である。FIG. 5 is a diagram explaining the algorithm of the storage and calculation processing generation unit 307.

【００７８】まず、５００において、関数辞書Ｆを作成
する。First, at 500, a function dictionary F is created.

【００７９】次に５０１において、終値保証配列はａだ
けなので、終値保証配列用、保存配列用とss(kk),s,p,e
xit,rest,retから成る８個の引数リストを作成する。Next, at 501, since a is the only final-value guarantee array, an eight-element argument list is created, consisting of one argument for the final-value guarantee array, one for the save array, and ss(kk), s, p, exit, rest, and ret.

【００８０】次に５０２において、終値保証配列はａだ
けなので、引数の最初の２つは、上の行の引数から順
に、a,a１;a１,a;a,a;a,a２;a２,a;a,aとなる。結局、
１２１０から１２２２までを得る。Next, at 502, since a is the only final-value guarantee array, the first two arguments are, in order from the topmost line, a,a1; a1,a; a,a; a,a2; a2,a; a,a. In the end, 1210 through 1222 are obtained.

【００８１】これで図１２の生成プログラムの全ての文
を得ることができた。This completes all the statements of the generated program of FIG. 12.

【００８２】次にオーバラップ計算処理生成部１２２を
図６、図８と図１３を用いて説明する。Next, the overlap calculation processing generation unit 122 will be described with reference to FIGS. 6, 8, and 13.

【００８３】図６はオーバラップ計算処理生成部１２２
のアルゴリズムを説明した図である。FIG. 6 is a diagram explaining the algorithm of the overlap calculation processing generation unit 122.

【００８４】図１３はオーバラップ計算処理生成部１２
２の生成した中間語をソースプログラムスタイルで記述
したものである。FIG. 13 shows the intermediate language generated by the overlap calculation processing generation unit 122, written in source program style.

【００８５】オーバラップ計算処理生成部１２２の入力
は図８の８０３から８０９の文である。The input of the overlap calculation processing generation unit 122 consists of statements 803 through 809 of FIG. 8.

【００８６】まず、６００において、８０３から８０９
を保存兼計算処理生成部３０７で生成された関数辞書Ｆ
を持ち、同生成部で作成された引数リストを持つサブル
ーチンとする。第１引数はａ、第２引数はｂとする。こ
れで１３００が得られた。First, at 600, statements 803 through 809 are turned into a subroutine that has the function dictionary F generated by the storage and calculation processing generation unit 307 and the argument list created by that same unit. The first argument is a and the second argument is b. This yields 1300.

【００８７】次に、６０１において終値保証配列ａの内
側ループ制御変数ｊの出現次元は２次元目なので、ａの
２次元目をblock分割する。block分割はデータ分割の一
種であり、配列の添字の連続部分が１つのプロセッサに
割り当てられるようにデータを分割する方法である。こ
のようなデータ分割方法およびデータ分割指示の仕様に
ついては、ハイパフォーマンスフォートランフォー
ラム、ハイパフォーマンスフォートランランゲー
ジスペシフィケーションヴァージョン１．０、セン
ターフォーリサーチオンパラレルコンピュテ
ーション、ライス大学、ヒューストン、テキサス州、１
９９３年度発行（Ｈigh ＰerformanceＦortran Ｆorum,
Ｈigh Ｐerformance Ｆortran Ｌanguage Ｓpecificati
on ver. １.０，Ｃenter for Ｒesearch on Ｐarallel
Ｃomputation，Ｒice Ｕniv.，Ｈouston，Ｔx，１９９
３．）に詳しい。Next, at 601, since the inner loop control variable j appears in the second dimension of the final-value guarantee array a, the second dimension of a is block-distributed. Block distribution is a kind of data distribution in which the data is divided so that a contiguous range of array subscripts is assigned to one processor. Such data distribution methods and the specification of data distribution directives are described in detail in High Performance Fortran Forum, High Performance Fortran Language Specification, ver. 1.0, Center for Research on Parallel Computation, Rice Univ., Houston, TX, 1993.
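A minimal Python sketch of block distribution follows. It assumes zero-based subscripts and a block size of ceil(n / nprocs); both are assumptions made for illustration, and block_size, block_range, and block_owner are hypothetical helper names that do not appear in FIG. 13.

```python
def block_size(n: int, nprocs: int) -> int:
    """Ceiling of n / nprocs: the number of subscripts in one block."""
    return -(-n // nprocs)


def block_range(p: int, n: int, nprocs: int) -> range:
    """Contiguous subscripts of an n-element dimension owned by processor p."""
    size = block_size(n, nprocs)
    lo = min(n, p * size)
    hi = min(n, lo + size)
    return range(lo, hi)


def block_owner(j: int, n: int, nprocs: int) -> int:
    """Processor that owns subscript j under the same distribution."""
    return j // block_size(n, nprocs)
```

With four processors and a dimension of 16 elements, each processor owns four consecutive subscripts; when the extent is not a multiple of the processor count, the last processor simply receives a shorter block.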

【００８８】６０２において、終値保証配列ａの定義参
照即ち、ａの値を更新する参照点は図８の８０６の左辺
なので、これを第２引数ｂで置き換えて１３４３を得
る。At 602, the defining reference of the final-value guarantee array a, that is, the reference point that updates the value of a, is the left-hand side of 806 in FIG. 8, so it is replaced with the second argument b, which yields 1343.

【００８９】６０３により１３４４を得る。According to 603, 1344 is obtained.

【００９０】６０４によりプロセッサ数をｎｐとしてル
ープｊを分散する。このとき、ループ繰り返し範囲はｎ
ｐで割られるのでループ回数はＮ／ｎｐとなり、１３４
０を得る。また、本実施例ではメッセージ通信入り並列
化プログラムを生成するので、１３０１、１３０５から
１３０７、１３０８から１３０９、１３１３から１３１
４、１３３０から１３３１、１３３３から１３３４、１
３５０、１３５２から１３５４を得る。これらのメッセ
ージ通信の生成方法については、ヒラナンダニ、ケネデ
ィ、ツェン、エヴァリュエイティングオブコンパイ
ラオプティマイゼーションズフォーフォートラン
ディー、ジャーナルオブパラレルアンドデイス
トリビューテッドコンピューテイング、２１号、第２
７頁から第４５頁、１９９４年度発行（Ｈiranandani，
Ｋennedy，Ｔseng，Ｅvaluating of Ｃompiler Ｏptimi
zations for Ｆortran Ｄ，Ｊournal of Ｐarallel and
Ｄistributed Ｃomputing，２１，pp.２７−４５，１９
９４．）に詳しい。At 604, loop j is distributed, with np as the number of processors. Because the loop iteration range is divided by np, the iteration count per processor becomes N/np, which yields 1340. Also, since this embodiment generates a parallelized program with message communication, 1301, 1305 through 1307, 1308 through 1309, 1313 through 1314, 1330 through 1331, 1333 through 1334, 1350, and 1352 through 1354 are obtained. The method of generating these message communications is described in detail in Hiranandani, Kennedy, Tseng, Evaluating of Compiler Optimizations for Fortran D, Journal of Parallel and Distributed Computing, 21, pp. 27-45, 1994.

【００９１】６０５において、１３３０から１３３１、
１３３３から１３３４、１３５０、１３５２から１３５
４がプロセッサ番号が小さい方から大きい方への通信な
ので、これらの通信と融合することにより、ssの通信１
３３２、１３５１を得る。実際はこれらは融合して通信
するためのバッファ配列buf２からのデータの取り出し
と同配列への代入とにそれぞれ対応する。At 605, 1330 through 1331, 1333 through 1334, 1350, and 1352 through 1354 are communications from lower-numbered processors to higher-numbered processors, so the communications of ss, 1332 and 1351, are obtained by fusing them with these communications. In practice, these correspond respectively to extracting data from the buffer array buf2 used for the fused communication and to assigning data into that array.

【００９２】次に、６０６において、１３０１、１３０
５から１３０７、１３０８から１３０９、１３１３から
１３１４がプロセッサ番号が大きい方から小さい方への
通信なので、これらの通信と融合することにより、１３
０２から１３０４、１３１０から１３１２を得る。これ
らも同様にして融合して通信するためのバッファ配列bu
f１への代入と受信したバッファからのデータの取り出
しとにそれぞれ対応する。Next, at 606, 1301, 1305 through 1307, 1308 through 1309, and 1313 through 1314 are communications from higher-numbered processors to lower-numbered processors, so 1302 through 1304 and 1310 through 1312 are obtained by fusing with these communications. These likewise correspond respectively to assigning data into the buffer array buf1 used for the fused communication and to extracting data from the received buffer.
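The idea of fusing several communications into one buffer, as in steps 605 and 606, can be sketched in Python as follows. This is an illustration under assumptions, not the generated code of FIG. 13: pack and unpack are hypothetical helpers, the buffer is a plain list standing in for buf1 or buf2, and no actual message passing is performed.

```python
def pack(parts):
    """Concatenate several data pieces into one buffer and record their lengths."""
    buf, sizes = [], []
    for part in parts:
        buf.extend(part)
        sizes.append(len(part))
    return buf, sizes


def unpack(buf, sizes):
    """Split a received buffer back into the original pieces."""
    pieces, pos = [], 0
    for n in sizes:
        pieces.append(buf[pos:pos + n])
        pos += n
    return pieces


# Example: boundary values of an array travel together with one ss value.
boundary = [1.5, 2.5, 3.5]
ss_value = [0.25]
buf, sizes = pack([boundary, ss_value])
received_boundary, received_ss = unpack(buf, sizes)
```

Sending one combined buffer instead of two separate messages pays the per-message startup cost once, which is the usual motivation for this kind of fusion.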

【００９３】６０７により、１３２０から１３２３を得
る。According to 607, 1320 to 1323 are obtained.

【００９４】以上により図１３の全ての文を得る。This yields all the statements of FIG. 13.

【００９５】次にループ飛び出し判定時処理生成部１２
３を図７と図１４を用いて説明する。Next, the loop jump-out decision time processing generation unit 123 will be described with reference to FIGS. 7 and 14.

【００９６】図７はループ飛び出し判定時処理生成部１
２３のアルゴリズムを説明した図である。FIG. 7 is a diagram explaining the algorithm of the loop jump-out decision time processing generation unit 123.

【００９７】図１４はループ飛び出し判定時処理生成部
１２３の生成した中間語をソースプログラムスタイルで
記述したものである。FIG. 14 shows the intermediate language generated by the loop jump-out decision time processing generation unit 123, written in source program style.

【００９８】まず、７００において、終値保証配列はａ
のみなので、引数リストはａ，ａ１，ａ２，ｓ，rest，
exit，ｐとなる。したがって、１４００を得る。First, at 700, since a is the only final-value guarantee array, the argument list is a, a1, a2, s, rest, exit, p. This yields 1400.

【００９９】次に、７０１で、オーバラップ計算処理生
成部１２２においてプロセッサ番号が大きい方から小さ
い方への通信は１３０１から１３１２までなので、その
中のａの部分を、restがintervalより小さい時はａ１
に、それ以外はａ２に置き換えることにより１４０１か
ら１４１１を得る。Next, at 701, the communications from higher-numbered to lower-numbered processors in the overlap calculation processing generation unit 122 are 1301 through 1312, so 1401 through 1411 are obtained by replacing the occurrences of a in them with a1 when rest is smaller than interval and with a2 otherwise.

【０１００】７０２で、”rest＜interval”の時、オー
バラップ計算処理生成部１２２においてプロセッサ番号
が小さい方から大きい方へのssの通信を除く通信は１３
３０から１３３１、１３３３から１３３４、１３５０、
１３５２から１３５４であるので、これらから通信を融
合するためのバッファを使わないようにし、ａへの受信
をａ１に置き換えることで、１４２６、１４２８を得
る。ssの値更新計算を除く内側ループ計算は、１４２７
におけるサブルーチンＦＦの呼び出しと１４５０から１
４５６で示されたサブルーチンＦＦで実現される。第１
引数をａ１にすることで終値保証配列ａの使用が全てａ
１に置き換わる。At 702, when "rest < interval", the communications in the overlap calculation processing generation unit 122 from lower-numbered to higher-numbered processors, excluding the communication of ss, are 1330 through 1331, 1333 through 1334, 1350, and 1352 through 1354. From these, the buffer used for fusing communications is removed and reception into a is replaced with reception into a1, which yields 1426 and 1428. The inner loop computation, excluding the update of the ss values, is realized by the call to subroutine FF at 1427 and by subroutine FF itself, shown at 1450 through 1456. Passing a1 as the first argument replaces every use of the final-value guarantee array a with a1.

【０１０１】同様にして、”上記以外”の時、同じ通信
から通信を融合するためのバッファを使わないように
し、ａへの受信をａ２に置き換えることで、１４３０か
ら１４３２を得る。第１引数をａ２にすることで終値保
証配列ａの使用が全てａ２に置き換わる。Similarly, in the "otherwise" case, the buffer used for fusing communications is removed from the same communications and reception into a is replaced with reception into a2, which yields 1430 through 1432. Passing a2 as the first argument replaces every use of the final-value guarantee array a with a2.

【０１０２】結局、７０２の結果、１４２０から１４３
４までを得る。In the end, 702 yields 1420 through 1434.

【０１０３】７０３により、プロセッサ番号が小さい方
から大きい方へのssの通信を除く通信は１４４３、１４
４５で与えられる。また、プロセッサ番号が大きい方か
ら小さい方へのｓ，exit，restを除く通信は１４４１、
１４４２で与えられる。内側ループ計算は１４４４のサ
ブルーチン呼び出しで与えられる。これを（rest−１）
回繰り返すのであるから、１４４０、１４４６を得る。At 703, the communications from lower-numbered to higher-numbered processors, excluding the communication of ss, are given by 1443 and 1445. The communications from higher-numbered to lower-numbered processors, excluding those of s, exit, and rest, are given by 1441 and 1442. The inner loop computation is given by the subroutine call at 1444. Since this is repeated (rest-1) times, 1440 and 1446 are obtained.

【０１０４】図１４より、ループ飛び出し通知元プロセ
ッサからの、直線のａ１へのデータ保存回数からループ
飛び出し回数までのループ繰り返し回数rest、および、
直前の保存データa１ないしa２の値で通知先プロセッサ
で使用するデータ１４０６ないし１４０８の通知１４０
１から１４１１、通知先プロセッサでの保存データから
の値の代入ないしそれを用いたループ１回分の計算１４
２０から１４３４、残りの（rest−１）回のループ計算
１４４０、１４４６が得られた。FIG. 14 thus provides the following: the notification 1401 through 1411 from the processor that reports the loop jump-out, which carries rest, the number of loop iterations from the immediately preceding save of data into a1 up to the jump-out iteration, together with the data 1406 through 1408 that the notified processor uses, taken from the immediately preceding saved data a1 or a2; the assignment of values from the saved data in the notified processor, or one loop iteration of computation using them, 1420 through 1434; and the remaining (rest-1) loop iterations, 1440 and 1446.
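The recovery idea summarized above, restoring a saved value and then recomputing the iterations up to the jump-out point, can be sketched in Python. This is a deliberately simplified single-process model, not the code of FIG. 14: recover_final_value, step, exit_at, and overrun are hypothetical names, rest is counted here from whichever save is used rather than from the a1 save, and the sketch assumes 1 <= overrun <= interval so that a usable save always survives.

```python
def recover_final_value(step, state0, exit_at, overrun, interval):
    """Return (state after exactly exit_at iterations, number of replayed iterations).

    The loop first runs `overrun` iterations too far, as a processor that is
    ahead of the jump-out detection would, while alternately saving to a1 and a2.
    """
    saves = {}
    state = state0
    for k in range(exit_at + overrun):
        kk = k % (2 * interval)
        if kk == 0:
            saves["a1"] = (k, state)       # save at the start of each 2*interval cycle
        elif kk == interval:
            saves["a2"] = (k, state)       # save half a cycle later
        state = step(state, k)

    # Pick the most recent save that was taken at or before the jump-out point.
    usable = [entry for entry in saves.values() if entry[0] <= exit_at]
    saved_k, state = max(usable, key=lambda entry: entry[0])

    rest = exit_at - saved_k               # iterations that must be recomputed
    for k in range(saved_k, exit_at):
        state = step(state, k)
    return state, rest
```

For example, with step = lambda s, k: s + k, interval = 10, a jump-out after 27 iterations, and an overrun of 5, the sketch restores the save taken at iteration 20 and replays 7 iterations. Keeping two alternating saves is what guarantees, under the stated assumption on overrun, that the save taken before the jump-out has not yet been overwritten.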

【０１０５】以上より、ループ並列化部１１０の並列化
結果の中間語である図１２、図１３、図１４を得た。The above yields FIG. 12, FIG. 13, and FIG. 14, which are the intermediate language produced by parallelization in the loop parallelization unit 110.

【０１０６】最適化部１０６はこの結果を入力し、中間
語を最適な中間語に変換する。The optimizing unit 106 receives the result and converts the intermediate language into an optimal intermediate language.

【０１０７】コード生成部は最適な中間語を入力し、ソ
ースプログラムの形またはオブジェクトプログラムの形
の並列化プログラム１０８として出力する。The code generator inputs the optimal intermediate language and outputs it as a parallelized program 108 in the form of a source program or an object program.

【０１０８】図１５は従来技術による図８のプログラム
の実行の様子を表したものである。FIG. 15 shows the state of execution of the program of FIG. 8 according to the prior art.

【０１０９】（Ａ）は２次元配列ａのデータ分割の様子
を示した図である。２次元目の添字ｊ方向でデータを４
分割し、それぞれプロセッサｐ０，ｐ１，ｐ２，ｐ３が
計算することを示す。（Ｂ）はこのデータ分割に従って
分散されたプログラムの実行の様子である。横軸がプロ
セッサｐ０，ｐ１，ｐ２，ｐ３を、縦軸が時間を表す。
ｔ０がある繰り返し回のループｋの最初の位置であり、
ｔ１がループｋの最後の位置に相当する。矩形の部分は
内側ループの計算時間を表し、矩形の中の数字はループ
の繰り返し回数を表す。これからループｋの内側ループ
の計算は逐次化されていることがわかる。計算以外に通
信が発生するので、結局、１プロセッサで実行するより
時間がかかる。(A) shows how the two-dimensional array a is distributed. The data is divided into four along the direction of the second-dimension subscript j, and processors p0, p1, p2, and p3 each compute one part. (B) shows how the program distributed according to this data distribution executes. The horizontal axis represents processors p0, p1, p2, and p3, and the vertical axis represents time. t0 is the start position of loop k in a given iteration, and t1 corresponds to the end position of loop k. Each rectangle represents the computation time of the inner loop, and the number in a rectangle represents the loop iteration count. This shows that the computation of the inner loop of loop k is serialized. Because communication occurs in addition to the computation, execution ends up taking longer than on a single processor.

【０１１０】図１６は本発明による図８のプログラムの
実行の様子を表したものである。横軸がプロセッサｐ
０，ｐ１，ｐ２，ｐ３を、縦軸が時間を表す。矩形の部
分は内側ループの計算時間を表し、矩形の中の数字は保
存配列ａ１へのデータの保存が開始されてからのループ
ｋの繰り返し回数を表わす。図１２等におけるinterval
の値は１０である。したがって、２＊intervalは２０で
あり、１０繰り返しの間隔でａ１ないしａ２へデータの
値を保存している。ｔ０は、２＊intervalの新しい繰り
返しが始まった時刻を表し、ここから始まる第１回目の
ループｋの繰り返し（図の矩形中の数字が１である計
算）では、データの値をａ１に保存しながら同時に計算
も行う。FIG. 16 shows how the program of FIG. 8 executes according to the present invention. The horizontal axis represents processors p0, p1, p2, and p3, and the vertical axis represents time. Each rectangle represents the computation time of the inner loop, and the number in a rectangle represents the number of iterations of loop k since saving of data to the save array a1 began. The value of interval in FIG. 12 and elsewhere is 10. Therefore 2*interval is 20, and data values are saved to a1 or a2 at intervals of 10 iterations. t0 represents the time at which a new cycle of 2*interval iterations begins; in the first iteration of loop k starting there (the computation whose rectangle in the figure contains the number 1), the data values are saved to a1 while the computation is performed at the same time.
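The save schedule described here can be written down as a small Python sketch. The function save_target is a hypothetical name, the iteration counter is assumed to be zero-based, and placing the a2 save exactly interval iterations after the a1 save is an assumption consistent with, but not stated verbatim in, the text.

```python
from typing import Optional


def save_target(k: int, interval: int) -> Optional[str]:
    """Name of the save array written during outer iteration k, or None."""
    kk = k % (2 * interval)
    if kk == 0:
        return "a1"        # first iteration of a 2*interval cycle
    if kk == interval:
        return "a2"        # halfway through the cycle
    return None


# With interval = 10, a save happens every 10 iterations, alternating a1 and a2.
schedule = [(k, save_target(k, 10)) for k in range(40) if save_target(k, 10)]
```

Under these assumptions the schedule is a1 at iterations 0 and 20 and a2 at iterations 10 and 30, so each array is overwritten only once every 20 iterations.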

【０１１１】また、第２回目のループｋの繰り返し（図
の矩形中の数字が２である計算）では、ａ１に保存され
たデータの値を利用して計算を行う。図１６では時刻ｔ
２にプロセッサｐ３で、ループ飛び出し条件が成立した
ことを表し、その時にプロセッサｐ０，ｐ１では既に第
３回目のループ繰り返しを計算していることを表してい
る。In the second iteration of loop k (the computation whose rectangle in the figure contains the number 2), the computation uses the data values saved in a1. FIG. 16 shows that the loop jump-out condition is satisfied on processor p3 at time t2, and that at that moment processors p0 and p1 are already computing the third loop iteration.

【０１１２】ｔ２直後でプロセッサｐ３はプロセッサｐ
２に対してループ飛び出しを通知する通信を行う。この
ときプロセッサｐ３は保存データの内で、プロセッサｐ
２がその計算に必要な部分も同時に通信する。そして、
第２回目の繰り返しに必要なデータがプロセッサｐ２か
ら通信されるのを待つ。Immediately after t2, processor p3 sends processor p2 a message notifying it of the loop jump-out. In the same communication, processor p3 also sends the part of the saved data that processor p2 needs for its computation. It then waits for the data needed for the second iteration to be sent from processor p2.

【０１１３】プロセッサｐ２ではプロセッサｐ３からル
ープ飛び出しを知り、保存データの内でプロセッサｐ１
がその計算に必要な部分と同時に飛び出しを通知する通
信をプロセッサｐ１に対して行う。そして、第２回目の
繰り返しに必要なデータがプロセッサｐ１から通信され
るのを待つ。このとき、プロセッサｐ０は第４回目の繰
り返しを計算している。Processor p2 learns of the loop jump-out from processor p3 and sends processor p1 a message that notifies it of the jump-out and at the same time carries the part of the saved data that processor p1 needs for its computation. It then waits for the data needed for the second iteration to be sent from processor p1. At this point, processor p0 is computing the fourth iteration.

【０１１４】プロセッサｐ１でも同様にプロセッサｐ２
からループ飛び出しを知り、保存データの内でプロセッ
サｐ０がその計算に必要な部分と同時に飛び出しを通知
する通信をプロセッサｐ０に対して行う。そして、第２
回目の繰り返しに必要なデータがプロセッサｐ０から通
信されるのを待つ。Similarly, processor p1 learns of the loop jump-out from processor p2 and sends processor p0 a message that notifies it of the jump-out and at the same time carries the part of the saved data that processor p0 needs for its computation. It then waits for the data needed for the second iteration to be sent from processor p0.

【０１１５】プロセッサｐ０ではループ飛び出しの通知
をプロセッサｐ１から受け、同時にプロセッサｐ１から
送られてきたデータと保存データからループ繰り返し第
２回目の計算を行い、計算結果の一部でプロセッサｐ１
の計算に必要なデータをプロセッサｐ１に送る。以下、
同様にｐ１、ｐ２、ｐ３は計算を続ける。Processor p0 receives the loop jump-out notification from processor p1, performs the computation of the second loop iteration using the data sent along with it from processor p1 and the saved data, and sends processor p1 the part of the result that processor p1 needs for its computation. Processors p1, p2, and p3 then continue their computation in the same way.

【０１１６】ｔ３はループ飛び出しがｐ０にまで通知さ
れた時刻を、ｔ４は全ての実行が終了した時刻を表す。t3 represents the time at which the loop jump-out notification reaches p0, and t4 represents the time at which all execution finishes.
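The notification chain walked through for FIG. 16 (p3 to p2, p2 to p1, p1 to p0) can be captured by a tiny Python sketch. relay_exit is a hypothetical helper that only lists the hops; it performs no communication and says nothing about the data that accompanies each notification.

```python
def relay_exit(detector: int):
    """Hops (sender, receiver) by which a jump-out detected on processor
    `detector` is passed down to processor 0, one neighbour at a time."""
    return [(p, p - 1) for p in range(detector, 0, -1)]


hops = relay_exit(3)   # detection on p3, as in the FIG. 16 walkthrough
```

In this model the notification needs as many hops as the detector's processor number, which corresponds to the span between t2 and t3 in the figure.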

【０１１７】本発明では、従来技術に比べてループｋの
繰り返しにまたがる計算がプロセッサにまたがってオー
バラップされており、それだけループの実行が高速化さ
れる。In the present invention, unlike the prior art, computation spanning successive iterations of loop k is overlapped across the processors, and loop execution is faster by that amount.
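A rough timing model makes the difference between FIG. 15 and FIG. 16 concrete. It is only a sketch under simplifying assumptions that are not in the source: every processor needs the same time t_block for its share of one inner loop, communication cost is ignored, and the overlapped execution behaves like an ideal pipeline. The function names are hypothetical.

```python
def serialized_time(iters: int, nprocs: int, t_block: float) -> float:
    """FIG. 15 style: within each outer iteration the processors run one after another."""
    return iters * nprocs * t_block


def overlapped_time(iters: int, nprocs: int, t_block: float) -> float:
    """FIG. 16 style: processor p starts iteration k as soon as processor p-1
    has finished its share of iteration k, so the iterations form a pipeline."""
    return (iters + nprocs - 1) * t_block
```

For 20 outer iterations on 4 processors with t_block = 1, the model gives 80 time units for the serialized schedule and 23 for the overlapped one. Real gains would be smaller once communication and the recomputation after a jump-out are included.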

【０１１８】尚、本実施例では、分散メモリマシン向け
のメッセージ通信入りの並列化プログラムに関する説明
を行ったが、共有メモリマシン向けの共有メモリを用い
たプロセス間通信を含む並列プログラムに対しても本発
明は適用可能である。Although this embodiment has been described for a parallelized program with message communication targeting distributed-memory machines, the present invention is also applicable to parallel programs for shared-memory machines that use shared memory for inter-process communication.

【０１１９】図１７は本発明のコンパイラが対象とする
並列計算機システムの構成の一例を示したものである。
１はローカルメモリ、２はプロセッサエレメント、３は
ネットワーク、４は入出力用プロセッサエレメント、５
は入出力用コンソールまたはワークステーションを表
す。FIG. 17 shows an example of the configuration of a parallel computer system targeted by the compiler of the present invention. 1 is a local memory, 2 is a processor element, 3 is a network, 4 is an input/output processor element, and 5 is an input/output console or workstation.

【０１２０】本発明のコンパイラは、入出力用コンソー
ルまたはワークステーション５において実行され、通信
を含む並列ソースプログラムまたは並列オブジェクトプ
ログラムに変換される。前者の並列ソースプログラム
は、さらに、プロセッサエレメント２向けのコンパイラ
により並列オブジェクトプログラムに変換される。上記
並列オブジェクトプログラムはリンカによりロードモジ
ュールに変換され、入出力用プロセッサエレメント４を
通じて各プロセッサエレメント２のローカルメモリ１に
ロードされ、各プロセッサエレメント２により実行され
る。実行時における各ロードモジュール間の通信はネッ
トワーク３を通じて行われる。The compiler of the present invention runs on the input/output console or workstation 5 and produces a parallel source program or a parallel object program that includes communication. In the former case, the parallel source program is further converted into a parallel object program by a compiler for the processor elements 2. The parallel object program is converted into a load module by a linker, loaded into the local memory 1 of each processor element 2 through the input/output processor element 4, and executed by each processor element 2. At run time, communication between the load modules takes place over the network 3.

【０１２１】本発明のコンパイラは上記並列計算機シス
テムを有効利用してプログラムを高速化するものであ
る。The compiler of the present invention speeds up a program by effectively utilizing the parallel computer system.

【０１２２】[0122]

【発明の効果】本発明によれば、ループ飛び出し条件が
成立した後、ループ飛び出しを検出したプロセッサは他
のプロセッサにループ飛び出し条件が成立したことを通
知し、通知されたプロセッサもまた別のプロセッサに通
知してこれを次々に続けるので、ループ飛び出し条件成
立後の余分なループ実行にかかる時間が少なくなる。According to the present invention, after the loop jump-out condition is satisfied, the processor that detected the jump-out notifies another processor that the condition has been satisfied, and each notified processor in turn notifies yet another processor, so less time is spent on superfluous loop execution after the condition is satisfied.

【０１２３】また、本発明によれば、使用メモリは２つ
のバッファ分だけ増やせばいいので、使用メモリ量の増
加を少ない。Further, according to the present invention, since the used memory only needs to be increased by two buffers, the increase in the used memory amount is small.

【０１２４】また、本発明によれば、内側に逐次実行さ
れるループがあっても外側のループ繰り返しにまたがっ
て内側ループ処理がオーバラップするのでループの実行
は高速化される。Further, according to the present invention, even if there is a loop that is executed sequentially inside, the execution of the loop is accelerated because the inner loop processing overlaps over the outer loop repetition.

【０１２５】また、本発明によれば、ループ繰り返し回
数が不定の場合でも、規定回数ごとに同期を取るなど処
理を逐次化させないので、ループの実行時間はその分、
高速化される。Further, according to the present invention, even when the number of loop iterations is indeterminate, processing is not serialized by, for example, synchronizing every prescribed number of iterations, so loop execution is faster by that amount.

[Brief description of the drawings]

【図１】本発明による並列化コンパイラの構成。FIG. 1 shows a configuration of a parallelizing compiler according to the present invention.

【図２】適用性判定部を説明した図。FIG. 2 is a diagram illustrating an applicability determining unit.

【図３】全体制御処理生成部を説明した図。FIG. 3 is a diagram illustrating an overall control processing generation unit.

【図４】ループ飛び出し判定処理生成部を説明した図。FIG. 4 is a diagram illustrating a loop pop-out determination processing generation unit.

【図５】保存兼計算処理生成部を説明した図。FIG. 5 is a diagram illustrating a storage and calculation processing generation unit.

【図６】オーバラップ計算処理生成部を説明した図。FIG. 6 is a diagram illustrating an overlap calculation processing generation unit.

【図７】ループ飛び出し判定時処理生成部を説明した
図。FIG. 7 is a diagram illustrating a loop jumping out determination process generation unit.

【図８】入力プログラムの例を示した図。FIG. 8 is a diagram showing an example of an input program.

【図９】入力プログラムの一部の中間語によるツリー表
現を示した図。FIG. 9 is a diagram showing a tree representation of a part of an input program by an intermediate language.

【図１０】入力プログラムに対する基本ブロックと基本
ブロック間の制御フローを示した図。FIG. 10 is a diagram showing a control flow between basic blocks for an input program.

【図１１】入力プログラムに対するループテーブルの一
部を示した図。FIG. 11 is a diagram showing a part of a loop table for an input program.

【図１２】生成プログラムの内、全体制御処理生成部で
生成される中間語をソースプログラムスタイルで記述し
た図。FIG. 12 is a diagram illustrating, in a source program style, an intermediate language generated by the overall control processing generation unit in the generation program.

【図１３】生成プログラムの内、オーバラップ計算処理
生成部で生成される中間語をソースプログラムスタイル
で記述した図。FIG. 13 is a diagram in which an intermediate language generated by an overlap calculation processing generation unit in a generation program is described in a source program style.

【図１４】生成プログラムの内、ループ飛び出し判定時
処理生成部で生成される中間語をソースプログラムスタ
イルで記述した図。FIG. 14 is a diagram illustrating, in a source program style, an intermediate language generated by a loop jump determination processing generation unit in a generation program.

【図１５】従来技術による生成プログラムの実行の様子
を説明した図。FIG. 15 is a view for explaining a state of execution of a generation program according to a conventional technique.

【図１６】本発明による生成プログラムの実行の様子を
説明した図。FIG. 16 is a view for explaining a state of execution of a generation program according to the present invention.

【図１７】本発明のコンパイラが対象とする並列計算機
システムの構成の一例。FIG. 17 shows an example of the configuration of a parallel computer system targeted by the compiler of the present invention.

[Explanation of symbols]

１００…並列化コンパイラ、１０１…ソースプログラ
ム、１０２…構文解析部、１０３…プログラム解
析部、１０４…辞書、１０５…中間語、１０
６…最適化部、１０７…コード生成部、１０８…並
列化プログラム、１１０…ループ並列化部、１１１…適
用性判定部、１１２…並列性判定部、１１３…並列
化変換部、１２０…プログラム生成部、１２１…全体制
御処理生成部、１２２…オーバラップ計算処理生成部、
１２３…ループ飛び出し判定時処理生成部。100: parallelizing compiler, 101: source program, 102: syntax analysis unit, 103: program analysis unit, 104: dictionary, 105: intermediate language, 106: optimization unit, 107: code generation unit, 108: parallelized program, 110: loop parallelization unit, 111: applicability determining unit, 112: parallelism determination unit, 113: parallelization conversion unit, 120: program generation unit, 121: overall control processing generation unit, 122: overlap calculation processing generation unit, 123: loop jump-out decision time processing generation unit.

Claims

[Claims]

1. A parallel execution method for a loop including a loop jump-out, comprising: overlap calculation means for calculating, for a loop including a loop jump-out, the values of data that are updated within the loop and used after the loop jump-out, by overlapping loop iterations across a plurality of processors while saving the values each time the loop has been repeated a specified number of times; and loop jump-out decision time means for performing, when the loop jump-out condition is satisfied, a calculation that guarantees the values of the data after the loop jump-out by using the data values saved immediately before.

2. The parallel execution method for a loop including a loop jump-out according to claim 1, wherein the overlap calculation means replaces every store of a value to the data in the original loop with a store to value-saving data, and, in the next loop iteration, replaces every use of a value of the data in the original loop with a use of the value-saving data.

3. The parallel execution method for a loop including a loop jump-out according to claim 1, wherein the loop jump-out decision time means comprises: means by which one of the plurality of processors evaluates the loop jump-out condition of the original loop and notifies another processor of the result; and means for receiving the notified result.

4. A method for generating a parallel program for a loop including a loop jump-out, in a parallelizing compiler that takes a program as input, performs syntax analysis and program analysis, and generates a parallel program or a parallel object program for a parallel machine, the method comprising: an applicability determining unit that detects data whose values are updated within a loop including a loop jump-out and used after the loop jump-out; an overlap calculation processing generation unit that generates processing for calculating the data values by overlapping loop iterations across a plurality of processors while saving the data values every specified number of loop iterations; and a loop jump-out decision time processing generation unit that generates processing for performing, when the loop jump-out condition is satisfied, a calculation that guarantees the values of the data after the loop jump-out by using the data values saved immediately before.

5. The method for generating a parallel program for a loop including a loop jump-out according to claim 4, wherein the overlap calculation processing generation unit comprises a storage and calculation processing generation unit that replaces every store of a value to the data in the original loop with a store to value-saving data, and, in the next loop iteration, replaces every use of a value of the data in the original loop with a use of the value-saving data.

6. The method for generating a parallel program for a loop including a loop jump-out according to claim 4, wherein the loop jump-out decision time processing generation unit generates processing in which one of the plurality of processors evaluates the loop jump-out condition of the original loop and notifies another processor of the result, and processing for receiving the notified result.