JP3551352B2

JP3551352B2 - Loop splitting method

Info

Publication number: JP3551352B2
Application number: JP00058598A
Authority: JP
Inventors: 雄一郎青木; 孝好飯塚; 真琴佐藤; 純男菊池
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-01-06
Filing date: 1998-01-06
Publication date: 2004-08-04
Anticipated expiration: 2018-01-06
Also published as: JPH11194947A

Description

【０００１】
【発明の属する技術分野】
本発明は、計算機用プログラムを実行可能なプログラムに変換する最適化コンパイラに関し、特に、手続き呼び出しを有するループのループボディを、複数のループボディに分割して複数のループを作成するループ分割方法に関する。
【０００２】
【従来の技術】
従来、ループ実行高速化を促進するため、種々のループ構造変換法が利用されている。その中の１つにループ分割（ｌｏｏｐｄｉｓｔｒｉｂｕｔｉｏｎ）がある。ループ分割とは、ループ制御変数を除くループ内変数の定義（変数に値を代入すること）・使用（変数の値を使用すること）の順序関係を保ったまま、１つのループのループボディを複数のループボディに分割して、複数のループを作成する方法である。
これにより、元のループでは不可能だった並列化やベクトル化などの最適化が一部の分割ループでは可能になり、ループ実行性能が向上する場合がある。
最初に、本明細書で用いる用語を説明する。変数の定義・使用の順序関係を依存、定義と使用をまとめて参照と呼ぶ。また、ループの繰り返しに跨った依存
（ある繰り返しで変数を参照し、次回以降の繰り返しで同じ変数を参照する依存）をループ運搬依存、跨らない依存をループ独立依存、定義の後に使用が来る依存をフロー依存、使用の後に定義が来る依存を逆依存、定義の後に定義が来る依存を出力依存と呼ぶ。ループの繰り返しに跨って存在するフロー依存をループ運搬フロー依存、ループの繰り返しに跨らずに存在するフロー依存をループ独立フロー依存、ループの繰り返しに跨らずに存在する逆依存をループ独立逆依存、ループの繰り返しに跨らずに存在する出力依存をループ独立出力依存と呼ぶ。特定の依存関係を持つ変数を含む文をその依存関係がある文、文Ａで参照された変数が文Ｂで参照される場合の依存関係を文Ａから文Ｂへの依存、特定の依存関係がある文を含むループをその依存関係があるループと呼ぶ。また、依存サイクルとは、文間の依存関係を文Ａから文Ｂ、文Ｂから文Ｃとたどっていくと、最終的に出発点の文Ａに戻ってくる場合の依存のことを指す。
【０００３】
まず、ループ分割方法について述べる。ループ分割は、１つの依存サイクルに含まれる文を同じ分割ループに割り当てるよう、元のループを分割する。依存サイクルがある文とない文を別々の分割ループに割り当てることにより、各々の依存関係に応じた最適化を、別々に各分割ループに適用することができる。
次に、ループ中に手続き呼び出し文がある場合のループ分割方法について述べる。手続きに跨って依存関係を調べる手続き間依存解析を行なうと、呼び出し元手続きと呼び出し先手続きの間の依存関係がわかる。ループ中で、呼び出し元手続きと呼び出し先手続きの間に依存サイクルがあった場合、手続き呼び出し文は依存サイクルに含まれると解釈して、前記のループ分割を行なう方法もある。
しかし、この方法は呼び出し先手続きにある依存サイクルに含まれない文も依存サイクルに含まれると拡大解釈してしまう。このような拡大解釈を行なわない方法として、ループ分割の前処理としてインライン展開を行なう方法がある。インライン展開とは、手続き呼び出し文を手続き本体に置き換える処理である。これにより拡大解釈せずに文毎に依存を評価できるようになり、また呼び出し先手続きの文を別々の分割ループに割り当てることが可能になる。
【０００４】
例としてループ１を考える。Ｓ１〜Ｓ４，ＳＣは文番号である。

ループ１内に現れる文には、次のような依存関係がある。
−依存関係（ａ）文Ｓ１から文Ｓ２へ、配列Ａに関するループ独立フロー依存
−依存関係（ｂ）文Ｓ２から文Ｓ１へ、配列Ｂに関するループ運搬フロー依存
−依存関係（ｃ）文Ｓ２から文Ｓ４へ、配列Ｂに関するループ独立フロー依存
−依存関係（ｄ）文Ｓ３から文Ｓ４へ、配列Ｄに関するループ独立逆依存
依存関係（ａ），（ｂ）から、ループ１では文Ｓ１，Ｓ２が依存サイクルを構成していることがわかる。
【０００５】
一時変数は、手続き呼び出しの実引数と仮引数の対応関係や、呼び出し元手続きと呼び出し先手続きに同名の別変数がある場合、仮引数やローカル変数に別名関係（複数の変数が同じメモリアドレスを指していること）がある場合の変数間の対応関係を保証するために必要である。実引数と仮引数との間の対応を、一時変数を用いてではなくレジスタ渡しで行なう方法もある。しかし、引数の数が多ければレジスタが足りなくなり、一時変数を使わなければならない。更に、実引数と仮引数の間で配列形状が異なった場合、例えば、実引数で２次元配列だが仮引数では１次元配列として受け取るような場合には、レジスタ渡しは適用できず、一時変数を用いた計算が必要となる。
手続きｓｕｂ１を手続き呼び出し文Ｓｃに関してインライン展開すると、下記
のループ１ａが生成される。ｔｍｐＡ，ｔｍｐＢ，ｔｍｐＣはインライン展開のためにコンパイラが生成した一時変数である。また、文Ｓ２，Ｓ３は一時変数を用いた文Ｓ２’，Ｓ３’に変換された。Ｓ１，Ｓ４，Ｓ２’，Ｓ３’，ＳＴ１〜ＳＴ６は文番号である。
【０００６】

このループ１ａに現れる依存関係は以下の通りである。
−依存関係（１）文Ｓ１から文ＳＴ１へ、配列Ａに関するループ独立フロー依存
−依存関係（２）文ＳＴ１から文ＳＴ５へ、配列Ａに関するループ独立逆依存
−依存関係（３）文Ｓ１から文ＳＴ５へ、配列Ａに関するループ独立出力依存
−依存関係（４）文ＳＴ１から文Ｓ２’へ、配列ｔｍｐＡに関するループ独立フロー依存
−依存関係（５）文ＳＴ１から文ＳＴ５へ、配列ｔｍｐＡに関するループ独立フロー依存
−依存関係（６）文ＳＴ２から文ＳＴ６へ、配列Ｂに関するループ独立逆依存
−依存関係（７）文ＳＴ６から文Ｓ１へ、配列Ｂに関するループ運搬フロー依存
−依存関係（８）文ＳＴ６から文Ｓ４へ、配列Ｂに関するループ独立フロー依存
−依存関係（９）文ＳＴ２から文Ｓ２’へ、配列ｔｍｐＢに関するループ独立出力依存
−依存関係（１０）文Ｓ２’から文ＳＴ６へ、配列ｔｍｐＢに関するループ独立フロー依存
−依存関係（１１）文ＳＴ３から文ＳＴ７へ、配列Ｃに関するループ独立逆依存
−依存関係（１２）文ＳＴ３から文Ｓ３’へ、配列ｔｍｐＣに関するループ独立出力依存
−依存関係（１３）文Ｓ３’から文ＳＴ７へ、配列ｔｍｐＣに関するループ独立フロー依存
−依存関係（１４）文ＳＴ４から文ＳＴ８へ、配列Ｄに関するループ独立逆依存
−依存関係（１５）文ＳＴ８から文Ｓ４へ、配列Ｄに関するループ独立出力依存
−依存関係（１６）文ＳＴ４から文Ｓ３’へ、配列ｔｍｐＤに関するループ独立フロー依存
−依存関係（１７）文ＳＴ４から文ＳＴ８へ、配列ｔｍｐＤに関するループ独立フロー依存
【０００７】
依存関係（１），（４），（７），（１０）により、ループ１ａでは文Ｓ１，ＳＴ１，Ｓ２’，ＳＴ６が依存サイクルを構成していることがわかる。次に、依存関係を保ったまま、依存サイクルを持つ文と持たない文をできるだけまとまめるように、ループボディ中の文の順番を入れ換ると、次のようなループ１ａ’ になる。

【０００８】
更に、ループ１ａ’ にループ分割を適用すると、次のようなループ１ａａ，１ａｂが生成される。

【０００９】
このループ分割の結果、文Ｓ１，Ｓ２’ を分割ループ１ａａに、文Ｓ４，文Ｓ３’ を分割ループ１ａｂに割り当てることになった。
更に、分割後のループを最適化することを考える。分割前のループがある依存を持つために最適化できなかった場合でも、分割後のループの中にはその依存をもたないものがありうるので、その依存を持たないループに対して最適化を適用することができる。例えば、最適化として並列化やベクトル化を例に挙げる。ループが並列化できる条件はループ中にループ運搬フロー依存がないことであり、ループがベクトル化できる条件はループ中に依存サイクルがないことである。
分割前のループ１はどちらの条件も満たさないため、並列化もベクトル化もできなかった。
しかし、分割後のループ１ａｂはループ運搬フロー依存も依存サイクルも持たないので、並列化もベクトル化も可能である。
このようなループ分割やインライン展開は公知技術であり、例えば
ＭｉｃｈａｅｌＷｏｌｆｅ（１９９６） ”ＨｉｇｈＰｅｒｆｏｒｍａｎｃｅＣｏｍｐｉｌｅｒｓｆｏｒＰａｒａｌｌｅｌＣｏｍｐｕｔｉｎｇ”，Ａｄｄｉｓｏｎ−ＷｅｓｌｅｙＰｕｂｌｉｓｈｉｎｇＣｏｍｐａｎｙ，ｐｐ．３２３−ｐｐ．３３０、ｐｐ．３６０−ｐｐ．３６１で述べられている。
【００１０】
【発明が解決しようとする課題】
上記従来技術は、そのインライン展開において、多数の一時変数を必要とする。
そのため、ループ分割して一部の分割ループに最適化を適用しても、一時変数の計算に時間がかかって性能がでないという問題点があった。
例えば、従来技術において、ループ１で手続きｓｕｂ１をインライン展開したループ１ａでは、一時変数ｔｍｐＡ，ｔｍｐＢ，ｔｍｐＣを用いたＳＴ１〜ＳＴ８という新たな８文が生成された。インライン展開前にはループ中にはＳＴ１〜ＳＴ４の４文しかなかったが、インライン展開後には１２文となり、文数が３倍になった。手続き呼び出しにかかる時間とループ制御にかかる時間を無視できると仮定すると、ループの実行時間は、

となる。Ｔｓｔｍｔａｖは１文の平均実行時間、Ｎはループ繰り返し回数である。
【００１１】
仮に、並列化やベクトル化等の最適化でループ１ａｂの実行時間がループ１ａａの実行時間に比べて非常に短くなり、

とみなせるようになったとしても、ループ１ａａ＋ループ１ａｂの実行時間はループ１の実行時間の５／４倍ある。この例の場合は、ループ分割して最適化を行なったならば、逆に性能が悪化してしまった。
本発明の目的は、上記問題を解決し、最適化を促進するために、手続き呼び出しを含むループを、前処理としてインライン展開を行なわずにループ分割し、ループ中から呼び出される呼び出し先手続き内の文のうち、最適化可能文を最適化可能ループに、最適化不能文を最適化不能ループに割り当てることを可能とするループ分割方法を提供することにある。
本発明の他の目的は、ループの実行時間を短縮するプログラムまたはオブジェクトコードを出力することが可能なループ分割方法を提供することにある。
【００１２】
【課題を解決するための手段】
上記課題を解決するため、本発明のループ分割方法は、
（ａ）該ループに関して手続き間依存解析を行なう処理と、
（ｂ）該ループを、複数の最適化可能分割ループと複数の最適化不能分割ループに分割する処理と、
（ｃ）該最適化不能分割ループ中の文を調べて最適化可能な最適化可能文を検出する処理と、
（ｄ）以下の３種の手続き（ｄ．１）該最適化不能分割ループから直接呼び出される直接呼び出し手続き、（ｄ．２）該最適化可能文を持つ最適化可能手続き、（ｄ．３）該コールグラフ上で該直接呼び出し手続きからエッジをたどって該最適化可能手続きに至る間に現れる手続きである中間手続き、をコピー対象手続きとする処理と、
（ｅ）該最適化不能分割ループ中の該最適化可能文に、該最適化不能文から該最適化可能文への依存を持たない前方最適化可能文がある場合は、（ｅ．１）該直接呼び出し手続き、該最適化可能手続き、該中間手続きをコピーした、前方直接呼び出しコピー手続き、前方最適化可能コピー手続き、前方中間コピー手続きを作成し、（ｅ．２）該最適化不能分割ループの直前にある該最適化可能分割ループの最後に、該前方直接呼び出しコピー手続きの手続き呼び出し文を挿入し、（ｅ．３）該前方直接呼び出しコピー手続きを調べ、手続き呼び出し文があれば呼び出し先手続きを再帰的に調べ、該コピー対象手続きの手続き呼び出し文があれば該前方中間コピー手続きまたは該前方最適化可能コピー手続きの手続き呼び出し文に変更し、（ｅ．４）該前方最適化可能文以外かつ該手続き呼び出し文以外の文を、該前方直接呼び出しコピー手続き、該前方最適化可能コピー手続き、該前方中間コピー手続きから削除する処理と、
【００１３】
（ｆ）該最適化不能分割ループ中の該最適化可能文に、該最適化可能文から該最適化不能文への依存を持たない後方最適化可能文がある場合は、（ｆ．１）該直接呼び出し手続き、該最適化可能手続き、該中間手続きをコピーした、後方直接呼び出しコピー手続き、後方最適化可能コピー手続き、後方中間コピー手続きを作成し、（ｆ．２）該最適化不能分割ループの直後にある該最適化可能分割ループの先頭に、該後方直接呼び出しコピー手続きの手続き呼び出し文を挿入し、（ｆ．３）該後方直接呼び出しコピー手続きを調べ、手続き呼び出し文があれば呼び出し先手続きを再帰的に調べ、該コピー対象手続きの手続き呼び出し文があれば該後方中間コピー手続きまたは該後方最適化可能コピー手続きの手続き呼び出し文に変更し、（ｆ．４）該後方最適化可能文以外かつ該手続き呼び出し文以外の文を、該後方直接呼び出しコピー手続き、該後方最適化可能コピー手続き、該後方中間コピー手続きから削除する処理と、
（ｇ）（ｇ．１）該直接呼び出し手続き、該最適化可能手続き、該中間手続きをコピーした、オリジナル直接呼び出しコピー手続き、オリジナル最適化可能コピー手続き、オリジナル中間コピー手続きを作成し、（ｇ．２）該最適化不能分割ループ中の該直接呼び出し手続きの手続き呼び出し文を該オリジナル直接呼び出しコピー手続きの手続き呼び出し文に置換し、（ｇ．３）該オリジナル直接呼び出しコピー手続きを調べ、手続き呼び出し文があれば呼び出し先手続きを再帰的に調べ、該コピー対象手続きの手続き呼び出し文があれば該オリジナル中間コピー手続きまたは該オリジナル最適化可能コピー手続きの手続き呼び出し文に置換し、（ｇ．４）該最適化可能文を、該オリジナル直接呼び出しコピー手続き、該オリジナル最適化可能コピー手続き、該オリジナル中間コピー手続きから削除する手段と、（ｈ）該最適化可能分割ループを最適化する処理、とを有することを特徴とする。
【００１４】
【発明の実施の形態】
以下、本発明の実施例を、図面により詳細に説明する。
まず、本発明の実施例について説明する前に、本発明が適用される逐次処理計算機の構成を説明する。
図１３は、本発明の適用対象である逐次処理計算機の構成の一例を示す図である。
この逐次処理計算機５０は、プロセッサ５１、メモリ５３、プロセッサ５１とメモリ５３を結合するバス５２とから構成される。
この逐次処理計算機５０は、図１に示すような最適化コンパイラの各ステップおよび入力プログラムをメモリ５３に格納しておき、これをプロセッサ５１が順次読み出して実行する。処理中に出力された中間データも、メモリ５３に格納される。
図１は、本発明のループ分割方法を実行するコンパイラの構成を示す図である。
図１に示すように、コンパイラ１０は、構文解析部１１、ループ分割部１３、最適化部１５、コード生成部１７から構成される。ここで、ループ分割部１３が、本発明で新規に設置された処理手順であって、その他の部分は従来から設置されたものをそのまま利用できる。
構文解析部１１は、入力プログラム９０を読み込んで中間語９１を生成する。中間語９１はコンパイラ内部のプログラムの表現であり、その形式は通常のコンパイラの場合と特に変わらないので、ここでは詳細には述べない。
ループ分割部１３は、構文解析部１１の結果を利用して本発明のループ分割方法が適用可能なループを検出し、本ループ分割方法を適用してループ構造を変換する。ループ分割部１３は、依存解析部１３１、ループ分割解析変換部１３２、文解析部１３３、手続きコピー解析部１３４、手続きコピー変換部１３５、文変換部１３６から構成される。依存解析部１３１の実行方法は、通常のコンパイラが手続きにまたがって依存解析を行なう手続き間依存解析の場合と特に変わらないので、ここでは詳細には述べない。ステップ１３２〜１３６の説明は後述する。
【００１５】
図２は、図１におけるループ分割解析変換部１３２の実行方法を表したフローチャートである。
本変換部１３２の実行方法は、ステップ１３２０１〜１３２１０から構成される。
ステップ１３２０１では、依存解析部１３１の手続き間依存解析結果を用いて、ループ中から依存サイクルに含まれる文を見つける。
ステップ１３２０２では、依存サイクルに含まれる文は依存サイクルに含まれる文同志で連続して並ぶように、また依存サイクルに含まれない文は依存サイクルに含まれない文同志で連続して並ぶように、依存関係を壊さない範囲でループ中の文の位置を移動して、それぞれをまとめる。
ステップ１３２０３では、ループ中の文を先頭から出現順にたどり、ステップ
１３２０４で文が依存サイクル中に含まれるかを判定する。このとき、手続き呼び出し文があっても呼出先手続きの文はたどらない。もし文が依存サイクル中に含まれれば、ステップ１３２０７を実行する。もし文が依存サイクル中に含まれていなければ、ステップ１３２０６で次の文が依存サイクル中に含まれているかどうか判定する。もし含まれていれば、ステップ１３２０５で、文の直後でループを分割するようにループ分割点を設置する。もし含まれていなければ、ステップ１３２０７を実行する。
【００１６】
ステップ１３２０７で、ループ中の最終文であると判定されたならば、ステップ１３２０５で設置した分割点を用いて、ステップ１３２０８でループ分割を行なう。最終文でなければ、ステップ１３２０３に戻って、次の文に関して処理を続行する。ステップ１３２０９では、ステップ１３２０８で生成した分割ループが最適化が可能な最適化可能ループか、不可能な最適化不能ループかを判定する。分割ループを並列化する場合は、分割ループ中にループ運搬フロー依存が含まれていなければ最適化可能ループ、含まれていれば最適化不能ループと判定する。分割ループをベクトル化する場合は、分割ループ中に依存サイクルが含まれていなければ最適化可能ループ、含まれていれば最適化不能ループと判定する。
ステップ１３２１０では、分割点の前後の分割ループを調べ、前後の分割ループとも最適化可能ループ同志または最適化不能ループ同志である場合には、その分割点でのループ分割をやめ、元に戻す。
以上で、図２を用いたループ分割解析変換部１３２の説明を終る。
【００１７】
図３は、図１における文解析部１３３の実行方法を表したフローチャートである。
文解析部１３３の実行方法は、ステップ１３３０１〜１３３０８から構成される。
ステップ１３３０１では、ループが最適化不能分割ループであるかを判定する。ループが最適化不能分割ループであった場合は、ステップ１３３０２で、最適化不能分割ループ内の文を、手続き呼び出し文があれば呼び出し先手続き内の文も再帰的にたどる。ループが最適化不能分割ループでなかった場合には、ステップ１３３０８でループ中の文を最適化可能文とする。ループ中の手続き呼び出し文から呼び出される呼び出し先手続き内の文は、この手続き呼び出し文に関して最適化可能文とする。
ステップ１３３０３では、文が手続き呼び出し文であるか否かを判定する。文が手続き呼び出し文であった場合は、ステップ１３３０２へ戻って次の文に関して処理を続行する。文が手続き呼び出し文でなかった場合は、ステップ１３３０４を実行する。
ステップ１３３０４では、文が最適化可能か否かを判定する。最適化として並列化を行なう場合の最適化可能な文の条件は、文がループ運搬フロー依存を持たないことである。最適化としてベクトル化を行なう場合の最適化可能な文の条件は、文が依存サイクルに含まれないことである。最適化可能であれば、ステップ１３３０５でその文を最適化可能文とし、最適化可能でなければ、ステップ１３３０６でその文を最適化不能文とする。
ステップ１３３０７では、文が分割ループ中の最終文かどうか判定する。最終文であれば処理を終了し、最終文でなければ、ステップ１３３０２へ戻って次の文に関して処理を続行する。
以上で、図３を用いた文解析部１３３の説明を終る。
【００１８】
図４は、図１における手続きコピー解析部１３４の実行方法を表したフローチャートである。
手続きコピー解析部１３４の実行方法は、ステップ１３４００〜１３４１２から構成される。
ステップ１３４００では、コールグラフを作成する。コールグラフとは、手続きをノードとし、呼び出し元手続きから呼出先手続きへエッジをつないで、手続き呼びだし関係を表した木状グラフのことである。コールグラフの作成方法は、通常のコンパイラがコールグラフを作成する場合と特に変わらないので、ここでは詳細には述べない。
ステップ１３４０１では、最適化不能分割ループ内の文を、手続き呼び出し文があれば呼び出し先手続き内の文も再帰的にたどり、ステップ１３４０２で今たどっている文である自文が最適化可能文かどうか判定する。もし自文が最適化可能文であれば、ステップ１３４００で作成したコールグラフを利用して直接呼び出し手続き、最適化可能手続き、中間手続きを見つけ、それらをコピー対象手続きとし、ステップ１３４０４を実行する。もし自文が最適化可能文でなければ、ステップ１３４１２を実行する。
ステップ１３４０４では、自文と最適化不能分割ループ内の最適化不能文との間に依存関係があるか否かを判定する。もし、依存関係があれば、ステップ１３４０６を実行する。もし依存関係がなければ、ステップ１３４０５を実行する。
【００１９】
ステップ１３４０５では、最適化不能分割ループ直前の分割ループと直後の分割ループとの粒度を比較する。なお、粒度とは、ここではループボディの実行コストであると定義する。具体的には、ループボディ中の文の数または命令の数で比較する。
直前の分割ループの粒度の方が小さかった場合は、ステップ１３４０７を実行する。そうでない場合は、ステップ１３４０８を実行する。
ステップ１３４０６では、自文と最適化不能文との間に存在する依存が、自文から最適化不能文への依存のみかどうか判定する。もし、自文から最適化不能文への依存のみであった場合は、ステップ１３４０８を実行する。もし、自文から最適化不能文への依存のみではなかった場合は、ステップ１３４０９を実行する。
ステップ１３４０７では、直接呼び出しコピー手続きの手続き呼び出し文の挿入位置を、直後の分割ループの先頭とする。
ステップ１３４０８では、直接呼び出しコピー手続きの手続き呼び出し文の挿入位置を、直前の分割ループの最後とする。
ステップ１３４０９では、自文と最適化不能文との間に存在する依存が、最適化不能文から自文への依存のみかどうかを判定する。もし、最適化不能文から自文への依存のみであった場合は、ステップ１３４０７を実行する。もし、最適化不能文から自文への依存のみではなかった場合は、ステップ１３４１０を実行する。
ステップ１３４１０では、自文を最適化不能文と再設定する。
ステップ１３４１１では、最適化不能分割ループ中およびコピー対象手続き中からコピー対象手続きを呼び出す手続き呼び出し文を、コピーした手続きであるコピー手続きの手続き呼び出し文への置換位置とする。
ステップ１３４１２では、自文が最適化不能分割ループの最終文であると判定する。もし、最終文であれば処理を終了し、最終文でなければ、ステップ１３４０１へ戻って次の文に関して処理を続行する。
以上で、図４を用いたコピー手続き解析部１３４の説明を終る。
【００２０】
図５は、図１における手続きコピー変換部１３５の実行方法を表したフローチャートである。
手続きコピー変換部１３５の実行方法は、ステップ１３５０１〜１３５０６から構成される。
ステップ１３５０１では、最適化不能分割ループ内の文を、手続き呼び出し文があれば呼び出し先手続き内の文も再帰的にたどり、ステップ１３５０２で今たどっている文が手続き呼び出し文かどうかを判定する。もし、手続き呼び出し文であれば、ステップ１３５０３を実行する。手続き呼び出し文でなければ、ステップ１３５０１に戻り、次の文に関して処理を続行する。
ステップ１３５０３では、呼び出し先手続きがコピー対象手続きかどうか判定する。もしコピー対象手続きならば、ステップ１３５０４を実行する。コピー対象手続きでなければ、ステップ１３５０１に戻り、次の文に関して処理を続行する。
ステップ１３５０４では、コピー対象手続きをコピーして、コピー手続きを作成する。作成するコピー手続きの数は、コピー手続きの手続き呼び出し文を挿入または置換する数と同一である。
ステップ１３５０５では、文が最適化不能分割ループの最終文であるか判定する。
もし最終文であれば、ステップ１３５０６を実行する。最終文でなければ、ステップ１３５０１に戻り、次の文に関して処理を続行する。
ステップ１３５０６では、コピー手続きの手続き呼び出し文の全ての挿入位置および置換位置に、コピー手続きの手続き呼び出し文を挿入または置換する。
以上で、図５を用いた手続きコピー変換部１３５の説明を終る。
【００２１】
図６は、文変換部１３６の実行方法を表したフローチャートである。
文変換部１３６の実行方法は、ステップ１３６０１〜１３６０９から構成される。
ステップ１３６０１では、分割ループ内の文を、手続き呼び出し文があれば呼び出し先手続き内の文も再帰的にたどり、ステップ１３６０２で今たどっている文が最適化可能文かどうか判定する。もし最適化可能文ならば、ステップ１３６０３を実行する。最適化可能文でなければ、ステップ１３６０４を実行する。
ステップ１３６０３では、分割ループが最適化可能分割ループかどうか判定する。
もし最適化可能分割ループならば、ステップ１３６０７を実行する。最適化可能分割ループでなければ、ステップ１３６０５を実行する。
ステップ１３６０４では、分割ループが最適化可能分割ループかどうかを判定する。もし、最適化可能分割ループならばステップ１３６０５を実行する。最適化可能分割ループでなければ、ステップ１３６０７を実行する。
ステップ１３６０５では、文が手続き呼び出し文かどうかを判定する。もし手続き呼び出し文ならば、ステップ１３６０７を実行する。手続き呼び出し文でなければステップ１３６０６を実行する。
ステップ１３６０６では、文を削除し、ステップ１３６０１に戻って削除した文の次の文に関して処理を続行する。
ステップ１３６０７では、文が分割ループ中の最終文であるか否かを判定する。もし、最終文であれば、ステップ１３６０８を実行する。最終文でなければ、ステップ１３６０１に戻って次の文に関して処理を続行する。
ステップ１３６０８では、コピー手続き内で参照されていない変数を、手続きの引数から削除する。
ステップ１３６０９では、実行文を含まないコピー手続きとその手続き呼び出し文を削除する。
以上で、図６を用いたコピー手続き解析部１３６の説明を終る。
これで、ループ分割部１３の説明を終る。
【００２２】
最適化部１５は、ループ分割部１３の結果を利用してプログラムに対して並列化やベクトル化等の最適化変換を行なう。その構成は、通常の最適化コンパイラの場合と特に変わらないので、ここでは詳細には述べない。
コード生成部１７は、中間語９１を読み込んで出力プログラム９２を生成する。これらの処理の内容は通常のコンパイラの場合と特に変わらないので、ここでは詳細には述べない。
以上で、本発明によるループ分割方法の説明を終る。
【００２３】
図７の入力プログラム９０は、コンパイラ１０のループ分割部１３への中間語９１の入力のソースイメージの一例であり、図８のプログラム９３は、コンパイラ１０のループ分割解析変換部１３２の中間語９１への出力のソースイメージの一例であり、図９のプログラム９４は、コンパイラ１０の手続きコピー変換部１３５の中間語９１への出力のソースイメージの一例であり、図１０のプログラム９５は、コンパイラ１０の文変換部１３６の中間語９１への出力のソースイメージの一例であり、図１１のプログラム９６は、コンパイラ１０が最適化として並列化を行なった場合の、最適化部１５からの中間語９１への出力のソースイメージの一例であり、図１２のプログラム９７は、コンパイラ１０が最適化としてベクトル化をおこなった場合の、最適化部１５からの中間語９１への出力のソースイメージの一例である。プログラム９０、９２〜９７を用いて、コンパイラ１０によるループ分割の一例を説明する。
【００２４】
図７の入力プログラム９０で、図の左端にある番号は、理解を助けるために打った行番号である。図の１行目は、整数型の変数ｉ，Ｎの宣言文、図の２行目は、整数型の変数Ｎの初期化文、図の３行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄと、実数型の変数Ｘの宣言文、図の４〜８行目が入力プログラム９０の主処理部、図の１０行目は、主処理部の６行目から呼び出される手続きｓｕｂ１の手続き文、図の１１行目は、整数型の変数ｉ，Ｎの宣言文、図の１２行目は、整数型の変数Ｎの初期化文、図の１３行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄの宣言文、図の１４〜１７行目が入力プログラム９０の手続きｓｕｂ１処理部である。
【００２５】
図８のループ分割解析変換部１３２の出力プログラム９３で、図の左端にある番号は、理解を助けるために打った行番号である。図の１行目は、整数型の変数ｉ，Ｎの宣言文、図の２行目は、整数型の変数Ｎの初期化文、図の３行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄと、実数型の変数Ｘの宣言文、図の４〜１０行目が出力プログラム９３の主処理部（４〜７行が最適化不能の処理、８〜１０行が最適化可能の処理）、図の１２行目は、主処理部の６行目から呼び出される手続きｓｕｂ１の手続き文、図の１３行目は、整数型の変数ｉ，Ｎの宣言文、図の１４行目は、整数型の変数Ｎの初期化文、図の１５行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄの宣言文、図の１６〜１９行目が出力プログラム９３の手続きｓｕｂ１処理部である。
【００２６】
図９の手続きコピー変換部１３５の出力プログラム９４で、図の左端にある番号は、理解を助けるために打った行番号である。図の１行目は、整数型の変数ｉ，Ｎの宣言文、図の２行目は、整数型の変数Ｎの初期化文、図の３行目は実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄと、実数型の変数Ｘの宣言文、図の４〜１１行目が出力プログラム９４の主処理部（４〜７行が最適不可能部分、８〜１１行が最適可能部分）、図の１３行目は、主処理部の６行目から呼び出されるコピー手続き＿ｃｏｐｙ１＿ｓｕｂ１の手続き文、図の１４行目は、整数型の変数ｉ，Ｎの宣言文、図の１５行目は、整数型の変数Ｎの初期化文、図の１６行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄの宣言文、図の１７〜２０行目が出力プログラム９４の手続き＿ｃｏｐｙ１＿ｓｕｂ１処理部である。図の２２行目は、主処理部の９行目から呼び出されるコピー手続き＿ｃｏｐｙ２＿ｓｕｂ１の手続き文、図の２３行目は、整数型の変数ｉ，Ｎの宣言文、図の２４行目は、整数型の変数Ｎの初期化文、図の２５行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄの宣言文、図の２６〜２９行目が出力プログラム９４の手続き＿ｃｏｐｙ２＿ｓｕｂ１処理部である。
【００２７】
図１０の文変換部１３６の出力プログラム９５で、図の左端にある番号は、理解を助けるために打った行番号である。図の１行目は、整数型の変数ｉ，Ｎの宣言文、図の２行目は、整数型の変数Ｎの初期化文、図の３行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄと、実数型の変数Ｘの宣言文、図の４〜１１行目が出力プログラム９５の主処理部（４〜７行が最適化不可能部分、８〜１１行が最適化可能部分）、図の１３行目は、主処理部の６行目から呼び出されるコピー手続き＿ｃｏｐｙ１＿ｓｕｂ１の手続き文、図の１４行目は、整数型の変数ｉ，Ｎの宣言文、図の１５行目は、整数型の変数Ｎの初期化文、図の１６行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂの宣言文、図の１７〜１９行目が出力プログラム９５の手続き＿ｃｏｐｙ１＿ｓｕｂ１処理部（最適化不可能部分）である。図の２１行目は、主処理部の９行目から呼び出されるコピー手続き＿ｃｏｐｙ２＿ｓｕｂ１の手続き文、図の２２行目は、整数型の変数ｉ，Ｎの宣言文、図の２３行目は、整数型の変数Ｎの初期化文、図の２４行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ｃ，Ｄの宣言文、図の２５〜２７行目が出力プログラム９５の手続き＿ｃｏｐｙ２＿ｓｕｂ１処理部（最適化可能部分）である。
【００２８】
図１１の最適化部１５が並列化を行なった場合の出力プログラム９６で、図の左端にある番号は、理解を助けるために打った行番号である。図の１行目は、整数型の変数ｉ，Ｎ，プロセッサ台数Ｎｐｒｏｃの宣言文、図の２行目は、整数型の変数Ｎの初期化文、図の３行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄと、実数型の変数Ｘの宣言文、図の４〜１１行目が出力プログラム９６の主処理部（４〜７行が最適化不可部分、８〜１１行が最適化可能部分）、図の１３行目は、主処理部の６行目から呼び出されるコピー手続き＿ｃｏｐｙ１＿ｓｕｂ１の手続き文、図の１４行目は、整数型の変数ｉ，Ｎの宣言文、図の１５行目は、整数型の変数Ｎの初期化文、図の１６行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂの宣言文、図の１７〜１９行目が出力プログラム９６の手続き＿ｃｏｐｙ１＿ｓｕｂ１処理部（最適化不可能部分）である。図の２１行目は、主処理部の９行目から呼び出されるコピー手続き＿ｃｏｐｙ２＿ｓｕｂ１の手続き文、図の２２行目は、整数型の変数ｉ，Ｎの宣言文、図の２３行目は、整数型の変数Ｎの初期化文、図の２４行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ｃ，Ｄの宣言文、図の２５〜２７行目が出力プログラム９６の手続き＿ｃｏｐｙ２＿ｓｕｂ１処理部（最適化可能部分）である。
【００２９】
図１２の最適化部１５がベクトル化を行なった場合の出力プログラム９７で、図の左端にある番号は、理解を助けるために打った行番号である。図の１行目は、整数型の変数ｉ，Ｎの宣言文、図の２行目は、整数型の変数Ｎの初期化文、図の３行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂ，Ｃ，Ｄと、実数型の変数Ｘの宣言文、図の４〜９行目が出力プログラム９７の主処理部、図の１１行目は、主処理部の６行目から呼び出されるコピー手続き＿ｃｏｐｙ１＿ｓｕｂ１の手続き文、図の１２行目は、整数型の変数ｉ，Ｎの宣言文、図の１３行目は、整数型の変数Ｎの初期化文、図の１４行目は、実数型であり１〜Ｎのサイズを持つ１次元配列Ａ，Ｂの宣言文、図の１５〜１７行目が出力プログラム９６の手続き＿ｃｏｐｙ１＿ｓｕｂ１処理部である。
【００３０】
図７〜図１２で明らかなように、ループ分割部１３は、入力プログラム９０を出力プログラム９５に変換する。そのために、まず、ループ分割解析変換部１３２が、入力プログラム９０を出力プログラム９３に変換する（図８参照）。入力プログラム９０の４〜８行目のループが、出力プログラム９３の４〜７行目、８〜１０行目のループに分割された。前者のループが最適化不能分割ループ、後者のループが最適化可能分割ループである。
次に、手続きコピー変換部１３５が、出力プログラム９３を出力プログラム９４に変換する（図９参照）。出力プログラム９３の６行目の手続き呼び出し文が手続きｓｕｂ１を呼び出していたのが、出力プログラム９４の６行目の手続き呼び出し文では、手続きｓｕｂ１をコピーしたコピー手続き＿ｃｏｐｙ１＿ｓｕｂ１を呼び出すようになった。また、出力プログラム９４の９行目に、手続きｓｕｂ１をコピーしたコピー手続き＿ｃｏｐｙ２＿ｓｕｂ１を呼び出す手続き呼び出し文が追加された。
また、手続き本体もｓｕｂ１から＿ｃｏｐｙ１＿ｓｕｂ１、＿ｃｏｐｙ２＿ｓｕｂ１の本体に変更された。元の手続きｓｕｂ１の本体はプログラム中から削除されてはいないが、手続きコピー変換部１３５によって手続き呼び出し文が削除されてしまったので、出力プログラム９４からは省略した。出力プログラム９５〜９７でも同様の理由で省略した。
【００３１】
次に、文変換部１３６が、出力プログラム９４を出力プログラム９５に変換する（図１０参照）。出力プログラム９４の１８行目の代入文が最適化可能文として、２６行目の代入文が最適化不能文として、出力プログラム９５のそれぞれ４〜７行目の最適化不能分割ループ、８〜１０行目の最適化可能分割ループで実行されないように削除された。
次に、最適化部１５は、並列化を行なう場合には、出力プログラム９５を出力プログラム９６に変換する（図１１参照）。出力プログラム９５の８〜１１行目の最適化可能分割ループが、出力プログラム９６の８〜１１行目の並列ループに変換された。ループ範囲が１〜Ｎより１〜Ｎ＝Ｎｐｒｏｃに変更されている。
また、最適化部１５は、ベクトル化を行なう場合には、出力プログラム９５を出力プログラム９７に変換する（図１２参照）。出力プログラム９５の８〜１１行目の最適化可能分割ループが、コピー手続き＿ｃｏｐｙ２＿ｓｕｂ１をインライン展開してからベクトル化することで、出力プログラム９６の８〜９行目の配列代入文に変換された。配列代入文は、ベクトル化可能なループと同一の意味を表す。以上で、プログラム９０、９２〜９７を用いたループ分割の一例の説明を終る。
【００３２】
もし、本ループ分割を実行しなければ、入力プログラム９０の４〜８行目のループは、５〜６行目を実行する最適化不能分割ループと７行目を実行する最適化可能分割ループにしかループ分割できず、７行目の文しか並列化やベクトル化ができない。
しかし、本発明のループ分割によって、呼び出し先手続きｓｕｂ１内の１４，１５行目の文も別々の分割ループに割り当てることができ、その結果、７行目の文に加えて１５行目の文も最適化可能分割ループに割り当てることができた。その結果、並列化やベクトル化可能な文の数が１文から２文に増えた。また、一時変数を使用していないので、余分な文の実行をせずに済み、プログラムの実行がより高速化できる。
【００３３】
【発明の効果】
以上説明したように、本発明によれば、手続き呼び出し先の文も呼び出し元ループにおけるループ分割後の最適化対象となるので、より実行性能の良いプログラムまたはオブジェクトコードが得られる。また、一時変数を導入することなくループ分割が可能になるので、分割前のループに対して実行性能を劣化させることがない。
【図面の簡単な説明】
【図１】本発明の一実施例を示すループ分割方法を実行するコンパイラの構成図である。
【図２】図１におけるループ分割解析変換部の処理フローチャートである。
【図３】図１における文解析部の処理フローチャートである。
【図４】図１における手続きコピー解析部の処理フローチャートである。
【図５】図１における手続きコピー変換部の処理フローチャートである。
【図６】図１における文変換部の処理フローチャートである。
【図７】本発明で使用される入力プログラムの一例図である。
【図８】本発明でループ分割されたプログラムの一例図である。
【図９】本発明により手続きコピー変換されたプログラムの一例図である。
【図１０】本発明により文変換されたプログラムの一例図である。
【図１１】本発明により並列化された出力プログラムの一例図である。
【図１２】本発明によりベクトル化された出力プログラムの一例図である。
【図１３】本発明が適用される逐次処理計算機の構成の一例を示す図である。
【符号の説明】
１０…本発明のコンパイラ、１１ …構文解析部、１３…ループ分割部、
１５…最適化部、１７…コード生成部、５０…逐次処理計算機、５１…プロセッサ、
５２…バス、５３…メモリ、９０…入力プログラム、９１…中間語、
９３…ループ分割されたプログラム、９２…出力プログラム、
９４…手続きコピー変換されたプログラム、９５…文変換されたプログラム、
９６…並列化出力プログラム、９７…ベクトル化出力プログラム、
１３１…依存解析部、１３２…ループ分割解析変換部、１３３…文解析部、
１３４…手続きコピー解析部、１３５…手続きコピー変換部、１３６…文変換部。
[0001]
[Technical field of the invention]
The present invention relates to an optimizing compiler that converts a computer program into an executable program, and more particularly to a loop splitting method that divides the body of a loop containing a procedure call into a plurality of loop bodies to create a plurality of loops.
[0002]
[Prior art]
Various loop restructuring transformations have conventionally been used to speed up loop execution. One of them is loop splitting (loop distribution). Loop splitting divides the body of one loop into a plurality of loop bodies, creating a plurality of loops, while preserving the order of the definitions (assigning a value to a variable) and uses (reading the value of a variable) of the variables in the loop other than the loop control variable.
As a result, optimizations such as parallelization and vectorization that were impossible for the original loop become possible for some of the split loops, and loop execution performance may improve.
First, the terms used in this specification are explained. An ordering relationship between definitions and uses of a variable is called a dependence, and definitions and uses are collectively called references. A dependence that spans loop iterations (a variable is referenced in one iteration and the same variable is referenced in a later iteration) is called a loop-carried dependence, and one that does not is called a loop-independent dependence. A dependence in which a use follows a definition is called a flow dependence, one in which a definition follows a use is called an anti-dependence, and one in which a definition follows a definition is called an output dependence. A flow dependence that spans loop iterations is a loop-carried flow dependence; a flow dependence, anti-dependence, or output dependence that does not span loop iterations is a loop-independent flow dependence, loop-independent anti-dependence, or loop-independent output dependence, respectively. A statement containing a variable that has a particular dependence is called a statement having that dependence; the dependence that arises when a variable referenced in statement A is referenced in statement B is called a dependence from statement A to statement B; and a loop containing a statement that has a particular dependence is called a loop having that dependence. A dependence cycle is a chain of dependences between statements that, followed from statement A to statement B, from statement B to statement C, and so on, eventually returns to the starting statement A.
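The terminology above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the `Ref` type, the `classify` function, and the idea of summarising a pair of references by an iteration `distance` are hypothetical names introduced here and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """One reference to a variable: a definition (write) or a use (read)."""
    stmt: str      # statement label, e.g. "S1" (illustrative)
    is_def: bool   # True for a definition, False for a use


def classify(first: Ref, second: Ref, distance: int) -> str:
    """Name the dependence from `first` to `second`.

    `distance` is the number of iterations separating the two references:
    0 means the same iteration, a positive value means `second` executes
    in a later iteration than `first`.
    """
    if distance < 0:
        raise ValueError("the source of a dependence must execute first")
    if first.is_def and not second.is_def:
        kind = "flow dependence"      # definition followed by use
    elif not first.is_def and second.is_def:
        kind = "anti-dependence"      # use followed by definition
    elif first.is_def and second.is_def:
        kind = "output dependence"    # definition followed by definition
    else:
        raise ValueError("two uses impose no ordering constraint")
    scope = "loop-carried" if distance > 0 else "loop-independent"
    return f"{scope} {kind}"
```

For example, a definition in S2 that is read by S1 one iteration later would be classified as a loop-carried flow dependence from S2 to S1.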
[0003]
First, the loop splitting method is described. Loop splitting divides the original loop so that the statements belonging to one dependence cycle are assigned to the same split loop. By assigning statements that are in a dependence cycle and statements that are not to separate split loops, optimizations suited to the respective dependences can be applied to each split loop separately.
Next, loop splitting for a loop that contains a procedure call statement is described. Interprocedural dependence analysis, which examines dependences across procedures, reveals the dependences between the calling procedure and the called procedure. One approach, when a dependence cycle exists between the calling procedure and the called procedure within the loop, is to treat the procedure call statement as belonging to the dependence cycle and perform the loop splitting described above.
However, this approach over-approximates: statements of the called procedure that are not in the dependence cycle are also treated as being in it. One way to avoid this over-approximation is to perform inline expansion as preprocessing for loop splitting. Inline expansion replaces a procedure call statement with the body of the procedure. This allows dependences to be evaluated statement by statement without over-approximation, and allows the statements of the called procedure to be assigned to separate split loops.
[0004]
Consider Loop 1 as an example. S1 to S4 and SC are statement numbers.

The statements that appear in Loop 1 have the following dependences.
- Dependence (a): from statement S1 to statement S2, loop-independent flow dependence on array A
- Dependence (b): from statement S2 to statement S1, loop-carried flow dependence on array B
- Dependence (c): from statement S2 to statement S4, loop-independent flow dependence on array B
- Dependence (d): from statement S3 to statement S4, loop-independent anti-dependence on array D
Dependences (a) and (b) show that statements S1 and S2 form a dependence cycle in Loop 1.
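As a rough illustration of how the cycle between S1 and S2 can be found mechanically, the sketch below treats dependences (a) to (d) as edges of a directed graph and groups statements that can reach each other. The function name `dependence_cycles` and the reachability-based grouping are assumptions made for this example; the specification does not say how cycles are detected.

```python
def dependence_cycles(statements, edges):
    """Return groups of statements that lie on a common dependence cycle."""
    succ = {s: set() for s in statements}
    for src, dst in edges:
        succ[src].add(dst)

    def reachable(start):
        # Statements reachable from `start` by following one or more edges.
        seen, stack = set(), [start]
        while stack:
            for nxt in succ[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {s: reachable(s) for s in statements}
    cycles, assigned = [], set()
    for s in statements:
        if s in assigned or s not in reach[s]:
            continue  # already grouped, or not on any cycle
        group = [t for t in statements if t in reach[s] and s in reach[t]]
        assigned.update(group)
        cycles.append(group)
    return cycles


# Dependences (a) to (d) of Loop 1 as (source, sink) pairs.
LOOP1_EDGES = [("S1", "S2"), ("S2", "S1"), ("S2", "S4"), ("S3", "S4")]
print(dependence_cycles(["S1", "S2", "S3", "S4"], LOOP1_EDGES))
```

Statements S3 and S4 are reported in no group, which matches the observation that only S1 and S2 form a cycle.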
[0005]
Temporary variables are needed to guarantee the correspondence between the actual arguments and the formal arguments of a procedure call, and the correspondence between variables when the calling procedure and the called procedure contain different variables with the same name or when formal arguments or local variables are aliased (several variables refer to the same memory address). The correspondence between actual and formal arguments can also be established by passing values in registers instead of through temporary variables. However, if there are many arguments the registers run out, and temporary variables must be used. Furthermore, when the array shape differs between the actual argument and the formal argument, for example when an actual argument that is a two-dimensional array is received as a one-dimensional array formal argument, register passing cannot be applied and computation using temporary variables is required.
Inline expansion of procedure sub1 at procedure call statement SC generates Loop 1a below. tmpA, tmpB, and tmpC are temporary variables generated by the compiler for the inline expansion. Statements S2 and S3 are converted into statements S2' and S3', which use the temporary variables. S1, S4, S2', S3', and ST1 to ST8 are statement numbers.
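The listing of Loop 1a is not reproduced in this text. To show the general pattern only, the following sketch inlines a hypothetical callee using copy-in and copy-out temporaries. The statement bodies, the helper names `with_call` and `inlined`, and the reduction to two arrays are all assumptions for illustration and are not the statements of the specification.

```python
def sub1(a, b, i):
    b[i] = a[i] * 2.0              # hypothetical callee body


def with_call(a, b, n):
    for i in range(1, n):
        a[i] = b[i - 1] + 1.0      # caller statement
        sub1(a, b, i)              # procedure call statement


def inlined(a, b, n):
    tmp_a = [0.0] * n              # compiler-generated temporaries
    tmp_b = [0.0] * n
    for i in range(1, n):
        a[i] = b[i - 1] + 1.0      # caller statement, unchanged
        tmp_a[i] = a[i]            # copy-in of actual argument A
        tmp_b[i] = b[i]            # copy-in of actual argument B
        tmp_b[i] = tmp_a[i] * 2.0  # callee body rewritten on temporaries
        a[i] = tmp_a[i]            # copy-out to A
        b[i] = tmp_b[i]            # copy-out to B
```

In this sketch the loop body grows from two statements to six, which is the kind of statement-count growth the specification later identifies as the cost of inline expansion.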
[0006]

The dependences that appear in Loop 1a are as follows.
- Dependence (1): from statement S1 to statement ST1, loop-independent flow dependence on array A
- Dependence (2): from statement ST1 to statement ST5, loop-independent anti-dependence on array A
- Dependence (3): from statement S1 to statement ST5, loop-independent output dependence on array A
- Dependence (4): from statement ST1 to statement S2', loop-independent flow dependence on array tmpA
- Dependence (5): from statement ST1 to statement ST5, loop-independent flow dependence on array tmpA
- Dependence (6): from statement ST2 to statement ST6, loop-independent anti-dependence on array B
- Dependence (7): from statement ST6 to statement S1, loop-carried flow dependence on array B
- Dependence (8): from statement ST6 to statement S4, loop-independent flow dependence on array B
- Dependence (9): from statement ST2 to statement S2', loop-independent output dependence on array tmpB
- Dependence (10): from statement S2' to statement ST6, loop-independent flow dependence on array tmpB
- Dependence (11): from statement ST3 to statement ST7, loop-independent anti-dependence on array C
- Dependence (12): from statement ST3 to statement S3', loop-independent output dependence on array tmpC
- Dependence (13): from statement S3' to statement ST7, loop-independent flow dependence on array tmpC
- Dependence (14): from statement ST4 to statement ST8, loop-independent anti-dependence on array D
- Dependence (15): from statement ST8 to statement S4, loop-independent output dependence on array D
- Dependence (16): from statement ST4 to statement S3', loop-independent flow dependence on array tmpD
- Dependence (17): from statement ST4 to statement ST8, loop-independent flow dependence on array tmpD
[0007]
Dependences (1), (4), (7), and (10) show that statements S1, ST1, S2', and ST6 form a dependence cycle in Loop 1a. Next, reordering the statements in the loop body so that statements in a dependence cycle and statements not in one are grouped together as far as possible, while preserving the dependences, gives the following Loop 1a'.

[0008]
Applying loop splitting to Loop 1a' then generates the following Loops 1aa and 1ab.

[0009]
As a result of this loop splitting, statements S1 and S2' are assigned to split loop 1aa, and statements S4 and S3' are assigned to split loop 1ab.
Next, consider optimizing the loops after splitting. Even when the loop before splitting could not be optimized because it has a certain dependence, some of the split loops may not have that dependence, so the optimization can be applied to those loops. Take parallelization and vectorization as examples of optimization. A loop can be parallelized if it contains no loop-carried flow dependence, and a loop can be vectorized if it contains no dependence cycle.
Loop 1 before splitting satisfies neither condition, so it could be neither parallelized nor vectorized.
However, Loop 1ab after splitting has neither a loop-carried flow dependence nor a dependence cycle, so it can be both parallelized and vectorized.
Loop splitting and inline expansion of this kind are known techniques and are described, for example, in
Michael Wolfe (1996), "High Performance Compilers for Parallel Computing", Addison-Wesley Publishing Company, pp. 323-330 and pp. 360-361.
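The two conditions stated in this paragraph can be sketched as predicates over a list of dependences. The `Dep` record and the function names below are hypothetical, and the example applies them to dependences (a) to (d) of Loop 1 rather than to the full dependence list of Loop 1a, so the second check is a simplified stand-in for Loop 1ab.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    src: str       # source statement
    dst: str       # sink statement
    kind: str      # "flow", "anti", or "output"
    carried: bool  # True if the dependence spans loop iterations


def restrict(deps, stmts):
    """Keep only the dependences whose two ends are both in `stmts`."""
    return [d for d in deps if d.src in stmts and d.dst in stmts]


def can_parallelize(deps):
    """Condition from the text: no loop-carried flow dependence."""
    return not any(d.kind == "flow" and d.carried for d in deps)


def has_cycle(deps):
    succ = {}
    for d in deps:
        succ.setdefault(d.src, set()).add(d.dst)
    for start in succ:
        seen, stack = set(), [start]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt == start:
                    return True   # followed edges back to the start
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return False


def can_vectorize(deps):
    """Condition from the text: no dependence cycle."""
    return not has_cycle(deps)


LOOP1 = [
    Dep("S1", "S2", "flow", False),  # (a)
    Dep("S2", "S1", "flow", True),   # (b)
    Dep("S2", "S4", "flow", False),  # (c)
    Dep("S3", "S4", "anti", False),  # (d)
]
```

With these definitions, `LOOP1` as a whole fails both predicates, while `restrict(LOOP1, {"S3", "S4"})` passes both.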
[0010]
[Problems to be solved by the invention]
The prior art described above requires a large number of temporary variables for inline expansion.
Consequently, even if the loop is split and optimizations are applied to some of the split loops, computing the temporary variables takes time and performance does not improve.
For example, in Loop 1a, obtained in the prior art by inline-expanding procedure sub1 in Loop 1, eight new statements ST1 to ST8 using the temporary variables tmpA, tmpB, and tmpC were generated. Before inline expansion the loop contained only the four statements S1 to S4, but after inline expansion it contains 12 statements, three times as many. Assuming that the time spent on procedure calls and on loop control can be ignored, the execution time of the loop is

where Tstmtav is the average execution time of one statement and N is the number of loop iterations.
[0011]
Suppose that optimizations such as parallelization and vectorization make the execution time of Loop 1ab very short compared with that of Loop 1aa, so that the execution time of Loop 1ab can be regarded as approximately zero.
Even then, the combined execution time of Loop 1aa and Loop 1ab is 5/4 times the execution time of Loop 1. In this example, splitting the loop and optimizing it actually degrades performance.
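The expressions referred to in this passage are not reproduced in this text. Under the stated assumptions (call and loop-control overhead ignored, every statement costing Tstmtav), the statement counts given above suggest the following reading. The value 5 for Loop 1aa is inferred here from the 5/4 ratio and is not stated in the surviving text, so it should be treated as an assumption.

```latex
\begin{align*}
T_{\text{Loop 1}}  &= 4 \, T_{stmtav} \, N \\
T_{\text{Loop 1a}} &= T_{\text{Loop 1aa}} + T_{\text{Loop 1ab}} = 12 \, T_{stmtav} \, N \\
T_{\text{Loop 1ab}} \approx 0 \;\Rightarrow\;
\frac{T_{\text{Loop 1aa}} + T_{\text{Loop 1ab}}}{T_{\text{Loop 1}}}
  &\approx \frac{5 \, T_{stmtav} \, N}{4 \, T_{stmtav} \, N} = \frac{5}{4}
\end{align*}
```

On this reading, five of the twelve statements remain in the non-optimizable Loop 1aa, which by itself already exceeds the four statements of the original Loop 1.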
An object of the present invention is to solve the above problem by providing a loop splitting method that, in order to promote optimization, splits a loop containing a procedure call without performing inline expansion as preprocessing, and that makes it possible to assign, among the statements in the called procedure invoked from the loop, the optimizable statements to an optimizable loop and the non-optimizable statements to a non-optimizable loop.
Another object of the present invention is to provide a loop splitting method capable of outputting a program or object code that shortens the execution time of a loop.
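A minimal sketch of the intended outcome, assuming hypothetical statement bodies chosen only so that they exhibit dependences of the kinds (a) to (d) listed for Loop 1: the callee is copied twice, each copy keeps only the statements destined for one split loop, and no temporaries are introduced. The names `copy1_sub1` and `copy2_sub1` are modelled on the copy-procedure names used in the figure descriptions; everything else is an assumption for illustration.

```python
def sub1(a, b, c, d, i):
    b[i] = a[i] * 2.0            # S2: on the dependence cycle with S1
    c[i] = d[i] + 1.0            # S3: on no dependence cycle


def original(a, b, c, d, x, n):
    for i in range(1, n):
        a[i] = b[i - 1] + x      # S1
        sub1(a, b, c, d, i)      # SC
        d[i] = b[i] * x          # S4


def copy1_sub1(a, b, i):
    b[i] = a[i] * 2.0            # copy that keeps only S2


def copy2_sub1(c, d, i):
    c[i] = d[i] + 1.0            # copy that keeps only S3


def transformed(a, b, c, d, x, n):
    for i in range(1, n):        # non-optimizable split loop
        a[i] = b[i - 1] + x
        copy1_sub1(a, b, i)
    for i in range(1, n):        # optimizable split loop
        copy2_sub1(c, d, i)
        d[i] = b[i] * x


def run(kernel, n=6, x=0.5):
    a = [0.0] * n
    b = [1.0] * n
    c = [0.0] * n
    d = [float(i) for i in range(n)]
    kernel(a, b, c, d, x, n)
    return a, b, c, d
```

Calling `run(original)` and `run(transformed)` yields the same arrays in this sketch, and the second loop of `transformed` contains no loop-carried flow dependence.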
[0012]
[Means for Solving the Problems]
To solve the above problem, the loop splitting method of the present invention comprises:
(a) a process of performing interprocedural dependence analysis on the loop;
(b) a process of splitting the loop into a plurality of optimizable split loops and a plurality of non-optimizable split loops;
(c) a process of examining the statements in the non-optimizable split loop to detect optimizable statements, that is, statements that can be optimized;
(d) a process of designating the following three kinds of procedures as copy-target procedures: (d.1) a directly called procedure, which is called directly from the non-optimizable split loop; (d.2) an optimizable procedure, which contains the optimizable statement; and (d.3) an intermediate procedure, which is a procedure that appears on the call graph when following edges from the directly called procedure to the optimizable procedure;
(e) a process of, when the optimizable statements in the non-optimizable split loop include a forward optimizable statement, that is, one to which there is no dependence from a non-optimizable statement: (e.1) creating a forward directly-called copy procedure, a forward optimizable copy procedure, and a forward intermediate copy procedure by copying the directly called procedure, the optimizable procedure, and the intermediate procedure; (e.2) inserting a procedure call statement for the forward directly-called copy procedure at the end of the optimizable split loop immediately preceding the non-optimizable split loop; (e.3) examining the forward directly-called copy procedure, recursively examining the called procedure wherever a procedure call statement is found, and changing any procedure call statement for a copy-target procedure into a procedure call statement for the forward intermediate copy procedure or the forward optimizable copy procedure; and (e.4) deleting, from the forward directly-called copy procedure, the forward optimizable copy procedure, and the forward intermediate copy procedure, every statement that is neither a forward optimizable statement nor a procedure call statement;
[0013]
(F) When the optimizable statement in the non-optimizable split loop includes a backward optimizable statement having no dependency from the optimizable statement to the non-optimizable statement, (f.1) Creating a backward direct call copy procedure, a backward optimizable copy procedure, and a backward intermediate copy procedure by copying the direct call procedure, the optimizable procedure, and the intermediate procedure, and (f.2) the non-optimizable split loop Inserts the procedure call statement of the backward direct call copy procedure at the beginning of the optimizable split loop immediately after (f.3), examines the backward direct call copy procedure, and calls the call destination if there is a procedure call statement. The procedure is recursively examined, and if there is a procedure call statement of the procedure to be copied, it is changed to the procedure intermediate statement of the backward intermediate copy procedure or the procedure call statement of the backward optimizable copy procedure, ( .4) statements other than aft optimizable statements other than and 該手 continued call statement, the process of deleting aft direct call copy procedure, aft optimizable copy procedure, the aft intermediate copying procedure,
(G) a process of (g.1) creating an original direct call copy procedure, an original optimizable copy procedure, and an original intermediate copy procedure by copying the direct call procedure, the optimizable procedure, and the intermediate procedure, (g.2) replacing the procedure call statement of the direct call procedure in the non-optimizable split loop with the procedure call statement of the original direct call copy procedure, (g.3) examining the original direct call copy procedure, recursively examining the called procedure if there is a procedure call statement, and, if there is a procedure call statement of a copy target procedure, replacing it with the procedure call statement of the original intermediate copy procedure or the original optimizable copy procedure, and (g.4) deleting the optimizable statements from the original direct call copy procedure, the original optimizable copy procedure, and the original intermediate copy procedure, and
(H) a process of optimizing the optimizable split loop.
[0014]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
First, before describing an embodiment of the present invention, a configuration of a sequential processing computer to which the present invention is applied will be described.
FIG. 13 is a diagram illustrating an example of a configuration of a sequential processing computer to which the present invention is applied.
The sequential processing computer 50 includes a processor 51, a memory 53, and a bus 52 connecting the processor 51 and the memory 53.
In the sequential processing computer 50, each step of the optimizing compiler and the input program as shown in FIG. 1 are stored in the memory 53, and the processor 51 sequentially reads and executes them. The intermediate data output during the processing is also stored in the memory 53.
FIG. 1 is a diagram showing a configuration of a compiler for executing the loop dividing method of the present invention.
As shown in FIG. 1, the compiler 10 includes a syntax analysis unit 11, a loop division unit 13, an optimization unit 15, and a code generation unit 17. Here, the loop dividing unit 13 is a processing procedure newly installed in the present invention, and other parts that have been conventionally installed can be used as they are.
The syntax analysis unit 11 reads the input program 90 and generates an intermediate language 91. The intermediate language 91 is a representation of a program in the compiler, and its format is not particularly different from that of a normal compiler, and thus will not be described in detail here.
The loop division unit 13 uses the result of the syntax analysis unit 11 to detect a loop to which the loop division method of the present invention can be applied, and converts the loop structure by applying the loop division method. The loop division unit 13 includes a dependency analysis unit 131, a loop division analysis conversion unit 132, a sentence analysis unit 133, a procedure copy analysis unit 134, a procedure copy conversion unit 135, and a sentence conversion unit 136. The dependency analysis unit 131 works in the same way as the interprocedural dependency analysis with which an ordinary compiler analyzes dependencies across procedures, and thus will not be described in detail here. The units 132 to 136 will be described later.
[0015]
FIG. 2 is a flowchart showing an execution method of the loop division analysis conversion unit 132 in FIG.
The execution method of the loop division analysis conversion unit 132 includes steps 13201 to 13210.
In step 13201, a statement included in the dependency cycle is found from the loop using the inter-procedure dependency analysis result of the dependency analysis unit 131.
In step 13202, the positions of the statements in the loop are moved, within a range that does not break the dependencies, so that the statements included in a dependency cycle are arranged contiguously with one another and the statements not included in a dependency cycle are arranged contiguously with one another.
In step 13203, the statements in the loop are followed from the top in order of appearance, and in step 13204 it is determined whether the statement is included in a dependency cycle. At this time, even if the statement is a procedure call statement, the statements of the called procedure are not followed. If the statement is included in a dependency cycle, step 13207 is executed. If the statement is not included in a dependency cycle, step 13206 determines whether the next statement is included in a dependency cycle. If it is, step 13205 sets a loop split point so that the loop is divided immediately after the statement. If it is not, step 13207 is executed.
[0016]
In step 13207, it is determined whether the statement is the last statement in the loop. If it is, loop division is performed in step 13208 using the split points set in step 13205. If it is not, the process returns to step 13203 and continues with the next statement. In step 13209, it is determined whether each split loop generated in step 13208 is an optimizable loop, which can be optimized, or a non-optimizable loop, which cannot. When the split loops are to be parallelized, a split loop is an optimizable loop if it contains no loop-carried flow dependence, and a non-optimizable loop if it does. When the split loops are to be vectorized, a split loop is an optimizable loop if it contains no dependency cycle, and a non-optimizable loop if it does.
In step 13210, the split loops before and after each split point are checked. If both are optimizable loops or both are non-optimizable loops, the division at that split point is cancelled and the two loops are restored to a single loop.
This concludes the description of the loop division analysis conversion unit 132 with reference to FIG.
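The flow of FIG. 2 can be sketched roughly as follows. This is a minimal illustration and not the compiler's actual implementation: the statement labels `S1` to `S4`, the dependency table `deps`, and both function names are hypothetical, the vectorization criterion (no dependency cycle) is assumed, and the reordering of step 13202 is assumed to have been done already. As a simplification, a split is placed at every boundary between cycle and non-cycle statements, which also has the effect of step 13210 because adjacent loops of the same kind are never separated.

```python
def statements_in_cycles(deps):
    """Return the statements that can reach themselves through dependency edges.

    deps maps a statement label to the labels of statements that depend on it.
    """
    def reaches(start, goal):
        stack, seen = list(deps.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(deps.get(node, ()))
        return False

    return {stmt for stmt in deps if reaches(stmt, stmt)}


def split_loop(order, cyclic):
    """Group consecutive statements that share the same in-cycle status."""
    loops = []
    for stmt in order:
        kind = "non-optimizable" if stmt in cyclic else "optimizable"
        if loops and loops[-1][0] == kind:
            loops[-1][1].append(stmt)      # same kind as the previous statement
        else:
            loops.append((kind, [stmt]))   # a split point falls before stmt
    return loops


# Hypothetical dependency edges: S1 -> S2, S2 -> S1, S2 -> S4, S3 -> S4.
deps = {"S1": ["S2"], "S2": ["S1", "S4"], "S3": ["S4"], "S4": []}
cyclic = statements_in_cycles(deps)
loops = split_loop(["S1", "S2", "S3", "S4"], cyclic)
```

In this sketch `S1` and `S2` form a dependency cycle and end up in one non-optimizable split loop, while `S3` and `S4` form an optimizable split loop.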
[0017]
FIG. 3 is a flowchart showing an execution method of the sentence analysis unit 133 in FIG.
The execution method of the sentence analyzing unit 133 includes steps 13301 to 13308.
In step 13301, it is determined whether the loop is a non-optimizable split loop. If it is, in step 13302 the statements in the non-optimizable split loop are followed in order and, if there is a procedure call statement, the statements in the called procedure are followed recursively. If it is not, the statements in the loop are set as optimizable statements in step 13308. A statement in a procedure called from a procedure call statement in the loop is an optimizable statement with respect to that procedure call statement.
In step 13303, it is determined whether or not the statement is a procedure call statement. If the statement is a procedure call statement, the process returns to step 13302 to continue the processing for the next statement. If the statement is not a procedure call statement, execute step 13304.
In step 13304, it is determined whether the statement can be optimized. When parallelization is performed as the optimization, a statement can be optimized if it has no loop-carried flow dependence. When vectorization is performed as the optimization, a statement can be optimized if it is not included in a dependency cycle. If the statement can be optimized, it is set as an optimizable statement in step 13305. If it cannot, it is set as a non-optimizable statement in step 13306.
In step 13307, it is determined whether the sentence is the last sentence in the split loop. If it is the last sentence, the process ends. If it is not the last sentence, the process returns to step 13302 to continue the process for the next sentence.
This concludes the description of the sentence analysis unit 133 with reference to FIG.
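The classification of FIG. 3 can be illustrated with a short sketch. Everything named here (`Stmt`, `classify`, the labels `S1`, `P1`, `P2`, and the procedure table) is hypothetical. The sketch assumes the vectorization criterion, takes the in-cycle flag of each statement as given by the dependency analysis, and assumes the called procedures are not recursive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stmt:
    label: str                    # hypothetical statement label
    in_cycle: bool = False        # assumed result of the dependency analysis
    callee: Optional[str] = None  # procedure name when this is a call statement


def classify(body, procedures, result=None):
    """Follow the statements of a non-optimizable split loop, descending into callees."""
    if result is None:
        result = {}
    for stmt in body:
        if stmt.callee is not None:
            # A call statement is not classified itself; its callee is examined.
            classify(procedures[stmt.callee], procedures, result)
        elif stmt.in_cycle:
            result[stmt.label] = "non-optimizable"
        else:
            result[stmt.label] = "optimizable"
    return result


procedures = {"sub1": [Stmt("P1", in_cycle=True), Stmt("P2")]}
loop_body = [Stmt("S1", in_cycle=True), Stmt("CALL", callee="sub1")]
labels = classify(loop_body, procedures)
```

Here `P2` is an optimizable statement found inside a procedure called from a non-optimizable split loop, which is the situation the later units act on.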
[0018]
FIG. 4 is a flowchart showing an execution method of the procedure copy analysis unit 134 in FIG.
The execution method of the procedure copy analysis unit 134 includes steps 13400 to 13412.
In step 13400, a call graph is created. The call graph is a tree-like graph that represents the procedure call relationships, with each procedure as a node and an edge from each calling procedure to each procedure it calls. The method for creating the call graph is no different from that of an ordinary compiler, and thus will not be described in detail here.
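A call graph of this kind can be sketched as a dictionary from each procedure to the procedures it calls. The procedure names and both helper functions below are hypothetical. The reading of the call chain as "direct call procedure, intermediate procedures, optimizable procedure" is an assumption made for illustration.

```python
def build_call_graph(calls):
    """Map each procedure name to the sorted list of distinct procedures it calls."""
    return {name: sorted(set(callees)) for name, callees in calls.items()}


def call_path(graph, start, goal, path=()):
    """Return one chain of procedures leading from start to goal, or None."""
    path = path + (start,)
    if start == goal:
        return list(path)
    for callee in graph.get(start, []):
        if callee not in path:            # guard against recursive call chains
            found = call_path(graph, callee, goal, path)
            if found:
                return found
    return None


# Hypothetical program: main calls sub1, sub1 calls mid, mid calls leaf.
graph = build_call_graph({"main": ["sub1"], "sub1": ["mid", "mid"],
                          "mid": ["leaf"], "leaf": []})
chain = call_path(graph, "sub1", "leaf")
```

Under the assumed reading, the first element of `chain` would be the procedure called directly from the loop, the last would be the procedure containing the optimizable statement, and anything in between would be an intermediate procedure.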
In step 13401, the statements in the non-optimizable split loop are followed and, if there is a procedure call statement, the statements in the called procedure are followed recursively. In step 13402, it is determined whether the statement being followed is an optimizable statement. If it is, the direct call procedure, the optimizable procedure, and the intermediate procedure are found using the call graph created in step 13400 and set as copy target procedures, and step 13404 is executed. If it is not, step 13412 is executed.
In step 13404, it is determined whether or not there is a dependency between the self-statement and the non-optimizable statement in the non-optimizable division loop. If there is a dependency, step 13406 is executed. If there is no dependency, step 13405 is executed.
[0019]
In Step 13405, the granularity of the divided loop immediately before the non-optimizable divided loop is compared with the granularity of the divided loop immediately after. Here, the granularity is defined as the execution cost of the loop body. Specifically, the comparison is made based on the number of statements or the number of instructions in the loop body.
When the granularity of the immediately preceding division loop is smaller, step 13407 is executed. Otherwise, perform step 13408.
In step 13406, it is determined whether the only dependency between the self-statement and the non-optimizable statement is a dependency from the self-statement to the non-optimizable statement. If so, step 13408 is executed. If not, step 13409 is executed.
In step 13407, the position where the procedure call statement of the direct call copy procedure is inserted is set as the head of the immediately following division loop.
In step 13408, the insertion position of the procedure call statement of the direct call copy procedure is set to the end of the immediately preceding division loop.
In step 13409, it is determined whether the only dependency between the self-statement and the non-optimizable statement is a dependency from the non-optimizable statement to the self-statement. If so, step 13407 is executed. If not, step 13410 is executed.
In step 13410, the self-statement is reset as a non-optimizable statement.
In step 13411, the procedure call statements that call a copy target procedure, both in the non-optimizable split loop and in the copy target procedures, are set as the positions to be replaced with procedure call statements of the copy procedures, that is, the procedures created by copying.
In step 13412, it is determined whether the self-statement is the last statement of the non-optimizable split loop. If it is, the process ends. If it is not, the process returns to step 13401 and continues with the next statement.
This concludes the description of the procedure copy analysis unit 134 with reference to FIG. 4.
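The decision made in steps 13404 to 13410 can be summarized as a small function. This is a sketch only: the function name, the parameter names, and the returned strings are hypothetical, and the granularity is assumed to be available as a statement count for each neighboring split loop.

```python
def insertion_position(dep_to_nonopt, dep_from_nonopt, cost_before, cost_after):
    """Decide where the call to the direct call copy procedure goes.

    dep_to_nonopt:   a dependency exists from the self-statement to a
                     non-optimizable statement
    dep_from_nonopt: a dependency exists from a non-optimizable statement to
                     the self-statement
    cost_before / cost_after: granularity of the split loops immediately
                     before and after the non-optimizable split loop
    """
    if not dep_to_nonopt and not dep_from_nonopt:
        # Step 13405: no dependency either way, so compare granularity.
        if cost_before < cost_after:
            return "head_of_following_loop"    # step 13407
        return "end_of_preceding_loop"         # step 13408
    if dep_to_nonopt and not dep_from_nonopt:
        return "end_of_preceding_loop"         # steps 13406 and 13408
    if dep_from_nonopt and not dep_to_nonopt:
        return "head_of_following_loop"        # steps 13409 and 13407
    return "reset_as_non_optimizable"          # step 13410
```

For example, a statement that only feeds values into the non-optimizable statements is placed at the end of the preceding split loop, and a statement with dependencies in both directions stays in the non-optimizable split loop.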
[0020]
FIG. 5 is a flowchart showing a method of executing the procedure copy conversion unit 135 in FIG.
The execution method of the procedure copy conversion unit 135 includes steps 13501 to 13506.
In step 13501, the statements in the non-optimizable split loop are followed and, if there is a procedure call statement, the statements in the called procedure are followed recursively. In step 13502, it is determined whether the statement being followed is a procedure call statement. If it is, step 13503 is executed. If it is not, the process returns to step 13501 and continues with the next statement.
In step 13503, it is determined whether the called procedure is a procedure to be copied. If the procedure is a copy target procedure, step 13504 is executed. If the procedure is not a copy target procedure, the process returns to step 13501 to continue the processing for the next statement.
In step 13504, the copy target procedure is copied to create a copy procedure. As many copy procedures are created as there are positions at which a procedure call statement of the copy procedure is to be inserted or replaced.
In step 13505, it is determined whether the sentence is the last sentence of the non-optimizable split loop.
If it is the last sentence, step 13506 is executed. If it is not the last sentence, the process returns to step 13501 to continue the processing for the next sentence.
In step 13506, the procedure call statement of the copy procedure is inserted or replaced at all the insertion positions and the replacement positions of the procedure call statement of the copy procedure.
This is the end of the description of the procedure copy conversion unit 135 with reference to FIG.
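The procedure copying of FIG. 5 can be sketched as follows. The naming scheme `_copyK_sub1` follows the copy procedure names used in the example programs, while the function `make_copies`, the representation of a procedure as a list of statement strings, and the statements themselves are hypothetical.

```python
import copy


def make_copies(procedures, target, count):
    """Add `count` independent copies of procedures[target] and return their names."""
    names = []
    for k in range(1, count + 1):
        name = f"_copy{k}_{target}"
        procedures[name] = copy.deepcopy(procedures[target])
        names.append(name)
    return names


# Hypothetical procedure body: one statement per string.
procedures = {"sub1": ["B(i) = A(i) * 2", "C(i) = D(i) + 1"]}
names = make_copies(procedures, "sub1", 2)

# One call is replaced in the non-optimizable split loop,
# and one call is inserted into the neighboring optimizable split loop.
non_optimizable_loop = ["A(i) = B(i-1) + 1", f"call {names[0]}"]
optimizable_loop = [f"call {names[1]}", "C(i) = C(i) * X"]
```

Because each copy is independent, statements can later be deleted from one copy without affecting the other.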
[0021]
FIG. 6 is a flowchart illustrating an execution method of the sentence conversion unit 136.
The execution method of the sentence conversion unit 136 includes steps 13601 to 13609.
In step 13601, the statements in the split loop are recursively followed by the statement in the called procedure if there is a procedure call statement, and it is determined in step 13602 whether the statement being traced is an optimizable statement. If it is an optimizable statement, execute step 13603. If the sentence is not an optimizable statement, step 13604 is executed.
In step 13603, it is determined whether the divided loop is an optimizable divided loop.
If it is an optimizable division loop, step 13607 is executed. If it is not an optimizable division loop, step 13605 is executed.
In step 13604, it is determined whether the divided loop is an optimizable divided loop. If it is an optimizable division loop, step 13605 is executed. If it is not an optimizable division loop, step 13607 is executed.
In step 13605, it is determined whether the statement is a procedure call statement. If it is a procedure call statement, execute step 13607. If it is not a procedure call statement, step 13606 is executed.
In step 13606, the sentence is deleted, and the process returns to step 13601 to continue the process for the next sentence of the deleted sentence.
In step 13607, it is determined whether the sentence is the last sentence in the split loop. If it is the last sentence, step 13608 is executed. If it is not the last sentence, the process returns to step 13601 to continue the process for the next sentence.
In step 13608, variables not referenced in the copy procedure are deleted from the procedure arguments.
In step 13609, copy procedures that contain no executable statement are deleted together with the procedure call statements that call them.
This concludes the description of the sentence conversion unit 136 with reference to FIG. 6.
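The deletion rule of steps 13602 to 13606 reduces to a simple condition: a statement is kept when it is a procedure call statement or when its kind matches the kind of the split loop it is reached from. The sketch below illustrates this. The function names, the tuple layout, and the labels `P1` and `P2` are hypothetical.

```python
def keep_statement(stmt_optimizable, loop_optimizable, is_call):
    """Return True when the statement survives the conversion."""
    return is_call or stmt_optimizable == loop_optimizable


def prune(body, loop_optimizable):
    """Keep only the labels of the statements that belong in this split loop."""
    return [label for (label, optimizable, is_call) in body
            if keep_statement(optimizable, loop_optimizable, is_call)]


# Hypothetical copy procedure body: (label, is optimizable, is a call statement).
body = [("P1", False, False), ("P2", True, False)]
kept_in_non_optimizable = prune(body, loop_optimizable=False)
kept_in_optimizable = prune(body, loop_optimizable=True)
```

Applied to two copies of the same procedure, the rule leaves the non-optimizable statement in the copy called from the non-optimizable split loop and the optimizable statement in the copy called from the optimizable split loop.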
This concludes the description of the loop division unit 13.
[0022]
The optimization unit 15 performs optimization conversion such as parallelization and vectorization on the program using the result of the loop division unit 13. The configuration is not particularly different from the case of a normal optimizing compiler, and thus will not be described in detail here.
The code generation unit 17 reads the intermediate language 91 and generates an output program 92. Since this processing is not particularly different from that of a normal compiler, it will not be described in detail here.
This concludes the description of the loop division method according to the present invention.
[0023]
The input program 90 in FIG. 7 is an example of a source image of the intermediate language 91 input to the loop division unit 13 of the compiler 10. The program 93 in FIG. 8 is an example of a source image of the output of the loop division analysis conversion unit 132 to the intermediate language 91. The program 94 in FIG. 9 is an example of a source image of the output of the procedure copy conversion unit 135 to the intermediate language 91, and the program 95 in FIG. 10 is an example of a source image of the output of the sentence conversion unit 136 to the intermediate language 91. The program 96 in FIG. 11 is an example of a source image of the output from the optimization unit 15 to the intermediate language 91 when the compiler 10 performs parallelization as the optimization, and the program 97 in FIG. 12 is the corresponding example when vectorization is performed. An example of loop division by the compiler 10 will be described using the programs 90, 92 to 97.
[0024]
In the input program 90 of FIG. 7, the numbers at the left end of the figure are line numbers added to facilitate understanding. The first line is a declaration statement for the integer variables i and N, the second line is an initialization statement for the integer variable N, and the third line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N and the real variable X. The fourth to eighth lines are the main processing unit of the input program 90. The tenth line is the procedure statement of the procedure sub1 called from the sixth line of the main processing unit, the eleventh line is a declaration statement for the integer variables i and N, the twelfth line is an initialization statement for the integer variable N, the thirteenth line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N, and the fourteenth to seventeenth lines are the processing unit of the procedure sub1 of the input program 90.
[0025]
In the output program 93 of the loop division analysis conversion unit 132 in FIG. 8, the numbers at the left end of the figure are line numbers added to facilitate understanding. The first line is a declaration statement for the integer variables i and N, the second line is an initialization statement for the integer variable N, and the third line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N and the real variable X. The fourth to tenth lines are the main processing unit of the output program 93 (lines 4 to 7 are the non-optimizable part, and lines 8 to 10 are the optimizable part). The twelfth line is the procedure statement of the procedure sub1 called from the sixth line of the main processing unit, the thirteenth line is a declaration statement for the integer variables i and N, the fourteenth line is an initialization statement for the integer variable N, the fifteenth line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N, and the sixteenth to nineteenth lines are the processing unit of the procedure sub1 of the output program 93.
[0026]
In the output program 94 of the procedure copy conversion unit 135 in FIG. 9, the numbers at the left end of the figure are line numbers added to facilitate understanding. The first line is a declaration statement for the integer variables i and N, the second line is an initialization statement for the integer variable N, and the third line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N and the real variable X. The fourth to eleventh lines are the main processing unit of the output program 94 (lines 4 to 7 are the non-optimizable part, and lines 8 to 11 are the optimizable part). Line 13 is the procedure statement of the copy procedure _copy1_sub1 called from line 6 of the main processing unit, line 14 is a declaration statement for the integer variables i and N, line 15 is an initialization statement for the integer variable N, line 16 is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N, and lines 17 to 20 are the processing unit of the procedure _copy1_sub1 of the output program 94. Line 22 is the procedure statement of the copy procedure _copy2_sub1 called from line 9 of the main processing unit, line 23 is a declaration statement for the integer variables i and N, line 24 is an initialization statement for the integer variable N, line 25 is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N, and lines 26 to 29 are the processing unit of the procedure _copy2_sub1 of the output program 94.
[0027]
In the output program 95 of the sentence conversion unit 136 in FIG. 10, the numbers at the left end of the figure are line numbers added to facilitate understanding. The first line is a declaration statement for the integer variables i and N, the second line is an initialization statement for the integer variable N, and the third line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N and the real variable X. Lines 4 to 11 are the main processing unit of the output program 95 (lines 4 to 7 are the non-optimizable part, and lines 8 to 11 are the optimizable part). Line 13 is the procedure statement of the copy procedure _copy1_sub1 called from line 6 of the main processing unit, line 14 is a declaration statement for the integer variables i and N, line 15 is an initialization statement for the integer variable N, line 16 is a declaration statement for the real one-dimensional arrays A and B of size 1 to N, and lines 17 to 19 are the processing unit of the procedure _copy1_sub1 of the output program 95 (the non-optimizable part). Line 21 is the procedure statement of the copy procedure _copy2_sub1 called from line 9 of the main processing unit, line 22 is a declaration statement for the integer variables i and N, line 23 is an initialization statement for the integer variable N, line 24 is a declaration statement for the real one-dimensional arrays C and D of size 1 to N, and lines 25 to 27 are the processing unit of the procedure _copy2_sub1 of the output program 95 (the optimizable part).
[0028]
In the output program 96 of FIG. 11, produced when the optimization unit 15 performs parallelization, the numbers at the left end of the figure are line numbers added to facilitate understanding. The first line is a declaration statement for the integer variables i and N and the number of processors Nproc, the second line is an initialization statement for the integer variable N, and the third line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N and the real variable X. Lines 4 to 11 are the main processing unit of the output program 96 (lines 4 to 7 are the non-optimizable part, and lines 8 to 11 are the optimizable part). Line 13 is the procedure statement of the copy procedure _copy1_sub1 called from line 6 of the main processing unit, line 14 is a declaration statement for the integer variables i and N, line 15 is an initialization statement for the integer variable N, line 16 is a declaration statement for the real one-dimensional arrays A and B of size 1 to N, and lines 17 to 19 are the processing unit of the procedure _copy1_sub1 of the output program 96 (the non-optimizable part). Line 21 is the procedure statement of the copy procedure _copy2_sub1 called from line 9 of the main processing unit, line 22 is a declaration statement for the integer variables i and N, line 23 is an initialization statement for the integer variable N, line 24 is a declaration statement for the real one-dimensional arrays C and D of size 1 to N, and lines 25 to 27 are the processing unit of the procedure _copy2_sub1 of the output program 96 (the optimizable part).
[0029]
In the output program 97 of FIG. 12, produced when the optimization unit 15 performs vectorization, the numbers at the left end of the figure are line numbers added to facilitate understanding. The first line is a declaration statement for the integer variables i and N, the second line is an initialization statement for the integer variable N, and the third line is a declaration statement for the real one-dimensional arrays A, B, C, and D of size 1 to N and the real variable X. The fourth to ninth lines are the main processing unit of the output program 97. The eleventh line is the procedure statement of the copy procedure _copy1_sub1 called from line 6 of the main processing unit, line 12 is a declaration statement for the integer variables i and N, line 13 is an initialization statement for the integer variable N, line 14 is a declaration statement for the real one-dimensional arrays A and B of size 1 to N, and lines 15 to 17 are the processing unit of the procedure _copy1_sub1 of the output program 97.
[0030]
As is clear from FIGS. 7 to 12, the loop division unit 13 converts the input program 90 into the output program 95. To do so, the loop division analysis conversion unit 132 first converts the input program 90 into the output program 93 (see FIG. 8). The loop on lines 4 to 8 of the input program 90 was divided into the loop on lines 4 to 7 and the loop on lines 8 to 10 of the output program 93. The former is a non-optimizable split loop, and the latter is an optimizable split loop.
Next, the procedure copy conversion unit 135 converts the output program 93 into the output program 94 (see FIG. 9). The procedure call statement on line 6 of the output program 93 calls the procedure sub1, whereas the procedure call statement on line 6 of the output program 94 calls the copy procedure _copy1_sub1, which is a copy of the procedure sub1. In addition, a procedure call statement that calls the copy procedure _copy2_sub1, also a copy of the procedure sub1, has been added on line 9 of the output program 94.
The procedure bodies have likewise changed from sub1 to _copy1_sub1 and _copy2_sub1. The body of the original procedure sub1 has not been deleted from the program, but it is omitted from the output program 94 because the procedure call statements that called it have been removed by the procedure copy conversion unit 135. It is omitted from the output programs 95 to 97 for the same reason.
[0031]
Next, the sentence conversion unit 136 converts the output program 94 into the output program 95 (see FIG. 10). The assignment statement on line 18 of the output program 94 is an optimizable statement, and the assignment statement on line 26 is a non-optimizable statement. They were deleted so that they are not executed in the non-optimizable split loop on lines 4 to 7 and in the optimizable split loop on lines 8 to 10 of the output program 95, respectively.
Next, when performing parallelization, the optimization unit 15 converts the output program 95 into the output program 96 (see FIG. 11). The optimizable split loop on lines 8 to 11 of the output program 95 was converted into a parallel loop on lines 8 to 11 of the output program 96. The loop range was changed from 1 to N to 1 to N/Nproc.
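The text does not show how the iterations of the parallel loop are distributed among the processors. One plausible reading, assumed here purely for illustration, is a block partition in which each of the Nproc processors runs N/Nproc consecutive iterations. The function name `block_ranges` is hypothetical, and N is assumed to be divisible by Nproc.

```python
def block_ranges(n, nproc):
    """Split the iterations 1..n into nproc contiguous blocks of equal size.

    Assumes n is divisible by nproc; each tuple is (first, last), inclusive.
    """
    size = n // nproc
    return [(p * size + 1, (p + 1) * size) for p in range(nproc)]


ranges = block_ranges(8, 4)
```

With N = 8 and Nproc = 4, each processor in this sketch executes two iterations, and the blocks together cover the original range 1 to N exactly once.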
When performing vectorization, the optimization unit 15 converts the output program 95 into the output program 97 (see FIG. 12). The optimizable split loop on lines 8 to 11 of the output program 95 was converted into an array assignment statement on lines 8 and 9 of the output program 97 by inlining the copy procedure _copy2_sub1 and then vectorizing the loop. An array assignment statement has the same meaning as a vectorizable loop. This concludes the description of an example of loop division using the programs 90, 92 to 97.
[0032]
If the loop division of the present invention is not applied, the loop on lines 4 to 8 of the input program 90 is divided into a non-optimizable split loop that executes lines 5 and 6 and an optimizable split loop that executes line 7, so only the statement on line 7 can be parallelized or vectorized.
With the loop division according to the present invention, however, the statements on lines 14 and 15 in the called procedure sub1 can also be assigned to different split loops. As a result, the statement on line 15 could be assigned to an optimizable split loop in addition to the statement on line 7, and the number of statements that can be parallelized or vectorized increased from one to two. In addition, since no temporary variables are used, no extra statements need to be executed, and the program can run even faster.
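The effect of the transformation can be checked on a small stand-in program. The sketch below is not the program of FIG. 7, whose statements are not reproduced in the text: the array contents, the three assignment statements, and the function names are all hypothetical and were chosen only so that one statement of the called procedure takes part in a dependency cycle with the caller and the other does not.

```python
def run_original(n, x):
    """One loop that calls sub1, which mixes a cyclic and an independent statement."""
    a, b = [0.0] * (n + 1), [1.0] * (n + 1)
    c, d = [0.0] * (n + 1), [float(i) for i in range(n + 1)]

    def sub1(i):
        b[i] = a[i] * 2.0          # in a dependency cycle with the caller
        c[i] = d[i] + 1.0          # independent of the cycle

    for i in range(1, n + 1):
        a[i] = b[i - 1] + 1.0      # reads the b written in the previous iteration
        sub1(i)
        c[i] = c[i] * x
    return a, b, c


def run_split(n, x):
    """The same computation after loop division with two pruned copies of sub1."""
    a, b = [0.0] * (n + 1), [1.0] * (n + 1)
    c, d = [0.0] * (n + 1), [float(i) for i in range(n + 1)]

    def copy1_sub1(i):
        b[i] = a[i] * 2.0          # only the non-optimizable statement remains

    def copy2_sub1(i):
        c[i] = d[i] + 1.0          # only the optimizable statement remains

    for i in range(1, n + 1):      # non-optimizable split loop
        a[i] = b[i - 1] + 1.0
        copy1_sub1(i)
    for i in range(1, n + 1):      # optimizable split loop, no loop-carried dependence
        copy2_sub1(i)
        c[i] = c[i] * x
    return a, b, c


same = run_original(5, 3.0) == run_split(5, 3.0)
```

In this sketch the second loop contains two statements with no loop-carried dependence, where a division that treated the call as indivisible would have left only one, and no temporary array was needed to carry values between the two loops.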
[0033]
[Effect of the invention]
As described above, according to the present invention, the statement of the procedure call destination is also subject to optimization after loop division in the caller loop, so that a program or object code with higher execution performance can be obtained. Further, since the loop can be divided without introducing a temporary variable, the execution performance of the loop before the division is not deteriorated.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a compiler that executes a loop division method according to an embodiment of the present invention.
FIG. 2 is a processing flowchart of a loop division analysis conversion unit in FIG. 1;
FIG. 3 is a processing flowchart of a sentence analysis unit in FIG. 1;
FIG. 4 is a processing flowchart of a procedure copy analysis unit in FIG. 1;
FIG. 5 is a processing flowchart of a procedure copy conversion unit in FIG. 1;
FIG. 6 is a processing flowchart of a sentence conversion unit in FIG. 1;
FIG. 7 is an example of an input program used in the present invention.
FIG. 8 is an example diagram of a program divided into loops according to the present invention.
FIG. 9 is a diagram showing an example of a program subjected to procedure copy conversion according to the present invention.
FIG. 10 is a diagram showing an example of a sentence-converted program according to the present invention.
FIG. 11 is an example of an output program parallelized according to the present invention.
FIG. 12 is an example of an output program vectorized by the present invention.
FIG. 13 is a diagram showing an example of a configuration of a sequential processing computer to which the present invention is applied.
[Explanation of symbols]
10: compiler of the present invention, 11: syntax analysis unit, 13: loop division unit,
15: optimization unit, 17: code generation unit, 50: sequential processing computer, 51: processor,
52: bus, 53: memory, 90: input program, 91: intermediate language,
93: loop-divided program, 92: output program,
94: procedure-copy-converted program, 95: statement-converted program,
96: parallelized output program, 97: vectorized output program,
131: dependence analysis unit, 132: loop division analysis conversion unit, 133: statement analysis unit,
134: procedure copy analysis unit, 135: procedure copy conversion unit, 136: statement conversion unit.

Claims

1. A loop division method in which a loop that includes a procedure call statement and appears in a program written for a sequential processing computer is divided into optimizable split loops, which can be optimized, and non-optimizable split loops, which cannot be optimized, the method comprising the steps of:
(a) when an optimizable part exists in the called procedure of a procedure call statement included in a non-optimizable split loop, classifying that optimizable part as either movable to an optimizable split loop or immovable; and
(b) moving the movable optimizable part into an optimizable split loop.

2. A loop division method for dividing a loop that includes a procedure call statement and appears in a program written for a sequential processing computer, the method comprising:
(a) a dependence analysis step of examining, for the loop, the reference relations of variables across procedures;
(b) a step of dividing the loop into a plurality of optimizable split loops, which can be optimized, and a plurality of non-optimizable split loops, which cannot be optimized;
(c) a statement analysis step of recursively examining the statements in the non-optimizable split loops and, where a statement is a procedure call statement, the statements in the called procedure, and detecting optimizable statements and non-optimizable statements;
(d) a procedure copy analysis step of creating a call graph that represents the procedure call relation, with procedures as nodes and an edge from each calling procedure to each called procedure, and detecting the procedures to be copied;
(e) a copy conversion step of detecting, among the optimizable statements in a non-optimizable split loop, the statements that can be moved to an optimizable split loop, and moving those movable statements by copying procedures so that the dependences between variables remain satisfied after the move; and
(f) an optimization step of optimizing the optimizable split loops.
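
The recursive statement examination described above can be sketched as follows. This is only an illustrative model, not the patented implementation: the `Stmt` record, the `in_cycle` flag (standing in for the result of dependence analysis) and the labels `S6`, `S14`, `S15`, `SC` are all assumptions made for the example. A statement is treated as non-optimizable when it lies on a dependence cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stmt:
    label: str                    # hypothetical statement label
    in_cycle: bool = False        # assumed output of dependence analysis
    callee: Optional[str] = None  # set when the statement is a procedure call


def analyze(stmts, procedures, seen=None):
    """Return (optimizable, non_optimizable) labels, descending into callees."""
    seen = set() if seen is None else seen
    optimizable, non_optimizable = [], []
    for s in stmts:
        if s.callee is not None:
            if s.callee not in seen:  # guard against recursive call chains
                seen.add(s.callee)
                o, n = analyze(procedures[s.callee], procedures, seen)
                optimizable += o
                non_optimizable += n
        elif s.in_cycle:
            non_optimizable.append(s.label)
        else:
            optimizable.append(s.label)
    return optimizable, non_optimizable


# Hypothetical non-optimizable split loop that calls sub1.
procedures = {"sub1": [Stmt("S14", in_cycle=True), Stmt("S15")]}
loop_body = [Stmt("S6", in_cycle=True), Stmt("SC", callee="sub1")]
opt, non_opt = analyze(loop_body, procedures)
```

In this toy input the statement `S15` inside the called procedure is found to be optimizable even though the loop that contains the call is not.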

3. The loop division method according to claim 1 for dividing a loop that includes a procedure call statement and appears in a program written for a sequential processing computer, comprising a step of performing parallelization or vectorization as the optimization.

4. The loop division method according to claim 2, wherein the procedure copy analysis step includes a step of detecting, as procedures to be copied:
(d.1) a direct call procedure, which is called directly from the non-optimizable split loop;
(d.2) an optimizable procedure, which contains an optimizable statement; and
(d.3) an intermediate procedure, which is reached by following edges of the call graph from the direct call procedure according to the procedure call relation and appears between the direct call procedure and the optimizable procedure.
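
One way to picture the detection of procedures to be copied is as two reachability searches over the call graph. The sketch below assumes the call graph is a dictionary from caller to callees; the function names and the procedures `sub1`, `sub2`, `sub3`, `util` are hypothetical. It also assumes that only procedures lying on a call path from a direct call procedure to an optimizable procedure are of interest, which the claim text does not state explicitly.

```python
def reachable(graph, starts):
    """Return every node reachable from `starts`, including the starts."""
    seen, stack = set(), list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def copy_targets(call_graph, direct_calls, optimizable_procs):
    """Classify the procedures on paths from direct calls to optimizable ones."""
    # Reverse the edges so we can ask "who can reach an optimizable procedure?"
    reverse = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    below_loop = reachable(call_graph, direct_calls)
    above_opt = reachable(reverse, optimizable_procs)
    on_path = below_loop & above_opt
    direct = on_path & set(direct_calls)
    optimizable = on_path & set(optimizable_procs)
    intermediate = on_path - direct - optimizable
    return direct, intermediate, optimizable


# Hypothetical call graph: the loop calls sub1; sub3 holds an optimizable statement.
call_graph = {"sub1": ["sub2", "util"], "sub2": ["sub3"], "sub3": [], "util": []}
direct, intermediate, optimizable = copy_targets(call_graph, ["sub1"], ["sub3"])
```

Here `util` is called from `sub1` but leads to no optimizable statement, so it is not selected for copying.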

5. The loop division method according to claim 2, wherein the copy conversion step includes:
(a) a forward copy conversion step of, when the optimizable statements in the non-optimizable split loop include a forward optimizable statement, which has no data dependence from a non-optimizable statement to that optimizable statement,
(a.1) creating a forward direct call copy procedure, a forward optimizable copy procedure, and a forward intermediate copy procedure by copying the direct call procedure, the optimizable procedure, and the intermediate procedure,
(a.2) inserting a procedure call statement of the forward direct call copy procedure at the end of the optimizable split loop immediately before the non-optimizable split loop,
(a.3) recursively examining the forward direct call copy procedure and, where it contains procedure call statements, the called procedures, and changing each procedure call statement of a procedure to be copied into a procedure call statement of the forward intermediate copy procedure or the forward optimizable copy procedure, and
(a.4) deleting, from the forward direct call copy procedure, the forward optimizable copy procedure, and the forward intermediate copy procedure, every statement other than the forward optimizable statements and the procedure call statements of the procedures to be copied;
(b) a backward copy conversion step of, when the optimizable statements in the non-optimizable split loop include a backward optimizable statement, which has no data dependence from that optimizable statement to a non-optimizable statement,
(b.1) creating a backward direct call copy procedure, a backward optimizable copy procedure, and a backward intermediate copy procedure by copying the direct call procedure, the optimizable procedure, and the intermediate procedure,
(b.2) inserting a procedure call statement of the backward direct call copy procedure at the beginning of the optimizable split loop immediately after the non-optimizable split loop,
(b.3) recursively examining the backward direct call copy procedure and, where it contains procedure call statements, the called procedures, and changing each procedure call statement of a procedure to be copied into a procedure call statement of the backward intermediate copy procedure or the backward optimizable copy procedure, and
(b.4) deleting, from the backward direct call copy procedure, the backward optimizable copy procedure, and the backward intermediate copy procedure, every statement other than the backward optimizable statements and the procedure call statements of the procedures to be copied; and
(c) an original copy conversion step of
(c.1) creating an original direct call copy procedure, an original optimizable copy procedure, and an original intermediate copy procedure by copying the direct call procedure, the optimizable procedure, and the intermediate procedure,
(c.2) replacing the procedure call statement of the direct call procedure in the non-optimizable split loop with a procedure call statement of the original direct call copy procedure,
(c.3) recursively examining the original direct call copy procedure and, where it contains procedure call statements, the called procedures, and replacing each procedure call statement of a procedure to be copied with a procedure call statement of the original intermediate copy procedure or the original optimizable copy procedure, and
(c.4) deleting the optimizable statements from the original direct call copy procedure, the original optimizable copy procedure, and the original intermediate copy procedure.
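
The forward and original copy conversions can be illustrated with a small model in which a procedure body is a list of `("stmt", label)` or `("call", name)` pairs. Everything named here is an assumption for illustration: the helper `make_copies`, the suffixes `_fwd` and `_org`, and the procedures and labels. The backward case would be analogous, with the call inserted at the beginning of the following optimizable split loop.

```python
def make_copies(procedures, targets, suffix, keep):
    """Copy each target procedure, redirect calls to copies, filter the rest."""
    copies = {}
    for name in targets:
        body = []
        for kind, value in procedures[name]:
            if kind == "call" and value in targets:
                body.append((kind, value + suffix))  # call the copy instead
            elif keep(kind, value):
                body.append((kind, value))
        copies[name + suffix] = body
    return copies


# Hypothetical input: S15 is a forward optimizable statement inside sub2.
procedures = {
    "sub1": [("stmt", "S14"), ("call", "sub2"), ("call", "util")],
    "sub2": [("stmt", "S15"), ("stmt", "S16")],
    "util": [("stmt", "S20")],
}
targets = {"sub1", "sub2"}
forward_opt = {"S15"}

# Forward copies keep only the forward optimizable statements.
forward = make_copies(procedures, targets, "_fwd",
                      lambda kind, value: kind == "stmt" and value in forward_opt)
# Original copies keep everything except the moved optimizable statements.
original = make_copies(procedures, targets, "_org",
                       lambda kind, value: not (kind == "stmt" and value in forward_opt))

# Append the forward call to the preceding optimizable split loop and
# redirect the call in the non-optimizable split loop to the original copy.
optimizable_loop = [("stmt", "S7"), ("call", "sub1_fwd")]
non_optimizable_loop = [
    (k, v + "_org") if k == "call" and v in targets else (k, v)
    for k, v in [("stmt", "S6"), ("call", "sub1")]
]
```

After the conversion, `S15` executes only through `sub1_fwd` in the optimizable split loop, while the remaining statements still execute through `sub1_org` in the non-optimizable split loop, so no statement runs twice.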

6. A recording medium storing a program that implements the processing described in each step of the loop division method according to claim 1.