JPH10320212A - Cache optimizing method

Info

Publication number: JPH10320212A
Application number: JP9130670A
Authority: JP
Inventors: Ichiro Kushima; 伊知郎久島
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-05-21
Filing date: 1997-05-21
Publication date: 1998-12-04

Abstract

PROBLEM TO BE SOLVED: To perform cache optimization based on accurate prediction of cache miss rates. SOLUTION: Step 202 checks whether the current compilation is the first-stage compilation. If it is, processing proceeds to step 203 and the cache simulation code embedding process is performed. Otherwise, processing proceeds to step 204 and the cache optimization process is performed. The code generation process takes the intermediate code 307 as input and generates an object program 309 or 312 written in machine language or assembly language. (The cache simulation object program 309 is generated in the first-stage compilation, and the final object program 312 is generated in the second-stage compilation.)

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to computer utilization technology, and more specifically to a compiling method that reduces the execution time of an object program. In particular, it relates to a compiler optimization method for reducing the performance degradation caused by cache misses.

[0002]

[Prior Art] Many methods have been developed for reducing the execution time of object programs generated by a compiler. One of them is compiler optimization for caches.

[0003] One cache optimization method is software prefetch optimization. In software prefetch optimization, the compiler inserts into the object program instructions that move data from the main storage (memory) to the cache (prefetch instructions), thereby reducing (hiding) the waiting time incurred on a cache miss (the cache miss penalty). A software prefetch optimization method is described in, for example, Bernstein et al., "Compiler Techniques for Data Prefetching on the PowerPC", PACT '95, 1995, pp. 19-26.

[0004] In software prefetch optimization, the compiler generates code that issues a prefetch instruction ahead of each data access (load or store) that is likely to cause a cache miss. That is, the prefetch instruction moves the data from memory to the cache so that the actual access instruction does not miss. (The waiting time is hidden by placing other, unrelated instructions between the prefetch instruction and the load or store instruction.) Consequently, a single data access requires executing two instructions: the prefetch instruction and the access instruction. If software prefetching is applied to a data access that would not have missed in the cache, the prefetch instruction is wasted (pure overhead) and performance degrades accordingly.
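The trade-off described in this paragraph can be made concrete with a simple cost model. The sketch below is illustrative only: the function name, the 30-cycle miss penalty, and the 1-cycle prefetch overhead are assumptions rather than figures from this document, and the model assumes that a prefetch hides the miss penalty completely.

```python
def prefetch_gain(miss_rate, miss_penalty, prefetch_overhead):
    """Average cycles saved per access by prefetching (negative means a loss).

    Assumes every miss is fully hidden by the prefetch and that the
    prefetch instruction costs prefetch_overhead cycles on every access.
    """
    return miss_rate * miss_penalty - prefetch_overhead


# Hypothetical machine parameters (not taken from the document).
MISS_PENALTY = 30
PREFETCH_OVERHEAD = 1

# A reference that misses often benefits; one that rarely misses only pays overhead.
gain_often_missing = prefetch_gain(0.25, MISS_PENALTY, PREFETCH_OVERHEAD)
gain_rarely_missing = prefetch_gain(0.001, MISS_PENALTY, PREFETCH_OVERHEAD)
```

Under these assumptions the break-even miss rate is the overhead divided by the penalty, which is why prefetching only pays off for references that miss frequently enough.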

[0005] To solve this problem, conventional compilers analyze the source program and apply cache optimization (for example, software prefetching) only to data accesses that are likely to cause a cache miss. One method of statically analyzing the likelihood of a cache miss is given in Wolf et al., "A Data Locality Optimizing Algorithm", ACM SIGPLAN '91, 1991, pp. 30-40 (known as reuse analysis). This method examines the subscripts of the array references that appear in loop nests (nested loops) in the program to determine whether the data is reused and, if so, the time interval until it is reused. If a piece of data is found to be reused within a relatively short interval, it is presumed to still be in the cache.

[0006]

[Problems to be Solved by the Invention] Cache optimization based on the above prior art (reuse analysis) inevitably suffers from errors in cache miss prediction. For example, conventional methods often rely on the sizes of arrays and the iteration counts of loops when predicting cache misses, but these values are frequently determined only at run time (and cannot be obtained by static analysis), so the cache miss prediction is often wrong.

[0007] FIG. 4 shows an example of such a program. FIG. 4 is a program written in Fortran that defines a subroutine named FUNC. The parameter declaration 401 shows that FUNC takes the parameters (arguments) A, B, C, N, and M as input, and the variable declarations 402 to 403 show that A, B, and C are arrays and that M and N are integers. The two-dimensional array A ranges from 1 to N in its first dimension and from 1 to M in its second dimension, and the one-dimensional arrays B and C range from 1 to N. Because the sizes of the arrays used in the program are determined by the input parameters M and N, the compiler cannot determine them. The executable part 403 to 407 is a doubly nested loop whose ranges, J from 1 to M and I from 1 to N, are likewise determined by the input parameters M and N. When estimating the cache miss rate of such a program, the loop ranges and array sizes are important clues, so the prediction becomes unreliable if they are not known statically.
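To illustrate why the run-time values of N and M matter, the sketch below computes the memory footprint of the arrays A(N, M), B(N), and C(N) and compares it with a cache capacity. The 8-byte element size and the 8 KiB capacity are assumptions chosen for illustration; the document does not state either.

```python
def footprint_bytes(n, m, elem_size):
    """Bytes occupied by arrays A(N, M), B(N), and C(N)."""
    return (n * m + 2 * n) * elem_size


CACHE_BYTES = 8 * 1024   # hypothetical cache capacity
ELEM_SIZE = 8            # hypothetical element size in bytes

# The same subroutine fits in the cache or not depending on its arguments,
# which the compiler cannot see when it compiles FUNC.
small_fits = footprint_bytes(16, 16, ELEM_SIZE) <= CACHE_BYTES
large_fits = footprint_bytes(1024, 1024, ELEM_SIZE) <= CACHE_BYTES
```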

[0008] Conventional methods also have difficulty predicting conflict misses. A conflict miss is a cache miss that occurs in a direct-mapped or set-associative cache when, even though the cache capacity is sufficient, the cache addresses (the locations in the cache where data is stored) of two or more pieces of data happen to coincide. Such cache misses are difficult to predict by static analysis of the program.
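A conflict miss of this kind can be reproduced with a few lines of simulation. The sketch below models a direct-mapped cache; the geometry (256 lines of 32 bytes) and all names are assumptions made for illustration.

```python
def count_misses(addresses, num_lines, line_size):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block number
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block occupies the line
        if tags[index] != tag:         # miss: the line holds a different block
            tags[index] = tag
            misses += 1
    return misses


# Hypothetical geometry: 256 lines of 32 bytes, i.e. an 8 KiB cache.
NUM_LINES, LINE_SIZE = 256, 32
CACHE_BYTES = NUM_LINES * LINE_SIZE

# Two addresses exactly one cache size apart map to the same cache line.
a, b = 0, CACHE_BYTES
conflict_misses = count_misses([a, b] * 100, NUM_LINES, LINE_SIZE)

# Shifting b by one line gives it a different cache line, so both stay cached.
friendly_misses = count_misses([a, b + LINE_SIZE] * 100, NUM_LINES, LINE_SIZE)
```

Only two blocks are touched in either case, far below the capacity of the cache, yet the first access pattern misses on every access because the two blocks keep evicting each other.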

[0009] Because cache miss predictions based on static program analysis are prone to error, optimization is not applied where cache optimization is really needed, or useless optimization is applied to parts that do not need it (that is, parts with a low cache miss rate) and becomes overhead.

[0010] An object of the present invention is to determine the cache miss rate of each data access in a program more accurately, and to provide a method that uses this information so that the compiler can perform efficient cache optimization without waste.

[0011]

[Means for Solving the Problems] The above object is achieved by the following two-stage compiling method, which uses feedback.

[0012] In the first-stage compilation, code that simulates the behavior of the cache is inserted for each memory reference in the source program, and an object program is generated. Next, the object program generated by the first-stage compilation is executed to create data recording the cache miss rate of each memory reference (the cache miss rate recording file). Then, in the second-stage compilation, the final object program is generated from the cache miss rate recording file and the original source program. Because the cache miss rate of each memory reference is known in the second-stage compilation, the memory references to which cache optimizations such as software prefetching should be applied can be selected appropriately, and the optimization is effective.

[0013] Cache simulation also correctly simulates conflict misses, which static analysis cannot predict, so cache optimization can be applied to conflict misses without omission.

[0014] To this end, the compiler checks whether the current compilation is the first-stage compilation. If it is, the compiler performs the cache simulation code embedding process; otherwise, it performs the cache optimization process, in which the cache miss rate data is used to select the parts to which cache optimization is applied.

[0015]

[Embodiments of the Invention] An embodiment of the present invention is described below.

[0016] FIG. 3 is a configuration diagram of a computer system according to the present invention. As shown, the computer system consists of a CPU 301, a main storage device 302, an external storage device 303, a display device 304, and a keyboard 305. The external storage device 303 stores a source program 308, a cache simulation object program 309, a cache simulation library 310, a cache simulation load module 311, an object program 312, and a cache miss rate recording file 313. The main storage device 302 holds a compiler 306 and the intermediate code 307 needed during compilation. Compilation is performed by the CPU 301 executing the compiler program 306. The keyboard 305 is used to give commands from the user to the compiler 306. The display device 304 notifies the user of the completion of compilation or of errors.

[0017] FIG. 1 shows the procedure the user follows to compile with the compiler of this embodiment. The procedure consists of four steps: first-stage compilation 101, link 102, execution 103, and second-stage compilation 104. The first-stage compilation 101 produces, from the source program 308, an object program in which code that simulates the run-time behavior of the cache has been embedded (the cache simulation object program 309). Next, link 102 links the cache simulation object program 309 with the cache simulation library 310 to create the cache simulation load module 311. Linking is a well-known technique and is not described in detail here. Next, execution 103 runs the cache simulation load module 311 to create the cache miss rate recording file 313. Finally, the second-stage compilation 104 uses the cache miss rate recording file 313 to create the object program 312, in which the source program 308 has been optimized for the cache. The flow of compiler processing in the first-stage and second-stage compilations is described with reference to FIG. 2.

[0018] FIG. 2 is a flowchart showing the flow of compilation. The compiler first performs syntax analysis in step 201. Syntax analysis reads the source program 308 and creates intermediate code 307 that the compiler can process internally. Syntax analysis is described in, for example, Aho, Sethi, and Ullman, "Compilers I" (Saiensu-sha, 1990), pp. 30-74, and is not described in detail here. Next, step 202 checks whether this compilation is the first-stage compilation, which is determined from the compile command given by the user. If it is the first-stage compilation, processing proceeds to step 203, which performs the cache simulation code embedding process. This process is described in detail later with reference to the flowchart of FIG. 6. If it is not the first-stage compilation, processing proceeds to step 204, which performs the cache optimization process. The cache optimization process is described later with reference to FIG. 10. The code generation process then takes the intermediate code 307 as input and generates an object program 309 or 312 written in machine language or assembly language. (The first-stage compilation generates the cache simulation object program 309, and the second-stage compilation generates the final object program 312.) Code generation is likewise described in Aho, Sethi, and Ullman, "Compilers II" (Saiensu-sha, 1990), pp. 624-707, and is not described in detail here.
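The control flow of FIG. 2 can be summarized as a small driver. This is a minimal sketch: the function name, its parameters, and the pass names are hypothetical, and each pass is represented only by its name rather than by a real implementation.

```python
def compile_program(source, first_stage, miss_rates=None):
    """Return the names of the passes applied, following the flow of FIG. 2."""
    passes = ["parse"]                            # step 201: source -> intermediate code
    if first_stage:                               # step 202: set by the user's compile command
        passes.append("embed_cache_simulation")   # step 203
    else:
        if miss_rates is None:
            raise ValueError("second stage needs cache miss rate data")
        passes.append("optimize_for_cache")       # step 204
    passes.append("generate_code")                # intermediate code -> object program
    return passes


first = compile_program("FUNC source", first_stage=True)
second = compile_program("FUNC source", first_stage=False, miss_rates={7: 0.25})
```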

[0019] FIG. 5 is an example of the compiler's intermediate code in this embodiment. The intermediate code is created by syntax analysis 201. The intermediate code of FIG. 5 corresponds to the source program of FIG. 4. It is represented as a graph in which basic blocks (abbreviated BB) are connected by edges. (Such a graph is called a control flow graph.) Items 501 to 507 are basic blocks, numbered BB1 to BB7 respectively. A basic block represents a sequence of code with no branches out of or into its middle. Edges (arrows) represent transitions between basic blocks. For example, the edge from basic block 501 to 502 indicates that control passes to 502 after 501 finishes. Methods for identifying basic blocks and constructing control flow graphs are described on pp. 642-648 of the aforementioned book (Compilers II) and are not described in detail here. The contents of each basic block are executable statements, which are executed when control passes to that basic block. Some of these statements do not appear explicitly in the source program. For example, the last statement "I = I + 1" in BB5 (505) is not in the source program; the compiler added it to express the meaning of the source program. The compiler assigns each executable statement in a basic block a unique number (the statement number), shown in [] to the left of each statement. Here, each statement contains at most one memory reference, so the statement number uniquely identifies the corresponding memory reference.
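One way to hold such intermediate code in memory is sketched below. The class names, field names, block names, and statement texts are all hypothetical, since FIG. 5 itself is not reproduced here; only the statement numbers 4, 5, and 7 and their memory references come from the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Statement:
    number: int                     # unique statement number assigned by the compiler
    text: str                       # statement as shown in the intermediate code
    mem_ref: Optional[str] = None   # at most one memory reference per statement


@dataclass
class BasicBlock:
    name: str
    statements: list = field(default_factory=list)
    successors: list = field(default_factory=list)   # names of blocks reached by edges


def memory_references(blocks):
    """Map statement number -> memory reference for statements that have one."""
    return {s.number: s.mem_ref
            for b in blocks for s in b.statements if s.mem_ref is not None}


# Hypothetical fragment of a control flow graph.
body = BasicBlock("BB_body", [
    Statement(4, "t1 = B(I)", "B(I)"),
    Statement(5, "t2 = C(I)", "C(I)"),
    Statement(6, "t3 = t1 + t2"),
    Statement(7, "A(I,J) = t3", "A(I,J)"),
], successors=["BB_next"])
nxt = BasicBlock("BB_next", [Statement(8, "I = I + 1")], successors=["BB_body"])

refs = memory_references([body, nxt])
```

Because each statement carries at most one memory reference, the statement number alone is enough to key per-reference data such as miss counts.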

[0020] FIG. 6 is a flowchart showing the flow of the simulation code insertion process 203. First, step 601 takes an unprocessed executable statement from the intermediate code. If there is no unprocessed statement, processing proceeds to step 604. Step 602 checks whether the statement taken is a memory reference statement, for example a statement containing an array element reference. If it is, step 603 creates a function call statement of the form sim(memory address, statement number) and inserts it immediately before the statement being processed. Here, "memory address" is an expression giving the address of the memory to be referenced. The function sim() is a library function that the compiler provides to simulate the behavior of the cache. Its operation is described later with reference to FIG. 8. Finally, step 604 inserts a cache simulation initialization function at the beginning of the program and a cache simulation result output function at the end of the program. These functions and the function sim() are all contained in the cache simulation library 310.
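The insertion process of FIG. 6 can be sketched as a single pass over the statements. The tuple representation, the statement texts, and the names sim_init() and sim_output() for the initialization and result output functions are assumptions; the document names only sim().

```python
def embed_simulation_code(statements):
    """Insert sim() calls before memory-referencing statements.

    statements: list of (number, text, mem_ref) tuples, where mem_ref is
    None when the statement has no memory reference.
    """
    out = ["sim_init()"]                          # step 604: hypothetical init function name
    for number, text, mem_ref in statements:      # step 601: take each statement in turn
        if mem_ref is not None:                   # step 602: memory reference statement?
            out.append(f"sim(&{mem_ref}, {number})")   # step 603: insert just before it
        out.append(text)
    out.append("sim_output()")                    # step 604: hypothetical output function name
    return out


# Hypothetical statement texts; the numbers and references follow the description.
program = [
    (4, "t1 = B(I)", "B(I)"),
    (5, "t2 = C(I)", "C(I)"),
    (6, "t3 = t1 + t2", None),
    (7, "A(I,J) = t3", "A(I,J)"),
]
instrumented = embed_simulation_code(program)
```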

[0021] FIG. 7 shows the intermediate code after the simulation code insertion process. In FIG. 7, statements 4, 5, and 7 contain the memory references "B(I)", "C(I)", and "A(I,J)" respectively, so simulation code is inserted immediately before these statements. For example, the statement "sim(&B(I),4)" is inserted before statement 4. Here, "&B(I)" is an expression giving the address of array element B(I). Similarly, "sim(&C(I),5)" is inserted before statement 5 and "sim(&A(I,J),7)" before statement 7.

[0022] FIG. 8 schematically shows the operation of the function sim(). For convenience, this function is called the cache simulator below. The cache simulator runs when the cache simulation load module 311 is executed. The cache simulator 801 consists broadly of a control unit 802, a cache model table 803, and a cache miss rate recording table 804. Its inputs are a memory address 805 and a statement number 806. (These correspond to the two arguments of the function sim.) The cache model table 803 simulates the actual cache in software. Based on the given memory address, the control unit 802 uses the cache model table to check whether the address is present in the cache (whether it is a cache hit); on a miss, it updates the contents of the cache model table as needed, and it registers the result in the cache miss rate recording table 804. The statement number is used at registration to select the entry in which the result is recorded. The cache model table and the cache miss rate recording table are initialized when the cache simulation initialization function is executed (at the start of the program). The contents of the cache miss rate recording table are written to the cache miss rate recording file 313 when the cache simulation result output function is executed (at the end of the program). Methods of performing cache simulation using a cache model table and the like are described in, for example, T. M. Conte and C. E. Gimarc, "Fast Simulation of Computer Architectures", pp. 87-108, Kluwer, 1995, and are not described in detail here.
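A minimal sketch of such a cache simulator follows. It assumes a direct-mapped cache with 256 lines of 32 bytes; the document does not specify the cache organization, and the class and attribute names are hypothetical.

```python
class CacheSimulator:
    """Software model of a direct-mapped cache with per-statement statistics."""

    def __init__(self, num_lines=256, line_size=32):
        # Corresponds to the initialization function: both tables start empty.
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # cache model table: one tag per line
        self.stats = {}                  # miss rate table: stmt number -> [hits, misses]

    def sim(self, address, stmt_no):
        """Simulate one memory access made by statement stmt_no."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        entry = self.stats.setdefault(stmt_no, [0, 0])
        if self.tags[index] == tag:      # hit: the block is already in the cache
            entry[0] += 1
        else:                            # miss: load the block, evicting the old one
            self.tags[index] = tag
            entry[1] += 1

    def miss_rate(self, stmt_no):
        hits, misses = self.stats[stmt_no]
        return misses / (hits + misses)


# Statement 4 walks sequentially over 1024 elements of 8 bytes each.
cache = CacheSimulator()
for i in range(1024):
    cache.sim(8 * i, 4)
```

With four 8-byte elements per 32-byte line, the sequential walk misses once per line and hits on the other three accesses.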

[0023] FIG. 9 shows (an example of) the contents of the cache miss rate recording table 804. The table has five fields: statement number 901, access count 902, cache hit count 903, cache miss count 904, and cache miss rate 905. The statement number 901 is the number of a statement containing a memory reference. The access count 902 is the number of memory references and equals the sum of the cache hit count 903 and the cache miss count 904 (so any one of 902 to 904 can be omitted, since it can be computed from the other two). The cache miss rate 905 is the cache miss count divided by the access count (it too can be computed from the other fields and may be omitted). The cache miss rate recording file 313 has the same structure as the cache miss rate recording table 804.
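The five fields map naturally onto a small tabular file. The sketch below writes and reads such a table as CSV; the file format, column names, and counts are assumptions, since the document does not specify how the recording file is encoded.

```python
import csv
import io


def write_table(stats, stream):
    """Write the table; stats maps statement number -> (hits, misses)."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["stmt", "accesses", "hits", "misses", "miss_rate"])
    for stmt in sorted(stats):
        hits, misses = stats[stmt]
        accesses = hits + misses                   # derived field
        writer.writerow([stmt, accesses, hits, misses,
                         f"{misses / accesses:.4f}"])   # derived field


def read_miss_rates(stream):
    """Recompute miss rates from the stored counts: stmt -> misses / accesses."""
    return {int(row["stmt"]): int(row["misses"]) / int(row["accesses"])
            for row in csv.DictReader(stream)}


# Hypothetical counts for two statements.
buffer = io.StringIO()
write_table({7: (750, 250), 4: (998, 2)}, buffer)
buffer.seek(0)
rates = read_miss_rates(buffer)
```

Reading recomputes the rate from the counts, which reflects the observation above that the access count and miss rate are redundant fields.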

[0024] FIG. 10 is a flowchart showing the flow of the cache optimization process 204. In this embodiment, software prefetching is performed as the cache optimization. First, step 1001 checks whether any unprocessed executable statement remains in the intermediate code. If none remains, the process ends. Step 1002 checks whether the statement is a memory reference statement. If it is, step 1003 checks whether the cache miss rate recording file has an entry for that statement. If there is no entry, the statement was never executed, so it is skipped and processing returns to step 1001. If there is an entry, step 1004 retrieves the cache miss rate of the statement and checks whether it is at or above a given threshold. If it is, the memory reference is one for which cache optimization is effective, so step 1005 inserts a cache prefetch instruction.
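The selection logic of FIG. 10 can be sketched as follows. The statement texts, miss rates, and threshold are hypothetical, and the sketch simply prefetches the referenced element itself, placed before the statement; a real compiler would choose an address some iterations ahead and schedule the prefetch early enough to hide the latency.

```python
def insert_prefetches(statements, miss_rates, threshold):
    """Insert a prefetch before each reference whose miss rate meets the threshold.

    statements: list of (number, text, mem_ref); mem_ref is None when the
                statement has no memory reference.
    miss_rates: dict mapping statement number -> miss rate (0.0 to 1.0).
    """
    out = []
    for number, text, mem_ref in statements:              # step 1001
        if mem_ref is not None:                           # step 1002
            rate = miss_rates.get(number)                 # step 1003: None if never executed
            if rate is not None and rate >= threshold:    # step 1004
                out.append(f"prefetch(&{mem_ref})")       # step 1005
        out.append(text)
    return out


# Hypothetical input: statement 4 has no entry because it was never executed.
stmts = [
    (1, "t1 = X(I)", "X(I)"),
    (2, "t2 = t1 * 2", None),
    (3, "Y(I) = t2", "Y(I)"),
    (4, "t3 = Z(I)", "Z(I)"),
]
optimized = insert_prefetches(stmts, {1: 0.30, 3: 0.01}, threshold=0.05)
```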

[0025] FIG. 11 shows the intermediate code that results from applying software prefetch optimization based on the input intermediate code of FIG. 5 and the cache miss rate recording file data of FIG. 9. The program of FIG. 5 has memory references in three places, statements 4, 5, and 7, and the cache miss rate recording file shows that their cache miss rates are 0.19%, 0.19%, and 25.0% respectively. If cache optimization is applied only to references with a cache miss rate of 3% or more, a prefetch instruction needs to be inserted only for the memory reference of statement 7. Accordingly, in the intermediate code of FIG. 11, statement 10, "prefetch(&A(I+3,J))", has been newly inserted as the prefetch instruction for statement 7. Methods of inserting prefetch instructions are described in, for example, the aforementioned Bernstein et al., "Compiler Techniques for Data Prefetching on the PowerPC", PACT '95, 1995, pp. 19-26, and are not described in detail here.
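The prefetch target A(I+3,J) lies a fixed distance ahead of A(I,J) in memory because Fortran stores arrays in column-major order. The sketch below computes that distance; the base address, the value of N, and the 8-byte element size are assumptions made for illustration.

```python
def element_address(base, i, j, n, elem_size):
    """Byte address of A(i, j) for a column-major array A(1:n, 1:m)."""
    return base + ((i - 1) + (j - 1) * n) * elem_size


# Hypothetical values: base address, first-dimension extent, element size.
BASE, N, ELEM_SIZE = 0x10000, 1000, 8
I, J = 5, 2

current = element_address(BASE, I, J, N, ELEM_SIZE)
prefetched = element_address(BASE, I + 3, J, N, ELEM_SIZE)
distance_bytes = prefetched - current   # three elements ahead in the same column
```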

[0026] Although this embodiment describes a compiler that performs software prefetching as the cache optimization, the present invention is not limited to this and is equally applicable to other cache optimizations, such as those known as loop blocking or tiling. In addition, although the cache miss rate recording file described here mainly records only the cache miss rate, the type of cache miss (initial miss, capacity miss, or conflict miss) can also be recorded so that the cache optimization to apply can be selected at a finer granularity.
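One common way to separate the three miss types is to compare the cache under study with a fully associative LRU cache of the same capacity. The sketch below applies that technique to a direct-mapped cache; the document does not say how the types would be determined, so the method, names, and geometry here are assumptions.

```python
from collections import OrderedDict


def classify_misses(addresses, num_lines, line_size):
    """Classify each access to a direct-mapped cache as a hit or a type of miss."""
    counts = {"hit": 0, "initial": 0, "capacity": 0, "conflict": 0}
    tags = [None] * num_lines   # direct-mapped cache under study
    lru = OrderedDict()         # fully associative LRU cache of equal capacity
    seen = set()                # blocks referenced at least once
    for addr in addresses:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines

        # Update the fully associative reference model.
        in_lru = block in lru
        if in_lru:
            lru.move_to_end(block)
        else:
            lru[block] = True
            if len(lru) > num_lines:
                lru.popitem(last=False)   # evict the least recently used block

        if tags[index] == tag:
            counts["hit"] += 1
        else:
            tags[index] = tag
            if block not in seen:
                counts["initial"] += 1    # first reference to this block
            elif in_lru:
                counts["conflict"] += 1   # only the mapping caused the miss
            else:
                counts["capacity"] += 1   # would miss even with full associativity
        seen.add(block)
    return counts


# Two blocks one cache size (256 lines of 32 bytes) apart, accessed alternately.
ping_pong = classify_misses([0, 8192] * 3, 256, 32)
```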

[0027]

[Effects of the Invention] According to the present invention, even for a program whose cache miss rate cannot be predicted by static analysis, the cache miss rate of each data access in the program can be determined more accurately and used so that the compiler can perform efficient cache optimization without waste.

[Brief description of the drawings]

FIG. 1 shows the procedure of the compilation process performed by the user.

FIG. 2 shows the flow of compiler processing.

FIG. 3 is a configuration diagram of the computer system.

FIG. 4 is an example of a source program.

FIG. 5 is an example of intermediate code.

FIG. 6 shows the flow of the cache simulation code embedding process.

FIG. 7 shows the intermediate code after the cache simulation code is inserted.

FIG. 8 is a diagram of the operation model of the cache simulation function sim.

FIG. 9 shows the cache miss rate recording table.

FIG. 10 shows the flow of the cache optimization process.

FIG. 11 shows the intermediate code after the cache optimization process.

[Explanation of symbols]

301: CPU, 302: main storage device, 303: external storage device, 304: display device, 305: keyboard.

Claims

[Claims]

1. A cache optimization method in a compiler, the method comprising: a step of determining whether a compilation is a first-stage compilation; a step of, when the compilation is the first-stage compilation, inserting code that performs cache simulation for memory references in a program; and a step of, when the compilation is a second-stage compilation, performing cache optimization for memory references in the program, wherein the step of performing cache optimization performs processing using cache characteristic data for the memory references.

2. The cache optimizing method according to claim 1, wherein said cache characteristic data includes a cache miss rate.

3. The cache optimizing method according to claim 1, wherein said cache characteristic data includes a type of a cache miss.

4. The cache optimizing method according to claim 1, wherein said cache characteristic data includes information indicating which reference in the program the data corresponds to.

5. The optimization method for a cache according to claim 1, wherein said optimization for a cache performs software prefetch.

6. A compiler using the cache optimization method according to claim 1.

7. A storage medium storing the compiler according to claim 6.