JPH08263372A

JPH08263372A - Device for collection of cache miss information and tuning

Info

Publication number: JPH08263372A
Application number: JP7069121A
Authority: JP
Inventors: Masanori Yamada (山田 雅則); Toshihiro Tsurugasaki (鶴ヶ崎 俊博)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-03-28
Filing date: 1995-03-28
Publication date: 1996-10-11

Abstract

PURPOSE: To collect cache miss information on an object program output by a compiler, use that information for performance tuning, and feed it back to the compiler so that the compiler itself can tune performance. CONSTITUTION: A control unit 4 traces the executed instructions of a measurement object 2 output by a compiler 1, extracts the data address accessed by each instruction, and transfers it to a cache simulating unit 5 together with compile information 3. The cache simulating unit 5 simulates a cache memory; if the transferred data address is not within the address range of data regarded as having been taken into the cache memory, it detects a cache miss. The access count and the cache miss count held in cache miss information 6 are counted up accordingly.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an apparatus that collects cache miss information on an object program output by a compiler and tunes the program based on that cache miss information.

[0002]

[Prior Art] As a rule of thumb, a large part of the execution time of a program that is not I/O-bound is concentrated in only a few percent of the written program text. If that few-percent portion is made to execute efficiently, the execution performance of the whole program improves. Performance analysis tools that count how many times each executable statement in a program is executed are known as a way to find the few percent where execution is concentrated. For example, Joho Shori (Information Processing), Vol. 20, No. 8, pp. 703-711 (1979) discusses such dynamic analysis tools.

[0003]

[Problems to be Solved by the Invention] A conventional performance analysis tool can collect the execution count of each executable statement in a program, but it takes no account of the access performance of the data those statements handle. In particular, it gives no consideration to the performance degradation that occurs when a program handling large arrays runs on an information processing apparatus equipped with a cache memory.

[0004] An object of the present invention is to collect cache miss information on an object program output by a compiler and to use it for performance tuning.

[0005] Another object of the present invention is to feed the obtained cache miss information back to the compiler so that the compiler performs the performance tuning.

[0006]

[Means for Solving the Problems] The present invention is characterized by a cache miss information collection device that traces the executed instructions of an object program to extract the data address of the data each instruction accesses, simulates a cache memory based on that data address, detects a cache miss when the data at the data address can be regarded as not having been taken into the cache memory, and counts up and outputs the number of accesses and the number of cache misses for the data address.

[0007] The present invention is further characterized by a tuning device in which the compiler is given a function that refers to this cache miss information and, when the ratio of the cache miss count to the access count for the data address of an executed instruction is equal to or greater than a predetermined value, changes the execution order of instructions so that as many independent subsequent instructions as can be executed in parallel with that instruction are executed in parallel with it.

[0008]

[Operation] Access counts and cache miss counts can be tallied for every data address accessed by instructions that access data in main storage, such as load/store instructions. Improving the handling of data addresses with many cache misses therefore makes it possible to improve the execution performance of the program.

[0009] Furthermore, the cache miss information can be fed back to the compiler so that instruction scheduling takes cache misses into account, which contributes to improving the execution performance of the program.

[0010]

[Embodiments] An embodiment of the present invention is described in detail below with reference to the drawings.

[0011] FIG. 1 outlines the configuration and processing flow of the apparatus of this embodiment, which collects cache miss information and performs tuning. The compiler 1 takes the source program 9 as input and generates and outputs the measurement object 2 and the compile information 3. The measurement object 2 is the object program generated by the compiler 1 that is the target of tuning. The compile information 3 is information, created by the compiler 1, about the source program 9 and the measurement object 2. The control unit 4 traces the execution of the measurement object 2 and, each time the measurement object 2 issues a load/store instruction, passes to the cache simulating unit 5 the access instruction address, which indicates where the load/store instruction is stored in main storage, and the access data address of the data that the instruction accesses. The control unit 4 also passes the compile information 3 of the related source program 9 to the cache simulating unit 5. The cache simulating unit 5 simulates the access behavior of the cache memory for the access instruction addresses and access data addresses received from the control unit 4, counts the number of accesses and the number of cache misses for each access instruction address and access data address, and outputs the cache miss information 6. The screen display unit 7 takes the source program 9 and the cache miss information 6 as input and displays the access count and cache miss count for each line of the source program on the display screen 8. The screen display unit 7 also displays the access count and cache miss count for each array appearing in the source program 9 on the display screen 8. The compiler 1 takes the cache miss information 6 as input and optimizes the object program to improve performance, taking into account the cache miss count of the data accessed by load/store instructions. The compile information 3, the cache miss information 6, and the source program 9 are information stored in the storage device of the information processing apparatus. The compiler 1, the measurement object 2, the control unit 4, the cache simulating unit 5, and the screen display unit 7 are processing units realized by executing programs stored in the storage device of this information processing apparatus.

[0012] FIG. 2 shows the data structure of the compile information 3. The source file name is the name of the file in which the source program 9 is stored. The line number is the number of each source statement line of the source program 9. The access instruction address field holds, for each line number, the access instruction address of that source statement. In the latter half of the compile information 3, the source file name is the name of the file in which the source program 9 is stored, the array name is the name of an array appearing in the source program, the start address is the address of the beginning of that array, the size is the size of the whole array, and the element size is the size of each element making up the array. For each array, the number of elements is the size divided by the element size.
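The record layout described for FIG. 2 can be sketched as two small record types. This is only an illustrative sketch: the class names, field names, the use of bytes as the unit, and the sample values are assumptions and are not taken from the patent or its figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineEntry:
    """First half of the compile information: one row per source line."""
    source_file: str      # source file name
    line_number: int      # source statement line number
    instr_address: int    # access instruction address for that line


@dataclass(frozen=True)
class ArrayEntry:
    """Latter half of the compile information: one row per array."""
    source_file: str
    array_name: str
    start_address: int
    size: int             # size of the whole array (bytes assumed)
    element_size: int     # size of one element (bytes assumed)

    @property
    def element_count(self) -> int:
        # Number of elements = size divided by element size.
        return self.size // self.element_size


# Hypothetical sample row; the values are not taken from FIG. 2.
a = ArrayEntry("sample.f", "A", 0x1000, 800, 8)
print(a.element_count)  # 100
```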

[0013] FIG. 3 shows the data structure of the cache miss information 6. FIG. 3(a) is a table of access counts and cache miss counts by line position in the source program. The access instruction address, source file name, and line number are obtained from the compile information 3. FIG. 3(b) is a table of access counts and cache miss counts by access data address. An access data address is the address of data passed from the control unit 4 to the cache simulating unit 5, and includes the access data address of each element of an array. FIG. 3(c) is the array address table, consisting of the array name, start address, size, and element size. This information is obtained from the compile information 3; from it, the access data address of each array element can be calculated and associated with the information in FIG. 3(b).
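As a rough illustration of how the array address table of FIG. 3(c) can be joined with the per-address table of FIG. 3(b), the sketch below computes each element's address from the start address and element size and looks up its counts. The function names, the dictionary representation of the per-address table, and the sample values are assumptions made for this example.

```python
def element_addresses(start_address, size, element_size):
    """Address of every element of an array, in index order."""
    count = size // element_size
    return [start_address + i * element_size for i in range(count)]


def counts_per_element(start_address, size, element_size, per_address):
    """Look up (access count, miss count) for each element address.

    per_address maps an access data address to a tuple
    (access count, cache miss count); addresses that were never
    accessed are reported as (0, 0).
    """
    return [
        per_address.get(addr, (0, 0))
        for addr in element_addresses(start_address, size, element_size)
    ]


# Hypothetical per-address table; 0x2000 lies outside the array.
table_b = {0x1000: (4, 1), 0x1010: (2, 2), 0x2000: (9, 9)}
print(counts_per_element(0x1000, 32, 8, table_b))
# [(4, 1), (0, 0), (2, 2), (0, 0)]
```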

[0014] The control unit 4 starts the measurement object 2, executes the instructions contained in the measurement object 2 one at a time in sequence, and, as the result of tracing, extracts the address of each load/store instruction and the data address of the data it accesses and passes them to the cache simulating unit 5. In this embodiment, instructions other than load/store instructions do not access main storage and are unrelated to cache misses, so they are not passed to the cache simulating unit 5; in general, however, the instruction address and data address of every instruction that accesses main storage must be passed to the cache simulating unit 5. Before tracing the measurement object 2, the control unit 4 refers to the compile information 3 and passes the relevant compile information to the cache simulating unit 5.

[0015] In general, a cache memory is divided into a plurality of blocks and takes in instructions and data from main storage in units of blocks. Here, a block is a main storage area with a certain address range. A cache miss is the state in which the cache memory holds no copy of a main storage block. When a cache miss occurs, one of the blocks in the cache memory is replaced by the newly accessed block. The cache simulating unit 5 simulates this cache memory behavior and contains a table with as many entries as the cache memory has blocks. Each entry stores the start address of a block. If the block address obtained from the access instruction address and access data address passed from the control unit 4 matches the address stored in any entry of this table, the access hits in the cache; if it matches none, a cache miss has occurred. When a cache miss occurs, the cache simulating unit 5 follows the block replacement algorithm of the actual cache memory and replaces the block address in one of the table entries with the start address of the newly accessed block. In this way, for each access instruction address and access data address passed from the control unit 4, the cache simulating unit 5 counts up the corresponding access count in the cache miss information 6, determines whether a cache miss occurred, and, if so, counts up the corresponding cache miss count in the cache miss information 6.
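A minimal sketch of such a table-based simulation follows. The text only says that the replacement algorithm of the actual cache is followed, so the least-recently-used policy here is an assumption chosen for illustration, as are the class name, the parameters, and the choice to count by data address only (the embodiment also counts by instruction address).

```python
from collections import OrderedDict, defaultdict


class CacheSimulator:
    """Table of block start addresses with LRU replacement (assumed policy)."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.table = OrderedDict()        # block start address, oldest first
        self.accesses = defaultdict(int)  # access count per data address
        self.misses = defaultdict(int)    # cache miss count per data address

    def access(self, data_address):
        """Record one access; return True if it is a cache miss."""
        block = data_address - data_address % self.block_size
        self.accesses[data_address] += 1
        if block in self.table:
            self.table.move_to_end(block)  # mark as most recently used
            return False
        self.misses[data_address] += 1
        if len(self.table) >= self.num_blocks:
            self.table.popitem(last=False)  # evict least recently used block
        self.table[block] = None
        return True


# Two-block cache with 16-byte blocks; the addresses are hypothetical.
sim = CacheSimulator(num_blocks=2, block_size=16)
results = [sim.access(addr) for addr in (0, 8, 16, 32, 0)]
print(results)  # [True, False, True, True, True]
```

The last access to address 0 misses because block 0 was evicted when address 32 was brought in, which is the kind of repeated miss the per-address counts are meant to expose.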

[0016] Before counting up access counts and cache miss counts, the cache simulating unit 5 receives the compile information 3 from the control unit 4 and expands it as cache miss information 6, as shown in FIG. 3(a) and FIG. 3(c). The per-access-data-address table of FIG. 3(b) is filled in once the control unit 4 has started tracing the measurement object 2: when an actual access data address is received, a new access data address is added to the cache miss information 6 and its counts are counted up, while an already registered address only has its counts counted up.

[0017] FIG. 4 is a flowchart of the processing procedure up to displaying the line-by-line access status of the source program and the access status of each array. The control unit 4 reads the source file name, line number, access instruction address, array name, and the array's start address, size, and element size from the compile information 3 and passes them to the cache simulating unit 5 (step 21). The cache simulating unit 5 expands this compile information in the storage device as the fixed information of the cache miss information 6. Next, the control unit 4 starts the measurement object 2, traces the executed instructions one at a time, collects the resulting access instruction addresses and access data addresses, and passes them to the cache simulating unit 5 (step 22). The cache simulating unit 5 refers to the table that simulates the cache memory and determines whether a cache miss occurred (step 23). The cache simulating unit 5 then updates the access count and cache miss count for the relevant addresses in the cache miss information 6 according to this determination (step 24). If execution of the measurement object 2 has not finished (step 25 YES), the control unit 4 returns to step 22 and continues processing. When execution of the measurement object 2 has finished (step 25 NO), control passes to the screen display unit 7. The screen display unit 7 refers to the source program 9 and the cache miss information 6 and displays the access count and cache miss count on the display screen 8 alongside the source statement of each line of the source program (step 26). Next, the screen display unit 7 refers to the table of access counts and cache miss counts by access data address shown in FIG. 3(b) in the cache miss information 6, determines whether each access data address falls within the array area determined by the start address and size of any array, and, if it does, stores the access count and cache miss count in association with the corresponding element of that array (step 27). Finally, the screen display unit 7 totals the per-element access counts and cache miss counts over the whole array and displays the access count and cache miss count for the whole array and for each array element (step 28).
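Steps 27 and 28 amount to a range check followed by an accumulation. The sketch below shows one way this could be done; the function name, the dictionary format of the per-address table, and the sample numbers are assumptions made for illustration.

```python
def summarize_array(start_address, size, element_size, per_address):
    """Return (per-element counts, whole-array totals) for one array.

    per_address maps an access data address to a tuple
    (access count, cache miss count). Addresses outside the range
    [start_address, start_address + size) are ignored (step 27).
    """
    count = size // element_size
    accesses = [0] * count
    misses = [0] * count
    for addr, (acc, miss) in per_address.items():
        if start_address <= addr < start_address + size:
            index = (addr - start_address) // element_size
            accesses[index] += acc
            misses[index] += miss
    per_element = list(zip(accesses, misses))
    totals = (sum(accesses), sum(misses))  # whole-array totals (step 28)
    return per_element, totals


# Hypothetical counts; 0x3000 does not belong to the array at 0x1000.
per_address = {0x1000: (4, 1), 0x1004: (1, 1), 0x1008: (2, 0), 0x3000: (5, 5)}
per_element, totals = summarize_array(0x1000, 16, 8, per_address)
print(per_element, totals)  # [(5, 2), (2, 0)] (7, 2)
```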

[0018] FIG. 5 shows an example of the display screen 8 presenting the access status of a whole array and of each of its elements. The access count and the cache miss count are displayed inside the block representing each array element.

[0019] Because the access status is displayed for each line of the source program as described above, lines with many accesses and cache misses can be identified, and the source program can be improved to use the cache effectively. The access status of arrays can also be examined to find out which arrays are causing performance degradation, which helps improve performance.

[0020] In the above embodiment, the compile information 3 is passed to the cache simulating unit 5 via the control unit 4, but the cache simulating unit 5 may instead read the compile information 3 directly, without going through the control unit 4. In that case, the control unit 4 is left with only the function of tracing the execution of the measurement object 2.

[0021] An embodiment of the program tuning process performed by the compiler 1 is described below.

[0022] FIG. 6(a) shows part of a program, as an example, in the form of an object program. The numbers attached to the instructions indicate their sequence. A, B, and C denote array elements, and r1 to r9 denote registers and their numbers. Of each instruction's execution times, the instruction acceptance time is the time (in machine cycles or the like) from when the instruction is accepted until the next instruction has been accepted. The instruction processing time is the time from when the instruction is accepted until its processing ends; if cache misses are not considered, it equals the instruction acceptance time. When cache misses are considered, the instruction processing time is longer than the instruction acceptance time. FIG. 6(b) is part of the cache miss information 6 and shows the access counts and cache miss counts of array elements A, B, and C.

[0023] FIGS. 7a and 7b are flowcharts of the tuning process performed by the compiler 1. The compiler 1 takes the first instruction P from the program (step 31) and, if it is a load/store instruction (step 32 YES), retrieves the access count and cache miss count based on the access data address of the data stored in main storage (step 33). If the cache miss ratio, that is, the cache miss count divided by the access count, is greater than a preset ratio (for example, 50%) (step 34 YES), the compiler sets the instruction processing time T of instruction P with the cache miss taken into account (step 35). In the program example of FIG. 6, the instruction that reaches step 35 is the instruction P, and T is set to 6. Next, the compiler searches the program for an instruction N that uses the register loaded or stored by instruction P (step 36). In the example of FIG. 6, one instruction corresponds to instruction N. Next, it obtains the processing time T1 of the other instructions between instruction P and instruction N that are processed independently and in parallel (step 37). In the example of FIG. 6, one instruction corresponds to such another instruction, and the processing time T1 is 1. If no other instruction satisfies the condition, T1 is 0. Next, if there is an instruction I that follows instruction N and can be executed independently ahead of instruction N, the compiler takes it and obtains its processing time T2 (step 38). In the example of FIG. 6, one instruction corresponds to instruction I. Next, T1 plus T2 is stored as the new T1 (step 39). If the processing time T1 is then smaller than the processing time T (step 40 YES), instruction I is moved to immediately after the other instructions mentioned above (step 41), and processing returns to step 38 and continues. In other words, the execution order is changed so that instruction I executes in parallel with instruction P. In the example of FIG. 6, the instruction is inserted between two earlier instructions; a further instruction is then found to be a movable instruction I as well and is likewise inserted between two instructions. If the processing time T1 is not smaller than the processing time T, or no subsequent instruction I satisfies step 38 (step 40 NO), and not all instructions have been examined (step 42 NO), processing returns to step 31 and the above is repeated for the following instruction. When instruction P is not a load/store instruction (step 32 NO) or the cache miss ratio is small (step 34 NO), processing likewise returns to step 31 and repeats. When all instructions in the program have been examined (step 42 YES), processing ends.
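The flow of steps 31 to 42 can be sketched as a simple list reordering. This is a simplified illustration under several assumptions, not the compiler's actual implementation: only register dependences are checked (memory dependences are ignored), only loads are treated as instruction P, a single miss-adjusted processing time is used for every load, and the six-instruction program and its counts are invented rather than taken from FIG. 6.

```python
from dataclasses import dataclass


def regs(*names):
    return frozenset(names)


@dataclass(frozen=True)
class Instr:
    name: str
    dst: frozenset        # registers written
    src: frozenset        # registers read
    time: int             # processing time without a cache miss
    is_load: bool = False
    data: str = ""        # key into the cache miss table (loads only)


def independent(instr, others):
    """True if instr has no register dependence on any of the others."""
    return all(
        not (instr.src & o.dst) and not (instr.dst & o.src) and not (instr.dst & o.dst)
        for o in others
    )


def schedule(program, miss_table, miss_time, threshold=0.5):
    """Hoist independent instructions so they overlap a high-miss-ratio load."""
    prog = list(program)
    for p in range(len(prog)):                       # steps 31 and 42
        P = prog[p]
        accesses, misses = miss_table.get(P.data, (0, 0))
        if not (P.is_load and accesses and misses / accesses > threshold):
            continue                                 # steps 32 to 34
        T = miss_time                                # step 35
        n = next((k for k in range(p + 1, len(prog)) if prog[k].src & P.dst), None)
        if n is None:                                # step 36: no user N found
            continue
        T1 = sum(x.time for x in prog[p + 1:n])      # step 37
        k = n + 1
        while k < len(prog):
            I = prog[k]
            # step 38: I must not need P's result nor conflict with N..I-1
            if not (I.src & P.dst) and independent(I, prog[n:k]):
                if T1 + I.time >= T:                 # steps 39 and 40
                    break
                T1 += I.time
                prog.insert(n, prog.pop(k))          # step 41: move I before N
                n += 1
            k += 1
    return prog


program = [
    Instr("i1", regs("r1"), regs(), 1, is_load=True, data="A"),
    Instr("i2", regs("r2"), regs("r8", "r9"), 1),
    Instr("i3", regs("r3"), regs("r1", "r2"), 1),    # first user of r1
    Instr("i4", regs("r4"), regs(), 1, is_load=True, data="B"),
    Instr("i5", regs("r5"), regs("r4"), 2),
    Instr("i6", regs("r6"), regs("r3", "r5"), 1),
]
miss_table = {"A": (100, 80), "B": (100, 10)}        # (accesses, misses)
order = [i.name for i in schedule(program, miss_table, miss_time=6)]
print(order)  # ['i1', 'i2', 'i4', 'i5', 'i3', 'i6']
```

In this made-up example only the load of A exceeds the 50% threshold, so i4 and i5 are moved ahead of i3, the first instruction that needs r1, while i6 stays put because it depends on i3.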

[0024] According to the above embodiment of the tuning process, subsequent instructions that are independent of a load/store instruction that causes a cache miss can be executed in parallel during that instruction's processing time, so performance improves compared with no tuning. In the example of FIG. 6, the processing time of the instruction sequence is 13 without the tuning process of this embodiment (with one instruction executing in parallel with another), whereas it is 9 with the tuning process, a performance improvement of about 1.4 times (13/9 ≈ 1.44).

[0025]

[Effects of the Invention] According to the present invention, cache miss information can be collected on an object program output by a compiler and used for performance tuning. In addition, because the obtained cache miss information is fed back to the compiler, the compiler can perform the performance tuning.

[Brief description of drawings]

FIG. 1 is a diagram showing the configuration of the apparatus of the embodiment that collects cache miss information and performs tuning.

FIG. 2 is a diagram showing, by example, the data structure of the compile information 3.

FIG. 3 is a diagram showing, by example, the data structure of the cache miss information 6.

FIG. 4 is a flowchart showing the flow of processing up to displaying the line-by-line access status of the source program and the access status of arrays.

FIG. 5 is a diagram showing an example of a display screen presenting the access status of an array.

FIG. 6 is a diagram showing part of a program by example.

FIG. 7a is a flowchart (first half) showing the flow of the tuning process performed by the compiler of the embodiment.

FIG. 7b is a flowchart (second half) showing the flow of the tuning process performed by the compiler of the embodiment.

[Explanation of symbols]

2: measurement object, 3: compile information, 4: control unit, 5: cache simulating unit, 6: cache miss information

Claims

1. A device for collecting cache miss information, comprising: processing means for tracing the instructions of an object program output by a compiler one by one in accordance with its execution sequence and extracting the address in a main storage device of the data accessed by each instruction; and processing means for simulating a cache memory, detecting a cache miss when the address is not within the address range of data regarded as having been fetched into the cache memory, and counting up the number of accesses and the number of cache misses for the address to produce cache miss information.

2. A device for collecting cache miss information and tuning, comprising: processing means for tracing the instructions of an object program output by a compiler one by one in accordance with its execution sequence and extracting the address in a main storage device of the data accessed by each instruction; processing means for simulating a cache memory, detecting a cache miss when the address is not within the address range of data regarded as having been fetched into the cache memory, and counting up the number of accesses and the number of cache misses for the address to produce cache miss information; and processing means, provided in the compiler, for referring to the cache miss information and, when the ratio of the cache miss count to the access count of the address for an instruction is equal to or greater than a predetermined value, changing the execution order of instructions so that independent subsequent instructions that can be executed in parallel with that instruction are executed in parallel with it.