JP2000222245A

JP2000222245A - Program execution time evaluating device, source program execution time evaluating device, program execution time evaluating method, and source program execution time evaluating method

Info

Publication number: JP2000222245A
Application number: JP11020660A
Authority: JP
Inventors: Takashi Miura; 貴三浦
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-01-28
Filing date: 1999-01-28
Publication date: 2000-08-11

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of detecting stalls caused by a program, and so measure the program's execution time accurately, by taking stalls caused by cache memory accesses into account in static, program-based stall detection. SOLUTION: Four stage state grasping parts cover the four cache cases:
- Stage state grasping part (1) grasps each stage state when neither the instruction cache nor the data cache misses.
- Stage state grasping part (2) does so when the instruction cache misses.
- Stage state grasping part (3) does so when the data cache misses.
- Stage state grasping part (4) does so when both the instruction cache and the data cache miss.

A stall detection part 13 detects stall information ST1 to ST4 for the stage state grasping parts (1) to (4), respectively. Four multipliers 16a to 16d multiply ST1 to ST4 by parameters based on an instruction cache miss rate X and a data cache miss rate Y.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to two pairs of apparatus and method. The first pair is a program execution time evaluation apparatus and a program execution time evaluation method, which evaluate the execution time of a program. The second pair is a source program execution time evaluation apparatus and a source program execution time evaluation method, which evaluate the execution time of a source program written, for example, for a microprocessor.

[0002]

2. Description of the Related Art A developer of a program for an embedded microprocessor must always know how the execution time of the program under development behaves on that microprocessor. The execution time is shortest under optimum conditions, but the developer must also verify that it still meets the specification under the worst conditions. Knowing the average execution time under typical conditions is also necessary for designing the overall system.

[0003] Improvements in semiconductor technology have made modern microprocessors more highly integrated and faster. They have internal instruction pipelines, adopt VLIW (Very Long Instruction Word) and superscalar techniques, and carry large-capacity cache memories, all of which make them ever more complex. Peripheral circuits specialized for particular applications are also increasingly integrated on the same silicon.

[0004] Conventionally, execution time has been measured in environments that contain the actual microprocessor, such as an emulator or an evaluation board. Because of the situation described above, however, developing the actual microprocessor takes a long time. Owing to development schedule and cost constraints, an emulator or evaluation-board environment can no longer be expected to be available from the early stages of program development.

[0005] On the other hand, general-purpose computers have become faster and cheaper, so it is now possible to build environments that simulate the operation of a microprocessor in software. Measuring execution time accurately in such a simulator may be feasible in schedule and cost for a simple processor. For complex processors such as those described above, however, few simulators can measure execution time accurately, even when they execute instructions correctly at the logical level. Even when execution time can be measured accurately, the simulation runs slowly, so obtaining results takes a long time and program development efficiency suffers.

[0006] One countermeasure to this problem is the technique proposed in Japanese Patent Application No. 9-231331. It proposes an environment for developing a program while detecting stalls, so that programs for complex microprocessors can be written optimally.

[0007] FIG. 8 is a block diagram showing the configuration of the stall detection and display device disclosed in Japanese Patent Application No. 9-231331.

[0008] In this stall detection and display device, a GUI unit 101 controls the user interface. A file control unit 103 in the GUI unit 101 reads a source program file, and a check range acquisition unit 104 acquires the range over which stalls are to be detected.

[0009] A text interpreting unit 109 in an internal processing unit 102 then interprets each instruction line of the text in the source program. A stage information creation unit 110 creates stage information based on the interpretation result. A stall information creation unit 110 then detects the stalls that occur and creates stall information. At the same time, an image information creation unit 112 creates pipeline image information for the pipeline processing.

[0010] This sequence of processing is performed for every instruction line in the stall detection range. Afterward, a coloring information creation unit 105 creates stall coloring information, an image display unit 106 displays the pipeline image information, and a stall information coloring unit 107 colors the stall information.

[0011] This stall detection and display device takes the trace result of an instruction-level simulator as input, interprets the instruction arrangement of the program and detects the stalls that occur in the microprocessor. From this it can obtain both the degree of optimization of the program and the time required to execute it.

[0012] FIG. 9 shows a development flow using this stall detection and display device.

[0013] In FIG. 9, a source program in a high-level language such as C (201) is compiled by a compiler (202) and converted into machine language (203). An object loader (204) receives the machine language and loads the object program into a memory unit (205). The program in the memory unit (205) is then passed to an instruction execution unit (206).

[0014] Meanwhile, a correspondence table (208) between the machine language and the source program is created from the load information of the object loader (204). Debug control (208) is performed based on this table, and its results are reflected in the instruction execution unit (206) and the user interface (209).

[0015] The instruction trace result (210) produced by the instruction execution unit (206) is input to a stall detection unit (the stall detection and display device of FIG. 8), which produces a stall detection result (213). The execution time is then obtained and the stalls are displayed (214).

[0016] The program is optimized by coding the program source so as to reduce, as far as possible, the stalls reported in the stall detection result. The execution time T is obtained using the following relation.

[0017]

[Equation 1]
$$\mathrm{CPI}_i = 1 + \frac{S_i}{N_i} \qquad (1)$$
where
N_i: number of instructions in the range over which stall detection is performed
CPI_i: average number of clocks required per instruction in that range
C: operating cycle of the processor (period per clock)
S_i: number of stall cycles in that range
The CPI is taken to be 1 when there are no stalls.
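Equation (1) can be evaluated directly, as the minimal sketch below shows. The excerpt defines the clock period C but does not show the execution-time formula itself. The sketch therefore assumes the usual relation T_i = N_i × CPI_i × C, which is (N_i + S_i) clocks of period C. The function names and the example counts are hypothetical.

```python
def cpi(num_instructions: int, stall_cycles: int) -> float:
    """Equation (1): CPI_i = 1 + S_i / N_i (CPI is 1 when there are no stalls)."""
    if num_instructions <= 0:
        raise ValueError("N_i must be positive")
    return 1.0 + stall_cycles / num_instructions


def execution_time(num_instructions: int, stall_cycles: int, clock_period: float) -> float:
    """Assumed relation T_i = N_i * CPI_i * C, i.e. (N_i + S_i) clocks of period C."""
    return num_instructions * cpi(num_instructions, stall_cycles) * clock_period


# Hypothetical range: 100 instructions, 25 stall cycles, 10 ns clock period.
print(cpi(100, 25))                   # 1.25
print(execution_time(100, 25, 10.0))  # 1250.0 (ns)
```

Keeping CPI and T separate mirrors the document: CPI_i measures how well a range is optimized, and C only scales the result to wall-clock time.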

[0018] Ultimately, equation (1) makes two points about a given program range. First, the program is optimized for the processor when the number of stalls and the number of instructions in that range are minimized. Second, the overall execution time can be determined from the number of executed instructions and the number of stalls that occur.

[0019]

However, the execution time obtained by the conventional device of FIG. 8 is valid on an actual processor only under limited conditions. The reason is that the actual number of stall cycles S_i is given by the following equation.

[0020]
$$S_i = S_{\mathrm{static}} + S_{\mathrm{memory}}$$
where
S_static: number of stalls due to instruction placement
S_memory: number of memory stalls
The conventional device described above detects only S_static and does not detect S_memory. For a microprocessor equipped with cache memory, this amounts to assuming a cache miss rate that is always 0 (every access hits). In practice, S_memory depends on the program flow and cannot be 0. As a result, the calculated execution time is always smaller than the execution time on the actual processor.

[0021] As FIG. 9 makes clear, obtaining the total execution time also presupposes an execution environment, such as a simulator, that produces the complete execution trace. Because stall detection must be performed on the entire execution trace, it can take a long time when the trace is very large. Within a small portion of the source program, stall cycles with the same pattern occur repeatedly. Performing stall detection on the whole execution trace is therefore redundant.

[0022] SUMMARY OF THE INVENTION The present invention has been made to solve the conventional problems described above. One object is to provide a program execution time evaluation apparatus and a program execution time evaluation method that detect the stalls occurring in a program more accurately and can therefore measure the program's execution time accurately.

Another object is to provide a source program execution time evaluation apparatus and a source program execution time evaluation method with two properties. They detect the stalls occurring in a source program more accurately, so that its execution time is measured accurately. They also perform stall detection efficiently, so that this measurement is fast.

[0023]

To achieve the above objects, the program execution time evaluation apparatus according to the invention of claim 1 comprises the following means:
- Interpreting means for interpreting a program.
- Stage state grasping means that, for the interpretation result obtained by the interpreting means, grasps each stage state in four cases: neither the instruction cache nor the data cache misses, the instruction cache misses, the data cache misses, and both the instruction cache and the data cache miss.
- Stall detection means for detecting stalls that occur in pipeline processing of the program, as obtained by the stage state grasping means, and outputting the detected stall information.
- Instruction cache miss rate designating means for designating an instruction cache miss rate.
- Data cache miss rate designating means for designating a data cache miss rate.
- Execution time calculating means for calculating the execution time of the program. The calculation uses the stall information output by the stall detection means, the instruction cache miss rate from the instruction cache miss rate designating means, and the data cache miss rate from the data cache miss rate designating means.

[0024] The source program execution time evaluation apparatus according to the invention of claim 2 comprises the following means:
- Source program dividing means. It analyzes the flow of a source program and generates divided programs based on the flow analysis result. It outputs, for each divided program, its position information in the source program and its execution count.
- Divided program execution information storage means. For each divided program, it stores the position information and execution count from the source program dividing means, together with a designated instruction cache miss rate, a designated data cache miss rate and the divided program execution time.
- The program execution time evaluation apparatus according to claim 1.
- Divided program execution time measuring means. It inputs into the program execution time evaluation apparatus the source program portion identified by the stored position information, together with the instruction cache miss rate and the data cache miss rate. It writes the resulting divided program execution time into the corresponding storage area of the divided program execution information storage means.
- Total program execution time calculating means. It calculates the program execution time by aggregating the execution counts and execution times of the divided programs recorded in the divided program execution information storage means.

[0025] The source program execution time evaluation apparatus according to the invention of claim 3 is the apparatus of claim 2 with the following additional means:
- Data access information detecting means for detecting data access information in the divided programs, based on the flow analysis result of the source program dividing means.
- Data access information storage means for storing the data access information in the divided program execution information storage means.
- Cache miss rate setting means. It sets the instruction cache miss rate based on the execution count of the divided program stored in the divided program execution information storage means. It sets the data cache miss rate based on the data access information and the execution count of the divided program.

[0026] The program execution time evaluation method according to the invention of claim 4 executes the following processes:
- An interpreting process for interpreting a program.
- A stage state grasping process that, for the interpretation result obtained by the interpreting process, grasps each stage state in four cases: neither the instruction cache nor the data cache misses, the instruction cache misses, the data cache misses, and both the instruction cache and the data cache miss.
- A stall detection process for detecting stalls that occur in pipeline processing of the program, as obtained by the stage state grasping process, and outputting the detected stall information.
- An execution time calculating process for calculating the execution time of the program. The calculation uses the stall information output by the stall detection process, a predetermined instruction cache miss rate and a predetermined data cache miss rate.

[0027] The source program execution time evaluation method according to the invention of claim 5 executes the following processes:
- A source program dividing process. It analyzes the flow of a source program and generates divided programs based on the flow analysis result. It obtains, for each divided program, its position information in the source program and its execution count.
- An execution time measuring process. It applies the program execution time evaluation method of claim 4, using the instruction cache miss rate and the data cache miss rate, to the source program portion identified by the position information. This yields the execution time of each divided program.
- A total program execution time calculating process. It calculates the total program execution time by aggregating the divided program execution times from the execution time measuring process and the execution counts from the source program dividing process.

[0028] The source program execution time evaluation method according to the invention of claim 6 is the method of claim 5 with the following additional processes:
- A data access information detecting process for detecting data access information in the divided programs, based on the flow analysis result of the source program dividing process.
- A cache miss rate setting process. It sets the instruction cache miss rate based on the execution counts of the divided programs obtained in the source program dividing process. It sets the data cache miss rate based on the data access information and those execution counts.

[0029]

Embodiments of the present invention are described below.

[0030] (First Embodiment) FIG. 1 is a block diagram showing the functional configuration of a program execution time evaluation apparatus according to the first embodiment of the present invention.

[0031] The program execution time evaluation apparatus 10 has a source program reading unit 11 and a source program interpreting unit 12. The reading unit 11 reads a source program for a microprocessor, and the interpreting unit 12 interprets the program that has been read. Stage state grasping units (1), (2), (3) and (4) are connected in sequence to the output side of the source program interpreting unit 12.

[0032] Each stage state grasping unit grasps each stage state for the interpretation result of the source program obtained by the source program interpreting unit 12, under one of four cache conditions:
- Stage state grasping unit (1): neither the instruction cache nor the data cache misses (Imiss = 0%, Dmiss = 0%).
- Stage state grasping unit (2): the instruction cache misses (Imiss = 100%, Dmiss = 0%).
- Stage state grasping unit (3): the data cache misses (Imiss = 0%, Dmiss = 100%).
- Stage state grasping unit (4): both the instruction cache and the data cache miss (Imiss = 100%, Dmiss = 100%).

[0033] A stall detection unit 13 is connected to the output side of the stage state grasping units (1) to (4). The stall detection unit 13 detects the stalls that occur in pipeline processing of the source program, as obtained by each of the stage state grasping units (1) to (4). It outputs the corresponding stall information ST1 to ST4.

[0034] The apparatus also includes an instruction cache miss rate designating unit 14, which designates the instruction cache miss rate, and a data cache miss rate designating unit 15, which designates the data cache miss rate. The outputs of units 14 and 15 and of the stall detection unit 13 are connected to an execution time calculating unit 16.

[0035] The execution time calculating unit 16 calculates the execution time of the source program from three inputs:
- the stall information ST1 to ST4 output by the stall detection unit 13;
- the instruction cache miss rate X obtained from the instruction cache miss rate designating unit 14;
- the data cache miss rate Y obtained from the data cache miss rate designating unit 15.

Specifically, the execution time calculating unit 16 has four multipliers 16a to 16d and one adder 16e. The multipliers 16a to 16d multiply ST1 to ST4, which the stall detection unit 13 detected for the stage state grasping units (1) to (4), by parameters based on X and Y. The adder 16e sums the outputs of the multipliers 16a to 16d to give the execution time of the source program.

[0036] The parameters input to the multipliers 16a, 16b, 16c and 16d are (1 - X)(1 - Y), X(1 - Y), (1 - X)Y and XY, respectively. The execution time T output by the adder 16e is therefore
$$T = (1-X)(1-Y)\,ST_1 + X(1-Y)\,ST_2 + (1-X)Y\,ST_3 + XY\,ST_4$$

[0037] In this embodiment, the execution time is calculated from the instruction cache miss rate and the data cache miss rate using a linear-programming-style method. For very short programs, this calculation method gives sufficient accuracy. The following explanation refers to FIGS. 2, 3 and 4:
- FIG. 2 shows an example program for this embodiment.
- FIGS. 3(a) to 3(d) show the pipeline operation of that program in this embodiment.
- FIGS. 4(a) to 4(e) show pipeline operation that is closer to actual behavior.

[0038] As an example, consider a C program that reads an element of an array and assigns it to a variable. FIG. 2 shows the compiled result (instructions I1 to I4).

[0039] FIGS. 3(a) to 3(d) show the pipeline in the configuration of FIG. 1 when the instruction cache miss rate and the data cache miss rate are each set to 0% or 100%:
- FIG. 3(a): both the instruction cache miss rate and the data cache miss rate are 0%.
- FIG. 3(b): the instruction cache miss rate is 100% and the data cache miss rate is 0%.
- FIG. 3(c): the instruction cache miss rate is 0% and the data cache miss rate is 100%.
- FIG. 3(d): both miss rates are 100%.

[0040] When both the instruction cache miss rate and the data cache miss rate are 0% (FIG. 3(a)), no pipeline stall occurs for instructions I1 to I4. The execution cycles F, D, E, M and W of each instruction start one cycle after those of the previous instruction, and the whole sequence executes in 8 cycles.

[0041] When the instruction cache miss rate is 100% and the data cache miss rate is 0% (FIG. 3(b)), a 3-cycle pipeline stall occurs in execution cycle F of each of instructions I1 to I4. As a result, the whole sequence executes in 20 cycles, 12 stall cycles more than the no-miss case.

[0042] When the instruction cache miss rate is 0% and the data cache miss rate is 100% (FIG. 3(c)), a 3-cycle pipeline stall occurs in execution cycle M of instruction I3. A 3-cycle pipeline stall also occurs in execution cycle E of instruction I4. As a result, the whole sequence executes in 11 cycles, 3 stall cycles more than the no-miss case.

[0043] When both the instruction cache miss rate and the data cache miss rate are 100% (FIG. 3(d)), the following pipeline stalls occur:
- a 3-cycle stall in execution cycle F of each of instructions I1, I2 and I3;
- a 3-cycle stall in execution cycle M of instruction I3;
- a 6-cycle stall in execution cycle F of instruction I4.

As a result, the whole sequence executes in 23 cycles, 15 stall cycles more than the no-miss case.

[0044] On an actual microprocessor, however, the situations shown in FIGS. 3(b) and 3(d) are unlikely to occur. Because of the principle of instruction locality, the instruction cache miss rate is probably at most about 20%. In that case an instruction cache miss occurs about once every five instructions. For the four instructions I1 to I4 shown in FIG. 2, therefore, either no instruction cache miss occurs or only one instruction causes a cache miss.

[0045] Taking this into consideration, FIGS. 4(a) to 4(e) show pipeline operations that are closer to actual operation.

[0046] In the case where the instruction cache miss rate is 20% and the data cache miss rate is 0% (FIG. 4(a)), a three-cycle pipeline stall occurs in execution cycle F of instruction I1. As a result, the whole sequence executes in 11 cycles, a stall of +3 cycles.

[0048] In the first case where the instruction cache miss rate is 20% and the data cache miss rate is 100% (FIG. 4(b)), a three-cycle pipeline stall occurs in execution cycle F of instruction I1. Further three-cycle stalls follow in execution cycle M of instruction I3 and in execution cycle E of instruction I4. As a result, the whole sequence executes in 14 cycles, a stall of +6 cycles.

[0049] In the second case where the instruction cache miss rate is 20% and the data cache miss rate is 100% (FIG. 4(c)), a three-cycle pipeline stall occurs in execution cycle F of instruction I2. Further three-cycle stalls follow in execution cycle M of instruction I3 and in execution cycle E of instruction I4. As a result, the whole sequence executes in 14 cycles, a stall of +6 cycles.

[0050] In the third case where the instruction cache miss rate is 20% and the data cache miss rate is 100% (FIG. 4(d)), a three-cycle pipeline stall occurs in execution cycle F of instruction I3. Further three-cycle stalls follow in execution cycle M of instruction I3 and in execution cycle E of instruction I4. As a result, the whole sequence executes in 14 cycles, a stall of +6 cycles.

[0051] In the fourth case where the instruction cache miss rate is 20% and the data cache miss rate is 100% (FIG. 4(e)), a five-cycle pipeline stall occurs in execution cycle M of instruction I3, and a seven-cycle stall occurs in execution cycle F of instruction I4. As a result, the whole sequence executes in 15 cycles, a stall of +7 cycles.

[0052] The pipeline operations of FIGS. 3 and 4 give the following picture. When instruction cache misses occur at an average rate, they add a stall of about +3 cycles per four instructions. Data cache misses add a further +3 cycles. Together, this gives a stall of roughly +6 cycles per four instructions.

This situation can be evaluated with the program execution time evaluation device of the present embodiment shown in FIG. 1. Take X = 0.2 and Y = 1, and take ST2 = 12, ST3 = 3, and ST4 = 15 as the stalls of FIGS. 3(b), 3(c), and 3(d). Equation 2 then gives the number of stall cycles:

$$\text{(stall cycles)} = ST_2 \cdot X \cdot (1-Y) + ST_3 \cdot (1-X) \cdot Y + ST_4 \cdot X \cdot Y = 12 \times 0.2 \times (1-1) + 3 \times (1-0.2) \times 1 + 15 \times 0.2 \times 1 = 5.4 \tag{2}$$

This value is not far from the roughly +6 cycles observed above, so it is a reasonable approximation of actual behavior.
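Equation 2 is a probability-weighted sum: each cache-miss scenario's stall count is multiplied by the probability of that scenario. The sketch below reproduces the worked example. The function and parameter names are hypothetical. As printed, Equation 2 has no term for the case where neither cache misses, so the sketch includes none.

```python
def stall_cycles_eq2(st2: float, st3: float, st4: float, x: float, y: float) -> float:
    """Equation 2: expected stall cycles from per-scenario stall counts.

    st2: stalls when only the instruction cache misses (FIG. 3(b))
    st3: stalls when only the data cache misses (FIG. 3(c))
    st4: stalls when both caches miss (FIG. 3(d))
    x, y: instruction / data cache miss rates in [0, 1]
    """
    return st2 * x * (1 - y) + st3 * (1 - x) * y + st4 * x * y


# Values from FIG. 3 with an instruction miss rate of 20% and a data miss rate of 100%.
estimate = stall_cycles_eq2(st2=12, st3=3, st4=15, x=0.2, y=1.0)
print(round(estimate, 2))  # 5.4
```

The three terms evaluate to 0, 2.4, and 3.0, which matches the hand calculation above.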

[0053] The execution time is then calculated by substituting the number of stall cycles obtained in this way into the conventional relational expression (1) described above.

[0054] As described above, the present embodiment takes into account stalls caused by cache memory accesses. As a result, the execution time can be calculated with high accuracy, for example when detecting the stalls of an extremely short source program.

[0055] (Second Embodiment) FIG. 5 is a block diagram showing the functional configuration of a source program execution time evaluation device according to a second embodiment of the present invention.

[0056] This source program execution time evaluation device 20 includes:

- a source program flow analysis unit 21;
- a divided program execution information storage unit 22;
- a divided program execution time measurement unit 23;
- a total program execution time calculation unit 24.

[0057] The source program flow analysis unit 21 reads a source program written for a microprocessor and analyzes its program flow. It divides the source program into three kinds of part:

- a sequential part 22X;
- a loop part 22Y;
- a subroutine part (procedure call part) 22Z.

For each divided program, it outputs the program's position within the source program (hereinafter simply "source position information") and the number of times the divided program is executed.

[0058] For each divided program, the divided program execution information storage unit 22 has the following storage areas:

- 22a, for the source position information output from the program flow analysis unit 21;
- 22b, for the specified instruction cache miss rate;
- 22c, for the specified data cache miss rate;
- 22d, for the execution time of the divided program;
- 22e, for the number of executions of the divided program.
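One way to model a single entry of storage unit 22 is a record with one field per storage area, as in the sketch below. The field names and the representation of the source position as a (first line, last line) pair are assumptions of this sketch, not details given in the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class PartKind(Enum):
    """Kind of divided program produced by flow analysis unit 21."""
    SEQUENTIAL = "22X"   # sequential part
    LOOP = "22Y"         # loop part
    SUBROUTINE = "22Z"   # subroutine (procedure call) part


@dataclass
class DividedProgramRecord:
    """One entry of storage unit 22 (hypothetical field names)."""
    kind: PartKind
    source_lines: Tuple[int, int]      # 22a: source position, here a (first, last) line range
    icache_miss_rate: float = 0.0      # 22b: specified instruction cache miss rate
    dcache_miss_rate: float = 0.0      # 22c: specified data cache miss rate
    exec_time: Optional[float] = None  # 22d: time per execution, written by unit 23
    exec_count: int = 0                # 22e: number of executions


# Example: a loop body executed ten times, like func() in FIG. 6 (line numbers made up).
loop_part = DividedProgramRecord(PartKind.LOOP, (10, 14),
                                 icache_miss_rate=0.1, exec_count=10)
print(loop_part)
```

Area 22d is left empty (`None`) until the measurement step fills it in. This keeps "not yet measured" distinct from a genuine zero time.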

[0059] The divided program execution time measurement unit 23 works in three steps:

1. It takes the source program portion identified by the source position information stored in the divided program execution information storage unit 22.
2. It inputs that portion, together with the instruction cache miss rate and the data cache miss rate, into the evaluation device 10 of the first embodiment, and so obtains the execution time of the divided program.
3. It writes that time to the divided program execution time storage area 22d of the storage unit 22.

[0060] The total program execution time calculation unit 24 calculates the execution time of the source program. It does so by totaling the numbers of executions and the execution times of the divided programs recorded in the divided program execution information storage unit 22.
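The cooperation of units 23 and 24 can be sketched as a loop that fills in each part's time, followed by a weighted sum. Evaluation device 10 is replaced here by a hypothetical callable `(source, X, Y) -> time per execution`. The `toy_evaluator` uses made-up cost numbers and is not the patent's model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Part:
    source: str          # source text located via the source position (22a)
    icache_miss: float   # instruction cache miss rate (22b)
    dcache_miss: float   # data cache miss rate (22c)
    count: int           # number of executions (22e)
    time: float = 0.0    # time per execution (22d), filled in by measure()


# Hypothetical interface for evaluation device 10: (source, X, Y) -> time per execution.
Evaluator = Callable[[str, float, float], float]


def measure(parts: List[Part], evaluate: Evaluator) -> None:
    """Role of unit 23: evaluate each divided program and store its time (22d)."""
    for part in parts:
        part.time = evaluate(part.source, part.icache_miss, part.dcache_miss)


def total_time(parts: List[Part]) -> float:
    """Role of unit 24: sum of (time per execution x number of executions)."""
    return sum(part.time * part.count for part in parts)


def toy_evaluator(source: str, x: float, y: float) -> float:
    # Stand-in for device 10 with made-up cost numbers; not the patent's model.
    return 10.0 + 5.0 * x + 3.0 * y


parts = [Part("main body", 1.0, 0.0, count=1), Part("func()", 0.1, 0.0, count=10)]
measure(parts, toy_evaluator)
total = total_time(parts)
print(total)  # 120.0
```

Because the evaluator is passed in as a function, the same driver could be paired with any per-part timing model.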

[0061] For example, suppose a program such as the one shown in FIG. 6 is input. The source program flow analysis unit 21 first stores, for each arrow in FIG. 6, the source position and the number of executions in the divided program execution information storage unit 22 of FIG. 5.

[0062] Next, average instruction cache and data cache miss rates are entered according to the structure of the program. For example, the function func() in the program of FIG. 6 lies inside a for loop that is executed ten times. Its instruction cache miss rate can therefore be taken as 10%, that is, one miss in ten executions. In this way, the instruction cache miss rate and data cache miss rate of each part are entered manually.

[0063] Once this input is complete, each program part is input into the evaluation device of the first embodiment to obtain its execution time per execution. Finally, the execution time of each program part is multiplied by its number of executions, and the products are summed.

[0064] In short, this embodiment divides the input microprocessor source program into divided source programs. Stall detection is performed on each divided source program under four combinations of conditions: accesses to the instruction cache memory either miss or hit, and accesses to the data cache memory either miss or hit. The execution time obtained from the stall detection result for each condition is then stored in association with the divided source program.

[0065] Meanwhile, the execution flow of the input source program is analyzed to obtain the number of executions of each divided source program. This count is also stored in association with each divided source program. The user specifies an instruction cache miss rate and a data cache miss rate for each divided source program.

[0066] The time required to execute each divided source program once is then obtained from the stall detection results, the instruction cache miss rate, and the data cache miss rate. Finally, the per-execution time of each divided source program is multiplied by its number of executions. The execution time of the input source program is the sum of these products over all divided source programs.
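The patent does not spell out how the four per-condition results are combined with the miss rates. The sketch below assumes the same probability weights as Equation 2 of the first embodiment, applied to the four per-condition execution times, and treats the two miss rates as independent probabilities. Both choices are assumptions, and the function name is hypothetical. The sample inputs take 8 cycles for the no-miss case, which is what the totals in FIGS. 3(b) to 3(d) imply (for example, 20 cycles = 8 + 12).

```python
def per_execution_time(t_hit: float, t_imiss: float, t_dmiss: float,
                       t_both: float, x: float, y: float) -> float:
    """Expected time for one execution of a divided source program.

    The four times come from stall detection under the four cache conditions:
    t_hit (no miss), t_imiss (instruction cache misses only),
    t_dmiss (data cache misses only), t_both (both caches miss).
    x, y are the user-specified instruction / data cache miss rates,
    treated as independent probabilities (an assumption of this sketch).
    """
    return (t_hit * (1 - x) * (1 - y)
            + t_imiss * x * (1 - y)
            + t_dmiss * (1 - x) * y
            + t_both * x * y)


# Cycle counts for FIGS. 3(a)-(d), with 8 cycles assumed for the no-miss case.
t = per_execution_time(8, 20, 11, 23, x=0.2, y=1.0)
print(round(t, 2))       # 13.4, i.e. 8 base cycles + 5.4 stall cycles
print(round(t * 10, 2))  # contribution to the total if this part runs 10 times
```

Because the four weights sum to 1, this time-based weighting agrees with the stall-based Equation 2 whenever the no-miss case has zero stalls.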

[0067] With this method, the total execution time of the input microprocessor source program can be obtained with only a small number of stall detection runs, which makes stall detection more efficient.

[0068] (Third Embodiment) FIG. 7 is a block diagram showing the functional configuration of a program execution time evaluation device according to a third embodiment of the present invention.

[0069] This program execution time evaluation device 30 modifies the device of the second embodiment in three ways:

- The source program flow analysis unit 21 gains a function that detects data accesses in each divided program from the flow analysis and outputs data access information. The extended unit is designated the source program flow analysis unit 21A.
- A storage area 22f is added to each of the sequential part 22X, loop part 22Y, and subroutine part 22Z areas of the divided program execution information storage unit (now designated 22A). It holds the data access information output from the source program flow analysis unit 21A.
- Cache miss rate setting means 31 is provided. It sets the instruction cache miss rate from the number of executions of the source program stored in the divided program execution information storage unit 22A, and sets the data cache miss rate from the data access information and the number of executions of the source program.

[0070] In this embodiment, the instruction cache and data cache miss rates that were entered manually in the second embodiment are set automatically wherever the result of the flow analysis determines them. For example, in the program shown in FIG. 6, array[] is accessed only once, and this fact is stored as the data access information 22f of the divided program execution information. When the cache miss rates are set, this information shows that accesses to array[] always miss (a 100% cache miss rate). That value is therefore set automatically as the data cache miss rate.

[0071] The instruction cache miss rate can also be set automatically for some portions. This applies when flow analysis has determined the number of executions of a divided program portion and the source program is clearly small enough. Some cases in which the instruction cache miss rate is obvious from the program flow:

- a program path traversed only once misses 100% of the time;
- code that loops locally twice misses 50% of the time.
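The examples above, together with the 10% rate for a ten-iteration loop in the second embodiment and the 100% rate for the single access to array[], are all consistent with a miss rate of 1/n for something executed or accessed n times. The sketch below generalizes the examples to that rule. The generalization is an assumption: only the first touch misses (a cold miss) and the code or data then stays cached. The function name is hypothetical.

```python
from typing import Optional


def obvious_miss_rate(times_executed: Optional[int]) -> Optional[float]:
    """Cache miss rate for code or data touched `times_executed` times.

    Assumes only the first touch misses (a cold miss) and nothing evicts the
    line afterwards, i.e. the part fits in the cache. Returns None when the
    count is unknown, meaning the rate must still be entered by hand.
    """
    if times_executed is None or times_executed <= 0:
        return None
    return 1.0 / times_executed


print(obvious_miss_rate(1))     # path traversed once, or array[] accessed once
print(obvious_miss_rate(2))     # code that loops locally twice
print(obvious_miss_rate(10))    # func() inside a ten-iteration loop
print(obvious_miss_rate(None))  # unknown count: fall back to manual entry
```

Returning `None` instead of guessing mirrors the split between automatically settable parts and those that still need manual input.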

[0072] As described above, this embodiment sets a cache miss rate automatically whenever an obvious one can be determined. This saves the labor of setting cache miss rates and improves the accuracy of the execution time measurement.

[0073]

[Effects of the Invention] This section applies to the program execution time evaluation device of the first aspect and the program execution time evaluation method of the fourth aspect. These take stalls caused by cache memory accesses into account in static, program-based stall detection. This improves the accuracy of stall detection and makes it possible to obtain the execution time of the source program accurately.

[0074] The same applies to the source program execution time evaluation device of the second aspect and the source program execution time evaluation method of the fifth aspect. These take stalls caused by cache memory accesses into account in static stall detection based on the source program, so the accuracy of stall detection improves and the execution time of the source program can be obtained accurately. In addition, because stall detection is performed efficiently, the execution time of the source program can be measured quickly.

[0075] The source program execution time evaluation device of the third aspect and the source program execution time evaluation method of the sixth aspect build on the device of the second aspect. When an obvious cache miss rate is found, it can be set automatically. This saves the labor of setting cache miss rates and improves the accuracy of the execution time measurement.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of a program execution time evaluation device according to a first embodiment of the present invention.

FIG. 2 is a diagram showing an example program for the first embodiment.

FIG. 3 is a diagram showing the pipeline operation of the program of the first embodiment.

FIG. 4 is a diagram showing pipeline operations close to actual operation.

FIG. 5 is a block diagram showing the functional configuration of a program execution time evaluation device according to a second embodiment of the present invention.

FIG. 6 is a diagram showing an example program for the second embodiment.

FIG. 7 is a block diagram showing the functional configuration of a program execution time evaluation device according to a third embodiment of the present invention.

FIG. 8 is a block diagram showing the configuration of a conventional stall detection display device.

FIG. 9 is a diagram showing a development flow using the stall detection display device shown in FIG. 8.

[Explanation of symbols]

10, 20, 30 Program execution time evaluation device
11 Source program reading unit
12 Source program interpretation unit
13 Stall detection unit
14 Instruction cache miss rate designation unit
15 Data cache miss rate designation unit
16 Execution time calculation unit
21, 21A Source program flow analysis unit
22, 22A Divided program execution information storage unit
23 Divided program execution time measurement unit
24 Total program execution time calculation unit
31 Cache miss rate setting means

[Claims]

1. A program execution time evaluation device, comprising:

- interpreting means for interpreting a program;
- stage state grasping means for grasping, with respect to the interpretation result of the program obtained by the interpreting means, each stage state in four situations: when neither the instruction cache nor the data cache misses, when the instruction cache misses, when the data cache misses, and when both the instruction cache and the data cache miss;
- stall detection means for detecting, from the stage states grasped by the stage state grasping means, stalls occurring in pipeline processing of the program, and outputting the detected stall information;
- instruction cache miss rate designating means for designating an instruction cache miss rate;
- data cache miss rate designating means for designating a data cache miss rate;
- execution time calculation means for calculating an execution time of the program based on the stall information output from the stall detection means, the instruction cache miss rate obtained from the instruction cache miss rate designating means, and the data cache miss rate obtained from the data cache miss rate designating means.

2. A source program execution time evaluation device, comprising:

- source program dividing means for analyzing the flow of a source program, generating divided programs based on the flow analysis result, and outputting the position information on the source program of each divided program and the number of executions of each divided program;
- divided program execution information storage means for storing, in association with each divided program, the position information and number of executions output from the source program dividing means, together with a designated instruction cache miss rate, a designated data cache miss rate, and the divided program execution time;
- the program execution time evaluation device according to claim 1;
- divided program execution time measuring means for inputting the source program portion obtained from the position information stored in the divided program execution information storage means, together with the instruction cache miss rate and the data cache miss rate, to the program execution time evaluation device, and outputting the obtained divided program execution time to the divided program execution time storage area of the divided program execution information storage means;
- total program execution time calculating means for calculating the execution time of the source program by totaling the numbers of executions and the execution times of the divided programs recorded in the divided program execution information storage means.

3. A program execution time evaluation method, comprising:

- an interpreting process for interpreting a program;
- a stage state grasping process for grasping, with respect to the interpretation result of the program obtained by the interpreting process, each stage state in four situations: when neither the instruction cache nor the data cache misses, when the instruction cache misses, when the data cache misses, and when both the instruction cache and the data cache miss;
- a stall detection process for detecting, from the stage states grasped in the stage state grasping process, stalls occurring in pipeline processing of the program, and outputting the detected stall information;
- an execution time calculation process for calculating an execution time of the program based on the stall information output from the stall detection process, a predetermined instruction cache miss rate, and a predetermined data cache miss rate.

4. A source program execution time evaluation method, characterized by executing:

- a source program dividing process for analyzing the flow of a source program, generating divided programs based on the flow analysis result, and obtaining the position information on the source program of each divided program and the number of executions of each divided program;
- an execution time measurement process for obtaining the execution time of each divided program by executing the program execution time evaluation method according to claim 3, using the instruction cache miss rate and the data cache miss rate, on the source program portion obtained from the position information on the source program;
- a total program execution time calculation process for calculating the program execution time by totaling the execution times of the divided programs obtained in the execution time measurement process and the numbers of executions of the divided programs obtained in the source program dividing process.