JPH1069389A

JPH1069389A - Apparatus for exploiting an untagged branch prediction cache by relocating branches

Info

Publication number: JPH1069389A
Application number: JP9123631A
Authority: JP
Inventors: M Horaa Ann; アン・エム・ホラー; Y Shiyaa Lucky; ラッキー・ワイ・シャー
Original assignee: Hewlett Packard Co
Current assignee: HP Inc
Priority date: 1996-05-14
Filing date: 1997-05-14
Publication date: 1998-03-10
Also published as: US5721893A

Abstract

PROBLEM TO BE SOLVED: To improve the performance of a superscalar computer system by using static techniques, without adding extra hardware. SOLUTION: A method exploits a tagless branch prediction cache (BPC) by relocating branches. As the final pass of the compilation process, after all other optimizations have been applied, a pass is made over the instructions in each subprogram, and every branch that uses the BPC is placed in the bucket corresponding to its position in the BPC (315). Along with each branch, its expected direction, derived from profiling data or static heuristics (320), is recorded (325). Each bucket is then inspected (330). For a bucket containing branches whose predicted directions conflict (340), an attempt is made to move some of those branches to other buckets (345). Ideally, the move is performed locally, without affecting the bucket positions of other branches.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

FIELD OF THE INVENTION: The present invention relates to computers and, more particularly, to an improved branch prediction method for compilers.

[0002]

DESCRIPTION OF THE RELATED ART: In modern superscalar computers, a branch instruction causes significant delay and loss of potential performance if its direction (i.e., taken or not taken) cannot be predicted accurately. This is because the processor must stall other operations while it computes where the branch will go. If the processor can predict where the branch will go, it can continue fetching along the predicted path, eliminating the penalty associated with the branch.

[0003] A branch prediction cache (BPC) is a hardware structure that supports accurate branch prediction by storing information, such as recent branch execution history, about the branches of a program. This information is later used to predict the direction a branch will take. The history of a particular branch is accessed using some function of that branch's address, and that function maps several distinct branch addresses to the same table entry. An untagged BPC cannot distinguish the histories of the multiple branches that share an entry, and therefore merges them.

[0004] Unfortunately, when multiple branches with different taken/not-taken execution patterns map to the same BPC entry, their histories interfere with one another; branch prediction becomes inaccurate, with an associated performance penalty. Conversely, when multiple branches with similar taken/not-taken execution patterns map to different BPC entries, the opportunity for their histories to reinforce one another is lost. Such reinforcement improves prediction accuracy and frees BPC slots for use by other branches.
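
The interference described above can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the predictor keeps a single last-outcome bit per entry and indexes by address modulo table size, and the function name `mispredictions` and the trace format are hypothetical, not taken from the patent.

```python
def mispredictions(trace, table_size):
    """Count mispredictions of a tagless last-outcome predictor.

    trace: list of (branch_address, taken) pairs in execution order.
    The one-bit history and the modulo index are simplifying assumptions.
    """
    table = [False] * table_size  # one shared history bit per entry, no tags
    misses = 0
    for address, taken in trace:
        slot = address % table_size  # distinct addresses may share a slot
        if table[slot] != taken:
            misses += 1
        table[slot] = taken  # the newest outcome overwrites the shared history
    return misses


# One branch is always taken, the other is never taken.
conflicting = [(1, True), (257, False)] * 4  # both map to slot 1 of 256
separated = [(1, True), (258, False)] * 4    # slots 1 and 2 of 256

conflict_misses = mispredictions(conflicting, 256)
separate_misses = mispredictions(separated, 256)
```

In this toy trace the two aliased branches keep overwriting each other's history, whereas the same branches in separate slots settle after the first access.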

[0005] Prior art solutions have relied mainly on hardware techniques. For example, the BPC may be enlarged to reduce the likelihood of conflicts, or tagged so that branches mapping to the same location can be distinguished. Alternatively, a two-level prediction scheme improves accuracy by incorporating the taken/not-taken state of previously executed branches into the BPC lookup; this technique tends to distinguish separate branches that map to the same location. These methods increase hardware cost, require extra data to be stored, and/or require additional comparisons to be performed. Moreover, they do not make it easy for similar branches to share a common history.

[0006] See, for example, the following documents.

[0007]

J. E. Smith, A Study of Branch Prediction Strategies, 8th Symposium on Computer Architecture, pp. 135-148, May 1981 (discusses branch prediction methods aimed at maximizing prediction accuracy).

A. Smith, J. Lee, Branch Prediction Strategies and Branch Target Buffer Design, Computer, pp. 6-22, January 1984 (discusses a systematic method for selecting a good prediction method, based on 26 program address traces grouped into four IBM 370 workloads (scientific, commercial, compiler, and supervisor) and CDC 6400 and DEC PDP-11 workloads).

S. T. Pan, K. So, J. T. Rameh, Improving the Accuracy of Dynamic Branch Prediction Using Branch Correlation, ASPLOS, pp. 76-84, October 1992 (a correlation-based method that improves prediction accuracy by incorporating information not only from the history of a specific branch but also from the histories of other branches).

D. Patterson, J. Hennessy, Computer Architecture: A Quantitative Approach, 2d ed., pp. 262-278, Morgan Kaufmann Publishers Inc., 1996 (examines branch prediction methods that improve the accuracy of branch prediction mechanisms).

[0008]

PROBLEM TO BE SOLVED BY THE INVENTION: It would be advantageous to provide a technique that improves the performance of a superscalar computer system without introducing additional hardware, that is, a technique that uses static methods to improve dynamic performance.

[0009]

MEANS FOR SOLVING THE PROBLEM: The present invention provides a method and apparatus that exploit an untagged branch prediction cache by relocating branches. As the final pass of the compilation process, after all other optimizations have been applied, a pass is made over the instructions in each subprogram, and every branch that uses the branch prediction cache (BPC) is placed in the bucket corresponding to its position in the BPC. Along with each branch, its expected direction (taken or not taken) is recorded, based on profiling data or static heuristics. Each bucket is then inspected. For a bucket containing branches whose predicted directions conflict, an attempt is made to move some of those branches to other buckets. Ideally, the move is made locally so that it does not affect the bucket positions of other branches. If other bucket positions are affected, the bucket positions of all BPC branches are recalculated and the bucket inspection process is restarted. Moving branches that have matching predicted directions, but currently sit in separate buckets, into the same bucket lets their histories reinforce one another. Heuristics can specify which branches to consider, which code transformations to use to relocate branches, and whether to perform conflict avoidance as well as match reinforcement.

[0010]

EMBODIMENTS OF THE INVENTION: FIG. 1 is a schematic block diagram of a uniprocessor computer architecture 10 that includes a processor cache. In the figure, a processor 11 includes a cache 12 that communicates with a system bus 15. A system memory 13 and one or more input/output devices 14 also communicate with the system bus.

[0011] In a compile operation, the user supplies a source code program, i.e., a program intended to run on a computer, to the compiler. The compiler accepts the source code, processes it, and produces an executable file optimized for the target computer architecture (e.g., computer architecture 10).

[0012] FIG. 2 is a schematic block diagram of a software compiler 20 of the kind used, for example, with the computer architecture 10 shown in FIG. 1. The compiler front-end component 21 reads a source code file 100 and translates it into a high-level intermediate representation 110. The high-level optimizer 22 optimizes the high-level intermediate representation 110 into a more efficient form. The code generator 23 translates the optimized high-level intermediate representation into a low-level intermediate representation 120. The low-level optimizer 24 optimizes the low-level intermediate representation 120 into a more efficient (machine-executable) form. Finally, the object file generator 25 writes the optimized low-level intermediate representation out to an object file 141.

[0013] The object file 141 is processed by the linker 26 together with the object file 140 to produce an executable file 150 that can run on the computer 10. In the invention described herein, it is assumed that the compiler 20 and linker 26 can instrument the executable file 150 so that, when it runs on the computer 10, it generates an execution profile 160 that the low-level optimizer 24 can later use to better optimize the low-level intermediate representation 120. The compiler 20 is described in detail below.

[0014] A program consists of many procedures, and a branch in one procedure may conflict with a branch in another. The present invention does not address such conflicts directly. Rather, it provides a technique for examining the branches within a single procedure. The linker chooses how the procedures are laid out relative to one another. Although one aspect of the invention addresses some cross-procedure issues, in general the techniques disclosed herein operate on one procedure at a time.

[0015] In operation, the compiler performs its typical compilation process, i.e., compilation and optimization. The technique then numbers every instruction in the procedure, starting from 0, which in effect gives each instruction an address. The technique also creates an array of buckets. The buckets are intended to represent symbolically the locations in the branch prediction cache to which the system hardware assigns branch instructions; there is one bucket for each element of the hardware branch prediction cache. The technique then walks through the instructions. Whenever it encounters a branch instruction that falls into a category the system hardware can predict, it places that branch instruction in a bucket. Once all such instructions have been placed in their buckets, conflicts can be located and the conflicting instructions relocated by applying various predetermined criteria.

[0016] There are some branches that the system hardware does not attempt to predict. The system hardware predicts branches that are relative to the program counter. It does not predict a branch whose target address is held in a register. An indirect call is thus one kind of branch that is not predicted. The present invention is therefore applicable to branches that, when they do not fall through, always go to the same place, i.e., branches that can go only one of two ways.

[0017] When a branch instruction that is either taken or not taken is encountered, the technique drops a record representing that instruction into the bucket to which the system hardware maps the instruction. In addition, a description of the direction the branch is expected to take is created. This expectation rests in part on the notion that backward branches tend to be part of loops and are therefore likely to be taken, whereas forward branches are used to test various conditions and are therefore less likely to be taken. The invention is thus based in part on heuristics, i.e., experience-based guesses about branches.
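
The backward-taken, forward-not-taken heuristic described above might be sketched in Python as follows. The function name, the use of instruction numbers as addresses, and the treatment of a branch to itself as backward are all assumptions made for illustration; the patent also allows profiling data to supply the expected direction instead.

```python
def predict_direction(branch_number: int, target_number: int) -> str:
    """Return "t" (taken) or "nt" (not taken) from a static heuristic.

    Backward branches are assumed to close loops and are predicted taken;
    forward branches are assumed to test conditions and are predicted
    not taken. A branch to itself is treated as backward (an assumption).
    """
    return "t" if target_number <= branch_number else "nt"


loop_branch = predict_direction(branch_number=40, target_number=10)
guard_branch = predict_direction(branch_number=10, target_number=40)
```

A profile-driven variant would replace the comparison with a lookup of the measured taken frequency for that branch.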

[0018] After the expected direction of a branch has been decided, the instruction is placed in a bucket. Once all predictable branch instructions have been placed in buckets, the invention walks through the buckets. If a bucket contains two or more branch records, and those records disagree in their expectations, i.e., some branch records are expected to be taken and others are expected not to be taken, a conflict arises in the system hardware. When such a conflict is encountered, the invention attempts to move some of the conflicting branches out of the bucket. This step changes the instruction stream in the object code.

[0019] Changing the instruction stream is not always easy. If a conflicting instruction is moved down by inserting a no-operation (no-op) ahead of it, every subsequent instruction moves downstream as well. In that case, all downstream buckets must be rebuilt, which is computationally expensive.

[0020] Hewlett-Packard Company (Palo Alto, California) provides a system architecture with a feature called a delayed branch, in which an instruction can be placed immediately after a branch yet is logically executed before the branch. A branch can be moved down by swapping the two instructions, but a bit must then be set in the branch to indicate that the following instruction is not to be logically executed before the current instruction. This mechanism makes it possible to move a branch down without moving all the other branches. Unfortunately, there may already be a branch in the bucket immediately below the current one, in which case moving the instruction down creates a new conflict.

[0021] FIG. 3 is a schematic block diagram of the technique for exploiting an untagged branch prediction cache by relocating branches according to the invention. A program procedure 200 consists of a number of instructions 205. The technique numbers the instructions 210 and creates a bucket array 220. A bucket can be thought of as a pointer to the list of branches that fall into that bucket. The number of pointers equals the size of the system hardware branch prediction cache 230, which on a typical machine has 256 entries. The branch prediction cache may, however, be made larger or smaller to suit the target architecture.

[0022] Initially, each bucket is empty. After the buckets and instructions have been initialized as described above, the next step is to walk through the instructions. Suppose, for example, that instruction i_j is a conditional branch to label c that is taken if x is greater than y. Such a branch may go forward or backward, which makes it possible to anticipate whether it will be taken. The branch is examined, and an informed decision is made as to whether it is likely to be taken. The address of the branch is then used to map it into a bucket, and a record characterizing the branch is created in that bucket.

[0023] The process continues until another branch, i_k, is encountered: another conditional branch, on another comparison, to another label. This branch potentially maps into the same bucket. Suppose, however, that instruction i_k is judged not to be taken. That instruction is then labeled "nt" (not taken), while the other instruction is labeled "t" (taken). The walk through the instructions continues in this way, describing each branch and placing it in a bucket.

[0024] The system hardware uses the instruction address modulo the table size to determine the bucket in which an instruction is placed. If, for example, the address of an instruction is 257 and the branch prediction cache entries are numbered 0 to 255, instruction 257 is placed at position 1 in the branch prediction cache. Instructions therefore cause a hardware conflict when their addresses are equal modulo the table size. Conflicts can thus arise when a procedure contains enough instructions that a bucket must be used for more than one instruction. Conversely, if the entire procedure has fewer instructions than there are cache lines in the hardware, no conflict can arise, and there is no need to walk through the instructions.
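
The modulo mapping and the early-exit condition in this paragraph can be written out as a short Python sketch. The names `BPC_SIZE`, `bucket_index`, and `procedure_can_conflict` are hypothetical; the value 256 is the typical table size mentioned in the text.

```python
BPC_SIZE = 256  # typical branch prediction cache size given in the text


def bucket_index(instruction_number: int, bpc_size: int = BPC_SIZE) -> int:
    """Bucket (cache position) for an instruction: address modulo table size."""
    return instruction_number % bpc_size


def procedure_can_conflict(instruction_count: int, bpc_size: int = BPC_SIZE) -> bool:
    """A procedure no longer than the table maps every instruction to a
    distinct bucket, so no two of its branches can share an entry."""
    return instruction_count > bpc_size


position_of_257 = bucket_index(257)
```

With numbering that starts at 0, a procedure of at most `bpc_size` instructions needs no bucket walk at all, which matches the shortcut the paragraph describes.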

[0025] After the invention has walked through all the instructions, it must walk through the buckets. When it encounters a bucket in which there is a conflict between branches 240, the invention attempts to resolve the conflict.

[0026] One way to avoid a conflict (as described above) is to eliminate it by moving one of the branches down. A heuristic therefore specifies that if one of the conflicting entries agrees with the entries in the next bucket down, that entry can be moved down into the next bucket; because the entries there have the same direction, no conflict results.

[0027] The invention thus allows different branches to share a history, and can in fact strengthen that history. When the histories of two branches reinforce each other, the system hardware does not have to go through a period in which it is untrained on the history of one of them, so branch prediction for two branches with the same direction is better than usual.

[0028] If the next bucket already contains a conflict, it is possible to look further ahead, but at some point the system would keep looking ahead and spend a great deal of time reorganizing buckets. The invention therefore preferably looks ahead by only one bucket to see which instruction can be moved down without conflict. If no instruction can be moved cleanly, the invention tallies the amount of conflict to see whether anything can be done to reduce it. For example, if one bucket holds equal numbers of taken and not-taken branches, while the next bucket holds far more taken than not-taken branches, the invention moves the taken instruction down to the next bucket. Doing so reduces the conflict in the second bucket, for example because the taken choice there is reinforced. The bucket table thus helps the compiler make decisions about branches.
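
One possible reading of the one-bucket look-ahead is sketched below in Python. It is an interpretation, not the patent's algorithm: buckets are modeled as plain lists of "t"/"nt" labels, the function name `direction_to_move` is hypothetical, and the tie-breaking rule for an empty next bucket is an added assumption.

```python
def direction_to_move(current, following):
    """Pick which direction's branches to try moving down one bucket.

    current, following: lists of predicted directions ("t" or "nt") for the
    bucket being inspected and the next bucket down. Returns "t", "nt", or
    None when no move looks beneficial.
    """
    if len(set(current)) < 2:
        return None  # the current bucket has no conflict to resolve
    taken = following.count("t")
    not_taken = following.count("nt")
    if taken > not_taken:
        return "t"   # taken branches would reinforce the next bucket
    if not_taken > taken:
        return "nt"  # not-taken branches would reinforce the next bucket
    if not following:
        # Empty next bucket: either move removes the conflict. Moving the
        # less common direction is an assumption, chosen to move fewer branches.
        return "t" if current.count("t") <= current.count("nt") else "nt"
    return None  # next bucket is evenly split; no clear gain from a move


choice = direction_to_move(["t", "nt"], ["t", "t", "t", "nt"])
```

The function only decides what to attempt; actually moving a branch still requires a code transformation such as the no-op insertion or delayed-branch swap discussed earlier.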

[0029] The bucket table can be thought of as a software abstraction of what the system hardware will do. When the computer runs the program, the bucket table is represented in hardware. If a branch has been moved down in the table of buckets, then at the system hardware level the hardware executes the object code until it encounters that branch. The system hardware has a branch prediction cache 230 that is the same size as the bucket data structure 220. The branch prediction cache uses some, but not all, of the address bits and holds a "taken" or "not taken" indication. In practice, the branch prediction cache remembers the last three history events that occurred for that address. Thus, if the branch at that address was first taken, then not taken, then taken, the prediction reflects that taken outcomes outnumber not-taken outcomes, and taken is chosen. This allows the branch prediction cache to adapt as the history changes.
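
The three-event history described above behaves like a majority vote over the most recent outcomes. The Python sketch below models one cache entry that way; the class name `ThreeEventEntry` and the choice to predict not taken when no history exists are assumptions, since the text does not describe the hardware's initial state.

```python
from collections import deque


class ThreeEventEntry:
    """One branch prediction cache entry remembering the last three outcomes."""

    def __init__(self) -> None:
        self.history = deque(maxlen=3)  # oldest outcome drops off automatically

    def record(self, taken: bool) -> None:
        self.history.append(taken)

    def predict(self) -> bool:
        # Majority vote; an empty history predicts not taken (an assumption).
        return sum(self.history) * 2 > len(self.history)


entry = ThreeEventEntry()
for outcome in (True, False, True):  # taken, not taken, taken
    entry.record(outcome)
first_prediction = entry.predict()

for outcome in (False, False):  # behavior changes; history becomes T, NT, NT
    entry.record(outcome)
second_prediction = entry.predict()
```

The second prediction flips once not-taken outcomes outnumber taken ones, which is the adaptation the paragraph refers to.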

[0030] FIG. 4 is a flowchart of the technique for exploiting an untagged branch prediction cache by relocating branches according to the invention. To be effective, the technique must use the exact addresses at which instructions will be placed, so the invention is preferably associated with the low-level optimizer 24. Those addresses cannot be known until the very end of the compilation process, when the object code that will actually run on the machine is available.

[0031] As the final pass of the compilation process, after all other optimizations have been applied (300), a pass is made over the instructions in each subprogram (310), and every branch that uses the branch prediction cache (BPC) is placed in the bucket corresponding to its position in the BPC (315). Along with each branch, its expected direction (taken or not taken) is recorded (325), based on profiling data or static heuristics (320). Each bucket is then inspected (330). For a bucket containing branches whose predicted directions conflict (340), an attempt is made to move some of those branches to other buckets (345). Ideally, the move is made locally so that it does not affect the bucket positions of other branches. If other bucket positions are affected (350), the bucket positions of all BPC branches are recalculated (355) and the bucket inspection process is restarted (360). Moving branches that have matching predicted directions, but currently sit in separate buckets, into the same bucket (370) lets their histories reinforce one another. Heuristics can specify which branches to consider, which code transformations to use to relocate branches, and whether to perform conflict avoidance as well as match reinforcement.

[0032] The following is example pseudocode for a routine that exploits the untagged branch prediction cache by relocating branches.

[0033]

for each procedure in the program:
    compile and optimize the code for the procedure
    number the instructions in the procedure, starting from 0
    create a bucket array whose length is the branch prediction cache size
    for each instruction in the procedure:
        if the branch instruction is to be predicted by hardware:
            place it in the bucket for (instruction number modulo branch prediction cache size)
            record the direction the branch is expected to take
    for each bucket:
        if the bucket contains two or more branches and their directions conflict:
            attempt to move some of the conflicting branches to the next bucket
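
The pseudocode above might be rendered in Python roughly as follows. This is a partial sketch under stated assumptions: the `Instruction` record, its fields, and both function names are hypothetical; compilation, direction prediction, and the final relocation step are outside its scope, so it only builds the buckets and reports which ones hold conflicting branches.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    opcode: str
    predictable_branch: bool = False  # True if the hardware predicts this branch
    direction: str = ""               # "t" or "nt", set for predictable branches


def build_buckets(instructions, bpc_size):
    """Place each hardware-predicted branch in bucket (number mod bpc_size)."""
    buckets = [[] for _ in range(bpc_size)]
    for number, instruction in enumerate(instructions):  # numbering starts at 0
        if instruction.predictable_branch:
            buckets[number % bpc_size].append((number, instruction.direction))
    return buckets


def conflicting_buckets(buckets):
    """Indices of buckets with two or more branches whose directions differ."""
    return [
        index
        for index, bucket in enumerate(buckets)
        if len(bucket) >= 2 and len({direction for _, direction in bucket}) > 1
    ]


# Toy procedure with a 4-entry cache: instructions 1 and 5 share bucket 1.
procedure = [
    Instruction("add"),
    Instruction("branch", True, "t"),
    Instruction("branch", True, "t"),
    Instruction("load"),
    Instruction("store"),
    Instruction("branch", True, "nt"),
]
buckets = build_buckets(procedure, bpc_size=4)
conflicts = conflicting_buckets(buckets)
```

A relocation pass would then visit each index in `conflicts` and try to move a branch to the next bucket, rebuilding the buckets whenever a move shifts other instructions.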

[0034] The invention thus performs branch prediction analysis on a procedure from a program by taking the instructions of the procedure and arranging them in numerical order. It then looks for branch instructions and assigns each one to the bucket array according to its number modulo the array size. The array is quite large, yet a procedure can contain more instructions than there are buckets. If the number of instructions does not exceed the number of buckets, no conflict can arise, and the invention simply places the branch instructions in the buckets.

[0035] In many cases, however, the number of instructions exceeds the number of positions in the bucket array, and before long some buckets contain branch instructions that conflict with one another. When a conflict arises, a decision must be made whether to move a conflicting instruction down in the bucket array, which is possible when the next position holds branches of the same direction or is empty. If moving the branch to the next position would introduce another conflict, it must be decided whether the move is still worthwhile. The branch prediction cache collects branch information while the code runs and can build a history of what actually happens during execution. This matters because, once data is supplied to the program, branch behavior may differ from what was anticipated at compile time (the compiler was not working with actual data).

[0036] In an alternative embodiment of the invention, a different transformation can be attempted that places taken branches in odd-numbered buckets and not-taken branches in even-numbered buckets. One can thus work within the array to organize the branches in a predetermined manner that makes more effective use of the system hardware.
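One way to picture the odd/even arrangement is the sketch below. It is an assumption-laden illustration, not the method of the embodiment: the helper `parity_bucket` is hypothetical, it shifts a branch by at most one bucket, and it assumes an even cache size so that wrapping around the array still flips parity.

```python
def parity_bucket(index, taken, bpc_size):
    """Return a bucket index that is odd for taken branches and even
    for not-taken branches, shifting by one bucket when needed.

    Assumes `bpc_size` is even (hypothetical constraint of this sketch).
    """
    is_odd = index % 2 == 1
    if is_odd == taken:
        return index  # already in a bucket of the right parity
    return (index + 1) % bpc_size

print(parity_bucket(2, True, 8))   # taken branch leaves an even bucket
print(parity_bucket(3, False, 8))  # not-taken branch leaves an odd bucket
```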

[0037] Although the invention has been described with reference to a preferred embodiment, those skilled in the art will appreciate that other applications may be substituted for that embodiment without departing from the spirit and scope of the invention. Accordingly, the invention should be limited only by the claims.

[0038] The following are exemplary embodiments made up of combinations of the various features of the invention.

[0039] 1. An apparatus for exploiting an untagged branch prediction cache by relocating branches, comprising a compiler that compiles and optimizes the procedure code of each procedure in a program, wherein: the compiler numbers each instruction in the procedure; the compiler creates a bucket array; for each branch instruction in the procedure that is to be predicted, the compiler places the instruction in the bucket given by the instruction number modulo the branch prediction cache size; the compiler records the direction (taken or not taken) in which the branch instruction is expected to go; and, for each bucket, if the bucket contains two or more branches and the branch directions conflict, the compiler attempts to move a conflicting branch to the next bucket.

[0040] 2. The apparatus of item 1, wherein the length of the bucket array is equal to the branch prediction cache size.

[0041] 3. The apparatus of item 1, wherein backward branches are likely to be taken and forward branches are unlikely to be taken.

[0042] 4. The apparatus of item 1, wherein the compiler uses the histories of two branches so that they reinforce each other.

[0043] 5. The apparatus of item 1, wherein the compiler totals the amount of conflict to determine whether the conflict can be reduced.

[0044] 6. The apparatus of item 1, wherein the compiler moves an instruction down to the next bucket in order to reduce conflict in that next bucket by reinforcing the direction selection.

[0045] 7. The apparatus of item 1, further comprising a system hardware branch prediction cache of the same size as the bucket array.

[0046] 8. The apparatus of item 1, wherein the branch prediction cache stores a plurality of recent history events for a particular address so that it adapts when the branch history changes.

[0047] 9. An apparatus for exploiting an untagged branch prediction cache, comprising a compiler that, as a final pass of the compilation process after all other optimizations have been applied, makes a pass over the instructions in a subprogram, wherein: the compiler places every branch that uses the branch prediction cache (BPC) in the bucket corresponding to its position in the BPC; the compiler records in the bucket, together with each branch instruction, the direction (taken or not taken) expected on the basis of profiling data or static heuristics; the compiler inspects each bucket; and, for a bucket containing branches whose predicted directions conflict, the compiler moves one or more of the branch instructions to another bucket if the move reduces or eliminates the conflict in that bucket without creating or worsening a conflict in the other bucket.

[0048] 10. The apparatus of item 9, further comprising means for recalculating the bucket positions of all BPC branch instructions and means for restarting the bucket inspection process.

[0049] 11. The apparatus of item 9, wherein branch instructions that have matching predicted directions but currently reside in separate buckets are moved into the same bucket so that their histories reinforce each other.

[0050] 12. The apparatus of item 9, wherein the compiler further includes a low-level optimizer.
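The move criterion of item 9, combined with the conflict total of item 5, can be sketched as below. The text does not define how the amount of conflict is measured, so the metric used here (the number of branches in a bucket's minority direction) is an assumption, as are the names `conflict` and `move_is_allowed` and the `(branch_id, direction)` entries.

```python
def conflict(entries):
    """Assumed conflict amount of a bucket: size of the minority direction."""
    taken = sum(1 for _, d in entries if d == "T")
    return min(taken, len(entries) - taken)

def move_is_allowed(source, target, entry):
    """True if moving `entry` lowers the conflict in `source` without
    creating or worsening a conflict in `target`."""
    new_source = [e for e in source if e != entry]
    new_target = target + [entry]
    return (conflict(new_source) < conflict(source)
            and conflict(new_target) <= conflict(target))

source = [("a", "T"), ("b", "T"), ("c", "N")]
print(move_is_allowed(source, [("d", "N")], ("c", "N")))  # agrees with target
print(move_is_allowed(source, [("e", "T")], ("c", "N")))  # would conflict
```

A compiler pass built on this check would repeat it over all buckets and, as item 10 suggests, recalculate bucket positions and restart the inspection after each accepted move.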

[Brief description of the drawings]

FIG. 1 is a block diagram showing an overview of a uniprocessor computer architecture.

FIG. 2 is a block diagram showing an overview of a software compiler used, for example, in connection with the computer architecture shown in FIG. 1.

FIG. 3 is a block diagram showing an overview of a technique according to the invention for exploiting an untagged branch prediction cache by relocating branches.

FIG. 4 is a flowchart showing a technique according to the invention for exploiting an untagged branch prediction cache by relocating branches.

[Explanation of symbols]

200 Program procedure
205 Instruction
220 Bucket array
230 Branch prediction cache
240 Branch

Claims

[Claims]

1. An apparatus for exploiting an untagged branch prediction cache by relocating branches, comprising a compiler that compiles and optimizes the procedure code of each procedure in a program, wherein: the compiler numbers each instruction in the procedure; the compiler creates a bucket array; for each branch instruction in the procedure that is to be predicted, the compiler places the instruction in the bucket given by the instruction number modulo the branch prediction cache size; the compiler records the direction (taken or not taken) in which the branch instruction is expected to go; and, for each bucket, if the bucket contains two or more branches and the branch directions conflict, the compiler attempts to move a conflicting branch to the next bucket.