JP2001249806A

JP2001249806A - Prediction information managing method

Info

Publication number: JP2001249806A
Application number: JP2001043880A
Authority: JP
Inventors: E Mccormic James Jr; ジェームス・イー・マコーミック・ジュニア; R Andy Stephen; スティーヴン・アール・アンディ
Original assignee: Hewlett Packard Co
Current assignee: HP Inc
Priority date: 2000-02-22
Filing date: 2001-02-20
Publication date: 2001-09-14

Abstract

PROBLEM TO BE SOLVED: To provide an efficient means of managing instructions, and branch prediction information about those instructions, in a computer processor. SOLUTION: Prediction information is stored in a multi-level memory hierarchy that includes a low-level prediction cache and a high-level prediction cache. A prediction information value is first fetched from the low-level prediction cache; if the value is not stored there, it is fetched from the high-level prediction cache. The prediction information stored in the low-level and high-level prediction caches is periodically updated using the prediction information stored in the low-level prediction cache. Management of instructions and branch prediction information is made more efficient by integrating the low-level prediction cache with a low-level instruction cache and controlling both under a common management mechanism.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

FIELD OF THE INVENTION

The present invention relates to methods and apparatus for generating and storing prediction information used by a processor, and in particular to generating and storing control flow prediction information such as branch prediction information.

[0002]

DESCRIPTION OF THE RELATED ART

Branch prediction information is information that helps predict the outcome of an instruction of the form "branch to instruction X if condition Y is satisfied." Branch prediction information serves two purposes. First, it must predict whether a branch will be taken. Second, if the branch is predicted to be taken, it must predict the destination of the branch (that is, the branch "target").

[0003] Many of today's processors incorporate a structure known as an instruction pipeline. An instruction pipeline improves processor efficiency by allowing the processor to process multiple instructions at the same time. An instruction pipeline can be thought of as an instruction assembly line: as Instruction_0 enters the first stage of the pipeline, Instruction_1 is being processed in the second stage and Instruction_2 in the third stage, so that several instructions are processed simultaneously. Periodically, a new instruction enters the pipeline, and each instruction already being processed either moves to the next stage of the pipeline or exits the pipeline.

[0004] To maximize the efficiency of instruction execution, it is desirable to keep the instruction pipeline full whenever possible (with an instruction being processed in each stage of the pipeline) so that useful output is produced on every clock of the pipeline. However, when (1) program flow control transfers from one section of program code to another, (2) instructions have been speculatively fetched and processed, and (3) it is determined that the speculatively fetched and processed instructions should not have been fetched and processed, one or more instruction pipelines end up producing useless output. In each clock cycle in which an instruction pipeline produces useless output, the pipeline has a negative effect on processor efficiency.

[0005] Program flow control instructions, such as branch instructions, are one means by which program flow control can be transferred from one section of program code to another. A branch instruction is either conditional or unconditional. A conditional branch instruction determines program flow control based on the evaluation of a specified condition. An unconditional branch instruction always transfers program flow control. The instruction "branch to instruction X if A > B" is one example of a conditional branch instruction. If A > B, program flow control transfers to the section of program code beginning with instruction X (the target code section). If A ≤ B, program flow control continues with the section of program code that sequentially follows the conditional branch instruction (the sequential code section).

[0006] Because an instruction pipeline can be many stages deep, a conditional branch instruction is often fetched before the condition it specifies can be determined. The processor therefore predicts the outcome of the conditional branch instruction and speculatively fetches and processes instructions based on that prediction. When predicting the outcome of a branch instruction, the processor must actually make two predictions. First, it must predict whether the branch will be taken or not taken. Second, if the branch is predicted to be taken, it must predict the target of the branch (that is, the address to which program flow control may transfer). Once the predictions have been made, instructions are speculatively fetched from the target code section (if the branch is predicted taken) or from the sequential code section (if the branch is predicted not taken).

[0007] Although many branch prediction algorithms exist, branch mispredictions still occur. By the time a misprediction is confirmed, the instruction pipeline may already have processed many instructions fetched from the wrong code section. When such a misprediction is encountered, the wrongly fetched instructions being processed in one or more pipelines must be flushed from the pipelines, instructions must be fetched from the correct code section, and those instructions must be processed through the pipelines.

[0008] When instructions are flushed from a pipeline, bubbles (gaps) are injected into the pipeline. Unfortunately, a pipeline often needs several clock cycles before it can again produce useful output. Because conditional branch instructions and other program flow control instructions are numerous in program code (they can appear on the order of one in every five instructions), the cumulative effect of branch mispredictions can be highly detrimental to processor performance even when branch prediction accuracy is relatively high.
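
The cumulative effect can be illustrated with a rough calculation. The following Python sketch is not part of the specification: the branch frequency of one in five is the figure mentioned above, while the 95% prediction accuracy, the 5-cycle flush penalty, and the base rate of one instruction per cycle are assumed values chosen only for illustration.

```python
def cycles_per_instruction(branch_fraction, accuracy, flush_penalty, base_cpi=1.0):
    """Average cycles per instruction when every misprediction adds
    flush_penalty bubble cycles to an otherwise full pipeline."""
    mispredicts_per_instruction = branch_fraction * (1.0 - accuracy)
    return base_cpi + mispredicts_per_instruction * flush_penalty


# One branch in five instructions, 95% accuracy, 5 lost cycles per flush
# (the last two numbers are illustrative assumptions).
cpi = cycles_per_instruction(branch_fraction=1 / 5, accuracy=0.95, flush_penalty=5)
slowdown_percent = (cpi - 1.0) * 100
```

Under these assumed numbers the pipeline loses roughly five percent of its throughput even though nineteen of every twenty branches are predicted correctly.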

[0009] When branch prediction is based on any algorithm other than "always predict taken" or "always predict not taken," some means of managing branch prediction information (generating, storing, accessing, and updating it, for example) is required. Various methods of storing branch prediction information exist, but many of them provide, for example, only a cache of predicted branch targets and a cache of branch histories (or a single global branch history register). A branch prediction is then made by (1) accessing the branch history, (2) predicting from that history whether the branch will be taken or not taken, and (3) if the branch is predicted to be taken, accessing the branch target (or, in many cases, computing the branch target).

[0010] Because of area constraints, each of the caches described above is typically small, with the result that many branch instructions are frequently mapped to a single entry (or set of entries) in each cache. For example, in a cache of 2-bit branch history counters, multiple branch instructions are mapped to each counter. In one prior-art implementation of such a cache, a counter is incremented whenever any one of the branches mapped to it is taken, and decremented whenever any one of those branches is not taken. If the counter to which the next branch instruction to be predicted is mapped holds a value of "10" or "11", the branch is predicted to be taken. If that counter holds a value of "00" or "01", the branch is predicted to be not taken. As a result, a branch prediction is made, and assumed to be correct, regardless of whether the counter has been updated in response to the outcomes of one or more unrelated branches.
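
The aliasing problem in such a counter cache can be sketched in a few lines of Python. This is an illustrative model only, not the prior-art hardware itself: the table size, the index function (address bits above the low four, modulo the table size), the saturating behavior, and the initial counter value are all assumptions made for the example.

```python
class TwoBitCounterTable:
    """Untagged table of 2-bit saturating counters shared by many branches."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        # Assumed initial state: 0b01, which predicts not taken.
        self.counters = [0b01] * num_entries

    def _index(self, branch_address):
        # Hypothetical mapping: drop the low 4 address bits, then wrap.
        return (branch_address >> 4) % self.num_entries

    def predict_taken(self, branch_address):
        # "10" and "11" predict taken; "00" and "01" predict not taken.
        return self.counters[self._index(branch_address)] >= 0b10

    def update(self, branch_address, taken):
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 0b11)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0b00)


table = TwoBitCounterTable(num_entries=4)
branch_a = 0x1000  # maps to entry 0
branch_b = 0x1040  # also maps to entry 0, so it aliases with branch_a

table.update(branch_a, taken=True)
table.update(branch_a, taken=True)
prediction_before = table.predict_taken(branch_a)  # trained by branch_a only

table.update(branch_b, taken=False)
table.update(branch_b, taken=False)
prediction_after = table.predict_taken(branch_a)   # disturbed by branch_b
```

Here the prediction for `branch_a` flips purely because the unrelated `branch_b` updated the shared counter, which is the behavior the paragraph above describes.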

[0011]

PROBLEM TO BE SOLVED BY THE INVENTION

As computer architectures rely increasingly on parallelism, predication, speculation, and the like, a processor's ability to generate and access accurate prediction information, such as branch prediction information, is critically important. To date, this need has been addressed by increasing the number of entries in a prediction cache, increasing the number of ways in a prediction cache, increasing the size of the tags stored in a prediction cache, or implementing a multi-level prediction algorithm (as described, for example, in "Alternative Implementations of Two-Level Adaptive Branch Prediction" by T. Yeh and Y. Patt (Association for Computing Machinery, July 1992)).

[0012] Nevertheless, better methods and apparatus for managing prediction information, such as branch prediction information, are needed.

[0013]

MEANS FOR SOLVING THE PROBLEM

To solve the problem described above, the present invention provides new methods and apparatus for generating and storing prediction information. In the methods and apparatus of the invention, prediction information used by a processor is stored in a multi-level memory hierarchy. As used herein, a "multi-level memory hierarchy" is defined as a structure that stores the same type of prediction information at multiple levels of a memory hierarchy. A multi-level memory hierarchy should not be confused with a multi-level prediction algorithm such as that disclosed in the above-cited "Alternative Implementations of Two-Level Adaptive Branch Prediction." Although this specification specifically describes the storage of branch prediction information in a multi-level memory hierarchy, those skilled in the art will readily understand how other types of prediction information could be stored in such a hierarchy.

[0014] The present invention provides a method of managing prediction information used by a processor. The method comprises: storing prediction information in a multi-level memory hierarchy that includes at least one low-level prediction cache and one higher-level prediction cache; when an attempt is made to fetch a prediction information value from the low-level prediction cache and the value is not stored there, attempting to fetch the value from the high-level prediction cache; and periodically updating the prediction information stored in the low-level prediction cache and the high-level prediction cache using the prediction information stored in the low-level prediction cache.
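
The three steps of this method (store at two levels, fall back to the high level on a low-level miss, update both levels from the low-level copy) can be sketched as follows. This is a minimal software model under stated assumptions, not the claimed hardware: the dictionaries, the fill-on-miss policy, and the `compute_next` callback are hypothetical, and capacity limits and eviction are not modeled.

```python
class PredictionHierarchy:
    """Two-level store holding the same type of prediction value at both levels."""

    def __init__(self):
        self.low = {}   # stands in for the small, fast low-level prediction cache
        self.high = {}  # stands in for the larger high-level prediction cache

    def fetch(self, address):
        """Return a prediction value, trying the low level first."""
        if address in self.low:
            return self.low[address]
        value = self.high.get(address)  # None if neither level holds it
        if value is not None:
            self.low[address] = value   # fill the low level (assumed policy)
        return value

    def update(self, address, compute_next):
        """Derive a new value from the low-level copy and write it to both levels."""
        current = self.low.get(address)
        if current is None:
            return
        new_value = compute_next(current)
        self.low[address] = new_value
        self.high[address] = new_value


hierarchy = PredictionHierarchy()
hierarchy.high[0x2000] = 0b10                 # value present only at the high level
first = hierarchy.fetch(0x2000)               # low-level miss, served by the high level
hierarchy.update(0x2000, lambda counter: min(counter + 1, 0b11))
second = hierarchy.fetch(0x2000)              # now a low-level hit with the updated value
missing = hierarchy.fetch(0x3000)             # held at neither level
```

The point of the sketch is the direction of data flow: reads fall through from low to high, while updates originate from the low-level copy and propagate to both levels.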

[0015] In another aspect of the invention, branch prediction information is generated and stored in advance of the time at which a branch instruction is to be fetched for execution. This branch prediction information includes information that predicts the "current fetch outcome" of the branch instruction. When the branch instruction is fetched for execution (from a low-level instruction cache, for example), the branch prediction information is used to (1) quickly resteer the instruction pointer (if the branch is predicted to be taken) and (2) speculatively determine information that predicts the "next fetch outcome" of the branch instruction. In this way, each time a branch instruction is fetched for execution, its prediction information has already been generated and stored, and the instruction pointer can be resteered quickly.

[0016] In yet another aspect of the invention, the prediction information is stored in a cache that is integrated with a low-level instruction cache. The two caches can therefore share a common management structure, and prediction information can be accessed at almost the same time as the branch instruction to which it corresponds. In addition, it becomes possible to implement separate high-level instruction and prediction caches. Because fast, low-level caches are costly to implement and manage, integrating the low-level prediction information cache with the low-level instruction cache achieves economies of scale.

[0017] On the other hand, some prediction information (such as IP-relative branch instruction targets) can be generated very quickly, and storing prediction information is costly (for most branch targets). Economies of scale are therefore achieved outside the integrated low-level cache by storing more instructions in a high-level cache particularly suited to storing instructions, and storing more branch prediction information in a high-level cache particularly suited to storing branch prediction information. Preferably, the high-level instruction cache is implemented as a set-associative cache, and the high-level prediction cache is implemented as an untagged n-way cache. Also preferably, branch prediction information such as branch target, branch history, and branch trigger prediction information is stored in the integrated low-level cache together with the corresponding instructions. Preferably, however, only branch history and branch trigger prediction information is stored in the high-level cache.

[0018] Furthermore, according to the invention, even more efficient use of the high-level prediction cache is achieved by storing only "select" branch history and trigger prediction information in the high-level prediction cache.

[0019] It should be noted that the various aspects of the invention described above are designed to work in conjunction with one another to ensure that branch prediction information is (1) generated in a timely manner and (2) stored in an efficient form.

[0020]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The preferred embodiments of the methods and apparatus described below implement the Intel™/Hewlett-Packard™ IA-64 software architecture. However, as those skilled in the art will readily appreciate, many of the methods and apparatus described herein can also be adapted to implement other software architectures. Details of the IA-64 software architecture are given in the "IA-64 Application Developer's Architecture Guide" (Intel, May 1999).

IA-64 Software Architecture

[0021] In IA-64, instructions are fetched in the form of bundles. Each bundle 100 is 128 bits in total and contains three 41-bit instruction syllables (instruction slots) and one 5-bit template field. FIG. 1 shows a typical IA-64 instruction bundle 100 containing three syllables: syllable_0, syllable_1, and syllable_2. The instructions of an instruction bundle 100 are executed in order (first syllable_0, then syllable_1, then syllable_2) unless and until an "unconditional" or "taken" branch instruction transfers program flow control to the first sequential instruction of a different instruction bundle (or until a fault, trap, interrupt, or some other event causes program flow control to be transferred). As long as program flow control is not changed, execution of the instructions in one bundle 100 is followed by execution of the instructions in the next sequential bundle.
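
The field sizes above (3 × 41 + 5 = 128 bits) can be made concrete with a small packing and unpacking sketch. The specification gives only the field widths; the bit positions used here (template in the five least significant bits, followed by syllable_0, syllable_1, and syllable_2) follow Intel's published IA-64 bundle layout as best understood and should be treated as an assumption of this example. The function names are hypothetical.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, syllables):
    """Combine a 5-bit template and three 41-bit syllables into a 128-bit integer."""
    bundle = template & TEMPLATE_MASK
    for i, syllable in enumerate(syllables):
        bundle |= (syllable & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle into (template, [syllable_0, syllable_1, syllable_2])."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & TEMPLATE_MASK
    syllables = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, syllables


# Arbitrary placeholder field values, not real instruction encodings.
example = pack_bundle(0x10, [0x1, 0x2, 0x3])
template, syllables = unpack_bundle(example)
```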

[0022] IA-64 has five instruction syllable types (M, I, F, B, and L), six instruction types (M, I, A, F, B, and L), and twelve basic template types (MII, MI,I, MLX, MMI, M,MI, MFI, MMF, MIB, MBB, BBB, MMB, and MFB). An instruction template type specifies the mapping of instruction syllables to execution unit types, together with the location of "stops." Each basic template type has two versions: one with a stop after the third syllable and one without. A stop indicates that one or more instructions after the stop may have certain kinds of resource dependencies on one or more instructions before the stop (that is, stops allow parallelism to be specified explicitly). Note that the MI,I and M,MI instruction templates have, by definition, a stop within the bundle, as indicated by the comma in the template name. With the exception of A-type instructions, which can be placed in either an I syllable or an M syllable, an instruction must be placed in the syllable corresponding to its instruction type, as dictated by the template specification. For example, a template specification of MII means that, of the three instructions in the bundle 100, the first is a memory (M) type or A-type instruction and the next two are integer (I) type or A-type instructions.

[0023] The MLX template is unique in that it contains only two instructions: a memory (M) instruction and a long X-type instruction. At present an X instruction can only be an integer instruction, so the MLX template is in effect an MLI template. The X syllable of the MLX template holds an integer instruction with a 22-bit "long immediate." These 22 bits are sent to the integer unit together with an additional 41 bits of "long immediate" held in the L syllable of the MLX template. An LX instruction is thus a form of integer instruction.

[0024] Instruction execution flow in an IA-64 processor is controlled by an instruction pointer (sometimes referred to below as the IP). The instruction pointer holds the address of the bundle 100 containing the IA-64 instruction currently being executed. The IP is incremented as instruction bundles 100 are executed, and is set to a new value by execution of a branch instruction (or by other means). Because IA-64 instruction bundles 100 are 16 bytes long and are aligned on 16-byte boundaries, the four least significant bits of the IP are always zero.
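
As a small illustration of the alignment property, the following Python sketch advances an IP to the next sequential bundle. The function name and the 64-bit wraparound mask are assumptions of this example rather than details taken from the specification.

```python
BUNDLE_BYTES = 16


def next_sequential_ip(ip):
    """Address of the bundle that sequentially follows the bundle at ip."""
    if ip % BUNDLE_BYTES != 0:
        raise ValueError("IP must be aligned on a 16-byte boundary")
    return (ip + BUNDLE_BYTES) & 0xFFFFFFFFFFFFFFFF  # assumed 64-bit wraparound


ip = 0x4000_0000_0000_0FF0
following = next_sequential_ip(ip)
low_four_bits = following & 0xF  # stays zero for every bundle address
```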

[0025] The IA-64 software architecture provides two basic categories of branches: IP-relative branches and indirect branches. An IP-relative branch specifies its branch target address using a signed 20-bit offset, which is carried with the IP-relative branch instruction. When this offset is added to the starting address of the instruction bundle containing the IP-relative branch, it gives the starting address of the target bundle. With a signed 20-bit offset (that is, a 21-bit offset), an IP-relative branch can reach ±16 MB. An indirect branch, on the other hand, can reach a larger range of addresses than an IP-relative branch, but at a higher cost. The target bundle of an indirect branch is specified by the value held in one of eight 64-bit branch registers. The indirect branch target must therefore be moved into one of the eight branch registers before the target is needed (which is before the point at which the branch prediction hardware relies on the target for branch prediction purposes). An indirect branch thus requires a setup routine such as MOVL, MOV-to-BR, BR. The MOVL instruction moves a 64-bit value into a general register (GR). The MOV-to-BR instruction then moves the 64-bit general register value into a specified branch register (BR). Finally, the BR instruction branches to the target address stored in the specified branch register.
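
The IP-relative target calculation can be sketched as follows. The specification states only that a signed 20-bit (21-bit) offset is added to the bundle address and that the reach is ±16 MB; treating the offset as a count of 16-byte bundles is an inference from those two facts (2^20 bundles × 16 bytes = 16 MB), and the function name and raw-field representation are hypothetical.

```python
def ip_relative_target(branch_bundle_ip, offset21):
    """Target bundle address for an IP-relative branch.

    offset21 is the raw 21-bit field (one sign bit plus 20 further bits),
    assumed here to be counted in 16-byte bundles.
    """
    if not 0 <= offset21 < (1 << 21):
        raise ValueError("offset must fit in 21 bits")
    if offset21 & (1 << 20):   # sign bit set: negative displacement
        offset21 -= 1 << 21
    return (branch_bundle_ip + (offset21 << 4)) & 0xFFFFFFFFFFFFFFFF


forward = ip_relative_target(0x10000, 0x000001)   # one bundle ahead
backward = ip_relative_target(0x10000, 0x1FFFFF)  # raw field encodes -1 bundle
max_forward = ip_relative_target(0, 0x0FFFFF)     # largest positive displacement
```

An indirect branch needs no such arithmetic at the branch itself, because the full 64-bit target is read from a branch register; the cost is moved into the setup routine instead.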

[0026] Because the setup routine for an indirect branch involves three instructions, some of which require the movement of 64-bit values, the instructions of the setup routine fit into two instruction bundles at best. Moreover, because the three instructions depend on one another and must therefore be executed sequentially, executing one indirect branch takes at least three cycles (whereas an IP-relative branch requires the execution of only one instruction). Executing an indirect branch also requires the use of at least one available general register and one available branch register. Indirect branches therefore involve considerable overhead. However, indirect branches have the advantage over IP-relative branches that they can branch to any location in the 64-bit IA-64 address space.

[0027] A branch is either unconditional or conditional. An unconditional branch is always taken. A conditional branch may or may not be taken, so its outcome must be predicted in some way. In one approach, the outcome of a branch is predicted by means of compiler-generated "hints." A hint is encoded with the branch instruction at compile time and is decoded by the processor ahead of the time at which the branch instruction is executed.

[0028] The following is a description of several improved methods and apparatus for implementing the IA-64 software architecture described above (and other software architectures). Some of these methods and apparatus relate to improved means of generating, storing, and managing prediction information (branch prediction information in particular).

Instruction Processing / Branch Prediction Hardware

[0029] FIG. 2 shows various structures for managing instructions and branch prediction information within a processor. The structures of FIG. 2 include an integrated low-level instruction/branch prediction cache 200 (sometimes referred to below simply as the "integrated low-level cache"), a separate high-level instruction cache 206 and branch prediction cache 208, and several structures that assist in updating the branch prediction information stored in the various caches.

1. Integrated Low-Level Instruction/Branch Prediction Cache

[0030] The structure at the center of the apparatus of FIG. 2 is the integrated low-level instruction/branch prediction cache 200. Although the integrated low-level instruction/branch prediction cache 200 functionally operates as a single structure, it conceptually consists of a low-level instruction cache 204 (L0I) and a low-level branch prediction cache 202 (L0IBR) (see also FIG. 3).

【００３１】[0031]

In the past, most processors maintained branch prediction information in a cache that was functionally distinct from the processor's low-level instruction cache 204. Part of the reason is that the branch prediction algorithms used in the past relied heavily on global branch prediction. Global branch prediction is a form of branch prediction in which each retiring branch instruction shifts one bit indicating its outcome (taken or not taken) into a global branch history register. The global branch history register is then used to index a cache of two-bit counters that store prediction histories for the various patterns that the bits of the global branch history register can assume. Each time a retiring branch updates the global branch history register, the appropriate pattern history counter is also updated (see "Alternative Implementations of Two-Level Adaptive Branch Prediction", cited above).
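
The global scheme described above can be sketched as follows. This is a minimal Python illustration rather than the circuit of any particular processor: the class name, the default 4-bit history length, the initial counter value, and the "counter of 2 or more means taken" threshold are all assumptions made for the example.

```python
class GlobalPredictor:
    """Global two-level predictor: a history register indexes 2-bit counters."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0  # global branch history register
        # One 2-bit saturating counter per history pattern (assumed initial value 1).
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        # Assumed convention: counter values 2 and 3 predict taken.
        return self.counters[self.history] >= 2

    def retire(self, taken):
        # Update the pattern history counter selected by the current history...
        c = self.counters[self.history]
        self.counters[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        # ...then shift the outcome bit into the global history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

Because every retiring branch shares one history register and one counter table, the table must grow with the history length, which is the size dependence discussed in the next paragraph.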

【００３２】[0032]

The accuracy of a global branch predictor depends heavily on the size of the global history register, for two reasons. First, a longer global history carries more information with which to distinguish different prediction scenarios. Second, a longer global history register implies a larger table of two-bit pattern history counters and therefore less counter aliasing. When the correct ratio of computer program branches to global history register size (and pattern history table size) is maintained, a global branch predictor can provide a very high level of accuracy; nevertheless, local branch prediction is advantageous for several reasons. For example, as described in detail later, a local branch predictor makes it possible to determine, at the time of the current fetch of a branch, information that predicts the "next fetch outcome" of the branch instruction. (Here, the information that predicts the next fetch outcome of a branch instruction includes trigger information, which predicts whether the branch instruction will be taken or not taken, and target information, which predicts the branch destination when the branch is predicted to be taken.)

【００３３】[0033]

Another advantage of local branch prediction is that it scales better to programs containing many branches. In other words, as the number of branches in a computer program increases, a point is reached at which global branch prediction requires more memory than local branch prediction to achieve the same prediction accuracy. Because the IA-64 software architecture is specifically designed to handle large computer programs (and thus programs containing a relatively large number of branch instructions) effectively, local branch prediction is considered the better choice for implementing such an architecture.

【００３４】[0034]

Because local branch prediction determines, at the time of the current fetch of a branch, the information that predicts the next fetch outcome of that branch, it makes sense to store that information in a cache 202 that is integrated with the cache 204 holding the branch instruction itself. Doing so makes it easy to retrieve and use the prediction information the next time the branch instruction is fetched. This saves the time that would otherwise be spent waiting, after a branch is fetched, to access a global branch history register and an associated pattern history table, and then to access (or compute) the branch target. FIG. 2 shows such an integrated low-level instruction/branch prediction cache 200.

【００３５】[0035]

Although it is possible to use one or more caches that collectively perform the function of a branch prediction cache to store the information that predicts the next fetch outcome of a branch, using the integrated low-level cache 200 shown in FIG. 2 has many advantages. First, local branch prediction requires what is effectively a one-to-one correspondence between branch instructions and branch prediction information (because branch prediction information is maintained per branch). One way to establish such a correspondence is to increase the size of one or more stand-alone branch prediction caches, but in the present invention, introducing the integrated low-level instruction/branch prediction cache 200 was judged to be the better approach.

【００３６】[0036]

The low-level instruction cache 204 uses full tagging (that is, it uses tags large enough to uniquely identify the instructions stored in it). Storing these full tags is expensive, and designing, testing, and building a cache management structure that handles full tags efficiently is costly. Storing branch prediction information for every branch is likewise costly. However, integrating the low-level instruction cache 204 and the branch prediction cache 202 under a common management structure 300 (FIG. 3) yields economies of scale: a single addressing means can be used to retrieve a branch instruction and its corresponding prediction information from the integrated low-level cache 200.

【００３７】[0037]

Furthermore, for the processor's instruction pipeline to operate at full speed, the access speed of the branch prediction cache 202 must be approximately the same as that of the processor's low-level instruction cache 204. Integrating the low-level instruction cache 204 and the branch prediction cache 202 guarantees such a speed match.

【００３８】[0038]

As an alternative, the branch prediction cache could be implemented using one or more caches that collectively perform the function of a branch prediction cache, but implementing full tagging in such a cache structure is expensive. Moreover, the hardware would need to "associate" the branch prediction information with its corresponding branch instruction. The cost of implementing the branch prediction cache might be reduced, for example, by using less-than-full tagging in the cache. Doing so, however, makes it uncertain whether the information fetched from the branch prediction cache actually corresponds to the branch instruction fetched from the low-level instruction cache 204. Integrating the low-level instruction cache 204 and the branch prediction cache 202 ensures that the prediction information corresponds to the fetched instruction.
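
The difference between reduced tagging in a stand-alone prediction cache and a single fully tagged integrated entry can be illustrated with a small Python sketch. The addresses, the 4-bit partial tag, and the use of dictionaries as stand-ins for caches are assumptions made for illustration only.

```python
def partial_tag(address, tag_bits=4):
    """Hypothetical reduced tag: keep only the low tag_bits of the address."""
    return address & ((1 << tag_bits) - 1)

# Stand-alone prediction cache indexed by a reduced tag.
separate_predictions = {}
separate_predictions[partial_tag(0x1234)] = "prediction for branch at 0x1234"

# A different branch at 0x5674 shares the same low 4 bits, so the lookup
# returns prediction information that belongs to another branch.
aliased = separate_predictions.get(partial_tag(0x5674))

# Integrated cache: one full tag selects the instructions and their prediction
# information together, so the two can never be mismatched.
integrated = {0x1234: ("bundle pair at 0x1234", "prediction for branch at 0x1234")}
miss = integrated.get(0x5674)  # full tag differs, so this is a clean miss
```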

【００３９】[0039]

Because not all of the instructions stored in the low-level instruction cache 204 are branch instructions, at any given time many of the entries in the branch prediction portion 202 of the integrated low-level instruction/branch prediction cache 200 will necessarily be unused. However, the advantages that (1) the management structure of the low-level instruction cache 204 can be leveraged, and (2) prediction information is always available whenever a branch is fetched, are considered to outweigh the relatively low density of useful prediction information stored in the cache 202.

【００４０】[0040]

A. L0I

In one preferred embodiment of the L0I portion 204 of the integrated low-level cache 200, instructions are stored in the L0I 204 as bundle pairs. A bundle pair contains two instruction bundles, each of which contains three instruction syllables and a number of template bits. An example of an instruction bundle pair stored in the L0I 204 (bundle_0 and bundle_1) is shown in FIG. 3. The three syllables of each instruction bundle are designed to execute in order: syllable_0 first, then syllable_1, and finally syllable_2 (assuming that a previously executed branch instruction does not transfer program flow control to the start of a different instruction bundle).

【００４１】[0041]

Although program flow control can be transferred to the start of either instruction bundle in a bundle pair, when program flow control is transferred to the first sequential bundle of the pair, instruction execution proceeds sequentially from the first bundle to the second bundle (again assuming that a previously executed branch instruction does not transfer program flow control to the start of a different instruction bundle). One reason that instructions are preferably stored in the L0I 204 as bundle pairs is that the cost of the very large number of address bits used to access the L0I 204 can then be spread over a larger number of instruction bits.

【００４２】[0042]

B. L0IBR

In one preferred embodiment of the L0IBR portion 202 of the integrated low-level cache 200, branch prediction information is stored for each bundle pair of instructions stored in the L0I 204. One form that the L0IBR branch prediction information can take is shown in FIG. 3. The prediction information includes a number of bits of a predicted target, target correlation data (TAS), and four branch history/trigger prediction chunks 302, 304, 306, 308. The branch history/trigger prediction chunks 302 to 308 are hereinafter referred to as STH (single-way trigger/history) chunks.

【００４３】[0043]

Each of the four STH chunks 302 to 308 stored in the L0IBR 202 consists of five bits: four branch history bits and one trigger prediction bit. The trigger prediction bit predicts whether a branch currently stored in the L0I 204 will be taken or not taken at its next fetch. The branch history bits stored with a given trigger prediction bit form a record of the past taken/not-taken outcomes of the branch whose outcome the trigger prediction bit predicts.
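
A five-bit STH chunk of this kind can be modeled in a few lines of Python. The text fixes only the field widths; the bit layout used here (trigger prediction in the top bit, history in the low four bits, newest outcome shifted in at the least significant end) and the function names are assumptions for illustration.

```python
HISTORY_BITS = 4  # from the text: four branch history bits per chunk
HISTORY_MASK = (1 << HISTORY_BITS) - 1

def pack_sth(history, trigger):
    """Pack 4 history bits and 1 trigger prediction bit into 5 bits (assumed layout)."""
    if not (0 <= history <= HISTORY_MASK) or trigger not in (0, 1):
        raise ValueError("history must fit in 4 bits and trigger must be 0 or 1")
    return (trigger << HISTORY_BITS) | history

def unpack_sth(chunk):
    """Return (history, trigger) from a 5-bit chunk."""
    return chunk & HISTORY_MASK, (chunk >> HISTORY_BITS) & 1

def record_outcome(history, taken):
    """Shift the newest taken/not-taken outcome into the 4-bit history."""
    return ((history << 1) | int(taken)) & HISTORY_MASK
```

How the trigger prediction bit is derived from the history is not specified in this passage, so the sketch deliberately leaves that step out.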

【００４４】[0044]

Because one instruction bundle 100 can hold up to three branch instructions, an instruction bundle can have more branches than there are STH chunks 302 to 308 available for that bundle 100. The STH chunks 302 to 308 are therefore used as follows. If syllable_0 of an instruction bundle 100 does not hold a branch instruction, separate STH chunks 302, 304 are assigned to each of the remaining two syllables of the bundle. If syllable_0 of the instruction bundle 100 does hold a branch instruction, the pair of STH chunks 302, 304 is used to store branch history and trigger prediction information in an encoded form 400 (FIG. 4). When used to store encoded information, the pair can hold branch history and trigger prediction information for all three syllables of the instruction bundle. The IA-64 architecture currently provides only one template in which syllable_0 of an instruction bundle can hold a branch, namely the BBB template. Consequently, whenever syllable_0 of an instruction bundle 100 holds a branch, it is possible (though not guaranteed) that all three instruction syllables in the bundle 100 hold branches. To simplify the decision as to whether the encoded STH chunks 400 should be used, the encoded STH chunks 400 are preferably used whenever syllable_0 of an instruction bundle 100 holds a true branch, regardless of whether syllable_1 and syllable_2 actually hold branch instructions.

【００４５】[0045]

FIG. 4 shows an encoded pair of STH chunks 400. Note that the encoded pair 400 holds four 2-bit history events and one 2-bit trigger prediction for its corresponding instruction bundle. A trigger prediction value of "00" indicates that none of the branches in the instruction bundle is predicted to be taken; a value of "01" indicates that the branch in syllable_2 is predicted to be taken; a value of "10" indicates that the branch in syllable_1 is predicted to be taken; and a value of "11" indicates that the branch in syllable_0 is predicted to be taken.
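
The two-bit trigger encoding listed above amounts to a small lookup table. The following Python sketch reproduces that mapping; the function names are hypothetical, and the final check simply confirms that four 2-bit history events plus one 2-bit trigger prediction fill exactly two 5-bit STH chunks.

```python
# Mapping taken from the text: 2-bit trigger value -> syllable predicted taken.
ENCODED_TRIGGER = {0b00: None, 0b01: 2, 0b10: 1, 0b11: 0}

def predicted_taken_syllable(trigger_bits):
    """Return the syllable index predicted taken, or None if no branch is."""
    return ENCODED_TRIGGER[trigger_bits & 0b11]

def encode_trigger(syllable):
    """Inverse mapping: syllable index (or None) -> 2-bit trigger value."""
    return 0b00 if syllable is None else 3 - syllable

# Bit budget: 4 history events x 2 bits + 2 trigger bits = 2 chunks x 5 bits.
assert 4 * 2 + 2 == 2 * 5
```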

【００４６】[0046]

Given the current IA-64 instruction templates, the two L0IBR STH chunks 302, 304 assigned to an instruction bundle 100 stored in the L0I 204 can be used in one of three ways. First, if the instruction bundle 100 holds only one branch instruction and that branch instruction does not appear in syllable_0, one STH chunk 302 stores the branch history and trigger prediction information for that branch. Second, if the instruction bundle 100 holds two branch instructions and neither appears in syllable_0, the two STH chunks 302, 304 are used to store the branch history and trigger prediction information for those two branches. Third, if the instruction bundle 100 holds a branch instruction in syllable_0, the two STH chunks 302, 304 are used to store encoded branch history and trigger prediction information 400 for the branches of that bundle. Note that in the encoded case the two STH chunks 400 can handle branch history and trigger prediction information for up to three branches. However, encoding of the two STH chunks 400 does not necessarily mean that the instruction bundle 100 has three branches.
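
The three usage modes can be summarized as a small decision function. This Python sketch is illustrative only: the function name, the mode labels, and the representation of a bundle as three booleans (one per syllable, true when the syllable holds a branch) are assumptions, and the "unused" result for a bundle without branches is added for completeness.

```python
def sth_usage(branch_in_syllable):
    """Decide how a bundle's two STH chunks are used.

    branch_in_syllable: three booleans for syllable_0, syllable_1, syllable_2.
    Returns (mode, chunks_used).
    """
    s0, s1, s2 = branch_in_syllable
    if s0:
        # A branch in syllable_0 always selects the encoded pair,
        # whether or not syllable_1 and syllable_2 hold branches.
        return ("encoded", 2)
    count = int(s1) + int(s2)
    if count == 0:
        return ("unused", 0)
    # One chunk per branch in syllable_1 and syllable_2.
    return ("separate", count)
```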

【００４７】[0047]

As described later, unused L0IBR STH chunks 302 to 308 can be used to store L1B management information.

【００４８】[0048]

Each instruction bundle pair stored in the L0I 204 maps to a single branch target (or portion thereof) stored in the L0IBR 202. The target is therefore mapped to the first sequentially executed branch instruction of an instruction bundle that has a branch predicted to be taken, that instruction bundle being the first such bundle following the "bundle pair entry point". In the IA-64 software architecture, program flow control can be directed to the start of any instruction bundle 100, so a branch instruction can redirect program flow control to either the first instruction bundle (bundle_0) or the second instruction bundle (bundle_1) of a bundle pair. Accordingly, when program flow control is directed (or redirected) to one bundle of a bundle pair during the current fetch of the pair, it is assumed that program flow control will be directed to the same bundle at the next fetch of the pair. The point to which program flow control is directed is referred to herein as the bundle pair entry point. Once the bundle pair entry point has been determined, the first branch predicted to be taken is identified by evaluating the trigger prediction bits corresponding to the instruction bundle pair to determine which branch instruction in the pair is the first one, after the bundle pair entry point, that is predicted to be taken.
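
The search for the first predicted-taken branch after the entry point can be sketched as a simple scan. In this Python illustration the trigger predictions of a bundle pair are flattened into six slots in execution order, which is a simplifying assumption (the actual STH chunks may be separate or encoded); the function name and argument shapes are likewise hypothetical.

```python
def first_predicted_taken(triggers, entry_bundle):
    """Find the first branch predicted taken at or after the entry point.

    triggers: six entries in execution order (bundle_0 syllables 0..2, then
              bundle_1 syllables 0..2); None marks a non-branch syllable,
              otherwise True/False is the taken prediction.
    entry_bundle: 0 or 1, the bundle at which program flow enters the pair.
    Returns (bundle, syllable), or None if no branch is predicted taken.
    """
    for slot in range(entry_bundle * 3, 6):
        if triggers[slot]:
            return divmod(slot, 3)
    return None
```

Entering at bundle_1 skips every prediction in bundle_0, which is why the entry point must be known before the stored target can be associated with a branch.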

【００４９】[0049]

The target correlation bits stored with a target simply map the target to the syllable or branch instruction of the instruction bundle pair for which the target was generated (or mark the target as invalid). As can be calculated from the above, for each instruction bundle pair (256 bits) stored in the L0I portion 204 of the integrated low-level cache 200, 63 bits of data are stored in the L0IBR portion 202 of the cache 200 (the L0IBR 202 actually provides 64 bits per instruction bundle pair, but one of those bits is currently unused). However, the above is merely one preferred structure for the L0IBR 202; the number of bits it contains can be increased or decreased according to factors such as the number of target bits it is desirable to store for a given application, the number of history bits per branch it is desirable to store, and the number of STH chunks 302 to 308 provided per instruction bundle 100 (or bundle pair).
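
The bit counts quoted above can be checked with simple arithmetic. The Python sketch below uses only figures stated in the text (256, 63, and 64 bits, and four 5-bit STH chunks); the split of the remaining bits between target bits and target correlation bits is not given in this passage, so only their combined total is derived, on the assumption that the entry holds no other fields.

```python
BUNDLE_PAIR_BITS = 256     # instruction bits per bundle pair (from the text)
L0IBR_USED_BITS = 63       # prediction bits in use per bundle pair (from the text)
L0IBR_PROVIDED_BITS = 64   # bits physically provided per bundle pair (from the text)
STH_CHUNKS = 4             # STH chunks per bundle pair (from the text)
STH_CHUNK_BITS = 5         # 4 history bits + 1 trigger prediction bit

sth_bits = STH_CHUNKS * STH_CHUNK_BITS
# Assumption: everything that is not STH data is target or target correlation data.
target_and_tas_bits = L0IBR_USED_BITS - sth_bits
# Storage overhead of the prediction portion relative to the instruction portion.
overhead = L0IBR_PROVIDED_BITS / BUNDLE_PAIR_BITS
```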

【００５０】[0050]

2. High-Level Caches

Note that many of the instructions stored in the integrated low-level cache 200 are non-branch instructions, and that the corresponding prediction information entries are not used for non-branch instructions. This is tolerated in order to achieve fast access and economies of scale when storing and accessing prediction information that the processor will need in the very near future. Prediction information can be fetched from the L0IBR 202 at the same time that the corresponding instruction is fetched from the L0I 204; there is no need to pull this information from two independent cache structures.

【００５１】[0051]

However, the apparatus of FIG. 2 does not include an integrated high-level instruction/branch prediction cache. Part of the reason is that high-level cache structures are typically larger and slower than low-level cache structures, and the prediction information portion of an integrated high-level structure would contain too many blank entries (that is, it would waste space). Instead, a relatively large portion of the branch prediction information stored in the L0IBR 202 is stored in a separate high-level branch prediction cache, the L1B 208. Because the L1B 208 is a stand-alone structure, it can store branch prediction information at a higher density than the L0IBR 202 does.

【００５２】[0052]

A. L1 Cache

The second structure disclosed as part of the apparatus of FIG. 2 is a second-level instruction cache (the L1 cache 206). The L1 cache 206 is a higher-level cache than the integrated low-level cache 200 and is designed to store more instructions than are stored in the L0I portion 204 of the integrated low-level cache 200. Some of the instructions stored in the L1 cache 206 are copies of instructions that have been loaded into the L0I 204, and others are copies of instructions that are yet to be loaded into the L0I 204 (or copies of instructions that have been overwritten in the L0I 204). However, although the L1 cache 206 stores copies of instructions loaded into the L0I 204, the L1 206 preferably does not store copies of the branch prediction information values loaded into the L0IBR 202 (the L1B 208 does this to some extent). Nor does the L1 cache 206 necessarily store every instruction present in the L0I 204 at all times.

【００５３】[0053]

Although the L1 206 preferably does not store copies of the branch prediction information values loaded into the L0IBR 202, the L1 cache 206 preferably does store some branch prediction information. For example, branch prediction hints generated by a compiler or the like are often encoded as part of the branch instructions to which they apply. Consequently, one preferred embodiment of the L1 cache 206 continues to store these hints. As described later, however, there is a benefit to storing backup copies of more extensive branch prediction information (such as branch history) in a cache 208 separate from the L1 cache 206. The secondary branch prediction cache 208 is thus formatted and managed separately from the L1 cache 206, in a form better optimized for storing branch prediction information.

【００５４】[0054]

B. L1B

The third structure disclosed as part of the apparatus of FIG. 2 is a secondary branch prediction cache (the L1B cache 208). The L1B cache 208 is to the L0IBR 202 what the L1 cache 206 is to the L0I 204. The L1B 208 is therefore a higher-level cache than the L0IBR 202 and, as a result, is typically larger and slower than the L0IBR 202.

【００５５】[0055]

Note that the preferred embodiment of the L1B 208 can store at most two of the four L0IBR chunks 302 to 308 corresponding to a particular instruction bundle pair stored in the L0I 204. The remainder of the branch prediction information maintained by the L0IBR 202 is therefore not maintained by the L1B 208. Those skilled in the art will appreciate, however, that more, less, or different types of branch prediction information can be stored in the L1B 208, depending on the needs of a particular application.

【００５６】[0056]

With respect to branch history and trigger prediction information, the L1B 208 stores far more information than can be stored in the L0IBR 202. Thus, even after an instruction bundle pair and its corresponding branch prediction information have been removed from the integrated low-level cache 200, branch history and trigger prediction information corresponding to the instruction bundle pair fetched from the L1 cache 206 may still be present in the L1B cache 208. If it is, the integrated low-level cache 200 can be largely filled with data fetched from the L1 cache 206 and the L1B cache 208.

【００５７】[0057]

As noted above, the L1B 208 preferably stores only two of the four L0IBR STH chunks 302 to 308 corresponding to an L0I instruction bundle pair. As a result, one preferred embodiment of the apparatus of FIG. 2 uses the L1B 208 to store L0IBR STH chunks 302 to 308 only when just two STH chunks are used by the instruction bundle pair. The L1B 208 is thus used when (1) each instruction bundle of the bundle pair holds one branch instruction; (2) one instruction bundle of the bundle pair holds two branch instructions and the other holds none; or (3) one instruction bundle of the bundle pair holds three branch instructions (that is, a branch in syllable_0 of the bundle causes it to be presumed to hold up to three branch instructions) and the other holds none. Note that in the last case the branch history and trigger prediction information for the three branch instructions is encoded in just two STH chunks 400. In all other cases, the L1B 208 is not used as a backup for the L0IBR 202. Fortunately, the arrangement of branches in an instruction bundle pair usually corresponds to one of the three cases listed above. As those skilled in the art will recognize, the L1B 208 can be made large enough to store more L0IBR information, to the extent that available chip area permits.

【００５８】[0058]

The preferred relationship between the L0IBR 202 and the L1B 208 is generally a write-through relationship. In other words, when branch history bits and trigger prediction bits are updated in the L0IBR 202, the same bits are written to the L1B 208. One preferred configuration of the L0IBR 202 and L1B 208, however, incorporates a number of exceptions to this general write-through policy. First, because the L1B 208 provides storage for only two STH chunks 302, 304 per instruction bundle pair, STH chunks 302, 304 are written to the L1B 208 only when the instruction bundle pair uses two or fewer STH chunks 302, 304. Because most instruction bundle pairs hold two or fewer branch instructions, this first exception to the general write-through policy is expected to degrade branch prediction performance very little.
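
The first exception reduces to counting how many STH chunks a bundle pair uses. The Python sketch below is a hypothetical illustration: each bundle is represented as three booleans (one per syllable, true when it holds a branch), a branch in syllable_0 is treated as using the encoded pair of two chunks, and the function names are assumptions.

```python
def chunks_used_by_bundle(branches):
    """Count STH chunks used by one bundle (syllable_0, syllable_1, syllable_2)."""
    s0, s1, s2 = branches
    # A branch in syllable_0 selects the encoded form, which occupies two chunks.
    return 2 if s0 else int(s1) + int(s2)

def l1b_can_back_up(bundle_0, bundle_1):
    """True when the pair uses two or fewer chunks, so the L1B can hold them."""
    return chunks_used_by_bundle(bundle_0) + chunks_used_by_bundle(bundle_1) <= 2
```

A pair in which one bundle has a branch in syllable_0 and the other bundle also holds a branch needs three chunks, so it is the kind of pair that is not backed up.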

【００５９】[0059]

Another exception to the general write-through relationship between the L0IBR 202 and the L1B 208 is that a branch instruction can opt out of using the L1B 208. For example, one state of the ".clr" hint of the IA-64 software architecture is used to instruct the apparatus of FIG. 2 not to use the L1B 208 for a particular branch instruction. In this way, a compiler can decide not to use the L1B 208 when predicting particular branch instructions (for example, branches that can be predicted using compiler-generated hints alone).

【００６０】[0060]

A third exception to the general write-through relationship between the L0IBR 202 and the L1B 208 is that an STH chunk 302 is written to the L1B 208 only when the STH chunk 302 is considered to provide information that yields higher prediction accuracy than could be obtained if the L0IBR 202 were simply initialized to some default value (for example, by an initializer that sets all history and trigger prediction bits to logic zero, or the reverse, based on compiler hints encoded with the branch instruction). Details of the second exception, noted above, to the general write-through relationship between the L0IBR 202 and the L1B 208 are described later.

【００６１】There are many possible ways to initialize an STH chunk 302 in the L0IBR 202. One way is to retrieve the relevant data from the L1B 208. However, a number of factors can cause the required STH chunks 302-308 to be absent from the L1B 208. For example, the L1B 208 will at times not hold values corresponding to an instruction bundle pair being fetched from the L1 cache 206: (1) when the L1B 208 is smaller than the L1 206; (2) when an instruction bundle has been fetched into the L1 cache 206 but has not yet been loaded into the L0I 204; (3) when not all of the data stored in the L0IBR 202 has been copied to the L1B 208; or (4) when an instruction hint indicates that the L1B 208 should not be used or relied upon.

【００６２】If the L1B 208 does not contain the required values, some other means of initializing the L0IBR 202 must exist. One preferred method of initializing an L0IBR STH chunk 302 after an L1B miss is to fill all bits of the STH chunk 302 with logical zeros (or, alternatively, logical ones). As a result, the probability of correctly predicting a branch outcome is relatively low the first time the branch is encountered, but prediction accuracy improves after the branch has executed one or more times. As those skilled in the art will appreciate, the hardware required to initialize an STH chunk 302 to logical zeros is relatively simple and inexpensive to implement.

【００６３】Another method of initializing an STH chunk 302 after an L1B miss is to fill all bits of the chunk 302 with either logical zeros or logical ones, according to the value assumed by a prediction hint (e.g., a static or dynamic compiler hint) encoded in the branch instruction. One variation of this scheme, which provides faster L0IBR initialization, is to (1) assume that the branch is not taken, (2) fill the branch's corresponding STH chunk 302 with all zeros, and (3) if the hint information indicates that filling with logical ones would provide more accurate branch prediction, spend one extra cycle refilling the STH chunk 302 with logical ones.
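
The hint-based variation described in this paragraph can be modeled in a few lines. The following Python sketch is illustrative only: the 5-bit chunk width (four history bits plus one trigger bit) and the function name are assumptions, not taken from the disclosure.

```python
STH_BITS = 5  # assumed width: 4 history bits + 1 trigger bit (not from the disclosure)


def init_sth_after_l1b_miss(hint_taken, width=STH_BITS):
    """Model the fast initialization variation used after an L1B miss.

    Returns a tuple (chunk_value, extra_cycles).
    """
    chunk = 0            # steps (1) and (2): assume not taken, fill with zeros
    extra_cycles = 0
    if hint_taken:       # step (3): the hint says ones would predict better
        chunk = (1 << width) - 1
        extra_cycles = 1  # the refill costs one additional cycle
    return chunk, extra_cycles
```

Under these assumptions, a branch hinted as not taken is ready immediately with an all-zero chunk, while a branch hinted as taken pays one extra cycle for the all-ones refill.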

【００６４】As noted above, an L0IBR STH chunk 302 is written to the L1B 208 only when the chunk 302 is believed to hold information that yields better prediction accuracy than would be obtained if the L0IBR 202 were simply initialized to some default value. Assuming that STH chunks 302 are initialized to all logical zeros after an L1B miss, writing these zeros to the L1B 208 serves no practical purpose, since re-initializing the L0IBR 202 provides the same data without reference to the L1B 208. Moreover, not writing L0IBR initialization zeros to the L1B 208 offers significant advantages. The greatest advantage is that the L1B 208 can effectively store a larger amount of prediction information.

【００６５】For example, if the L1B 208 is constructed as a tagless n-way cache that stores data using redundancy and index hashing, writing a new data value to the L1B 208 tends to overwrite duplicate copies of other data values. In other words, if duplicate copies of data A, data B, and data C have already been written to different sets of indexed storage locations in the L1B 208, then writing data D to the L1B will, in the worst case, overwrite one duplicate copy of data A, one duplicate copy of data B, and one duplicate copy of data C. As will be appreciated, if such overwriting occurs too frequently, there will come a time when data A, data B, and data C no longer exist in the L1B 208.

【００６６】One way to reduce the likelihood that data A, data B, and data C will be overwritten is to avoid writing L0IBR initialization zeros to the L1B 208. If these zeros were written to the L1B 208, and the instruction bundle pair corresponding to them were subsequently evicted from the L0I 204, then reloading the instruction bundle pair into the L0I 204 and reloading the zeros from the L1B 208 would provide no better prediction accuracy than re-initializing the L0IBR 202 to its default values by some inexpensive means.

【００６７】Another advantage of not writing L0IBR initialization zeros to the L1B 208 is that it reduces the amount of write traffic to the L1B. This is particularly important when the L1B 208 has limited write bandwidth (e.g., if the L1B 208 has only a single read/write port, reducing the number of L1B writes frees more bandwidth for L1B reads). Note, however, that an all-zero STH chunk 302 must be written to the L1B 208 when a non-zero STH chunk 302 is already held in the L1B 208. Otherwise, reloading data into the L0IBR 202 could load a stale STH chunk 302.

【００６８】To avoid unnecessary updates of the L1B 208, L1B relevance information is maintained. This relevance information is preferably (though not necessarily) stored in the L0IBR 202, and is stored for each of the STH chunks 302, 304 that may at some point be written to the L1B 208 (i.e., relevance information is stored for at most two STH chunks 302, 304 per instruction bundle pair). The relevance information corresponding to a single STH chunk 302 need consist of only a single bit. The L1B relevance bit is initialized when an instruction bundle pair is fetched from the L1 cache 206 and a corresponding attempt is made to fetch prediction information from the L1B cache 208. If the attempt to fetch an STH chunk 302 from the L1B 208 results in an L1B miss, the STH chunk 302 that was to be filled with the missing L1B data is initialized to a default value, and its corresponding L1B relevance bit is set to zero. The relevance bit thus indicates that the L1B 208 does not contain a backup copy of the newly initialized STH chunk 302.

【００６９】If an attempt to fetch an STH chunk 302 from the L1B 208 produces an L1B hit, but the data read from the L1B 208 is the same as the default value that would be generated for the STH chunk 302, the STH chunk 302 is filled with the data fetched from the L1B 208. Alternatively, the STH chunk 302 can be set to its default value by the separate default initializer. When the L1B 208 produces a hit but the prediction information is no better than that provided by the lower-cost L0IBR initialization means, the L1B relevance bit corresponding to the newly filled STH chunk 302 is likewise set to zero.

【００７０】Whether an L1B hit reads all zeros or all ones from the L1B 208 can be determined by a simple NOR or AND of the bits read from the L1B 208. If an attempt to fetch an STH chunk 302 from the L1B 208 produces an L1B hit, and the data read from the L1B 208 provides prediction information different from that which the aforementioned default initializer can generate, the L1B relevance bit corresponding to the newly filled STH chunk 302 is set to "1", thereby indicating that the L1B 208 holds relevant backup history for the STH chunk 302.

【００７１】One example of a circuit 500 for initializing an L1B relevance bit is shown in FIG. 5. A comparator 502 compares a selected data value pattern (or patterns), such as all logical zeros, with the STH chunk 302 retrieved from the L1B 208. If the two values do not match (as indicated by the output of inverter 504) and there is a hit in the L1B 208, the appropriate L1B relevance bit (ER bit) is set to "1". Otherwise, it is set to "0".
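
The behavior of circuit 500 can be expressed as a small truth function. The Python sketch below is a hypothetical software model, not the disclosed circuit: it assumes the selected patterns are all zeros and all ones, and it represents a chunk as a list of bits so that the NOR and AND detection is explicit.

```python
def all_zeros(bits):
    """NOR of the bits: true only when every bit is 0."""
    return not any(bits)


def all_ones(bits):
    """AND of the bits: true only when every bit is 1."""
    return all(bits)


def init_er_bit(l1b_hit, chunk_bits):
    """Model of circuit 500: ER = L1B hit AND NOT (chunk matches a default pattern)."""
    # Comparator 502 (assumed patterns: all zeros or all ones)
    matches_default = all_zeros(chunk_bits) or all_ones(chunk_bits)
    # Inverter 504 followed by an AND with the hit signal
    return 1 if (l1b_hit and not matches_default) else 0
```

Only a hit that returns a non-default chunk sets the ER bit; a miss, or a hit that returns a default pattern, leaves it at zero.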

【００７２】In one preferred embodiment of the L0IBR, the relevance bits corresponding to the STH chunks 302, 304 used by an instruction bundle pair are stored in the STH chunks 306, 308 that are not being used to store branch history and trigger prediction information. When a given instruction bundle pair uses more than two of its L0IBR STH chunks 302-308 to store branch history and trigger prediction information, relevance bits are not stored in the L0IBR 202, and the L1B 208 is not used to back up data for that instruction bundle pair. If an instruction bundle pair uses only two of its L0IBR STH chunks 302-308 to store branch history and trigger prediction information, and uses them to form an encoded set 400, a pair of relevance bits for the encoded set can be stored in the remaining two L0IBR STH chunks 306, 308, and the encoded set 400 can be backed up to the L1B 208 in the same manner as two unencoded STH chunks 302, 304.

【００７３】3. New Branch Prediction Structures
FIG. 2 shows two new branch prediction structures 210, 212. The first structure 210 is part of the "front end" of the instruction pipeline and generates "speculative" branch predictions when a branch is fetched. The second structure 212 is part of the "back end" of the instruction pipeline and generates "non-speculative" branch predictions when a branch retires. When an instruction bundle pair and its corresponding prediction information are fetched from the integrated low-level cache 200, the fetched prediction information is provided, together with its address, to the speculative new branch prediction structure 210. The speculative new branch prediction structure 210 comprises a number of substructures, preferably including a pattern history table (PHT) and an IP-relative adder.

【００７４】The pattern history table is a table comprising rows of counters corresponding to the various patterns of branch history bits that can be stored as part of an L0IBR STH chunk 302. For example, if an STH chunk 302 holds four bits of branch history, the PHT holds 2^4 (i.e., 16) counters, one for each pattern the four history bits can assume. The PHT is preferably addressable on a quasi-address basis. In other words, there is not a separate row of sixteen counters for every instruction address, but enough rows of counters are provided that aliasing is limited to an acceptably low frequency. When a branch retires at the back end of the pipeline, the counter corresponding to the branch history that existed for the branch when it was fetched (i.e., the history retrieved from the L0IBR 202 when the branch was fetched) is updated appropriately (e.g., incremented if the branch was taken, and decremented if it was not).
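
A pattern history table of this kind can be sketched as a two-dimensional array of saturating counters. The Python model below is a minimal illustration under stated assumptions: the row count (64), the 2-bit counter range, the initial counter value, and the modulo row selection are all hypothetical choices, since the disclosure fixes none of them.

```python
class PatternHistoryTable:
    """Rows of saturating counters, one counter per branch history pattern."""

    def __init__(self, rows=64, history_bits=4, counter_max=3):
        self.rows = rows                # assumed row count; aliasing is allowed
        self.counter_max = counter_max  # assumed 2-bit saturating counters
        # 2**history_bits counters per row, started just below the taken threshold
        self.counters = [[counter_max // 2] * (1 << history_bits)
                         for _ in range(rows)]

    def _row(self, bundle_address):
        # Quasi-address selection: several addresses share one row
        return self.counters[bundle_address % self.rows]

    def predict_taken(self, bundle_address, history):
        return self._row(bundle_address)[history] > self.counter_max // 2

    def retire(self, bundle_address, fetch_history, taken):
        """Update the counter selected by the history seen at fetch time."""
        row = self._row(bundle_address)
        if taken:
            row[fetch_history] = min(row[fetch_history] + 1, self.counter_max)
        else:
            row[fetch_history] = max(row[fetch_history] - 1, 0)
```

Note that `retire` is indexed with the history that was read at fetch time, not the history current at retirement, which mirrors the update rule in the paragraph above.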

【００７５】Thus, the apparatus of FIG. 2 makes branch predictions based not only on a history of previous branch outcomes, but also on a history of the history patterns previously seen for a branch. Together with the branch histories stored in the L0IBR 202, the pattern history table forms part of a two-level branch prediction algorithm. Various forms that the PHT can take are disclosed in the previously cited reference "Alternative Implementations of Two-Level Adaptive Branch Prediction". Although the PHT is described here as part of the speculative new branch prediction structure 210, the counters maintained by the PHT are updated non-speculatively.

【００７６】When an STH chunk 302 is fetched from the L0IBR 202, the trigger portion of the chunk 302 is shifted into its newest history bit position, and the oldest history bit is shifted out of the STH chunk 302. This new history, together with a portion of its corresponding bundle address, is then used as an index into the PHT to determine the next fetch trigger, which is to be stored as part of the STH chunk 302.
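
The history shift and PHT indexing can be illustrated with simple bit operations. In the Python sketch below, the bit ordering (newest history bit in the least significant position), the 4-bit history length, and the use of the low bundle-address bits as a row index are assumptions made for illustration only.

```python
HISTORY_BITS = 4   # assumed history length, matching the 4-bit example
PHT_ROWS = 64      # assumed number of PHT rows


def shift_in_trigger(history, trigger, bits=HISTORY_BITS):
    """Shift the trigger into the newest (lowest) bit; drop the oldest bit."""
    return ((history << 1) | (trigger & 1)) & ((1 << bits) - 1)


def pht_index(bundle_address, history, rows=PHT_ROWS):
    """Form a (row, column) index from part of the bundle address and the history."""
    return bundle_address % rows, history
```

For instance, a history of 0b1011 with a not-taken trigger becomes 0b0110, and that value selects the column from which the next fetch trigger would be read.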

【００７７】The speculative new branch prediction structure also includes an IP-relative adder. After each branch history in the STH chunks 302-308 used by a fetched instruction bundle pair has been updated, and the updated histories have been used to retrieve updated trigger values from the PHT, the updated trigger values are used together with the current bundle pair entry point to determine the first branch after the entry point that is predicted taken. The IP-relative adder then generates a new target prediction by adding the offset value carried by the predicted-taken branch instruction to the instruction pointer value corresponding to that branch's bundle address.
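
The selection of the first predicted-taken branch and the IP-relative target computation can be sketched as follows. This Python model is a simplified illustration: the slot-indexed trigger list, the byte-offset convention, and the 64-bit wraparound are assumptions, not details given in the disclosure.

```python
IP_MASK = 0xFFFFFFFFFFFFFFFF  # assumed 64-bit instruction pointer


def first_predicted_taken(triggers, entry_slot):
    """Return the first slot at or after the entry point whose trigger is set."""
    for slot in range(entry_slot, len(triggers)):
        if triggers[slot]:
            return slot
    return None  # no predicted-taken branch after the entry point


def ip_relative_target(bundle_ip, offset):
    """IP-relative adder: branch's bundle address plus the signed offset it carries."""
    return (bundle_ip + offset) & IP_MASK
```

A branch before the entry point is ignored even if its trigger is set, since execution enters the bundle pair after it.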

【００７８】All of the updated prediction information generated by the speculative new branch prediction structure 210 is sent to a multiplexer (MUX) 228 for subsequent use in updating the ISBBR 226 (described below) and/or the L0IBR 202. A new trigger prediction generated for a branch by the speculative new branch prediction structure 210, together with a new target (if the branch is the first predicted-taken branch following the current bundle pair entry point), forms information that predicts the outcome of the next fetch of the branch instruction stored in the ISB 226 or the L0I 204. Because this information is generated speculatively by the front end of the instruction pipeline, it is available at the time of the next fetch of the branch instruction (which may occur before the branch from the earlier fetch has retired). The prediction information can also be used to resteer the instruction pointer without requiring an extra cycle (as described in detail below).

【００７９】The branch prediction information fetched from the L0IBR 202 is also passed down the pipeline (e.g., via the rotator mux 234 and a branch execution unit 236, which may be one of many). When the branch corresponding to the prediction information retires at the back end of the pipeline, the branch is passed, together with its current fetch prediction information, to the non-speculative branch prediction structure 212, which makes a non-speculative fetch prediction for the branch. The non-speculative branch prediction is passed to the multiplexer 228 and is used to correct any erroneous speculative prediction stored in the ISBBR 226 or the L0IBR 202.

【００８０】When the L0IBR STH chunks 302, 304 are updated, the L1B 208 and the appropriate L1B relevance bits stored in the L0IBR 202 may also need to be updated. The determination of whether the L1B 208 and the appropriate L1B relevance bits should be updated is preferably made by the non-speculative new branch prediction structure 212.

【００８１】Each time it determines that an L0IBR STH chunk 302 needs to be updated, the non-speculative new branch prediction structure 212 determines whether the L1B 208 and the corresponding L1B relevance bit stored in the L0IBR 202 also need to be updated. In one preferred embodiment of the non-speculative new branch prediction structure 212, an updated STH chunk 302 is written to the L1B 208 based on (1) the state of the relevance information corresponding to the last fetch of the STH chunk 302 from the L0IBR 202, and (2) a determination of whether the updated STH chunk 302 to be written to the L1B 208 matches one or more selected data value patterns. The selected data value patterns should preferably match one or more of the values to which the default initializer for the L0IBR 202 can initialize the STH chunk 302.

【００８２】For example, it can be determined that the L1B 208 must be updated when the updated STH chunk 302 does not match any of the one or more selected data value patterns. It can also be determined that the L1B 208 must be updated when the state of the appropriate L1B relevance bit indicates that the L1B 208 contains relevant data, which needs to be kept current so that a future access to the L1B 208 does not hit a stale STH value.

【００８３】One example of a circuit 600 for determining whether the L1B 208 needs to be updated is shown in FIG. 6. A comparator 602 compares the selected data value pattern (or patterns) with the updated STH chunk 302. If the two values do not match (as indicated by the output of inverter 604), the L1B 208 is updated. Likewise, if the L1B relevance bit (ER bit) indicates that the L1B 208 contains relevant data, the L1B 208 needs to be updated to keep the L1B data current. Consequently, an OR 606 of these two possibilities determines whether to update the L1B.

【００８４】One example of a circuit 600 for updating an L1B relevance bit is also shown in FIG. 6. If the two values input to comparator 602 do not match, and the corresponding ER bit is not currently set (e.g., the bit is a logical zero, indicating that no relevant information currently exists in the L1B 208), the ER bit is set (as indicated by the output of NOR gate 608). If the two values input to comparator 602 match, and the ER bit is set (e.g., the bit is a logical one), the L1B relevance bit is cleared (as indicated by the output of AND gate 610). If neither of these conditions is met, the state of the ER bit is left unchanged. Note that appropriate logic can be used to suppress the bit update when the updated state of the ER bit matches its existing state.
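
The two decisions made by circuit 600 (whether to write the L1B, and how to update the ER bit) reduce to a four-row truth table. The Python sketch below models that logic in software; the function name and the default pattern set of all zeros are assumptions for illustration, and the gate numbers in the comments refer to FIG. 6 as described in the text.

```python
def circuit_600(updated_chunk, er_bit, patterns=(0,)):
    """Return (write_l1b, new_er_bit) for an updated STH chunk.

    patterns: selected data value patterns (assumed here to be all zeros only).
    """
    match = updated_chunk in patterns         # comparator 602
    write_l1b = (not match) or bool(er_bit)   # inverter 604 feeding OR 606
    set_er = (not match) and not er_bit       # NOR gate 608: no match, ER clear
    clear_er = match and bool(er_bit)         # AND gate 610: match, ER set
    if set_er:
        new_er = 1
    elif clear_er:
        new_er = 0
    else:
        new_er = er_bit                       # state left unchanged
    return write_l1b, new_er
```

The only case in which nothing is written is a default-valued chunk whose ER bit is already zero, which is exactly the traffic the relevance bit is meant to suppress. When a default-valued chunk replaces relevant data (ER set), the write still occurs so that a stale chunk is not reloaded later.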

【００８５】4. Branch Retirement Queue
The branch retirement queue 214 operates in conjunction with the new branch prediction structures 210, 212. When a branch instruction is fetched for execution, the speculative new branch prediction structure 210 updates the L0IBR 202. When the branch instruction retires (i.e., when execution of the branch instruction has completed), the branch retirement queue 214 assists the non-speculative new branch prediction structure 212 in correcting any erroneous L0IBR update. The branch retirement queue 214 also assists in correcting the L0IBR 202 when the instruction pipeline is flushed before a branch has had a chance to retire.

【００８６】The branch retirement queue 214 is preferably implemented as a 16-entry queue divided into two 8-entry banks. When branch prediction information is read from the L0IBR 202, the prediction information is written to the branch retirement queue 214. After a branch retires, its prediction information is removed from the queue 214. If the pipeline is flushed, the entries in the branch retirement queue 214 are written back to the ISBBR 226 and/or the L0IBR 202.
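
The life cycle of a queue entry (written at fetch, removed at retirement, written back on a flush) can be sketched with a simple FIFO. The Python model below is hypothetical: it treats the two 8-entry banks as a single 16-entry capacity, and the overflow behavior is an assumption, since the disclosure does not describe what happens when the queue is full.

```python
from collections import deque


class BranchRetirementQueue:
    BANKS = 2
    ENTRIES_PER_BANK = 8

    def __init__(self):
        self.entries = deque()

    @property
    def capacity(self):
        return self.BANKS * self.ENTRIES_PER_BANK

    def on_fetch(self, bundle_address, prediction):
        """Record the prediction information read from the L0IBR at fetch."""
        if len(self.entries) >= self.capacity:
            # Assumed behavior: the model simply refuses the entry
            raise OverflowError("branch retirement queue is full")
        self.entries.append((bundle_address, prediction))

    def on_retire(self):
        """Remove and return the oldest entry once its branch retires."""
        return self.entries.popleft()

    def on_flush(self):
        """Return all pending entries for write-back, then empty the queue."""
        pending = list(self.entries)
        self.entries.clear()
        return pending
```

In this model, the list returned by `on_flush` stands in for the data that would be written back to the ISBBR 226 and/or the L0IBR 202.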

【００８７】5. Instruction Pointer Generator
The apparatus of FIG. 2 also includes an instruction pointer generator (IP generator) 216. At the heart of the IP generator 216 is an IP generator multiplexer (IP GEN MUX) 218. The triggers of the branches fetched from the L0IBR 202 are ORed together and passed to select control logic 220 for the IP GEN MUX 218. The targets associated with the triggers are provided to data inputs of the IP GEN MUX 218. The IP GEN MUX 218 also receives other data inputs, including an address that is incremented by one relative to the instruction pointer value previously generated by the IP GEN MUX 218. Unless a branch instruction directs otherwise, the instruction pointer is incremented by one bundle address each time an instruction bundle is executed. Because of a branch instruction, however, the bits of a target address may overwrite some or all of the previous value of the IP.
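
The selection performed by the IP GEN MUX can be summarized as a choice between the predicted target and the incremented pointer. The Python sketch below is a reduced model with only those two data inputs; the function name is hypothetical, and addresses are counted in bundle units as an assumption.

```python
def next_instruction_pointer(current_ip, triggers, predicted_target):
    """Select the next IP from the predicted target or the sequential address."""
    # The triggers are ORed and passed to the select control logic 220
    any_taken = any(triggers)
    sequential_ip = current_ip + 1   # incremented by one bundle address
    return predicted_target if any_taken else sequential_ip
```

With no trigger set, the pointer simply advances; a single set trigger is enough to steer fetch to the stored target.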

【００８８】Because predicted branch triggers and targets are stored in the L0IBR 202, and are fetched when their corresponding branch instructions are fetched, the predicted branch triggers and targets can be provided to the IP generator 216 without incurring a resteer. Nor is any delay incurred by a need to separately fetch a branch history, check its validity, compute a trigger prediction, and (if necessary) fetch or compute a target address. As a result, the integrated low-level cache 200 and its associated branch prediction hardware can redirect the instruction pointer without injecting bubbles into the instruction pipeline. Note also that, because the instruction pointer addresses instructions and prediction information stored in a single cache, the load that the instruction pointer must drive is reduced, which allows data to be driven more quickly.

【００８９】General Structure/Operation
When an instruction required by the instruction pipeline is not already present in the L0I portion 204 of the integrated low-level cache, the fetch process issues requests for data to the L1 cache 206 and the L1B cache 208. Data fetched from the L1B 208, if any, is temporarily stored in the structure shown in FIG. 2 as RAB 222 (a buffer). When an instruction bundle pair is returned from the L1 cache 206, the instruction bundle pair, including the templates of its instruction bundles, is used together with any STH chunks 302-308 retrieved from the L1B 208 to compute the target of the first predicted-taken branch (if any) of the bundle pair. If STH chunks 302-308 are not stored in the L1B 208, default values for these chunks are generated. The STH chunks 302-308, a number of target bits, target correlation bits, and the instruction bundle pair are all supplied to the ISB/ISBBR 226. The ISB/ISBBR 226 serves as a buffer for filling entries of the integrated low-level cache 200.

【００９０】When an instruction bundle pair is fetched from the L0I 204, its corresponding branch prediction information is fetched from the L0IBR 202. This includes information predicting the outcome of the current fetch of the branch instructions in the bundle pair. When the instruction bundle pair is fetched from the L0I/L0IBR 200, the L0IBR target information, if any, is provided as a data input to the IP GEN MUX 218, and the trigger prediction information from the relevant STH chunks 302 is ORed and presented to the select logic 220 of the IP GEN MUX 218. In this way, a fast, one-cycle resteer of the instruction pointer is possible.

【００９１】All of the L0IBR information is supplied to the branch retirement queue 214 and the new branch prediction structures 210, 212 for the purposes described above. Note that, because the information predicting the outcome of the next fetch of a branch instruction is determined speculatively by the speculative new branch prediction hardware 210 and written back to the ISBBR 226 and/or the L0IBR 202, that information is available before the next fetch of the branch instruction. Valid next-fetch prediction information is therefore always available.

【００９２】Instructions fetched from the L0I 204 are placed in an instruction buffer 232 and supplied in order, via the rotator mux 234, to the appropriate execution units (e.g., branch execution unit 236). The outputs of the new branch prediction structures 210, 212 and the branch retirement queue 214 are passed to switching means (e.g., multiplexer 228), after which the updated information is sent to the L0IBR 202, either via the ISBBR 226 or directly.

[0093] Alternative Embodiments

The above disclosure has described a processor that fetches instructions in the form of bundle pairs. Accordingly, it has described multiple branch prediction information entries that map to a bundle pair of instructions. However, managing branch prediction in a similar manner for a processor that fetches instructions individually, in bundles, or in bundle groups (such as bundle pairs) is also considered to be within the scope of the present invention.

[0094] The above disclosure has also referred throughout to IA-64 instruction bundles. When the disclosed method and apparatus are used to manage prediction information on a per-bundle basis, managing prediction information for bundles that take alternative forms is also considered to be within the scope of the present invention. For example, an instruction bundle may contain more or fewer instruction syllables. An instruction bundle may also include a different template format, or no template at all. Such cases are possible, for example, where each instruction syllable can accept only a specific type of instruction, or where each instruction carries its own instruction type information.

[0095] It should further be noted that any number and/or any type of branch prediction information can be stored in the integrated low-level cache 200. The choice of the type (and amount) of information stored in L0IBR 202 can vary with the particular application, and more or less prediction information can be stored in L0IBR (and L1B). For example, the 40 bits of IA-64 target information could be increased to the full 64 bits of IA-64 target information. Similarly, the length of the branch history stored in L0I 204 for each branch instruction can be increased or decreased, or the number of available branch histories (or STH chunks 302 to 308) can be increased or decreased. Furthermore, the various components of the apparatus of FIG. 2 can be implemented together (as shown), or implemented individually and independently, in which case only some of the advantages that the apparatus of FIG. 2 provides as a whole are obtained.

[0096] Although preferred embodiments of the present invention have been described above, it should be understood that the principles of the invention can be embodied in various forms other than those described.

[0097] The present invention includes, by way of example, the following embodiments.

[0098] (1) A method of managing prediction information used by a processor, comprising: storing prediction information in a multi-level memory hierarchy that includes at least one low-level prediction cache (202) and one high-level prediction cache (208); when an attempt is made to fetch a prediction information value from the low-level prediction cache and the value is not stored there, attempting to fetch the value from the high-level prediction cache; and periodically updating the prediction information stored in the low-level prediction cache and the high-level prediction cache, using the prediction information stored in the low-level prediction cache.
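The lookup and update steps of embodiment (1) can be illustrated with a small software model. This is a sketch under stated assumptions, not the disclosed hardware: the class `TwoLevelPredictionStore`, the use of dictionaries for the two caches, the `derive` callback, and the choice to copy a high-level hit into the low level are all hypothetical.

```python
from typing import Callable, Dict, Optional


class TwoLevelPredictionStore:
    """Toy model: dicts stand in for the low-level and high-level prediction caches."""

    def __init__(self) -> None:
        self.low: Dict[int, int] = {}   # small, fast level (assumed unbounded here)
        self.high: Dict[int, int] = {}  # larger backing level

    def fetch(self, address: int) -> Optional[int]:
        """Try the low level first; on a miss, try the high level."""
        value = self.low.get(address)
        if value is None:
            value = self.high.get(address)
            if value is not None:
                # Assumption: a high-level hit is copied into the low level.
                self.low[address] = value
        return value

    def update(self, address: int, derive: Callable[[int], int]) -> bool:
        """Derive a new value from the low-level copy and write it to both levels.

        Returns False when the low level holds nothing for this address.
        """
        if address not in self.low:
            return False
        new_value = derive(self.low[address])
        self.low[address] = new_value
        self.high[address] = new_value
        return True
```

A real design would bound both levels and choose when to propagate updates to the high level; the model writes both levels on every update only to keep the example short.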

[0099] (2) The method of (1), further comprising integrating the low-level prediction cache (202) with a low-level instruction cache (204) and using a common cache management structure (300), so that instructions and the prediction information corresponding to those instructions, if any, can be maintained in the integrated low-level cache (202).

[0100] (3) A method of managing prediction information associated with branch instructions fetched by a processor, comprising, each time a branch instruction is fetched from a low-level instruction cache (204): retrieving, from one or more caches (202), prediction information that is associated with the branch instruction and includes information predicting the current fetch outcome of the branch instruction; substantially in parallel, (a) if the information predicting the current fetch outcome of the branch instruction predicts that the branch instruction will be taken, executing the branch instruction and updating the instruction pointer in accordance with that information (216), and (b) determining (210), based on the branch instruction and its retrieved prediction information, information predicting the next fetch outcome of the branch instruction; and storing, in the one or more caches, updated prediction information that is associated with the branch instruction and includes the information predicting the next fetch outcome of the branch instruction.

[0101] (4) The method of (3), wherein the information predicting the current and next fetch outcomes of the branch instruction comprises: trigger information indicating a prediction of whether the branch instruction will be taken or not taken; and target information indicating the branch destination of the branch instruction when the trigger information predicts that the branch instruction will be taken.

[0102] (5) The method of (4), wherein the prediction information associated with the branch instruction further comprises branch history information, the method further comprising updating the branch history information in accordance with the trigger information that forms part of the information predicting the current fetch outcome of the branch instruction, substantially in parallel with the step (210) of determining the information predicting the next fetch outcome of the branch instruction.
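As a rough illustration of updating branch history from a trigger bit, the sketch below uses a plain shift-register history. This is a common textbook representation swapped in for the example; it is not the encoded history format of FIG. 4, and the function name `update_history` and the default length of 4 bits are assumptions.

```python
def update_history(history: int, taken: bool, length: int = 4) -> int:
    """Shift the newest trigger bit into a fixed-length history register.

    history: the previous outcomes packed into an integer, newest bit lowest.
    taken:   the trigger bit for the current fetch (True means predicted taken).
    length:  number of history bits kept (an assumed, illustrative value).
    """
    mask = (1 << length) - 1
    return ((history << 1) | int(taken)) & mask
```

Because the new history depends only on the old history and the current trigger bit, it can be computed at the same time as the next-fetch prediction rather than after it.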

[0103] (6) The method of (3), wherein the processor fetches instructions in the form of bundles (100), and the prediction information retrieved from and stored in the one or more caches (202) comprises, for each bundle, (a) trigger prediction information indicating a prediction of whether each branch instruction in the bundle will be taken or not taken, and (b) target information indicating, when the trigger information predicts that one or more branch instructions in the bundle will be taken, the branch destination of the first branch instruction predicted to be taken; the method comprising: when retrieving the prediction information associated with the branch instruction, retrieving the prediction information associated with each branch instruction in the same bundle; when determining the information predicting the next fetch outcome of the branch instruction based on the branch instruction and its retrieved prediction information, determining (210), based on the fetched branch instructions in the same bundle and their retrieved prediction information, information predicting the next fetch outcome of each branch instruction in the same bundle; and when storing the updated prediction information associated with the branch instruction in the one or more caches, storing updated prediction information associated with each branch instruction in the same bundle.
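The per-bundle layout in embodiment (6), one trigger bit per branch plus a single target for the first branch predicted taken, can be sketched as follows. The function name `bundle_prediction` and the tuple-based input format are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def bundle_prediction(
    branches: List[Tuple[bool, int]],
) -> Tuple[List[bool], Optional[int]]:
    """Summarise the predictions for one bundle.

    branches: (predicted_taken, target_address) for each branch, in program order.
    Returns the per-branch trigger bits and the target of the first branch
    predicted taken, or None when no branch in the bundle is predicted taken.
    """
    triggers = [taken for taken, _ in branches]
    target = next((tgt for taken, tgt in branches if taken), None)
    return triggers, target
```

Only one target is needed per bundle because execution leaves the bundle at the first taken branch, so later branches in the same bundle cannot redirect that fetch.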

[0104] (7) The method of (3), wherein the processor fetches instructions in the form of bundle groups, and the prediction information retrieved from and stored in the one or more caches (202) comprises, for each bundle group, (a) trigger prediction information indicating a prediction of whether each branch instruction in the bundle group will be taken or not taken, and (b) target information indicating, when the trigger information predicts that one or more branch instructions in the bundle group will be taken, the branch destination of the first branch instruction predicted to branch; the method comprising: when retrieving the prediction information associated with the branch instruction, retrieving the prediction information associated with each branch instruction in the same bundle group; when determining the information predicting the next fetch outcome of the branch instruction based on the branch instruction and its retrieved prediction information, determining (210), based on the fetched branch instructions in the same bundle group and their retrieved prediction information, information predicting the next fetch outcome of each branch instruction in the same bundle group; and when storing the updated prediction information associated with the branch instruction in the one or more caches, storing updated prediction information associated with each branch instruction in the same bundle group.

[0105] (8) The method of (3), wherein the one or more caches include a prediction cache (202) that is integrated with the low-level instruction cache (204) to form an integrated low-level cache (200), the method further comprising storing the branch instruction and the information predicting its current fetch outcome in the integrated low-level cache, so that each time a branch instruction is fetched from the integrated low-level cache, the information predicting the current fetch outcome of that branch instruction is also fetched from the integrated low-level cache.

[0106] (9) An apparatus for predicting the outcome of branch instructions fetched by a processor, comprising: a low-level instruction cache (204); one or more branch prediction caches (202) that store branch prediction information, including information predicting the current fetch outcome of any branch instruction stored in the low-level instruction cache; logic means (216) for fetching corresponding branch instructions and branch prediction information from the low-level instruction cache and the one or more branch prediction caches, respectively; logic means (210) for determining, based on the fetched branch prediction information, information predicting the next fetch outcome of a fetched branch instruction; and logic means (226, 228) for storing updated prediction information in the one or more branch prediction caches.

[0107] (10) The apparatus of (9), wherein the low-level instruction cache (204) and the one or more branch prediction caches (202) form a single integrated low-level cache (202).

[Brief description of the drawings]

FIG. 1 is a block diagram showing an IA-64 instruction bundle.

FIG. 2 is a block diagram showing various components of a processor, including a multi-level prediction information memory hierarchy and an integrated low-level instruction/branch prediction cache.

FIG. 3 is a block diagram of a preferred embodiment of the integrated low-level instruction/branch prediction cache of the memory hierarchy of FIG. 2.

FIG. 4 is a block diagram showing the storage of branch history and trigger prediction information in encoded form.

FIG. 5 is a block diagram of an exemplary circuit that determines the initial state of the "relevance information", which indicates whether related information is to be stored in the high-level prediction cache of the memory hierarchy of FIG. 2.

FIG. 6 is a block diagram of an exemplary circuit that determines whether the high-level prediction cache of the memory hierarchy of FIG. 2 and the relevance information referred to in FIG. 5 need to be updated.

[Explanation of symbols]

200 Integrated low-level cache
202 Low-level prediction cache
204 Low-level instruction cache
208 High-level prediction cache
210 Speculative new branch prediction structure
216 Instruction pointer generator
300 Common management structure

Continued from front page: (72) Inventor Stephen R. Andy, 5916 Linden View Court, Fort Collins, Colorado 80524, USA

Claims

[Claims]

1. A method of managing prediction information used by a processor, the method comprising: storing prediction information in a multi-level memory hierarchy that includes at least one low-level prediction cache and one high-level prediction cache; when an attempt is made to fetch a prediction information value from the low-level prediction cache and the value is not stored there, attempting to fetch the value from the high-level prediction cache; and periodically updating the prediction information stored in the low-level prediction cache and the high-level prediction cache, using the prediction information stored in the low-level prediction cache.