JPH09330220A

JPH09330220A - Method and system for minimizing branch penalty of processor

Info

Publication number: JPH09330220A
Application number: JP9063094A
Authority: JP
Inventors: Robert T. Gora; Christopher H. Olson; Terence M. Potter
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1996-03-25
Filing date: 1997-03-17
Publication date: 1997-12-22

Abstract

PROBLEM TO BE SOLVED: To minimize the delay in executing an indirect branch instruction by allowing the branch unit to predict the target address of the indirect branch instruction when the contents of the branch register are not yet available. SOLUTION: A branch target address cache (BTAC) 54 allows the branch unit 28 to predict the target address before it is supplied by a move-to-count-register (mtctr) instruction, thereby reducing the calculated branch penalty of indirect branches such as the branch-to-count (brctr) instruction. The branch unit 28 sends the current branch address in the branch address register 50 to the BTAC 54 when the target address depends on an outstanding mtctr instruction. When the brctr address matches one of the addresses stored in the BTAC 54 and that entry is valid, a BTAC hit occurs, and the target address of the entry is read and sent to the instruction cache.

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION The present invention relates to methods and systems for improving processor performance, and more particularly to methods and systems for minimizing branch penalties associated with register-dependent branch instructions.

[0002]

PRIOR ART The instruction sets of most personal computer (PC) architectures include branch instructions. A branch instruction breaks the sequential path of program execution, and execution resumes at a new location in memory. This new location is called the target address of the branch.

[0003] Three types of branch instructions, relative, absolute, and indirect, are used to specify the target address. The target address of a relative branch instruction is the address of the branch instruction plus an offset. The target address of an absolute branch instruction is the immediate address contained in the branch instruction. Indirect branch instructions fall into two kinds of register-dependent branches: branch-to-count and branch-to-link. The target address of a branch-to-count is the value stored in the count register, and the target address of a branch-to-link is the value stored in the link register.
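The three ways of specifying a target address described above can be sketched as follows. This is only an illustrative model: the names `BranchKind`, `Registers`, and `target_address` are hypothetical and do not come from the patent, and the single `field` argument stands in for either the offset or the immediate address carried by the instruction.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BranchKind(Enum):
    RELATIVE = auto()   # target = branch address + offset
    ABSOLUTE = auto()   # target = immediate address in the instruction
    TO_COUNT = auto()   # target = value held in the count register
    TO_LINK = auto()    # target = value held in the link register


@dataclass
class Registers:
    ctr: int = 0  # count register (hypothetical field name)
    lr: int = 0   # link register (hypothetical field name)


def target_address(kind: BranchKind, branch_addr: int, field: int, regs: Registers) -> int:
    """Return the branch target for each kind of branch instruction."""
    if kind is BranchKind.RELATIVE:
        return branch_addr + field
    if kind is BranchKind.ABSOLUTE:
        return field
    if kind is BranchKind.TO_COUNT:
        return regs.ctr   # register-dependent: needs the count register value
    if kind is BranchKind.TO_LINK:
        return regs.lr    # register-dependent: needs the link register value
    raise ValueError(f"unknown branch kind: {kind}")
```

Only the two register-dependent kinds need a value that may not have been produced yet, which is the case the rest of the description is concerned with.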

[0004] One way the count register and the link register receive values is through execution of the move-to-count and move-to-link instructions. These instructions move a value, called the operand, from a general purpose register (GPR) into the count register or the link register. To branch to an address stored in a GPR, a move-to-count (mtctr) instruction is normally executed, followed by a branch-to-count (brctr) instruction. In an mtctr/brctr instruction sequence, the brctr instruction cannot be executed until the value of the count register resulting from the mtctr instruction is known. Such instructions are called register-dependent instructions because they cannot be executed until a particular register obtains the target address of the instruction.

[0005] A read-after-write (RAW) hazard exists between the mtctr and brctr instructions. A RAW hazard exists when a register is written by one instruction and then read by a later instruction. The processor must supply the data from the first instruction to the second instruction regardless of the order in which the two instructions are actually executed.

[0006] The dependency within the mtctr/brctr instruction sequence is a problem for pipelined architectures, which try to keep hardware components as busy as possible by overlapping operations. To overlap operations, a typical pipelined architecture includes three overlapping operations that occur during the same processor cycle: a fetch cycle that fetches instructions from memory, a dispatch cycle that sends instructions to the execution units, and an execute cycle in which the execution units execute the instructions. The execution unit responsible for executing branch instructions is called the branch unit.

[0007] When the branch unit cannot execute a branch because the target address is not available, the fetch cycle must be stalled until the target address is known. This causes the dispatch cycle to stall in the next cycle and the execute cycle to stall in the cycle after that. Extra cycles are therefore needed to execute the mtctr instruction before the brctr instruction can be executed, which reduces overall system performance.

[0008] The time between the execution of the mtctr and the time the branch unit encounters the brctr instruction is defined as the calculated branch penalty. By scheduling the mtctr instruction well ahead of the brctr, a compiler can reduce the calculated branch penalty, because the mtctr is sent to an execution unit before the branch unit encounters the brctr. Unfortunately, the compiler often cannot find instructions to schedule between the mtctr and the brctr, for example because of branch dependencies.
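As a rough illustration of why scheduling distance matters, the calculated branch penalty can be modeled as the number of cycles the branch unit waits for the count register. This is a simplified reading of the definition above, not a formula given in the patent; the function name `stall_cycles` and the cycle numbers are hypothetical.

```python
def stall_cycles(ctr_ready_cycle: int, brctr_seen_cycle: int) -> int:
    """Cycles the branch unit waits for the count register value.

    ctr_ready_cycle:  cycle at which the mtctr result is available (assumed input)
    brctr_seen_cycle: cycle at which the branch unit encounters the brctr
    """
    # No penalty if the mtctr finished before the brctr was encountered.
    return max(0, ctr_ready_cycle - brctr_seen_cycle)


# mtctr immediately before brctr: the branch unit sees brctr early and waits.
back_to_back = stall_cycles(ctr_ready_cycle=10, brctr_seen_cycle=6)
# mtctr hoisted well ahead of brctr: the value is ready by the time brctr is seen.
hoisted = stall_cycles(ctr_ready_cycle=10, brctr_seen_cycle=12)
```

Under this model the penalty only disappears when the compiler can place enough independent instructions between the two, which is exactly what the paragraph above says is often not possible.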

[0009]

PROBLEM TO BE SOLVED BY THE INVENTION Accordingly, there is a need for a system and method for minimizing the delay in executing indirect branch instructions. The present invention addresses this need.

[0010]

SUMMARY OF THE INVENTION The present invention provides a system and method for minimizing a branch penalty in a processor, where the branch penalty is associated with a first indirect branch instruction. The system and method include a first table for storing at least one entry containing the address of a previous indirect branch instruction and a previous target address. A branch unit is coupled to the first table for processing the first indirect branch instruction. The first indirect branch instruction has a specific address and causes program execution to start at a new target address. The first indirect branch instruction depends on a preceding instruction to supply the new target address. The branch unit processes the first indirect branch instruction by comparing its address with the address of the previous indirect branch instruction stored in the first table and, if they match, uses the previous target address as the new target address, thereby predicting the new target address before it is supplied by the preceding instruction.

[0011] In accordance with the system and method disclosed herein, the present invention allows the branch unit to predict the target address of an indirect branch instruction when the contents of the branch register are not yet available. A simulation of a branch table with eight entries showed a performance improvement of more than 50% over conventional methods.

[0012]

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to an improvement in the processing of branch instructions. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art, and the general principles herein may be applied to other embodiments. Thus, the present invention is not limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features disclosed herein.

[0013] FIG. 1 shows a processor 10 in which the present invention resides. The processor 10 includes an instruction cache (IC) 12, an instruction buffer (IB) 14, an instruction address queue (IAQ) 16, a second instruction address queue (CBIAQ) 30, a dispatch unit (DU) 18, functional units (FU) 20, 22, and 24, a completion buffer (CB) 26, and a branch unit (BU) 28.

[0014] The processor 10 functions as follows. The instruction buffer 14 supplies a fetch address to the instruction cache 12 via the address line 32, and the instruction pointed to by the fetch address is transferred from the instruction cache 12 via the data line 34 and placed in the instruction buffer 14. At the same time, the instruction cache 12 generates the address of that instruction and sends it to the instruction address queue 16 via the address line 36. The instruction address queue 16 is generally needed to generate the branch target addresses of relative branches.

[0015] In each cycle, the dispatch unit 18 evaluates instructions from the instruction buffer 14 and dispatches non-branch instructions to functional unit 20, 22, or 24, where they are queued for execution. At the same time, the branch unit 28 continuously scans the instruction buffer 14 for branch instructions to process. The type of each instruction dispatched by the dispatch unit 18 is sent to the completion buffer 26 via the instruction bus 38.

[0016] The completion buffer 26 stores dispatched instructions that are outstanding within the processor 10 and maintains the architectural state of the processor 10. When an instruction is dispatched by the dispatch unit 18, the address associated with that instruction is loaded into a second instruction address queue 30, called the CBIAQ 30, which is assigned to the completion buffer 26. The completion buffer 26 uses the CBIAQ 30 to save the correct address of an instruction that has faulted.

[0017] The completion buffer 26 also controls the architected values of the count register 42 and the link register 44. For example, when an instruction such as a branch reaches the bottom of the completion buffer 26, that instruction has been fully executed. A fully executed instruction is committed by the completion buffer 26, at which point the values of the architected registers are updated. In this example, the completion buffer 26 updates the architected count register by decrementing the value of the count register 42 by one.

[0018] The branch unit 28 processes branch instructions and supplies the target address to the instruction cache 12 via the fetch address line 40 so that the instruction cache 12 can process the new instruction stream in the next cycle.

[0019] Recall that some indirect branch instructions use the contents of a register, such as the count register 42 or the link register 44, to generate the target address. For a brctr instruction, for example, the target address must first be calculated by one of the functional units 20, 22, and 24 and then moved into the count register 42. The target address is then read from the count register 42 and sent to the instruction cache 12. For purposes of discussion, the brctr instruction is used as the primary example.

[0020] As mentioned above, the brctr instruction is register-dependent, meaning that it cannot be executed until the value of the count register 42 resulting from the mtctr instruction is known. A processor with a deep pipeline design may incur a delay of five or more cycles between the start of the brctr instruction and the time the target address becomes available.

[0021] According to the present invention, the target address of an indirect branch instruction is predicted before the target address is known, by using a branch target address cache that minimizes the delay in executing the indirect branch instruction. To illustrate more specifically the method and system for processing indirect branches according to the present invention, refer now to FIG. 2, which shows one embodiment of such a system.

[0022] FIG. 2 is a block diagram showing a branch unit architecture for minimizing the calculated branch penalty of indirect branch instructions. The branch unit 28 includes a branch address register 50, a branch adder 52, and a branch target address cache (BTAC) 54.

[0023] As mentioned above, the branch unit 28 receives a branch instruction and generates a target address, which is sent to the instruction cache 12 as the next fetch address 56. When a branch instruction is received from the dispatch unit 18, the address of the branch instruction is placed in the branch address register 50.

[0024] If the branch instruction is a relative or absolute branch instruction, the branch unit 28 uses the branch adder 52 to generate the target address. For a relative branch instruction, the target address is generated by adding the branch address and the branch offset 58. For an absolute branch, the target address is specified in the instruction as immediate data and is passed through the branch adder 52 using the branch offset 58 input.

[0025] Indirect branch instructions are processed differently from relative and absolute branch instructions. According to the present invention, the BTAC 54 reduces the calculated branch penalty of indirect branches such as brctr by allowing the branch unit 28 to predict the target address before it is supplied by the mtctr. Each entry of the BTAC 54 contains the address of a brctr instruction, the target address last generated by the corresponding brctr, and a valid bit. Note that, unlike conventional implementations, only indirect branches are placed in the BTAC. Also, this BTAC predicts only the target address of a branch instruction, not its direction. It would be straightforward to extend the invention to store multiple target addresses for a single brctr (possibly using multiple entries) and to use a mechanism for selecting one of the possible target addresses based on, for example, the value of some known register or the path leading to this particular occurrence of the branch.

[0026] The branch unit 28 sends the current branch address in the branch address register 50 to the BTAC 54 when the target address depends on an outstanding mtctr instruction. The address of the indirect branch instruction is compared with the addresses of the entries in the BTAC 54. If the brctr address matches one of the addresses stored in the BTAC 54 and that entry is valid, a BTAC hit occurs, and the target address of the entry is read from the BTAC 54 and sent to the instruction cache 12 (FIG. 1).
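The hit condition described above (address match and valid bit set) can be sketched as a small software model. The entry layout follows the three fields the description gives for a BTAC entry; the names `BtacEntry` and `btac_lookup` are hypothetical, and the linear scan merely stands in for the parallel comparison a fully associative hardware table would perform.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BtacEntry:
    branch_addr: int  # address of a previously seen brctr instruction
    target_addr: int  # target address that brctr last produced
    valid: bool       # valid bit


def btac_lookup(entries: Sequence[BtacEntry], branch_addr: int) -> Optional[int]:
    """Return the predicted target address on a BTAC hit, or None on a miss."""
    for entry in entries:
        # A hit requires both an address match and a set valid bit.
        if entry.valid and entry.branch_addr == branch_addr:
            return entry.target_addr
    return None


table = [
    BtacEntry(branch_addr=0x100, target_addr=0x2000, valid=True),
    BtacEntry(branch_addr=0x200, target_addr=0x3000, valid=False),
]
hit = btac_lookup(table, 0x100)        # address matches and entry is valid
stale = btac_lookup(table, 0x200)      # address matches but entry is invalid
unknown = btac_lookup(table, 0x300)    # address not present
```

Returning `None` for both the invalid and the absent case mirrors the description, which treats either situation as a miss.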

[0027] The branch unit 28 speculatively fetches the new instruction stream when the brctr is actually taken or is predicted to be taken. When the target address becomes available in the count register 42, the branch unit 28 flushes or completes the speculative stream, depending on whether the brctr target address was mispredicted.
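The flush-or-complete decision can be illustrated with a minimal check against the value that eventually arrives in the count register. This is a sketch only: the function name `resolve_speculation` and the string labels are hypothetical, and a real branch unit would act on pipeline state rather than return a tuple.

```python
from typing import Tuple


def resolve_speculation(predicted_target: int, actual_target: int) -> Tuple[str, int]:
    """Decide what to do with the speculative stream once the count register is known.

    Returns an action label and the address from which fetching should proceed.
    """
    if predicted_target == actual_target:
        # Prediction was correct: keep the speculatively fetched instructions.
        return ("complete", predicted_target)
    # Misprediction: discard the speculative stream and refetch from the real target.
    return ("flush", actual_target)
```

For example, `resolve_speculation(0x2000, 0x2000)` yields the "complete" action, while a differing actual target yields "flush" together with the correct address.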

[0028] If the address of the brctr is not in the BTAC 54, or the entry is not valid, a miss occurs and the branch unit 28 incurs the normal calculated branch penalty: it must wait until execution of the mtctr completes before fetching the target of the brctr.

[0029] When a miss occurs in the BTAC 54, the branch unit 28 writes to the BTAC 54 a new entry containing the address of the brctr, the fetched target address, and a set valid bit. In the preferred embodiment of the invention, entries in the BTAC 54 are replaced in round-robin fashion.
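Round-robin replacement on a miss can be sketched with a single write pointer that advances and wraps around the table. The class name `RoundRobinBtac`, the method `record_miss`, and the tuple layout are hypothetical; the default size of eight matches the simulated table mentioned in the description, but any size works.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, bool]  # (branch_addr, target_addr, valid)


class RoundRobinBtac:
    """Toy model of the miss path: write a new entry, replacing slots in turn."""

    def __init__(self, size: int = 8) -> None:
        self.entries: List[Optional[Entry]] = [None] * size
        self.next_slot = 0  # slot that the next miss will overwrite

    def record_miss(self, branch_addr: int, resolved_target: int) -> int:
        """Store the resolved target with the valid bit set; return the slot used."""
        slot = self.next_slot
        self.entries[slot] = (branch_addr, resolved_target, True)
        # Advance the pointer, wrapping to slot 0 after the last slot.
        self.next_slot = (slot + 1) % len(self.entries)
        return slot
```

With a two-entry table, three consecutive misses use slots 0, 1, and 0 again, so the third miss overwrites the oldest entry.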

[0030] The present invention is an improvement over conventional methods that use a BTAC to predict the target instruction addresses of relative and absolute branch instructions but not of indirect branch instructions. Previous attempts to predict the target instruction address of indirect branch instructions have used a combined BTAC containing entries for both relative and indirect branch instructions. Because predictions for relative branch instructions are more accurate than predictions for indirect branch instructions, the inaccurate predictions for indirect instructions degraded the overall performance of the combined BTAC. As a result, combined BTACs are rarely used, and most conventional methods predict only the target instruction addresses of relative instructions.

[0031] According to the present invention, however, using a separate BTAC 54 dedicated to indirect branch instructions allows the branch unit to predict the target address of an indirect branch instruction. Simulation of a fully associative BTAC 54 with eight entries showed a performance improvement of more than 50% for applications, such as emulators, that make extensive use of computed branch addressing.

[0032] A method and system for minimizing the calculated branch penalty associated with indirect branch instructions has been disclosed. Although the present invention has been described in accordance with the embodiment shown, one of ordinary skill in the art will readily recognize that variations of the embodiment are possible and that those variations fall within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the claims.

[0033] In summary, the following items are disclosed regarding the configuration of the present invention.

[0034] (1) A system for minimizing a branch penalty in a processor, comprising: a first table for storing at least one entry containing the address of a previous indirect branch instruction and a previous target address; and a branch unit coupled to the first table for processing a first indirect branch instruction, wherein the first indirect branch instruction has a specific address and causes program execution to start at a new target address, the first indirect branch instruction depends on a preceding instruction to supply the new target address, the branch unit processes the first indirect branch instruction by comparing the address of the first indirect branch instruction with the address of the previous indirect branch instruction stored in the first table, and, if the addresses match, the previous target address is used as the new target address, whereby the new target address is predicted before it is supplied by the preceding instruction, and the branch penalty is associated with the first indirect branch instruction.

(2) The system of (1) above, wherein the first table is a branch target address table and the branch target address table contains a plurality of entries.

(3) The system of (2) above, wherein each of the plurality of entries of the branch target address table further includes a valid bit.

(4) The system of (3) above, wherein the first indirect branch instruction is a branch-to-count instruction.

(5) The system of (4) above, wherein the preceding instruction is a move-to-count instruction.

(6) The system of (3) above, wherein the first indirect branch instruction is a branch-to-link instruction.

(7) The system of (6) above, wherein the preceding instruction is a move-to-link instruction.

(8) A method for minimizing a branch penalty in a processor, comprising the steps of: (a) storing in a first table at least one entry corresponding to a previous indirect branch instruction, the entry containing the address of the previous indirect branch instruction and a previous target address of the previous indirect branch instruction; (b) comparing the address of a first indirect branch instruction with the address of the previous indirect branch instruction stored in the first table; and (c) if the address of the first indirect branch instruction matches the address of the previous indirect branch instruction, using the previous target address as a new target address, thereby predicting the new target address before it is supplied by a preceding instruction, wherein the branch penalty is associated with the first indirect branch instruction, the first indirect branch instruction includes a specific address and causes program execution to start at the new target address, and the first indirect branch instruction depends on the preceding instruction to supply the new target address.

(9) The method of (8) above, wherein step (a) further comprises: (a1) storing a plurality of entries in the first table; and (a2) providing a valid bit in each of the plurality of entries of the first table.

(10) The method of (9) above, wherein step (c) further comprises: (c1) determining that a match has been found when the address of the first indirect branch instruction matches the address of a first entry of the plurality of entries and the valid bit corresponding to the first entry indicates that the first entry is valid; and (c2) when it is determined that a match has been found, sending the target address of the first entry from the first table to an instruction cache.

(11) The method of (10) above, wherein, if no match is found, step (c) further comprises: (c3) fetching the new target address after it has been supplied by the preceding instruction; and (c4) writing to the first table a new entry containing the address of the first indirect branch instruction, the fetched target address, and a set valid bit.

(12) A system for minimizing a branch penalty in a processor, comprising: a dispatch unit for dispatching instructions; and a branch unit, coupled to the dispatch unit, for executing branch instructions, wherein the branch penalty is associated with an indirect branch instruction, a first indirect branch instruction has a specific address and causes program execution to start at a new target address, the first indirect branch instruction depends on a preceding instruction to supply the new target address, and the branch unit includes: a branch address register; a branch adder coupled to the branch address register for processing relative branch instructions; and a branch target address table coupled to the branch address register and containing a plurality of entries, each of which includes the address of a previous indirect branch instruction and a corresponding previous target address, wherein, upon receiving the first indirect branch instruction in the branch address register, the branch unit searches the branch target address table for the address of the first indirect branch instruction and, if that address matches the address of one of the previous indirect branch instructions, uses the corresponding previous target address as the new target address, thereby predicting the new target address before it is supplied by the preceding instruction.

(13) The system of (12) above, wherein the first indirect branch instruction is a register-dependent branch instruction.

(14) The system of (13) above, wherein the register-dependent branch instruction is a branch-to-count instruction.

(15) The system of (14) above, wherein the preceding instruction is a move-to-count instruction.

(16) The system of (13) above, wherein the register-dependent branch instruction is a branch-to-link instruction.

(17) The system of (16) above, wherein the preceding instruction is a move-to-link instruction.

[Brief description of drawings]

FIG. 1 is a block diagram of a processor in which the present invention resides.

FIG. 2 is a block diagram showing a branch unit architecture for minimizing the calculated branch penalty of indirect branch instructions.

[Explanation of symbols]

10 Processor
12 Instruction cache (IC)
14 Instruction buffer (IB)
16 Instruction address queue (IAQ)
18 Dispatch unit (DU)
20 Functional unit (FU)
22 Functional unit (FU)
24 Functional unit (FU)
26 Completion buffer (CB)
28 Branch unit (BU)
30 Instruction address queue (CBIAQ)
42 Count register
44 Link register

Continuation of front page: (72) Inventor Christopher H. Olson, 3649 Ranch Creek Drive, Austin, Texas 78730, United States. (72) Inventor Terence M. Potter, 6107 Twin Ledge Cove, Austin, Texas 78731, United States.

Claims

[Claims]

1. A system for minimizing a branch penalty in a processor, the branch penalty being associated with a first indirect branch instruction, the system comprising: a first table for storing at least one entry containing an address of a previous indirect branch instruction and a previous target address; and a branch unit coupled to the first table for processing the first indirect branch instruction, wherein the first indirect branch instruction has a specific address and causes program execution to begin at a new target address, the first indirect branch instruction depends on a preceding instruction to supply the new target address, and the branch unit processes the first indirect branch instruction by comparing the address of the first indirect branch instruction with the address of the previous indirect branch instruction stored in the first table and, if the addresses match, using the previous target address as the new target address, whereby the new target address is predicted for the first indirect branch instruction before the preceding instruction supplies the new target address.

2. The system of claim 1, wherein the first table is a branch target address table, and the branch target address table includes a plurality of entries.

3. The system of claim 2, wherein each of the plurality of entries in the branch target address table further includes a valid bit.

4. The system of claim 3, wherein the first indirect branch instruction is a branch-to-count instruction.

5. The system of claim 4, wherein the preceding instruction is a move-to-count instruction.

6. The system of claim 3, wherein the first indirect branch instruction is a branch-to-link instruction.

7. The system of claim 6, wherein the preceding instruction is a move-to-link instruction.
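
As an illustration only, the table recited in claims 1 to 3 can be modeled in software as follows. This is a minimal sketch, not the claimed hardware: the names `BtacEntry`, `BranchTargetTable`, and `predict` are hypothetical, and the four-entry capacity and linear search are assumptions, since the claims do not state a table size or organization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BtacEntry:
    branch_address: int   # address of a previous indirect branch instruction
    target_address: int   # target address that branch used previously
    valid: bool = False   # valid bit (claim 3)


class BranchTargetTable:
    """Hypothetical software model of the branch target address table."""

    def __init__(self, num_entries: int = 4) -> None:
        # Assumed capacity; every entry starts out invalid.
        self.entries: List[BtacEntry] = [BtacEntry(0, 0) for _ in range(num_entries)]

    def predict(self, branch_address: int) -> Optional[int]:
        """Return the previous target on a hit, or None on a miss."""
        for entry in self.entries:
            # A hit requires both an address match and a set valid bit.
            if entry.valid and entry.branch_address == branch_address:
                return entry.target_address
        return None


table = BranchTargetTable()
table.entries[0] = BtacEntry(branch_address=0x1000, target_address=0x2000, valid=True)
print(table.predict(0x1000))  # hit: the stored previous target
print(table.predict(0x1004))  # miss: None
```

In this model an invalid entry never produces a hit, even when its stored address happens to equal the branch address being looked up.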

8. A method for minimizing a branch penalty in a processor, the branch penalty being associated with a first indirect branch instruction, wherein the first indirect branch instruction has a specific address and causes program execution to begin at a new target address, and the first indirect branch instruction depends on a preceding instruction to supply the new target address, the method comprising the steps of: (a) storing in a first table at least one entry corresponding to a previous indirect branch instruction, the entry including an address of the previous indirect branch instruction and a previous target address of the previous indirect branch instruction; (b) comparing the address of the previous indirect branch instruction stored in the first table with the address of the first indirect branch instruction; and (c) using the previous target address as the new target address if the address of the first indirect branch instruction matches the address of the previous indirect branch instruction, thereby predicting the new target address before it is supplied by the preceding instruction.

9. The method of claim 8, wherein step (a) further comprises: (a1) storing a plurality of entries in the first table; and (a2) providing a valid bit for each of the plurality of entries in the first table.

10. The method of claim 9, wherein step (c) further comprises: (c1) determining that a match has been found when the address of the first indirect branch instruction matches the address in a first entry of the plurality of entries and the valid bit corresponding to the first entry indicates that the first entry is valid; and (c2) when it is determined that a match has been found, sending the target address of the first entry from the first table to an instruction cache.

11. The method of claim 10, wherein, if no match is found, step (c) further comprises: (c3) retrieving the new target address after the new target address has been supplied by the preceding instruction; and (c4) storing in the first table the address of the first indirect branch instruction, the retrieved new target address, and a set valid bit.
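
The hit and miss paths of steps (b) through (c4) can be sketched as below. This is a hedged software model under stated assumptions: `resolve_indirect_branch`, the dictionary-based `Table`, and the `wait_for_register` callback (standing in for the preceding instruction supplying the target) are all hypothetical names, and the sketch does not model checking a predicted target against the value the preceding instruction eventually supplies.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model: branch address -> (previous target address, valid bit).
Table = Dict[int, Tuple[int, bool]]


def resolve_indirect_branch(
    table: Table,
    branch_address: int,
    wait_for_register: Callable[[], int],
) -> Tuple[int, bool]:
    """Return (target address, True if the target was predicted)."""
    entry = table.get(branch_address)        # step (b): compare addresses
    if entry is not None and entry[1]:       # step (c1): match and valid bit set
        return entry[0], True                # step (c2): target goes to instruction fetch
    target = wait_for_register()             # step (c3): wait for the preceding instruction
    table[branch_address] = (target, True)   # step (c4): store address, target, valid bit
    return target, False


table: Table = {}
# First encounter misses and fills the table; the second encounter hits.
first = resolve_indirect_branch(table, 0x1000, lambda: 0x2000)
second = resolve_indirect_branch(table, 0x1000, lambda: 0x2000)
print(first, second)
```

On the second call the callback is never invoked, which is the point of the prediction: the target is available without waiting for the preceding instruction.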

12. A system for minimizing a branch penalty in a processor, comprising a dispatch unit for dispatching instructions and a branch unit coupled to the dispatch unit for executing branch instructions, wherein the branch penalty is associated with a first indirect branch instruction, the first indirect branch instruction has a specific address and causes program execution to begin at a new target address, and the first indirect branch instruction depends on a preceding instruction to supply the new target address, the branch unit including: a branch address register; a branch adder coupled to the branch address register for processing relative branch instructions; and a branch target address table coupled to the branch address register and containing a plurality of entries, each of the plurality of entries including an address of a previous indirect branch instruction and a corresponding previous target address, wherein, when the branch unit receives the first indirect branch instruction in the branch address register, the branch unit searches the branch target address table for the address of the first indirect branch instruction and, if the address of the first indirect branch instruction matches the address of one of the previous indirect branch instructions, uses the corresponding previous target address as the new target address, whereby the new target address is predicted before it is supplied by the preceding instruction.
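
The two paths through the branch unit of claim 12 can be contrasted in a short sketch. It is an illustrative model only: the class and method names are hypothetical, and the branch adder is assumed to add a signed displacement to the branch address, which the claim itself does not spell out.

```python
from typing import Dict, Optional


class BranchUnit:
    """Hypothetical software model of the branch unit of claim 12."""

    def __init__(self) -> None:
        self.branch_address_register = 0
        # Branch target address table: branch address -> previous target address.
        self.target_table: Dict[int, int] = {}

    def relative_target(self, branch_address: int, displacement: int) -> int:
        # Relative branch: the branch adder (assumed behavior) adds the
        # displacement to the address held in the branch address register.
        self.branch_address_register = branch_address
        return self.branch_address_register + displacement

    def indirect_target(self, branch_address: int) -> Optional[int]:
        # Indirect branch: the register contents index the table instead,
        # yielding the previous target on a hit or None on a miss.
        self.branch_address_register = branch_address
        return self.target_table.get(self.branch_address_register)


unit = BranchUnit()
unit.target_table[0x3000] = 0x4000
relative = unit.relative_target(0x1000, -16)
indirect_hit = unit.indirect_target(0x3000)
indirect_miss = unit.indirect_target(0x3004)
print(relative, indirect_hit, indirect_miss)
```

A relative target can always be computed from the instruction itself, whereas an indirect target is available early only when the table already holds an entry for that branch address.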

13. The system of claim 12, wherein the first indirect branch instruction is a register-dependent branch instruction.

14. The system of claim 13, wherein the register-dependent branch instruction is a branch-to-count instruction.

15. The system of claim 14, wherein the preceding instruction is a move-to-count instruction.

16. The system of claim 13, wherein the register-dependent branch instruction is a branch-to-link instruction.

17. The system of claim 16, wherein the preceding instruction is a move-to-link instruction.