JP3800533B2

JP3800533B2 - Program counter control method and processor

Info

Publication number: JP3800533B2
Application number: JP2002190557A
Authority: JP
Inventors: 竜一砂山; 國樹森田; 愛一郎井上
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-06-28
Filing date: 2002-06-28
Publication date: 2006-07-26
Anticipated expiration: 2022-06-28
Also published as: US7765387B2; JP2004038256A; US20040003207A1

Description

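[0001]
[Technical Field of the Invention]
The present invention relates to a program counter control method and a processor. More particularly, it relates to a program counter control method for simultaneously updating and controlling a program counter and a next program counter in a configuration in which a plurality of instructions including a branch instruction complete simultaneously, under instruction control that performs branch prediction and has delay instructions for branches, and to a processor that uses such a program counter control method.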
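[0002]
[Prior Art]
In recent years, various instruction processing schemes have been used to improve processor performance. One of them is out-of-order processing. A processor that adopts out-of-order processing improves performance by issuing subsequent instructions one after another into a plurality of pipelines and executing them without waiting for the execution of one instruction to complete.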
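[0003]
However, when the execution result of a preceding instruction affects the execution of a subsequent instruction, the subsequent instruction cannot be executed until the preceding one completes. If the instruction that affects subsequent execution is slow to process, the subsequent instructions cannot execute in the meantime and keep waiting for it to complete. This disturbs the pipeline and degrades performance. Such pipeline disturbance is especially pronounced for branch instructions.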
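[0004]
Among branch instructions, in the case of a so-called conditional branch, if an instruction that changes the branch condition (normally the condition code) immediately precedes the conditional branch instruction, the branch is not determined until that instruction completes and the branch condition is fixed. Because the instruction sequence following the branch instruction is then unknown, subsequent instructions cannot be issued, processing stalls, and throughput drops. This problem is not limited to processors that adopt out-of-order processing; it also arises in processors that adopt other schemes such as a lock-step pipeline, but the performance loss is more pronounced in out-of-order processors. To limit the performance loss caused by branch instructions, a branch prediction mechanism is therefore normally provided in the instruction control device within the processor to speed up branch instructions.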
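[0005]
In a processor that adopts out-of-order processing and has a branch prediction mechanism, a plurality of branch instructions are issued into the execution pipeline based on branch prediction results. When a branch instruction branches, the branch target address must be set in the instruction address register. In processors that adopt the SPARC (trademark) architecture, this instruction address register is called the program counter and next program counter. When a plurality of branch instructions exist in the execution pipeline, the branch target address of each branch instruction must be held for the instruction address register until instruction completion. However, because the timing at which each branch is determined differs, it has conventionally been necessary to hold even the branch target addresses of branch instructions that do not actually branch.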
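[0006]
The throughput of branch instructions in the execution pipeline is determined by the throughput of the branch instruction control unit and by the number of branch target address registers that hold branch target addresses. If branch target address registers are occupied by the target addresses of branch instructions that do not actually branch, branch instruction throughput is reduced as a result. This led to a vicious cycle in which still more branch target address registers had to be added to improve throughput.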
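[0007]
One of the factors that determine the execution speed of an instruction control device is the number of instructions that can be processed in one cycle. An instruction control device using the out-of-order scheme can complete a plurality of instructions simultaneously. Instruction completion normally refers to the point at which updating of the resources used, such as registers, is finished; to complete a plurality of instructions simultaneously, the updates of the resources used must also be completed simultaneously. Naturally, the instruction address register must also be updated by the corresponding number of instructions. When controlling an architecture that has branch delay instructions, typified by the SPARC architecture, whether the delay instruction is executed depends on whether the branch instruction branches, and two registers, the program counter and the next program counter, must be updated. For this reason, a branch instruction could conventionally be completed (committed) only by itself or only from a fixed position (its position relative to the other instructions completing at the same time). Typically, the branch instruction is placed at the last position of a packet in the decode cycle under a packet scheme, and its position at instruction completion is fixed in the same way. In this case, both the decode cycle and the instruction completion cycle are constrained by the branch instruction.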
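[0008]
[Problems to Be Solved by the Invention]
In recent years, improvements in LSI manufacturing technology and the like have made large-capacity memories available, and operating systems (OS) and applications have been moving to 64 bits. Accordingly, instruction control devices are also required to support 64 bits. However, moving to 64 bits increases the circuitry used, such as registers. The registers involved in branch instruction control must also be made 64 bits wide, so the branch target address registers and the like grow.
[0009]
Thus, simply changing from a 32-bit configuration to a 64-bit configuration doubles the required circuitry without changing the number of entries, and the circuit scale (mounting area) increases.
[0010]
At present, however, actual programs rarely use an instruction area of 4 GB or more, and cases in which a program crosses a 4 GB boundary seldom have a large effect on instruction processing performance. An object of the present invention is therefore to realize a program counter control method and a processor that improve branch throughput with as small a circuit scale (mounting area) as possible.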
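[0011]
[Means for Solving the Problems]
The above object can be achieved by a program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction branches, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0012]
The above object can also be achieved by a program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction does not branch, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0013]
The above object can also be achieved by a program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction branches, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0014]
The above object can also be achieved by a program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction does not branch, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.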
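[0015]
The architecture may use a 64-bit instruction address space, and the program counter control method may further include a step of performing branch instruction control and branch target address generation using only the lower 32 bits of the instruction address together with a carry bit and a borrow bit.
[0016]
The above object can also be achieved by a processor that performs instruction control by the out-of-order scheme using a branch prediction unit and controls an architecture having branch delay instructions, the processor comprising: a branch instruction control unit that judges the branch conditions of branch instructions, determines whether branch predictions succeeded, and controls instruction refetch, and that can control a plurality of branch instructions simultaneously; and a branch target address register that stores a plurality of branch target addresses of branch instructions determined to branch, wherein the branch target address register can be controlled independently of the branch instruction control unit and the branch prediction unit.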
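[0017]
The above object can also be achieved by a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising: program counter means consisting of a program counter and a next program counter; means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction branches; and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0018]
The above object can also be achieved by a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising: program counter means consisting of a program counter and a next program counter; means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction does not branch; and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0019]
The above object can also be achieved by a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising: program counter means consisting of a program counter and a next program counter; means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction branches; and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0020]
The above object can also be achieved by a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising: program counter means consisting of a program counter and a next program counter; means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction does not branch; and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0021]
Therefore, according to the present invention, a program counter control method and a processor that improve branch throughput with as small a circuit scale (mounting area) as possible can be realized.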
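[0022]
[Embodiments of the Invention]
Embodiments of the program counter control method and of the processor according to the present invention are described below with reference to the drawings.
[0023]
[Embodiment]
FIG. 1 is a block diagram showing an embodiment of the processor according to the present invention. In the figure, a processor 100 includes an instruction unit 21, a memory unit 22, and an arithmetic unit 23. The instruction unit 21 constitutes an instruction control device that adopts an embodiment of the instruction control method according to the present invention. The memory unit 22 is provided to store instructions, data, and the like, and the arithmetic unit 23 is provided to execute various operations.
[0024]
The instruction unit 21 consists of a branch prediction unit 1, an instruction fetch unit 2, an instruction buffer unit 3, a relative branch address generation unit 4, an instruction decoder unit 5, a branch instruction execution unit 6, an instruction completion control unit 9, a branch target address register 10, and a program counter unit 11, connected as shown in FIG. 1. The branch instruction execution unit 6 consists of a branch instruction control unit 7 and a delay slot stack unit 8. The program counter unit 11 includes a program counter PC, a next program counter nPC, and an update unit.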
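[0025]
Branch instruction control can be performed independently in the branch prediction unit 1, the branch instruction control unit 7, the instruction completion control unit 9, and the branch target address register 10. Once a branch instruction in the execution pipeline has been decoded by the instruction decoder unit 5, it first comes under the control of the branch instruction control unit 7. The branch instruction control unit 7 judges the branch condition of the branch instruction, determines whether the branch prediction succeeded, and controls instruction refetch. The number of branch instructions that the branch instruction control unit 7 can control is determined by its number of entries. Control in the branch instruction control unit 7 lasts only until the branch condition is determined and the branch target address is generated; after that, control passes to the instruction completion control unit 9. The branch target address register 10 manages the branch target addresses of taken branch instructions that have been released from the branch instruction control unit 7, and does so until instruction completion, that is, until the program counter unit 11 is updated. The instruction completion control unit 9 controls the completion conditions of all instructions, so a branch instruction is always controlled there regardless of whether it branches. The maximum number of branch instructions that can exist in the execution pipeline at the same time depends on the number of entries N of the instruction completion control unit 9. When the branch target address register (number of entries = M) is full, the maximum number of taken branch instructions is L + M, where L is the number of entries of the branch instruction control unit 7. Branch instructions that do not branch do not depend on the number of entries of the branch target address register 10. Because control under the branch target address register 10 lasts only from release from the branch instruction control unit 7 until the instruction completes, instruction decoding is unaffected as long as the branch instruction control unit 7 has free entries.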
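[0026]
Branch target address generation falls into two types: instruction-relative branches and register-relative branches. The branch target address of an instruction-relative branch is calculated by the relative branch address generation unit 4 and supplied to the branch target address register 10 via the branch instruction control unit 7. The branch target address of a register-relative branch is calculated in the arithmetic unit 23 and supplied to the branch target address register 10 via the branch instruction control unit 7. For example, the lower 32 bits of a register-relative branch target address are supplied via the branch instruction control unit 7, while the upper 32 bits are supplied directly to the program counter unit 11. For an instruction-relative branch, when the upper 32 bits change, they are obtained by calculation from the presence or absence of the borrow bit and the carry bit, so the branch instruction control unit 7 manages branch target instruction addresses as (lower 32 bits + 4-bit parity + borrow bit + carry bit) × number of entries. The branch target address register 10 is likewise organized as (lower 32 bits + 4-bit parity + borrow bit + carry bit) × number of entries. When the upper 32 bits of the instruction address change, the value is first set in the program counter unit 11, and instruction fetch is then always performed by a retry from the program counter unit 11.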
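[0027]
Control for updating the resources used is performed by the instruction completion control unit 9 and the program counter unit 11. At instruction completion (commit), the program counter unit 11 is supplied with information on how many instructions completed simultaneously and whether a taken branch instruction completed. When a taken branch instruction completes, that information is also supplied to the branch instruction control unit 7. In this embodiment, PC = nPC + (number of simultaneously completed instructions - 1) × 4, and nPC = nPC + (number of simultaneously completed instructions × 4) or the branch target address. In this embodiment, a taken branch instruction can complete simultaneously with the instructions that precede it but not with the instructions that follow it. This is because no branch target address path is provided among the paths that set the program counter PC; if a branch target address path were provided for the program counter PC as it is for the next program counter nPC, the restriction on the number of instructions completing simultaneously with a branch would disappear. Branch instructions that do not branch are subject to no such restriction even in this embodiment. In this embodiment, there is no restriction on the position at which a branch instruction completes, and no restriction at decode time either.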
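[0028]
FIG. 2 is a block diagram showing the main part of the instruction unit 21 shown in FIG. 1 together with the arithmetic unit 23. In FIG. 2, parts identical to those in FIG. 1 carry the same reference numerals, and their description is omitted. The inputs from the instruction decoder unit 5 to the branch instruction control unit 7 and the instruction completion control unit 9 are not shown in FIG. 2. The program counter unit 11 consists of the program counter PC, the next program counter nPC, a latch circuit 11-1, an update circuit 11-2 for the program counter (PC), and an update circuit 11-3 for the next program counter (nPC). In the following description, all addresses are logical addresses unless otherwise stated.
[0029]
For convenience, this embodiment is described on the assumption that the SPARC architecture is adopted. Instructions are processed out of order. In the branch instruction execution unit 6, a plurality of entries RSBR0 to RSBRm (Reservation Station for Branch) are provided in the branch instruction control unit 7, and a plurality of entries DSS0 to DSSn (Delay Slot Stack) are provided in the delay slot stack unit 8. The branch prediction unit 1 is provided as the branch prediction mechanism.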
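[0030]
FIG. 3 is a flowchart explaining the operation during the branch instruction control described above. In the figure, step S1 determines whether the branch instruction has finished. When the result is YES, step S2 determines whether the branch instruction branches. If the result of step S2 is NO, processing proceeds to step S4, described later. If the result of step S2 is YES, step S3 determines whether the branch target address register 10 has a free entry. When the result of step S3 is YES, processing proceeds to step S4.
[0031]
Step S4 completes control of the branch instruction in the branch instruction control unit 7, and processing proceeds to steps S5 and S6. Step S5 notifies the instruction completion control unit 9 that the branch instruction has completed. Simultaneously with step S5, step S6 instructs the branch target address register 10 to hold the branch address if the branch is taken. After steps S5 and S6, step S7 updates the resources, that is, updates the program counter unit 11 by means of the update circuits 11-2 and 11-3.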
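[0032]
FIG. 4 is a flowchart explaining the operation when the program counter unit 11 is updated. The processing shown in the figure corresponds to step S7 in FIG. 3. In FIG. 4, step S11 determines whether the conditions for instruction completion are satisfied. When the result of step S11 is YES, step S12 notifies the program counter unit 11 of how many instructions completed simultaneously and whether a taken branch instruction completed. After step S12, steps S13 and S14 are performed simultaneously.
[0033]
Step S13 notifies the branch target address register 10 when a taken branch instruction has completed, and processing proceeds to step S15, described below. Step S14 determines whether the notified information includes a taken branch instruction; if the result is YES, processing proceeds to step S15, and if NO, to step S16. Step S15 sets PC = nPC + (number of simultaneously completed instructions - 1) × 4 and nPC = branch target address. Step S16 sets PC = nPC + (number of simultaneously completed instructions - 1) × 4 and nPC = nPC + (number of simultaneously completed instructions × 4).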
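The update rules of steps S15 and S16 can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the circuit of the embodiment: the function name `update_pc_npc`, its parameters, and the 64-bit wraparound mask are introduced here for illustration only.

```python
MASK64 = (1 << 64) - 1  # assumption: addresses wrap at 64 bits


def update_pc_npc(npc, completed, taken_target=None):
    """Return (new_pc, new_npc) after `completed` instructions commit together.

    `taken_target` is the branch target address when the group ends with a
    taken branch (step S15); it is None when no taken branch is in the
    group (step S16).
    """
    if completed < 1:
        raise ValueError("at least one instruction must complete")
    # Both cases: PC = nPC + (completed - 1) * 4
    new_pc = (npc + (completed - 1) * 4) & MASK64
    if taken_target is not None:
        new_npc = taken_target & MASK64          # step S15
    else:
        new_npc = (npc + completed * 4) & MASK64  # step S16
    return new_pc, new_npc
```

For example, with nPC = 0x104, committing three sequential instructions gives PC = 0x10C and nPC = 0x110, while committing two instructions whose last one is a branch taken to 0x2000 gives PC = 0x108 (the delay slot) and nPC = 0x2000.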
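[0034]
Returning to FIG. 2, when the instruction fetch unit 2 issues an instruction fetch request, the branch prediction unit 1 performs branch prediction for the instruction address requested. If the branch prediction unit 1 has an entry corresponding to that instruction address, a flag BRHIS_HIT indicating that a branch prediction was made is attached to the corresponding instruction fetch data, and an instruction fetch request for the predicted branch target instruction address is output to the instruction fetch unit 2. The instruction fetch data is supplied from the instruction fetch unit 2 to the instruction decoder unit 5 together with the attached flag BRHIS_HIT. The instruction decoder unit 5 decodes the instruction, and if it is a branch instruction with an annul bit, such as BPr, Bicc, BPcc, FBcc, or FBPcc, it refers to the annul bit together with the flag BRHIS_HIT.
[0035]
When BRHIS_HIT = 1, the instruction decoder unit 5 unconditionally executes the one following instruction; when BRHIS_HIT = 0 and the annul bit = 1, it decodes the one following instruction as a no-operation (NOP) instruction. In other words, the instruction decoder unit 5 decodes normally if BRHIS_HIT = 1, and changes the one following instruction into a NOP instruction when the decoded result is a branch instruction with BRHIS_HIT = 0 and annul bit = 1. In the SPARC architecture, a branch instruction with an annul bit executes the delay slot instruction (delay instruction) when the branch is taken; when the branch is not taken, it does not execute the delay slot instruction if the annul bit is 1 and executes it only if the annul bit is 0. Making a branch prediction means that the instruction is a branch instruction and is predicted taken, which is effectively the same as predicting that the delay slot instruction will be executed. The CALL, JMPL, and RETURN instructions, which have no annul bit, are unconditional branches that always execute the delay slot instruction, so they can be handled in the same way. The ALWAYS_BRANCH instruction with COND = 1000 does not execute the delay slot instruction when the annul bit = 1 even though it is an unconditional branch, but because this case occurs infrequently, it can be recovered by instruction refetch.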
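The decode-time rule and the architectural rule described above can be compared with a small Python sketch. The function names are hypothetical, and the third function (detecting when the decode-time guess disagrees with the architectural outcome) is an inference from the text rather than logic stated in the source.

```python
def decode_delay_slot_as_nop(is_branch, brhis_hit, annul_bit):
    """Decoder rule: NOP the delay slot of an unpredicted branch with annul = 1."""
    return bool(is_branch and not brhis_hit and annul_bit)


def delay_slot_executes(taken, annul_bit, always=False):
    """Architectural rule as described in the text.

    `always` marks the branch-always case (COND = 1000), whose delay slot
    is skipped whenever the annul bit is 1.
    """
    if always:
        return not annul_bit
    return bool(taken or not annul_bit)


def decode_guess_is_wrong(brhis_hit, taken, annul_bit, always=False):
    """True when the decode-time treatment needs recovery (assumed check)."""
    decoded_as_executed = not decode_delay_slot_as_nop(True, brhis_hit, annul_bit)
    return decoded_as_executed != delay_slot_executes(taken, annul_bit, always)
```

Under this model, a predicted branch-always with annul bit = 1 is the case in which the decoder lets the delay slot through although the architecture skips it, which matches the rare case the text recovers by instruction refetch.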
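[0036]
When a branch prediction is made and proves correct, no instruction refetch is needed, and the predicted target instruction sequence is the same as the actual instruction sequence. A correct prediction also means that the delay slot instruction has been executed correctly, so in this case execution simply continues.
[0037]
When a branch prediction is made and proves wrong, instruction refetch is required. The wrong target instruction sequence has been executed, and the actual instruction sequence must be executed instead. In this case the execution of the delay slot instruction is also wrong, so execution must be redone starting from the delay slot instruction. In this embodiment, after the branch instruction control unit 7 outputs a refetch request for the branch target to the instruction fetch unit 2, the delay slot instruction to be re-executed is taken from the delay slot stack unit 8 and supplied to the instruction decoder unit 5. Recovery from the branch misprediction, including the delay slot instruction, is performed in this way.
[0038]
When any branch instruction is decoded by the instruction decoder unit 5, entries are created for it in the branch instruction control unit 7 and the instruction completion control unit 9. The branch instruction control unit 7 controls the branch instruction until its branch target address and branch condition are determined. The instruction completion control unit 9 performs control for instruction completion, that is, control for completing instructions in order.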
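[0039]
As described above, SPARC defines two types of branches: instruction-relative branches and register-relative branches. The branch target address of an instruction-relative branch is generated by the relative branch address generation unit 4, and that of a register-relative branch is generated by the arithmetic unit 23. The branch target address generated by the relative branch address generation unit 4 is supplied to the branch instruction control unit 7. The branch instruction control unit 7 is supplied with the branch target address PCRAG_TGT_PC[31:0,P3:P0], the CARRY bit PCRAG_TGTPC_CARRY, and the BORROW bit PCRAG_TGTPC_BORROW from the relative branch address generation unit 4, and with the branch target address EXA_TGT_PC[31:0,P3:P0] from the arithmetic unit 23. At this time the arithmetic unit 23 also supplies EXA_TGT_PC[63:32,P7:P4] to the program counter unit 11.
[0040]
When control of a branch instruction in the branch instruction control unit 7 is complete, the branch instruction is controlled by the instruction completion control unit 9 until instruction completion. When a branch instruction is released from the branch instruction control unit 7 and the branch is taken, its branch target address is stored in the branch target address register 10. The branch target address stored in the branch target address register 10 is used to update the next program counter nPC of the program counter unit 11 in cycle W of the corresponding branch instruction. Cycle W is the update cycle for registers, and the program counter PC and the next program counter nPC are also updated in this cycle W. When a branch instruction is to be released from the branch instruction control unit 7 and it is a taken branch, whether the branch target address register 10 has a free entry is checked; if an entry is free, the branch instruction is released from the branch instruction control unit 7, and if not, it is not released. However, even if the branch target address register 10 is full, a branch instruction that does not branch is released from the branch instruction control unit 7.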
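The release condition just described reduces to a short predicate. The sketch below is an illustrative model only; the function name and argument names are assumptions, not signals from the embodiment.

```python
def may_release_from_rsbr(control_done, taken, free_target_entries):
    """Decide whether a branch may leave the branch instruction control unit.

    control_done        : branch condition and target address are determined
    taken               : the branch is taken
    free_target_entries : free entries in the branch target address register
    """
    if not control_done:
        return False
    if not taken:
        # A not-taken branch needs no target entry, so a full register
        # never blocks its release.
        return True
    return free_target_entries > 0
```

The point of the design is visible in the second branch of the predicate: only taken branches compete for the small target register, so not-taken branches drain from the control unit even when the register is full.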
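[0041]
In this embodiment, the branch instruction control unit 7 has 10 entries and the branch target address register 10 has 2 entries. Even when the branch target address register 10 is full, control of subsequent branch instructions in the branch instruction control unit 7 does not stop until the branch instruction control unit 7 itself becomes full. Each entry of the branch target address register 10 consists of VALID, the branch target address TGT_PC[31:0,P3:P0], the CARRY bit TGT_PC_CARRY, the BORROW bit TGT_PC_BORROW, and IID[5:0]. VALID = 1 indicates that the entry is valid. When a taken branch instruction is released from the branch instruction control unit 7, an entry is created in the branch target address register 10, VALID is set to 1, and the entry is held until cycle W of that branch instruction.
[0042]
FIG. 5 is a conceptual diagram showing the entries in the branch instruction control unit 7. Each of the 10 entries shown in the figure includes VALID, the branch target address TGT_PC[31:0,P3:P0], the branch address PC[31:0,P3:P0], the CARRY bit, the BORROW bit, and the IID.
FIG. 6 is a schematic diagram of the branch target address register 10. Each of the two entries A and B shown in the figure includes VALID, the branch target address TGT_PC[31:0,P3:P0], the CARRY bit, the BORROW bit, and the IID.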
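[0043]
FIG. 7 is a block diagram showing the configuration of the program counter unit 11. The program counter PC and the next program counter nPC of the program counter unit 11 are updated simultaneously in cycle W, which follows instruction completion (the instruction completion cycle). Updates fall broadly into the following cases (1) to (4).
(1) A plurality of instructions complete simultaneously and none of them is a taken branch instruction.
(2) A plurality of instructions complete simultaneously and one of them is a taken branch instruction.
(3) Instructions are executed across a 4 GB boundary.
(4) An interrupt occurs on an instruction when that instruction completes.
[0044]
The update of the program counter PC and the next program counter nPC of the program counter unit 11 is basically PC = nPC + (number of simultaneously completed instructions - 1) × 4 and nPC = nPC + (number of simultaneously completed instructions × 4) or the branch target address. Consequently, PC = nPC + (number of simultaneously completed instructions - 1) × 4, and nPC = PC + 4 or the branch target address, where PC on the right-hand side is the updated value.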
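[0045]
In this embodiment, annulling the delay instruction (delay slot (DSS) instruction) of a branch instruction is realized by replacing the annulled DSS instruction with the equivalent of a no-operation (NOP) instruction. The program counter PC and the next program counter nPC of the program counter unit 11 are updated just as if the instruction had been executed. Therefore, when a branch instruction that does not branch completes, and when the group of simultaneously completing instructions contains no branch instruction (case (1) above), the update is PC = nPC + (number of simultaneously completed instructions - 1) × 4 and nPC = PC + 4. If an interrupt were allowed at completion of a DSS instruction that was annulled and replaced with a NOP instruction, the value of the program counter PC at the end of the DSS instruction would become visible outside the processor. Because such a DSS instruction is never actually executed, leaving PC = (instruction address of the DSS instruction replaced with a NOP instruction) would cause the wrong instruction sequence to be executed on return from the interrupt. To prevent this, in such a case this embodiment further advances the program counter PC and the next program counter nPC by one DSS instruction (4 bytes).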
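[0046]
When a taken branch instruction completes simultaneously with a plurality of instructions, it can complete together with the instructions before it but not with the instructions after it. That is, a taken branch instruction that completes is always the last instruction in the group of instructions completing simultaneously, so PC = (instruction address of the DSS instruction) and nPC = (branch target address of the branch) (case (2) above).
[0047]
In this embodiment, the update circuits 11-2 and 11-3 for the program counter PC and the next program counter nPC are simplified by imposing this restriction on the completion of taken branch instructions. However, if the update circuit 11-2 for the program counter PC is notified of how many instructions following the taken branch instruction completed simultaneously, then PC = TGT_PC + (number of instructions following the branch that completed simultaneously × 4) and nPC = PC + 4, and the restriction at instruction completion becomes unnecessary.
[0048]
The TGT_PC of a taken branch instruction is set from the branch target address register 10 into the next program counter nPC in cycle W. When a taken branch instruction completes, its instruction number (Instruction ID: IID) is supplied from the instruction completion control unit 9 in that cycle W, and TGT_PC[31:0,P3:P0] is set into the next program counter nPC from the entry in the branch target address register 10 that has the same IID. At the same time as the next program counter nPC is set, that entry of the branch target address register 10 is released, and a new entry can be set in the branch target address register 10.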
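The allocate-on-release and free-on-commit behaviour of the two-entry register, keyed by IID, can be modelled as follows. This is a sketch under assumptions: the class and field names are hypothetical, parity bits are omitted, and a Python `None` slot stands in for VALID = 0.

```python
from dataclasses import dataclass


@dataclass
class TargetEntry:
    iid: int            # instruction ID of the taken branch
    tgt_pc_low: int     # lower 32 bits of the branch target address
    carry: bool = False
    borrow: bool = False


class BranchTargetAddressRegister:
    """Small buffer of taken-branch targets awaiting commit (model only)."""

    def __init__(self, size=2):
        self.slots = [None] * size  # None models VALID = 0

    def has_free_entry(self):
        return any(slot is None for slot in self.slots)

    def allocate(self, entry):
        """Store a target when a taken branch leaves the control unit."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = entry
                return i
        raise RuntimeError("branch target address register is full")

    def commit(self, iid):
        """Return the entry whose IID matches and free it (cycle W)."""
        for i, slot in enumerate(self.slots):
            if slot is not None and slot.iid == iid:
                self.slots[i] = None
                return slot
        raise KeyError(iid)
```

A caller would check `has_free_entry()` before releasing a taken branch, call `allocate()` on release, and call `commit()` with the IID supplied at completion to obtain the value for nPC.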
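[0049]
When a branch instruction completes, the branch is taken, and the branch target address crosses a 4 GB boundary, operation is as follows. Because the program counter PC is obtained as PC = nPC + (number of simultaneously completed instructions - 1) × 4, no special control is needed for it. The next program counter nPC becomes nPC = (branch target address) and is therefore set from the branch target address register 10, but the branch target address register 10 holds only the lower 32 bits (+ 4 parity bits) of the address. Therefore, when the completed branch instruction is an instruction-relative branch, the upper 32 bits (+ 4 parity bits) are generated based on TGT_PC_CARRY and TGT_PC_BORROW held in the branch target address register 10. Whether the branch target address crosses a 4 GB boundary can be judged by whether either TGT_PC_CARRY or TGT_PC_BORROW is 1. TGT_PC_CARRY and TGT_PC_BORROW are never 1 at the same time.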
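The idea of keeping only the lower 32 bits plus a carry and a borrow flag can be checked with a worked sketch. The helper names and the way the flags are derived (from an unmasked add of the low word and a displacement smaller than 4 GB in magnitude) are assumptions made for illustration; parity bits are omitted.

```python
MASK32 = 0xFFFFFFFF


def split_relative_target(pc, disp):
    """Return (low32, carry, borrow) for the target pc + disp.

    Assumes |disp| < 2**32, so the target differs from pc by at most one
    4 GB region.
    """
    low_sum = (pc & MASK32) + disp      # unmasked low-word sum
    carry = low_sum > MASK32            # crossed into the next 4 GB region
    borrow = low_sum < 0                # crossed into the previous region
    return low_sum & MASK32, carry, borrow


def rebuild_target(pc_high, low32, carry, borrow):
    """Recreate the 64-bit target from the PC's upper word and the flags."""
    assert not (carry and borrow), "the two flags are never both set"
    high = (pc_high + int(carry) - int(borrow)) & MASK32
    return (high << 32) | low32
```

For instance, a branch at 0x0000_0001_FFFF_FFF0 with displacement +0x20 stores low32 = 0x10 with the carry flag set, and rebuilding with the upper word 1 yields 0x0000_0002_0000_0010.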
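[0050]
For a register-relative branch, the branch target address is generated in the arithmetic unit 23. The lower 32 bits (+ 4 parity bits) are obtained from the branch target address register 10. As for the upper 32 bits (+ 4 parity bits), after the branch target address is generated in the arithmetic unit 23, the upper 32 bits (+ 4 parity bits) are supplied to the program counter unit 11 at the same time as the lower 32 bits (+ 4 parity bits) are supplied to the branch instruction control unit 7. At this time, the IID[5:0] of the branch instruction for which the branch target address was generated is supplied from the arithmetic unit 23 to the branch instruction control unit 7 and the program counter unit 11 together with the branch target address. The program counter unit 11 holds the upper 32 bits (+ 4 parity bits) of the branch target address and the accompanying IID[5:0]. In this embodiment, the program counter unit 11 is provided with an ADRS_HOLD latch circuit 11-1 that holds this information for one instruction. At this time, the supplied upper 32 bits of the branch target address are compared with the upper 32 bits of the program counter PC; if they do not match, it is judged that a 4 GB boundary is crossed, and the signal +JMPL_RETURN_TGT_EQ_PC_HIGH becomes 0.
[0051]
When a register-relative branch is taken and completes, and the IID of the completed instruction matches the IID in the ADRS_HOLD latch circuit 11-1 held in the program counter unit 11, the upper 32 bits (+ 4 parity bits) are set from the ADRS_HOLD latch circuit 11-1 in the program counter unit 11. When a register-relative branch is taken, regardless of whether the branch target address crosses a 4 GB boundary, the upper 32 bits (+ 4 parity bits) are set into the next program counter nPC from the ADRS_HOLD latch circuit 11-1 and the lower 32 bits (+ 4 parity bits) from the branch target address register 10. When the branch target address crosses a 4 GB boundary, however, the signal +JMPL_RETURN_TGT_EQ_PC_HIGH becomes 0.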
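A rough Python model of the one-entry ADRS_HOLD latch may help in following the register-relative case. The class and method names are hypothetical, parity bits are omitted, and returning `None` on an IID mismatch is an assumption of this sketch rather than behaviour stated in the text.

```python
class AdrsHold:
    """One-entry latch for the upper 32 bits of a register-relative target."""

    def __init__(self):
        self.iid = None
        self.high = None
        self.tgt_eq_pc_high = None  # models +JMPL_RETURN_TGT_EQ_PC_HIGH

    def capture(self, iid, target_high, pc_high):
        """Latch the upper word and IID when the arithmetic unit delivers them."""
        self.iid = iid
        self.high = target_high
        # False (0) means the target lies across a 4 GB boundary.
        self.tgt_eq_pc_high = (target_high == pc_high)

    def npc_for_commit(self, iid, low32):
        """Combine the latched upper word with the lower word at commit."""
        if iid != self.iid:
            return None
        return (self.high << 32) | (low32 & 0xFFFFFFFF)
```

In this model the lower word would come from the branch target address register entry with the same IID, so the two halves of the 64-bit target are only joined at commit time.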
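[0052]
When an instruction sequence is executed across a 4 GB boundary, the instruction fetch unit 2 has been fetching with the upper 32 bits of the instruction fetch address at their value just before the boundary, so instruction fetch must be redone at the point when the instruction just before the boundary completes. This is because the upper 32 bits of the instruction address differ between the instruction just before the boundary and the instruction just after it. In this embodiment, therefore, after the instruction just before the boundary completes, the program counter unit 11 supplies an instruction refetch request REIFCH to the instruction fetch unit 2. At this point the value of the program counter PC has been updated to the address of the instruction just after the boundary, so the instruction fetch unit 2 resumes instruction fetch from the value of the program counter PC.
[0053]
When an interrupt occurs at instruction completion and the instruction control device restarts after interrupt processing, a state in which nPC ≠ PC + 4 can occur. In this case, the program counter unit 11 supplies an instruction refetch request REIFCH to the instruction fetch unit 2. The requested address is the address pointed to by the program counter PC, but the instruction that must be executed next after it is the instruction at the address pointed to by the next program counter nPC. Therefore, in this case instruction fetch is first performed at the address pointed to by the program counter PC, and when one instruction (the instruction at the address the program counter PC pointed to) completes, the program counter PC and the next program counter nPC are updated and an instruction refetch request REIFCH is supplied to the instruction fetch unit 2 again. This is because, on receiving an instruction refetch request REIFCH from the program counter unit 11, the instruction fetch unit 2 fetches at the address pointed to by the program counter PC and then tries to supply the instructions that sequentially follow it.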
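[0054]
When the delay slot instruction of a branch instruction is annulled, the delay slot instruction is treated as a NOP instruction for instruction control. However, if an interrupt occurs at completion of the branch instruction immediately preceding the annulled delay slot instruction, the state becomes PC = (address of the annulled delay slot instruction) and nPC = (address of the instruction that should actually be executed after the branch instruction). If interrupt processing is performed and execution restarts in this state, it starts from the annulled delay slot instruction, which must not actually be executed. Therefore, in this embodiment, the signal +FORCE_NOP_TGR is set to 1 when the subsequent delay slot instruction is to be annulled, and if an interrupt occurs while this signal is ON, the program counter PC and the next program counter nPC are first updated once and are then updated again during interrupt processing so that PC = nPC and nPC = nPC + 4.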
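The second update performed during interrupt processing can be written as a one-step state transition. The function name and the boolean `force_nop` argument are assumptions standing in for the condition that the pending delay slot was annulled.

```python
def state_for_interrupt_return(pc, npc, force_nop):
    """Return the (PC, nPC) pair to expose when an interrupt is taken.

    When the next instruction is an annulled delay slot (force_nop), the
    counters are advanced once more so that execution does not restart
    at the annulled instruction: PC = nPC, nPC = nPC + 4.
    """
    if force_nop:
        return npc, npc + 4
    return pc, npc
```

For example, after a not-taken branch at 0x100 with the annul bit set commits, PC = 0x104 (the annulled delay slot) and nPC = 0x108; the extra step yields PC = 0x108 and nPC = 0x10C, so the restart begins at the first instruction that should really execute.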
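[0055]
Next, the configuration of the branch instruction control unit 7 is described with reference to FIGS. 8 to 11. FIGS. 8 to 11 are logic circuit diagrams of the main parts of the branch instruction control unit 7.
[0056]
In FIG. 8, AND circuits 171 to 173 and an OR circuit 174 generate a signal +RSBR_COMPLETE_TAKEN_RELEASE, which becomes 1 when control of at least one taken branch instruction completes in the branch instruction control unit 7. The AND circuit 171 receives a signal +RSBR0_COMPLETE, which becomes 1 when control of the branch instruction in entry 0 of the branch instruction control unit 7 completes, and a signal +RSBR0_TAKEN, which becomes 1 when the branch instruction in entry 0 is determined to branch. Similarly, the AND circuit 172 receives a signal +RSBR1_COMPLETE, which becomes 1 when control of the branch instruction in entry 1 completes, and a signal +RSBR1_TAKEN, which becomes 1 when the branch instruction in entry 1 is determined to branch. The AND circuit 173 receives a signal +RSBR2_COMPLETE, which becomes 1 when control of the branch instruction in entry 2 completes, and a signal +RSBR2_TAKEN, which becomes 1 when the branch instruction in entry 2 is determined to branch. The OR circuit 174 receives the outputs of the AND circuits 171 to 173.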
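The AND-OR structure of FIG. 8 corresponds to a simple reduction over the entries. The sketch below models it with Python booleans; the function name mirrors the signal name, and passing the per-entry signals as two lists is an assumption of the model.

```python
def rsbr_complete_taken_release(complete, taken):
    """OR over entries of (RSBRn_COMPLETE AND RSBRn_TAKEN).

    `complete` and `taken` hold one boolean per entry examined by the
    circuit (entries 0 to 2 in FIG. 8).
    """
    return any(c and t for c, t in zip(complete, taken))
```

The result is true only when some single entry is both complete and taken; a complete entry and a different taken entry do not combine.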
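[0057]
An exclusive-NOR circuit 271 and AND circuits 272 and 273 compare the IID of the branch instruction held in entry A of the branch target address register 10, located in the branch instruction control unit 7, with the IID of a taken branch instruction at the time that instruction completes. The exclusive-NOR circuit 271 receives a signal +COMIT_TAKEN_IID[5:0], which indicates the IID of a taken branch instruction when it completes in the branch instruction control unit 7 (or the program counter unit 11), and a signal +RSBR_TGT_BUFF_A_IID[5:0], which indicates the IID of the branch instruction held in entry A of the branch target address register 10 in the branch instruction control unit 7. The signal +RSBR_TGT_BUFF_A_IID[5:0] is equivalent to the signal +TARGET_ADRS_BUFFER_A_IID[5:0] described later. The AND circuit 272 receives a signal +LOAD_TGT_TO_NPC, which becomes 1 when a value must be set into the next program counter from the branch target address register 10, and a signal +RSBR_TGT_BUFF_A_VALID, which becomes 1 when entry A of the branch target address register 10 is valid. The AND circuit 273 receives the outputs of the exclusive-NOR circuit 271 and the AND circuit 272.
[0058]
An exclusive-NOR circuit 274 and AND circuits 275 and 276 compare the IID of the branch instruction held in entry B of the branch target address register 10, located in the branch instruction control unit 7, with the IID of a taken branch instruction at the time that instruction completes. The exclusive-NOR circuit 274 receives the signal +COMIT_TAKEN_IID[5:0], which indicates the IID of a taken branch instruction when it completes in the branch instruction control unit 7 (or the program counter unit 11), and a signal +RSBR_TGT_BUFF_B_IID[5:0], which indicates the IID of the branch instruction held in entry B of the branch target address register 10 in the branch instruction control unit 7. The signal +RSBR_TGT_BUFF_B_IID[5:0] is equivalent to the signal +TARGET_ADRS_BUFFER_B_IID[5:0] described later. The AND circuit 275 receives the signal +LOAD_TGT_TO_NPC, which becomes 1 when a value must be set into the next program counter from the branch target address register 10, and a signal +RSBR_TGT_BUFF_B_VALID, which becomes 1 when entry B of the branch target address register 10 is valid. The AND circuit 276 receives the outputs of the exclusive-NOR circuit 274 and the AND circuit 275.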
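[0059]
In FIG. 9, an AND circuit 277 generates a clock enable signal +HOLD_RSBR_TGT_BUFF_A for entry A of the branch target address register 10 based on a signal -RSBR_TGT_BUFF_A_REL, which becomes 1 when entry A of the branch target address register 10 is released, and the signal +RSBR_TGT_BUFF_A_VALID, which becomes 1 when entry A is valid. An AND circuit 278 generates a clock enable signal +HOLD_RSBR_TGT_BUFF_B for entry B of the branch target address register 10 based on a signal -RSBR_TGT_BUFF_B_REL, which becomes 1 when entry B is released, and the signal +RSBR_TGT_BUFF_B_VALID, which becomes 1 when entry B is valid.
[0060]
A NAND circuit 371 generates a signal -RSBR0_TGT_BUFF_BUSY, which indicates that the branch target address register 10 has no free entry when the branch instruction in entry 0 of the branch instruction control unit 7 is a taken branch instruction. It does so based on the signal +RSBR0_TAKEN, which becomes 1 when the branch instruction in entry 0 is determined to branch, the signal +RSBR_TGT_BUFF_A_VALID, which becomes 1 when entry A of the branch target address register 10 is valid, the signal +RSBR_TGT_BUFF_B_VALID, which becomes 1 when entry B is valid, and a signal +W_COMMIT_BR_TAKEN, which indicates that a taken branch instruction has completed. The signal +W_COMMIT_BR_TAKEN is a cycle-W signal and becomes 1 in the cycle 1τ after completion of a taken branch instruction.
[0061]
A NAND circuit 372 generates a signal -RSBR1_TGT_BUFF_BUSY, which indicates that the branch target address register 10 has no free entry when the branch instruction in entry 1 of the branch instruction control unit 7 is a taken branch instruction. It does so based on the signal +RSBR1_TAKEN, which becomes 1 when the branch instruction in entry 1 is determined to branch, the signals +RSBR_TGT_BUFF_A_VALID and +RSBR_TGT_BUFF_B_VALID, which become 1 when entries A and B of the branch target address register 10 are valid, respectively, and the signal +W_COMMIT_BR_TAKEN, which indicates that a taken branch instruction has completed. The signal +W_COMMIT_BR_TAKEN is a cycle-W signal and becomes 1 in the cycle 1τ after completion of a taken branch instruction.
[0062]
A NAND circuit 373 generates a signal -RSBR2_TGT_BUFF_BUSY, which indicates that the branch target address register 10 has no free entry when the branch instruction in entry 2 of the branch instruction control unit 7 is a taken branch instruction. It does so based on the signal +RSBR2_TAKEN, which becomes 1 when the branch instruction in entry 2 is determined to branch, the signals +RSBR_TGT_BUFF_A_VALID and +RSBR_TGT_BUFF_B_VALID, which become 1 when entries A and B of the branch target address register 10 are valid, respectively, and the signal +W_COMMIT_BR_TAKEN, which indicates that a taken branch instruction has completed. The signal +W_COMMIT_BR_TAKEN is a cycle-W signal and becomes 1 in the cycle 1τ after completion of a taken branch instruction.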
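[0063]
In FIG. 10, the set terminal SET of a latch circuit 374 receives the signal +RSBR_COMPLETE_TAKEN_RELEASE, and its input terminal INHS receives a signal -CLEAR_PIPELINE, which indicates that all instructions in the execution pipeline are to be cleared. A NOR circuit 375 receives the signal +RSBR_TGT_BUFF_A_REL and a signal +CLEAR_PIPELINE, and the output of the NOR circuit 375 is input to the reset terminal RST of the latch circuit 374. The latch circuit 374 generates the signal +RSBR_TGT_BUFF_A_VALID described above.
[0064]
The set terminal SET of a latch circuit 376 receives the output of an AND circuit 377, whose inputs are the signal +RSBR_COMPLETE_TAKEN_RELEASE and the clock enable signal +HOLD_RSBR_TGT_BUFF_A. The input terminal INHS of the latch circuit 376 receives the signal -CLEAR_PIPELINE, which indicates that all instructions in the execution pipeline are to be cleared. A NOR circuit 378 receives the signal +RSBR_TGT_BUFF_B_REL and the signal +CLEAR_PIPELINE, and the output of the NOR circuit 378 is input to the reset terminal RST of the latch circuit 376. The latch circuit 376 generates the signal +RSBR_TGT_BUFF_B_VALID described above.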
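[0065]
In FIG. 11, a latch circuit 471 receives the signals +HOLD_RSBR_TGT_BUFF_A, +COMPLETE_RSBR_IID[5:0], +COMPLETE_RSBR_CARRY, +COMPLETE_RSBR_BORROW, and +COMPLETE_RSBR_TGT_PC[31:0,P3:P0], and outputs the signals +TARGET_ADRS_BUFFER_A_IID[5:0], +TARGET_ADRS_BUFFER_A_OVF, +TARGET_ADRS_BUFFER_A_UDF, and +TARGET_ADRS_BUFFER_A[31:0,P3:P0]. +HOLD_RSBR_TGT_BUFF_A is the clock enable signal for entry A of the branch target address register 10. +COMPLETE_RSBR_IID[5:0] is the IID of the branch instruction released from the branch instruction control unit 7 (when control of the branch instruction completes). +COMPLETE_RSBR_CARRY is a signal that becomes 1 when a CARRY occurs in the branch target address of the branch instruction released from the branch instruction control unit 7. +COMPLETE_RSBR_BORROW is a signal that becomes 1 when a BORROW occurs in the branch target address of the branch instruction released from the branch instruction control unit 7. +COMPLETE_RSBR_TGT_PC[31:0,P3:P0] is the branch target address of the branch instruction released from the branch instruction control unit 7. +TARGET_ADRS_BUFFER_A_IID[5:0] is the IID of the branch instruction held in entry A of the branch target address register 10, +TARGET_ADRS_BUFFER_A_OVF is its CARRY bit, +TARGET_ADRS_BUFFER_A_UDF is its BORROW bit, and +TARGET_ADRS_BUFFER_A[31:0,P3:P0] is its branch target address.
[0066]
A latch circuit 472 receives the signals +HOLD_RSBR_TGT_BUFF_B, +COMPLETE_RSBR_IID[5:0], +COMPLETE_RSBR_CARRY, +COMPLETE_RSBR_BORROW, and +COMPLETE_RSBR_TGT_PC[31:0,P3:P0], and outputs the signals +TARGET_ADRS_BUFFER_B_IID[5:0], +TARGET_ADRS_BUFFER_B_OVF, +TARGET_ADRS_BUFFER_B_UDF, and +TARGET_ADRS_BUFFER_B[31:0,P3:P0]. +HOLD_RSBR_TGT_BUFF_B is the clock enable signal for entry B of the branch target address register 10. +COMPLETE_RSBR_IID[5:0] is the IID of the branch instruction released from the branch instruction control unit 7 (when control of the branch instruction completes). +COMPLETE_RSBR_CARRY is a signal that becomes 1 when a CARRY occurs in the branch target address of the branch instruction released from the branch instruction control unit 7. +COMPLETE_RSBR_BORROW is a signal that becomes 1 when a BORROW occurs in the branch target address of the branch instruction released from the branch instruction control unit 7. +COMPLETE_RSBR_TGT_PC[31:0,P3:P0] is the branch target address of the branch instruction released from the branch instruction control unit 7. +TARGET_ADRS_BUFFER_B_IID[5:0] is the IID of the branch instruction held in entry B of the branch target address register 10, +TARGET_ADRS_BUFFER_B_OVF is its CARRY bit, +TARGET_ADRS_BUFFER_B_UDF is its BORROW bit, and +TARGET_ADRS_BUFFER_B[31:0,P3:P0] is its branch target address.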
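[0067]
Next, the configuration of the instruction completion control unit 9 is described with reference to FIGS. 12 and 13. FIGS. 12 and 13 are logic circuit diagrams of the main parts of the instruction completion control unit 9.
[0068]
In FIG. 12, an AND circuit 91 receives a signal +TOQ_CSE_BR_FORCE_NOP, which becomes 1 when the first instruction to complete is a branch instruction that annuls its subsequent delay instruction; a signal +COMMIT_TOQ_CSE, which indicates that at least one instruction has completed; a signal -COMMIT_2ND_CSE, which indicates that at least two instructions have completed; and a signal -TOQ_RERUN_REIFCH_OWN_OR, which becomes 1 when the first instruction to complete is to be re-executed (RERUN). An AND circuit 92 receives a signal +2ND_CSE_BR_FORCE_NOP, which becomes 1 when the second instruction to complete is a branch instruction that annuls its subsequent delay instruction; a signal +COMMIT_2ND_CSE, which indicates that at least two instructions have completed; and a signal -COMMIT_3RD_CSE, which indicates that at least three instructions have completed. An AND circuit 93 receives a signal +3RD_CSE_BR_FORCE_NOP, which becomes 1 when the third instruction to complete is a branch instruction that annuls its subsequent delay instruction; a signal +COMMIT_3RD_CSE, which indicates that at least three instructions have completed; and a signal -COMMIT_4TH_CSE, which indicates that at least four instructions have completed. An AND circuit 94 receives a signal +4TH_CSE_BR_FORCE_NOP, which becomes 1 when the fourth instruction to complete is a branch instruction that annuls its subsequent delay instruction, and a signal +COMMIT_4TH_CSE, which indicates that at least four instructions have completed. A NOR circuit 95 receives the outputs of the AND circuits 91 to 94.
[0069]
An AND circuit 96 receives a signal -RS1, which becomes 1 when interrupt processing occurs, and a signal +BR_FORCE_NOP_TGR, which indicates that the next instruction to complete first is a delay instruction converted to a NOP. A NAND circuit 97 receives a signal -COMMIT_TOQ_CSE, which indicates that at least one instruction has completed, and the output of the AND circuit 96. An AND circuit 98 receives the output of the NOR circuit 95 and the output of the NAND circuit 97. A latch circuit 99 receives at its input terminal 1H a signal +EU_XCPTN_OR, which becomes 1 when an exception occurs in the arithmetic unit 23 or elsewhere, receives the output of the AND circuit 98 at its set terminal SET, and outputs a signal -BR_FORCE_NOP_TGR, which indicates that the next instruction to complete first is a delay instruction converted to a NOP.
[0070]
In FIG. 13, a NAND circuit 191 receives a cycle-W signal +W_TRAP_VALID, which indicates that an instruction that performs trap processing has completed, and a signal +COMMIT_ENDOP_OR, which indicates that at least one instruction has completed. An AND circuit 192 receives the output of the NAND circuit 191; a signal +FORCE_NOP_TGR, which becomes 1 when an asynchronous interrupt (external interrupt) occurs while the signal +BR_FORCE_NOP_TGR = 1; and the signal -RS1, which becomes 1 when interrupt processing occurs. The output of the AND circuit 192 is input to the set terminal SET of a latch circuit 193. The latch circuit 193 outputs a signal +FORCE_PC_INCR_TGR. The signal +FORCE_PC_INCR_TGR becomes 1 when the program counter PC and the next program counter nPC must be advanced by one delay slot instruction (4 bytes) because interrupt processing occurred while the delay slot instruction was annulled (converted to a NOP) and the interval between completion of the branch instruction and completion of the delay slot instruction was extended. In other words, the signal +FORCE_PC_INCR_TGR rises 1τ after the signal +FORCE_NOP_TGR and is valid from cycle W + 1.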
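[0071]
Next, the nPC update circuit 11-3 in the program counter unit 11 is described with reference to FIGS. 14 to 18. FIGS. 14 to 18 are logic circuit diagrams of the nPC update circuit 11-3 in the program counter unit 11.
[0072]
In FIG. 14, an incrementer 111 receives the signals +PC[63:32,P7:P4], +TARGET_ADRS_BUFFER_A_OVF, and +TARGET_ADRS_BUFFER_A_UDF, and outputs a signal +MOD_PC_FOR_TGT_ADRS_A[63:32,P7:P4]. The signal +MOD_PC_FOR_TGT_ADRS_A[63:32,P7:P4] indicates the high-side branch target address when the CARRY bit or the BORROW bit of entry A of the branch target address register 10 is 1. An incrementer 112 receives the signals +PC[63:32,P7:P4], +TARGET_ADRS_BUFFER_B_OVF, and +TARGET_ADRS_BUFFER_B_UDF, and outputs a signal +MOD_PC_FOR_TGT_ADRS_B[63:32,P7:P4]. The signal +MOD_PC_FOR_TGT_ADRS_B[63:32,P7:P4] indicates the high-side branch target address when the CARRY bit or the BORROW bit of entry B of the branch target address register 10 is 1.
[0073]
In FIG. 15, an AND circuit 113 receives the signals +MOD_PC_FOR_TGT_ADRS_A[63:32,P7:P4] and +RSBR_TGT_BUFF_A_REL, and an AND circuit 114 receives the signals +MOD_PC_FOR_TGT_ADRS_B[63:32,P7:P4] and +RSBR_TGT_BUFF_B_REL. An OR circuit 115 outputs a signal +MOD_PC_FOR_TGT_ADRS[63:32,P7:P4] based on the outputs of the AND circuits 113 and 114. The signal +MOD_PC_FOR_TGT_ADRS[63:32,P7:P4] indicates the high-side branch target address to be set from the branch target address register 10 into the next program counter nPC.
[0074]
An AND circuit 116 receives the signals +TARGET_ADRS_BUFFER_A[31:0,P3:P0] and +RSBR_TGT_BUFF_A_REL, and an AND circuit 117 receives the signals +TARGET_ADRS_BUFFER_B[31:0,P3:P0] and +RSBR_TGT_BUFF_B_REL. An OR circuit 118 outputs a signal +SELECTED_TGT_ADRS_BUFF[31:0,P3:P0] based on the outputs of the AND circuits 116 and 117. The signal +SELECTED_TGT_ADRS_BUFF[31:0,P3:P0] indicates the low-side branch target address to be set from the branch target address register 10 into the next program counter nPC.
[0075]
In FIG. 16, an incrementer 211 receives the signals +NPC[63:0,P7:P0], +NPC_INCREMENT[3:0], and +FORCE_PC_INCR_TGR, and outputs a signal +INCR_NPC[63:0,P7:P0]. The signal +NPC_INCREMENT[3:0] indicates how many instructions completed simultaneously. For example, bit 3 being 1 indicates that four instructions completed simultaneously, and bit 2 being 1 indicates that three instructions completed simultaneously. The signal +INCR_NPC[63:0,P7:P0] indicates that the operation nPC + 4 is performed when +FORCE_NOP_TGR = 1. An AND circuit 212 receives the signals +COMMIT_UPDATE_PC and -RS1. The signal +COMMIT_UPDATE_PC indicates that the program counter PC or the next program counter nPC needs to be updated. A NOR circuit 213 receives the output of the AND circuit 212 and the signals +TRAP_SW1 and +FORCE_PC_INCR_TGR. The output of the NOR circuit 213 is used as the clock enable signal -CE_NPC for the next program counter nPC and the clock enable signal -CE_PC for the program counter PC.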
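The encoding of +NPC_INCREMENT[3:0] reads as one-hot: the text gives bit 3 for four instructions and bit 2 for three. The sketch below extends that pattern to bits 1 and 0 (two and one instruction), which is an assumption not stated in the source, and the function names are hypothetical.

```python
def completed_count(npc_increment):
    """Decode a 4-bit one-hot completion count (bit k set means k + 1)."""
    if not 0 <= npc_increment <= 0b1111:
        raise ValueError("expected a 4-bit value")
    if npc_increment == 0:
        return 0  # nothing completed in this cycle
    if npc_increment & (npc_increment - 1):
        raise ValueError("more than one bit set; encoding assumed one-hot")
    return npc_increment.bit_length()


def incr_npc(npc, npc_increment):
    """nPC + (number of simultaneously completed instructions * 4)."""
    return (npc + completed_count(npc_increment) * 4) & ((1 << 64) - 1)
```

With this reading, `0b1000` advances nPC by 16 bytes and `0b0100` by 12 bytes, in line with the two examples given in the text.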
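[0076]
In FIG. 17, an AND circuit 214 receives the signals +COMMIT_UPDATE_PC, -BRTKN_EQ_JUMPL_HOLD_VALID, and -LOAD_TARGET_ADRS_TO_NPC. The signal -BRTKN_EQ_JUMPL_HOLD_VALID becomes 1 when the high side of the branch address of a completed register-relative branch instruction is not all 0 and is held in the latch circuit 11-1. An OR circuit 215 receives the output of the AND circuit 214 and the signal +FORCE_PC_INCR_TGR. An AND circuit 216 receives the output signal +SEL_INCR_TO_NPC_LOW of the OR circuit 215 and a signal -PSTATE_AM. The signal +SEL_INCR_TO_NPC_LOW becomes 1 when the signal +INCR_NPC is selected for setting the low side of the next program counter nPC. The signal -PSTATE_AM indicates the 32-bit address mode when it is 1. The AND circuit 216 outputs a signal +SEL_INCR_TO_NPC_HIGH. The signal +SEL_INCR_TO_NPC_HIGH becomes 1 when the signal +INCR_NPC is selected for setting the high side of the next program counter nPC.
[0077]
An AND circuit 217 receives the signals -BRTKN_EQ_JUMPL_HOLD_VALID and +LOAD_TARGET_ADRS_TO_NPC. An OR circuit 218 receives the output of the AND circuit 217 and the signal +FORCE_PC_INCR_TGR. An AND circuit 219 receives the output of the OR circuit 218 and the signal -PSTATE_AM. The AND circuit 219 outputs a signal +SEL_TARGET_TO_NPC_HIGH. The signal +SEL_TARGET_TO_NPC_HIGH becomes 1 when the signal +MOD_PC_FOR_TGT_ADRS is selected for setting the high side of the next program counter nPC. A buffer 311 outputs a signal +SEL_TARGET_TO_NPC_LOW based on the signal +LOAD_TARGET_ADRS_TO_NPC. The signal +LOAD_TARGET_ADRS_TO_NPC becomes 1 when a value must be set into the next program counter nPC from the branch target address register 10. The signal +SEL_TARGET_TO_NPC_LOW becomes 1 when the signal +SELECTED_TGT_ADRS_BUFF is selected for setting the low side of the next program counter nPC. An AND circuit 312 outputs a signal +SEL_JUMPL_AH_TO_NPC based on the signals +BRTKN_EQ_JUMPL_HOLD_TGR and -PSTATE_AM. The signal +SEL_JUMPL_AH_TO_NPC becomes 1 when the value from the latch circuit 11-1 (+JMPL_ADRS_HOLD) is selected for setting the high side of the next program counter nPC.
[0078]
In FIG. 18, an AND circuit 411 receives the signals +INCR_NPC[63:32,P7:P4] and +SEL_INCR_TO_NPC_HIGH, and an AND circuit 412 receives the signals +MOD_PC_FOR_TGT_ADRS[63:32,P7:P4] and +SEL_TARGET_TO_NPC_HIGH. An AND circuit 413 receives the signals +JUMPL_ADRD_HOLD[63:32,P7:P4] and +SEL_JUMPL_AH_TO_NPC, and an AND circuit 414 receives the signals +TRAP_ADRS[63:32,P7:P4] and +SEL_TRAP_ADRS_TO_NPC. The signal +TRAP_ADRS[63:32,P7:P4] is defined by the SPARC architecture and is a signal that selects the dedicated trap (TRAP) address when a trap occurs (+W_TRAP_VALID = 1). The signal +SEL_TRAP_ADRS_TO_NPC becomes 1 when the signal +TRAP_ADRS is selected because a trap has occurred. An OR circuit 415 outputs the set signal +SET_NPC[63:32,P7:P4] for the next program counter nPC based on the outputs of the AND circuits 411 to 414. An AND circuit 416 receives the signals +INCR_NPC[31:0,P3:P0] and +SEL_INCR_TO_NPC_LOW, an AND circuit 417 receives the signals +SELECTED_TGT_ADRS_BUFF[31:0,P3:P0] and +SEL_TARGET_TO_NPC_LOW, and an AND circuit 418 receives the signals +TRAP_ADRS[31:0,P3:P0] and +SEL_TRAP_ADRS_TO_NPC. An OR circuit 419 outputs the signal +SET_NPC[31:0,P3:P0] based on the outputs of the AND circuits 416 to 418.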
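The selection network of FIG. 18 is an AND-OR multiplexer: each candidate value is gated by its select signal and the gated values are ORed together. A minimal Python model follows; the function name is hypothetical, and it relies on the assumption that at most one select is active at a time, as the select-signal definitions suggest.

```python
def and_or_mux(candidates):
    """Model an AND-OR multiplexer.

    `candidates` is an iterable of (value, select) pairs. Each value is
    passed through only when its select is true, and the results are ORed,
    so the output equals the selected value when one select is active and
    0 when none is.
    """
    out = 0
    for value, select in candidates:
        if select:
            out |= value
    return out
```

For the high side of nPC, the four candidates would be the incremented nPC, the carry/borrow-adjusted upper word, the latched register-relative upper word, and the trap address, each paired with its select signal.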
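[0079]
Next, the PC update circuit 11-2 in the program counter unit 11 is described with reference to FIGS. 19 and 20. FIGS. 19 and 20 are logic circuit diagrams of the PC update circuit 11-2 in the program counter unit 11.
[0080]
In FIG. 19, an OR circuit 511 receives the signals +COMMIT_UPDATE_PC and +FORCE_PC_INCR_TGR. An AND circuit 512 receives the output signal +SEL_INCR_TO_PC_LOW of the OR circuit 511 and the signal -PSTATE_AM. The signal +SEL_INCR_TO_PC_LOW becomes 1 when the signal +INCR_PC is selected for setting the low side of the program counter PC. Like the signal +INCR_NPC shown in FIG. 16, the signal +INCR_PC indicates that (INCR_PC =) PC = nPC + (number of simultaneously completed instructions - 1) × 4 is calculated when +NPC_INCREMENT ≠ 0, and that (INCR_PC =) PC = nPC when +FORCE_PC_INCR_TGR = 1. The signals +NPC_INCREMENT and +FORCE_PC_INCR_TGR are never valid at the same time. The AND circuit 512 outputs a signal +SEL_INCR_TO_PC_HIGH. The signal +SEL_INCR_TO_PC_HIGH becomes 1 when the signal +INCR_PC is selected for setting the high side of the program counter PC. An incrementer 513 receives the signals +PC[63:0,P7:P0], +NPC_INCREMENT[3:0], and +FORCE_PC_INCR_TGR, and outputs the signal +INCR_PC[63:0,P7:P0].
[0081]
In FIG. 20, an AND circuit 611 receives the signals +INCR_PC[63:32,P7:P4] and +SEL_INCR_TO_PC_HIGH, and an AND circuit 612 receives the signals +TRAP_ADRS[63:32,P7:P4] and +SEL_TRAP_ADRS_TO_PC. The signal +SEL_TRAP_ADRS_TO_PC becomes 1 when the signal +TRAP_ADRS is selected because a trap has occurred. An OR circuit 613 outputs the set signal +SET_PC[63:32,P7:P4] for the program counter PC based on the outputs of the AND circuits 611 and 612. An AND circuit 614 receives the signals +INCR_PC[31:0,P3:P0] and +SEL_INCR_TO_PC_LOW, and an AND circuit 615 receives the signals +TRAP_ADRS[31:0,P3:P0] and +SEL_TRAP_ADRS_TO_PC. An OR circuit 616 outputs the set signal +SET_PC[31:0,P3:P0] for the program counter PC based on the outputs of the AND circuits 614 and 615.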
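[0082]
As described above, this embodiment provides a branch target address register and updates the instruction address register at high speed in accordance with the number of instructions that completed simultaneously. In addition, because branch instruction control can be performed independently in the branch instruction control unit, the branch prediction unit, the branch target address register, and the instruction completion control unit, branch throughput is improved and the circuit can be realized with as small a mounting area as possible.
[0083]
Furthermore, in an architecture that uses a 64-bit instruction address space, the branch instruction control unit and the part that generates branch target addresses can control branch instructions using only the lower 32 bits together with the CARRY bit and the BORROW bit.
[0084]
The present invention also encompasses the inventions set out in the following supplementary notes.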
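[0085]
(Supplementary Note 1) A program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction branches; and
a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0086]
(Supplementary Note 2) A program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction does not branch; and
a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0087]
(Supplementary Note 3) A program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction branches; and
a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0088]
(Supplementary Note 4) A program counter control method for a processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction does not branch; and
a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0089]
(Supplementary Note 5) The program counter control method according to any one of Supplementary Notes 1 to 4, wherein the architecture uses a 64-bit instruction address space,
the method further comprising a step of performing branch instruction control and branch target address generation using only the lower 32 bits of the instruction address together with a carry bit and a borrow bit.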
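[0090]
(Supplementary Note 6) A processor that performs instruction control by the out-of-order scheme using a branch prediction unit and controls an architecture having branch delay instructions, the processor comprising:
a branch instruction control unit that judges the branch conditions of branch instructions, determines whether branch predictions succeeded, and controls instruction refetch, and that can control a plurality of branch instructions simultaneously; and
a branch target address register that stores a plurality of branch target addresses of branch instructions determined to branch,
wherein the branch target address register can be controlled independently of the branch instruction control unit and the branch prediction unit.
[0091]
(Supplementary Note 7) A processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising:
program counter means consisting of a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction branches; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0092]
(Supplementary Note 8) A processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising:
program counter means consisting of a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction succeeds and the branch instruction does not branch; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0093]
(Supplementary Note 9) A processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising:
program counter means consisting of a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction branches; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0094]
(Supplementary Note 10) A processor that performs instruction control by the out-of-order scheme using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising:
program counter means consisting of a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when the branch prediction fails and the branch instruction does not branch; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0095]
(Supplementary Note 11) The processor according to any one of Supplementary Notes 6 to 10, wherein the architecture uses a 64-bit instruction address space,
the processor further comprising means for performing branch instruction control and branch target address generation using only the lower 32 bits of the instruction address together with a carry bit and a borrow bit.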
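[0096]
The present invention has been described above by way of an embodiment, but it is not limited to that embodiment, and it goes without saying that various modifications and improvements are possible.
[0097]
[Effects of the Invention]
According to the present invention, a program counter control method and a processor that improve branch throughput with as small a circuit scale (mounting area) as possible can be realized.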
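[Brief Description of the Drawings]
[FIG. 1] A block diagram showing an embodiment of the processor according to the present invention.
[FIG. 2] A block diagram showing the main part of the instruction unit.
[FIG. 3] A flowchart explaining the operation during branch instruction control.
[FIG. 4] A flowchart explaining the operation when the program counter unit is updated.
[FIG. 5] A schematic diagram showing the entries in the branch instruction control unit.
[FIG. 6] A schematic diagram showing the entries in the branch target address register.
[FIG. 7] A block diagram showing the configuration of the program counter unit.
[FIG. 8] A logic circuit diagram of a main part of the branch instruction control unit.
[FIG. 9] A logic circuit diagram of a main part of the branch instruction control unit.
[FIG. 10] A logic circuit diagram of a main part of the branch instruction control unit.
[FIG. 11] A logic circuit diagram of a main part of the branch instruction control unit.
[FIG. 12] A logic circuit diagram of a main part of the instruction completion control unit.
[FIG. 13] A logic circuit diagram of a main part of the instruction completion control unit.
[FIG. 14] A logic circuit diagram of the nPC update circuit in the program counter unit.
[FIG. 15] A logic circuit diagram of the nPC update circuit in the program counter unit.
[FIG. 16] A logic circuit diagram of the nPC update circuit in the program counter unit.
[FIG. 17] A logic circuit diagram of the nPC update circuit in the program counter unit.
[FIG. 18] A logic circuit diagram of the nPC update circuit in the program counter unit.
[FIG. 19] A logic circuit diagram of the PC update circuit in the program counter unit.
[FIG. 20] A logic circuit diagram of the PC update circuit in the program counter unit.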
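[Explanation of reference numerals]
1 Branch prediction unit
2 Instruction fetch unit
3 Instruction buffer unit
4 Relative branch address generation unit
5 Instruction decoder unit
6 Branch instruction execution unit
7 Branch instruction control unit
8 Delay slot stack unit
9 Instruction completion control unit
10 Branch destination address register
11 Program counter unit
11-1 Latch circuit
11-2, 11-3 Update circuits
21 Instruction unit
22 Memory unit
23 Arithmetic unit
PC Program counter
nPC Next program counter
[0001]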
BACKGROUND OF THE INVENTION
The present invention relates to a program counter control method and a processor, and more particularly to a program counter control method for simultaneously updating and controlling a program counter and a next program counter in a configuration in which a plurality of instructions including a branch instruction complete simultaneously, under instruction control that performs branch prediction and has delay instructions for branches, and to a processor using such a program counter control method.
[0002]
[Prior art]
In recent years, various instruction processing methods have been used to improve processor performance. One of them is the out-of-order processing method. A processor adopting the out-of-order processing method improves performance by feeding subsequent instructions into a plurality of pipelines one after another and executing them without waiting for the execution of any one instruction to complete.
[0003]
However, when the execution result of the preceding instruction affects the execution of the subsequent instruction, the subsequent instruction cannot be executed unless the preceding instruction execution is completed. If the processing of the instruction that affects the execution of the subsequent instruction is delayed, the subsequent instruction cannot be executed during that time, and the process waits for completion of the preceding instruction. For this reason, the pipeline is disturbed, resulting in performance degradation. Such pipeline disturbance is particularly noticeable in the case of branch instructions.
[0004]
Among branch instructions, in the case of an instruction called a conditional branch, if an instruction that changes the branch condition (usually the condition code) immediately precedes the conditional branch instruction, the branch is not determined until the instruction that changes the branch condition completes and the branch condition is settled. Because the sequence following the branch instruction is therefore unknown, subsequent instructions cannot be issued, processing stalls, and processing capability drops. The same problem arises not only in processors that adopt the out-of-order processing method but also in processors that adopt other processing methods such as a lock-step pipeline; in processors that adopt the out-of-order processing method, however, the performance degradation is more pronounced. Therefore, to suppress the performance degradation caused by branch instructions, a branch prediction mechanism that performs branch prediction is usually provided in the instruction control device of the processor to speed up branch instructions.
[0005]
In the case of a processor that employs an out-of-order processing method including a branch prediction mechanism, a plurality of branch instructions are input to the execution pipeline based on the branch prediction result. When a branch instruction branches, it is necessary to set the branch destination address in the instruction address register. In a processor employing the SPARC (trademark) architecture, this instruction address register is called a program counter / next program counter. When there are a plurality of branch instructions on the execution pipeline, the instruction address register needs to hold the branch destination address of each branch instruction until the instruction is completed. However, since the branch decision timing of each branch instruction is different, conventionally, it has been necessary to hold the branch destination address of a branch instruction that does not actually branch.
[0006]
The throughput of the branch instruction in the execution pipeline is determined by the throughput of the branch instruction control unit and the number of branch destination address registers for holding the branch destination address. However, if the branch destination address register is used at the branch destination address of a branch instruction that does not actually branch, the throughput of the branch instruction is consequently suppressed. For this reason, it has fallen into a vicious circle in which it is necessary to further increase the branch destination address register in order to improve the throughput.
[0007]
In an instruction control device, one of the factors that determine execution speed is the number of instructions that can be processed in one cycle. An instruction control device using the out-of-order method can complete a plurality of instructions simultaneously. Normally, instruction completion refers to the point at which updating of the resources used, such as registers, is complete; when a plurality of instructions complete simultaneously, the updating of the resources used must likewise be completed simultaneously for all of them. Naturally, the instruction address register must also be updated for a plurality of instructions. When controlling an architecture having branch delay instructions, typified by the SPARC architecture, whether the delay instruction is executed depends on whether the branch instruction branches, and the two registers, the program counter and the next program counter, must be updated. For this reason, conventionally, a branch instruction could be completed (committed) only alone or only from a fixed position (its position relative to the other instructions completing at the same time). Usually, even when the packet method is used in the decode cycle and the branch instruction is placed at the last position of the packet for instruction completion, that position is fixed. In this case, the decode cycle and the instruction completion cycle are constrained by the branch instruction.
[0008]
[Problems to be solved by the invention]
In recent years, due to improvements in LSI manufacturing technology, it has become possible to use large-capacity memories, and 64-bit operation has been promoted in operating systems (OS) and applications. Along with this, 64-bit conversion is also required in instruction control devices. However, as the number of bits increases, the number of circuits such as registers to be used increases. Registers related to the control of branch instructions also need to be 64 bits, which increases the branch destination address register and the like.
[0009]
Thus, when a 32-bit configuration is simply changed to a 64-bit configuration, twice the circuitry is required even if the number of entries is unchanged, which increases the circuit scale (mounting area).
[0010]
However, at present there are few cases in which the instruction area of an actual program uses 4 Gbytes or more, and cases in which a program crosses a 4-Gbyte boundary hardly affect instruction processing performance. Accordingly, an object of the present invention is to realize a program counter control method and a processor capable of improving branch throughput with the smallest possible circuit scale (mounting area).
[0011]
[Means for Solving the Problems]
The above object can be achieved by a program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction branches, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0012]
The above object can also be achieved by a program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction does not branch, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0013]
The above object can also be achieved by a program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction branches, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0014]
The above object can also be achieved by a program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the method comprising a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction does not branch, and a step of simultaneously updating a program counter and a next program counter in accordance with the number of completed instructions.
[0015]
When the architecture uses a 64-bit instruction address space, the program counter control method may further include a step of performing branch instruction control and branch destination address generation using only the lower 32 bits of the instruction, a carry bit, and a borrow bit.
[0016]
The above object can also be achieved by a processor that performs instruction control by an out-of-order method using a branch prediction unit and controls an architecture having branch delay instructions, the processor comprising a branch instruction control unit that controls branch condition determination for branch instructions, the success or failure of branch prediction, and instruction refetch, and that can control a plurality of branch instructions simultaneously, and a branch destination address register that stores a plurality of branch destination addresses of branch instructions determined to branch, wherein the branch destination address register can be controlled independently of the branch instruction control unit and the branch prediction unit.
[0017]
The above object can also be achieved by a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising program counter means including a program counter and a next program counter, means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction branches, and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0018]
The above object can also be achieved by a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising program counter means including a program counter and a next program counter, means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction does not branch, and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0019]
The above object can also be achieved by a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising program counter means including a program counter and a next program counter, means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction branches, and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0020]
The above object can also be achieved by a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having branch delay instructions, the processor comprising program counter means including a program counter and a next program counter, means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction does not branch, and means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0021]
Therefore, according to the present invention, it is possible to realize a program counter control method and a processor capable of improving the branch throughput with the smallest possible circuit scale (mounting area).
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the program counter control method and the processor according to the present invention will be described below with reference to the drawings.
[0023]
[Embodiments]
FIG. 1 is a block diagram showing an embodiment of a processor according to the present invention. In the figure, a processor 100 includes an instruction unit 21, a memory unit 22, and an arithmetic unit 23. The instruction unit 21 constitutes an instruction control apparatus that employs an embodiment of the instruction control method according to the present invention. The memory unit 22 is provided for storing instructions, data, and the like, and the arithmetic unit 23 is provided for executing various arithmetic operations.
[0024]
The instruction unit 21 includes a branch prediction unit 1, an instruction fetch unit 2, an instruction buffer unit 3, a relative branch address generation unit 4, an instruction decoder unit 5, a branch instruction execution unit 6, an instruction completion control unit 9, a branch destination address register 10, and a program counter unit 11, connected as shown in the figure. The branch instruction execution unit 6 includes a branch instruction control unit 7 and a delay slot stack unit 8. The program counter unit 11 includes a program counter PC, a next program counter nPC, and an update unit.
[0025]
A branch instruction can be controlled independently by the branch prediction unit 1, the branch instruction control unit 7, the instruction completion control unit 9, and the branch destination address register 10. A branch instruction on the execution pipeline, once decoded by the instruction decoder unit 5, is first placed under the control of the branch instruction control unit 7. The branch instruction control unit 7 controls branch condition determination for branch instructions, the success or failure of branch prediction, and instruction refetch. The number of branch instructions that the branch instruction control unit 7 can control is determined by its number of entries. The branch instruction control unit 7 controls a branch instruction until its branch condition is determined and its branch destination address is generated; thereafter the instruction is controlled by the instruction completion control unit 9. The branch destination address register 10 holds the branch destination address of a branch instruction released from the control of the branch instruction control unit 7 and controls it until the instruction completes, that is, until the program counter unit 11 is updated. The instruction completion control unit 9 controls the completion conditions of all instructions and always controls branch instructions, regardless of whether they branch. The maximum number of branch instructions that can exist simultaneously on the execution pipeline depends on the number N of entries in the instruction completion control unit 9. When the branch destination address register 10 (M entries) becomes full, the maximum number of branch instructions that branch is L + M, where L is the number of entries in the branch instruction control unit 7. A branch instruction that does not branch does not depend on the number of entries in the branch destination address register 10. Because a branch instruction is under the control of the branch destination address register 10 only from its release from the branch instruction control unit 7 until instruction completion, instruction decoding is not affected as long as the branch instruction control unit 7 has a free entry.
[0026]
Branch destination address generation is of two types: instruction-relative branch and register-relative branch. The branch destination address of an instruction-relative branch is calculated by the relative branch address generation unit 4 and supplied to the branch destination address register 10 via the branch instruction control unit 7. The branch destination address of a register-relative branch is calculated in the arithmetic unit 23 and supplied to the branch destination address register 10 via the branch instruction control unit 7. For example, the lower 32 bits of the branch destination address of a register-relative branch are supplied to the program counter unit 11 via the branch instruction control unit 7, and the upper 32 bits are supplied directly to the program counter unit 11. For an instruction-relative branch, a change in the upper 32 bits is obtained by calculation from the presence or absence of the Borrow bit and the Carry bit, so the branch instruction control unit 7 controls the branch destination address as (lower 32 bits + 4 parity bits + Borrow bit + Carry bit) * number of entries. Similarly, in the branch destination address register 10, the branch destination instruction address consists of (lower 32 bits + 4 parity bits + Borrow bit + Carry bit) * number of entries. When the upper 32 bits of the instruction address change, a value is always first set in the program counter unit 11, and the instruction is then refetched from the program counter unit 11 by retry.
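The idea of keeping only the lower 32 bits together with a Carry bit and a Borrow bit can be illustrated with a minimal sketch. The function names and the way the upper 32 bits are rebuilt (upper half of the current address plus Carry minus Borrow) are assumptions made for illustration, and the parity bits are omitted; the embodiment's actual circuits are those of the figures.

```python
MASK32 = 0xFFFFFFFF

def make_relative_target(pc: int, disp: int):
    """Hypothetical helper: compute the stored form of pc + disp.

    Returns (low32, carry, borrow), where disp is a signed displacement
    assumed to be small enough to cross at most one 4-Gbyte boundary.
    """
    low_sum = (pc & MASK32) + disp      # may overflow or go negative
    carry = low_sum > MASK32            # crossed a 4-Gbyte boundary upward
    borrow = low_sum < 0                # crossed a 4-Gbyte boundary downward
    return low_sum & MASK32, carry, borrow

def rebuild_target(pc_upper32: int, low32: int, carry: bool, borrow: bool) -> int:
    """Hypothetical helper: rebuild the full 64-bit target address."""
    upper = (pc_upper32 + int(carry) - int(borrow)) & MASK32
    return (upper << 32) | low32

# Example: a forward branch that crosses a 4-Gbyte boundary.
pc = 0x0000_0001_FFFF_FFF8
low32, carry, borrow = make_relative_target(pc, 16)
target = rebuild_target(pc >> 32, low32, carry, borrow)
```

In this sketch each entry stores 34 bits instead of 64, and the full address is needed only in the rare case where Carry or Borrow is set.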
[0027]
Control for updating the resources used is performed by the instruction completion control unit 9 and the program counter unit 11. At instruction completion (commit), the program counter unit 11 is supplied with information on how many instructions have completed simultaneously and whether an instruction that branches has completed. When an instruction that branches completes, this information is also supplied to the branch instruction control unit 7. In this embodiment, PC = nPC + (number of instructions completed simultaneously - 1) * 4, and nPC = nPC + (number of instructions completed simultaneously * 4) or the branch destination address. In this embodiment, a branch instruction that branches can complete simultaneously with the instructions preceding it, but not with the instructions following it. This is because no path for the branch destination address is included among the paths that set the program counter PC. If a branch destination address path were included for the program counter PC, as it is for the next program counter nPC, there would be no restriction on the simultaneous completion of a branch instruction that branches. For a branch instruction that does not branch, this embodiment places no restriction on the number of instructions completed simultaneously. In this embodiment, a completing branch instruction is not restricted as to its position among the completing instructions, and there is no restriction at decode time either.
[0028]
FIG. 2 is a block diagram showing the main part of the instruction unit 21 shown in FIG. 1. In FIG. 2, the same parts as those of FIG. 1 are denoted by the same reference numerals, and their description is omitted. In FIG. 2, illustration of inputs from the instruction decoder unit 5 to the branch instruction control unit 7 and the instruction completion control unit 9 is omitted. The program counter unit 11 includes a program counter PC, a next program counter nPC, a latch circuit 11-1, a program counter (PC) update circuit 11-2, and a next program counter (nPC) update circuit 11-3. In the following description, it is assumed that all addresses are logical addresses unless otherwise specified.
[0029]
In this embodiment, for convenience of explanation, it is assumed that the SPARC architecture is adopted. Instructions are processed out of order. In the branch instruction execution unit 6, the branch instruction control unit 7 is provided with a plurality of reservation stations for branch RSBR0 to RSBRm (Reservation Station for Branch), and the delay slot stack unit 8 is provided with a plurality of delay slot stacks DSS0 to DSSn (Delay Slot Stack). Further, the branch prediction unit 1 is provided as a branch instruction prediction mechanism.
[0030]
FIG. 3 is a flowchart for explaining the operation at the time of branch instruction control as described above. In the figure, step S1 determines whether or not the branch instruction is completed, and if the determination result is YES, step S2 determines whether or not the branch instruction branches. If the decision result in the step S2 is NO, the process advances to a step S4 described later. On the other hand, if the decision result in the step S2 is YES, a step S3 decides whether or not there is an empty entry in the branch destination address register 10. If the determination result of step S3 is YES, the process proceeds to step S4.
[0031]
In step S4, the branch instruction control unit 7 completes control of the branch instruction, and the process proceeds to step S5 and step S6. In step S5, the completion of the branch instruction is notified to the instruction completion control unit 9. At the same time as step S5, step S6 instructs the branch destination address register 10 to hold the branch address when branching. After step S5 and step S6, step S7 updates the resource, that is, updates the program counter unit 11 by the update circuits 11-2 and 11-3.
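The release decision of steps S1 to S6 can be sketched as follows. This is a minimal software model under stated assumptions, not the embodiment's logic circuit: the class `TargetRegister`, the function `try_release`, and the default capacity of two entries (the entry count given later for this embodiment) are introduced only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TargetRegister:
    """Hypothetical model of the branch destination address register 10."""
    capacity: int = 2                      # assumed entry count
    entries: list = field(default_factory=list)

    def has_free_entry(self) -> bool:
        return len(self.entries) < self.capacity

def try_release(resolved: bool, taken: bool, target: int, treg: TargetRegister) -> bool:
    """Return True if the branch can leave the branch instruction control unit."""
    if not resolved:                       # S1: branch control not yet complete
        return False
    if taken:                              # S2: the branch instruction branches
        if not treg.has_free_entry():      # S3: no free entry, keep waiting
            return False
        treg.entries.append(target)        # S6: hold the branch destination address
    return True                            # S4/S5: release and notify completion control

treg = TargetRegister()
results = [try_release(True, True, 0x2000 + 8 * i, treg) for i in range(3)]
```

A branch that does not branch never consults the register, so in this model it is released even when both entries are occupied.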
[0032]
FIG. 4 is a flowchart for explaining the operation when the program counter unit 11 is updated. The process shown in the figure corresponds to the process of step S7 shown in FIG. 3. In FIG. 4, step S11 determines whether or not the instruction completion conditions are met. If the decision result in step S11 is YES, step S12 notifies the program counter unit 11 of information on how many instructions have completed simultaneously and whether an instruction that branches has completed. After step S12, step S13 and step S14 are performed simultaneously.
[0033]
In step S13, when an instruction that branches has completed, that information is notified to the branch destination address register 10, and the process proceeds to step S15 described later. Step S14, on the other hand, determines whether or not the notified information includes an instruction that branches. If the determination result is YES, the process proceeds to step S15; if the determination result is NO, the process proceeds to step S16. In step S15, PC = nPC + (number of instructions completed simultaneously - 1) * 4 and nPC = branch destination address are set. In step S16, PC = nPC + (number of instructions completed simultaneously - 1) * 4 and nPC = nPC + (number of instructions completed simultaneously * 4) are set.
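The update rules of steps S15 and S16 can be written as a short sketch. The function name `update_pc` and the use of `None` to mean "no completed instruction branches" are assumptions for illustration; the arithmetic follows the formulas stated above, with 4-byte instructions.

```python
MASK64 = (1 << 64) - 1

def update_pc(npc: int, completed: int, branch_target=None):
    """Return the new (PC, nPC) after `completed` instructions commit together.

    branch_target is the destination of a completed instruction that
    branches, or None when no completed instruction branches.
    """
    if completed < 1:
        raise ValueError("at least one instruction must complete")
    new_pc = (npc + (completed - 1) * 4) & MASK64
    if branch_target is not None:          # step S15
        new_npc = branch_target & MASK64
    else:                                  # step S16
        new_npc = (npc + completed * 4) & MASK64
    return new_pc, new_npc

# Three sequential instructions complete from PC = 0x1000, nPC = 0x1004.
seq = update_pc(0x1004, 3)
# Two instructions complete, the second being a branch taken to 0x2000.
br = update_pc(0x1004, 2, branch_target=0x2000)
```

In the second case the new PC points at the delay slot instruction and the new nPC at the branch destination, which matches the delayed-branch behavior of the two-register scheme.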
[0034]
Returning to the description of FIG. 2, when an instruction fetch request is issued from the instruction fetch unit 2, the branch prediction unit 1 performs branch prediction on the instruction address requested by the instruction fetch request. If the branch prediction unit 1 has an entry corresponding to the requested instruction address, a flag BRHIS_HIT indicating that branch prediction has been performed is added to the corresponding instruction fetch data, and an instruction fetch request for the predicted branch destination instruction address is output to the instruction fetch unit 2. The instruction fetch data is supplied from the instruction fetch unit 2 to the instruction decoder unit 5 together with the added flag BRHIS_HIT. When the instruction decoded by the instruction decoder unit 5 is a branch instruction having an annul bit (the "invalid" bit), such as BPR, Bicc, BPcc, FBcc, or FBPcc, the annul bit is referred to along with the flag BRHIS_HIT.
[0035]
When the flag BRHIS_HIT = 1, the instruction decoder unit 5 decodes the one subsequent instruction for unconditional execution. When the flag BRHIS_HIT = 0 and the annul (invalid) bit = 1, however, it decodes the one subsequent instruction as a no-operation instruction (NOP instruction). In other words, if the flag BRHIS_HIT = 1, the instruction decoder unit 5 decodes as usual; if the decoded instruction is a branch instruction with the flag BRHIS_HIT = 0 and the annul bit = 1, it changes the one subsequent instruction into a NOP instruction. In the SPARC architecture, a branch instruction having an annul bit executes its delay slot instruction (delay instruction) when the branch is taken; when the branch is not taken, the delay slot instruction is not executed if the annul bit is 1 and is executed only if the annul bit is 0. Making a branch prediction is therefore substantially the same as predicting that the delay slot instruction will be executed, since the instruction is a branch instruction and is predicted to branch. The CALL, JMPL, and RETURN instructions, which have no annul bit, are unconditional branches and always execute the delay slot instruction, so they can be handled in the same way. For the ALWAYS_BRANCH instruction with COND = 1000, the delay slot instruction is not executed when the annul bit is 1 even though the branch is unconditional, but this case can be recovered.
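The relation between the decode-time decision and the architectural rule can be checked with a small sketch. Both function names are hypothetical, and the model covers only conditional branches that have an annul bit; the ALWAYS_BRANCH special case is deliberately left out.

```python
from itertools import product

def decode_delay_slot_as_nop(is_branch: bool, brhis_hit: bool, annul: bool) -> bool:
    """Decode-time rule: replace the next instruction with a NOP only for a
    branch predicted not to branch (BRHIS_HIT = 0) whose annul bit is 1."""
    return is_branch and not brhis_hit and annul

def delay_slot_executes(taken: bool, annul: bool) -> bool:
    """Architectural rule for a conditional branch with an annul bit:
    the delay slot runs if the branch is taken, or if the annul bit is 0."""
    return taken or not annul

# When the prediction is correct (BRHIS_HIT equals the actual outcome),
# the decode-time choice agrees with the architectural rule in all cases.
agree = all(
    decode_delay_slot_as_nop(True, taken, annul) == (not delay_slot_executes(taken, annul))
    for taken, annul in product([False, True], repeat=2)
)
```

The sketch shows why a correct prediction needs no correction of the delay slot: the NOP substitution happens exactly when the architecture would annul the instruction.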
[0036]
When branch prediction is performed, there is no need to perform re-instruction fetch when the prediction is successful, and the predicted branch destination instruction sequence is the same as the actual instruction sequence. In addition, when the branch prediction is successful, the delay slot instruction is correctly executed. In this case, the instruction continues to be executed as it is.
[0037]
When branch prediction is performed and the prediction fails, instruction refetch is necessary. The predicted branch destination instruction sequence has been executed in error, and the actual instruction sequence must be executed again. In this case, the execution of the delay slot instruction is also incorrect, so execution must start over from the delay slot instruction. In this embodiment, after a refetch request for the branch instruction is output from the branch instruction control unit 7 to the instruction fetch unit 2, the delay slot instruction to be re-executed is extracted from the delay slot stack unit 8 and supplied to the instruction decoder unit 5. As a result, recovery from the branch misprediction, including the delay slot instruction, is performed.
[0038]
For every branch instruction, when it is decoded by the instruction decoder unit 5, entries are created in the branch instruction control unit 7 and the instruction completion control unit 9. The branch instruction control unit 7 controls the branch instruction until its branch destination address and branch condition are determined. The instruction completion control unit 9 performs control for instruction completion, that is, control for completing instructions in order.
[0039]
In SPARC, two types of branch, instruction-relative and register-relative, are defined as described above. The branch destination address of an instruction-relative branch is generated by the relative branch address generation unit 4, and the branch destination address of a register-relative branch is generated by the arithmetic unit 23. The branch destination address generated by the relative branch address generation unit 4 is supplied to the branch instruction control unit 7. The branch instruction control unit 7 is supplied with the branch destination address PCRAG_TGT_PC[31:0, P3:P0], the CARRY bit PCRAG_TGTPC_CARRY, and the BORROW bit PCRAG_TGTPC_BORROW from the relative branch address generation unit 4, and with the branch destination address EXA_TGT_PC[31:0, P3:P0] from the arithmetic unit 23. At this time, the arithmetic unit 23 supplies EXA_TGT_PC[63:32, P7:P4] to the program counter unit 11.
[0040]
When the branch instruction control by the branch instruction control unit 7 is completed, the branch instruction is controlled by the instruction completion control unit 9 until the instruction completes. When a branch instruction that branches is released from the branch instruction control unit 7, its branch destination address is stored in the branch destination address register 10. The branch destination address stored in the branch destination address register 10 is used to update the next program counter nPC of the program counter unit 11 in cycle W of the corresponding branch instruction. Cycle W is the register update cycle, and the program counter PC and the next program counter nPC are also updated in this cycle. When a branch instruction to be released from the branch instruction control unit 7 branches, it is checked whether the branch destination address register 10 has a free entry; if it does, the branch instruction is released from the branch instruction control unit 7, and if it does not, the branch instruction is not released. However, even if the branch destination address register 10 is full, a branch instruction that does not branch is released from the branch instruction control unit 7.
[0041]
In this embodiment, the branch instruction control unit 7 has 10 entries, and the branch destination address register 10 has 2 entries. Even if the branch destination address register 10 is full, control of subsequent branch instructions in the branch instruction control unit 7 does not stop until the branch instruction control unit 7 itself becomes full. An entry of the branch destination address register 10 consists of VALID, the branch destination address TGT_PC[31:0, P3:P0], the CARRY bit TGT_PC_CARRY, the BORROW bit TGT_PC_BORROW, and IID[5:0]. VALID indicates the validity of the entry; VALID = 1 means that the entry is valid. When a branch instruction that branches is released from the branch instruction control unit 7, an entry is created in the branch destination address register 10 with VALID = 1, and the entry is held until cycle W of the branch instruction.
[0042]
FIG. 5 is a schematic diagram showing the entries in the branch instruction control unit 7. The 10 entries shown in the figure each include VALID, branch destination address TGT_PC[31:0, P3:P0], branch address PC[31:0, P3:P0], CARRY bit, BORROW bit, and IID.
FIG. 6 is a schematic diagram showing the entries in the branch destination address register 10. The two entries A and B shown in the figure each include VALID, branch destination address TGT_PC[31:0, P3:P0], CARRY bit, BORROW bit, and IID.
[0043]
FIG. 7 is a block diagram showing a configuration of the program counter unit 11. The program counter PC and the next program counter nPC of the program counter unit 11 are updated simultaneously in a cycle W after the completion of an instruction (instruction completion cycle). There are the following cases (1) to (4) for updating.
(1) When multiple instructions complete at the same time and there is no branch instruction that branches.
(2) When multiple instructions complete at the same time and there is a branch instruction that branches.
(3) When an instruction is executed across a 4-Gbyte boundary.
(4) When an instruction generates an interrupt when the instruction completes.
[0044]
The program counter PC and the next program counter nPC of the program counter unit 11 are basically updated as PC = nPC + (number of instructions completed simultaneously - 1) * 4 and nPC = nPC + (number of instructions completed simultaneously) * 4, or nPC = (branch destination address), where nPC on the right-hand side is the value before the update. Therefore, PC = nPC + (number of instructions completed simultaneously - 1) * 4, and nPC = PC + 4 (using the updated PC) or nPC = (branch destination address).
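The basic update rule can be checked with a short worked example. The following is a minimal sketch, assuming 64-bit addresses and 4-byte instructions; the function name `update_pc_npc` and its parameters are hypothetical and do not appear in the embodiment.

```python
from typing import Optional, Tuple

INSTR_BYTES = 4            # each instruction occupies 4 bytes
MASK64 = (1 << 64) - 1     # assumption: 64-bit address space


def update_pc_npc(npc: int, completed: int,
                  taken_target: Optional[int] = None) -> Tuple[int, int]:
    """Return the updated (PC, nPC) after `completed` instructions complete together.

    `npc` is the value of nPC before the update. `taken_target` is the branch
    destination address when the last completed instruction is a branch that
    branches, otherwise None.
    """
    if completed < 1:
        raise ValueError("at least one instruction must complete")
    pc = (npc + (completed - 1) * INSTR_BYTES) & MASK64
    if taken_target is not None:
        new_npc = taken_target & MASK64
    else:
        new_npc = (pc + INSTR_BYTES) & MASK64
    return pc, new_npc
```

For example, with PC = 0x100 and nPC = 0x104, completing three non-branching instructions yields PC = 0x10C and nPC = 0x110. Completing two instructions whose second is a branch to 0x2000 yields PC = 0x108 (the delay slot) and nPC = 0x2000.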
[0045]
In this embodiment, invalidation (ANNUL) of the delay instruction of a branch instruction (the delay slot (DSS) instruction) is realized by replacing the invalidated DSS instruction with a no-operation instruction (NOP instruction). The program counter PC and the next program counter nPC of the program counter unit 11 are then updated in the same manner as when the instruction is executed. Therefore, when a branch instruction that does not branch completes, and when no branch instruction is included in the group of instructions that complete simultaneously (case (1) above), the program counter PC and the next program counter nPC are updated as PC = nPC + (number of instructions completed simultaneously - 1) * 4 and nPC = PC + 4. When the DSS instruction is invalidated and an interrupt is permitted at completion of the DSS instruction replaced with the NOP instruction, the value of the program counter PC at completion of the DSS instruction becomes visible outside the processor. Since the DSS instruction replaced with the NOP instruction is an instruction that is not actually executed, if PC = (instruction address of the DSS instruction replaced with the NOP instruction), the wrong instruction sequence would be executed on return from the interrupt. To prevent this, in this embodiment, the values of the program counter PC and the next program counter nPC are advanced further by the DSS instruction (4 bytes) in such a case.
[0046]
When a branch instruction that branches completes simultaneously with a plurality of instructions, the branch instruction can complete simultaneously with preceding instructions, but not simultaneously with subsequent instructions. That is, a branch instruction that branches is always the last instruction in the group of instructions that complete simultaneously, and in this case PC = (instruction address of the DSS instruction) and nPC = (branch destination address of the branch instruction) (case (2) above).
[0047]
In this embodiment, the update circuits 11-2 and 11-3 for the program counter PC and the next program counter nPC are simplified by imposing this restriction on the instruction completion of a branch instruction that branches. Alternatively, if the update circuit 11-2 is notified of how many instructions subsequent to the branch instruction that branches are completed simultaneously, the update can be made as PC = TGT_PC + (number of instructions subsequent to the branch instruction completed simultaneously) * 4 and nPC = PC + 4, and the restriction on instruction completion becomes unnecessary.
[0048]
The TGT_PC of a branch instruction that branches is set from the branch destination address register 10 to the next program counter nPC in the cycle W. When the branch instruction that branches completes, the instruction number (Instruction ID: IID) of the instruction is supplied from the instruction completion control unit 9 in the cycle W, and TGT_PC [31: 0, P3: P0] is set in the next program counter nPC from the entry of the branch destination address register 10 having the same IID. At the same time as this setting, the entry of the branch destination address register 10 is released, so that a new entry can be set in the branch destination address register 10.
[0049]
The operation when the branch instruction is completed, the branch instruction branches, and the branch destination address crosses the 4 Gbyte boundary is as follows. Since the program counter PC is obtained by PC = nPC + (number of instructions completed simultaneously-1) * 4, no special control is required. The next program counter nPC is set from the branch destination address register 10 because nPC = (branch destination address), but the branch destination address register 10 holds only the lower 32 bits (+4 PARITY). Therefore, when the branch instruction that has been completed is an instruction-relative branch, the upper 32 bits (+4 PARITY) are generated based on TGT_PC_CARRY and TGT_PC_BORROW held in the branch destination address register 10. Whether or not the branch destination address crosses the 4 Gbyte boundary can be determined by whether or not either TGT_PC_CARRY or TGT_PC_BORROW is “1”. Note that TGT_PC_CARRY and TGT_PC_BORROW do not become “1” at the same time.
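The generation of the upper 32 bits for an instruction-relative branch can be sketched as follows. This is a minimal sketch, assuming that CARRY adds 1 to, and BORROW subtracts 1 from, the upper 32 bits of the program counter PC; the function name `npc_for_relative_branch` is hypothetical, and parity bits are omitted.

```python
MASK32 = 0xFFFFFFFF


def npc_for_relative_branch(pc: int, tgt_low: int, carry: bool, borrow: bool) -> int:
    """Rebuild the 64-bit branch destination address from the held lower 32 bits.

    `pc` is the 64-bit program counter, `tgt_low` the held TGT_PC[31:0],
    and `carry` / `borrow` the held TGT_PC_CARRY / TGT_PC_BORROW bits.
    """
    if carry and borrow:
        # The two bits never become "1" at the same time.
        raise ValueError("CARRY and BORROW cannot both be set")
    high = (pc >> 32) & MASK32
    if carry:
        high = (high + 1) & MASK32   # target lies above the 4 Gbyte boundary
    elif borrow:
        high = (high - 1) & MASK32   # target lies below the 4 Gbyte boundary
    return (high << 32) | (tgt_low & MASK32)
```

When neither bit is set, the upper 32 bits of PC are reused unchanged, which is the case where the branch destination address does not cross the 4 Gbyte boundary.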
[0050]
In the case of a register-relative branch, the branch destination address is generated by the arithmetic unit 23. The lower 32 bits (+4 PARITY) are obtained from the branch destination address register 10. As for the upper 32 bits (+4 PARITY), after the branch destination address is generated in the arithmetic unit 23, the lower 32 bits (+4 PARITY) are supplied to the branch instruction control unit 7 and, at the same time, the upper 32 bits (+4 PARITY) are supplied to the program counter unit 11. At this time, the IID [5: 0] of the branch instruction for which the branch destination address was generated is supplied from the arithmetic unit 23 to the branch instruction control unit 7 and the program counter unit 11 together with the branch destination address. The program counter unit 11 then holds the upper 32 bits (+4 PARITY) of the branch destination address and the IID [5: 0]. In the present embodiment, the program counter unit 11 is provided with an ADRS_HOLD latch circuit 11-1 that holds these values for one instruction. At this time, the upper 32 bits of the supplied branch destination address are compared with the upper 32 bits of the program counter PC.
[0051]
When a register-relative branch branches and completes, and the IID of the completed instruction matches the IID held in the ADRS_HOLD latch circuit 11-1 of the program counter unit 11, the upper 32 bits (+4 PARITY) are set from the ADRS_HOLD latch circuit 11-1 in the program counter unit 11. For a register-relative branch that branches, the upper 32 bits (+4 PARITY) from the ADRS_HOLD latch circuit 11-1 and the lower 32 bits (+4 PARITY) from the branch destination address register 10 are set in the next program counter nPC regardless of whether the branch destination address crosses the 4 Gbyte boundary. However, when the branch destination address crosses the 4 Gbyte boundary, the signal + JMPL_RETURN_TGT_EQ_PC_HIGH = 0.
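The handling of a register-relative branch can be sketched in software as follows. This is an illustrative sketch only: the names `AdrsHold`, `latch_target`, and `npc_for_register_branch` are hypothetical, the returned flag merely mirrors the described behavior of + JMPL_RETURN_TGT_EQ_PC_HIGH, and parity bits are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF


@dataclass
class AdrsHold:
    """Hypothetical model of the ADRS_HOLD latch circuit 11-1 (one instruction)."""
    high: int  # upper 32 bits of the branch destination address
    iid: int   # IID [5:0] of the branch instruction


def latch_target(target: int, iid: int, pc: int) -> Tuple[AdrsHold, bool]:
    """Hold the upper 32 bits and IID, and compare the upper bits with PC.

    The returned flag is True when the upper 32 bits equal those of PC,
    i.e. the branch destination address does not cross the 4 Gbyte boundary.
    """
    high = (target >> 32) & MASK32
    tgt_eq_pc_high = high == ((pc >> 32) & MASK32)
    return AdrsHold(high, iid & 0x3F), tgt_eq_pc_high


def npc_for_register_branch(hold: AdrsHold, completed_iid: int,
                            tgt_low: int) -> Optional[int]:
    """Combine the held upper bits with the lower bits when the IIDs match."""
    if hold.iid != (completed_iid & 0x3F):
        return None
    return (hold.high << 32) | (tgt_low & MASK32)
```

The upper bits are taken from the latch whether or not the boundary is crossed; the comparison result only records which case occurred.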
[0052]
When an instruction sequence is executed across a 4 Gbyte boundary, the instruction fetch unit 2 has fetched instructions using the value immediately before the boundary for the upper 32 bits of the instruction fetch address, so the instruction fetch must be redone. This is because the upper 32 bits of the instruction address differ between the instruction immediately before the boundary and the instruction immediately after the boundary. Therefore, in this embodiment, after the instruction immediately before the boundary is completed, the instruction refetch request REIFCH is supplied from the program counter unit 11 to the instruction fetch unit 2. At this time, since the value of the program counter PC has been updated to the instruction address immediately after the boundary, the instruction fetch unit 2 resumes the instruction fetch from the value of the program counter PC.
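The condition that triggers the refetch can be illustrated with a short check on the upper 32 bits. This is a sketch under the assumption that a boundary crossing is detected by comparing the upper halves of the address just completed and the updated PC; the names `upper32` and `needs_refetch` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def upper32(addr: int) -> int:
    """Return bits [63:32] of a 64-bit address."""
    return (addr >> 32) & MASK32


def needs_refetch(completed_pc: int, updated_pc: int) -> bool:
    """True when sequential execution moved across a 4 Gbyte boundary."""
    return upper32(completed_pc) != upper32(updated_pc)
```

For instance, completing the instruction at 0x0000_0000_FFFF_FFFC and moving to 0x0000_0001_0000_0000 changes the upper 32 bits, so a refetch from the updated PC is required.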
[0053]
When an interrupt is generated at completion of an instruction and the instruction control device restarts after completing the interrupt processing, a state where nPC ≠ PC + 4 may occur. In this case, an instruction refetch request REIFCH is supplied from the program counter unit 11 to the instruction fetch unit 2. The request address at this time is the address pointed to by the program counter PC, but the instruction that must be executed next after it is the instruction at the address pointed to by the next program counter nPC. Therefore, in this case, the instruction fetch is first performed at the address indicated by the program counter PC; once that one instruction (the instruction at the address indicated by the program counter PC) is completed, the program counter PC and the next program counter nPC are updated, and the instruction refetch request REIFCH is then supplied to the instruction fetch unit 2 again. This is necessary because, on an instruction refetch request REIFCH from the program counter unit 11, the instruction fetch unit 2 fetches the instruction at the address indicated by the program counter PC and then attempts to supply the instructions subsequent to it.
[0054]
When the delay slot instruction of a branch instruction is invalidated, the delay slot instruction is handled as a NOP instruction for instruction control. However, if an interrupt occurs when the branch instruction immediately before the invalidated delay slot instruction completes, PC = (address of the invalidated delay slot instruction) and nPC = (address of the instruction that should actually be executed next after the branch instruction). If interrupt processing were performed and execution restarted in this state, execution would start from the invalidated delay slot instruction, which should not be executed. Therefore, in this embodiment, the signal + FORCE_NOP_TGR = 1 is set when an interrupt occurs at completion of a branch instruction whose subsequent delay slot instruction is invalidated. If an interrupt occurs while this signal is ON, the program counter PC and the next program counter nPC are first updated as usual, and are then updated again with PC = nPC and nPC = nPC + 4 during the interrupt processing.
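The second update can be illustrated numerically. The following is a minimal sketch of the rule PC = nPC, nPC = nPC + 4; the function name `restart_state` and the flag `force_nop` (standing in for + FORCE_NOP_TGR = 1) are hypothetical.

```python
from typing import Tuple


def restart_state(pc: int, npc: int, force_nop: bool) -> Tuple[int, int]:
    """Return the (PC, nPC) pair to be used when restarting after the interrupt.

    `pc` and `npc` are the values after the normal update at completion of the
    branch instruction. When `force_nop` is set, the invalidated delay slot
    instruction is skipped by applying PC = nPC and nPC = nPC + 4.
    """
    if force_nop:
        return npc, npc + 4
    return pc, npc
```

For a branch at 0x100 that does not branch and invalidates its delay slot at 0x104, the normal update gives PC = 0x104 and nPC = 0x108; the second update gives PC = 0x108 and nPC = 0x10C, so execution restarts after the invalidated delay slot instruction.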
[0055]
Next, the configuration of the branch instruction control unit 7 will be described with reference to FIGS. 8 to 11. FIGS. 8 to 11 are logic circuit diagrams of the main part of the branch instruction control unit 7.
[0056]
In FIG. 8, AND circuits 171 to 173 and an OR circuit 174 generate a signal + RSBR_COMPLETE_TAKEN_RELEASE that becomes “1” when control of at least one branch instruction that branches is completed in the branch instruction control unit 7. The AND circuit 171 receives a signal + RSBR0_COMPLETE that becomes “1” when control of the branch instruction in the 0th entry of the branch instruction control unit 7 is completed, and a signal + RSBR0_TAKEN that becomes “1” when it is determined that the branch instruction in the 0th entry of the branch instruction control unit 7 branches. Similarly, the AND circuit 172 receives a signal + RSBR1_COMPLETE that becomes “1” when control of the branch instruction in the first entry of the branch instruction control unit 7 is completed, and a signal + RSBR1_TAKEN that becomes “1” when it is determined that the branch instruction in the first entry branches. The AND circuit 173 receives a signal + RSBR2_COMPLETE that becomes “1” when control of the branch instruction in the second entry of the branch instruction control unit 7 is completed, and a signal + RSBR2_TAKEN that becomes “1” when it is determined that the branch instruction in the second entry branches. The outputs of the AND circuits 171 to 173 are input to the OR circuit 174.
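The AND-OR structure of FIG. 8 can be expressed compactly in software. This is a behavioral sketch only, with a hypothetical function name; the two sequences stand for the + RSBRn_COMPLETE and + RSBRn_TAKEN signals of entries 0 to 2.

```python
from typing import Sequence


def rsbr_complete_taken_release(complete: Sequence[bool],
                                taken: Sequence[bool]) -> bool:
    """OR over the entries of (COMPLETE AND TAKEN), as in AND 171-173 and OR 174."""
    if len(complete) != len(taken):
        raise ValueError("one COMPLETE and one TAKEN signal per entry")
    return any(c and t for c, t in zip(complete, taken))
```

The result is true only when at least one entry has both completed control and been determined to branch in the same cycle.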
[0057]
The exclusive NOR (exclusive negative OR) circuit 271 and the AND circuits 272 and 273 compare the IID of the branch instruction held in the entry A of the branch destination address register 10 in the branch instruction control unit 7 with the IID of the branch instruction that branches at the time its instruction completes. The exclusive NOR circuit 271 receives a signal + COMIT_TAKEN_IID [5: 0] indicating the IID of the branch instruction that branches when its instruction completes in the branch instruction control unit 7 (or the program counter unit 11), and a signal + RSBR_TGT_BUFF_A_IID [5: 0] indicating the IID of the branch instruction held in the entry A of the branch destination address register 10 in the branch instruction control unit 7. The signal + RSBR_TGT_BUFF_A_IID [5: 0] is equivalent to a signal + TARGET_ADRS_BUFFER_A_IID [5: 0] described later. The AND circuit 272 receives a signal + LOAD_TGT_TO_NPC that becomes “1” when a value needs to be set in the next program counter nPC from the branch destination address register 10, and a signal + RSBR_TGT_BUFF_A_VALID that becomes “1” when the entry A in the branch destination address register 10 is valid. The AND circuit 273 receives the outputs of the exclusive NOR circuit 271 and the AND circuit 272.
[0058]
The exclusive NOR circuit 274 and the AND circuits 275 and 276 compare the IID of the branch instruction held in the entry B of the branch destination address register 10 in the branch instruction control unit 7 with the IID of the branch instruction that branches at the time its instruction completes. The exclusive NOR circuit 274 receives a signal + COMIT_TAKEN_IID [5: 0] indicating the IID of the branch instruction that branches when its instruction completes in the branch instruction control unit 7 (or the program counter unit 11), and a signal + RSBR_TGT_BUFF_B_IID [5: 0] indicating the IID of the branch instruction held in the entry B of the branch destination address register 10 in the branch instruction control unit 7. The signal + RSBR_TGT_BUFF_B_IID [5: 0] is equivalent to a signal + TARGET_ADRS_BUFFER_B_IID [5: 0] described later. The AND circuit 275 receives the signal + LOAD_TGT_TO_NPC that becomes “1” when a value needs to be set in the next program counter nPC from the branch destination address register 10, and a signal + RSBR_TGT_BUFF_B_VALID that becomes “1” when the entry B in the branch destination address register 10 is valid. The AND circuit 276 receives the outputs of the exclusive NOR circuit 274 and the AND circuit 275.
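The IID comparison performed for each entry can be sketched as a bitwise exclusive NOR followed by an AND. This is a behavioral sketch with a hypothetical function name; it assumes that the 6-bit exclusive NOR output is treated as a match only when all six bits are “1”.

```python
IID_MASK = 0x3F  # IID [5:0]


def iid_match(commit_iid: int, entry_iid: int,
              load_tgt_to_npc: bool, entry_valid: bool) -> bool:
    """Model of XNOR 271/274 combined with AND 272-273 / 275-276 for one entry."""
    xnor = ~(commit_iid ^ entry_iid) & IID_MASK   # bit is 1 where the IIDs agree
    all_bits_equal = xnor == IID_MASK
    return all_bits_equal and load_tgt_to_npc and entry_valid
```

Calling the function once with the entry A signals and once with the entry B signals identifies which entry, if any, holds the branch destination address of the completing branch instruction.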
[0059]
In FIG. 9, the AND circuit 277 is a signal −RSBR_TGT_BUFF_A_REL which becomes “1” when the entry A of the branch destination address register 10 is released and a signal which becomes “1” when the entry A in the branch destination address register 10 is valid. Based on + RSBR_TGT_BUFF_A_VALID, a clock enable signal + HOLD_RSBR_TGT_BUFF_A for entry A of the branch destination address register 10 is generated. The AND circuit 278 is based on the signal −RSBR_TGT_BUFF_B_REL that becomes “1” when the entry B of the branch destination address register 10 is released and the signal + RSBR_TGT_BUFF_B_VALID that becomes “1” when the entry B in the branch destination address register 10 is valid. Thus, the clock enable signal + HOLD_RSBR_TGT_BUFF_B for the entry B of the branch destination address register 10 is generated.
[0060]
The NAND circuit 371 generates a signal -RSBR0_TGT_BUFF_BUSY, which indicates that there is no empty entry in the branch destination address register 10 when the branch instruction in the 0th entry of the branch instruction control unit 7 is a branch instruction that branches. It does so based on the signal + RSBR0_TAKEN that becomes “1” when it is determined that the branch instruction in the 0th entry of the branch instruction control unit 7 branches, the signal + RSBR_TGT_BUFF_A_VALID that becomes “1” when the entry A in the branch destination address register 10 is valid, the signal + RSBR_TGT_BUFF_B_VALID that becomes “1” when the entry B in the branch destination address register 10 is valid, and the signal + W_COMMIT_BR_TAKEN indicating that a branch instruction that branches has completed. The signal + W_COMMIT_BR_TAKEN is a signal in the cycle W, and becomes “1” in the cycle following the instruction completion of a branch instruction that branches.
[0061]
The NAND circuit 372 generates a signal -RSBR1_TGT_BUFF_BUSY, which indicates that there is no empty entry in the branch destination address register 10 when the branch instruction in the first entry of the branch instruction control unit 7 is a branch instruction that branches. It does so based on the signal + RSBR1_TAKEN that becomes “1” when it is determined that the branch instruction in the first entry of the branch instruction control unit 7 branches, the signal + RSBR_TGT_BUFF_A_VALID that becomes “1” when the entry A in the branch destination address register 10 is valid, the signal + RSBR_TGT_BUFF_B_VALID that becomes “1” when the entry B in the branch destination address register 10 is valid, and the signal + W_COMMIT_BR_TAKEN indicating that a branch instruction that branches has completed. The signal + W_COMMIT_BR_TAKEN is a signal in the cycle W, and becomes “1” in the cycle following the instruction completion of a branch instruction that branches.
[0062]
The NAND circuit 373 generates a signal -RSBR2_TGT_BUFF_BUSY, which indicates that there is no empty entry in the branch destination address register 10 when the branch instruction in the second entry of the branch instruction control unit 7 is a branch instruction that branches. It does so based on the signal + RSBR2_TAKEN that becomes “1” when it is determined that the branch instruction in the second entry of the branch instruction control unit 7 branches, the signal + RSBR_TGT_BUFF_A_VALID that becomes “1” when the entry A in the branch destination address register 10 is valid, the signal + RSBR_TGT_BUFF_B_VALID that becomes “1” when the entry B in the branch destination address register 10 is valid, and the signal + W_COMMIT_BR_TAKEN indicating that a branch instruction that branches has completed. The signal + W_COMMIT_BR_TAKEN is a signal in the cycle W, and becomes “1” in the cycle following the instruction completion of a branch instruction that branches.
[0063]
In FIG. 10, a signal + RSBR_COMPLETE_TAKEN_RELEASE is input to the set terminal SET of the latch circuit 374, and a signal -CLEAR_PIPELINE indicating that all instructions on the execution pipeline are cleared is input to the input terminal INHS. The NOR circuit 375 receives the signal + RSBR_TGT_BUFF_A_REL and the signal + CLEAR_PIPELINE, and the output of the NOR circuit 375 is input to the reset terminal RST of the latch circuit 374. The latch circuit 374 generates the signal + RSBR_TGT_BUFF_A_VALID.
[0064]
An output of the AND circuit 377 to which the signal + RSBR_COMPLETE_TAKEN_RELEASE and the clock enable signal + HOLD_RSBR_TGT_BUFF_A are input is input to the set terminal SET of the latch circuit 376. The input terminal INHS of the latch circuit 376 receives a signal −CLEAR_PIPELINE indicating that all instructions on the execution pipeline are cleared. The NOR circuit 378 receives the signal + RSBR_TGT_BUFF_B_REL and the signal + CLEAR_PIPELINE, and the output of the NOR circuit 378 is input to the reset terminal RST of the latch circuit 376. The latch circuit 376 generates the signal + RSBR_TGT_BUFF_B_VALID.
[0065]
In FIG. 11, the latch circuit 471 receives the signals + HOLD_RSBR_TGT_BUFF_A, + COMPLETE_RSBR_IID [5: 0], + COMPLETE_RSBR_CARRY, + COMPLETE_RSBR_BORROW, and + COMPLETE_RSBR_TGT_PC [31: 0, P3: P0], and outputs the signals + TARGET_ADRS_BUFFER_A_IID [5: 0], + TARGET_ADRS_BUFFER_A_OVF, + TARGET_ADRS_BUFFER_A_UDF, and + TARGET_ADRS_BUFFER_A [31: 0, P3: P0]. + HOLD_RSBR_TGT_BUFF_A is the clock enable signal of the entry A of the branch destination address register 10. + COMPLETE_RSBR_IID [5: 0] is the IID of the branch instruction released from the branch instruction control unit 7 when its control is completed. + COMPLETE_RSBR_CARRY is a signal that becomes “1” when CARRY occurs at the branch destination address of the branch instruction released from the branch instruction control unit 7. + COMPLETE_RSBR_BORROW is a signal that becomes “1” when BORROW occurs at the branch destination address of the branch instruction released from the branch instruction control unit 7. + COMPLETE_RSBR_TGT_PC [31: 0, P3: P0] is the branch destination address of the branch instruction released from the branch instruction control unit 7. + TARGET_ADRS_BUFFER_A_IID [5: 0] is the IID of the branch instruction held in the entry A of the branch destination address register 10, + TARGET_ADRS_BUFFER_A_OVF is the CARRY bit of the branch instruction held in the entry A of the branch destination address register 10, + TARGET_ADRS_BUFFER_A_UDF is the BORROW bit of the branch instruction held in the entry A of the branch destination address register 10, and + TARGET_ADRS_BUFFER_A [31: 0, P3: P0] is the branch destination address of the branch instruction held in the entry A of the branch destination address register 10.
[0066]
The latch circuit 472 receives the signals + HOLD_RSBR_TGT_BUFF_B, + COMPLETE_RSBR_IID [5: 0], + COMPLETE_RSBR_CARRY, + COMPLETE_RSBR_BORROW, and + COMPLETE_RSBR_TGT_PC [31: 0, P3: P0], and outputs the signals + TARGET_ADRS_BUFFER_B_IID [5: 0], + TARGET_ADRS_BUFFER_B_OVF, + TARGET_ADRS_BUFFER_B_UDF, and + TARGET_ADRS_BUFFER_B [31: 0, P3: P0]. + HOLD_RSBR_TGT_BUFF_B is the clock enable signal of the entry B of the branch destination address register 10. + COMPLETE_RSBR_IID [5: 0] is the IID of the branch instruction released from the branch instruction control unit 7 when its control is completed. + COMPLETE_RSBR_CARRY is a signal that becomes “1” when CARRY occurs at the branch destination address of the branch instruction released from the branch instruction control unit 7. + COMPLETE_RSBR_BORROW is a signal that becomes “1” when BORROW occurs at the branch destination address of the branch instruction released from the branch instruction control unit 7. + COMPLETE_RSBR_TGT_PC [31: 0, P3: P0] is the branch destination address of the branch instruction released from the branch instruction control unit 7. + TARGET_ADRS_BUFFER_B_IID [5: 0] is the IID of the branch instruction held in the entry B of the branch destination address register 10, + TARGET_ADRS_BUFFER_B_OVF is the CARRY bit of the branch instruction held in the entry B of the branch destination address register 10, + TARGET_ADRS_BUFFER_B_UDF is the BORROW bit of the branch instruction held in the entry B of the branch destination address register 10, and + TARGET_ADRS_BUFFER_B [31: 0, P3: P0] is the branch destination address of the branch instruction held in the entry B of the branch destination address register 10.
[0067]
Next, the configuration of the instruction completion control unit 9 will be described with reference to FIGS. 12 and 13. FIGS. 12 and 13 are logic circuit diagrams of the main part of the instruction completion control unit 9.
[0068]
In FIG. 12, the AND circuit 91 receives a signal + TOQ_CSE_BR_FORCE_NOP that becomes “1” when the first instruction completed is a branch instruction and its subsequent delay instruction is invalidated, a signal + COMMIT_TOQ_CSE indicating that at least one instruction has been completed, a signal -COMMIT_2ND_CSE indicating that at least two instructions have been completed, and a signal -TOQ_RERUN_REIFCH_OWN_OR that becomes “1” when the first instruction completed re-executes the instruction (RERUN). The AND circuit 92 receives a signal + 2ND_CSE_BR_FORCE_NOP that becomes “1” when the second instruction completed is a branch instruction and its subsequent delay instruction is invalidated, a signal + COMMIT_2ND_CSE indicating that at least two instructions have been completed, and a signal -COMMIT_3RD_CSE indicating that at least three instructions have been completed. The AND circuit 93 receives a signal + 3RD_CSE_BR_FORCE_NOP that becomes “1” when the third instruction completed is a branch instruction and its subsequent delay instruction is invalidated, a signal + COMMIT_3RD_CSE indicating that at least three instructions have been completed, and a signal -COMMIT_4TH_CSE indicating that at least four instructions have been completed. The AND circuit 94 receives a signal + 4TH_CSE_BR_FORCE_NOP that becomes “1” when the fourth instruction completed is a branch instruction and its subsequent delay instruction is invalidated, and a signal + COMMIT_4TH_CSE indicating that at least four instructions have been completed. The NOR circuit 95 receives the outputs of the AND circuits 91 to 94.
[0069]
The AND circuit 96 receives a signal −RS1 that becomes “1” when interrupt processing occurs, and a signal + BR_FORCE_NOP_TGR indicating that the instruction to be completed first next is a delay instruction that has been turned into a NOP. The NAND circuit 97 receives a signal -COMMIT_TOQ_CSE indicating that at least one instruction has been completed, and the output of the AND circuit 96. The AND circuit 98 receives the output of the NOR circuit 95 and the output of the NAND circuit 97. The latch circuit 99 receives, at the input terminal 1H, a signal + EU_XCPTN_OR that becomes “1” when an exception occurs in the arithmetic unit 23 or the like, receives the output of the AND circuit 98 at the set terminal SET, and outputs a signal -BR_FORCE_NOP_TGR indicating that the instruction to be completed first next is a delay instruction that has been turned into a NOP.
[0070]
In FIG. 13, a NAND circuit 191 receives a signal + W_TRAP_VALID of the cycle W indicating that an instruction that performs trap processing has completed, and a signal + COMMIT_ENDOP_OR indicating that at least one instruction has completed. The AND circuit 192 receives the output of the NAND circuit 191, a signal + FORCE_NOP_TGR that becomes “1” when an asynchronous interrupt (external interrupt) occurs while the signal + BR_FORCE_NOP_TGR = 1, and a signal -RS1 that becomes “1” when interrupt processing occurs. The output of the AND circuit 192 is input to the set terminal SET of the latch circuit 193. The latch circuit 193 outputs a signal + FORCE_PC_INCR_TGR. The signal + FORCE_PC_INCR_TGR becomes “1” when interrupt processing occurs at completion of a branch instruction whose delay slot instruction is invalidated (turned into a NOP), so that the program counter PC and the next program counter nPC need to be advanced by the delay slot instruction (4 bytes). That is, the signal + FORCE_PC_INCR_TGR becomes effective from the cycle W + 1, rising 1τ after the signal + FORCE_NOP_TGR.
[0071]
Next, the nPC update circuit 11-3 in the program counter unit 11 will be described with reference to FIGS. 14 to 18. FIGS. 14 to 18 are logic circuit diagrams of the nPC update circuit 11-3 in the program counter unit 11.
[0072]
In FIG. 14, the incrementer 111 receives signals + PC [53:32, P7: P4], + TARGET_ADRS_BUFFER_A_OVF, and + TARGET_ADRS_BUFFER_A_UDF, and outputs a signal + MOD_PC_FOR_TGT_ADRS_A [63:32, P7: P4]. The signal + MOD_PC_FOR_TGT_ADRS_A [63:32, P7: P4] indicates the branch destination address on the high side when the CARRY bit or the BORROW bit is “1” in the entry A of the branch destination address register 10. The incrementer 112 receives the signals + PC [53:32, P7: P4], + TARGET_ADRS_BUFFER_B_OVF, + TARGET_ADRS_BUFFER_B_UDF, and outputs the signal + MOD_PC_FOR_TGT_ADRS_B [63: 32, P7: P4]. The signal + MOD_PC_FOR_TGT_ADRS_B [63:32, P7: P4] indicates the branch destination address on the high side when the CARRY bit or the BORROW bit is “1” in the entry B of the branch destination address register 10.
[0073]
In FIG. 15, signals + MOD_PC_FOR_TGT_ADRS_A [63:32, P7: P4] and + RSBR_TGT_BUFF_A_REL are input to the AND circuit 113, and signals + MOD_PC_FOR_TGT_ADRS_B [63: 32, P7: P4] and + RSBR_TGT_BUFF_B_REL are input to the AND circuit 114. The OR circuit 115 outputs a signal + MOD_PC_FOR_TGT_ADRS [63:32, P7: P4] based on the outputs of the AND circuits 113 and 114. The signal + MOD_PC_FOR_TGT_ADRS [63:32, P7: P4] indicates the high-side branch destination address set from the branch destination address register 10 to the next program counter nPC.
[0074]
The AND circuit 116 receives the signals + TARGET_ADRS_BUFFER_A [31: 0, P3: P0] and + RSBR_TGT_BUFF_A_REL, and the AND circuit 117 receives the signals + TARGET_ADRS_BUFFER_B [31: 0, P3: P0] and + RSBR_TGT_BUFF_B_REL. The OR circuit 118 outputs a signal + SELECTED_TGT_ADRS_BUFF [31: 0, P3: P0] based on the outputs of the AND circuits 116 and 117. The signal + SELECTED_TGT_ADRS_BUFF [31: 0, P3: P0] indicates the branch destination address on the low side set from the branch destination address register 10 to the next program counter nPC.
[0075]
In FIG. 16, an incrementer 211 receives signals + NPC [63: 0, P7: P0], + NPC_INCREMENT [3: 0] and + FORCE_PC_INCR_TGR, and outputs a signal + INCR_NPC [63: 0, P7: P0]. The signal + NPC_INCREMENT [3: 0] indicates how many instructions are completed simultaneously. For example, if bit 3 is “1”, it indicates that four instructions are completed simultaneously, and if bit 2 is “1”, it indicates that three instructions are completed simultaneously. The signal + INCR_NPC [63: 0, P7: P0] indicates that an operation of nPC + 4 is performed when + FORCE_NOP_TGR = 1. Further, signals + COMMIT_UPDATE_PC and -RS1 are input to the AND circuit 212. The signal + COMMIT_UPDATE_PC indicates that the program counter PC or the next program counter nPC needs to be updated. The NOR circuit 213 receives the output of the AND circuit 212 and the signals + TRAP_SW1 and + FORCE_PC_INCR_TGR. The output of the NOR circuit 213 is used as a clock enable signal -CE_NPC for the next program counter nPC and a clock enable signal -CE_PC for the program counter PC.
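The behavior of the incrementer 211 can be sketched as follows. The text states only the meaning of bits 3 and 2 of + NPC_INCREMENT; this sketch assumes, by extension, a one-hot encoding in which bit 1 means two instructions and bit 0 means one instruction. The function names are hypothetical, the forced increment is modeled with the + FORCE_PC_INCR_TGR input, and parity bits are omitted.

```python
MASK64 = (1 << 64) - 1


def completed_count(npc_increment: int) -> int:
    """Decode the assumed one-hot NPC_INCREMENT[3:0] into an instruction count."""
    if npc_increment == 0:
        return 0
    one_hot = (npc_increment & (npc_increment - 1)) == 0
    if not one_hot or npc_increment > 0b1000:
        raise ValueError("NPC_INCREMENT is assumed to be a 4-bit one-hot value")
    return npc_increment.bit_length()  # bit 3 -> 4, bit 2 -> 3, bit 1 -> 2, bit 0 -> 1


def incr_npc(npc: int, npc_increment: int, force_pc_incr: bool) -> int:
    """Return INCR_NPC: nPC advanced by the completed instructions, or by 4 when forced."""
    if force_pc_incr:
        return (npc + 4) & MASK64
    return (npc + 4 * completed_count(npc_increment)) & MASK64
```

With nPC = 0x104 and bit 2 set (three instructions), the result is 0x110, which agrees with nPC = nPC + (number of instructions completed simultaneously) * 4.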
[0076]
In FIG. 17, signals + COMMIT_UPDATE_PC, −BRTKN_EQ_JUMPL_HOLD_VALID, and −LOAD_TARGET_ADRS_TO_NPC are input to the AND circuit 214. The signal -BRTKN_EQ_JUMPL_HOLD_VALID is a signal that becomes “1” when the high side of the branch address of the register-relative branch instruction that has completed the instruction is held in the latch circuit 11-1 instead of all 0 (All0). The OR circuit 215 receives the output of the AND circuit 214 and the signal + FORCE_PC_INCR_TGR. The AND circuit 216 receives the output signal + SEL_INCR_TO_NPC_LOW of the OR circuit 215 and the signal −PSTATE_AM. The signal + SEL_INCR_TO_NPC_LOW is “1” when the signal + INCR_NPC is selected when the next program counter nPC is set to the low side. The signal -PSTATE_AM is a signal indicating the 32-bit address mode when “1”. The AND circuit 216 outputs a signal + SEL_INCR_TO_NPC_HIGH. The signal + SEL_INCR_TO_NPC_HIGH is “1” when the signal + INCR_NPC is selected when the next program counter nPC is set to the high side.
[0077]
The AND circuit 217 receives signals -BRTKN_EQ_JUMPL_HOLD_VALID and + LOAD_TARGET_ADRS_TO_NPC. The output of the AND circuit 217 and the signal + FORCE_PC_INCR_TGR are input to the OR circuit 218. The AND circuit 219 receives the output of the OR circuit 218 and the signal -PSTATE_AM. The AND circuit 219 outputs a signal + SEL_TARGET_TO_NPC_HIGH. The signal + SEL_TARGET_TO_NPC_HIGH is a signal that is “1” when the signal + MOD_PC_FOR_TGT_ADRS is selected when the next program counter nPC is set to the high side. The buffer 311 outputs a signal + SEL_TARGET_TO_NPC_LOW based on the signal + LOAD_TARGET_ADRS_TO_NPC. The signal + LOAD_TARGET_ADRS_TO_NPC is a signal that becomes “1” when a value must be set from the branch destination address register 10 to the next program counter nPC. The signal + SEL_TARGET_TO_NPC_LOW is “1” when the signal + SELECTED_TGT_ADRS_BUFF is selected when the next program counter nPC is set to the low side. The AND circuit 312 outputs a signal + SEL_JUMPL_AH_TO_NPC based on the signals + BRTKN_EQ_JUMPL_HOLD_TGR and -PSTATE_AM. The signal + SEL_JUMPL_AH_TO_NPC is a signal that is “1” when the value (+ JMPL_ADRS_HOLD) from the latch circuit 11-1 is selected when the next program counter nPC is set to the high side.
[0078]
In FIG. 18, the AND circuit 411 receives the signals +INCR_NPC [63:32, P7:P4] and +SEL_INCR_TO_NPC_HIGH, and the AND circuit 412 receives the signals +MOD_PC_FOR_TGT_ADRS [63:32, P7:P4] and +SEL_TARGET_TO_NPC_HIGH. The AND circuit 413 receives the signals +JUMPL_ADRD_HOLD [63:32, P7:P4] and +SEL_JUMPL_AH_TO_NPC, and the AND circuit 414 receives the signals +TRAP_ADRS [63:32, P7:P4] and +SEL_TRAP_ADRS_TO_NPC. The signal +TRAP_ADRS [63:32, P7:P4] carries the dedicated trap (TRAP) address, defined in the SPARC architecture, that is selected when a trap occurs (+W_TRAP_VALID = 1). The signal +SEL_TRAP_ADRS_TO_NPC becomes “1” when the signal +TRAP_ADRS is selected because a trap has occurred. The OR circuit 415 outputs the set signal +SET_NPC [63:32, P7:P4] of the next program counter nPC based on the outputs of the AND circuits 411 to 414. The AND circuit 416 receives the signals +INCR_NPC [31:0, P3:P0] and +SEL_INCR_TO_NPC_LOW, the AND circuit 417 receives the signals +SELECTED_TGT_ADRS_BUFF [31:0, P3:P0] and +SEL_TARGET_TO_NPC_LOW, and the AND circuit 418 receives the signals +TRAP_ADRS [31:0, P3:P0] and +SEL_TRAP_ADRS_TO_NPC. The OR circuit 419 outputs the set signal +SET_NPC [31:0, P3:P0] based on the outputs of the AND circuits 416 to 418.
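The AND-OR structure of FIG. 18 behaves like a pair of 32-bit multiplexers, one for each half of nPC. The following sketch models only the address bits and leaves out the parity bits P7:P0. The helper names and the layout of the `sel` dictionary are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def and_or(*pairs):
    """AND each 32-bit data word with its 1-bit select, then OR the results."""
    out = 0
    for data, sel in pairs:
        if sel:
            out |= data & MASK32
    return out

def set_npc(incr_npc, mod_pc_high, jumpl_hold_high, tgt_low, trap_adrs, sel):
    """Model of AND 411 to 414 with OR 415 (high) and AND 416 to 418 with OR 419 (low).

    incr_npc and trap_adrs are 64-bit values; mod_pc_high and
    jumpl_hold_high are high halves; tgt_low is a low half.
    """
    high = and_or((incr_npc >> 32, sel["incr_high"]),     # AND 411
                  (mod_pc_high, sel["target_high"]),      # AND 412
                  (jumpl_hold_high, sel["jumpl_ah"]),     # AND 413
                  (trap_adrs >> 32, sel["trap"]))         # AND 414
    low = and_or((incr_npc, sel["incr_low"]),             # AND 416
                 (tgt_low, sel["target_low"]),            # AND 417
                 (trap_adrs, sel["trap"]))                # AND 418
    return (high << 32) | low
```

If no high-side select is active, the high half of the result is 0, since every AND output feeding the OR circuit 415 is 0 in that case.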
[0079]
Next, the PC update circuit 11-2 in the program counter unit 11 will be described with reference to FIGS. 19 and 20, which are logic circuit diagrams of the PC update circuit 11-2.
[0080]
In FIG. 19, the signals +COMMIT_UPDATE_PC and +FORCE_PC_INCR_TGR are input to the OR circuit 511. The AND circuit 512 receives the output signal +SEL_INCR_TO_PC_LOW of the OR circuit 511 and the signal -PSTATE_AM. The signal +SEL_INCR_TO_PC_LOW becomes “1” when the signal +INCR_PC is selected for setting the low side of the program counter PC. Like the signal +INCR_NPC shown in FIG. 16, the signal +INCR_PC takes the value (INCR_PC =) PC = nPC + (number of instructions completed simultaneously - 1) * 4 when +NPC_INCREMENT ≠ 0, and the value (INCR_PC =) PC = nPC when +FORCE_PC_INCR_TGR = 1. The signals +NPC_INCREMENT and +FORCE_PC_INCR_TGR are never valid at the same time. The AND circuit 512 outputs the signal +SEL_INCR_TO_PC_HIGH, which is “1” when the signal +INCR_PC is selected for setting the high side of the program counter PC. The incrementer 513 receives the signals +PC [63:0, P7:P0], +NPC_INCREMENT [3:0], and +FORCE_PC_INCR_TGR, and outputs the signal +INCR_PC [63:0, P7:P0].
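The two cases given for +INCR_PC, together with the OR circuit 511 and the AND circuit 512, can be written out as a small model. The sketch assumes that the number of simultaneously completed instructions is available as a plain integer `completed` (0 when +NPC_INCREMENT is not valid); how +NPC_INCREMENT [3:0] actually encodes that count is not modeled. The function names, and the choice to return `None` when neither case applies, are hypothetical.

```python
MASK64 = (1 << 64) - 1

def incr_pc(npc, completed, force_pc_incr_tgr):
    """Value of +INCR_PC according to the two cases stated in the text."""
    if completed != 0 and force_pc_incr_tgr:
        # The text states the two conditions are never valid together.
        raise ValueError("NPC_INCREMENT and FORCE_PC_INCR_TGR both valid")
    if force_pc_incr_tgr:
        return npc & MASK64                          # PC = nPC
    if completed != 0:
        return (npc + (completed - 1) * 4) & MASK64  # PC = nPC + (n - 1) * 4
    return None  # neither case applies in this model

def sel_incr_to_pc(commit_update_pc, force_pc_incr_tgr, n_pstate_am):
    """OR circuit 511 and AND circuit 512 (1-bit inputs, polarity applied)."""
    sel_low = commit_update_pc | force_pc_incr_tgr   # +SEL_INCR_TO_PC_LOW
    sel_high = sel_low & n_pstate_am                 # +SEL_INCR_TO_PC_HIGH
    return sel_low, sel_high
```

For example, with nPC = 0x1000 and three instructions completing together, the model gives PC = 0x1008, that is, the address of the last of the three instructions when the first one is taken to sit at the previous PC.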
[0081]
In FIG. 20, the signals +INCR_PC [63:32, P7:P4] and +SEL_INCR_TO_PC_HIGH are input to the AND circuit 611, and the signals +TRAP_ADRS [63:32, P7:P4] and +SEL_TRAP_ADRS_TO_PC are input to the AND circuit 612. The signal +SEL_TRAP_ADRS_TO_PC becomes “1” when the signal +TRAP_ADRS is selected because a trap has occurred. The OR circuit 613 outputs the set signal +SET_PC [63:32, P7:P4] of the program counter PC based on the outputs of the AND circuits 611 and 612. The AND circuit 614 receives the signals +INCR_PC [31:0, P3:P0] and +SEL_INCR_TO_PC_LOW, and the AND circuit 615 receives the signals +TRAP_ADRS [31:0, P3:P0] and +SEL_TRAP_ADRS_TO_PC. The OR circuit 616 outputs the set signal +SET_PC [31:0, P3:P0] of the program counter PC based on the outputs of the AND circuits 614 and 615.
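The selection of +SET_PC in FIG. 20 has only two sources per half, the incremented value and the trap address. A minimal sketch, again ignoring the parity bits and using hypothetical names, is:

```python
MASK32 = 0xFFFFFFFF

def set_pc(incr_pc, trap_adrs, sel_incr_high, sel_incr_low, sel_trap):
    """Model of AND 611/612 with OR 613 (high) and AND 614/615 with OR 616 (low).

    incr_pc and trap_adrs are 64-bit values; the selects are 0 or 1.
    """
    high = 0
    if sel_incr_high:
        high |= (incr_pc >> 32) & MASK32     # AND circuit 611
    if sel_trap:
        high |= (trap_adrs >> 32) & MASK32   # AND circuit 612
    low = 0
    if sel_incr_low:
        low |= incr_pc & MASK32              # AND circuit 614
    if sel_trap:
        low |= trap_adrs & MASK32            # AND circuit 615
    return (high << 32) | low                # OR circuits 613 and 616
```

When only the low-side select is active, the high half of the result is 0, which is the behavior one would expect whenever the AND circuit 512 of FIG. 19 blocks the high-side select.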
[0082]
As described above, in this embodiment a branch destination address register is provided, and the instruction address register is updated at high speed according to the number of instructions completed simultaneously. In addition, branch instructions can be controlled independently by the branch instruction control unit, the branch prediction unit, the branch destination address register, and the instruction completion control unit, which improves the branch throughput and makes it possible to realize the circuit with as small a mounting area as possible.
[0083]
In an architecture using a 64-bit instruction address space, branch instructions can be controlled using only the lower 32 bits, the CARRY bit, and the BORROW bit in the branch instruction control unit and the branch destination address generation part.
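One way to see why the lower 32 bits plus a carry bit and a borrow bit can be enough is the sketch below. It is not the circuit of the embodiment; it is an illustration under the assumption that the branch displacement is a signed value whose magnitude is smaller than 2^32, so the high half of the target differs from the high half of PC by at most one. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def low_target_with_flags(pc, disp):
    """Compute the target's low 32 bits from the low 32 bits of pc only.

    Returns (low32, carry, borrow). disp is a signed displacement with
    |disp| < 2**32 (assumption of this sketch).
    """
    s = (pc & MASK32) + disp
    carry = 1 if s > MASK32 else 0   # low half overflowed upward
    borrow = 1 if s < 0 else 0       # low half underflowed
    return s & MASK32, carry, borrow

def rebuild_target(pc, low32, carry, borrow):
    """Adjust the high half of pc by the carry/borrow and rejoin the halves."""
    high = ((pc >> 32) + carry - borrow) & MASK32
    return (high << 32) | low32
```

In this model only the 32-bit low half and two flag bits need to be carried through branch control; the full 64-bit target is recovered from the current high half when it is finally needed.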
[0084]
The present invention also includes the inventions set forth in the following supplementary notes.
[0085]
(Supplementary Note 1) A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction branches; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.
[0086]
(Supplementary Note 2) A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction does not branch; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.
[0087]
(Supplementary Note 3) A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction branches; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.
[0088]
(Supplementary Note 4) A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction does not branch; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.
[0089]
(Supplementary Note 5) The program counter control method according to any one of Supplementary Notes 1 to 4, wherein the architecture uses a 64-bit instruction address space,
the method further comprising a step of performing branch instruction control and branch destination address generation using only the lower 32 bits of the instruction, the carry bit, and the borrow bit.
[0090]
(Supplementary Note 6) A processor that performs instruction control by an out-of-order method using a branch prediction unit and controls an architecture having a branch delay instruction, the processor comprising:
a branch instruction control unit capable of simultaneously controlling a plurality of branch instructions and performing branch condition determination for branch instructions, determination of success or failure of branch prediction, and instruction refetch control; and
a branch destination address register for storing a plurality of branch destination addresses of branch instructions determined to branch,
wherein the branch destination address register can be controlled independently of the branch instruction control unit and the branch prediction unit.
[0091]
(Supplementary Note 7) A processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the processor comprising:
program counter means comprising a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction branches; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0092]
(Supplementary Note 8) A processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the processor comprising:
program counter means comprising a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction does not branch; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0093]
(Supplementary Note 9) A processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the processor comprising:
program counter means comprising a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction branches; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0094]
(Supplementary Note 10) A processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the processor comprising:
program counter means comprising a program counter and a next program counter;
means for simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction does not branch; and
means for simultaneously updating the program counter and the next program counter in accordance with the number of completed instructions.
[0095]
(Supplementary Note 11) The processor according to any one of Supplementary Notes 6 to 10, wherein the architecture uses a 64-bit instruction address space,
the processor further comprising means for performing branch instruction control and branch destination address generation using only the lower 32 bits of the instruction, the carry bit, and the borrow bit.
[0096]
Although the present invention has been described above by way of an embodiment, the present invention is not limited to that embodiment, and it goes without saying that various modifications and improvements are possible.
[0097]
[Effects of the invention]
According to the present invention, it is possible to realize a program counter control method and a processor capable of improving the branch throughput with a circuit scale (mounting area) as small as possible.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an embodiment of a processor according to the present invention.
FIG. 2 is a block diagram illustrating a main part of an instruction unit.
FIG. 3 is a flowchart illustrating an operation during branch instruction control.
FIG. 4 is a flowchart illustrating an operation at the time of updating a program counter unit.
FIG. 5 is a schematic diagram showing entries in a branch instruction control unit.
FIG. 6 is a schematic diagram showing entries in a branch destination address register.
FIG. 7 is a block diagram showing a configuration of a program counter unit.
FIG. 8 is a logic circuit diagram of a main part in a branch instruction control unit.
FIG. 9 is a logic circuit diagram of a main part in a branch instruction control unit.
FIG. 10 is a logic circuit diagram of a main part in a branch instruction control unit.
FIG. 11 is a logic circuit diagram of a main part in the branch instruction control unit.
FIG. 12 is a logic circuit diagram of a main part in an instruction completion control unit.
FIG. 13 is a logic circuit diagram of a main part in an instruction completion control unit.
FIG. 14 is a logic circuit diagram of an nPC update circuit in a program counter unit.
FIG. 15 is a logic circuit diagram of an nPC update circuit in a program counter unit.
FIG. 16 is a logic circuit diagram of an nPC update circuit in a program counter unit.
FIG. 17 is a logic circuit diagram of an nPC update circuit in a program counter unit.
FIG. 18 is a logic circuit diagram of an nPC update circuit in a program counter unit.
FIG. 19 is a logic circuit diagram of a PC update circuit in a program counter unit.
FIG. 20 is a logic circuit diagram of a PC update circuit in a program counter unit.
[Explanation of symbols]
1 Branch prediction unit
2 Instruction fetch section
3 Instruction buffer
4 Relative branch address generator
5 Instruction decoder section
6 Branch instruction execution part
7 Branch instruction control unit
8 Delay slot stack
9 Instruction completion control unit
10 Branch destination address register
11 Program counter unit
11-1 Latch circuit
11-2, 11-3 Update circuit
21 Instruction unit
22 Memory unit
23 Arithmetic unit
PC Program counter
nPC Next program counter

Claims

A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction branches; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.

A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction succeeds and the branch instruction does not branch; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.

A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction branches; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.

A program counter control method for a processor that performs instruction control by an out-of-order method using a branch prediction mechanism and controls an architecture having a branch delay instruction, the method comprising:
a step of simultaneously completing a plurality of instructions including a branch instruction when branch prediction fails and the branch instruction does not branch; and
a step of simultaneously updating the program counter and the next program counter according to the number of completed instructions.

A processor that performs instruction control by an out-of-order method using a branch prediction unit and controls an architecture having a branch delay instruction, the processor comprising:
a branch instruction control unit capable of simultaneously controlling a plurality of branch instructions and performing branch condition determination for branch instructions, determination of success or failure of branch prediction, and instruction refetch control; and
a branch destination address register for storing a plurality of branch destination addresses of branch instructions determined to branch,
wherein the branch destination address register can be controlled independently of the branch instruction control unit and the branch prediction unit.