JPH07262005A - Extended operand bypass system

Info

Publication number: JPH07262005A
Application number: JP3276095A
Authority: JP
Inventors: V Argade Pramod; ヴェイサントアーゲードプラモド
Original assignee: American Telephone and Telegraph Co Inc; AT&T Corp
Current assignee: AT&T Corp
Priority date: 1994-02-22
Filing date: 1995-02-22
Publication date: 1995-10-13

Abstract

PURPOSE: To avoid the need to obtain a result from a separate on-chip data cache or off-chip memory, by selectively coupling the output port of a result register to the input port of an instruction register or an operand register through multiplexers. CONSTITUTION: Alternative signal processing paths 30 and 40 couple the output port of ALU 200 to the input ports of operand registers 410 and 420 through multiplexers 610 and 620, respectively, and paths 10 and 20 couple the output port of result register 500 to the input ports of operand registers 410 and 420 through multiplexers 610 and 620, respectively. Paths 30 and 40 also couple the output port of ALU 200 to the input ports of instruction registers 310 and 320 through multiplexers 710 and 720, and paths 10 and 20 likewise couple the output port of result register 500 to instruction registers 310 and 320.

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION: The present invention relates to microprocessors and, more particularly, to pipelined microprocessors.

[0002]

DESCRIPTION OF THE RELATED ART: Microprocessors, for example digital signal processors, execute coded instructions, such as computer code, in the form of object code. In this context, object code consists of machine-executable digital signals, in the form of bits or instruction signals, that control the operation of a digital signal processor or microprocessor. These object code or instruction signals are either provided directly to the processor in order to direct the digital signal processor or microprocessor, or obtained by translating a higher-level computer programming language into object code instruction signals. The object code instruction signals are then typically decoded and executed by the digital signal processor or microprocessor. Accordingly, the steps associated with executing an object code instruction typically include: fetching the object code instruction signals from memory; decoding those instruction signals and providing them to the processor in a form suitable for performing arithmetic/logic operations; and executing the decoded instruction signals. Executing the decoded instruction signals further includes the substeps of: locating the addresses of the operands to be fetched; obtaining or fetching those operands; performing the selected operation on those operands; and storing the result of the operation performed on those operands.

[0003] One way to improve the speed of a microprocessor, such as a digital signal processor, is to pipeline this sequence of steps. An example of a pipelined digital signal processor or microprocessor is disclosed in U.S. Pat. No. 5,138,570, entitled "Multiplier Signed and Unsigned Overflow Flags", issued to Argade on Aug. 11, 1992 and assigned to the assignee of the present invention. Typically, in a pipelined digital signal processor or microprocessor, an instruction is executed, after being decoded, in a pipeline of two or more stages, for example a three-stage pipeline. In a three-stage execution unit, the first stage of the execution unit, or execution portion, of the pipeline is referred to as the instruction register (IR) stage. This stage obtains the decoded instruction signals for execution from the decoder unit and forms the effective memory address locations of the operands to be fetched. It also forms the address of the destination memory location for storing the result of the operation to be performed on those operands. The second stage of the execution unit is referred to as the operand register (OR) stage. This stage determines the operation to be performed based on the decoded instruction, fetches the operands, and performs the selected arithmetic/logic operation on them. The third stage of the execution unit is referred to as the result register (RR) stage. This stage stores the result of the operation performed on the operands at the specified destination address location in memory.
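The division of work among the three execution stages described above can be sketched as a toy Python model. This is only an illustration under stated assumptions: the `Instr` fields, the dictionary used as memory, and the function names are invented for the example and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Instr:
    op: Callable[[int, int], int]  # selected arithmetic/logic operation
    src_a: int                     # address of the first operand
    src_b: int                     # address of the second operand
    dest: int                      # destination address for the result

def ir_stage(instr: Instr):
    # IR stage: form the operand addresses and the destination address.
    return instr.op, instr.src_a, instr.src_b, instr.dest

def or_stage(mem: Dict[int, int], op, a_addr, b_addr, dest):
    # OR stage: fetch the operands and perform the selected operation.
    return op(mem[a_addr], mem[b_addr]), dest

def rr_stage(mem: Dict[int, int], result: int, dest: int) -> None:
    # RR stage: store the result at the destination address.
    mem[dest] = result

mem = {0: 3, 1: 4, 2: 0}
add = Instr(op=lambda a, b: a + b, src_a=0, src_b=1, dest=2)
rr_stage(mem, *or_stage(mem, *ir_stage(add)))
```

Here the stages run back to back for a single instruction; in a real pipeline each stage would work on a different instruction in the same clock cycle.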

[0004] In a pipelined digital signal processor or microprocessor, it frequently happens that a subsequent, or second, instruction uses the result of a preceding, or first, instruction as an operand. In this situation the nature of the pipeline gives rise to timing problems, because one stage of the execution unit may complete its operation before another stage has completed its own. For example, in a clock cycle in which the subsequent instruction controls the operation of the IR stage, the IR stage may attempt to fetch the location of the result of the preceding instruction although that result has not yet been stored; that is, the preceding instruction may not yet have finished executing in a pipeline stage ahead of, for example, the RR stage. Likewise, an instruction may be in the OR stage, and the OR stage may attempt to fetch the result of the preceding instruction as an operand although it has not yet been stored in memory. These timing problems are typically handled by an operand bypass mechanism. See, for example, the paper "Hobbit: A High-Performance, Low-Power Microprocessor", presented at Compcon Spring '93, held Feb. 22-26, 1993 and sponsored by the IEEE Computer Society, and the paper "The Hardware Architecture of the Crisp Machine" by Ditzel and McLellan, published in the Proceedings of the 14th Annual Symposium on Computer Architecture, held June 2-5, 1987.

[0005]

PROBLEM TO BE SOLVED BY THE INVENTION: A further, entirely different timing situation can also arise. In this situation the instructions are separated in time within the pipeline, so that the first instruction is "clocked out" of the pipeline and the second instruction must fetch the result of the first instruction from memory. A method or mechanism for handling this latter timing situation is therefore needed.

[0006]

MEANS FOR SOLVING THE PROBLEM: Briefly, in one embodiment of the present invention, the microprocessor comprises a pipelined microprocessor that includes an extended operand bypass mechanism. Likewise, a method according to the invention for bypassing operands used in a pipelined microprocessor includes the step of storing the output signal of the arithmetic logic unit (ALU) for one executed microprocessor instruction in a register in the result register stage of the pipelined microprocessor for more than one clock cycle period of the pipelined microprocessor.

[0007] The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the claims of this specification. The invention itself, however, both as to its organization and its method of operation, together with its objects, features, and advantages, may best be understood by reading the following detailed description in conjunction with the accompanying drawings.

[0008]

DESCRIPTION OF THE PREFERRED EMBODIMENT: FIG. 2 is a schematic diagram of one embodiment of the execution unit of a pipelined digital signal processor (DSP) or microprocessor. As shown, microprocessor 50 includes instruction registers (IR) 310 and 320 in the IR stage, operand registers (OR) 410 and 420 in the OR stage, an arithmetic logic unit (ALU) 200, and a result or data register (RR) 500 in the RR stage. Although not shown in FIG. 2, each stage includes a program counter that holds the memory address location of the instruction being executed by that stage.

[0009] As noted above, in a pipelined digital signal processor or microprocessor, different instructions may be at different stages of execution within the pipeline. This is particularly significant when instructions in the pipeline depend on one another, for example when a subsequent, or second, instruction uses as an operand the result of an arithmetic/logic operation performed by a preceding, or first, instruction. If the second instruction attempts to fetch the result of the first instruction's operation from its destination memory address location before that result has been stored there, the second instruction will typically fetch an incorrect operand. This is one example of a situation referred to as an operand hazard. As described in the aforementioned paper "The Hardware Architecture of the Crisp Machine", a pipelined digital signal processor or microprocessor uses an operand bypass mechanism that detects the hazard condition and, depending on the particular hazard, "bypasses" the operand to the IR stage or the OR stage.
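The operand hazard described above can be made concrete with a small sketch. It is a deliberately simplified model, assuming a dictionary as memory and a single in-flight result; the addresses, values, and the `run` function are hypothetical and chosen only for illustration.

```python
# Toy model of an operand hazard; all names and values are illustrative.
memory = {0x10: 1, 0x14: 2, 0x18: 0}

def run(bypass: bool) -> int:
    mem = dict(memory)
    # Instruction 1: mem[0x18] = mem[0x10] + mem[0x14]. The result has been
    # computed but not yet written back to its destination address.
    pending_addr, pending_value = 0x18, mem[0x10] + mem[0x14]
    # Instruction 2 wants mem[0x18] as its operand in the same cycle.
    src = 0x18
    if bypass and src == pending_addr:
        operand = pending_value   # forward the in-flight result
    else:
        operand = mem[src]        # stale value: the store has not happened yet
    mem[pending_addr] = pending_value  # write-back happens afterwards
    return operand + 1            # instruction 2 computes operand + 1
```

Without the bypass, instruction 2 reads the old contents of address 0x18 and computes from the wrong operand; with the bypass it receives the value that instruction 1 has just produced.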

[0010] A different timing situation arises when, rather than the second instruction requiring the result of the first instruction before the first instruction has completed execution, the first instruction completes execution before the second instruction requires the result of the operation performed by the first instruction. That is, the first instruction may produce its result so far ahead of the second instruction that the first instruction is, in essence, "clocked out" of the pipeline. This situation can occur, for example, when the timing of fetching or decoding the subsequent instruction is uncertain, or when there is a conflict over microprocessor resources, such as conflicting memory requests. In that case, because the result of the first instruction's operation reaches result register 500 too early, it is overwritten on the next clock cycle of the pipelined microprocessor. Typically, in such a situation, the result of the first instruction is written to auxiliary memory, such as an on-chip data cache or, alternatively, off-chip memory, so that the result of the operation performed by the first instruction is not lost.

[0011] FIG. 1 shows an extended operand bypass mechanism 100 in accordance with the invention. The embodiment of FIG. 1 shows an execution unit 15 having three successive stages, but the scope of the invention is not limited to three successive stages. For example, the execution unit may include two or more stages, and the stages need not be consecutive. Other implementations of a pipelined microprocessor are also possible. As shown, in addition to the instruction registers (IR), operand registers (OR), arithmetic logic unit (ALU), and result register (RR) described above, the embodiment of FIG. 1 includes an IR-stage instruction validity (IV) indicator 330, an OR-stage instruction validity (IV) indicator 430, an OR-stage address register 440, an RR-stage address register 510, an RR-stage instruction validity (IV) indicator 530, an RR-stage data validity (DV) indicator 520, and multiplexers (MUX) 610, 620, 710, and 720. The multiplexers shown in FIG. 1 are parallel MUXes, but the scope of the invention is not limited in this respect. The MUXes may instead be serial MUXes, with additional digital circuitry used to perform serial-to-parallel or parallel-to-serial conversion. Because of the attendant signal processing delay, however, the latter approach may be unsuitable for some computing or signal processing applications.

[0012] As shown, extended operand bypass mechanism 100 has two main signal processing paths, 70 and 80. The digital signals, or bits, corresponding to the first and second (or left and right) signals for each executed instruction propagate along these respective paths. That is, path 70 propagates through IR 310 and OR 410 to an input port of ALU 200, and path 80 propagates through IR 320 and OR 420. Likewise, as shown, main signal processing path 90 couples the output port of ALU 200 to the input port of RR 500. In addition to these main paths, mechanism 100 includes a number of alternative signal processing paths. For example, paths 30 and 40 couple the output port of ALU 200 to the input ports of OR 410 and 420 through MUX 610 and 620, respectively, and paths 10 and 20 likewise couple the output port of RR 500 to the input ports of OR 410 and 420 through MUX 610 and 620, respectively. In addition, alternative signal processing paths 30 and 40 couple the output port of ALU 200 to the input ports of IR 310 and 320 through MUX 710 and 720, respectively, and paths 10 and 20 likewise couple the output port of RR 500 to the input ports of IR 310 and 320 through MUX 710 and 720, respectively.
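The role of each multiplexer in this topology can be sketched as a three-way selection among the main path and the two alternative paths. The sketch below is a behavioral illustration only; the `Source` enumeration, the `mux` function, and the sample values are assumptions made for the example, not part of the patent.

```python
from enum import Enum

class Source(Enum):
    MAIN = "main path 70/80"    # value arriving from the preceding stage
    ALU = "bypass path 30/40"   # output port of ALU 200
    RR = "bypass path 10/20"    # output port of RR 500

def mux(select: Source, main_in: int, alu_out: int, rr_out: int) -> int:
    """Return the input chosen by the control signal (for example MUX 610)."""
    if select is Source.ALU:
        return alu_out
    if select is Source.RR:
        return rr_out
    return main_in

# Next contents of OR 410 when the control signal selects the RR bypass path.
or_410_next = mux(Source.RR, main_in=11, alu_out=22, rr_out=33)
```

The same function can stand in for MUX 620, 710, or 720, since the text describes all four as choosing among the same kinds of inputs.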

[0013] In the embodiment of FIG. 1, the coupling between RR 500 and OR 410 and 420 is provided by paths 10 and 20, which run between the output port of RR 500 and MUX 610 and 620, respectively. As shown, the input ports of MUX 610 and 620 are coupled to the output ports of IR 310 and 320, respectively, and likewise to paths 10 and 20. Control signal ports 50 and 60 are adapted to receive control signals that control the flow of digital signals, for example bits, through MUX 610 and 620, respectively. Thus, control signals are provided at control signal ports 50 and 60, and when the result of the operation performed by the first instruction has been held in RR 500 for more than one clock cycle, this is signaled and the result is efficiently bypassed through one of MUX 610 and 620 to one of OR 410 and 420, avoiding the conventional need to obtain the result from a separate on-chip data cache or off-chip memory. Likewise, paths 10 and 20 couple IR 310 and 320 to the output port of RR 500 through MUX 710 and 720, respectively. In this particular embodiment, MUX 710 and 720 each have an input port coupled by signal path 70 or 80 to one of the two output ports of the decode unit, or decode portion, of the pipelined microprocessor, and are likewise coupled to paths 10 and 20, respectively. Control signal ports 55 and 56 are adapted to receive control signals that control the flow of digital signals through MUX 710 and 720, respectively. "Bypassing" the result of the first instruction from the RR stage to the IR stage rather than to the OR stage supports the indirect addressing mechanisms often employed by pipelined DSPs or microprocessors.

[0014] When the contents of RR 500 include the bypassed result of an instruction, RR-stage data validity indicator 520 is set. Setting indicator 520 thus shows that RR 500 contains "valid" digital signals. In this particular embodiment, the bypass mechanism does not complete its operation until this indicator is set. As noted earlier, in the embodiment of FIG. 1, control signal ports 50, 60, 55, and 65 are adapted to receive the control signals for MUX 610, 620, 710, and 720, respectively. Depending on the particular embodiment, these control signals are made to depend, at least in part, on the state or contents of RR data validity indicator 520. When indicator 520 is set, the MUXes are informed that the contents of RR 500 comprise the output signal of ALU 200, and these signals are transferred to the OR or IR stage for use as an operand, or as the memory address location of an operand, by a subsequent microprocessor instruction in the pipeline, for example the next instruction in a sequence of microprocessor instructions. Thus, instead of the result of the operation completed by the ALU being read from the destination memory address location, for example a memory location in off-chip memory or an on-chip data cache, the result is held in result or data register 500 for more than one clock cycle and bypassed to a register in one of the OR and IR stages, thereby providing an extended operand bypass mechanism. Likewise, the destination memory location address is stored or held in RR-stage address register 510 for more than one clock cycle.
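The idea of holding a result in the result register across several clock cycles, guarded by a data validity flag, can be sketched as follows. This is a minimal behavioral model under stated assumptions: the `ResultRegister` class, its `clock` and `read_operand` methods, and the dictionary used as memory are hypothetical names introduced for the example.

```python
class ResultRegister:
    """Toy model of RR 500 with address register 510 and DV indicator 520."""

    def __init__(self):
        self.value = 0
        self.dest_addr = None
        self.data_valid = False  # plays the role of DV indicator 520

    def clock(self, alu_result=None, dest_addr=None):
        # One clock cycle: capture a new ALU result if one is presented,
        # otherwise keep holding the previous contents.
        if alu_result is not None:
            self.value = alu_result
            self.dest_addr = dest_addr
            self.data_valid = True

    def read_operand(self, addr, memory):
        # Bypass from the register when it holds valid data for this
        # address; otherwise fall back to a memory read.
        if self.data_valid and self.dest_addr == addr:
            return self.value, "bypass"
        return memory[addr], "memory"

memory = {0x20: 0, 0x24: 7}
rr = ResultRegister()
rr.clock(alu_result=42, dest_addr=0x20)  # first instruction's result arrives
rr.clock()                               # idle cycle: contents are held
rr.clock()                               # another idle cycle
operand, source = rr.read_operand(0x20, memory)
```

In this sketch the later instruction obtains its operand from the register even though two cycles have passed, and the stale copy in memory is never consulted.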

[0015] As shown in FIG. 1, each stage in the execution unit further includes an instruction validity indicator, for example indicators 330, 430, and 530. When the contents of such an indicator are set, for example by taking a signal value of "1" or, depending on the convention chosen, "0", this indicates that the particular stage is executing a valid instruction. As above, the contents of these registers do not propagate to the next stage in the sequence unless the indicator for that particular stage is set. Thus, as illustrated, as the register contents propagate through the stages in turn, signals indicating execution of a valid instruction likewise propagate through the stages in the form of the instruction validity indicators.

[0016] The control signals provided to MUX 610, 620, 710, and 720 also depend, at least in part and according to the particular embodiment, on the state of the validity indicators of the various stages of the execution unit of the pipelined microprocessor. For example, if the RR-stage instruction validity indicator is not set, the control signals provided to the MUXes do not direct that an operand be bypassed from the output port of RR 500 in order to avoid an operand hazard. In some embodiments, however, the operand bypass according to the invention from the output port of RR 500 is performed even when the RR-stage instruction validity indicator is not set, that is, even when the RR stage is not executing an instruction. As noted above, though, this particular embodiment requires that the RR-stage data validity indicator be set. Likewise, extended operand bypass in accordance with the invention can be accomplished even when the OR instruction validity indicator is not set.

[0017] Referring to the embodiment of FIG. 1, the flow of digital output signals from ALU 200 to RR 500 is controlled, at least in part, based on the contents of OR-stage instruction validity indicator 430. For example, RR 500 includes a latch having an output port and an input port that receives the digital output signals, or bits, from the output port of ALU 200. After the contents of the ORs are provided to the ALU along paths 70 and 80, the ALU provides a digital output signal to be stored in RR 500. Indicator 430 then transfers its contents to indicator 530, signaling that the contents of the ORs are now "not valid" and that the contents of RR 500 are valid. The contents of indicator 430, showing that the OR stage is "not valid", thus prevent the latch from taking any further digital output signal from ALU 200. As a result, the contents of RR 500 are not overwritten (invalidated) by a later digital output signal generated by ALU 200 from the now "not valid" contents of the ORs.
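The gating of the result register latch by the OR-stage instruction validity indicator can be sketched as a one-line next-state function. This is a simplified illustration; the function name `rr_next` and the sample values are assumptions, and the model ignores every other signal that might influence the latch in a real design.

```python
def rr_next(rr_value: int, alu_out: int, or_iv: bool) -> int:
    """Next contents of RR 500 after one clock edge.

    rr_value: current contents of RR 500
    alu_out:  value at the output port of ALU 200
    or_iv:    OR-stage instruction validity indicator 430
    """
    # Latch the ALU output only while the OR stage holds a valid
    # instruction; otherwise keep the previously stored result.
    return alu_out if or_iv else rr_value

rr = 0
rr = rr_next(rr, alu_out=7, or_iv=True)    # valid instruction: result latched
rr = rr_next(rr, alu_out=99, or_iv=False)  # OR stage not valid: result held
```

After the second call the register still holds the first result, which is what allows it to be bypassed to a later instruction.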

[0018] Likewise, in some embodiments, the extended operand bypass mechanism according to the invention is triggered or initiated, at least in part, by the output signals of comparators, for example equality comparators 910, 920, 930, and 940 shown in FIG. 1. These comparators compare the destination memory address location for the result of a microprocessor instruction with the memory address location for an operand of the next or a subsequent microprocessor instruction. If these memory locations match or correspond, the first, or preceding, instruction may, as described above, be "clocked out" of the pipeline. In this particular embodiment, therefore, the control signals provided to the MUXes are also made to depend, at least in part, on the comparator output signals, and the extended operand bypass mechanism in accordance with the invention is thereby activated, as described below. As a result, the logic circuitry likewise omits the operation of reading the operand from memory. Alternatively, if these memory locations do not match, the result of the first instruction need not be bypassed, as noted above.
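One way the comparator outputs and the data validity indicator might be combined into a MUX control decision is sketched below. The text does not specify the control logic, so this is an assumption throughout: the function `select_input`, its return labels, and in particular the priority given to an OR-stage match over an RR-stage match are illustrative choices, not the patent's design.

```python
def select_input(or_stage_match: bool, rr_stage_match: bool,
                 rr_data_valid: bool) -> str:
    """Choose a MUX input from comparator outputs and DV indicator 520."""
    if or_stage_match:
        # The needed result is being produced by the ALU this cycle:
        # take it from alternative path 30/40.
        return "ALU"
    if rr_stage_match and rr_data_valid:
        # The needed result is held in RR 500: take it from path 10/20.
        return "RR"
    # No address match: use the main path and read memory as usual.
    return "MAIN"
```

Giving the OR-stage match priority reflects the fact that it refers to the more recently issued instruction, so its result is the newer value for that address.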

[0019] As suggested above, the destination memory location address for the result of the operation performed by a microprocessor instruction is typically generated in the IR stage, although the scope of the invention is not limited in this respect. In the embodiment of FIG. 1, OR-stage address register 440 receives the contents of IR 310. Likewise, as shown, in this embodiment the contents of register 440 are provided to comparators, for example equality comparators 910 and 920, which compare the destination address stored in OR-stage address register 440 with the current contents of IR 310 and 320 for that clock cycle. On the next clock cycle, the contents of register 440 are provided to RR-stage address register 510, and equality comparators 930 and 940 then perform the comparison, again as shown in FIG. 1, with the contents of IR 310 and 320 for that next clock cycle. Thus, on every clock cycle, the contents of IR 310 and 320 are compared with the destination memory address locations of the OR and RR stages. Because IR 310 and 320 contain memory address locations, this technique can be seen to handle complex indirect addressing as well.
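The per-cycle address comparison described above can be sketched as two small functions: one that evaluates the four equality comparators and one that advances the destination address down the pipeline. The pairing of each comparator number with a particular instruction register is an assumption, as are the function names and the sample addresses.

```python
def compare(ir_310, ir_320, addr_440, addr_510):
    """Outputs of the four equality comparators for one clock cycle."""
    return {
        910: ir_310 == addr_440,  # IR 310 vs OR-stage destination address
        920: ir_320 == addr_440,  # IR 320 vs OR-stage destination address
        930: ir_310 == addr_510,  # IR 310 vs RR-stage destination address
        940: ir_320 == addr_510,  # IR 320 vs RR-stage destination address
    }

def advance(new_dest, addr_440):
    """Clock edge: register 440 takes the new destination address and
    register 510 takes the old contents of register 440."""
    return new_dest, addr_440

# Instruction 1 (destination 0x30) leaves the IR stage.
addr_440, addr_510 = advance(0x30, None)
# Instruction 2 is in the IR stage with operand addresses 0x30 and 0x40.
cycle2 = compare(0x30, 0x40, addr_440, addr_510)
# Instruction 2 (destination 0x50) leaves the IR stage.
addr_440, addr_510 = advance(0x50, addr_440)
# Instruction 3 is in the IR stage with operand addresses 0x10 and 0x30.
cycle3 = compare(0x10, 0x30, addr_440, addr_510)
```

In the second cycle only the comparator numbered 910 matches, and in the third cycle only the one numbered 940 does, showing how the same destination address is seen first at the OR stage and then at the RR stage.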

[0020] As alluded to above, FIG. 1 shows the bypass mechanism employed to avoid operand hazards. For example, as shown, paths 30, 40 couple the output port of ALU 200 to input ports of MUXs 610, 620. Control signal ports 50, 60 accordingly receive the respective MUX control signals, which select the digital signals provided to MUXs 610, 620 by paths 30, 40, respectively, and direct them to the output ports of MUXs 610, 620. As a result, the output signal of ALU 200 is bypassed directly to one of the operand registers 410, 420, thereby avoiding the hazard. It will now be appreciated that one or more externally generated clocks may provide appropriate clock pulses or timing signals to regulate or synchronize the operation of the different pipeline stages, including the execution unit stages.
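As a rough illustration of the selection just described, the following Python sketch models each MUX as a two-input selector. The function names and the idea of a single "normal" input per MUX are assumptions for the example; the actual MUXs in FIG. 1 may have more inputs.

```python
def mux(select_bypass, bypass_value, normal_value):
    """Two-input multiplexer: pass the bypass input when selected."""
    return bypass_value if select_bypass else normal_value


def load_operand_registers(alu_output, normal_inputs, control_signals):
    """Return the values latched into the two operand registers.

    alu_output:      value on the ALU output paths (paths 30, 40).
    normal_inputs:   values the operand registers would otherwise load.
    control_signals: one boolean per MUX (control signal ports 50, 60).
    """
    return tuple(
        mux(sel, alu_output, normal)
        for sel, normal in zip(control_signals, normal_inputs)
    )


# Bypass the ALU output into the first operand register only.
operands = load_operand_registers(7, (1, 2), (True, False))
```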

[0021] An extended operand bypass mechanism such as the embodiment shown in FIG. 1 provides several benefits. For example, the speed and performance of the DSP or microprocessor improve because the clock cycles associated with reading the result of a first or preceding DSP or microprocessor instruction from memory are avoided. Likewise, as noted above, because the clock cycles associated with using off-chip memory or an on-chip data cache are eliminated, the overall power used to complete the operation performed by the instruction is reduced. This power saving is difficult to state quantitatively because it depends, in part, on how often the situations addressed by the extended operand bypass mechanism according to the invention occur. The saving is nevertheless believed to be on the order of 20 to 10%. This is particularly important in environments where power consumption is a significant practical consideration, such as portable applications.

[0022] A method of performing operand bypass within the execution stages of a pipelined microprocessor that executes microprocessor instructions, as in the embodiment shown in FIG. 1, may be accomplished as follows. As noted earlier, two microprocessor instructions may be at different stages of execution within the pipeline. In some situations, the first or preceding instruction may complete execution at least two or more clock cycles ahead of the second instruction. For this reason, the output signal of the ALU, e.g., ALU 200 in FIG. 1, for the instruction thus executed is stored in the RR stage for one or more clock cycle periods of the pipelined microprocessor.
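Holding the ALU output in the result register across several clock cycles can be pictured with a small Python model. The class name `ResultRegister` and the `load` enable input are hypothetical; they simply stand in for whatever latch control the hardware uses.

```python
class ResultRegister:
    """Toy model of a result register such as RR 500."""

    def __init__(self):
        self.value = None

    def clock(self, alu_output, load):
        """Advance one clock cycle.

        Latch a new ALU output when `load` is true; otherwise keep the
        previously stored value so it remains available for bypass.
        """
        if load:
            self.value = alu_output
        return self.value


rr = ResultRegister()
# Latch 5 on the first cycle, then hold it for two further cycles.
history = [rr.clock(5, True), rr.clock(9, False), rr.clock(9, False)]
```

Because the stored value survives more than one cycle, an instruction that reaches the operand stage two or more cycles later can still pick it up without a memory read.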

[0023] The output signal held or stored for one or more clock cycle periods is then transferred into a register in one of the IR stage or the OR stage. Thus, as described above, the instruction or operation for reading the output signal from its destination memory address location in off-chip memory or an on-chip data cache is cancelled or omitted. The stored output signal is transferred via a MUX, e.g., one of MUXs 610, 620, 710, or 720 shown in FIG. 1. If the stored output signal is transferred to a register in the OR stage, the transferred output signal is then used as an operand for another instruction executed after the executed instruction. In the embodiment shown in FIG. 1, this later instruction is executed immediately after the executed instruction, although the scope of the invention is not limited in this respect. In some situations, extended operand bypass according to the invention can be achieved even when an intervening instruction lies between a first instruction that produces a result and a second instruction that uses that result as an operand. Such a situation arises, for example, when the intervening instruction does not store a result in memory. Likewise, if the stored output signal is transferred to a register in the IR stage, the transferred output signal is used as the memory address location of an operand for another instruction executed immediately after the executed instruction.
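The two uses of the forwarded value, as an operand (OR stage) or as an operand address (IR stage), can be contrasted in a short Python sketch. The function name `resolve_operand`, the string stage labels, and the dictionary used as memory are all assumptions made for illustration.

```python
def resolve_operand(rr_value, target_stage, memory):
    """Return the operand a later instruction ends up using.

    target_stage "OR": the forwarded value is itself the operand.
    target_stage "IR": the forwarded value is the memory address of the
                       operand (indirect addressing).
    """
    if target_stage == "OR":
        return rr_value
    if target_stage == "IR":
        return memory[rr_value]
    raise ValueError(f"unknown stage: {target_stage}")


memory = {5: 99}  # hypothetical memory: address 5 holds the value 99
as_operand = resolve_operand(5, "OR", memory)
as_address = resolve_operand(5, "IR", memory)
```

In both cases the read of the first instruction's result from its destination memory location is skipped; only in the IR case does a memory access remain, and it is the access for the operand itself.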

[0024] As noted above, in the embodiment shown in FIG. 1, when the output signal of the ALU for the executed instruction is stored in the RR stage, e.g., in RR 500 of FIG. 1, the RR stage data validity indicator, e.g., indicator 430, is set. Likewise, when the step of transferring the stored output signal to a register in one of the IR stage or the OR stage is accomplished by a MUX, e.g., one of MUXs 610, 620, 710, or 720 shown in FIG. 1, the step of transferring the stored output signal further includes the step of controlling the MUX based, at least in part, on the set state of the RR data indicator.

[0025] As noted earlier, the extended operand bypass mechanism according to the invention is triggered by comparing the destination memory address location for the result of the first or preceding instruction with the memory address location of an operand of a subsequent instruction. As described for the embodiment of FIG. 1, this step is performed before, or in parallel with, the step of storing the output signal of the ALU in the result register for one or more clock cycle periods. It will now be appreciated that the contents of the result register, e.g., RR 500 of FIG. 1, are written to memory regardless of whether extended operand bypass according to the invention occurs.

[0026] Although not shown in FIG. 1, in certain particular cases it is desirable to avoid operand bypass. For example, for character manipulation instructions the microprocessor operates on a subdivided or selected portion of an operand, so that only a subdivided portion of the result stored in register 500 is "valid". In this particular situation, therefore, successfully accomplishing the desired character manipulation requires a specific operation that reads the result stored in the result register, e.g., RR 500 of FIG. 1, back from memory after it has been written to memory. Likewise, if the executed instruction is a context switch instruction, it is preferable not to perform operand bypass. In this case, the instructions that enter the pipeline for execution after the context switch instruction constitute a second instruction routine or set of instructions. When this occurs, the second set of instructions may contain instruction signals that correspond, operationally or logically, to operands in the first or preceding set of instructions without any intent to designate those operands. Thus, after a context switch instruction, an instruction in the second set may appear to refer to the result of an instruction in the first set even though it actually refers to the contents of a particular memory address location. Bypass must therefore be avoided in such situations. In both of the situations above, operand bypass is avoided by, for example, preventing data validity indicator 520 from being set.
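The suppression rule can be sketched as a simple gate on the data validity indicator. In this Python model the instruction kinds are represented by hypothetical string labels, and the bypass decision is reduced to "address match AND indicator set"; the real control logic may depend on further signals.

```python
# Hypothetical labels for instruction kinds that must not be bypassed.
NO_BYPASS_KINDS = {"char_manipulation", "context_switch"}


def data_valid_indicator(instruction_kind):
    """Model whether the data validity indicator (e.g., 520) is set
    after an instruction of the given kind executes."""
    return instruction_kind not in NO_BYPASS_KINDS


def bypass_allowed(instruction_kind, address_match):
    """Bypass only when the addresses match and the indicator is set."""
    return address_match and data_valid_indicator(instruction_kind)


decisions = {
    kind: bypass_allowed(kind, address_match=True)
    for kind in ("add", "char_manipulation", "context_switch")
}
```

With the indicator left unset, a later instruction whose operand address happens to match falls back to the ordinary memory read, which is the behaviour required in the two cases above.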

[0027] While only certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will occur to those skilled in the art. The appended claims are therefore to be understood as intended to cover all such modifications and changes as fall within the true spirit of the invention.

[Brief description of drawings]

[FIG. 1] A schematic diagram of one embodiment of an extended operand bypass mechanism according to the invention.

[FIG. 2] A schematic diagram of one embodiment of an execution unit or execution portion of a pipelined digital signal processor or microprocessor.

[Explanation of symbols]

50 Microprocessor
310, 320 Instruction registers
410, 420 Operand registers
200 Arithmetic unit
500 Data register

Claims

[Claims]

1. A microprocessor comprising: an instruction register (IR) stage including two instruction registers (IRs) (e.g., 310, 320), each having an input port and an output port; an operand register (OR) stage including two operand registers (ORs) (e.g., 410, 420), each having an input port and an output port; an arithmetic logic unit (ALU) (e.g., 200); and a result register (RR) stage including a result register (RR) (e.g., 500) having an input port and an output port; said stages and said ALU (e.g., 200) being coupled to form a pipelined microprocessor; said pipelined microprocessor further including a multiplexer (e.g., 710, 720, 610, 620), the multiplexer being adapted to selectively couple the output port of the RR (e.g., 500) to at least one input port of at least one register (e.g., 310, 320, 410, 420) of the IR and OR stages.

2. The microprocessor of claim 1, wherein the RR stage further comprises an RR stage data validity indicator register (eg 520) and an RR stage address register (eg 510).

3. The microprocessor of claim 2, wherein the multiplexer (e.g., 710, 720, 610, 620) depends, at least in part, on the RR stage data validity indicator register (e.g., 520); and wherein the RR (e.g., 500) is adapted to store the output signal of the ALU (e.g., 200) for one or more clock cycle periods of the microprocessor.

4. The microprocessor of claim 3, wherein the RR stage includes an RR stage instruction validity indicator register (e.g., 530), and the multiplexer (e.g., 710, 720, 610, 620) further depends, at least in part, on said RR stage instruction validity indicator register (e.g., 530).

5. The microprocessor of claim [...], wherein the at least one register (e.g., 310, 320, 410, 420) comprises at least one register (e.g., 410, 420) of the OR stage; and said multiplexer (e.g., 610, 620) is adapted to selectively couple at least one of the output port of the RR (e.g., 500) and the output port of at least one register (e.g., 310, 320) of the IR stage to an input port of the at least one register (e.g., 410, 420) of the OR stage.

6. The microprocessor of claim 1, wherein the pipelined microprocessor includes a decoder unit preceding the IR stage; the at least one register (e.g., 310, 320, 410, 420) includes at least one register (e.g., 310, 320) of said IR stage; and said multiplexer (e.g., 710, 720) is adapted to selectively couple at least one of the output port of said RR (e.g., 500) and an output port of said decoder unit to an input port of said at least one register (e.g., 310, 320) of said IR stage.

7. The microprocessor of claim 5, wherein the RR (e.g., 500) comprises a latch and the OR stage includes an OR stage instruction validity indicator register (e.g., 430); the latch being adapted to receive an output signal from the output port of the ALU (e.g., 200) at least partially in response to said OR stage instruction validity indicator register (e.g., 430).

8. A method of bypassing operands in a pipelined microprocessor for executing microprocessor instructions, said pipelined microprocessor including an arithmetic logic unit (ALU) (e.g., 200) and a result register (RR) stage; said method comprising the step of storing the output signal of the ALU (e.g., 200) for an executed microprocessor instruction in a register (e.g., 500) in the RR stage for at least one clock cycle period of the pipelined microprocessor.

9. The method of claim [...], wherein the pipelined microprocessor further includes an instruction register (IR) stage and an operand register (OR) stage; the method further comprising the step of transferring said stored output signal to a register (e.g., 310, 320, 410, 420) in one stage selected from the group consisting essentially of said IR stage and said OR stage.

10. The method of claim 9, wherein the step of transferring the stored output signal to a register (e.g., 310, 320, 410, 420) in one stage selected from the group consisting essentially of the IR and OR stages comprises the step of transferring the stored output signal via a MUX (e.g., 710, 720, 610, 620).