JPH1021071A

JPH1021071A - Processor operating method processing plural instructions

Info

Publication number: JPH1021071A
Application number: JP33864496A
Authority: JP
Inventors: H Schell Jonathan; エィチ．シエルジョナサン
Original assignee: Texas Instruments Inc
Current assignee: Texas Instruments Inc
Priority date: 1995-12-19
Filing date: 1996-12-18
Publication date: 1998-01-23

Abstract

PROBLEM TO BE SOLVED: To provide an operating method that lets a microprocessor process a plurality of instructions very efficiently while minimizing any increase in the complexity of its architecture. SOLUTION: The operating method 20 includes several steps. First, one of the instructions is received (step 22). Next, it is determined whether the received instruction includes an operand prefix identifying a third operand (steps 24 and 26). In response to a determination that the received instruction includes the operand prefix, the instruction is executed using two operands selected from among a first operand, a second operand, and the third operand, so that one of the three operands is not selected, thereby generating a result (step 28). The result is then stored in the unselected operand.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to microprocessor technology, and more particularly to circuits, systems, and methods for extending a two-operand instruction to a three-operand instruction using an instruction prefix.

[0002]

[Prior Art] Significant advances have been made in microprocessor design to improve performance, as measured by the number of instructions executed over a given time interval. One such advance is the recent introduction of "superscalar" microprocessors, which achieve parallel instruction computation using a single instruction pointer. A superscalar microprocessor typically has multiple execution units, such as multiple integer arithmetic logic units (hereinafter, ALUs) and a floating-point unit (hereinafter, FPU), for executing program instructions. As a result, multiple machine instructions may be executed simultaneously within a superscalar microprocessor, with a clear benefit to the overall performance of the device and of the systems that use it.

[0003] Another popular technique used in recent microprocessors to improve performance is instruction "pipelining." As is well known in the art, each microprocessor instruction generally involves several sequential operations, such as instruction fetch, instruction decode, retrieval of operands from registers or memory, execution of the instruction, and write-back of the result. Pipelining refers to staging a sequence of instructions so that multiple instructions in the sequence are processed simultaneously at different stages of this internal sequence. For example, if a four-stage pipelined microprocessor is executing instruction n in a given clock cycle, it is simultaneously (that is, in the same machine cycle) retrieving the operands for instruction n+1 (the next instruction in the sequence), decoding instruction n+2, and fetching instruction n+3. Through pipelining, a microprocessor can effectively execute a sequence of instructions at a rate of one instruction per clock cycle.
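The overlap in the four-stage example can be pictured as a small timeline. The following Python sketch is illustrative only: the stage names and the `stage_of` helper are assumptions made here, and it models an ideal pipeline with no stalls.

```python
# Illustrative four-stage pipeline: fetch, decode, operand read, execute.
# The stage names and this helper are assumptions made for the sketch.
STAGES = ["fetch", "decode", "operand", "execute"]

def stage_of(instr, cycle):
    """Return the stage that instruction `instr` occupies in `cycle`, or None.

    Instruction i enters the fetch stage in cycle i and advances one
    stage per cycle (ideal pipeline, no stalls).
    """
    pos = cycle - instr
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

# In the cycle where instruction n executes, n+1 .. n+3 occupy earlier stages.
n = 5
cycle = n + 3
snapshot = {f"n+{k}": stage_of(n + k, cycle) for k in range(4)}
print(snapshot)
```

In the cycle where instruction n is in the execute stage, the snapshot places n+1 in operand read, n+2 in decode, and n+3 in fetch, matching the paragraph.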

[0004] Although the use of both pipelining and superscalar techniques allows many recent microprocessors to execute instructions at a rate higher than one per machine clock cycle, many limitations remain that reduce overall performance. One example that is important for the purposes of the present embodiments arises in architectures that were first developed some years ago and are necessarily limited to two-operand instructions (for example, the X86 architecture). The pseudocode for an add instruction in the instruction set of such an architecture may take the following form.

[0005]

[Equation 1] ADD operand 1, operand 2    Instruction (1)

[0006] When executed, instruction (1) operates as shown schematically below.

[0007]

[Equation 2] operand 1 ← operand 1 + operand 2

[0008] Instruction (1) therefore adds operand 1 to operand 2, and a subsequent write-back stage stores the resulting sum in operand 1. Note that such an instruction has several limitations. For example, operand 1 acts as both a source operand and a destination operand; that is, one of the addends for the add operation is retrieved from operand 1, and the resulting sum is afterwards stored back into operand 1. One drawback, therefore, is that the value originally stored in operand 1 is overwritten by the result of the addition, so the original value is lost unless it is first copied elsewhere. In other words, to preserve the value of the original operand, the prior art instruction set requires two separate instructions involving a total of three operands, as follows.

[0009]

[Equation 3] MOV operand 3, operand 1    Instruction (2)

[0010]

[Equation 4] ADD operand 3, operand 2    Instruction (3)

[0011] When executed, instructions (2) and (3) operate as shown schematically below.

[0012]

[Equation 5] operand 3 ← operand 1

[0013]

[Equation 6] operand 3 ← operand 3 + operand 2

[0014] Instruction (2) thus copies the value of operand 1 into operand 3, and instruction (3) then performs the addition to compute the same sum as instruction (1) above, but using operands 3 and 2 instead of operands 1 and 2. Instruction (3) computes the same sum because at that point its operands 2 and 3 hold the same values as operands 2 and 1; operands 1 and 3 hold the same value as a result of instruction (2). Therefore, performing the addition while preserving both addends requires a total of three operands together with a total of two separate instructions.
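The difference between instruction (1) and the pair of instructions (2) and (3) can be traced with a toy register model. This is a minimal sketch under stated assumptions: operands are modelled as named slots in a Python dict, and the helper names `add2` and `mov` are invented here for illustration.

```python
# Operands are modelled as named slots in a dict; all names are illustrative.
def add2(regs, op1, op2):
    """Two-operand ADD: op1 <- op1 + op2 (op1 is both source and destination)."""
    regs[op1] = regs[op1] + regs[op2]

def mov(regs, dst, src):
    """MOV: dst <- src."""
    regs[dst] = regs[src]

# Instruction (1): the original value of operand 1 is overwritten by the sum.
regs = {"op1": 7, "op2": 5, "op3": 0}
add2(regs, "op1", "op2")
clobbered = dict(regs)

# Instructions (2) and (3): copy operand 1 first, then add into the copy.
regs = {"op1": 7, "op2": 5, "op3": 0}
mov(regs, "op3", "op1")   # instruction (2)
add2(regs, "op3", "op2")  # instruction (3)
preserved = dict(regs)

print(clobbered, preserved)
```

Both paths compute the same sum, but only the two-instruction path leaves operand 1 and operand 2 unchanged.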

[0015] In the last few years, microprocessor architectures such as reduced instruction set computer (hereinafter, RISC) architectures have come to include three-operand instructions. For example, an instruction in many RISC architectures has the following general format.

[0016]

[Equation 7] ADD operand 3, operand 2, operand 1    Instruction (4)

[0017] In most RISC machines, operand 3 is the destination operand, while operands 2 and 1 are source operands. For instruction (4), therefore, execution uses operands 1 and 2 as the source operands for the addends, and the sum is stored in operand 3 as the destination. The order of these operands may vary, so that either operand 1 or operand 2 is the destination and operand 3 is a source. In any case, and unlike the older architectures, note that no single operand in a three-operand architecture serves as both a source operand and a destination operand.

[0018] Thus, although RISC architectures permit three-operand instructions, such instructions are available only within certain architectures and only in the format shown above. Many other architectures exist today that do not include such an instruction, or even a comparable one. For example, the X86 architecture, which currently holds a very large market share, still does not include an instruction like the one above. Indeed, adding a large number of instructions to such an architecture based on entirely new opcodes would place a considerable burden on the decode hardware, and would also deplete, if not exceed, the amount of opcode space remaining in such an architecture.

[0019]

[Problem to Be Solved by the Invention] In view of the above, a need has arisen to develop a three-operand instruction architecture that is bit-compatible with currently prevalent architectures and that, with respect to considerations such as instruction decoding and opcode space, keeps added complexity to a minimum.

[0020]

[Means for Solving the Problem] The preferred embodiments of the present invention relate to methods, circuits, and systems for processing a plurality of instructions. In the method of one embodiment, one instruction is received from among the plurality of instructions, the instruction including a first operand and a second operand. Next, it is determined whether the received instruction includes an operand prefix identifying a third operand. In response to a determination that the received instruction includes the operand prefix, the instruction is executed to generate a result, using two operands selected from among the first operand, the second operand, and the third operand, such that one of the first, second, and third operands is not selected. The result is then stored in the operand that was not selected. Other circuits, systems, and methods are also disclosed and claimed.
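The "two selected, one unselected" rule in this summary can be expressed compactly. The sketch below is not the patented circuit; it is a hedged software model in which operands are dict slots, and the function name `execute_with_prefix` and its parameters are assumptions introduced for illustration.

```python
import operator

def execute_with_prefix(state, operands, unselected, op=operator.add):
    """Apply `op` to the two selected operands; store into the unselected one.

    `operands` is a (first, second, third) tuple of slot names, and
    `unselected` is the index (0, 1, or 2) of the operand that is not
    used as a source. All names here are illustrative.
    """
    if unselected not in (0, 1, 2):
        raise ValueError("unselected must be 0, 1, or 2")
    a, b = (name for i, name in enumerate(operands) if i != unselected)
    state[operands[unselected]] = op(state[a], state[b])
    return state

# Third operand unselected: it becomes the destination.
state = {"op1": 7, "op2": 5, "op3": 0}
execute_with_prefix(state, ("op1", "op2", "op3"), unselected=2)

# First operand unselected: the third operand acts as a source instead.
state2 = execute_with_prefix({"op1": 7, "op2": 5, "op3": 3},
                             ("op1", "op2", "op3"), unselected=0)
print(state, state2)
```

Leaving the third operand unselected corresponds to a prefix that names the destination; leaving the first operand unselected corresponds to a prefix that supplies a source.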

[0021]

[Embodiments of the Invention] FIG. 1 diagrammatically shows a three-operand instruction 10. The encoding of instruction 10 is divided into four parts: a prefix 12, an opcode 14, a first operand 16, and a second operand 18. As described below, instruction 10 may include additional information, including other types of prefixes, but these are not shown in FIG. 1 for simplicity. Including prefix 12 in the format of instruction 10 yields a total of three operands in a single instruction: operand 16, operand 18, and a third operand associated with prefix 12. Turning to these operands, the following discussion first addresses operands 16 and 18, which follow opcode 14, and then the third operand associated with prefix 12.
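The four-part layout of instruction 10 maps naturally onto a small record type. The following is a sketch only: the class and field names are assumptions chosen to mirror the reference numerals, and no byte-level encoding is implied.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Instruction:
    """Software stand-in for instruction 10 of FIG. 1 (field names are illustrative)."""
    opcode: str                            # opcode 14
    operand1: str                          # first operand 16
    operand2: str                          # second operand 18
    prefix_operand: Optional[str] = None   # third operand carried by prefix 12

    def operand_count(self) -> int:
        """Three operands when the third-operand prefix is present, else two."""
        return 3 if self.prefix_operand is not None else 2

two_op = Instruction("ADD", "op1", "op2")
three_op = Instruction("ADD", "op1", "op2", prefix_operand="op3")
print(two_op.operand_count(), three_op.operand_count())
```

Making the prefix operand optional reflects that the same opcode and operand fields serve both the two-operand and three-operand forms.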

[0022] In one embodiment, operands 16 and 18 are preferably source operands and are specified in the same manner as in the known X86 instruction set. As used herein, the term "operand" alone may denote either a source operand or a destination operand. A source operand is an operand from which data is retrieved to perform an operation, whereas a destination operand is an operand in which data is stored as a result of the operation. Further, a source operand may either represent the actual data to be operated on (that is, immediate data) or specify a processor resource (for example, a register or memory) that stores the actual data. Thus, in an example where operands 16 and 18 are established in the same manner as in the X86 instruction set, operand 16 may be both a source operand and a destination operand (as with instruction (1) above), whereas operand 18 is typically limited to being a source operand. Given the above, consider some examples. First, for a two-operand ADD immediate instruction, operand 16 is both a source operand and a destination operand and specifies a processor resource, whereas operand 18 is immediate data that is applied to the data stored in the resource specified by operand 16. As another example, for a two-operand ADD register-to-register instruction, operand 16 is again both a source operand and a destination operand: as a source, operand 16 specifies the resource storing the data to be added to the data in the resource specified by source operand 18, and as a destination, operand 16 specifies the resource that will store the sum. Still further, although operands 16 and 18 are shown in a general sense in FIG. 1, these operands may be specified using known variations such as a ModRM byte and/or an SIB (scale, index, base) byte, together with one or more displacement bytes.

[0023] Turning to prefix 12, it encodes sufficient information to designate the third operand. Note that prefix 12 may be two or more bytes long. In one embodiment, the third operand contained in prefix 12 is a destination operand, and the following examples therefore concern a prefix containing a destination operand, to aid understanding. As explained below, however, an alternative embodiment in which the operand contained in prefix 12 is a source operand will also be understood by one skilled in the art; such an operand may specify immediate data or a processor resource, as described above. When the prefix operand is a source operand, the processor resource is identified either directly or by some alternative scheme, such as one based on the ModRM and/or SIB mechanisms mentioned above. In either case, as detailed below, the decoding and execution of instruction 10 allow a three-operand operation to be carried out with a single instruction.

[0024] FIG. 2 shows a flow chart 20 of a method embodiment of the present invention, which operates in response to an instruction such as instruction 10 of FIG. 1. Step 22 receives into the processor pipeline an instruction from an instruction sequence arranged in the typical sequential fashion. For example, step 22 receives instruction 10 of FIG. 1, or instead receives some other type of instruction that has no prefix or has a prefix intended for some other operation. In either case, the timing of receipt is governed by the order of the program code, although, as is known in the art, instruction fetching, decoding, execution, and other steps may occur concurrently or even out of order. For example, several instructions may in fact be fetched at once. As another example, in superscalar operation certain instructions may execute concurrently or even out of their sequential order. Step 22 therefore simply represents that an instruction is somehow retrieved and then analyzed according to the subsequent steps.

[0025] Step 24 determines whether the instruction received in step 22 includes an instruction prefix. Step 24 preferably occurs during the predecode stage of operation and may be accomplished using known decoding techniques. Indeed, one advantage of this embodiment is that various instruction sets, including the X86 instruction set, already include prefixes used for purposes other than specifying a third operand. Microprocessors based on these instruction sets should therefore already include sufficient hardware to determine whether a prefix is present and to decode it when one is encountered. One skilled in the art can thus further modify such a system to decode the instruction prefix described herein with little additional hardware. Returning to step 24, if the determination is negative, flow returns to step 22 to analyze the next sequential instruction. Note that the return to step 22 is shown for the purposes of this embodiment, that is, for responding to three-operand prefix instructions. In practice, many other steps or methods may apply to instructions that do not include an instruction prefix, but those need not be detailed here.

[0026] Returning to step 24, if the determination is affirmative, flow proceeds to step 26. Step 26 analyzes the prefix to determine its type. If the instruction prefix specifies a third operand, flow proceeds to step 28. If, on the other hand, the instruction prefix is of a type other than a third-operand prefix, flow proceeds to step 30.

[0027] Step 30 simply operates in response to other known prefixes. For example, in the X86 architecture, an instruction prefix such as REP, REPE/REPZ, REPNE/REPNZ, or LOCK may be encountered at step 26, in which case flow proceeds to step 30 and to other steps (not shown) that respond to prefixes known in the art. Again, because these types of instruction prefixes are not three-operand prefixes, and because the handling of such prefixes is known in the art, the methods involved need not be detailed here. In any event, once the instruction has been completed according to the prior art, flow returns from step 30 to step 22 to process the next received instruction.

[0028] Returning to step 26, if the determination at that step is affirmative, flow proceeds to step 28. Step 28 completes the third-operand prefix instruction according to the additional information provided by the third-operand prefix. To give an example of completing such an instruction, and to show the clear difference from the prior art, recall the prior art example above, in which two instructions were required to preserve the original operands and ultimately store in a third operand the result of an operation on the first and second operands.

[0029]

[Equation 8] MOV operand 3, operand 1    Instruction (2)

[0030]

[Equation 9] ADD operand 3, operand 2    Instruction (3)

[0031] In the present embodiment, however, instruction 10 of FIG. 1 would take the form shown by the following instruction (4).

[0032]

[Equation 10] operand 3, ADD operand 1, operand 2    Instruction (4)

[0033] Thus, with instruction (4) as a formatted instance of instruction 10 of FIG. 1, prefix 12 contains operand 3, which specifies the destination resource that stores the result of the opcode operation, whereas operands 16 and 18 are source operands. In the current example, therefore, prefix 12 specifies operand 3, which receives the sum of the data represented by operands 1 and 2. Again, operand 2 is either immediate data or an operand specifying a processor resource, whereas operand 1 preferably specifies a processor resource. To accomplish this operation, the operand stage accesses operands 1 and 2 to retrieve the source operands, and the execution unit is controlled differently from a prior art write-back operation. In particular, instead of writing the result of the addition to operand 1 as the destination operand, the result is stored in the resource identified by the prefix operand. Thus, had instruction (3) not included "operand 3," the appropriate execution unit would add operands 1 and 2 and store the result in operand 1 (or, if operand 1 were a register and register renaming were in use, in the renamed register). In sharp contrast, instruction (3) is completed such that the store unit writes the result to operand 3 rather than to operand 1. Moreover, one skilled in the art will appreciate that, because prefix decode architecture already exists and store circuitry is already available, a third operand can be included with relatively little additional hardware where the prior art would operate on only two operands. With only a slight increase in complexity, therefore, instruction (4) can be executed and accomplished more efficiently, and the instruction count is reduced by 50% compared with the prior art (from two instructions to one).
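The redirected write-back can be modelled in a few lines. In this hedged sketch, a Python `int` stands for immediate data and a `str` names a processor resource; the function names and the `prefix_dest` parameter are assumptions introduced for illustration, not terms from the specification.

```python
def read_source(state, operand):
    """An int is immediate data; a str names a processor resource."""
    return operand if isinstance(operand, int) else state[operand]

def execute_add(state, op1, op2, prefix_dest=None):
    """ADD whose sources are always op1 and op2.

    Without a prefix the sum is written back to op1 (two-operand behaviour).
    With a third-operand prefix the sum goes to `prefix_dest` instead.
    Returns the name of the resource that was written.
    """
    total = read_source(state, op1) + read_source(state, op2)
    dest = op1 if prefix_dest is None else prefix_dest
    state[dest] = total
    return dest

plain = {"op1": 7, "op2": 5, "op3": 0}
execute_add(plain, "op1", "op2")                         # op1 is overwritten

prefixed = {"op1": 7, "op2": 5, "op3": 0}
execute_add(prefixed, "op1", "op2", prefix_dest="op3")   # op1 is preserved

with_imm = {"op1": 7, "op3": 0}
execute_add(with_imm, "op1", 5, prefix_dest="op3")       # operand 2 as immediate
print(plain, prefixed, with_imm)
```

The only difference between the two forms is the choice of destination at write-back, which is why the text expects the hardware cost to be small.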

[0034] Note that the embodiment above contemplates an instruction prefix that specifies the destination operand accessed during write-back, while the two source operands follow the opcode, as in the prior art, and are accessed during the opcode stage. This preferred embodiment requires only minimal architectural changes for a machine already configured to handle two-operand instructions. Further, in an alternative embodiment, instruction 10 of FIG. 1 may include an additional bit, or may be subject to a designation external to the instruction (such as a flag), such that if the designation is set in a certain way the third-operand prefix is ignored; in that case the prefix is disregarded and the instruction operates with two-operand semantics as in the prior art. Still further, although the preferred embodiment has an instruction prefix that identifies a destination operand, one skilled in the art could modify it to provide an alternative embodiment in which the instruction prefix specifies a source operand instead of the destination operand, and one of the operands following the instruction opcode specifies the destination operand. In such an alternative embodiment, with the prefix operand as a source operand, the prefix operand may either contain immediate data or specify a processor resource that stores data for the instruction. During the operand stage, then, one of the two source operands is accessed via the instruction prefix operand, and the subsequent write-back stage stores the result in one of the two operands located after the opcode within the instruction.
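The two alternatives in this paragraph (ignoring the prefix, and using the prefix operand as a source) can be sketched side by side with the preferred behaviour. Everything named here is an assumption for illustration; in particular, the text does not say which opcode operand becomes the destination in the source-prefix variant, so this sketch arbitrarily uses the first.

```python
def execute_add_alt(state, op1, op2, prefix_op,
                    ignore_prefix=False, prefix_is_source=False):
    """Model three interpretations of a prefixed ADD (names are illustrative).

    ignore_prefix    -> two-operand semantics: op1 <- op1 + op2
    prefix_is_source -> op1 <- prefix_op + op2 (assumed choice of destination)
    default          -> preferred embodiment: prefix_op <- op1 + op2
    """
    def value(x):
        # An int is immediate data; a str names a processor resource.
        return x if isinstance(x, int) else state[x]

    if ignore_prefix:
        state[op1] = value(op1) + value(op2)
    elif prefix_is_source:
        state[op1] = value(prefix_op) + value(op2)
    else:
        state[prefix_op] = value(op1) + value(op2)
    return state

base = {"op1": 7, "op2": 5, "op3": 100}
ignored = execute_add_alt(dict(base), "op1", "op2", "op3", ignore_prefix=True)
as_source = execute_add_alt(dict(base), "op1", "op2", "op3", prefix_is_source=True)
preferred = execute_add_alt(dict(base), "op1", "op2", "op3")
print(ignored, as_source, preferred)
```

Passing an `int` as `prefix_op` together with `prefix_is_source=True` models a prefix that carries immediate data.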

[0035] Note also that the embodiments described above yield at least two instruction formats that have not previously existed in X86 microprocessors. For example, consider the following schematic designations of two instructions.

[0036]

[Equation 11] memory ← (register) operand (immediate data)    Instruction (5)

[0037]

[Equation 12] memory ← (register) operand (register)    Instruction (6)

[0038] Instruction (5) schematically shows that, under the present embodiment, a memory location is specified as the destination operand in the instruction prefix and receives the result of an operation between a first value in a register and a second value that is immediate data. Under the prior X86 instruction set, by contrast, the result of the operation would be stored in the same register that is the source operand (or in an appropriate renamed register), and a second instruction would be required to copy the contents of that register to the memory location. Similarly, instruction (6) schematically shows that, under the present embodiment, a memory location is specified as the destination operand in the instruction prefix and receives the result of an operation between a first value in a register and a second value in a different register. Again, the prior art would ordinarily require two instructions to accomplish this operation.
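The formats of instructions (5) and (6) can be illustrated with separate register and memory models. This is a sketch under assumptions: the register names, the addresses, and the helper `op_to_memory` are invented here, and the operation is passed in as an ordinary Python function.

```python
import operator

def op_to_memory(regs, mem, address, op, src_reg, src2):
    """mem[address] <- regs[src_reg] op src2, leaving the source registers intact.

    `src2` is immediate data when it is an int (instruction (5)) and a
    register name when it is a str (instruction (6)). Names are illustrative.
    """
    second = src2 if isinstance(src2, int) else regs[src2]
    mem[address] = op(regs[src_reg], second)

regs = {"r1": 7, "r2": 5}
mem = {0x100: 0, 0x104: 0}
op_to_memory(regs, mem, 0x100, operator.add, "r1", 3)     # form of instruction (5)
op_to_memory(regs, mem, 0x104, operator.sub, "r1", "r2")  # form of instruction (6)
print(regs, mem)
```

In both cases the result lands in memory in one step and the source registers keep their values, whereas the two-operand form would first overwrite `r1` and then need a separate store.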

[0039] Having described the above embodiments, FIG. 3 shows a block diagram of a microprocessor embodiment that can incorporate them. Referring to FIG. 3, an exemplary data processing system 102 is described, including an exemplary superscalar pipelined microprocessor 110 in which the preferred embodiment is implemented. It is of course contemplated that the present embodiment may be used in microprocessors of various architectures, so the architectures of system 102 and of microprocessor 110 are described here only by way of example. It is therefore believed that one of ordinary skill in the art, having reference to this specification, can readily implement the present embodiment in such other microprocessor architectures.

[0040] Microprocessor 110, as shown in FIG. 3, is connected to other system devices by way of bus B. Although bus B is shown as a single bus in this example, it is of course contemplated that bus B may represent multiple buses having different speeds and protocols, as is known in conventional computers utilizing the PCI local bus architecture; single bus B is shown here merely by way of example and for simplicity. System 102 includes such conventional subsystems as communication ports 103 (including modem ports, modems, network interfaces, and the like), graphics display system 104 (including video memory, a video processor, and a graphics monitor), main memory system 105, which is typically implemented by dynamic random access memory (DRAM) and which includes stack 107, input devices 106 (including a keyboard, a pointing device, and the interface circuitry therefor), and disk system 108 (including hard disk drives, floppy disk drives, and CD-ROM drives). System 102 of FIG. 3 is therefore contemplated to correspond to a conventional desktop computer or workstation, as are now common in the art. Of course, as those of ordinary skill in the art will recognize, other system implementations of microprocessor 110 can also benefit from the present embodiments.

[0041] Microprocessor 110 includes bus interface unit (hereinafter, BIU) 112, which is connected to bus B and which controls and effects communication between microprocessor 110 and the other elements in system 102. BIU 112 includes the appropriate control and clock circuitry to perform this function, including write buffers for increasing the speed of operation and timing circuitry for synchronizing internal microprocessor operation with the timing constraints of bus B. Microprocessor 110 also includes clock generation and control circuitry 120 which, in this exemplary microprocessor 110, generates internal clock phases based upon the bus clock from bus B; the frequency of the internal clock phases, in this example, may be selectively programmed as a multiple of the frequency of the bus clock.

[0042] As is evident in FIG. 3, microprocessor 110 has three levels of internal cache memory, the highest of which, level 2 cache 114, is connected to BIU 112. In this example, level 2 cache 114 is a unified cache, and is configured to receive all cacheable data and cacheable instructions via BIU 112, so that much of the bus traffic presented by microprocessor 110 is accomplished via level 2 cache 114. Of course, microprocessor 110 may also effect bus traffic that bypasses cache 114, by treating certain bus reads and writes as "non-cacheable". As shown in FIG. 3, level 2 cache 114 is connected to two level 1 caches 116: level 1 data cache 116d is dedicated to data, while level 1 instruction cache 116i is dedicated to instructions. Power consumption by microprocessor 110 is minimized by accessing level 2 cache 114 only in the event of a cache miss in the appropriate one of the level 1 caches 116. Furthermore, on the data side, microcache 118 is provided as a level 0 cache, which in this example is a fully dual-ported cache.
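The data-side lookup order described above (microcache, then level 1 data cache, then level 2 cache, with the level 2 cache touched only on a level 1 miss) can be sketched as follows. This is a simplified model under assumptions that are not in the specification: the `CacheLevel` class, the policy of filling every level that missed, and the absence of eviction, line sizes, and dual porting are all illustrative choices.

```python
class CacheLevel:
    """One cache level modeled as an unbounded address-to-value map (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}
        self.accesses = 0  # how many times this level was looked up

    def lookup(self, addr):
        self.accesses += 1
        return self.lines.get(addr)  # None means a miss

    def fill(self, addr, value):
        self.lines[addr] = value


def load(addr, levels, main_memory):
    """Search the levels in order; a lower level is accessed only if the one above it missed."""
    missed = []
    for level in levels:
        value = level.lookup(addr)
        if value is not None:
            break
        missed.append(level)
    else:
        # Missed in every cache level: fetch from main memory over the bus.
        value = main_memory[addr]
    for level in missed:
        level.fill(addr, value)  # assumed fill policy: populate every level that missed
    return value
```

With `levels = [microcache, l1_data, l2]`, a second load of the same address hits in the microcache and never reaches the level 1 or level 2 lookups, which mirrors the power-saving behavior described for level 2 cache 114.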

[0043] As shown in FIG. 3 and as noted above, microprocessor 110 is of the superscalar type. In this example, multiple execution units are provided within microprocessor 110, allowing up to four instructions to be executed simultaneously in parallel for a single instruction pointer entry. These execution units include two ALUs 142₀ and 142₁ for processing conditional branch, integer, and logical operations, FPU 130, two load-store units 140₀ and 140₁, and microsequencer 148. The two load-store units 140 utilize the two ports to microcache 118, for true parallel access thereto, and also perform load and store operations to registers in register file 139. Data micro address translation buffer (data μTLB) 138 is provided to translate logical addresses into physical addresses in the conventional manner.

[0044] These multiple execution units are controlled by way of a multiple seven-stage pipeline. The stages are as follows.

[0045]
F Fetch: This stage generates the instruction address and reads the instruction from the instruction cache or instruction memory.
PD0 Predecode stage 0: This stage determines the length and starting position of up to three fetched X86-type instructions.
PD1 Predecode stage 1: This stage extracts the X86 instruction bytes and places them into a fixed-length format for decoding.
DC Decode: This stage translates the X86 instructions into atomic operations (hereinafter, AOps).
SC Schedule: This stage assigns up to four AOps to the appropriate execution units.
OP Operand: This stage retrieves the register operands indicated by the AOps.
EX Execute: This stage runs the execution units according to the AOps and the retrieved operands.
WB Write-back: This stage stores the results of the execution in registers or in memory.
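The progression of work through the stages listed above can be illustrated with a small timing sketch. It assumes an idealized pipeline in which each fetch group advances one stage per cycle with no stalls; the function names and the notion of a "fetch group" are illustrative assumptions, not terms from the specification.

```python
# Stage abbreviations in the order listed in the specification.
STAGES = ["F", "PD0", "PD1", "DC", "SC", "OP", "EX", "WB"]


def stage_at(issue_cycle, cycle):
    """Stage occupied at `cycle` by work that entered F at `issue_cycle` (no stalls assumed)."""
    idx = cycle - issue_cycle
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def timeline(num_groups):
    """One row per fetch group and one column per cycle; '-' marks cycles outside the pipeline."""
    total_cycles = num_groups + len(STAGES) - 1
    return [
        [stage_at(group, cycle) or "-" for cycle in range(total_cycles)]
        for group in range(num_groups)
    ]


if __name__ == "__main__":
    for row in timeline(3):
        print(" ".join(f"{s:>3}" for s in row))
```

Each row is shifted one cycle to the right of the previous one, which shows how successive fetch groups overlap in different stages at the same time.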

[0046] Referring again to FIG. 3, the pipeline stages noted above are performed by various functional blocks within microprocessor 110. Fetch unit 126 generates instruction addresses from the instruction pointer by way of instruction micro address translation buffer (instruction μTLB) 122, which translates the logical instruction address into a physical address in the conventional manner for application to level 1 instruction cache 116i. Instruction cache 116i produces a stream of instruction data to fetch unit 126, which in turn provides the instruction code to the predecode stages in the desired sequence. Speculative execution is primarily controlled by fetch unit 126, in a manner described in further detail below.

[0047] Predecoding of the instructions is divided into two parts in microprocessor 110, namely predecode 0 stage 128 and predecode 1 stage 132. These two stages operate as separate pipeline stages, and together they locate up to three X86 instructions and supply them to decode unit 134. As such, the predecode stage of the pipeline in microprocessor 110 is three instructions wide. As noted above, predecode 0 stage 128 determines the size and position of up to three X86 instructions (which, of course, are of variable length), and as such consists of three instruction recognizers; predecode 1 stage 132 places the multi-byte instructions into a fixed-length format to facilitate decoding.
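The task of predecode 0 stage 128, finding the starting position and length of up to three variable-length instructions, can be sketched as below. This is a sequential software analogy only: the hardware uses three instruction recognizers, and the `length_of` callback is a hypothetical stand-in for real X86 length decoding, which is far more involved.

```python
def predecode0(byte_stream, start, length_of, width=3):
    """Return (start, length) pairs for up to `width` instructions beginning at `start`.

    `length_of(byte_stream, pos)` is a hypothetical callback that returns the
    length in bytes of the instruction starting at `pos`.
    """
    found = []
    pos = start
    while len(found) < width and pos < len(byte_stream):
        n = length_of(byte_stream, pos)
        if n <= 0 or pos + n > len(byte_stream):
            break  # incomplete or unrecognized instruction: stop at this boundary
        found.append((pos, n))
        pos += n
    return found


def toy_length(byte_stream, pos):
    """Toy encoding for illustration only: the first byte of an instruction is its length."""
    return byte_stream[pos]
```

Calling `predecode0(stream, 0, toy_length)` yields at most three boundaries per call; a later call starting at the end of the last boundary continues through the stream.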

[0048] Decode unit 134, in this example, contains four instruction decoders, each of which is capable of receiving a fixed-length X86 instruction from predecode 1 stage 132 and producing from one to three AOps. AOps are substantially equivalent to RISC instructions. Three of the four decoders operate in parallel, placing up to nine AOps into the decode queue at the output of decode unit 134 to await scheduling. The fourth decoder is reserved for special cases. Scheduler 136 reads up to four AOps from the decode queue at the output of decode unit 134, and assigns these AOps to the appropriate execution units. In addition, operand unit 144 receives and prepares the operands for execution. As shown in FIG. 3, operand unit 144 receives an input from microsequencer 148 and microcode ROM 146, via multiplexer 145, and fetches the register operands for use in the execution of the instructions. In addition, according to this example, the operand unit performs operand forwarding to send results that are ready to be stored to registers, and also performs address generation for AOps of the load and store type.
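The throughput figures in this paragraph (three decoders in parallel, one to three AOps each, up to nine AOps queued, up to four AOps scheduled) can be made concrete with a short sketch. The queue representation, the `aop_count` callback, and the per-cycle function boundaries are illustrative assumptions; the fourth, special-case decoder is not modeled.

```python
from collections import deque

MAX_AOPS_PER_DECODER = 3  # each decoder produces one to three AOps
PARALLEL_DECODERS = 3     # three of the four decoders operate in parallel
SCHEDULE_WIDTH = 4        # the scheduler reads up to four AOps


def decode_cycle(instructions, queue, aop_count):
    """Decode up to three instructions, appending their AOps to the decode queue.

    `aop_count(insn)` is a hypothetical callback giving the number of AOps
    (1 to 3) that an instruction expands to. Returns the undecoded remainder.
    """
    batch = instructions[:PARALLEL_DECODERS]
    rest = instructions[PARALLEL_DECODERS:]
    for insn in batch:
        n = aop_count(insn)
        if not 1 <= n <= MAX_AOPS_PER_DECODER:
            raise ValueError(f"{insn!r} would need {n} AOps; not handled in this sketch")
        for i in range(n):
            queue.append((insn, i))  # an AOp is modeled as (instruction, index)
    return rest


def schedule_cycle(queue):
    """Remove and return up to four AOps from the head of the decode queue."""
    issued = []
    while queue and len(issued) < SCHEDULE_WIDTH:
        issued.append(queue.popleft())
    return issued
```

In the worst case of three instructions expanding to three AOps each, one decode cycle adds nine entries while one schedule cycle removes only four, which is why a queue sits between the two stages.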

[0049] Microsequencer 148, in combination with microcode ROM 146, controls ALUs 142 and load-store units 140 in the execution of microcode entry AOps, which are generally the last AOps to execute in a cycle. In this example, microsequencer 148 sequences through the microinstructions stored in microcode ROM 146 to effect this control for those microcoded microinstructions. Examples of microcoded microinstructions include, for microprocessor 110, complex or rarely used X86 instructions, X86 instructions that modify segment or control registers, the handling of exceptions and interrupts, and multi-cycle instructions (such as REP instructions, and instructions that PUSH and POP all registers).

[0050] Microprocessor 110 also includes circuitry 124 for controlling the operation of JTAG scan testing and of certain built-in self-test functions, ensuring the validity of the operation of microprocessor 110 upon completion of manufacturing and upon resets and other events.

[0051] As one skilled in the art will appreciate from the description of FIG. 3 as well as from the descriptions of the preceding figures, circuit embodiments that accomplish the functions described in connection with FIGS. 1 and 2 may be incorporated into components like those shown in FIG. 3. For example, decoding of an instruction may be performed using predecode stages 128 and 132 as well as decode unit 134. As another example, execution may be accomplished using any of the various execution units, such as ALUs 142₀ and 142₁. As yet another example, the result of execution may be stored in any of a number of different storage elements, such as register file 139 or main memory system 105. Various related functions may further be performed by the appropriate circuitry shown in FIG. 3.

【００５２】[0052]

[Effect of the Invention] As can be appreciated from the above description, the embodiments described above significantly improve upon the prior art while keeping any increase in design complexity to a minimum. Instruction processing is reduced by 50% in various cases, since one instruction accomplishes what previously required two. Further, various alternative embodiments have been described above, such as those in which the operand specified in the instruction prefix is a destination operand or, alternatively, a source operand. Still other examples will be readily ascertainable by one skilled in the art. For example, while the above embodiments benefit the X86 architecture, other microprocessors may benefit as well. Accordingly, while the present embodiments have been described in detail, various substitutions, modifications, or alternatives could be made to the above description without departing from the scope of the invention, which is defined by the appended claims.

[0053] With respect to the above description, the following items are further disclosed.

[0054] (1) A method of operating a processor to process a plurality of instructions, comprising the steps of: receiving an instruction from the plurality of instructions, the instruction comprising a first operand and a second operand; determining whether the received instruction comprises an operand prefix, the operand prefix identifying a third operand; in response to determining that the received instruction comprises the operand prefix, executing the received instruction to generate a result, using two operands selected from the first operand, the second operand, and the third operand such that one of the first operand, the second operand, and the third operand is not selected; and storing the result in the unselected operand.
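The steps of item (1) can be traced in a short sketch. It is a minimal model under stated assumptions: operands are represented as named locations in a single dictionary, the `Instruction` fields and the `unselected` selector are hypothetical, and the no-prefix path assumes the conventional two-operand behavior in which the result overwrites the first operand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Instruction:
    op: Callable[[int, int], int]          # the operation, e.g. add or subtract
    first: str                             # first operand (a named location)
    second: str                            # second operand (a named location)
    prefix_operand: Optional[str] = None   # third operand; present only with an operand prefix
    unselected: str = "third"              # which operand receives the result when prefixed


def execute(insn: Instruction, state: Dict[str, int]) -> str:
    """Execute one received instruction against `state`; return the location written."""
    # Determining step: does the instruction carry an operand prefix?
    if insn.prefix_operand is None:
        # Assumed two-operand behavior: the result overwrites the first operand.
        state[insn.first] = insn.op(state[insn.first], state[insn.second])
        return insn.first

    # Executing step: use two of the three operands, leaving one unselected.
    operands = {"first": insn.first, "second": insn.second, "third": insn.prefix_operand}
    dest = operands.pop(insn.unselected)
    a, b = operands.values()  # the two selected operands, in their original order
    result = insn.op(state[a], state[b])

    # Storing step: the result goes to the unselected operand.
    state[dest] = result
    return dest
```

With the default `unselected="third"`, the prefix operand acts as a destination and both original operands keep their values; setting `unselected="first"` models the alternative in which the prefix operand serves as a source.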

[0055] (2) The method of item 1, wherein the third operand comprises a destination operand, and wherein the storing step comprises storing the result in the destination operand.

[0056] (3) The method of item 1, wherein the third operand comprises a non-register destination operand.

[0057] (4) The method of item 3, wherein the non-register destination operand comprises a memory destination operand.

[0058] (5) The method of item 3, wherein the first operand comprises a register operand and the second operand comprises a register operand.

[0059] (6) The method of item 3, wherein the first operand comprises a register operand and the second operand comprises an immediate data value.

[0060] (7) The method of item 3, wherein the instruction further comprises an opcode and at least one offset byte following the non-register destination operand and preceding the opcode.

[0061] (8) The method of item 1, wherein the instruction comprising the operand prefix is an X86 instruction.

[0062] (9) The method of item 1, wherein execution of the instruction that comprises the operand prefix, absent the operand prefix, comprises execution of a two-operand instruction.

[0063] (10) A method of operating a processor to process a plurality of X86 instructions, comprising the steps of: receiving an X86 instruction from the plurality of instructions, the X86 instruction comprising a first operand and a second operand; determining whether the received X86 instruction comprises an operand prefix, the operand prefix identifying a third operand as a destination operand; in response to determining that the received instruction comprises the operand prefix, executing the instruction using the first operand and the second operand to generate a result; and storing the result in the destination operand.

[0064] (11) The method of item 10, wherein execution of the X86 instruction that comprises the operand prefix, absent the operand prefix, comprises execution of a two-operand instruction.

[0065] (12) A method, circuit, and system for processing a plurality of instructions. In method embodiment 20, the method receives an instruction from the plurality of instructions (22). Next, the method determines whether the received instruction comprises an operand prefix identifying a third operand (24, 26). In response to determining that the received instruction 10 comprises an operand prefix 12, the method executes the instruction to generate a result, using two operands selected from a first operand, a second operand, and the third operand such that one of the first operand, the second operand, and the third operand is not selected (28). The method then stores the result in the unselected operand. Other circuits, systems, and methods are also disclosed and claimed.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating a three-operand instruction having an operand prefix, an opcode, and two operands following the opcode.

FIG. 2 is a flow chart of a preferred embodiment of the method of the present invention for detecting and processing three-operand instructions.

FIG. 3 is a functional block diagram of an exemplary data processing system in which a preferred embodiment of the method of the present invention is implemented.

[Explanation of symbols]

10 three-operand instruction
12 prefix
14 opcode
16 first operand
17 second operand
102 data processing system
110 superscalar pipelined microprocessor
112 bus interface unit (BIU)
114 level 2 cache
116d level 1 data cache
116i level 1 instruction cache
118 microcache
120 clock generation and control circuitry
122 instruction micro address translation buffer (instruction μTLB)
124 JTAG scan test and built-in self-test control circuitry
126 fetch unit
128 predecode 0 stage
130 floating-point unit
132 predecode 1 stage
134 decode unit
136 scheduler
138 data micro address translation buffer (data μTLB)
139 register file
140₀, 140₁ load-store units
142₀, 142₁ integer arithmetic logic units (ALUs)
144 operand unit
145 multiplexer
146 microcode ROM
148 microsequencer

Claims

[Claims]

1. A method of operating a processor to process a plurality of instructions, comprising the steps of: receiving an instruction from the plurality of instructions, the instruction comprising a first operand and a second operand; determining whether the received instruction comprises an operand prefix, the operand prefix identifying a third operand; in response to determining that the received instruction comprises the operand prefix, executing the received instruction to generate a result, using two operands selected from the first operand, the second operand, and the third operand such that one of the first operand, the second operand, and the third operand is not selected; and storing the result in the unselected operand.