TWI235331B - Method and apparatus for maintaining status coherency between queue-separated functional units - Google Patents
- Publication number: TWI235331B (TW92129613A)
- Authority: TW (Taiwan)
Landscapes
- Advance Control (AREA)
Abstract
Description
Case No. 92129613

V. Description of the Invention

This application claims priority to application No. 10/279,213.

[Technical Field of the Invention]

[0002] The present invention relates to the field of pipelined microprocessors, and in particular to pipelined microprocessors having multiple functional units.

[Prior Art]

[0003] Modern microprocessors typically have multiple functional units, such as an integer unit (IU), a floating-point unit (FPU), and a vector arithmetic unit such as an MMX unit (MXU), which execute integer, floating-point, and multimedia instructions, respectively. Each functional unit is a pipeline with multiple stages. As an instruction or operation passes through the pipeline, each stage performs a portion of that instruction or operation.

[0004] Because floating-point and multimedia instructions typically involve longer arithmetic calculations, the FPU and MXU usually need more clock cycles than the IU to execute an instruction. In some situations this stalls the IU pipeline, for example when the FPU or MXU is not ready to receive another instruction or operation. In addition, because the data cache stalls whenever the FPU or MXU is not ready to receive the data to be delivered, the longer execution time of the FPU and MXU also makes the data cache operate inefficiently. To address these problems, an instruction and data queue can be added at the FPU or MXU to receive instructions and their associated data, so that the data cache can keep operating.

[0005] Microprocessors incorporate the concept of a user-visible state of the microprocessor. For example, in a given microprocessor architecture the user-visible state includes the user-visible register file, which comprises general-purpose registers (such as the EAX register), registers associated with functional units (such as the floating-point registers), and other registers such as the flags register.

[0006] An instruction must have reached a defined completion point before it may update the user-visible state, such as the register file. This is because certain events or conditions can occur that invalidate the instruction. That is, certain events or conditions can occur such that the processor should stop executing the instruction and, in particular, should not update the user-visible state of the processor. One example is a branch misprediction. If the processor later determines that a branch was mispredicted, then even though the instructions speculatively executed after the branch have completed partial execution in the stages of the functional unit pipelines, they must be invalidated and not allowed to update the user-visible state of the processor. Another example of an invalidating event is an exception, such as a page fault exception, a general protection exception, or an invalid opcode exception. In addition, an instruction may already be invalid when it first enters the pipeline. The most common cause is a miss in the instruction cache, which leaves pipeline stages unfilled with valid instructions and produces a stall, or bubble.

[0007] When a functional unit such as the FPU is ready to complete execution of an instruction, the FPU must update the user-visible state of the processor according to the particular instruction executed. To update the user-visible state, the FPU must know that the instruction is still valid, that is, that the instruction is permitted to update the user-visible state of the processor. To ensure that the instruction is still valid, conventional processors place a queue at the end of the functional unit that performs the instruction validation function.
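The invalidation rule described in [0006] and [0007] can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Slot` class, the function names, and the instruction labels are all hypothetical, and the pipeline is simplified to a list ordered oldest first.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str           # hypothetical instruction label, for illustration only
    valid: bool = True  # cleared when the instruction is invalidated


def flush_after_branch(pipeline, branch_index):
    """Invalidate every instruction younger than a mispredicted branch.

    `pipeline` is ordered oldest first, so younger instructions sit at
    higher indices than the branch.
    """
    for slot in pipeline[branch_index + 1:]:
        slot.valid = False


def may_update_state(slot, at_completion_point):
    # Only a still-valid instruction at its completion point may commit.
    return slot.valid and at_completion_point


pipeline = [Slot("add"), Slot("branch"), Slot("mul"), Slot("store")]
flush_after_branch(pipeline, branch_index=1)
committed = [s.name for s in pipeline if may_update_state(s, True)]
print(committed)
```

In this sketch the two instructions that follow the branch lose their valid bit and are filtered out before they can commit, which is the behavior the paragraphs above require of speculatively executed instructions.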
[0008] One of the functional units performs the instruction validation function. That is, invalidating conditions (such as branch mispredictions or exceptions) are reported to the integer pipeline 104, which determines from the reported conditions whether an instruction or operation is valid. The conventional microprocessor 100 of FIG. 1 places the queue 106 at the end of the integer pipeline 104. An instruction therefore passes through the integer pipeline 104 before it enters the queue 106. Placing the queue 106 at the end of the integer pipeline 104 ensures that no invalidating event can occur after the instruction enters the queue: once an instruction reaches the end of the pipeline, it is guaranteed to be valid.

[0009] However, placing the queue at the end of the functional unit that performs the validation function has a drawback. Requiring an instruction to proceed to the end of the validating functional unit's pipeline before entering the queue means that the instruction must first pass through every stage of that pipeline, including stages it would not otherwise need to pass through, which adds delay. That is, a functional unit might be able to receive an instruction and begin executing it at a stage well before the end of the validating functional unit's pipeline. For example, the data cache may already have supplied the data needed by another functional unit, such as the FPU, at an intermediate stage of the validating functional unit's pipeline. The clock cycles needed to pass through the remaining stages of the validating functional unit's pipeline are then unnecessary delay.

[0010] One example in which the additional delay is a problem is an MXU that provides the integer multiply function for the integer unit. Because the MXU has an integer multiplier for executing MMX multiply instructions, the integer unit's own integer multiplier can be removed to reduce the size of the microprocessor circuit, and integer multiply instructions can instead be executed by the MXU integer multiplier. However, because integer multiply instructions are quite common in program instruction sequences, the additional delay that integer multiplies incur when the MXU queue is placed at the end of the integer unit pipeline may be unacceptable.

[0011] However, if the MXU queue is architecturally placed at a stage after which an instruction may still be invalidated, then once the instruction enters the MXU queue the MXU cannot be certain that the instruction is still valid. That is, because the MXU queue is located before the end of the integer pipeline, an invalidating condition may occur while the instruction is in the MXU queue, or while the MXU is executing the instruction after receiving it from the queue. Moreover, instructions no longer proceed through the IU and MXU pipelines in lockstep. Hence, because the integer unit may invalidate the instruction during any delay in the MXU queue, the MXU does not know whether it may update the user-visible state.

[0012] Therefore, what is needed is a mechanism that maintains instruction status coherency between functional units whose stages are misaligned by a queue.

[Summary of the Invention]

[0013] The present invention provides an apparatus for tracking the age of the instructions or operations in a functional unit's instruction queue, regardless of their position in the queue. That is, the functional unit keeps track at all times of the corresponding IU pipeline stage in which each instruction resides. In addition, the functional unit keeps a valid bit for each instruction in the queue. If the IU notifies the functional unit that an instruction has been invalidated, the functional unit updates the valid bit accordingly. If the instruction completes in the functional unit and its age shows that it has passed the end of the IU pipeline and is still valid, the functional unit may update the user-visible state of the machine. Furthermore, if the instruction has not yet completed in the functional unit, and its age shows that it has passed the end of the IU pipeline and it is still valid, then the functional unit […].

[0014] Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide an instruction queue in a microprocessor. The instruction queue includes a first plurality of storage elements, each storing an instruction to be executed by a first functional unit. The instruction is also stored in one of a plurality of pipeline stages of a second functional unit. The instruction queue also includes a second plurality of storage elements, each storing […] a valid bit for the instruction stored in the corresponding one of the first plurality of storage elements. The valid bit indicates whether the instruction is valid.

[0015] In another aspect, it is a feature of the present invention to provide an apparatus in a microprocessor for maintaining instruction status coherency between two instruction pipelines that operate asynchronously because an instruction queue separates them. The instruction queue has N entries for storing N instructions. The apparatus includes N logic elements corresponding to the N entries of the instruction queue. Each logic element includes an age register that stores an age indicating in which stage of a first of the two pipelines the corresponding one of the N instructions is also stored. Each logic element also includes a valid register that stores a valid bit for that instruction. Each logic element also includes a multiplexer that selects, based on the age signal, one of a plurality of valid-bit signals to provide to the valid register. The valid-bit signals indicate whether the instructions stored in the corresponding plurality of stages of the first pipeline are valid.
[0016] In another aspect, it is a feature of the present invention to provide a microprocessor. The microprocessor includes a first instruction pipeline comprising a plurality of stages for storing instructions. The microprocessor also includes a second instruction pipeline, coupled to the first instruction pipeline, that receives a first portion of the instructions from the first instruction pipeline and executes them. The microprocessor also includes an instruction queue that stores a second portion of the first portion of the instructions until the second instruction pipeline is ready to execute the second portion. The microprocessor also includes control logic, coupled to the instruction queue, that stores a present state and a valid bit for each instruction of the second portion. The present state indicates in which of the first instruction pipeline stages the instruction is stored.
[0017] In another aspect, it is a feature of the present invention to provide a method for maintaining instruction status coherency between functional units in a microprocessor whose stages are misaligned due to the presence of a queue. The method includes storing an instruction in a pipeline stage of a first functional unit, storing a first valid bit for the instruction in that pipeline stage, and storing the instruction in a queue of a second functional unit until the second functional unit is ready to execute the instruction. The method also includes storing a second valid bit for the instruction in the queue, and storing an age of the instruction in the queue. The age indicates in which pipeline stage of the first functional unit the instruction is stored. The method also includes receiving a signal indicating […], and updating the age and the second valid bit in response to the first valid bit and to receipt of the signal.

[0018] In another aspect, it is a feature of the present invention to provide an instruction queue in a microprocessor. The instruction queue includes a first plurality of storage elements, each storing an instruction to be executed by a first functional unit. The instruction is also stored in […]. The instruction queue also includes a second plurality of storage elements, coupled to the first plurality of storage elements, each storing the age of the instruction stored in the corresponding one of the first plurality of storage elements. The age specifies in which of the pipeline stages of the second functional unit […].

[0019] An advantage of the present invention is that it avoids the delay caused by placing the queue at the end of the validating functional unit's pipeline. Instead, it allows the queue to be placed at an earlier pipeline stage while still ensuring that instructions execute correctly.

[0020] Other features and advantages of the present invention will become more apparent upon study of the remaining description and the accompanying drawings.

[Detailed Description]

[0026] Referring now to FIG. 2, a block diagram of a microprocessor 200 according to the present invention is shown. The microprocessor 200 includes an integer pipeline 202, a data cache 204, an MXU pipeline 206, an MXU data queue 208, and an MXU instruction queue 212.

[0027] The integer pipeline 202 comprises a plurality of stages connected together, including an R-stage 221, A-stage 222, D-stage 223, G-stage 224, E-stage 225, S-stage 226, and W-stage 227. The R-stage 221 includes a register file for storing data such as instruction operands, address generation operands, processor control and status information, flags, a stack pointer, segment registers, and an instruction pointer or program counter. The A-stage 222 includes an address generator for generating memory addresses. The D-stage 223 and G-stage 224 are data stages for loading data from memory and the data cache 204. Data is delivered from the data cache 204 to the G-stage 224. The E-stage 225 includes execution units, such as arithmetic logic units that perform integer arithmetic or logical operations. The S-stage 226 includes logic for storing instruction results to memory and the data cache 204. The W-stage 227 includes logic for writing instruction results back to the R-stage 221. That is, the W-stage 227 updates the user-visible state of the microprocessor 200.
The W-stage 227 also retires instructions and is the last stage of the integer pipeline 202. In addition, the W-stage 227 provides operand forwarding, sending results to the G-stage 224, E-stage 225, and S-stage 226 of the integer pipeline 202.

[0028] The R-stage 221 receives instructions 276 from other stages of the integer pipeline 202 that are not shown in the figure, such as the instruction fetch and decode stages. An instruction 276 proceeds down the stages of the integer pipeline 202 until it reaches the last stage, the W-stage 227. When an instruction is decoded, it is also sent to the other appropriate functional units according to its type. In particular, MMX instructions are sent to the MXU pipeline 206. In one embodiment, floating-point instructions are sent to a floating-point functional unit.

[0029] The MXU pipeline 206 comprises a plurality of stages connected together, similar to and for the most part corresponding to those of the integer pipeline 202. The MXU pipeline 206 includes an R-stage 261, R2-stage 262, A-stage 263, D-stage 264, G-stage 265, E-stage 266, S-stage 267, W-stage 268, and M-stage 269. In one embodiment, the MXU pipeline 206 stages whose names correspond to integer pipeline 202 stages perform similar functions. In particular, the E-stage 266 includes execution units, such as arithmetic logic units, for executing multimedia instructions.

[0030] The R2-stage 262 is an additional register stage that gives the data cache 204 one clock cycle of delay in which to deliver data to the MXU pipeline 206. Because of the R2-stage 262, the MXU pipeline 206 is shifted down by one stage relative to the integer pipeline 202. Hence, the D-stage 264 of the MXU pipeline 206 corresponds to the G-stage 224 of the integer pipeline 202. The M-stage 269 performs the result write-back function that updates the user-visible state of the microprocessor 200, similar to the W-stage 227 of the integer pipeline 202. In addition, the M-stage 269 provides operand forwarding, sending results to the G-stage 265, E-stage 266, or S-stage 267 of the MXU pipeline 206. When an instruction reaches the M-stage 269, the M-stage 269 determines whether to update the user-visible state of the microprocessor 200 or forward operands, based on whether the instruction is valid and on which stage of the integer pipeline 202 the instruction has reached, or whether it has already been retired from the integer pipeline 202. The validity and the stage reached are recorded for MXU instructions as described in detail below with respect to the remaining figures.

[0031] The same stall conditions that apply to the R-stage 221 through D-stage 223 of the integer pipeline 202 also apply to the R-stage 261 through A-stage 263 of the MXU pipeline 206. Hence, an instruction that has reached the D-stage 264 of the MXU pipeline 206 has simultaneously reached the G-stage 224 of the integer pipeline 202. However, stalls and movement of instructions within the MXU instruction queue 212 and the D-stage 264 through M-stage 269 of the MXU pipeline 206 are governed by a different set of conditions than those governing the G-stage 224 through W-stage 227 of the integer pipeline 202. That is, the MXU instruction queue 212 and the D-stage 264 through M-stage 269 of the MXU pipeline 206 operate asynchronously with respect to the G-stage 224 through W-stage 227 of the integer pipeline 202.

[0032] The R-stage 261 of the MXU pipeline 206 also selectively receives instructions 276 from the instruction fetch and decode stages of the integer pipeline 202.
Thus, if an instruction 276 is an MMX instruction, once it has been fetched and decoded it proceeds through the integer pipeline 202 and also down the stages of the MXU pipeline 206, until it reaches the last stage of the MXU pipeline 206 (the M-stage 269) and the last stage of the integer pipeline 202. Depending on whether certain conditions exist (as described with respect to FIG. 3), the instruction 276 may also pass through the MXU instruction queue 212 on its way to the end of the MXU pipeline 206.

[0033] The MXU data queue 208 is coupled to the data cache 204 by a data bus 274. The MXU data queue 208 comprises a plurality of storage elements, referred to as queue entries, for storing data received from the data cache 204 on the data bus 274. In the embodiment of FIG. 2, the MXU data queue 208 comprises five entries. The MXU data queue 208 supplies data from its bottom entry to the G-stage 265 of the MXU pipeline 206.

[0034] Architecturally, the MXU instruction queue 212 is located in the D-stage 264 of the MXU pipeline 206. The MXU instruction queue 212 comprises a plurality of storage elements, referred to as queue entries, for storing instructions received from the D-stage 264. In the embodiment of FIG. 2, the MXU instruction queue 212 comprises five entries, denoted QD0 240, QD1 241, QD2 242, QD3 243, and QD4 244. QD0 240 is the bottom entry of the MXU instruction queue 212 and QD4 244 is the top entry. That is, when the MXU instruction queue 212 is full, QD0 240 is at the head of the queue and holds the oldest instruction, while QD4 244 is at the tail and holds the newest instruction. An instruction entering the MXU instruction queue 212 goes into the first empty entry nearest the bottom, or head, of the queue. For example, if instructions occupy QD0 240 and QD1 241 and QD2 242 is the next empty entry, an incoming instruction is stored in QD2 242. If the MXU instruction queue 212 is completely empty, the instruction is stored in QD0 240.
[0035] The D-stage 264 of the MXU pipeline 206 also includes a two-input multiplexer 214. The first input of the multiplexer 214 receives an instruction directly from the D-stage 264. The second input of the multiplexer 214 receives an instruction from QD0 240, the bottom entry of the MXU instruction queue 212. The multiplexer 214 selects the instruction on one of its two inputs and outputs it to the G-stage 265 of the MXU pipeline 206. When an instruction reaches the D-stage 264, if the instruction is valid, the MXU instruction queue 212 is empty, and the MXU pipeline 206 is moving (that is, not stalled), the multiplexer 214 selects the first input and sends the instruction directly to the G-stage 265, bypassing the MXU instruction queue 212. However, if the MXU instruction queue 212 is not empty or the MXU pipeline 206 is stalled, the multiplexer 214 selects the second input and sends the instruction in QD0 240 to the G-stage 265, until the MXU instruction queue 212 becomes empty.
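The select rule for multiplexer 214 in [0035] reduces to a small predicate. The sketch below is a hypothetical Python rendering: the function name and the string return values are illustrative only. The text does not say which input is chosen for an invalid instruction arriving at an empty, unstalled queue, so falling through to the queue input in that case is an assumption.

```python
def mux214_select(instr_valid, queue_empty, pipeline_stalled):
    """Return which input the D-stage multiplexer forwards to the G-stage."""
    if instr_valid and queue_empty and not pipeline_stalled:
        return "bypass"  # first input: instruction straight from the D-stage
    return "queue"       # second input: instruction held in QD0


# Valid instruction, empty queue, pipeline moving: bypass the queue.
fast_path = mux214_select(True, True, False)
# Queue not empty: drain QD0 first.
backlog = mux214_select(True, False, False)
# Pipeline stalled: the arriving instruction cannot go straight through.
stalled = mux214_select(True, True, True)
```

The bypass path is what removes the queue's latency in the common case where the MXU pipeline is keeping up.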
[0036] Referring now to FIG. 3, a block diagram of logic 300 for controlling the MXU instruction queue register 212 of FIG. 2 according to the present invention is shown. Control logic 300 includes four multiplexers (denoted multiplexer 1 302, multiplexer 2 304, multiplexer 3 306, and multiplexer 4 316), an age register 312, a valid register 308, and other associated logic. For each instruction stored in an entry of the MXU instruction queue register 212, control logic 300 tracks the instruction's age and a valid status bit. The age and valid bit are stored in the age register 312 and the valid register 308, respectively. In the embodiment of FIG. 3, the age register 312 holds two bits and the valid register 308 holds one bit.

[0037] In FIG. 3, the age of an instruction is denoted "PS", for present state. The age indicates the stage of the integer pipeline 202 in which the instruction currently resides. That is, age values correspond to stage positions in the integer pipeline 202 as follows:

00 = E-stage 225 of the integer pipeline 202
01 = S-stage 226 of the integer pipeline 202
10 = W-stage 227 of the integer pipeline 202
11 = beyond the W-stage 227 of the integer pipeline 202

[0038] Thus, once an instruction reaches an age of 11, if its valid bit is still set, the MXU knows that the instruction will complete and that the MXU may update the user-visible processor state. In FIG. 3, "NS" denotes the next stage of the integer pipeline 202.

[0039] Each entry of the MXU instruction queue register 212 has its own copy of the control logic 300 of FIG. 3. That is, the five-entry queue of FIG. 2 is accompanied by five sets of control logic 300. In the queue arrangement, the five sets of control logic 300 are coupled together so that the outputs of the control logic 300 associated with one entry of the MXU instruction queue register 212 become inputs to the control logic 300 of the next entry. In FIG. 3, "X" denotes a given entry of the MXU instruction queue register 212, and "X+1" denotes the next higher, or newer, entry. Thus, PS(0) is the age of the oldest, or lowest, entry in the queue (namely QD0 240 of FIG. 2).

[0040] Control logic 300 includes a multiplexer denoted multiplexer 1 302. Multiplexer 1 302 has three pairs of inputs. The first pair is Val(X) 344 and Val(X+1) 342. The second pair is PS(X) 354 and PS(X+1) 352. The third pair is NS(X) 364 and NS(X+1) 362.

[0041] Signal Val(X) 344 is the output of multiplexer 4 316 and indicates whether the instruction stored in entry X of the MXU instruction queue register 212 is currently valid. Signal Val(X+1) 342 is the output of the multiplexer 4 316 of entry X+1 and indicates whether the instruction stored in entry X+1 is currently valid.
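The two-bit age encoding of paragraph [0037] and the completion condition of paragraph [0038] can be written down compactly. The following is a minimal sketch; the enum `Age` and the function `may_update_user_visible_state` are hypothetical names introduced only for illustration.

```python
from enum import IntEnum


class Age(IntEnum):
    """Two-bit age (PS) values from paragraph [0037]."""
    E_STAGE = 0b00
    S_STAGE = 0b01
    W_STAGE = 0b10
    PAST_W = 0b11


def may_update_user_visible_state(age: Age, valid: bool) -> bool:
    """True once the instruction is past the W-stage with its valid bit set."""
    return age == Age.PAST_W and valid
```

An instruction still in the W-stage (age 10) does not yet qualify, because an invalidating event can still occur there.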
[0042] Signal PS(X) 354 indicates the present age of the instruction stored in entry X of the MXU instruction queue register 212, which is held in that entry's age register 312. That is, PS(X) 354 indicates which stage of the integer pipeline 202 holds the instruction that is also stored in entry X. Signal PS(X+1) 352 indicates the present age of the instruction stored in entry X+1, which is held in that entry's age register 312.

[0043] Control logic 300 also includes logic 322, which generates signal NS(X) 364 from PS(X) 354 and signal LdX_P 376, as shown in truth Table 1 of FIG. 4. LdX_P 376 is true, or active, if an instruction is initially being loaded into entry X of the MXU instruction queue register 212. Signal NS(X) 364 indicates the integer pipeline 202 stage that follows the stage currently holding the instruction stored in entry X. Likewise, signal NS(X+1) 362 indicates the stage that follows the one currently holding the instruction stored in entry X+1. As Table 1 of FIG. 4 shows, if the instruction is initially being loaded into the MXU instruction queue register 212, NS(X) 364 is 00, corresponding to the E-stage 225 of the integer pipeline 202. Otherwise, NS(X) 364 is determined from PS(X) 354 and HldX_P 372, as shown in Table 1 of FIG. 4.

[0044] Referring again to FIG. 3, multiplexer 1 302 selects one input from each of its three input pairs according to a select input, HldX_P 372. HldX_P 372 indicates whether the entries of the MXU instruction queue register 212 are to shift down. When an instruction is to shift down in the queue, for example because an instruction is being removed from the MXU instruction queue register 212, HldX_P 372 becomes inactive. An inactive HldX_P 372 causes multiplexer 1 302 to select Val(X+1) 342, PS(X+1) 352, and NS(X+1) 362 from the next higher entry. An active HldX_P 372 causes multiplexer 1 302 to hold Val(X) 344, PS(X) 354, and NS(X) 364 from the current entry. Multiplexer 1 302 provides the selected next-stage value on output signal NS 392, the selected present-state value on output signal PS 394, and the selected valid bit on output signal Val 396.
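The hold-or-shift behavior of multiplexer 1 302 in paragraph [0044] amounts to choosing between two status triples. The sketch below assumes a software representation that is not in the patent: the `Status` tuple and the `mux1` function are hypothetical names.

```python
from typing import NamedTuple


class Status(NamedTuple):
    val: int  # valid bit
    ps: int   # present age (two bits)
    ns: int   # next age (two bits)


def mux1(hld_x_p: bool, entry_x: Status, entry_x_plus_1: Status) -> Status:
    """Hold entry X's values when HldX_P is active, else take entry X+1's."""
    return entry_x if hld_x_p else entry_x_plus_1
```

The three fields of the returned tuple correspond to the Val 396, PS 394, and NS 392 outputs; all three pairs share the single select input, so they always switch together.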
[0045] Control logic 300 also includes a 3:1 multiplexer coupled to multiplexer 1 302, denoted multiplexer 2 304. Multiplexer 2 304 updates the age of the instruction in entry X to its proper value. It receives three instruction status values, each comprising one valid bit and two age bits, and selects one of them for output. The first status value comprises the PS output 394 and the Val output 396 of multiplexer 1 302; that is, the age selected by multiplexer 1 302 from PS(X) 354 and PS(X+1) 352, and the valid bit selected by multiplexer 1 302 from Val(X) 344 and Val(X+1) 342. The second status value comprises the NS output 392 and the Val output 396 of multiplexer 1 302; that is, the age selected from NS(X) 364 and NS(X+1) 362, and the valid bit selected from Val(X) 344 and Val(X+1) 342. The third status value is 000, that is, a valid bit of 0 and an age of 00, which specifies the E-stage 225 of the integer pipeline 202.

[0046] Multiplexer 2 304 selects one of the three instruction status values according to a two-bit select input, age_update 382. Logic 322 generates age_update 382 from signal PS 394, reset signal 374, signal LdX_P 376, and signal Gate_A 378 according to the equations of Table 2 below. In Table 2, PS[0] and PS[1] are the two bits of the PS 394 output of multiplexer 1 302.

age_update[1] = LdX_P | reset;
age_update[0] = Gate_A | PS[0] | PS[1];

Table 2
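The Table 2 equations translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the mapping from the two select bits to the three inputs (bit 1 set selects the constant 000, otherwise bit 0 set selects the next-state input, otherwise the present-state input) is an assumption drawn from the selection rule the specification gives in paragraph [0048].

```python
from typing import Tuple


def age_update(ld_x_p: bool, reset: bool, gate_a: bool, ps: int) -> int:
    """Return the two-bit age_update select per Table 2 (bit 1 is the MSB)."""
    bit1 = int(ld_x_p or reset)
    bit0 = int(gate_a or bool(ps & 0b01) or bool(ps & 0b10))
    return (bit1 << 1) | bit0


def mux2(select: int, val: int, ps: int, ns: int) -> Tuple[int, int]:
    """Pick (valid, age) among the three status inputs of multiplexer 2.

    Assumed decoding of the select code; see the lead-in above.
    """
    if select & 0b10:
        return (0, 0b00)   # third input: the constant 000
    if select & 0b01:
        return (val, ns)   # second input: advance to the next stage
    return (val, ps)       # first input: stalled, keep the present age
```

Note that `age_update[0]` is forced to 1 whenever the age is nonzero, which reflects that only an instruction still in the E-stage (age 00) can be held back by a stall.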
[0047] A true value on LdX_P 376 indicates that entry X of the MXU instruction queue register 212 is being loaded with an instruction from the D-stage 264 rather than with an instruction already in the MXU instruction queue register 212. A true value on reset signal 374 indicates that the MXU instruction queue register 212 is being reset. A true value on Gate_A 378 indicates that the integer pipeline 202 is not stalled. In the embodiment of FIG. 3, Gate_A 378 only indicates that the stages above the S-stage 226 of the integer pipeline 202 are not stalled. That is, in the embodiment of FIG. 3 the S-stage 226 and W-stage 227 never stall, so once an instruction reaches the S-stage 226 it is guaranteed to age on the next clock cycle, that is, to proceed to the W-stage 227 of the integer pipeline 202. Likewise, once an instruction reaches the W-stage 227 it is guaranteed to retire. Although a control signal may be derived from Gate_A 378, stalling or advancing of the MXU pipeline 206 is governed by its own control signals rather than by Gate_A 378.
[0048] If a reset occurs, or if an instruction is being loaded from the D-stage 264 of the MXU pipeline 206 into entry X of the MXU instruction queue register 212, the equations of Table 2 direct multiplexer 2 304 to select the third instruction status input. If the instruction is moving to the next integer pipeline 202 stage (that is, if the integer pipeline 202 is not stalled, as indicated by a true Gate_A 378, or if the instruction has at least reached the S-stage 226, as indicated by a PS 394 value of 01, 10, or 11), multiplexer 2 304 selects the second instruction status input (NS 392 and Val 396). Otherwise, the instruction is stalled in the integer pipeline 202, that is, it is not proceeding down the integer pipeline 202, so multiplexer 2 304 selects the first instruction status input (PS 394 and Val 396).

[0049] The age portion 384 of the output of multiplexer 2 304 is provided as the input to the age register 312. The output of the age register 312, signal PS(X) 354, is provided as an input to logic 322. Signal PS(X) 354 is also provided to the next lower entry of the MXU instruction queue register 212, where it becomes PS(X+1) 352 of entry X-1. Likewise, signal NS(X) 364 is provided to the next lower entry, where it becomes NS(X+1) 362 of entry X-1, and signal Val(X) 344 is provided to the next lower entry, where it becomes Val(X+1) 342 of entry X-1. In addition, the Val(0) 344 and PS(0) 354 signals of the lowest entry of the MXU instruction queue register 212 (namely entry QD0 240) are provided to the G-stage 265 of FIG. 2 and proceed down the remaining stages of the MXU pipeline 206. When the instruction reaches the M-stage 269 of the MXU pipeline 206, the M-stage 269 examines the associated status values to determine whether the instruction is valid and which stage of the integer pipeline 202 it is in, and thereby whether to update the user-visible state of the microprocessor 200.

[0050] Control logic 300 also includes a 4:1 multiplexer coupled to multiplexer 2 304, denoted multiplexer 3 306. Multiplexer 3 306 updates the valid bit of the instruction in entry X to its proper value. It receives four valid bit inputs. The first is the Val output 386, which is the valid bit portion of the output of multiplexer 2 304. The other three are the valid bits from the G-stage 224, E-stage 225, and S-stage 226 of the integer pipeline 202, denoted MmxValNxt_G 336, MmxValNxt_E 334, and MmxValNxt_S 332, respectively. The output of multiplexer 3 306 is provided as the input to the valid register 308.

[0051] Multiplexer 3 306 selects one of the four valid bit inputs according to a select input, namely the age portion 384 of the output of multiplexer 2 304. Thus, if the age 384 of the instruction is 00, multiplexer 3 306 selects the valid bit 336 from the G-stage 224 of the integer pipeline 202.
The valid bit is taken from the G-stage 224 because the instruction is being loaded into the MXU instruction queue register 212 from the D-stage 264 of the MXU pipeline 206, which is equivalent to loading it from the G-stage 224 of the integer pipeline 202. That is, because of the R2-stage 262, the MXU pipeline 206 is shifted down one stage relative to the integer pipeline 202, so the D-stage 264 of the MXU pipeline 206 lines up with the G-stage 224 of the integer pipeline 202; hence the valid bit of the instruction in the G-stage 224 is the correct valid bit to load into the valid register 308.

[0052] If the age 384 of the instruction is 01, multiplexer 3 306 selects the valid bit 334 from the E-stage 225 of the integer pipeline 202. If the age 384 is 10, multiplexer 3 306 selects the valid bit 332 from the S-stage 226. Finally, if the age 384 is 11, multiplexer 3 306 selects the valid bit 386 from the output of multiplexer 2 304; that is, the current valid bit value is held. Thus, once the instruction has passed the W-stage 227 of the integer pipeline 202 (that is, has been retired from the W-stage 227), the valid bit value is retained, because no condition or event can occur after that point that would invalidate the instruction.

[0053] Control logic 300 also includes a 2:1 multiplexer coupled to multiplexer 3 306, denoted multiplexer 4 316. Multiplexer 4 316 updates the valid bit if an invalidating condition or event occurs while the instruction is in the W-stage 227. It receives two valid bit inputs. The first input is the output of the valid register 308. The second input is the output of AND gate 314. AND gate 314 is a two-input AND gate.
The first input of AND gate 314 is the output of the valid register 308. The second input is the inverse of the Except_W signal 338, shown as "!Except_W 338" in FIG. 3. A true value on Except_W 338 indicates that an exception occurred while the instruction was in the W-stage 227, invalidating the instruction. Thus, if the instruction was already invalid, or if an invalidating exception occurs while the instruction is in the W-stage 227, AND gate 314 outputs a false value.

[0054] Multiplexer 4 316 selects one of its valid bit inputs according to a select input, which is the output of comparator 318. Comparator 318 receives the age of the instruction from the output of the age register 312 and compares it with the binary value 10, which designates the W-stage 227 as described above. If the age is 10, comparator 318 outputs a true value, causing multiplexer 4 316 to select the output of AND gate 314. Otherwise, comparator 318 outputs a false value, causing multiplexer 4 316 to select the output of the valid register 308. The output of multiplexer 4 316 is the Val(X) signal 344, which indicates the current valid bit of the instruction in entry X of the MXU instruction queue register 212.

[0055] In the manner just described, multiplexer 3 306 and multiplexer 4 316 ensure that the most recent valid bit value of the instruction is maintained.
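The valid-bit path through multiplexer 3 306, the valid register 308, AND gate 314, comparator 318, and multiplexer 4 316 can be sketched as two small functions. This is a behavioral illustration under assumptions, not the circuit itself: the function and parameter names are hypothetical, and registers are modeled as plain integer arguments.

```python
def mux3(new_age: int, val_from_mux2: int,
         valid_g: int, valid_e: int, valid_s: int) -> int:
    """Choose the valid bit to latch, keyed by the age leaving multiplexer 2."""
    if new_age == 0b00:
        return valid_g        # MmxValNxt_G 336
    if new_age == 0b01:
        return valid_e        # MmxValNxt_E 334
    if new_age == 0b10:
        return valid_s        # MmxValNxt_S 332
    return val_from_mux2      # age 11: hold the current valid bit


def val_x(stored_age: int, stored_valid: int, except_w: bool) -> int:
    """Model comparator 318, AND gate 314, and multiplexer 4 producing Val(X)."""
    in_w_stage = stored_age == 0b10                    # comparator 318
    and_out = stored_valid & (0 if except_w else 1)    # AND gate 314
    return and_out if in_w_stage else stored_valid     # multiplexer 4 316
```

In this model an exception only affects Val(X) while the stored age is 10; at any other age the stored valid bit passes through unchanged.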
The valid bit is kept current by obtaining the valid bits 332, 334, and 336 from the integer pipeline 202, since the microprocessor 200 updates the valid bit of an instruction in the integer pipeline 202 if any invalidating condition or event occurs as the instruction proceeds down the integer pipeline 202; by invalidating the instruction if an exception occurs while it is in the W-stage 227; or, once the instruction has passed the W-stage 227 of the integer pipeline 202, by retaining the valid bit value.

[0056] Referring now to FIG. 5, an example of the operation of the microprocessor 200 of FIG. 2 according to the present invention is shown. FIG. 5 shows an initial state of the MXU instruction queue register 212. FIG. 5 further shows the operation of the MXU instruction queue register 212 during the next clock cycle, referred to as clock 2, as instructions proceed down the integer pipeline 202 and the MXU instruction queue register 212 of FIG. 2 given the initial conditions and other events.

[0057] During clock 1, FIG. 5 shows an instruction denoted "instr A" stored in entry 3 (QD3 243) of the MXU instruction queue register 212. During clock 1, instr A is in the W-stage 227 of the integer pipeline 202. Hence the age of instr A stored in the age register 312 of FIG. 3 for entry 3 is 10; that is, the PS(3) signal 354 of FIG. 3 has the value 10, as shown in FIG. 5. Consequently, logic 322 of FIG. 3 generates an NS(3) value of 11, as shown in FIG. 5. Furthermore, instr A is valid during clock 1. Hence the value stored in the valid register 308 is true and the Val(3) signal 344 is also true, as shown in FIG. 5.
[0058] During clock 2, the Ld2_P signal 376 is false, as shown, because instr A is not being loaded into the MXU instruction queue register 212; that is, instr A is already present in the queue. Also during clock 2, the Gate_A signal 378 is true, as shown, because instr A is proceeding down the integer pipeline 202; that is, the integer pipeline 202 is not stalled. Also during clock 2, the Hld2_P signal 372 is false, as shown, indicating that the instruction stored in entry 3 of the MXU instruction queue register 212 will shift down to entry 2 because the bottom entry of the queue has been removed. Finally, during clock 2, the Except_W signal 338 is true, as shown, indicating that an event occurred that invalidates instr A.

[0059] Given these initial conditions and events, the control logic 300 of FIG. 3 associated with entries 2 and 3 of the MXU instruction queue register 212 operates as follows during clock 2. Because PS(3) 354 has the value 10, comparator 318 outputs a true value, causing multiplexer 4 316 of entry 3 to select the output of AND gate 314, which is 0 because an invalidating exception occurred while instr A was in the W-stage 227. Hence, during clock 2, the Val(3) signal 344 is false, indicating that instr A is invalid.

[0060] Because Hld2_P 372 is false (indicating that the MXU instruction queue register 212 is shifting down), multiplexer 1 302 of entry 2 selects the "X+1" values, namely PS(3) 354, NS(3) 364, and Val(3) 344, whose values are 10, 11, and 0, respectively. Because instr A is proceeding down the integer pipeline 202, as indicated by the true Gate_A 378, multiplexer 2 304 of entry 2 selects the NS output 392 of multiplexer 1 302. Hence, at the end of clock 2, the new age of instr A stored in the age register 312 of entry 2 is 11, indicating that instr A has passed the W-stage 227 of the integer pipeline 202. Because the age portion 384 of the output of multiplexer 2 304 of entry 2 is 11, as just described, multiplexer 3 306 of entry 2 selects the Val output 386 of multiplexer 2 304. The value of Val 386 input to multiplexer 3 306 of entry 2 is 0, because Val(3) 342 is 0, as described above, and multiplexer 1 302 and multiplexer 2 304 of entry 2 select Val(3) 342 as the Val 386 input to multiplexer 3 306. Hence, at the end of clock 2, the new valid bit stored in the valid register 308 is 0, indicating that instr A is now invalid, which tells the MXU pipeline 206 not to update the user-visible program state of the microprocessor 200 for instr A.
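The clock-2 walk-through of FIG. 5 can be replayed numerically. The script below is a hedged sketch: the variable names are hypothetical, the decoding of the Table 2 select bits is assumed (load or reset picks the constant 000, otherwise advance, otherwise hold), and the integer-pipeline valid bits, which the text does not give for this scenario, are placeholders that the age-11 case never reads.

```python
# Clock-1 state of entry 3 (instr A), as given in paragraph [0057].
ps3, ns3, valid_reg3 = 0b10, 0b11, 1

# Clock-2 control signals from paragraph [0058].
ld2_p, gate_a, hld2_p, except_w, reset = False, True, False, True, False

# Entry 3: comparator 318 sees age 10, so multiplexer 4 takes AND gate 314.
val3 = (valid_reg3 & int(not except_w)) if ps3 == 0b10 else valid_reg3

# Entry 2: multiplexer 1 shifts in the X+1 values because Hld2_P is false.
# Entry 2's own prior contents are not given, so placeholders stand in.
val_sel, ps_sel, ns_sel = (0, 0, 0) if hld2_p else (val3, ps3, ns3)

# Table 2 select bits, then multiplexer 2 (assumed decoding, see lead-in).
bit1 = ld2_p or reset
bit0 = gate_a or bool(ps_sel & 0b01) or bool(ps_sel & 0b10)
if bit1:
    val_386, age_384 = 0, 0b00
elif bit0:
    val_386, age_384 = val_sel, ns_sel
else:
    val_386, age_384 = val_sel, ps_sel

# Multiplexer 3: age 11 keeps the valid bit coming out of multiplexer 2.
valid_g = valid_e = valid_s = 1  # placeholders, unused when the age is 11
new_valid = {0b00: valid_g, 0b01: valid_e, 0b10: valid_s}.get(age_384, val_386)

print(f"Val(3)={val3} new age={age_384:02b} new valid={new_valid}")
# prints "Val(3)=0 new age=11 new valid=0"
```

The result agrees with the text: entry 2 ends clock 2 holding age 11 with a valid bit of 0, so the MXU pipeline must not commit instr A.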
[0061] Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention. For example, although the invention has been described with instruction and data queues that are part of an MXU, it is applicable to various other types of functional units, such as a Streaming SIMD Extensions (SSE) unit. Furthermore, although the invention has been described with respect to the user-visible state of an x86 processor, it is applicable to various processors. In addition, although the processor described uses the integer pipeline as the functional unit that performs instruction or operation validation, the invention is also applicable to processors in which other or additional functional units perform validation. Finally, although the invention has been described as maintaining status coherency between an integer pipeline and an MMX pipeline for the purpose of knowing whether and when the MMX pipeline may update the user-visible state, the invention is generally applicable to any status coherency problem involving queue-separated functional units. That is, wherever an asynchronous queue between functional units causes their status to become skewed, the invention may be used to keep the status coherent.

In summary, the foregoing is merely a description of preferred embodiments of the present invention and shall not be used to limit the scope of the invention. All equivalent changes and modifications made in accordance with the claims of the present invention shall remain within the scope covered by this patent. The Examiner's careful review and favorable grant are respectfully requested.
1235331 案號 9212961$ 修正 圖式簡單說明 [Ο Ο 2 1 ]圖1係顯示一習知技術的微處理器,其在整數 管線末端具有一功能單元佇列。 [0 0 2 2 ]圖2係本發明之微處理器的方塊圖。 [0 0 2 3 ]圖3係本發明控制圖2之MXU指令佇列暫存器的 邏輯之方塊圖。 [0 0 2 4 ]圖4係本發明圖3之邏輯產生下個狀態值的真 值表。 [0 0 2 5 ]圖5係本發明圖2之微處理器的運作圖例。 圖號 說明 100, 200 4效處 理 器 102, ,276 指令 104, ,202 整數 管 線 106 佇列 204 資料 快取記憶體 206 MXU管線 208 MXU資料快 取 記憶 體 212 MXU才 旨令佇 列 暫存 器 214 多工 器 221 ,261 R -階 段 222 ,263 A -階 段 223 ,264 D -階 段 224 ,265 G -階 段 225 ,266 E -階 段 ❿1235331 Case No. 9212961 $ Amendment Brief description of the drawing [Ο Ο 2 1] Figure 1 shows a conventional microprocessor, which has a function unit queue at the end of the integer pipeline. [0 0 2 2] FIG. 2 is a block diagram of a microprocessor of the present invention. [0 0 2 3] FIG. 3 is a block diagram of the logic for controlling the MXU instruction queue register of FIG. 2 according to the present invention. [0 0 2 4] FIG. 4 is a truth table of the next state value generated by the logic of FIG. 3 of the present invention. [0 0 2 5] FIG. 5 is an operation diagram of the microprocessor of FIG. 2 according to the present invention. The drawing numbers explain 100, 200 4-effect processors 102, 276, instructions 104,, 202 integer pipeline 106 queue 204 data cache memory 206 MXU pipeline 208 MXU data cache memory 212 MXU order queue queue register 214 Multiplexer 221, 261 R-Phase 222, 263 A-Phase 223, 264 D-Phase 224, 265 G-Phase 225, 266 E-Phase ❿
226, 267: S-stage
227, 268: W-stage
262: R2-stage
269: M-stage
274: data bus
300, 322: logic
302: multiplexer 1
304: multiplexer 2
306: multiplexer 3
308: valid register
312: age register
314: AND gate
316: multiplexer 4
318: comparator
332: MmxValNxt_G
334: MmxValNxt_E
336: MmxValNxt_S
338: Except_W signal
342: Val(X+1)
344: Val(X)
352: PS(X+1)
354: PS(X)
362: NS(X+1)
364: NS(X)
372: HldX
374: reset signal
376: LdX_P
378: Gate_A signal
382: age_update signal
384: age output portion
386: Val output
392: output signal NS
394: output signal PS
396: output signal Val
Claims (1)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/279,213 US6983358B2 (en) | 2001-10-23 | 2002-10-23 | Method and apparatus for maintaining status coherency between queue-separated functional units |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200412538A (en) | 2004-07-16 |
TWI235331B (en) | 2005-07-01 |
Family
ID=36637637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW92129613A TWI235331B (en) | 2002-10-23 | 2003-10-23 | Method and apparatus for maintaining status coherency between queue-separated functional units |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI235331B (en) |
2003-10-23: TW application TW92129613A, patent TWI235331B (en), status: not active (IP right cessation)
Also Published As
Publication number | Publication date |
---|---|
TW200412538A (en) | 2004-07-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MK4A | Expiration of patent term of an invention patent |