TW200406684A - Apparatus and method for masked move to and from flags register in a processor


Publication number
TW200406684A
TW200406684A TW92128964A TW92128964A TW200406684A TW 200406684 A TW200406684 A TW 200406684A TW 92128964 A TW92128964 A TW 92128964A TW 92128964 A TW92128964 A TW 92128964A TW 200406684 A TW200406684 A TW 200406684A
Authority
TW
Taiwan
Prior art keywords
register
eflags
instruction
scope
processor
Prior art date
Application number
TW92128964A
Other languages
Chinese (zh)
Other versions
TWI238943B (en
Inventor
Gerard M Col
Glenn G Henry
Terry Parks
Original Assignee
Ip First Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/279,207 (US7076639B2)
Application filed by Ip First Llc
Publication of TW200406684A
Application granted
Publication of TWI238943B


Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

A method and apparatus are provided for writing to, and reading from, the EFLAGS register in a processor. For a particular request to write to EFLAGS, a mask is generated using destination information and privilege level information for the write. The mask is then ANDed with the new EFLAGS value information, and the result is written to the EFLAGS register in a single instruction cycle. For a particular request to read from EFLAGS, a mask is generated using privilege information for the read to specify those bits of EFLAGS which can be updated during the read. The mask is then ANDed with the contents of the EFLAGS register, and the result is stored in a stack in memory.
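
The read path summarized above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the patented hardware logic: all function and constant names are hypothetical, the choice to clear VM and RF in the stored image follows the PUSHF/PUSHFD behaviour recounted in the description, and the only fault modelled is the virtual-8086 case with IOPL below 3 that the description mentions.

```python
VM = 1 << 17          # virtual-8086 mode flag (bit 17)
RF = 1 << 16          # resume flag (bit 16)
IOPL_SHIFT = 12       # IOPL occupies bits 13:12
ALL_BITS = 0xFFFFFFFF


class GeneralProtectionFault(Exception):
    """Hypothetical stand-in for the processor's general protection fault."""


def make_read_mask(eflags: int) -> int:
    """Build the mask applied to EFLAGS before it is stored on the stack."""
    iopl = (eflags >> IOPL_SHIFT) & 0b11
    if eflags & VM and iopl < 3:
        # Assumed model of the virtual-8086 restriction on PUSHF/PUSHFD.
        raise GeneralProtectionFault("PUSHF in virtual-8086 mode with IOPL < 3")
    # VM and RF are not copied into the stack image.
    return ALL_BITS & ~(VM | RF)


def push_eflags(eflags: int, stack: list) -> None:
    """Masked read: AND the register with the mask and store the result."""
    stack.append(eflags & make_read_mask(eflags))
```

A single AND followed by a store is the whole operation here; in the conventional flow the same work is spread over several microinstructions.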

Description

V. Description of the Invention

[Cross-Reference to Related Applications]

[0001] This application claims priority based on U.S. patent application No. 60/345455, filed 10/23/2001, entitled "APPARATUS AND METHOD FOR MASKED MOVE TO FLAGS REGISTER".

[Technical Field of the Invention]

[0002] The present invention relates to the field of computer instruction execution, and more particularly to an apparatus and method for reducing the number of instruction cycles needed to execute writes to and reads from the EFLAGS register.

[Prior Art]

[0003] In an x86 pipelined microprocessor, executing an instruction that writes to the EFLAGS register (for example POPF/POPFD, CLI/STI, CLD/STD, CLC/STC) requires a considerable number of cycles, because a write to the EFLAGS register is affected by the current I/O privilege level (IOPL) and by the state of certain bits in the EFLAGS register at the time of the write. Under the Microsoft Windows operating system, the EFLAGS register must be popped from the stack each time a subroutine returns, which causes significant operating system delay.

[0004] Accordingly, the present invention provides a microprocessor technique that reduces the delay associated with executing an instruction that writes to the EFLAGS register, such as a pop (POP) instruction.

[0005] Likewise, in an x86 pipelined microprocessor, a push instruction that stores the EFLAGS register onto the stack, namely PUSHF/PUSHFD, also requires a considerable number of cycles, because the state of the bits read from the EFLAGS register and the execution state of the microprocessor are affected by the current I/O privilege level (IOPL) and by the state of specific bits in the EFLAGS register at the time of the push. Under the Microsoft Windows operating system, the EFLAGS register must be pushed onto the stack each time a subroutine is called, which causes significant operating system delay.

[0006] Accordingly, the present invention provides a microprocessor technique that reduces the delay associated with executing an instruction that reads the EFLAGS register and stores it onto the stack, such as a push (PUSH) instruction.

[Summary of the Invention]

[0007] In one embodiment, the present invention provides a method of performing a write operation on a multi-bit flags register in a microprocessor. The method includes receiving, at a translate stage of the microprocessor, a macro instruction that requests a write to the multi-bit flags register. The method also includes generating, with the translate stage, a microinstruction configured to complete the write to the multi-bit flags register in a single write cycle. The method further includes generating a flags mask and ANDing the flags mask with a predetermined operand to produce a result, which is then stored into the multi-bit flags register. In this embodiment, the multi-bit flags register is the EFLAGS register.

[0008] The present invention can complete a write to the EFLAGS register in a single instruction cycle, and thereby effectively reduces processor delay.

[0009] In another embodiment, the present invention provides a method of performing a read operation on a multi-bit flags register. The method includes receiving, at a translate stage of a microprocessor, a macro instruction that requests a read of the multi-bit flags register. The method also includes generating, with the translate stage, a microinstruction configured to complete the read of the multi-bit flags register in a single cycle. A flags mask is generated from privilege information, which indicates the bits of the multi-bit flags register that are eligible to be updated at the current privilege level. The method further includes ANDing the flags mask with a predetermined operand to produce a result, and storing the result in a stack in memory.

[0010] The present invention advantageously reduces the processor delay attributable to EFLAGS read operations, in particular the delay of pushing EFLAGS onto the stack, because such pushes can be completed in a single instruction cycle.

[0011] Other objects and advantages of the present invention will become more apparent from the following description and the accompanying drawings.

[Detailed Description]

[0018] The following description is presented in the context of a particular embodiment and its requirements, to enable one of ordinary skill in the art to make and use the present invention. Various modifications to the preferred embodiment will, however, be apparent to one skilled in the art, and the general principles discussed herein may be applied to other embodiments. Therefore, the present invention is not limited to the particular embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[0019] Refer to FIG. 1, a block diagram of a conventional pipelined microprocessor 100. The microprocessor has a fetch stage 105, a translate stage 110, a register stage 115, an address stage 120, a data/ALU or execute stage 125, and a write back stage 130.

[0020] In operation, the fetch stage 105 fetches macro instructions from memory (not shown) for execution by the microprocessor 100. The translate stage 110 translates the fetched macro instructions into corresponding microinstructions.

[0021] Each microinstruction directs the microprocessor 100 to perform a specific subtask that is part of the overall operation of a fetched macro instruction. The register stage 115 retrieves the operands specified by the microinstruction from a register file for use by subsequent stages of the pipeline. The address stage 120 calculates the memory addresses specified by the microinstruction for use in data storage and retrieval operations. The data/ALU stage 125 either performs arithmetic logic unit (ALU) operations on data retrieved from the register file, or uses the memory address calculated in the address stage 120 to read data from, or write data to, memory. The write back stage 130 writes the result of a data read operation or of an ALU operation to the register file. In summary, the fetch stage 105 fetches macro instructions, the translate stage 110 decodes them into microinstructions, and the translated microinstructions flow through stages 115 to 130 for execution, which constitutes the pipelined operation of the microprocessor 100.

[0022] To aid understanding, the following discussion uses the standard nomenclature of an x86 microprocessor. Those skilled in the art will appreciate, however, that the x86 registers and macro instructions are used only by way of example, and that other microprocessors and architectures could equally serve as examples.

[0023] The data/ALU stage 125 includes an EFLAGS register 132, which holds the state of the processor. The EFLAGS register 132 can be modified by many instructions and serves as the comparison parameter for conditional loops and conditional jumps. Each bit of the EFLAGS register holds the state of a specific parameter of the last instruction. Table 1 below lists the 32 bits of the EFLAGS register and the function of each bit.

Table 1: EFLAGS register

Bit      Name      Function
31:22    Reserved  Tied low
21       ID        ID flag
20       VIP       Virtual interrupt pending
19       VIF       Virtual interrupt flag
18       AC        Alignment check
17       VM        Virtual mode
16       RF        Resume flag
15       0         Tied low
14       NT        Nested task flag
13:12    IOPL      I/O privilege level
11       OF        Overflow flag
10       DF        Direction flag
9        IF        Interrupt enable flag
8        TF        Trap flag
7        SF        Sign flag
6        ZF        Zero flag
5        0         Tied low
4        AF        Auxiliary carry flag
3        0         Tied low
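
The layout in Table 1 can be written down as a small Python lookup, which makes the bit positions easier to check. This is a sketch for illustration only: the dictionary and function names are hypothetical, the entries cover only the flags listed above, and CF at bit 0 is taken from the discussion of the MTEF destination field later in the description.

```python
# Single-bit flags and their positions, as listed in Table 1.
FLAG_BITS = {
    "ID": 21, "VIP": 20, "VIF": 19, "AC": 18, "VM": 17, "RF": 16,
    "NT": 14, "OF": 11, "DF": 10, "IF": 9, "TF": 8, "SF": 7,
    "ZF": 6, "AF": 4,
    "CF": 0,  # bit 0, per the description of destination field D = 0
}
IOPL_SHIFT = 12            # IOPL is the two-bit field at bits 13:12
IOPL_MASK = 0b11 << IOPL_SHIFT


def decode_eflags(value: int) -> dict:
    """Split a 32-bit EFLAGS image into named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["IOPL"] = (value & IOPL_MASK) >> IOPL_SHIFT
    return fields


def flag_mask(*names: str) -> int:
    """Return a mask with the named single-bit flags set."""
    mask = 0
    for name in names:
        mask |= 1 << FLAG_BITS[name]
    return mask
```

For example, `decode_eflags(0x3202)` reports IF as 1 and IOPL as 3, and `flag_mask("VM", "RF")` gives the two bits that a push of EFLAGS leaves out of the stack image.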

[0024] In a present-day pipelined microprocessor such as the processor 100, executing any of the instructions that write to the EFLAGS register (namely POPF/POPFD, CLC/STC, CLD/STD, CLI/STI) requires a considerable number of machine cycles, because a write to the EFLAGS register is affected by the current I/O privilege level (IOPL) and by the state of certain bits in the EFLAGS register at the time of the write. In particular, bits 1, 3, 5, 15, and 22 to 31 are reserved, and their state cannot be changed. In addition, when the processor operates in protected mode at privilege level 0 (or in real-address mode, which is equivalent to privilege level 0), all non-reserved bits can be modified except VIP, VIF, and VM. The VIP and VIF flags must be cleared, and the VM flag must keep its current state.
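
The privilege level 0 rule in paragraph [0024] amounts to a fixed bit mask, which the following Python sketch spells out. The names are hypothetical and the code covers only the privilege level 0 case stated above; it assumes that bits outside the mask keep their current value, and it is not meant to reproduce the processor's actual logic.

```python
ALL_BITS = 0xFFFFFFFF
# Reserved bits: 1, 3, 5, 15 and 22 to 31.
RESERVED = (1 << 1) | (1 << 3) | (1 << 5) | (1 << 15) | (0x3FF << 22)
VIP = 1 << 20
VIF = 1 << 19
VM = 1 << 17


def writable_mask_cpl0() -> int:
    """Bits a pop of EFLAGS may modify at privilege level 0."""
    return ALL_BITS & ~(RESERVED | VIP | VIF | VM)


def pop_eflags_cpl0(current: int, popped: int) -> int:
    """Apply a popped image to EFLAGS under the privilege level 0 rule."""
    mask = writable_mask_cpl0()
    # Assumption: bits outside the mask (reserved bits, VM) keep their state.
    updated = (current & ~mask) | (popped & mask)
    # VIP and VIF must end up cleared.
    return updated & ~(VIP | VIF) & ALL_BITS
```

Popping an all-ones image while VM is set leaves VM set, clears VIP and VIF, and leaves every reserved bit untouched.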

[0025] Executing any of the aforementioned macro instructions that write to the EFLAGS register 132 results in the generation of several microinstructions. Specifically, a first microinstruction is executed to determine the current I/O privilege level (IOPL). Subsequent microinstructions read the current state of certain EFLAGS bits, for example VM, RF, IOPL, VIP, VIF, and IF, and establish the bit states of a new value to be written to EFLAGS. A final microinstruction writes the new value to the EFLAGS register 132.

[0026] This conventional method of updating the EFLAGS register has a significant drawback: a large number of microinstructions must be executed to complete the write. Because several microinstructions must be generated and processed, the update consumes considerable time and lowers microprocessor performance.

[0027] The inventors have observed that under the Microsoft Windows operating system the EFLAGS register must be popped from the stack each time a subroutine returns, and that the same situation arises in many desktop applications in common use today. Because instructions of this type are so widely used, it is highly desirable to minimize their execution time.

[0028] An object of the present invention is to reduce the number of instruction cycles required to execute a write to the EFLAGS register. To this end, the present invention provides an apparatus and method for dynamically generating an EFLAGS mask. The mask is logically ANDed with a specified operand (either the EFLAGS image popped from the stack or a selected EFLAGS bit state), and the result is written to the EFLAGS register. The processor disclosed herein has the following advantage: it uses a new microinstruction, Move To EFLAGS (MTEF), which together with dedicated logic in the execute stage allows a write to EFLAGS to complete within a single instruction cycle.

[0029] Refer now to FIG. 2, a block diagram of a processor 200 that uses a single microinstruction, Move To EFLAGS (MTEF), to complete a write to the EFLAGS register within a single instruction cycle. The processor 200 includes a fetch stage 202, which contains instruction fetch logic 204 coupled to an instruction memory 206. An instruction pointer 208 is coupled to the fetch logic 204 and directs it to the location in the memory 206 from which the current instruction is to be fetched.

第13頁 200406684 五、發明說明(9) [ 0 03 0 ]當該提取邏輯204提取一巨集指令,例如·· POPF/POPFD,CLI/STI,CLC/STC,或CLD/STD指令,轉譯Page 13 200406684 V. Description of the invention (9) [0 03 0] When the extraction logic 204 fetches a macro instruction, such as POPF / POPFD, CLI / STI, CLC / STC, or CLD / STD instruction, translate

階段2 12之轉譯器210即回應產生一MTEF D,s微指令,該 指令將在data/ALU-執行階段2 1 6執行一移動到EFLAGS暫存 為的動作。在该MTEF D,S微指令中,s為一來源攔位,該 來源攔位指示將被轉移至EFLAGS暫存器214的資料來源;D 為一目的攔位,該目的欄位指定在EFLAGS暫存器214之意 圖被寫入的位兀。 [0 03 1 ]此處’在討論該jjTEF微指令之處理之前,將先 时娜4處理器2 0 0其餘的架構。如圖二所示,該% τ e f d,§ 械扣令被送至轉譯指令仔列(X I q ) 2 1 §。然後,該μ τ £ ρ j), S,指令再流至暫存階段222之一MTEF暫存器22〇。暫存階 段222係為儲存該處理器2〇〇的架構狀態。暫存檔案224包 括一ESP架構暫存器226。如圖二所示,該暫存階段222亦 包括一 OP1暫存器228及一 OP2暫存器230。 匕凡[ 0 032 ]該暫存階段222經由定位階段向下耦接至載入 階段2 3 2。該處理器2 〇 〇使用一傳統定位階段2 5 〇,以計算 該處理器200處理該些指令所使用的位址。定位階段222之 MTEF暫存器220的内容被送至並且儲存在載入階段232中對 應之MTEF暫存器234。載入階段232包括載入/調正邏輯 23 6。’該載入/調正邏輯236係耦接至暫存階段222之〇?1暫 =二、8及〇 P 2暫存器2 3 〇。該載入/調正邏輯2 3 6亦耦接至 貧料記憶體238。該載入/調正邏輯236的輸出端係耦接至 OP3暫存器240。如圖二所示,暫存階段222之〇?1暫存器 200406684 五、發明說明(ίο) 228及0P2暫存器230的内容係分別向下傳送至載入階段232 之OP1暫存器242及OP2暫存器244。 [0033] 處理器200更包括一 data/ALU或執行階段216, 該執行階段216包括前述之EFLAGS暫存器214。該執行階段 216亦包括一 TVAL暫存器246及一 TMASK暫存器248,並且在 TVAL暫存器246及TMASK暫存器248的内容係由一及閘250之 及運算結合在一起,該及運算之結果係儲存在EFLAGS暫存 器214。下述之討論將有關於該及運算與該TMASK暫存器 2 48所提供之遮罩運算。執行階段2 1 6亦包括一特權等級暫 存器PR IV 252,該特權等級暫存器係提供現時被^^“〖暫 存器248執行中之指令的特權資訊。該些指令執行的結果 則被送至結果暫存器2 5 4,該些結果並且經由一結果匯流 排(未顯示)被寫入到該暫存器檔案2 2 4。 [0034] 如前所述,為回應由提取邏輯2〇4提供給轉譯 器 210 之一被提取的 p0pF/p〇pFD,CLI/STI,CLX/STC, CLD/STD巨集指令,該轉譯器21〇產生一單一微指令,即 MTEF D,S,並且將該MTEF D,S微指令送至與該轉譯器 21 0耦接之轉譯器佇列(X丨q ) 2丨6,然後再送至與轉譯器佇 列(XIQ)216耦接之暫存階段222。該〇訂微指令包括一來 源欄,及一目的欄位d。該目的欄位係用以指定EFLAGS 暫存器2 1 4之應被寫入的位元。舉例來說,若D = 〇,則該目 的欄位將指定一寫入到進位旗標CF,該進位旗標叮,如表 格所示’係為EFLAGS暫存器之〇位元。在另一例中,d = 9 將指定一寫入到該EFLAGS暫存器之if位元,d = 1〇將指定一In response, the translator 210 of stage 2 12 generates a MTEF D, s microinstruction. This instruction will execute an action of moving to temporary storage of EFLAGS in the data / ALU-executing stage 2 1 6. In the MTEF D, S microinstruction, s is a source stop, and the source stop instruction will be transferred to the data source of the EFLAGS register 214; D is a destination stop, and the purpose field is designated in the EFLAGS temporary The bit in which the memory 214 is intended to be written. 
[0 03 1] Here, before discussing the processing of the jjTEF microinstruction, the rest of the architecture of the processor 4 will be used. As shown in Figure 2, the% τ e f d, § mechanical deduction order is sent to the translation instruction line (X I q) 2 1 §. Then, the μ τ £ ρ j), S, instruction flows to the MTEF register 22, which is one of the temporary storage stages 222. The temporary storage stage 222 is used to store the architecture state of the processor 200. The temporary file 224 includes an ESP architecture register 226. As shown in FIG. 2, the temporary storage stage 222 also includes an OP1 register 228 and an OP2 register 230. Dagger [0 032] The temporary storage stage 222 is coupled down to the loading stage 2 3 2 via the positioning stage. The processor 200 uses a conventional positioning phase 2 50 to calculate the addresses used by the processor 200 to process the instructions. The contents of the MTEF register 220 in the positioning stage 222 are sent to and stored in the corresponding MTEF register 234 in the loading stage 232. The loading phase 232 includes loading / correction logic 23 6. ′ The loading / correction logic 236 is coupled to the temporary storage stage 222-1 = 2, 8 and 0 P 2 register 2 3 〇. The load / adjust logic 2 3 6 is also coupled to the lean memory 238. The output of the load / adjust logic 236 is coupled to the OP3 register 240. As shown in Figure 2, the temporary storage stage 222-1 register 200406684 V. Description of the invention (228) and the 0P2 register 230 are transferred down to the OP1 register 242 of the loading stage 232, respectively. And OP2 register 244. [0033] The processor 200 further includes a data / ALU or execution stage 216. The execution stage 216 includes the aforementioned EFLAGS register 214. 
The execution phase 216 also includes a TVAL register 246 and a TMASK register 248, and the contents of the TVAL register 246 and the TMASK register 248 are combined by a sum operation of a sum gate 250, and The result of the operation is stored in the EFLAGS register 214. The following discussion will be about the sum operation and the mask operation provided by the TMASK register 2 48. The execution stage 2 1 6 also includes a privilege level register PR IV 252. The privilege level register provides the privilege information of the instructions currently being executed by the "register 248". The results of the execution of these instructions are The results are sent to the result register 2 5 4 and the results are written to the register file 2 2 4 via a result bus (not shown). [0034] As mentioned before, the extraction logic is used for the response. 204 is provided to one of the translators 210 and the extracted p0pF / p0pFD, CLI / STI, CLX / STC, CLD / STD macro instructions. The translator 21 generates a single micro instruction, namely MTEF D, S And send the MTEF D, S microinstruction to the translator queue (X 丨 q) 2 丨 6 coupled to the translator 21 0, and then to the translator queue (XIQ) 216 temporarily Storage stage 222. The zero order micro instruction includes a source field, and a field d. The field is used to specify the bits to be written in the EFLAGS register 2 1 4. For example, if D = 〇, the field of this item will specify a write to the carry flag CF, the carry flag bit, as shown in the table 'is EFLAGS register 〇 Element. In another embodiment, d = 9 to specify if a write to the EFLAGS register the bit, d = a specified 1〇

第15頁 200406684 五、發明說明(11) 寫入到該EFLAGS暫存器之DF位元。在該MTEF D,s微指令 中將EFLAGS暫存器214從該堆疊儲存器中挽出之一退堆 疊(pop)的設定為D = 31。 [ 0035 ]該MTEF微指令的來源攔位s則指定被寫入到該 EFLAGS暫存器214之位元的狀態。舉例來說,若〇=1〇,弘 〇,即命令處理器200清除DF。若D = 〇,s——j,即命令處理器 200設定該進位旗標。換言之,在該MTEF微指令中,s = 〇代 表清除該目的位元,而S = 1代表設定該目的位元。對一p〇p EFLAGS指令而言,該指令忽略該S攔位。 [0036]在寫入忒EFLAGS暫存器時,在data/ALU執行階 段m之執行邏輯256㈣為—遮罩以確保只有正確=元 位置被寫A。在執行一MTEF微指令時,即動態的該 TM^K暫,器248的内容。data/ALU執行階段2丨6之執行邏 軏256係從特權暫存器PRIV 252存取該現 =LA晴子器214存取其他位元之狀態。被提供乂 暫存I^TVAL 246的内容,若不是5欄位的數值,即是在 載^階段2 3 2讀取自堆疊儲存器的EFUGS暫存器。如圖二 所不,TVAL與TMASK係以—及閘25〇連結在一起並且該 fj 250運算之結果係被寫入到EFLA])S暫存器214,有利的 :可係組態為只改變某些位元,而該某些位元係為 :作為,PRIV暫存器讀取之該特定現時運算模式之函數 在此具體實施例中,—指令所擁有之最高特權等 气ί最南特*等級之指令*更新該EFLAGS暫存器之 μ二私疋位兀時,擁有最高的容許度。較低特權等級之指 200406684 五、發明說明(12) ---- 令在更新該EFLAGS暫存器的限制較多。最高特權等級為 0曰。在本發明之一具體實施例中,該遮罩所有之位元的數 量等同於該EFLAGS暫存器的位元數量。當一特定遮罩位元 被设立,即是指該遮罩位元在pRIV暫存器252之該現時 權等級之下,係為可被更新,反之,若一特定遮罩位元 被設立,則該遮罩位元為不可被更新。總而言之,口乩 提供被寫入到EFLAGS之該特定數值,而TMASK則根據對鹿 於儲存在PRIV暫存器252之該特定指令的特權等級,以^ 定一寫入動作是否被允許。 、 、[00j7]本發明使得一寫入到該EFLAGS暫存器的指令可 以在單一指令週期内完成,因此得以顯著的增加處理 的產出量。 |王裔 [j038 ]現請參閱圖三,其顯示一流程圖,用以描述微 益200根據本發明以執行一寫入到該efugs暫存器的 J异的高階處理流程概要。流程開始於方塊3〇〇,在此 处,彳< 纪憶體提取一巨集指令,例如:, = ,CLC/STC,或CLD/STD,流程接著進行到方塊 3〇5_。於方塊305中,轉譯器21〇將該巨集指令轉譯成要求 $仃寫入到EFLAGS暫存器214之微指令,流程接著進行到 塊^1〇。於方塊310中’在該EFUGS遮罩暫存器 ιφμγ;產生一EFLAGS遮罩,流程接著進行到方塊315。於方 \ 中,如岫文所述,一新數值將被寫入到EFLAGS,並 4巨集指令之執行結果將被載入到τ 程接著進行到方糊。於方塊320中,目的;訊將被提|供 200406684 五、發明說明(13) :TMAjK暫存器248,流程接著進行到方塊325。於方塊325 二時特權等級予該麵暫存器m,若該特權等 的資訊所指定之該些特定EFLAGS位元, 所=〜暫存器248會被組態有一許可更新由該目的資訊 之該些特定肌脱位元之—數值,流程接著進行到 f鬼330。於方塊33〇中,對該TMASK暫存器的内容與該新 換存、益TVU的内容執行一及運算,流程接著進行到方 二。於方塊335中,方塊330之及運算使得唯有被該現 日守特權等級所容許之EFLAGS位元會被更新。 [0 039]值得注意的是,在傳統管線處理器中,執行 P^SHF^PUSHFD以將該EFLAGS暫存器下推至該堆疊儲存器會 ^ ^ 很大數1之處理器週期,其係因為從該EFLAGS暫存Page 15 200406684 V. Description of the invention (11) The DF bit written into the EFLAGS register. In the MTEF D, s microinstruction, one of the EFLAGS register 214 pulled out of the stack memory is popped (D = 31). 
[0035] The source block s of the MTEF microinstruction specifies the state of the bits written to the EFLAGS register 214. For example, if 0 = 10, Hong 0 is commanded to the processor 200 to clear the DF. If D = 0, s-j, the command processor 200 is set to set the carry flag. In other words, in the MTEF microinstruction, s = 0 means that the destination bit is cleared, and S = 1 means that the destination bit is set. For a POP EFLAGS instruction, the instruction ignores the S block. [0036] When writing to the EFLAGS register, the execution logic 256 in the data / ALU execution stage m is a mask to ensure that only the correct = meta position is written to A. When an MTEF micro instruction is executed, the contents of the TM ^ K register 248 are dynamic. The execution logic of the data / ALU execution phase 2 to 6 is 256, which accesses the status from the privilege register PRIV 252, and the state where the LA bit 214 accesses other bits. Provided 乂 Temporary I ^ TVAL 246, if it is not a value of 5 columns, it is the EFUGS register read from the stack memory during the loading phase 2 3 2. As shown in the second figure, TVAL and TMASK are connected together with gate 250 and the result of the fj 250 operation is written to EFLA]) S register 214, which is advantageous: it can be configured to change only Some bits, and the certain bits are: as a function of the specific current operation mode read by the PRIV register, in this specific embodiment, the highest privileges held by the instruction, etc. * Level instruction * has the highest tolerance when updating the μ-private position of the EFLAGS register. Refer to the lower privilege level 200406684 V. Description of the Invention (12) ---- There are more restrictions on updating the EFLAGS register. The highest privilege level is 0. In a specific embodiment of the present invention, the number of bits in the mask is equal to the number of bits in the EFLAGS register. 
When a specific mask bit is set, it means that the mask bit is below the current weight level of the pRIV register 252, and can be updated. On the contrary, if a specific mask bit is set, Then the mask bit cannot be updated. In a word, the mouth provides the specific value written to EFLAGS, and TMASK determines whether a write operation is allowed according to the privilege level of the specific instruction stored in the PRIV register 252. [00j7] The present invention enables an instruction written to the EFLAGS register to be completed in a single instruction cycle, thereby significantly increasing the throughput of processing. Wang Yi [j038] Please refer to FIG. 3, which shows a flowchart for describing the outline of the high-level processing flow of the micro 200 in accordance with the present invention to execute a J-diff written to the efugs register. The process starts at block 300, where 彳 < Yi Yimei extracts a macro instruction, for example: =, CLC / STC, or CLD / STD, and the process then proceeds to block 305_. In block 305, the translator 2110 translates the macro instruction into a microinstruction that requires $ 仃 to be written to the EFLAGS register 214, and the flow proceeds to block ^ 10. At block 310 ', in the EFUGS mask register ιφμγ; an EFLAGS mask is generated, and the flow proceeds to block 315. In Fang \, as described in the text, a new value will be written to EFLAGS, and the execution result of the 4 macro instruction will be loaded into the τ procedure and then proceeded to the square paste. In block 320, the purpose; the message will be provided | for 200406684 V. Description of the invention (13): TMAjK register 248, and the flow then proceeds to block 325. At block 325, the privilege level is assigned to the surface register m. 
If specific EFLAGS bits are designated by the destination information, the privilege information, and so on, register 248 is loaded with a value that permits those specific bits to be updated, and flow proceeds to block 330. In block 330, the contents of the TMASK register are ANDed with the new-value information, and flow proceeds to block 335. In block 335, the AND result of block 330 causes only those EFLAGS bits that the current privilege level allows to be updated to be updated.

[0039] It is worth noting that, in a conventional pipelined processor, executing PUSHF/PUSHFD to push the EFLAGS register onto the stack consumes a large number of processor cycles, because a read from the EFLAGS register is affected by the current I/O privilege level (IOPL) and by the state of certain EFLAGS bits at the time of the write. In particular, bit 1, bit 3, bit 5, bit 15, and bits 22 through 31 are reserved, and their state cannot be changed. Furthermore, the VM and RF flags of the EFLAGS register (bits 17 and 16, respectively) are not copied; instead, these flags are cleared in the image of the EFLAGS register that is stored on the stack.

[0040] When an x86 processor is operating in virtual-8086 mode with an I/O privilege level (IOPL) less than 3, executing a PUSHF/PUSHFD instruction causes a general protection fault (exception). In real-address mode, however, when the ESP or SP register equals 1, 3, or 5, executing a PUSHF/PUSHFD instruction causes the processor to shut down for lack of stack space.
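As an illustration only, the conventional behavior described in paragraphs [0039] and [0040] can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not part of the disclosed apparatus: the function name `pushfd_image` and the exception class `GeneralProtectionFault` are hypothetical, the bit positions (RF = bit 16, VM = bit 17, IOPL = bits 12 and 13) follow the x86 EFLAGS layout, and the real-address-mode shutdown case is deliberately not modeled.

```python
# Bit positions in the x86 EFLAGS register.
RF_BIT = 16                       # resume flag
VM_BIT = 17                       # virtual-8086 mode flag
IOPL_SHIFT = 12                   # IOPL occupies bits 12 and 13
IOPL_MASK = 0b11 << IOPL_SHIFT


class GeneralProtectionFault(Exception):
    """Hypothetical stand-in for the processor's general protection fault."""


def pushfd_image(eflags: int) -> int:
    """Return the 32-bit EFLAGS image that a PUSHFD would place on the stack."""
    vm86 = bool((eflags >> VM_BIT) & 1)
    iopl = (eflags & IOPL_MASK) >> IOPL_SHIFT

    # Paragraph [0040]: virtual-8086 mode with IOPL < 3 faults.
    if vm86 and iopl < 3:
        raise GeneralProtectionFault("PUSHF/PUSHFD in virtual-8086 mode with IOPL < 3")

    # Paragraph [0039]: VM and RF are not copied; they are cleared in the image.
    cleared = (1 << RF_BIT) | (1 << VM_BIT)
    return eflags & ~cleared & 0xFFFFFFFF
```

Under these assumptions, an EFLAGS value of 0x00010246 (RF set) yields the stack image 0x00000246, while a value with VM set and IOPL equal to 0 raises the hypothetical fault.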

[0041] In a present-day pipelined microprocessor, such as processor 100, executing any PUSHF/PUSHFD instruction causes several microinstructions to be generated. First, a microinstruction is executed to move the contents of the EFLAGS register into a temporary register. Next, another microinstruction is executed to clear the VM and RF bits. Then a further microinstruction is executed to determine the current I/O privilege level (IOPL), so that the processor knows whether it should raise an exception or shut down. Finally, a last microinstruction is executed to store the EFLAGS image to the stack.

[0042] The conventional microprocessor described above has a significant shortcoming: a large number of microinstructions must be executed to complete one push of EFLAGS onto the stack. Some of these microinstructions are necessary, namely those that obtain the current IOPL so that the EFLAGS contents can be moved to the stack, and those that clear certain EFLAGS bits before the store. Many others, however, are generated only because present-day pipelined processors cannot complete, as a single operation, an instruction that includes both an ALU operation and a store operation. Present-day execute-stage logic allows only an ALU operation or a memory store operation to be executed, not both together. Consequently, any instruction that specifies both an ALU-type operation and a store-type operation must be issued as two consecutive microinstructions, and executing those two microinstructions takes two machine cycles. Under an operating system such as Microsoft Windows, the EFLAGS register is pushed onto the stack each time a subroutine is invoked, so reducing the time taken to push the EFLAGS register onto the stack helps improve processor performance.

[0043] Processor 400 of FIG. 4 provides a single microinstruction, Move From EFLAGS (MFEF), to move the contents of EFLAGS register 414 to the stack. As shown, execute logic 416 in the execute (data/ALU) stage, together with a load-ALU-store pipeline architecture, enables a push of EFLAGS to complete in a single instruction cycle. Processor performance is therefore significantly improved.

[0044] Processor 400 includes a fetch stage 402 containing instruction fetch logic 404 coupled to instruction memory 406. An instruction pointer 408 is coupled to fetch logic 404 and directs it to the location in instruction memory 406 from which the current instruction is fetched.

[0045] When fetch logic 404 fetches a macro instruction such as PUSHF/PUSHFD, the translator in the translate stage responds by generating an MFEF microinstruction, which directs the data/ALU (execute) stage to perform a move from EFLAGS register 414.

[0046] As shown in FIG. 4, the MFEF microinstruction is sent to translated instruction queue (XIQ) 419 and then arrives at an MFEF register 420 in register stage 422. Register stage 422 includes a register file 424, which holds the architectural state of processor 400.
Register file 424 includes a stack pointer register ESP 426. Register stage 422 also includes OP1 register 428 and OP2 register 430. Address stage 431 follows register stage 422 and computes the addresses of stored values so that those values can be retrieved from, and written to, memory.

[0047] The contents of MFEF register 420 are passed to and stored in the corresponding MFEF register 432 of load stage 434. Load stage 434 includes load/align logic 436. As shown, load/align logic 436 is coupled to OP1 register 428 and OP2 register 430 of register stage 422, and also to data memory 438. The output of load/align logic 436 is coupled to OP3 register 440. As shown in FIG. 4, the contents of OP1 register 428 and OP2 register 430 of register stage 422 are passed down to OP1 register 442 and OP2 register 444 of load stage 434, respectively.

[0048] MFEF register 432, OP1 register 442, OP2 register 444, and OP3 register 440 are all coupled to execute logic 416 of data/ALU (execute) stage 418, so that the values held in these registers can be supplied to execute logic 416.

[0049] The progress of the MFEF microinstruction from translate stage 412 to data/ALU (execute) stage 418 is discussed in detail below. When translator 410 receives a PUSHF or PUSHFD instruction, it responds by generating and issuing an MFEF microinstruction. The MFEF microinstruction directs microprocessor 400 to perform the add and read operations involved: it uses stack pointer register ESP 426, reads EFLAGS register 414, and dynamically modifies the EFLAGS image as a function of the current operating mode. The EFLAGS image is then stored to the stack 445 in memory.
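The single-microinstruction read described in paragraph [0049] can be sketched as follows. This is a simplified software model under stated assumptions, not the hardware itself: the names `Cpu`, `make_fmask`, and `mfef` are hypothetical; the mask policy shown (clear only VM and RF, the two flags that paragraph [0039] says are cleared in the stack image) is an assumption, since the text does not enumerate the mask for every operating mode; and the 4-byte stack-pointer decrement is the conventional x86 PUSHFD behavior rather than something stated in this paragraph.

```python
from dataclasses import dataclass, field

ALL_BITS = 0xFFFFFFFF
RF = 1 << 16   # resume flag
VM = 1 << 17   # virtual-8086 mode flag


@dataclass
class Cpu:
    """Hypothetical minimal processor state for the sketch."""
    eflags: int
    esp: int
    memory: dict = field(default_factory=dict)  # address -> 32-bit word


def make_fmask() -> int:
    """Build a 32-bit mask; a set bit lets the matching EFLAGS bit through.

    Assumed policy: every bit passes except VM and RF.
    """
    return ALL_BITS & ~(RF | VM)


def mfef(cpu: Cpu) -> int:
    """Model a masked move from EFLAGS to the stack as one combined step."""
    result = cpu.eflags & make_fmask()      # AND of the mask with EFLAGS
    cpu.esp = (cpu.esp - 4) & ALL_BITS      # assumed PUSHFD-style decrement
    cpu.memory[cpu.esp] = result            # store the image at the new ESP
    return result
```

The point of the sketch is that masking and storing happen in one call, mirroring the text's claim that a single microinstruction replaces the separate read, clear, and store steps; the EFLAGS register itself is left unchanged.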

[0050] Execute logic 416 of data/ALU (execute) stage 418 includes a privilege register PRIV 446, which holds the IOPL of the currently executing instruction. Privilege register PRIV 446 is coupled to FMASK register 448, so the current IOPL is one input to FMASK register 448. EFLAGS register 414 is also coupled to FMASK register 448 and provides a second input. The execute logic provides a mask, FMASK, which is generated dynamically when an MFEF microinstruction is executed. In one embodiment of the invention, the mask has the same number of bits as EFLAGS register 414. A mask bit that is set indicates that the corresponding bit may be updated at the current privilege level held in PRIV register 446; a mask bit that is not set indicates that the corresponding bit may not be updated. Execute logic 416 obtains the current operating mode from privilege register PRIV 446 and the state of the other bits from EFLAGS register 414.

The contents of EFLAGS register 414 are then read and ANDed with FMASK, and the result is stored in result register 452. Specifically, the output of FMASK register 448 is coupled to one input of AND gate 450, and the other input of AND gate 450 is coupled to EFLAGS register 414.

[0051] In the next machine cycle, the result held in result register 452 is written to the stack in memory at the address specified by stack pointer register ESP 426. A temporary register for the EFLAGS image is not required: in the immediately following machine cycle the result is provided to store logic 454 of store stage 456, so there is no need to hold the EFLAGS image temporarily. Store logic 454 stores the result to the stack in memory 458.

[0052] Processor 400 can complete execution of a PUSHF/PUSHFD instruction in a single instruction cycle. The performance and throughput of processor 400 are therefore significantly improved.

[0053] Refer now to FIG. 5, a flowchart describing how microprocessor 400 performs, according to the invention, a read of the EFLAGS register, the read being performed for a push onto the stack. In one embodiment of the invention, fetcher 404 fetches a PUSHF or PUSHFD macro instruction from instruction memory; that is, the fetched instruction requires a read from EFLAGS register 414 before the transfer to the stack, as shown in block 500 where the flow begins. In block 505, translator 410 translates the macro instruction into an MFEF microinstruction, which is configured to complete the read from the EFLAGS register within a single microinstruction cycle. Flow then proceeds to block 510.
In block 510, an EFLAGS mask is generated in FMASK mask register 448. In one embodiment of the invention, the EFLAGS mask has the same number of bits as EFLAGS, so the bits of the EFLAGS mask in the FMASK register correspond one-to-one with the bits of the EFLAGS register. In generating the EFLAGS mask in block 515, execute logic 416 checks the current privilege level and sets those mask bits that correspond to the EFLAGS bits that the privilege level allows to be updated. The remaining mask bits, which correspond to EFLAGS bits that the privilege level does not allow to be updated, are left unset, that is, set to 0. Flow then proceeds to block 520. In block 520, the EFLAGS mask is ANDed with the contents of the EFLAGS register, and flow proceeds to block 525. In block 525, the result of the AND operation is written to the stack.

[0054] The discussion of FIGS. 2 and 3 above describes an apparatus and method that improve a processor's execution of a write to the EFLAGS register. The discussion of FIGS. 4 and 5 above describes an apparatus and method that improve a processor's execution of a read from the EFLAGS register. Advantageously, these write and read operations complete in a single cycle, rather than requiring multiple cycles as in a conventional processor.

[0055] Although specific embodiments of the present invention have been described, the invention is not limited to them. Besides being implemented in hardware, the invention may be embodied in computer code that enables the functions of the invention disclosed herein, or its simulation or testing.
For example, the invention may be embodied in the following kinds of computer code: general programming languages (for example, C, C++, and so on); GDSII databases; hardware description languages (HDL), including Verilog HDL, VHDL, Altera HDL (AHDL), and so on; or other programming and/or circuit (that is, schematic) capture tools available in the art. The computer code may be disposed in any known computer-usable (for example, readable) medium, including semiconductor memory, magnetic disk, and optical disc (for example, CD-ROM, DVD-ROM, and so on), and may be embodied as a computer data signal in a computer-usable (for example, readable) transmission medium (for example, digital, optical, or analog media). As such, the computer code can be transmitted over communication networks, including the Internet and intranets. It is understood that the functions or structure of the invention can be embodied in computer code (for example, HDL, GDSII, and so on) and converted into hardware as part of an integrated circuit. The invention may also be embodied as a combination of hardware and computer code.

[0056] Furthermore, although the invention and its objects, features, and advantages have been described in detail, other embodiments are also encompassed within the scope of the invention.

[0057] Finally, specific embodiments of the invention have been described above, but the invention is not limited to them. The foregoing describes only preferred embodiments of the invention and should not be taken to limit its scope; it is provided so that those skilled in the art can make or use the invention. All equivalent changes and modifications made within the scope of the claims of this application retain the essence of the invention and do not depart from its spirit and scope, and should therefore be regarded as further embodiments of the invention.


[Brief Description of the Drawings]

[0012] The foregoing and other objects, features, and advantages of the present invention will be better understood with reference to the following description and the accompanying drawings:

[0013] FIG. 1 is a block diagram depicting the pipeline stages of a conventional microprocessor;
[0014] FIG. 2 is a block diagram depicting one embodiment of the microprocessor of the present invention;
[0015] FIG. 3 is a flowchart illustrating the operation of the microprocessor of FIG. 2;
[0016] FIG. 4 is a block diagram depicting another embodiment of the microprocessor of the present invention;
[0017] FIG. 5 is a flowchart illustrating the operation of the microprocessor of FIG. 4.

Reference numerals:
100 pipelined microprocessor architecture
105 fetch stage
110 translate stage
115 register stage
120 address stage
125 DATA/ALU stage (execute stage)
130 write-back stage
200 processor
202 fetch stage
204 instruction fetch logic
206 instruction memory
208 instruction pointer
212 translate stage
210 translator
222 register stage
224 register file
232 load stage
236 load/align logic
238 data memory
216 DATA/ALU stage
254 result
300-335 flowchart
400 processor
402 fetch stage
404 instruction fetch logic
406 instruction memory
408 instruction pointer
412 translate stage
410 translator
422 register stage
424 register file
431 address stage
434 load stage
436 load/align logic
438 data memory
418 DATA/ALU stage
452 result
456 store stage
454 store logic
445 stack in memory
500-525 flowchart


Claims (1)

1. A method of performing a write operation to a multi-bit flags register in a processor, the method comprising:
receiving a macro instruction at a translate stage, the macro instruction requesting a write to the multi-bit flags register; and
generating a microinstruction from the translate stage, the microinstruction being configured to complete the write to the multi-bit flags register in a single write cycle.

2. The method of claim 1, the method comprising: generating a flags mask.

3. The method of claim 2, the method comprising: performing a logical AND of the flags mask with a predetermined operand to produce a result.

4. The method of claim 2, the method comprising: storing the result to the multi-bit flags register.

5. The method of claim 1, wherein the processor is an x86 processor.

6. The method of claim 1, wherein the multi-bit flags register is an EFLAGS register.

7. The method of claim 1, wherein the macro instruction is one of the following instructions: POPF, POPFD, CLI, STI, CLC, STC, CLD, and STD.

8. A method of performing a read from a multi-bit flags register, the method comprising:
receiving a macro instruction at a translate stage, the macro instruction requesting a read from the multi-bit flags register; and
generating a microinstruction from the translate stage, the microinstruction being configured to complete the read from the multi-bit flags register in a single write cycle.

9. The method of claim 8, the method comprising: generating a flags mask, the flags mask including privilege information that designates, according to a current privilege level, which bits of the multi-bit flags register may be updated during a read operation.

10. The method of claim 9, the method comprising: performing a logical AND of the flags mask with the multi-bit flags register to produce a result.

11. The method of claim 10, the method comprising: storing the result to a stack in a memory.

12. The method of claim 8, wherein the processor is an x86 processor.

13. The method of claim 12, wherein the multi-bit flags register is an EFLAGS register.
TW92128964A 2002-10-22 2003-10-20 Apparatus and method for masked move to and from flags register in a processor TWI238943B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/279,207 US7076639B2 (en) 2001-10-23 2002-10-22 Apparatus and method for masked move to and from flags register in a processor

Publications (2)

Publication Number Publication Date
TW200406684A true TW200406684A (en) 2004-05-01
TWI238943B TWI238943B (en) 2005-09-01

Family

ID=37001142

Family Applications (1)

Application Number Title Priority Date Filing Date
TW92128964A TWI238943B (en) 2002-10-22 2003-10-20 Apparatus and method for masked move to and from flags register in a processor

Country Status (1)

Country Link
TW (1) TWI238943B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI480798B (en) * 2011-12-23 2015-04-11 Intel Corp Apparatus and method for down conversion of data types
TWI498816B (en) * 2011-12-23 2015-09-01 Intel Corp Method, article of manufacture, and apparatus for setting an output mask
TWI501147B (en) * 2011-12-23 2015-09-21 Intel Corp Apparatus and method for broadcasting from a general purpose register to a vector register
US9703558B2 (en) 2011-12-23 2017-07-11 Intel Corporation Systems, apparatuses, and methods for setting an output mask in a destination writemask register from a source write mask register using an input writemask and immediate
US10372450B2 (en) 2011-12-23 2019-08-06 Intel Corporation Systems, apparatuses, and methods for setting an output mask in a destination writemask register from a source write mask register using an input writemask and immediate
US10474463B2 (en) 2011-12-23 2019-11-12 Intel Corporation Apparatus and method for down conversion of data types

Also Published As

Publication number Publication date
TWI238943B (en) 2005-09-01

Similar Documents

Publication Publication Date Title
US11663006B2 (en) Hardware apparatuses and methods to switch shadow stack pointers
TWI567641B (en) Computer program product, computer system and method for vector find element equal instruction
TWI639952B (en) Method, apparatus and non-transitory machine-readable medium for implementing and maintaining a stack of predicate values with stack synchronization instructions in an out of order hardware software co-designed processor
CN107368286B (en) SIMD integer multiply-accumulate instruction for multi-precision arithmetic
US9354877B2 (en) Systems, apparatuses, and methods for performing mask bit compression
JP6498226B2 (en) Processor and method
TW201145161A (en) Add instructions to add three source operands
TWI574166B (en) Method and apparatus for performing a vector permute with an index and an immediate
TW201403468A (en) Transforming non-contiguous instruction specifiers to contiguous instruction specifiers
TW201401167A (en) Vector find element not equal instruction
TWI502490B (en) Method for processing addition instrutions, and apparatus and system for executing addition instructions
TW200540704A (en) Method and apparatus for counting interrupts by type
US10503662B2 (en) Systems, apparatuses, and methods for implementing temporary escalated privilege
JP2014510351A (en) System, apparatus, and method for performing jump using mask register
CN115357332A (en) Virtualization of inter-processor interrupts
TW200406684A (en) Apparatus and method for masked move to and from flags register in a processor
TWI742012B (en) Processor, method and system for an on-chip reliability controller
TWI697836B (en) Method and processor to process an instruction set including high-power and standard instructions
TW202305609A (en) Method and apparatus for high-performance page-fault handling for multi-tenant scalable accelerators
TWI223193B (en) Apparatus and method for masked move to and from flags register in a processor
US7058794B2 (en) Apparatus and method for masked move to and from flags register in a processor
US7076639B2 (en) Apparatus and method for masked move to and from flags register in a processor
BRPI0806390A2 (en) computer method, equipment, and program for effectively emulating condition code definitions for computer architecture
US20130166889A1 (en) Method and apparatus for generating flags for a processor
CN117377944A (en) Host-to-guest notification

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees