TWI238943B - Apparatus and method for masked move to and from flags register in a processor - Google Patents

Apparatus and method for masked move to and from flags register in a processor

Info

Publication number
TWI238943B
Authority
TW
Taiwan
Prior art keywords
register
eflags
instruction
bit
scope
Prior art date
Application number
TW92128964A
Other languages
Chinese (zh)
Other versions
TW200406684A (en
Inventor
Gerard M Col
G Glenn Henry
Terry Parks
Original Assignee
Ip First Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/279,207 (US7076639B2)
Application filed by Ip First Llc
Publication of TW200406684A
Application granted
Publication of TWI238943B


Abstract

A method and apparatus are provided for writing to, and reading from, the EFLAGS register in a processor. For a particular write to EFLAGS request, a mask is generated using destination information for the write and privilege level information for the write. The mask is then ANDed with EFLAGS new value information and the result is written to the EFLAGS register in a single instruction cycle. For a particular read from EFLAGS request, a mask is generated using privilege information for the read to specify those bits of EFLAGS which can be updated during the read. The mask is then ANDed with the contents of the EFLAGS register and the result is stored in a stack in memory.

Description

1238943 (Case No. 92128964)

V. Description of the Invention

Cross-Reference to Related Applications

[0001] This application claims priority based on U.S. patent application No. 60/345455, filed 10/23/2001, entitled "APPARATUS AND METHOD FOR MASKED MOVE TO FLAGS REGISTER".

Technical Field of the Invention

[0002] The present invention relates to the field of computer instruction execution, and in particular to an apparatus and method for reducing the number of instruction cycles needed to execute writes to and reads from the EFLAGS register.

Prior Art

[0003] In an x86 pipelined microprocessor, executing an instruction that writes to the EFLAGS register (for example POPF/POPFD, CLI/STI, CLD/STD, CLC/STC) takes a considerable number of cycles, because a write to the EFLAGS register is affected by the current I/O privilege level (IOPL) and by the state of certain bits in the EFLAGS register at the time of the write. Under the Microsoft Windows® operating system, the EFLAGS register must be popped off the stack each time a subroutine returns, which causes significant operating-system delay.

[0004] The present invention therefore provides a microprocessor technique that reduces the delay associated with executing an instruction that writes to the EFLAGS register, such as a pop instruction.

[0005] Likewise, in an x86 pipelined microprocessor, the push instruction that stores the EFLAGS register onto the stack, PUSHF/PUSHFD, also takes a considerable number of cycles, because the state of the bits read from the EFLAGS register and the execution state of the microprocessor are affected by the current I/O privilege level (IOPL) and by the state of specific bits in the EFLAGS register at the time of the push. Under the Microsoft Windows® operating system, the EFLAGS register must be pushed onto the stack on every subroutine call, which causes significant operating-system delay.

[0006] The present invention therefore provides a microprocessor technique that reduces the delay in storing EFLAGS onto the stack that is associated with executing an instruction that reads the EFLAGS register, such as a push instruction.

Summary of the Invention

[0007] In one embodiment, the present invention provides a method for performing a write operation on a multi-bit flags register in a microprocessor. The method includes using a translation stage of the microprocessor to receive a macro instruction that requests a write to the multi-bit flags register. The method also includes using the translation stage to generate a microinstruction configured to complete the write to the multi-bit flags register in a single write cycle. The method further includes generating a flags mask and ANDing the flags mask with a predetermined operand to produce a result, which is then stored in the multi-bit flags register. In this embodiment, the multi-bit flags register is the EFLAGS register.

[0008] The present invention can complete a write to the EFLAGS register in a single instruction cycle, which effectively reduces processor delay.

[0009] In another embodiment, the present invention provides a method for performing a read operation. The method includes using a translation stage of a microprocessor to receive a macro instruction that requests a read of the multi-bit flags register. The method also includes using the translation stage to generate a microinstruction configured to complete the read of the multi-bit flags register in a single write cycle. The method further includes generating a flags mask that includes privilege information identifying which bits of the multi-bit flags register are eligible to be updated under a read operation at the current privilege level. The method further includes ANDing the flags mask with a predetermined operand to produce a result, and storing the result in a stack in memory.

[0010] The present invention can effectively reduce the processor delay attributable to EFLAGS read operations, for example the delay incurred when a processor pushes the EFLAGS register onto the stack. The apparatus and method of the invention can complete such pushes of EFLAGS onto the stack in a single instruction cycle.

[0011] Other objects and advantages of the invention will become more apparent from the following detailed description and the accompanying drawings.

Detailed Description

[0018] The following description is provided in the context of a particular embodiment and its requirements, to enable one of ordinary skill in the art to make use of the invention. Various modifications to the preferred embodiment will, however, be apparent to those skilled in the art, and the general principles discussed herein may be applied to other embodiments. The invention is therefore not limited to the particular embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[0019] Referring to Figure 1, a block diagram of a conventional pipelined microprocessor 100 is shown. The microprocessor has a fetch stage 105, a translate stage 110, a register stage 115, an address stage 120, a data/ALU or execute stage 125, and a write-back stage 130.

[0020] In operation, the fetch stage 105 fetches macro instructions from memory (not shown) for execution by the microprocessor 100. The translate stage 110 translates the fetched macro instructions into corresponding microinstructions.

[0021] Each microinstruction directs the microprocessor 100 to perform a specific subtask that is part of the overall operation of a fetched macro instruction. The register stage 115 retrieves the operands specified by the microinstruction from a register file for use by later stages of the pipeline. The address stage 120 computes the memory addresses specified by the microinstruction for data store and retrieval operations. The data/ALU stage 125 either performs arithmetic logic unit (ALU) operations on data retrieved from the register file, or uses the memory address computed in the address stage 120 to read data from, or write data to, memory. The write-back stage 130 writes the result of a data read operation or of an ALU operation to the register file. To summarize, the fetch stage 105 fetches macro instructions, the translate stage 110 decodes them into microinstructions, and the translated microinstructions then flow through stages 115-130 for execution, which constitutes the pipelined operation of the microprocessor 100.

[0022] To aid understanding of string processing in a microprocessor, the following discussion uses the standard nomenclature of an x86 microprocessor. Those skilled in the art will appreciate, however, that the x86 registers and macro instructions are used only for illustration, and that other microprocessors and architectures could serve as examples equally well.

[0023] The data/ALU stage 125 includes an EFLAGS register 132, which holds the state of the processor. The EFLAGS register 132 can be modified by many instructions and serves as the comparison parameter for conditional loops and conditional jumps. Each bit of the EFLAGS register holds the state of a specific parameter of the last instruction. Table 1 below shows the 32 bits of the EFLAGS register and the function of each bit.

Table 1. EFLAGS register

Bit number   Name       Function
31:22        Reserved   Tied low
21           ID         ID flag
20           VIP        Virtual interrupt pending
19           VIF        Virtual interrupt flag
18           AC         Alignment check
17           VM         Virtual mode
16           RF         Resume flag
15           0          Tied low
14           NT         Nested task flag
13:12        IOPL       I/O privilege level
11           OF         Overflow flag
10           DF         Direction flag
9            IF         Interrupt enable flag
8            TF         Trap flag
7            SF         Sign flag
6            ZF         Zero flag
5            0          Tied low
4            AF         Auxiliary carry flag
3            0          Tied low
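The bit assignments in Table 1 can be written down as constants, which makes the later discussion of masks easier to follow. The following is only a minimal sketch: the names FLAG_BITS, flag_mask, iopl and flags_set are hypothetical helpers introduced for illustration. CF at bit 0 and the reserved positions (bits 1, 3, 5, 15 and 22 through 31) are taken from the description text rather than from the table, which as reproduced here stops at bit 3. Bit 2, the parity flag PF in the x86 architecture, is not covered by either and is therefore left out.

```python
# Bit positions from Table 1. CF = bit 0 comes from the description text,
# because the table as reproduced stops at bit 3.
FLAG_BITS = {
    "CF": 0, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9, "DF": 10,
    "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18, "VIF": 19,
    "VIP": 20, "ID": 21,
}

# IOPL is a two-bit field at bits 13:12.
IOPL_SHIFT = 12
IOPL_MASK = 0b11 << IOPL_SHIFT

# Reserved positions as listed in the description: bits 1, 3, 5, 15, 22..31.
RESERVED_BITS = (1, 3, 5, 15, *range(22, 32))
RESERVED_MASK = sum(1 << b for b in RESERVED_BITS)


def flag_mask(*names):
    """Return a 32-bit mask with the named single-bit flags set."""
    mask = 0
    for name in names:
        mask |= 1 << FLAG_BITS[name]
    return mask


def iopl(eflags):
    """Extract the two-bit I/O privilege level from an EFLAGS value."""
    return (eflags & IOPL_MASK) >> IOPL_SHIFT


def flags_set(eflags):
    """List the named single-bit flags that are set in an EFLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if (eflags >> bit) & 1]
```

For instance, flag_mask("VIP", "VIF", "VM") gives the three bits that the description singles out for special treatment at privilege level 0.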

[0024] In a present-day pipelined microprocessor such as processor 100, executing any of the instructions that write to the EFLAGS register (namely POPF/POPFD, CLC/STC, CLD/STD, CLI/STI) takes a considerable number of machine cycles, because a write to EFLAGS register 132 is affected by the current I/O privilege level (IOPL) and by the state of certain bits in the EFLAGS register at the time of the write. In particular, bits 1, 3, 5, 15, and 22 through 31 are reserved and their state cannot be changed. In addition, when the processor operates in protected mode at privilege level 0 (or in real-address mode, which is equivalent to privilege level 0), all non-reserved bits can be modified except VIP, VIF, and VM. The VIP and VIF flags must be cleared, and the VM flag must keep its current state.

[0025] Executing any of the foregoing macro instructions that write data to EFLAGS register 132 results in the generation of several microinstructions. Specifically, a first microinstruction is executed to determine the current I/O privilege level (IOPL). Subsequent microinstructions read the current state of certain EFLAGS bits (for example VM, RF, IOPL, VIP, VIF, and IF) and set up the bit states as a new value to be written to EFLAGS. A final microinstruction then writes the new value to EFLAGS register 132.

[0026] This conventional method of updating the EFLAGS register has a significant drawback: a large number of microinstructions must be executed to complete the write. Because several microinstructions must be generated and processed, the update takes a long time and reduces microprocessor performance.

[0027] The inventors have observed that under the Microsoft Windows® operating system, the EFLAGS register must be popped off the stack each time a subroutine returns, and that this also occurs in many desktop applications in common use today. Because instructions of this type are so widely used, there is a pressing need to minimize their execution time.
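The privilege-level-0 rule in paragraph [0024] can be stated compactly as a masked update. The sketch below models only that architectural rule, not the multi-microinstruction sequence of paragraph [0025] and not the patented hardware. The function name popfd_at_privilege_0 is hypothetical, and the reserved-bit mask is built from the bit positions listed in [0024].

```python
VIP, VIF, VM = 1 << 20, 1 << 19, 1 << 17

# Reserved positions listed in [0024]: bits 1, 3, 5, 15 and 22..31.
RESERVED_MASK = sum(1 << b for b in (1, 3, 5, 15, *range(22, 32)))


def popfd_at_privilege_0(old_eflags, popped):
    """Return the EFLAGS value after popping `popped` at privilege level 0.

    Rule modelled (from [0024]): every non-reserved bit except VIP, VIF
    and VM takes the popped value; VIP and VIF are cleared; VM and the
    reserved bits keep their current state.
    """
    writable = 0xFFFFFFFF & ~RESERVED_MASK & ~(VIP | VIF | VM)
    # Bits that are not writable hold their old state, except VIP and VIF,
    # which the rule forces to zero.
    held = old_eflags & ~writable & ~(VIP | VIF)
    return held | (popped & writable)
```

Deriving this result with separate microinstructions for the privilege check, the bit reads and the final write is what makes the conventional sequence long.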

[0028] An object of the invention is to reduce the number of instruction cycles required to execute a write to the EFLAGS register. To that end, the invention provides an apparatus and method for dynamically generating an EFLAGS mask. The mask is logically ANDed with a specified operand (that is, either the EFLAGS value popped from the stack or a selected EFLAGS bit state), and the result is written to the EFLAGS register. The processor disclosed here uses a new microinstruction, Move To EFLAGS (MTEF), which together with dedicated logic in the execute stage allows a write to EFLAGS to complete in a single instruction cycle.

[0029] Referring now to Figure 2, a block diagram shows a processor 200 that uses a single microinstruction, Move To EFLAGS (MTEF), to complete a write to the EFLAGS register in a single instruction cycle. The processor 200 includes a fetch stage 202 containing instruction fetch logic 204 coupled to an instruction memory 206. An instruction pointer 208 is coupled to the fetch logic 204 and directs it to the location in memory 206 from which to fetch the current instruction.

[0030] When the fetch logic 204 fetches a macro instruction such as POPF/POPFD, CLI/STI, CLC/STC, or CLD/STD, the translator 210 in the translate stage 212 responds by generating an MTEF D,S microinstruction, which performs a move to the EFLAGS register in the data/ALU (execute) stage 216. In the MTEF D,S microinstruction, S is a source field that indicates the source of the data to be moved into EFLAGS register 214, and D is a destination field that specifies the bits of EFLAGS register 214 that are intended to be written.

[0031] Before the processing of the MTEF microinstruction is discussed, the rest of the architecture of processor 200 is described. As shown in Figure 2, the MTEF D,S microinstruction is sent to a translated instruction queue (XIQ) 218. The MTEF D,S instruction then flows to an MTEF register 220 in the register stage 222. The register stage 222 stores the architectural state of the processor 200, including an ESP architectural register 226. As shown in Figure 2, the register stage 222 also includes an OP1 register 228 and an OP2 register 230.

[0032] The register stage 222 is coupled through an address stage down to the load stage 232. The processor 200 uses a conventional address stage 250 to compute the addresses that the processor 200 uses in processing instructions. The contents of the MTEF register 220 in the register stage 222 are passed to and stored in the corresponding MTEF register 234 in the load stage 232. The load stage 232 includes load/align logic 236, which is coupled to the OP1 register 228 and the OP2 register 230 of the register stage 222. The load/align logic 236 is also coupled to a data memory 238. The output of the load/align logic 236 is coupled to an OP3 register 240. As shown in Figure 2, the contents of the OP1 register 228 and the OP2 register 230 of the register stage 222 are passed down to the OP1 register 242 and the OP2 register 244 of the load stage 232, respectively.

[0033] The processor 200 further includes a data/ALU or execute stage 216, which contains the aforementioned EFLAGS register 214. The execute stage 216 also includes a TVAL register 246 and a TMASK register 248. The contents of the TVAL register 246 and the TMASK register 248 are combined by an AND gate 250, and the result of the AND operation is stored in the EFLAGS register 214. The AND operation and the masking provided by the TMASK register 248 are discussed below. The execute stage 216 also includes a privilege level register PRIV 252, which supplies the TMASK register 248 with privilege information for the instruction currently executing. The results of instruction execution are sent to a result register 254 and are written to the register file 224 via a result bus (not shown).

[0034] As noted above, in response to a fetched POPF/POPFD, CLI/STI, CLC/STC, or CLD/STD macro instruction supplied by the fetch logic 204 to the translator 210, the translator 210 generates a single microinstruction, MTEF D,S, and sends it to the translated instruction queue (XIQ) 218 coupled to the translator 210, and from there to the register stage 222 coupled to the XIQ 218. The MTEF microinstruction includes a source field S and a destination field D. The destination field specifies the bit of the EFLAGS register 214 that is to be written. For example, D = 0 specifies a write to the carry flag CF, which, as shown in Table 1, is bit 0 of the EFLAGS register. As further examples, D = 9 specifies a write to the IF bit of the EFLAGS register, and D = 10 specifies a write to the DF bit. In the MTEF D,S microinstruction, a pop of the EFLAGS register 214 off the stack is specified by D = 31.

[0035] The source field S of the MTEF microinstruction specifies the state to be written to the bit designated by D. For example, with D = 0, S = 0 directs the processor 200 to clear the carry flag, and S = 1 directs the processor 200 to set the carry flag. In other words, in the MTEF microinstruction S = 0 clears the destination bit and S = 1 sets it. For a POP EFLAGS instruction, the S field is ignored.

[0036] When the EFLAGS register is written, the execute logic 256 in the data/ALU execute stage 216 provides a mask to ensure that only the correct bit positions are written. The contents of the TMASK register 248 are generated dynamically when an MTEF microinstruction executes. The execute logic 256 reads the current operating mode from the privilege register PRIV 252 and the state of other bits from the EFLAGS register 214. The content supplied to the new-value register TVAL 246 is either the value of the S field or the EFLAGS image read from the stack in the load stage 232. As shown in Figure 2, TVAL and TMASK are combined by AND gate 250, and the result of the AND operation is written to the EFLAGS register 214. Advantageously, TMASK is configured so that only certain bits change, those bits being a function of the current operating mode read from the PRIV register. In this embodiment, privilege levels are numbered from 0 to 3. An instruction at the highest privilege level has the most latitude in updating the specified bits of the EFLAGS register, and instructions at lower privilege levels are more restricted. The highest privilege level is 0. In one embodiment of the invention, the mask has the same number of bits as the EFLAGS register. A mask bit that is set indicates that the corresponding EFLAGS bit may be updated at the current privilege level held in the PRIV register 252; a mask bit that is not set indicates that the corresponding bit may not be updated. In short, TVAL supplies the specific value to be written to EFLAGS, while TMASK determines, according to the privilege level of the particular instruction stored in the PRIV register 252, whether a write is permitted.

[0037] The invention allows an instruction that writes to the EFLAGS register to complete in a single instruction cycle, which significantly increases processor throughput.

[0038] Referring now to Figure 3, a flowchart outlines the high-level flow by which microprocessor 200 performs a write to the EFLAGS register according to the invention. Flow begins at block 300, where a macro instruction such as POPF/POPFD, CLI/STI, CLC/STC, or CLD/STD is fetched from memory. Flow proceeds to block 305, where the translator 210 translates the macro instruction into a microinstruction that directs a write to EFLAGS register 214. Flow proceeds to block 310, where an EFLAGS mask is generated in the EFLAGS mask register TMASK 248. Flow proceeds to block 315, where, as described above, the new value to be written to EFLAGS as a result of executing the macro instruction is loaded into the TVAL register 246. Flow proceeds to block 320, where destination information is supplied to the TMASK register 248. Flow proceeds to block 325, where the current privilege level is supplied to the TMASK register 248; if the privilege level permits updating the particular EFLAGS bits specified by the destination information, the TMASK register 248 is configured with a value that permits those bits to be updated. Flow proceeds to block 330, where the contents of the TMASK register are ANDed with the contents of the new-value register TVAL. Flow proceeds to block 335, where, as a result of the AND operation of block 330, only the EFLAGS bits permitted by the current privilege level are updated.
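The write flow of Figure 3 can be sketched in a few lines to show how the D and S fields, the privilege information, and the AND gate fit together. This is a simplified model under stated assumptions, not the patented logic. The function mtef and the table MACRO_TO_MTEF are hypothetical names. The CLC/STC encodings follow the D = 0 example in the text, while the CLI/STI and CLD/STD encodings are inferred from D = 9 (IF) and D = 10 (DF). The `permitted` argument stands in for whatever the PRIV-dependent logic allows. The text says only that the ANDed result is written and that the mask lets only certain bits change, so the sketch assumes that bits outside the mask hold their previous state.

```python
FULL = 0xFFFFFFFF
POP_ALL = 31  # D = 31 selects a pop of the whole register; S is ignored

# Hypothetical macro-to-(D, S) table. CLC/STC follow the example in the
# text; the other four entries are inferred from D = 9 (IF), D = 10 (DF).
MACRO_TO_MTEF = {
    "CLC": (0, 0), "STC": (0, 1),
    "CLI": (9, 0), "STI": (9, 1),
    "CLD": (10, 0), "STD": (10, 1),
}


def mtef(eflags, d, s, permitted, popped=0):
    """Model one MTEF D,S microinstruction and return the new EFLAGS.

    permitted: mask of bits the current privilege level may update
               (an assumed input replacing the PRIV register logic).
    popped:    EFLAGS image read from the stack, used only when D = 31.
    """
    # Blocks 315 and 320: new value (TVAL) and destination information.
    if d == POP_ALL:
        dest, tval = FULL, popped
    else:
        dest = 1 << d
        tval = dest if s else 0
    # Block 325: privilege information limits which destination bits
    # end up set in TMASK.
    tmask = dest & permitted
    # Block 330: the AND gate combines TVAL and TMASK.
    result = tval & tmask
    # Block 335: only masked bits change (assumption: other bits hold).
    return (eflags & ~tmask & FULL) | result
```

With these assumptions, mtef(0, *MACRO_TO_MTEF["STC"], permitted=FULL) sets only the carry flag, and a CLI issued when `permitted` excludes bit 9 leaves the register unchanged.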

[0039] It is worth noting that in a conventional pipelined processor, a PUSHF/PUSHFD instruction that pushes the EFLAGS register onto the stack takes a large number of processor cycles, because a read from the EFLAGS register is affected by the current I/O privilege level (IOPL) and by the state of specific bits in the EFLAGS register at that time. In particular, bits 1, 3, 5, 15, and 22 through 31 are reserved and their state cannot be changed. Furthermore, the VM and RF flags of the EFLAGS register (bits 17 and 16) are not copied; instead, the values of these flags are cleared in the EFLAGS image stored on the stack.

[0040] When an x86 processor is operating in virtual-8086 mode and the I/O privilege level (IOPL) is less than 3, executing a PUSHF/PUSHFD instruction causes a general protection fault or exception. In real-address mode, if the ESP or SP register equals 1, 3, or 5, executing a PUSHF/PUSHFD instruction causes the processor to shut down for lack of stack space.

[0041] In a present-day pipelined microprocessor such as processor 100, executing any PUSHF/PUSHFD instruction results in the generation of several microinstructions. First, a microinstruction is executed to move the contents of the EFLAGS register into a temporary register. Next, another microinstruction is executed to clear the VM and RF bits. Then yet another microinstruction is executed to determine the current I/O privilege level (IOPL), so that the processor knows whether to raise an exception or shut down. Finally, a microinstruction is executed to store EFLAGS onto the stack.

1238943 修正 Ά% 921289Β4 五、發明說明(15) 清除該VM位元與RF位元;緊接著,執行又—微指令以決定 該現時I/O特權等級(Ι〇η),使得該處理器得以知道是否 應產生異f ’或是停止運作;最後’執行 以將該EFLAGS儲存到該堆疊儲存器。 " [0 042]前述之傳統微處理器有非常顯著的缺失,即 必須要執行為數眾多的微指令才能完成一下推sh EFLAGS到堆疊儲存器上。在該些眾多的微指令中一些用 以取传現時I/O特權等級(I〇PL),以將該eflags的内容移 到該堆f儲存器的微指令係為必要的,另一些用以命令在 下f之刖,先行清除EFLAGS的某些特定位元亦為必要的, J是:然有許多微指令的產生係因為現今之管線處理器架 構,並不能完成適合一包括一 AU運算與一儲存運算之指 f的執行。今日的執行階段邏輯僅允許執行一 存取運算。因&,任何包括命令一則類型之 U = 之運算的指令的執行,均必須產生兩個連 續”々集,並且該兩個連續的微指令集的執行係需 :U的,?週期。在一作業系統下,例如微軟視 、、、 母_人回應一子程式,該EFLAGS暫存器即須 =叠儲存器中,因此若能減少該下推EFUGS暫 存器到該堆叠儲存器的執行時間,將有助於增進處 的 效能。 σ ^043 ]圖四之處理器400提供一單一微指令,即M〇ve (MFEF),以將EFLAGS暫存器414的内容移至 諸存器。如圖所示,執行階段(data/ALU階段)之執行 Π·· 第20頁 1238943 __案號92128964__年月日_修正 __ 五、發明說明(16) 邏輯416及一load-ALU儲存管線架構致能EFLAGS之一下推 可以在一單一指令週期内完成。因此,處理器的效能將可 有顯著的增進。 [ 0 044 ]該處理器400包括一提取階段402,並且該提取 階段402内含一耗接至指令記憶體406的提取邏輯4〇4 (instruction fetch logic) ° 一指令指標408 (instruction pointer)係耦接至該提取邏輯404,以指示 該提取邏輯4 0 4到該指令記憶體4 0 6的特定位置去提取現行 指令。 [0045]當該提取邏輯404提取一巨集指令,例如一 PUSHF/PUSHFD指令’轉譯階段之轉譯器即回應產生一評ef 微指令’該微指令係用以在data/ALU-執行階段執行一從 EFLAGS暫存器4 1 4之移出。 [ 0 046 ]如圖四所示,該MFEF微指令被送至轉譯指令佇 列(XIQ)419,隨後到達在暫存階段422内之一 MFEF暫存器 420。暫存階段422包括一暫存檔案424 ,該暫存檔案424儲 存该處理器4 0 0的架構狀態。暫存檔案4 2 4包括一堆疊指標 暫存器ESP 426。暫存階段422亦包括0P1暫存器428與0P2 暫存器430。定址階段431係緊鄰該暫存階段422。定址階 段431係用以計算被儲存數值之位址,使得該些數值可以 被從記憶體取還及被寫入記憶體。 [ 0047 ]該MFEF暫存器420的内容被送到並且儲存在載 入階段434之對應的MFEF暫存器432。該載入階段434包括 載入/;正邏輯436。>圖所示,該載人/調正邏輯436係搞1238943 Modification Ά% 921289B4 V. Description of the invention (15) Clear the VM bit and RF bit; Then, execute another micro instruction to determine the current I / O privilege level (Ι〇η), so that the processor can Know if it should generate 'f' or stop operation; finally 'execute to store the EFLAGS in the stack storage. " [0 042] The aforementioned traditional microprocessor has a very significant deficiency, that is, it must execute a large number of micro instructions to complete the push of sh EFLAGS to the stack memory. 
Some of these many micro-instructions are used to get the current I / O privilege level (IOPL), it is necessary to move the contents of the eflags to the micro-instruction of the heap f memory, and others The order is below. It is also necessary to clear some specific bits of EFLAGS first. J is: However, many micro-instructions are generated because of the current pipeline processor architecture, which cannot be completed. The storage operation refers to the execution of f. Today's runtime logic allows only one access operation to be performed. Because of &, the execution of any instruction that includes an operation of type U = of a command must produce two consecutive "々" sets, and the execution of the two consecutive microinstruction sets requires: U,? Cycles. Under an operating system, such as Microsoft TV, the parent and the human respond to a subroutine, the EFLGS register must be stored in the stack, so if the execution of the push-down EFUGS register to the stack is reduced, Time will help improve performance. Σ ^ 043] The processor 400 in FIG. 4 provides a single microinstruction, namely Move (MFEF), to move the contents of the EFLAGS register 414 to the registers. As shown in the figure, the execution of the execution phase (data / ALU phase) Π ·· Page 20 1238943 __Case No. 92128964__year month day_correction__ V. Description of the invention (16) Logic 416 and a load-ALU storage The pipeline architecture enables one of the push-downs of EFLAGS to be completed in a single instruction cycle. Therefore, the performance of the processor can be significantly improved. [0 044] The processor 400 includes an extraction phase 402, and within the extraction phase 402 Contains a fetch logic 404 (in struction fetch logic) ° An instruction pointer 408 (instruction pointer) is coupled to the fetch logic 404 to instruct the fetch logic 404 to a specific position in the instruction memory 406 to fetch the current instruction. 
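As a rough illustration of the conventional sequence described in paragraph [0041], the following Python sketch models the four microinstruction steps in software. It is a minimal model under stated assumptions, not the patent's implementation: the function, exception, and constant names are hypothetical, the stack is modeled as a Python list, and only the virtual-8086 fault case of paragraph [0040] is modeled (the real-address-mode shutdown is left out). The IOPL field position (EFLAGS bits 12 and 13) comes from the x86 architecture rather than from the text above.

```python
# Illustrative model only; all names below are assumptions, not patent terms.
RF_BIT = 1 << 16            # resume flag (bit 16)
VM_BIT = 1 << 17            # virtual-8086 mode flag (bit 17)
IOPL_SHIFT = 12             # IOPL occupies EFLAGS bits 12 and 13 on x86
IOPL_MASK = 0b11 << IOPL_SHIFT


class GeneralProtectionFault(Exception):
    """Stands in for the general protection fault raised by the processor."""


def conventional_pushfd(eflags, stack):
    """Model the four separate steps of a conventional PUSHFD."""
    # Step 1: copy EFLAGS into a temporary register.
    temp = eflags
    # Step 2: clear the VM and RF bits in the copy.
    temp &= ~(VM_BIT | RF_BIT) & 0xFFFFFFFF
    # Step 3: read the current IOPL and decide whether to fault.
    iopl = (eflags & IOPL_MASK) >> IOPL_SHIFT
    if (eflags & VM_BIT) and iopl < 3:
        raise GeneralProtectionFault("PUSHF in virtual-8086 mode with IOPL < 3")
    # Step 4: store the image onto the stack.
    stack.append(temp)
    return temp
```

Each numbered step stands for a separate microinstruction in the conventional design, which is why the push takes several cycles there.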
[0045] When the fetch logic 404 fetches a macro instruction, such as a PUSHF/PUSHFD instruction, the translator of the translate stage responds by generating an MFEF microinstruction, which directs the data/ALU execute stage to perform a move from the EFLAGS register 414.

[0046] As shown in Figure 4, the MFEF microinstruction is sent to the translated instruction queue (XIQ) 419 and then arrives at an MFEF register 420 in the register stage 422. The register stage 422 includes a register file 424, which stores the architectural state of the processor 400. The register file 424 includes a stack pointer register ESP 426. The register stage 422 also includes an OP1 register 428 and an OP2 register 430. The address stage 431 immediately follows the register stage 422. The address stage 431 calculates the addresses of stored values, so that those values can be retrieved from, and written to, memory.

[0047] The contents of the MFEF register 420 are passed to and stored in the corresponding MFEF register 432 of the load stage 434. The load stage 434 includes load/align logic 436. As shown, the load/align logic 436 is coupled to the OP1 register 428 and the OP2 register 430 of the register stage 422. The load/align logic 436 is also coupled to data memory 438. The output of the load/align logic 436 is coupled to an OP3 register 440. As shown in Figure 4, the contents of the OP1 register 428 and the OP2 register 430 of the register stage 422 are passed down to the OP1 register 442 and the OP2 register 444 of the load stage 434, respectively.

[0048] The MFEF register 432, the OP1 register 442, the OP2 register 444, and the OP3 register 440 are all coupled to the execution logic 416 of the data/ALU execute stage 418, so that the values stored in these registers can be provided to the execution logic 416.

[0049] The processing of the MFEF microinstruction from the translate stage 412 to the data/ALU execute stage 418 is discussed in detail below. When the translator 410 receives a PUSHF or PUSHFD instruction, it responds by generating and issuing an MFEF microinstruction. The MFEF microinstruction directs the microprocessor 400 to adjust and read the stack pointer register ESP 426, to read the EFLAGS register 414, and to dynamically modify the resulting EFLAGS image as a function of the current operating mode. The EFLAGS image is then stored to the stack 445 in memory.

[0050] The execution logic 416 of the data/ALU execute stage 418 includes a privilege register PRIV 446, which stores the IOPL of the currently executing instruction. The privilege register PRIV 446 is coupled to the FMASK register 448, so that the current IOPL serves as one input to the FMASK register 448. The EFLAGS register 414 is also coupled to the FMASK register 448 and provides a second input to it. The execution logic provides a mask, FMASK, which is generated dynamically when an MFEF microinstruction is executed. In one embodiment of the invention, the mask has the same number of bits as the EFLAGS register 414. A mask bit that is set indicates that the corresponding EFLAGS bit may be updated at the current privilege level held in the PRIV register 446; a mask bit that is not set indicates that the corresponding bit may not be updated. The execution logic 416 obtains the current operating mode from the privilege register PRIV 446 and the state of the other bits from the EFLAGS register 414. The contents of the EFLAGS register 414 are then read and ANDed with FMASK, and the result is stored in a result register 452. More specifically, the output of the FMASK register is coupled to one input of AND gate 450, and the other input of AND gate 450 is coupled to the EFLAGS register 414.
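The masked read of paragraph [0050] can be sketched in Python as follows. This is a hedged illustration rather than the patented circuit: the names `make_fmask`, `mfef`, and `privilege_cleared` are hypothetical, and the stack is modeled as a Python list. Because the text does not list which bits each privilege level disallows, the sketch clears only the RF and VM bits named in paragraph [0039] and accepts any further privilege-dependent bits as an input.

```python
# Illustrative model only; function and parameter names are assumptions.
EFLAGS_WIDTH = 32
ALL_ONES = (1 << EFLAGS_WIDTH) - 1
RF_BIT = 1 << 16  # resume flag (bit 16)
VM_BIT = 1 << 17  # virtual-8086 mode flag (bit 17)


def make_fmask(privilege_cleared=0):
    """Build a mask with one bit per EFLAGS bit.

    A set bit lets the matching EFLAGS bit through to the stack image.
    `privilege_cleared` is a hypothetical input naming any further bits
    that the current privilege level disallows; the source text does not
    enumerate them.
    """
    return ALL_ONES & ~(RF_BIT | VM_BIT | privilege_cleared)


def mfef(eflags, stack, privilege_cleared=0):
    """Model the single MFEF step: build the mask, AND, store to the stack."""
    fmask = make_fmask(privilege_cleared)
    result = eflags & fmask   # role of the AND gate
    stack.append(result)      # role of the store stage
    return result
```

For example, `mfef(0x00033202, stack)` appends `0x00003202` to `stack`: the RF and VM bits are dropped and every other bit passes through unchanged.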
[0051] In the next machine cycle, the result held in the result register 452 is written to the stack in memory, at the address indicated by the ESP register 426. Temporary storage of the result is not strictly necessary, because in the cycle that immediately follows the result is supplied to the store logic 454 of the store stage 456; there is thus no need to hold the EFLAGS image temporarily. The store logic 454 stores the result to the stack in memory 458.

[0052] The processor 400 can perform this operation in a single cycle. As a result, processor throughput improves significantly.

[0053] Referring now to Figure 5, a flowchart describes how the microprocessor 400 performs, according to the present invention, a read of the EFLAGS register, namely a push onto the stack. According to an embodiment of the invention, the fetcher 404 fetches a PUSHF or PUSHFD macro instruction from instruction memory; in this case the fetched instruction requires a read from the EFLAGS register 414 before the transfer to the stack, as shown in block 500, where the flow begins. In block 505, the translator 410 translates the macro instruction into an MFEF microinstruction, which is configured to complete the read from the EFLAGS register within a single microinstruction cycle. The flow then proceeds to block 510. In block 510, an EFLAGS mask is generated in the FMASK mask register 448. In one embodiment of the invention, the EFLAGS mask has the same number of bits as EFLAGS, so the bits of the EFLAGS mask in the FMASK register correspond one-to-one with the bits of the EFLAGS register. In generating the EFLAGS mask in block 515, the execution logic 416 checks the current privilege level and sets those mask bits that correspond to the EFLAGS bits that may be updated at that privilege level; the mask bits that correspond to EFLAGS bits that may not be updated at that privilege level are left unset, that is, their value is 0. The flow then proceeds to block 520. In block 520, the mask is ANDed with the contents of the EFLAGS register, and the flow proceeds to block 525. In block 525, the result of the AND operation is written to the stack.

[0054] The description of Figures 2 and 3 above sets out an apparatus and method that improve a processor's execution of a write to the EFLAGS register. The description of Figures 4 and 5 above likewise sets out an apparatus and method that improve a processor's execution of a read from the EFLAGS register.

Notably, both the write and the read operations can complete within a single instruction cycle, rather than requiring multiple cycles as in a conventional processor.

[0055] Although specific embodiments of the present invention have been described above, the invention is not limited to them. In addition to hardware implementations, the invention can be embodied in computer-readable code (for example, computer-readable program code, data, and so on) embodied in a computer-usable (for example, readable) medium. The computer code enables the functions, fabrication, modeling, simulation, and/or testing of the invention disclosed herein. For example, the invention can be realized through computer code in general programming languages (for example, C, C++, and so on); GDSII databases; hardware description languages (HDL), including Verilog HDL; or other programming and/or circuit tools available in the art. The computer code can be placed in any known computer-usable (for example, readable) medium, including semiconductor memory, magnetic disk, and optical disc (for example, CD-ROM, DVD-ROM, and so on), and can be embodied as a computer data signal in a computer-usable (for example, readable) transmission medium (for example, a carrier wave or another medium, including digital or analog media). The computer code can thus be transmitted over communication networks, including the Internet and intranets. It is understood that the invention can be embodied in computer code (for example, HDL, GDSII, and so on) whose structure can be built into a processor, and that this code can be transformed into hardware as part of an integrated circuit. The invention can also be implemented as a combination of hardware and computer code.

[0056] Furthermore, although the present invention and its objects, features, and advantages have been described in detail, other embodiments are also encompassed within the scope of the invention.

[0057] Finally, although specific embodiments of the invention have been described above, the invention is not limited to them. The foregoing is merely a preferred embodiment and shall not be used to limit the scope of the invention; it is provided to enable those skilled in the art to make or use the invention. Equivalent changes and modifications made within the scope of the claims of the invention do not depart from its essence or from its spirit and scope, and should all be regarded as further embodiments of the invention.

Brief Description of the Drawings

[0012] The foregoing and other objects, features, and advantages of the present invention will be better understood with reference to the following description and the accompanying drawings:
[0013] Figure 1 is a block diagram depicting the pipeline stages of a conventional microprocessor;
[0014] Figure 2 is a block diagram depicting one embodiment of the microprocessor of the present invention;
[0015] Figure 3 is a flowchart illustrating the operation of the microprocessor of Figure 2;
[0016] Figure 4 is a block diagram depicting another embodiment of the microprocessor of the present invention;
[0017] Figure 5 is a flowchart illustrating the operation of the microprocessor of Figure 4.

Description of reference numerals:
100 pipelined microprocessor architecture
105 fetch stage
110 translate stage
115 register stage
120 address stage
125 DATA/ALU stage (execute stage)
130 write-back stage
200 processor
202 fetch stage
204 instruction fetch logic
206 instruction memory
208 instruction pointer
212 translate stage
210 translator
222 register stage
224 register file
232 load stage
236 load/align logic
238 data memory
216 DATA/ALU stage
254 result
300-335 flowchart
400 processor
402 fetch stage
404 instruction fetch logic
406 instruction memory
408 instruction pointer
412 translate stage
410 translator
422 register stage
424 register file
431 address stage
434 load stage
436 load/align logic
438 data memory
418 DATA/ALU stage
452 result
456 store stage
454 store logic
445 stack in memory
500-525 flowchart

Claims (1)

1. A method of performing a write operation to a multi-bit flags register in a processor, the method comprising: receiving a macro instruction at a translate stage, the macro instruction requiring a write to the multi-bit flags register; and generating a microinstruction from the translate stage, the microinstruction being configured to complete the write to the multi-bit flags register in a single write cycle.

2. The method of claim 1, further comprising: generating a flag mask.

3. The method of claim 2, further comprising: performing a logical AND of the flag mask with a pre-specified operand to produce a result.

4. The method of claim 2, further comprising: storing the result to the multi-bit flags register.

5. The method of claim 1, wherein the processor is an x86 processor.

6. The method of claim 1, wherein the multi-bit flags register is an EFLAGS register.

7. The method of claim 1, wherein the macro instruction is one of the following instructions: POPF, POPFD, CLI, STI, CLC, STC, CLD, and STD.

8. A method of performing a read operation from a multi-bit flags register in a processor, the method comprising: receiving a macro instruction at a translate stage, the macro instruction requiring a read from the multi-bit flags register; and generating a microinstruction from the translate stage, the microinstruction being configured to complete the read from the multi-bit flags register in a single write cycle.

9. The method of claim 8, further comprising: generating a flag mask, the flag mask including privilege information that designates, according to a current privilege level, those bits of the multi-bit flags register that may be updated during a read operation.

10. The method of claim 9, further comprising: performing a logical AND of the flag mask with the multi-bit flags register to produce a result.

11. The method of claim 10, further comprising: storing the result to a stack in a memory.

12. The method of claim 8, wherein the processor is an x86 processor.

13. The method of claim 12, wherein the multi-bit flags register is an EFLAGS register.
TW92128964A 2002-10-22 2003-10-20 Apparatus and method for masked move to and from flags register in a processor TWI238943B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/279,207 US7076639B2 (en) 2001-10-23 2002-10-22 Apparatus and method for masked move to and from flags register in a processor

Publications (2)

Publication Number Publication Date
TW200406684A TW200406684A (en) 2004-05-01
TWI238943B true TWI238943B (en) 2005-09-01

Family

ID=37001142

Family Applications (1)

Application Number Title Priority Date Filing Date
TW92128964A TWI238943B (en) 2002-10-22 2003-10-20 Apparatus and method for masked move to and from flags register in a processor

Country Status (1)

Country Link
TW (1) TWI238943B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474463B2 (en) 2011-12-23 2019-11-12 Intel Corporation Apparatus and method for down conversion of data types
CN104126167B (en) * 2011-12-23 2018-05-11 英特尔公司 Apparatus and method for being broadcasted from from general register to vector registor

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI498816B (en) * 2011-12-23 2015-09-01 Intel Corp Method, article of manufacture, and apparatus for setting an output mask
US9703558B2 (en) 2011-12-23 2017-07-11 Intel Corporation Systems, apparatuses, and methods for setting an output mask in a destination writemask register from a source write mask register using an input writemask and immediate
US10372450B2 (en) 2011-12-23 2019-08-06 Intel Corporation Systems, apparatuses, and methods for setting an output mask in a destination writemask register from a source write mask register using an input writemask and immediate
