TW201738758A - Instructions and logic to provide base register swap status verification functionality
- Publication number
- TW201738758A (TW106101885A)
- Authority
- TW
- Taiwan
- Prior art keywords
- register
- instruction
- processor
- memory
- instructions
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30101—Special purpose registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30123—Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
- G06F9/30189—Instruction operation extension or modification according to execution mode, e.g. mode flag
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
- Advance Control (AREA)
Abstract
Description
The present disclosure pertains to the field of processing logic, microprocessors, and associated instruction set architecture that, when executed by a processor or other processing logic, perform logical, mathematical, or other functional operations. In particular, the disclosure relates to instructions and logic to provide base register swap status verification functionality.
Modern processors may provide registers that point to thread-specific data in one or more register files and/or model-specific registers (MSRs), which sometimes include one register or multiple registers.
The venerable x86 architecture, for example, has eight general-purpose registers (GPRs), six segment registers, one flags register, and an instruction pointer. Applications on most modern operating systems (such as FreeBSD, Linux, or Microsoft Windows) use a memory model that points nearly all segment registers to the same place (and uses paging instead of segmentation), effectively disabling the use of those segment registers. Typically, two of the segment registers, FS and GS, are exceptions to this rule and are used to point to thread-specific data. These can be used in 32-bit mode or in 64-bit mode. Some "system programming" features of the x86 architecture, including segmented addressing, are not used in 64-bit operating systems on some processors, or are not available in so-called "long mode" (64-bit and compatibility modes), although the FS and GS segments are retained in vestigial form for use as extra base pointers to operating system structures.
When FS and GS segment overrides are used in 64-bit mode, for example, their respective base addresses are used in the linear address calculation: (FS or GS).Base + index + displacement. For these two segment registers, FS and GS, the base address can be set via two MSRs: FS.Base (C000_0100h) and GS.Base (C000_0101h). In long mode, for example, an instruction called SwapGS can be used to swap the contents of GS.Base with the contents of another MSR, KernelGSBase (C000_0102h). This instruction is used to preserve kernel information for a logical processor core across context switches.
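By way of illustration only, the address calculation and the SwapGS exchange described above can be sketched as a small software model. The dictionary standing in for the MSRs, the function names `linear_address` and `swapgs`, and the example base values are assumptions made for this sketch; only the three MSR addresses come from the text, and real MSRs are accessed through privileged instructions rather than in this manner.

```python
# Illustrative software model only; real MSRs are accessed with privileged
# instructions, not a Python dict.
MSR_FS_BASE = 0xC000_0100         # FS.Base
MSR_GS_BASE = 0xC000_0101         # GS.Base
MSR_KERNEL_GS_BASE = 0xC000_0102  # KernelGSBase

MASK64 = (1 << 64) - 1


def linear_address(msrs, segment_msr, index, displacement):
    """Return (FS or GS).Base + index + displacement, truncated to 64 bits."""
    return (msrs[segment_msr] + index + displacement) & MASK64


def swapgs(msrs):
    """Exchange the contents of GS.Base and KernelGSBase."""
    msrs[MSR_GS_BASE], msrs[MSR_KERNEL_GS_BASE] = (
        msrs[MSR_KERNEL_GS_BASE],
        msrs[MSR_GS_BASE],
    )


# Hypothetical example values for a user thread and a kernel data area.
msrs = {
    MSR_FS_BASE: 0x0000_7000_0000_0000,
    MSR_GS_BASE: 0x0000_7FFF_0000_0000,
    MSR_KERNEL_GS_BASE: 0xFFFF_8800_0000_0000,
}

# The same GS-relative access resolves differently before and after the swap.
user_view = linear_address(msrs, MSR_GS_BASE, index=0x10, displacement=0x8)
swapgs(msrs)  # kernel entry: GS.Base now holds the kernel value
kernel_view = linear_address(msrs, MSR_GS_BASE, index=0x10, displacement=0x8)
```

In this model a second call to `swapgs` restores the user value, which mirrors the pairing of one swap on kernel entry with one swap on kernel exit.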
However, some interrupts and/or exceptions may arrive between a kernel entry and a SwapGS instruction, or between a SwapGS instruction and a kernel exit. Exception handlers may therefore need to deduce, at runtime, whether or not a SwapGS instruction needs to be executed, sometimes, for example, in order to keep system software secure. Some operating systems resort to ad hoc methods, e.g. complex and fragile "magic address checks," or setting the GS base register to negative values in the kernel and disabling the user-space instructions that set the GS base register. Some of these issues may add complicating factors to exception handler design, or require extra time-consuming checks and unnecessary user restrictions.
To date, potential solutions to such additional exception handler design complications, performance-limiting issues, and system software security issues have not been adequately explored.
According to one embodiment of the present invention, a processor is provided that comprises: a first model-specific register (MSR) to store a first base address field corresponding to a segment for a first execution context; a second MSR to store a second base address field corresponding to the segment for a second execution context; a third register to store a base register swap status field corresponding to the segment for the first and second execution contexts; a decode unit to decode a first swap instruction; and an execution unit to perform the following: responsive to the decoded first swap instruction, perform a swap of the value of the first MSR and the value of the second MSR; determine whether the swap of the value of the first MSR and the value of the second MSR completed successfully; and change a value of the base register swap status field responsive to a determination that the swap completed successfully.
100‧‧‧System
102‧‧‧Processor
104‧‧‧Cache memory
106‧‧‧Register file
108‧‧‧Execution unit
109‧‧‧Packed instruction set
110‧‧‧Processor bus
112‧‧‧Graphics controller
114‧‧‧Accelerated Graphics Port (AGP) interconnect
116‧‧‧Memory controller hub
118‧‧‧Memory interface
120‧‧‧Memory
122‧‧‧System I/O
124‧‧‧Data storage
126‧‧‧Wireless transceiver
128‧‧‧Flash BIOS
130‧‧‧I/O controller hub
134‧‧‧Network controller
140‧‧‧Computer system
141‧‧‧Bus
142‧‧‧Execution unit
143‧‧‧Packed instruction set
144‧‧‧Decoder
145‧‧‧Register file
146‧‧‧Synchronous dynamic random access memory (SDRAM) control
147‧‧‧Static random access memory (SRAM) control
148‧‧‧Burst flash memory interface
149‧‧‧Personal Computer Memory Card International Association (PCMCIA)/CompactFlash (CF) card control
150‧‧‧Liquid crystal display (LCD) control
151‧‧‧Direct memory access (DMA) controller
152‧‧‧Alternate bus master interface
153‧‧‧I/O bus
154‧‧‧I/O bridge
155‧‧‧Universal asynchronous receiver/transmitter (UART)
156‧‧‧Universal Serial Bus (USB)
157‧‧‧Bluetooth wireless UART
158‧‧‧I/O expansion interface
159‧‧‧Processing core
160‧‧‧Data processing system
161‧‧‧SIMD coprocessor
162‧‧‧Execution unit
163‧‧‧Instruction set
164‧‧‧Register file
165‧‧‧Decoder
165B‧‧‧Decoder
166‧‧‧Main processor
167‧‧‧Cache memory
168‧‧‧Input/output system
169‧‧‧Wireless interface
170‧‧‧Processing core
171‧‧‧Coprocessor bus
200‧‧‧Processor
201‧‧‧In-order front end
202‧‧‧Fast scheduler
203‧‧‧Out-of-order execution engine
204‧‧‧Slow/general floating-point scheduler
206‧‧‧Simple floating-point scheduler
208, 210‧‧‧Register files
211‧‧‧Execution block
212‧‧‧Address generation unit (AGU)
214‧‧‧Address generation unit (AGU)
216‧‧‧Fast ALU
218‧‧‧Fast ALU
220‧‧‧Slow ALU
222‧‧‧Floating-point ALU
224‧‧‧Floating-point move unit
226‧‧‧Instruction prefetcher
228‧‧‧Instruction decoder
230‧‧‧Trace cache
232‧‧‧Microcode ROM
234‧‧‧Micro-op queue
310‧‧‧Packed byte
320‧‧‧Packed word
330‧‧‧Packed doubleword (dword)
341‧‧‧Packed half
342‧‧‧Packed single
343‧‧‧Packed double
344‧‧‧Unsigned packed byte representation
345‧‧‧Signed packed byte representation
346‧‧‧Unsigned packed word representation
347‧‧‧Signed packed word representation
348‧‧‧Unsigned packed doubleword representation
349‧‧‧Signed packed doubleword representation
360‧‧‧Opcode format
361, 362‧‧‧Encoding fields
363‧‧‧MOD field
364, 365‧‧‧Source operand identifiers
366‧‧‧Destination operand identifier
370‧‧‧Opcode format
371, 372‧‧‧Encoding fields
373‧‧‧MOD field
374, 375‧‧‧Source operand identifiers
376‧‧‧Destination operand identifier
378‧‧‧Prefix byte
380‧‧‧Opcode format
381‧‧‧Condition field
382‧‧‧Opcode field
383‧‧‧Data size field
384‧‧‧Saturation type field
385‧‧‧Source operand identifier
386‧‧‧Destination operand identifier
387-389‧‧‧Opcode fields
390‧‧‧Source operand identifier
391‧‧‧VEX prefix bytes
392‧‧‧Opcode field
393‧‧‧Scale-index-base (SIB) identifier
394‧‧‧Optional displacement identifier
395‧‧‧Optional immediate byte
396‧‧‧EVEX prefix bytes
397‧‧‧Opcode format
398‧‧‧Opcode format
400‧‧‧Processor pipeline
402‧‧‧Fetch stage
404‧‧‧Length decode stage
406‧‧‧Decode stage
408‧‧‧Allocation stage
410‧‧‧Renaming stage
412‧‧‧Scheduling stage
414‧‧‧Register read/memory read stage
416‧‧‧Execute stage
418‧‧‧Write back/memory write stage
422‧‧‧Exception handling stage
424‧‧‧Commit stage
430‧‧‧Front end unit
432‧‧‧Branch prediction unit
434‧‧‧Instruction cache unit
436‧‧‧Instruction translation lookaside buffer (TLB)
438‧‧‧Instruction fetch unit
440‧‧‧Decode unit
450‧‧‧Execution engine unit
452‧‧‧Rename/allocator unit
454‧‧‧Retirement unit
456‧‧‧Scheduler unit
458‧‧‧Physical register file unit
460‧‧‧Execution cluster
462‧‧‧Execution unit
464‧‧‧Memory access unit
470‧‧‧Memory unit
472‧‧‧Data TLB unit
474‧‧‧Data cache unit
476‧‧‧Level 2 (L2) cache unit
490‧‧‧Processor core
500‧‧‧Processor
502A‧‧‧Core
502N‧‧‧Core
504A-N‧‧‧Cache units
506‧‧‧Shared cache unit
508‧‧‧Integrated graphics logic
510‧‧‧System agent unit
512‧‧‧Ring-based interconnect unit
514‧‧‧Integrated memory controller unit
516‧‧‧Bus controller unit
600‧‧‧System
610, 615‧‧‧Processors
620‧‧‧Graphics memory controller hub
640‧‧‧Memory
645‧‧‧Display
650‧‧‧I/O controller hub (ICH)
660‧‧‧External graphics device
670‧‧‧Peripheral device
695‧‧‧Front-side bus (FSB)
700‧‧‧Multiprocessor system
714‧‧‧I/O devices
716‧‧‧Bus
718‧‧‧Bus bridge
720‧‧‧Bus
722‧‧‧Keyboard and/or mouse
724‧‧‧Audio I/O
727‧‧‧Communication devices
728‧‧‧Storage unit
730‧‧‧Code and data
732‧‧‧Memory
734‧‧‧Memory
738‧‧‧High-performance graphics circuit
739‧‧‧High-performance graphics interface
750‧‧‧Point-to-point interconnect interface
752, 754‧‧‧P-P interfaces
770‧‧‧Processor
772‧‧‧Integrated memory controller unit
776, 778‧‧‧P-P interfaces
780‧‧‧Processor
782‧‧‧Integrated memory controller unit
786, 788‧‧‧P-P interfaces
790‧‧‧Chipset
794, 798‧‧‧P-P interfaces
796‧‧‧Interface
800‧‧‧System
814‧‧‧I/O devices
815‧‧‧Legacy I/O devices
832, 834‧‧‧Memories
870, 880‧‧‧Processors
872, 882‧‧‧Integrated memory and I/O control logic (CL)
890‧‧‧Chipset
900‧‧‧System on a chip (SoC)
902‧‧‧Interconnect unit
910‧‧‧Application processor
920‧‧‧Media processor
924‧‧‧Image processor
926‧‧‧Audio processor
928‧‧‧Video processor
930‧‧‧Static random access memory (SRAM) unit
932‧‧‧Direct memory access (DMA) unit
940‧‧‧Display unit
1000‧‧‧Processor
1005‧‧‧CPU
1010‧‧‧GPU
1015‧‧‧Image processor
1020‧‧‧Video processor
1025‧‧‧USB controller
1030‧‧‧UART controller
1035‧‧‧SPI/SDIO controller
1040‧‧‧Display device
1045‧‧‧High-Definition Multimedia Interface (HDMI) controller
1050‧‧‧MIPI controller
1055‧‧‧Flash memory controller
1060‧‧‧Double data rate (DDR) controller
1065‧‧‧Security engine
1070‧‧‧I2S/I2C (Integrated Interchip Sound/Inter-Integrated Circuit) interface
1110‧‧‧Hardware or software model
1120‧‧‧Simulation software
1130‧‧‧Storage
1140‧‧‧Memory
1150‧‧‧Wired connection
1160‧‧‧Wireless connection
1165‧‧‧Fabrication
1205‧‧‧Program
1210‧‧‧Emulation logic
1215‧‧‧Processor
1302‧‧‧High-level language
1304‧‧‧x86 compiler
1306‧‧‧x86 binary code
1308‧‧‧Alternative instruction set compiler
1310‧‧‧Alternative instruction set binary code
1312‧‧‧Instruction converter
1314‧‧‧Processor without an x86 instruction set core
1316‧‧‧Processor with an x86 instruction set core
1400‧‧‧System
1410‧‧‧Memory
1412‧‧‧Translator or converter
1414‧‧‧Processor
1416‧‧‧Register file
1418‧‧‧Vector/FP registers
1420‧‧‧General registers
1422‧‧‧Registers
1424‧‧‧Control registers
1426‧‧‧Segment registers
1428‧‧‧Task state segment
1430‧‧‧Page tables
1432‧‧‧Descriptor tables
1501‧‧‧Process flow for base register swap status verification functionality
1518-1524‧‧‧Processing blocks
1502‧‧‧Process flow for base register swap status verification functionality
1510-1526‧‧‧Processing blocks
1503‧‧‧Process flow for base register swap status verification functionality
1526-1550‧‧‧Processing blocks
1601‧‧‧Processor micro-architecture
1602‧‧‧Processor micro-architecture
1603‧‧‧Displacement
1604‧‧‧Base address
1605‧‧‧Index
1606‧‧‧Effective address
1608‧‧‧Mask
1609‧‧‧Data
1610‧‧‧Segment
1615‧‧‧Base register swap status
1620‧‧‧EFLAGS physical register
1621‧‧‧Instruction pointer physical register
1622‧‧‧MSR physical registers
1624‧‧‧Control (CTL) physical registers
1625‧‧‧CR3
1626‧‧‧Segment physical registers
1627‧‧‧FS
1628‧‧‧GS
1630‧‧‧GS base
1632‧‧‧Kernel GS base
1634‧‧‧FS base
1635‧‧‧MSR
1637‧‧‧Base
1638‧‧‧Base
1640‧‧‧Decode unit
1650‧‧‧Execution engine unit
1652‧‧‧Rename/allocator unit
1654‧‧‧Retirement unit
1656‧‧‧Scheduler unit
1664‧‧‧Memory access unit
1670‧‧‧Memory unit
1672‧‧‧TLB
1674‧‧‧L1 cache
1676‧‧‧L2 cache
1680‧‧‧Floating-point (FP) physical registers
1682‧‧‧Mask physical registers
1684‧‧‧Vector physical registers
1686‧‧‧Integer physical registers
1694‧‧‧Address generation logic
1699‧‧‧Store data buffer
The present invention is illustrated by way of example, and not limitation, in the figures of the accompanying drawings.
FIG. 1A is a block diagram of one embodiment of a system that executes instructions to provide base register swap status verification functionality.
FIG. 1B is a block diagram of another embodiment of a system that executes instructions to provide base register swap status verification functionality.
FIG. 1C is a block diagram of another embodiment of a system that executes instructions to provide base register swap status verification functionality.
FIG. 2 is a block diagram of one embodiment of a processor that executes instructions to provide base register swap status verification functionality.
FIG. 3A illustrates packed data types according to one embodiment.
FIG. 3B illustrates packed data types according to one embodiment.
FIG. 3C illustrates packed data types according to one embodiment.
FIG. 3D illustrates an instruction encoding to provide base register swap status verification functionality according to one embodiment.
FIG. 3E illustrates an instruction encoding to provide base register swap status verification functionality according to another embodiment.
FIG. 3F illustrates an instruction encoding to provide base register swap status verification functionality according to another embodiment.
FIG. 3G illustrates an instruction encoding to provide base register swap status verification functionality according to another embodiment.
FIG. 3H illustrates an instruction encoding to provide base register swap status verification functionality according to another embodiment.
FIG. 4A illustrates elements of one embodiment of a processor micro-architecture to execute instructions that provide base register swap status verification functionality.
FIG. 4B illustrates elements of another embodiment of a processor micro-architecture to execute instructions that provide base register swap status verification functionality.
FIG. 5 is a block diagram of one embodiment of a processor to execute instructions that provide base register swap status verification functionality.
FIG. 6 is a block diagram of one embodiment of a computer system to execute instructions that provide base register swap status verification functionality.
FIG. 7 is a block diagram of another embodiment of a computer system to execute instructions that provide base register swap status verification functionality.
FIG. 8 is a block diagram of another embodiment of a computer system to execute instructions that provide base register swap status verification functionality.
FIG. 9 is a block diagram of one embodiment of a system on a chip to execute instructions that provide base register swap status verification functionality.
FIG. 10 is a block diagram of an embodiment of a processor to execute instructions that provide base register swap status verification functionality.
FIG. 11 is a block diagram of one embodiment of an IP core development system that provides base register swap status verification functionality.
FIG. 12 illustrates one embodiment of an architecture emulation system that provides base register swap status verification functionality.
FIG. 13 illustrates one embodiment of a system to translate instructions that provide base register swap status verification functionality.
FIG. 14 illustrates an alternative embodiment of a system to translate instructions that provide base register swap status verification functionality.
FIG. 15A illustrates a flow diagram for one embodiment of a process to provide base register swap status verification functionality.
FIG. 15B illustrates a flow diagram for an alternative embodiment of a process to provide base register swap status verification functionality.
FIG. 15C illustrates a flow diagram for another alternative embodiment of a process to provide base register swap status verification functionality.
FIG. 16A illustrates elements of one embodiment of a processor micro-architecture to execute instructions that provide base register swap status verification functionality.
FIG. 16B illustrates more detailed elements of one embodiment of a processor micro-architecture to execute instructions that provide base register swap status verification functionality.
The following description discloses instructions and logic to provide base register swap status verification functionality. Some embodiments may include a processor having a first model-specific register (MSR) to store a first base address field corresponding to a segment for a first execution context, and a second MSR to store a second base address field corresponding to the segment for a second execution context. A third register stores a base register swap status field corresponding to the segment for the first and second execution contexts. The decode unit of the processor decodes a segment swap instruction, and execution logic, responsive to the decoded segment swap instruction, performs a swap of the first MSR value and the second MSR value. If the swap of the first MSR value and the second MSR value is determined to have completed successfully, the execution logic changes a value of the base register swap status field responsive to that determination.
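The behavior just described can be sketched, for purposes of illustration only, as a small state model. The class `SegmentBaseState`, the function `swap_segment`, and the `fault` parameter used to simulate a swap that does not complete are all hypothetical names introduced here. Representing the status field as a single toggled flag is likewise an assumption, since the text states only that a value of the field is changed.

```python
from dataclasses import dataclass


@dataclass
class SegmentBaseState:
    """Hypothetical model of the three registers described above."""
    gs_base: int           # first MSR: base address for the first execution context
    kernel_gs_base: int    # second MSR: base address for the second execution context
    swapped: bool = False  # third register: base register swap status field


def swap_segment(state: SegmentBaseState, fault: bool = False) -> bool:
    """Model a segment swap instruction.

    `fault` is an illustrative stand-in for any condition that prevents the
    swap from completing. The status field changes only on success.
    """
    if fault:
        return False  # swap did not complete; MSRs and status are unchanged
    state.gs_base, state.kernel_gs_base = state.kernel_gs_base, state.gs_base
    state.swapped = not state.swapped  # assumption: the status is a toggled flag
    return True


# Example: one successful swap, then one that fails to complete.
state = SegmentBaseState(gs_base=0x1000, kernel_gs_base=0x2000)
first_ok = swap_segment(state)
second_ok = swap_segment(state, fault=True)
```

After these two calls the model holds the swapped base values and a set status flag, because the failed second swap leaves both untouched.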
Historically, it was not possible for an unprivileged process to write an arbitrary value to the FS base register or the GS base register. Two new instructions, WrFSBase and WrGSBase, now permit an operating system to enable unprivileged processes to write to the FS base register and the GS base register, respectively. As a result, it is increasingly difficult for an exception handler to determine whether a base register has already been set to a value for accessing kernel information. It will be appreciated that base register swap status verification, as in the embodiments described herein, may be used to give exception handlers the ability to deduce at runtime, for example, whether a SwapGS instruction needs to be executed, without resorting to ad hoc methods such as complex and fragile "magic address checks," or setting the GS base register to negative values in the kernel and disabling the user-space instructions that set the FS and/or GS base registers, e.g. WrFSBase and WrGSBase. It will also be appreciated that base register swap status verification instructions may be used to avoid complications in exception handler design, extra time-consuming checks, and unnecessary user restrictions.
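The difficulty described above can be illustrated with a short sketch that contrasts an ad hoc sign-bit check with a direct read of a swap status. Both decision functions, the example base value, and the way the status is presented to the handler are assumptions made for this illustration, not the mechanism of any particular operating system or processor.

```python
SIGN_BIT = 1 << 63


def handler_needs_swap_heuristic(gs_base: int) -> bool:
    """Ad hoc check: treat a 'negative' GS base (bit 63 set) as the kernel's.

    Returns True when the handler concludes it must still execute SwapGS.
    """
    looks_like_kernel = bool(gs_base & SIGN_BIT)
    return not looks_like_kernel


def handler_needs_swap_status(swapped: bool) -> bool:
    """Status-based check: the handler consults the swap status directly."""
    return not swapped


# An unprivileged process uses WrGSBase to install a kernel-looking value,
# and an exception then arrives before the kernel has executed SwapGS.
user_written_gs_base = 0xFFFF_8800_0000_0000  # hypothetical value
swapped = False  # hypothetical status field: no swap has occurred yet

# The heuristic is fooled into skipping the swap; the status check is not.
heuristic_decision = handler_needs_swap_heuristic(user_written_gs_base)
status_decision = handler_needs_swap_status(swapped)
```

In this sketch the heuristic reports that no swap is needed, so a handler relying on it would proceed with a user-controlled GS base. That is why such methods are typically paired with disabling WrGSBase for user code, and the status check needs no such restriction.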
In the following description, numerous specific details such as processing logic, processor types, micro-architectural conditions, events, enablement mechanisms, and the like are set forth in order to provide a more thorough understanding of embodiments of the present invention. It will be appreciated by one skilled in the art, however, that the invention may be practiced without such specific details. Additionally, some well-known structures, circuits, and the like have not been shown in detail to avoid unnecessarily obscuring embodiments of the present invention.
Although the following embodiments are described with reference to a processor, other embodiments are applicable to other types of integrated circuits and logic devices. Similar techniques and teachings of embodiments of the present invention can be applied to other types of circuits or semiconductor devices that can benefit from larger addressable memory, higher pipeline throughput, and improved secure system performance. The teachings of embodiments of the present invention are applicable to any processor or machine that performs data manipulations. However, the present invention is not limited to processors or machines that perform 512-bit, 256-bit, 128-bit, 64-bit, 32-bit, or 16-bit data operations, and can be applied to any processor and machine in which manipulation or management of data is performed. In addition, the following description provides examples, and the accompanying drawings show various examples for purposes of illustration. These examples should not be construed in a limiting sense, however, as they are merely intended to provide examples of embodiments of the present invention rather than an exhaustive list of all possible implementations of embodiments of the present invention.
Although the examples below describe instruction handling and distribution in the context of execution units and logic circuits, other embodiments of the present invention can be accomplished by way of data and/or instructions stored on a machine-readable, tangible medium, which, when performed by a machine, cause the machine to perform functions consistent with at least one embodiment of the invention. In one embodiment, functions associated with embodiments of the present invention are embodied in machine-executable instructions. The instructions can be used to cause a general-purpose or special-purpose processor that is programmed with the instructions to perform the steps of the present invention. Embodiments of the present invention may be provided as a computer program product or software, which may include a machine-readable or computer-readable medium having stored thereon instructions that may be used to program a computer (or other electronic devices) to perform one or more operations according to embodiments of the present invention. Alternatively, steps of embodiments of the present invention may be performed by specific hardware components that contain fixed-function logic for performing the steps, or by any combination of programmed computer components and fixed-function hardware components.
Instructions used to program logic to perform embodiments of the invention can be stored within a memory in the system, such as DRAM, cache, flash memory, or other storage. Furthermore, the instructions can be distributed via a network or by way of other computer-readable media. Thus a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, Compact Disc Read-Only Memory (CD-ROM), magneto-optical disks, Read-Only Memory (ROM), Random Access Memory (RAM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic or optical cards, flash memory, or a tangible, machine-readable storage used in the transmission of information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Accordingly, the computer-readable medium includes any type of tangible machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
A design may go through various stages, from creation to simulation to fabrication. Data representing a design may represent the design in a number of manners. First, as is useful in simulations, the hardware may be represented using a hardware description language or another functional description language. Additionally, a circuit-level model with logic and/or transistor gates may be produced at some stages of the design process. Furthermore, most designs, at some stage, reach a level of data representing the physical placement of various devices in the hardware model. In the case where conventional semiconductor fabrication techniques are used, the data representing the hardware model may be the data specifying the presence or absence of various features on different mask layers for the masks used to produce the integrated circuit. In any representation of the design, the data may be stored in any form of a machine-readable medium. A memory or a magnetic or optical storage, such as a disc, may be the machine-readable medium that stores information transmitted via optical or electrical waves modulated or otherwise generated to transmit such information. When an electrical carrier wave indicating or carrying the code or design is transmitted, to the extent that copying, buffering, or re-transmission of the electrical signal is performed, a new copy is made. Thus, a communication provider or a network provider may store on a tangible, machine-readable medium, at least temporarily, an article (e.g., information encoded into a carrier wave) embodying techniques of embodiments of the present invention.
In modern processors, a number of different execution units are used to process and execute a variety of code and instructions. Not all instructions are created equal: some complete quickly, while others can take a number of clock cycles to complete. The faster the throughput of instructions, the better the overall performance of the processor. Thus it would be advantageous to have as many instructions as possible execute as fast as possible. However, certain instructions have greater complexity and require more execution time and processor resources. Examples include floating-point instructions, load/store operations, and data moves.
As more computer systems are used in Internet, text, and multimedia applications, additional processor support has been introduced over time. In one embodiment, an instruction set may be associated with one or more computer architectures, including data types, instructions, register architecture, addressing modes, memory architecture, interrupt and exception handling, and external input and output (I/O).
In one embodiment, the instruction set architecture (ISA) may be implemented by one or more microarchitectures, which include the processor logic and circuits used to implement one or more instruction sets. Accordingly, processors with different microarchitectures can share at least a portion of a common instruction set. For example, Intel® Pentium® 4 processors, Intel® Core™ processors, and processors from Advanced Micro Devices, Inc. of Sunnyvale, California implement nearly identical versions of the x86 instruction set (with some extensions that have been added in newer versions), but have different internal designs. Similarly, processors designed by other processor development companies (e.g., ARM Holdings, Ltd., MIPS, or their licensees or adopters) may share at least a portion of a common instruction set, but may include different processor designs. For example, the same register architecture of the ISA may be implemented in different ways in different microarchitectures using new or well-known techniques, including dedicated physical registers, or one or more physical registers dynamically allocated using a register renaming mechanism (e.g., the use of a Register Alias Table (RAT), a Reorder Buffer (ROB), and a retirement register file). In one embodiment, registers may include one or more registers, register architectures, register files, or other register sets that may or may not be addressable by a software programmer.
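By way of a non-limiting illustration, the register renaming mechanism mentioned above can be sketched as a small table that maps architectural register names onto dynamically allocated physical registers. The class name `RegisterAliasTable`, the register names, and the free-list policy are assumptions made for this sketch and are not taken from the described embodiments.

```python
class RegisterAliasTable:
    """Toy rename table: maps architectural register names to physical indices."""

    def __init__(self, arch_regs, num_physical):
        if num_physical < len(arch_regs):
            raise ValueError("need at least one physical register per architectural register")
        # Initially each architectural register owns one physical register.
        self.mapping = {name: i for i, name in enumerate(arch_regs)}
        # The remaining physical registers are free for allocation.
        self.free_list = list(range(len(arch_regs), num_physical))

    def rename(self, dest, sources):
        """Rename one instruction: read source mappings, then allocate a new dest."""
        physical_sources = [self.mapping[s] for s in sources]
        if not self.free_list:
            raise RuntimeError("no free physical register (a real design would stall)")
        new_dest = self.free_list.pop(0)
        old_dest = self.mapping[dest]  # would be reclaimed when the instruction retires
        self.mapping[dest] = new_dest
        return new_dest, physical_sources, old_dest


rat = RegisterAliasTable(["r0", "r1", "r2"], num_physical=6)
first = rat.rename("r0", ["r1", "r2"])   # r0 = r1 op r2
second = rat.rename("r0", ["r0", "r1"])  # r0 = r0 op r1, reads the renamed r0
```

In this sketch the second instruction reads the physical register that the first one was given, which is how renaming preserves true dependences while removing false ones on the architectural name `r0`.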
In one embodiment, an instruction may include one or more instruction formats. In one embodiment, an instruction format may indicate various fields (number of bits, location of bits, etc.) to specify, among other things, the operation to be performed and the operand(s) on which that operation is to be performed. Some instruction formats may be further defined by instruction templates (or sub-formats). For example, the instruction templates of a given instruction format may be defined to have different subsets of the instruction format's fields and/or defined to have a given field interpreted differently. In one embodiment, an instruction is expressed using an instruction format (and, if defined, a given one of the instruction templates of that instruction format) and specifies or indicates the operation and the operands upon which the operation will operate.
Scientific, financial, auto-vectorized general purpose, RMS (recognition, mining, and synthesis), and visual and multimedia applications (e.g., 2D/3D graphics, image processing, video compression/decompression, voice recognition algorithms, and audio manipulation) may require the same operation to be performed on a large number of data items. In one embodiment, Single Instruction Multiple Data (SIMD) refers to a type of instruction that causes a processor to perform an operation on multiple data elements. SIMD technology may be used in processors that can logically divide the bits in a register into a number of fixed-sized or variable-sized data elements, each of which represents a separate value. For example, in one embodiment, the bits in a 64-bit register may be organized as a source operand containing four separate 16-bit data elements, each of which represents a separate 16-bit value. This type of data may be referred to as a 'packed' data type or 'vector' data type, and operands of this data type are referred to as packed data operands or vector operands. In one embodiment, a packed data item or vector may be a sequence of packed data elements stored within a single register, and a packed data operand or a vector operand may be a source or destination operand of a SIMD instruction (or 'packed data instruction' or 'vector instruction'). In one embodiment, a SIMD instruction specifies a single vector operation to be performed on two source vector operands to generate a destination vector operand (also referred to as a result vector operand) of the same or different size, with the same or different number of data elements, and in the same or different data element order.
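For purposes of illustration only, the packed data layout described above (a 64-bit value treated as four separate 16-bit data elements) can be sketched in software. The helper names `pack`, `unpack`, and `packed_add`, and the choice of wraparound (non-saturating) addition, are assumptions of this sketch; an actual SIMD instruction performs the lane-wise operation in hardware.

```python
LANE_BITS = 16                      # width of one packed data element
LANES = 4                           # 4 x 16 bits = one 64-bit packed operand
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 16-bit values into one 64-bit integer, element 0 in the low bits."""
    if len(values) != LANES:
        raise ValueError("expected exactly four data elements")
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit integer back into its four 16-bit data elements."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(src1, src2):
    """Lane-wise add: each 16-bit element wraps independently, with no carry between lanes."""
    sums = [(a + b) & LANE_MASK for a, b in zip(unpack(src1), unpack(src2))]
    return pack(sums)


a = pack([1, 2, 3, 0xFFFF])
b = pack([10, 20, 30, 1])
result = unpack(packed_add(a, b))   # the last lane wraps around to 0
```

The point of the sketch is that a single operation on the two source operands yields four independent results, and that overflow in one data element does not spill into its neighbor.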
SIMD technology, such as that employed by Intel® Core™ processors having an instruction set including x86, MMX™, Streaming SIMD Extensions (SSE), SSE2, SSE3, SSE4.1, and SSE4.2 instructions; by ARM processors, such as the ARM Cortex® family of processors, having an instruction set including Vector Floating Point (VFP) and/or NEON instructions; and by MIPS processors, such as the Loongson family of processors developed by the Institute of Computing Technology (ICT) of the Chinese Academy of Sciences, has enabled a significant improvement in application performance (Core™ and MMX™ are registered trademarks or trademarks of Intel Corporation of Santa Clara, California).
In one embodiment, destination and source registers/data are generic terms that represent the source and destination of the corresponding data or operation. In some embodiments, they may be implemented by registers, memory, or other storage areas having names or functions other than those depicted. For example, in one embodiment, "DEST1" may be a temporary storage register or other storage area, whereas "SRC1" and "SRC2" may be first and second source storage registers or other storage areas, and so forth. In other embodiments, two or more of the SRC and DEST storage areas may correspond to different data storage elements within the same storage area (e.g., a SIMD register). In one embodiment, one of the source registers may also act as a destination register, for example by writing the result of an operation performed on the first and second source data back to one of the two source registers, which then serves as the destination register.
FIG. 1A is a block diagram of an exemplary computer system formed with a processor that includes execution units to execute an instruction in accordance with one embodiment of the present invention. System 100 includes a component, such as a processor 102, to employ execution units including logic to perform algorithms for processing data, in accordance with the present invention, such as in the embodiments described herein. System 100 is representative of processing systems based on the PENTIUM® III, PENTIUM® 4, Xeon™, Itanium®, XScale™, and/or StrongARM™ microprocessors available from Intel Corporation of Santa Clara, California, although other systems (including PCs having other microprocessors, engineering workstations, set-top boxes, and the like) may also be used. In one embodiment, sample system 100 may execute a version of the WINDOWS™ operating system available from Microsoft Corporation of Redmond, Washington, although other operating systems (e.g., UNIX and Linux), embedded software, and/or graphical user interfaces may also be used. Thus, embodiments of the present invention are not limited to any specific combination of hardware circuitry and software.
Embodiments are not limited to computer systems. Alternative embodiments of the present invention can be used in other devices, such as handheld devices and embedded applications. Some examples of handheld devices include cellular phones, Internet Protocol devices, digital cameras, personal digital assistants (PDAs), and handheld PCs. Embedded applications can include a microcontroller, a digital signal processor (DSP), a system on a chip, network computers (NetPC), set-top boxes, network hubs, wide area network (WAN) switches, or any other system that can perform one or more instructions in accordance with at least one embodiment.
FIG. 1A is a block diagram of a computer system 100 formed with a processor 102 that includes one or more execution units 108 to perform an algorithm to perform at least one instruction in accordance with one embodiment of the present invention. One embodiment may be described in the context of a single-processor desktop or server system, but alternative embodiments can be included in a multiprocessor system. System 100 is an example of a 'hub' system architecture. The computer system 100 includes a processor 102 to process data signals. The processor 102 can be, for example, a complex instruction set computer (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing a combination of instruction sets, or any other processor device, such as a digital signal processor. The processor 102 is coupled to a processor bus 110 that can transmit data signals between the processor 102 and other components in the system 100. The elements of system 100 perform their conventional functions, which are well known to those familiar with the art.
In one embodiment, the processor 102 includes a Level 1 (L1) internal cache memory 104. Depending on the architecture, the processor 102 can have a single internal cache or multiple levels of internal cache. Alternatively, in another embodiment, the cache memory can reside external to the processor 102. Other embodiments can also include a combination of both internal and external caches, depending on the particular implementation and needs. Register file 106 can store different types of data in various registers, including integer registers, floating-point registers, status registers, and an instruction pointer register.
Execution unit 108, which includes logic to perform integer and floating-point operations, also resides in the processor 102. The processor 102 also includes a microcode (ucode) ROM that stores microcode for certain macroinstructions. For one embodiment, execution unit 108 includes logic to handle a packed instruction set 109. By including the packed instruction set 109 in the instruction set of a general-purpose processor 102, along with the associated circuitry to execute the instructions, the operations used by many multimedia applications may be performed using packed data in the general-purpose processor 102. Thus, many multimedia applications can be accelerated and executed more efficiently by using the full width of the processor's data bus to perform operations on packed data. This can eliminate the need to transfer smaller units of data across the processor's data bus to perform one or more operations one data element at a time.
Alternative embodiments of an execution unit 108 can also be used in microcontrollers, embedded processors, graphics devices, DSPs, and other types of logic circuits. System 100 includes a memory 120. Memory 120 can be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, a flash memory device, or another memory device. Memory 120 can store instructions and/or data represented by data signals that can be executed by the processor 102.
A system logic chip 116 is coupled to the processor bus 110 and memory 120. The system logic chip 116 in the illustrated embodiment is a memory controller hub (MCH). The processor 102 can communicate with the MCH 116 via the processor bus 110. The MCH 116 provides a high-bandwidth memory path 118 to memory 120 for instruction and data storage and for storage of graphics commands, data, and textures. The MCH 116 directs data signals between the processor 102, memory 120, and other components in the system 100, and bridges the data signals between processor bus 110, memory 120, and system I/O 122. In some embodiments, the system logic chip 116 can provide a graphics port for coupling to a graphics controller 112. The MCH 116 is coupled to memory 120 through a memory interface 118. The graphics card 112 is coupled to the MCH 116 through an Accelerated Graphics Port (AGP) interconnect 114.
System 100 uses a proprietary hub interface bus 122 to couple the MCH 116 to the I/O controller hub (ICH) 130. The ICH 130 provides direct connections to some I/O devices via a local I/O bus. The local I/O bus is a high-speed I/O bus for connecting peripherals to the memory 120, the chipset, and the processor 102. Some examples are the audio controller, firmware hub (flash BIOS) 128, wireless transceiver 126, data storage 124, a legacy I/O controller containing user input and keyboard interfaces, a serial expansion port such as Universal Serial Bus (USB), and a network controller 134. The data storage device 124 can comprise a hard disk drive, a floppy disk drive, a CD-ROM device, a flash memory device, or another mass storage device.
For another embodiment of a system, an instruction in accordance with one embodiment can be used with a system on a chip. One embodiment of a system on a chip comprises a processor and a memory. The memory for one such system is a flash memory. The flash memory can be located on the same die as the processor and other system components. Additionally, other logic blocks, such as a memory controller or graphics controller, can also be located on a system on a chip.
FIG. 1B illustrates a data processing system 140 that implements the principles of one embodiment of the present invention. Those skilled in the art will readily appreciate that the embodiments described herein can be used with alternative processing systems without departing from the scope of embodiments of the invention.
Computer system 140 comprises a processing core 159 capable of performing at least one instruction in accordance with one embodiment. For one embodiment, processing core 159 represents a processing unit of any type of architecture, including but not limited to a CISC, a RISC, or a VLIW type architecture. Processing core 159 may also be suitable for manufacture in one or more process technologies and, by being represented on a machine-readable medium in sufficient detail, may be suitable to facilitate said manufacture.
Processing core 159 comprises an execution unit 142, a set of register file(s) 145, and a decoder 144. Processing core 159 also includes additional circuitry (not shown) that is not necessary to the understanding of embodiments of the present invention. Execution unit 142 is used for executing instructions received by processing core 159. In addition to performing typical processor instructions, execution unit 142 can perform instructions in packed instruction set 143 for performing operations on packed data formats. Packed instruction set 143 includes instructions for performing embodiments of the invention and other packed instructions. Execution unit 142 is coupled to register file 145 by an internal bus. Register file 145 represents a storage area on processing core 159 for storing information, including data. As previously mentioned, it is understood that the particular storage area used for storing the packed data is not critical. Execution unit 142 is coupled to decoder 144. Decoder 144 is used for decoding instructions received by processing core 159 into control signals and/or microcode entry points. In response to these control signals and/or microcode entry points, execution unit 142 performs the appropriate operations. In one embodiment, the decoder is used to interpret the opcode of the instruction, which indicates what operation should be performed on the corresponding data indicated within the instruction.
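As a non-limiting illustration of the decode-then-execute relationship described above, the following sketch interprets the opcode field of a toy instruction word and has a stand-in execution step perform the indicated operation on a register file. The 16-bit instruction layout, the opcode values, and the function names are hypothetical and were chosen only for this example; they do not describe the encoding of any actual instruction set.

```python
import operator

# Hypothetical opcode table: opcode value -> (mnemonic, operation to perform).
OPCODES = {
    0x0: ("ADD", operator.add),
    0x1: ("SUB", operator.sub),
    0x2: ("AND", operator.and_),
}


def decode(word):
    """Split a 16-bit toy instruction into its fields.

    Assumed layout: bits [15:12] opcode, [11:8] dest, [7:4] src1, [3:0] src2.
    """
    opcode = (word >> 12) & 0xF
    if opcode not in OPCODES:
        raise ValueError(f"undefined opcode {opcode:#x}")
    return opcode, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


def execute(word, regs):
    """Decode one instruction, then perform the indicated operation on the register file."""
    opcode, dest, src1, src2 = decode(word)
    _, operation = OPCODES[opcode]
    regs[dest] = operation(regs[src1], regs[src2]) & 0xFFFFFFFF  # keep results to 32 bits
    return regs


regs = [0] * 16
regs[1], regs[2] = 7, 5
execute(0x0312, regs)  # ADD r3, r1, r2
execute(0x1412, regs)  # SUB r4, r1, r2
```

Here the lookup in `OPCODES` plays the role of the control signals produced by the decoder, and the body of `execute` plays the role of the execution unit responding to them.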
Processing core 159 is coupled with bus 141 for communicating with various other system devices, which may include, but are not limited to, for example, synchronous dynamic random access memory (SDRAM) control 146, static random access memory (SRAM) control 147, burst flash memory interface 148, Personal Computer Memory Card International Association (PCMCIA)/compact flash (CF) card control 149, liquid crystal display (LCD) control 150, direct memory access (DMA) controller 151, and alternative bus master interface 152. In one embodiment, data processing system 140 may also comprise an I/O bridge 154 for communicating with various I/O devices via an I/O bus 153. Such I/O devices may include, but are not limited to, for example, universal asynchronous receiver/transmitter (UART) 155, universal serial bus (USB) 156, Bluetooth wireless UART 157, and I/O expansion interface 158.
One embodiment of data processing system 140 provides for mobile, network, and/or wireless communications and a processing core 159 capable of performing SIMD operations including a text string comparison operation. Processing core 159 may be programmed with various audio, video, imaging, and communications algorithms, including discrete transformations such as a Walsh-Hadamard transform, a fast Fourier transform (FFT), a discrete cosine transform (DCT), and their respective inverse transforms; compression/decompression techniques such as color space transformation, video encode motion estimation, or video decode motion compensation; and modulation/demodulation (MODEM) functions such as pulse coded modulation (PCM).
FIG. 1C illustrates another alternative embodiment of a data processing system capable of executing instructions to provide base register swap status verification functionality. In accordance with one alternative embodiment, data processing system 160 may include a main processor 166, a SIMD coprocessor 161, a cache memory 167, and an input/output system 168. The input/output system 168 may optionally be coupled to a wireless interface 169. SIMD coprocessor 161 is capable of performing operations including instructions in accordance with one embodiment. Processing core 170 may be suitable for manufacture in one or more process technologies and, by being represented on a machine-readable medium in sufficient detail, may be suitable to facilitate the manufacture of all or part of data processing system 160, including processing core 170.
For one embodiment, SIMD coprocessor 161 comprises an execution unit 162 and a set of register file(s) 164. One embodiment of main processor 166 comprises a decoder 165 to recognize instructions of instruction set 163, including instructions in accordance with one embodiment, for execution by execution unit 162. For alternative embodiments, SIMD coprocessor 161 also comprises at least part of decoder 165B to decode instructions of instruction set 163. Processing core 170 also includes additional circuitry (not shown) that is not necessary to the understanding of embodiments of the present invention.
In operation, the main processor 166 executes a stream of data processing instructions that control data processing operations of a general type, including interactions with the cache memory 167 and the input/output system 168. Embedded within the stream of data processing instructions are SIMD coprocessor instructions. The decoder 165 of main processor 166 recognizes these SIMD coprocessor instructions as being of a type that should be executed by an attached SIMD coprocessor 161. Accordingly, the main processor 166 issues these SIMD coprocessor instructions (or control signals representing SIMD coprocessor instructions) on the coprocessor bus 171, from which they are received by any attached SIMD coprocessors. In this case, the SIMD coprocessor 161 will accept and execute any received SIMD coprocessor instructions intended for it.
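The division of work described above, in which the main processor recognizes coprocessor instructions in its instruction stream and forwards them over a coprocessor bus, can be sketched as follows. This is a minimal software model under stated assumptions: the `simd.` mnemonic prefix used to mark coprocessor instructions, the `Coprocessor` class, and the instruction names are all hypothetical and serve only to illustrate the routing.

```python
def is_coprocessor_instruction(instruction):
    # Assumption for this sketch: coprocessor instructions carry a "simd." prefix.
    return instruction.startswith("simd.")


class Coprocessor:
    """Stand-in for an attached SIMD coprocessor that accepts forwarded instructions."""

    def __init__(self):
        self.executed = []

    def accept(self, instruction):
        # A real coprocessor would decode and execute; here we only record the order.
        self.executed.append(instruction)


def run(stream, coprocessor):
    """Model the main processor: execute general instructions, forward the rest."""
    executed_by_main = []
    for instruction in stream:
        if is_coprocessor_instruction(instruction):
            coprocessor.accept(instruction)       # issued on the coprocessor bus
        else:
            executed_by_main.append(instruction)  # handled by the main processor
    return executed_by_main


coprocessor = Coprocessor()
main_log = run(["load", "simd.padd", "store", "simd.pcmp"], coprocessor)
```

After the run, `main_log` holds the general instructions and `coprocessor.executed` holds the forwarded ones, each in original program order.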
Data may be received via wireless interface 169 for processing by the SIMD coprocessor instructions. For one example, voice communication may be received in the form of a digital signal, which may be processed by the SIMD coprocessor instructions to regenerate digital audio samples representative of the voice communication. For another example, compressed audio and/or video may be received in the form of a digital bit stream, which may be processed by the SIMD coprocessor instructions to regenerate digital audio samples and/or motion video frames. For one embodiment of processing core 170, main processor 166 and a SIMD coprocessor 161 are integrated into a single processing core 170 comprising an execution unit 162, a set of register file(s) 164, and a decoder 165 to recognize instructions of instruction set 163, including instructions in accordance with one embodiment.
FIG. 2 is a block diagram of the microarchitecture for a processor 200 that includes logic circuits to perform instructions in accordance with one embodiment of the present invention. In some embodiments, an instruction in accordance with one embodiment can be implemented to operate on data elements having sizes of byte, word, doubleword, quadword, etc., as well as data types such as single and double precision integer and floating-point data types. In one embodiment, the in-order front end 201 is the part of the processor 200 that fetches instructions to be executed and prepares them to be used later in the processor pipeline. The front end 201 may include several units. In one embodiment, the instruction prefetcher 226 fetches instructions from memory and feeds them to an instruction decoder 228, which in turn decodes or interprets them. For example, in one embodiment, the decoder decodes a received instruction into one or more operations called "microinstructions" or "micro-operations" (also called micro-ops or uops) that the machine can execute. In other embodiments, the decoder parses the instruction into an opcode and corresponding data and control fields that are used by the microarchitecture to perform operations in accordance with one embodiment. In one embodiment, the trace cache 230 takes decoded uops and assembles them into program-ordered sequences or traces in the uop queue 234 for execution. When the trace cache 230 encounters a complex instruction, the microcode ROM 232 provides the uops needed to complete the operation.
Some instructions are converted into a single micro-op, whereas others need several micro-ops to complete the full operation. In one embodiment, if more than four micro-ops are needed to complete an instruction, the decoder 228 accesses the microcode ROM 232 to process the instruction. For one embodiment, an instruction can be decoded into a small number of micro-ops for processing at the instruction decoder 228. In another embodiment, an instruction can be stored within the microcode ROM 232 should a number of micro-ops be needed to accomplish the operation. The trace cache 230 refers to an entry point programmable logic array (PLA) to determine a correct micro-instruction pointer for reading, from the microcode ROM 232, the microcode sequences that complete one or more instructions in accordance with one embodiment. After the microcode ROM 232 finishes sequencing micro-ops for an instruction, the front end 201 of the machine resumes fetching micro-ops from the trace cache 230.
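By way of illustration only, and not limitation, the choice between the decoder 228 and the microcode ROM 232 described above may be sketched as follows. The function name and the returned labels are hypothetical; only the threshold of four micro-ops is taken from the description of one embodiment.

```python
# Threshold from the description of one embodiment: an instruction needing
# more than four micro-ops is handled through the microcode ROM.
MICROCODE_THRESHOLD = 4


def decode_source(uop_count: int) -> str:
    """Return which front-end structure supplies the micro-ops (hypothetical labels)."""
    if uop_count < 1:
        raise ValueError("an instruction decodes to at least one micro-op")
    if uop_count > MICROCODE_THRESHOLD:
        return "microcode_rom"
    return "decoder"
```

Other embodiments may use a different threshold; the constant is isolated so that it can be changed without touching the selection logic.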
The out-of-order execution engine 203 is where the instructions are prepared for execution. The out-of-order execution logic has a number of buffers to smooth out and reorder the flow of instructions to optimize performance as they go down the pipeline and get scheduled for execution. The allocator logic allocates the machine buffers and resources that each micro-op needs in order to execute. The register renaming logic renames logical registers onto entries in a register file. The allocator also allocates an entry for each micro-op in one of the two micro-op queues, one for memory operations and one for non-memory operations, in front of the instruction schedulers: memory scheduler, fast scheduler 202, slow/general floating point scheduler 204, and simple floating point scheduler 206. The micro-op schedulers 202, 204, 206 determine when a micro-op is ready to execute based on the readiness of its dependent input register operand sources and the availability of the execution resources the micro-op needs to complete its operation. The fast scheduler 202 of one embodiment can schedule on each half of the main clock cycle, while the other schedulers can only schedule once per main processor clock cycle. The schedulers arbitrate for the dispatch ports to schedule micro-ops for execution.
Register files 208, 210 sit between the schedulers 202, 204, 206 and the execution units 212, 214, 216, 218, 220, 222, 224 in the execution block 211. There is a separate register file 208, 210 for integer and floating point operations, respectively. Each register file 208, 210 of one embodiment also includes a bypass network that can bypass or forward just-completed results that have not yet been written into the register file to new dependent micro-ops. The integer register file 208 and the floating point register file 210 are also capable of communicating data with each other. For one embodiment, the integer register file 208 is split into two separate register files, one register file for the low-order 32 bits of data and a second register file for the high-order 32 bits of data. The floating point register file 210 of one embodiment has 128-bit wide entries because floating point instructions typically have operands from 64 to 128 bits in width.
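As a minimal sketch of the split integer register file of one embodiment, a 64-bit value may be divided into a low-order and a high-order 32-bit half and recombined as shown below. The helper names are hypothetical and model only the arithmetic, not the circuit.

```python
MASK32 = 0xFFFFFFFF


def split64(value: int) -> tuple[int, int]:
    """Split a 64-bit value into (low-order 32 bits, high-order 32 bits)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")
    return value & MASK32, (value >> 32) & MASK32


def join64(low: int, high: int) -> int:
    """Recombine the two 32-bit halves into one 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)
```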
The execution block 211 contains the execution units 212, 214, 216, 218, 220, 222, 224, where the instructions are actually executed. This section includes the register files 208, 210, which store the integer and floating point data operand values that the micro-instructions need to execute. The processor 200 of one embodiment comprises a number of execution units: address generation unit (AGU) 212, AGU 214, fast ALU 216, fast ALU 218, slow ALU 220, floating point ALU 222, and floating point move unit 224. For one embodiment, the floating point execution blocks 222, 224 execute floating point, MMX, SIMD, and SSE, or other operations. The floating point ALU 222 of one embodiment includes a 64-bit by 64-bit floating point divider to execute divide, square root, and remainder micro-ops. For embodiments of the present invention, instructions involving a floating point value may be handled with the floating point hardware. In one embodiment, the ALU operations go to the fast ALU execution units 216, 218. The fast ALUs 216, 218 of one embodiment can execute fast operations with an effective latency of half a clock cycle. For one embodiment, most complex integer operations go to the slow ALU 220, as the slow ALU 220 includes integer execution hardware for long-latency operations, such as a multiplier, shifter, flag logic, and branch processing. Memory load/store operations are executed by the AGUs 212, 214. For one embodiment, the integer ALUs 216, 218, 220 are described in the context of performing integer operations on 64-bit data operands. In alternative embodiments, the ALUs 216, 218, 220 can be implemented to support a variety of data widths including 16, 32, 128, 256 bits, etc. Similarly, the floating point units 222, 224 can be implemented to support a range of operands of various bit widths. For one embodiment, the floating point units 222, 224 can operate on 128-bit wide packed data operands in conjunction with SIMD and multimedia instructions.
In one embodiment, the micro-op schedulers 202, 204, 206 dispatch dependent operations before the parent load has finished executing. As micro-ops are speculatively scheduled and executed in processor 200, the processor 200 also includes logic to handle memory misses. If a data load misses in the data cache, there can be dependent operations in flight in the pipeline that have left the scheduler with temporarily incorrect data. A replay mechanism tracks and re-executes instructions that use incorrect data. Only the dependent operations need to be replayed, and the independent ones are allowed to complete. The schedulers and replay mechanism of one embodiment of a processor are also designed to catch instructions that provide base register swap status verification functionality.
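The selection of operations to replay may be sketched, by way of example only, as a walk over a dependency graph. The graph representation and the function name are hypothetical, and the sketch assumes that operations depending indirectly on the missed load are replayed as well, which the description does not state explicitly.

```python
from collections import deque


def ops_to_replay(dependents: dict[str, list[str]], missed_load: str) -> set[str]:
    """Collect every operation that directly or indirectly consumes the missed load.

    `dependents` maps an operation to the operations that read its result.
    Operations outside the returned set are independent and may complete.
    """
    replay: set[str] = set()
    queue = deque([missed_load])
    while queue:
        op = queue.popleft()
        for consumer in dependents.get(op, ()):
            if consumer not in replay:
                replay.add(consumer)
                queue.append(consumer)
    return replay
```

For a graph in which "add" consumes "load" and "store" consumes "add", while "sub" consumes an unrelated "mul", only "add" and "store" are selected for replay.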
The term "registers" may refer to the on-board processor storage locations that are used as part of instructions to identify operands. In other words, registers may be those that are usable from outside the processor (from a programmer's perspective). However, the registers of an embodiment should not be limited in meaning to a particular type of circuit. Rather, a register of an embodiment is capable of storing and providing data, and of performing the functions described herein. The registers described herein can be implemented by circuitry within a processor using any number of different techniques, such as dedicated physical registers, dynamically allocated physical registers using register renaming, combinations of dedicated and dynamically allocated physical registers, etc. In one embodiment, integer registers store 32-bit integer data. A register file of one embodiment also contains eight multimedia SIMD registers for packed data. For the discussion below, the registers are understood to be data registers designed to hold packed data, such as the 64-bit wide MMX™ registers (also referred to as 'mm' registers in some instances) in microprocessors enabled with MMX technology from Intel Corporation of Santa Clara, California. These MMX registers, available in both integer and floating point forms, can operate with packed data elements that accompany SIMD and SSE instructions. Similarly, 128-bit wide XMM registers relating to SSE2, SSE3, SSE4, or beyond (referred to generically as "SSEx") technology can also be used to hold such packed data operands. In one embodiment, in storing packed data and integer data, the registers do not need to differentiate between the two data types. In one embodiment, integer and floating point data are contained in either the same register file or different register files. Furthermore, in one embodiment, floating point and integer data may be stored in different registers or in the same registers.
In the examples of the following figures, a number of data operands are described. FIG. 3A illustrates various packed data type representations in multimedia registers according to one embodiment of the present invention. FIG. 3A illustrates data types for a packed byte 310, a packed word 320, and a packed doubleword (dword) 330 for 128-bit wide operands. The packed byte format 310 of this example is 128 bits long and contains sixteen packed byte data elements. A byte is defined here as 8 bits of data. Information for each byte data element is stored in bit 7 through bit 0 for byte 0, bit 15 through bit 8 for byte 1, bit 23 through bit 16 for byte 2, and finally bit 127 through bit 120 for byte 15. Thus, all available bits in the register are used. This storage arrangement increases the storage efficiency of the processor. As well, with sixteen data elements accessed, one operation can now be performed on sixteen data elements in parallel.
Generally, a data element is an individual piece of data that is stored in a single register or memory location with other data elements of the same length. In packed data sequences relating to SSEx technology, the number of data elements stored in an XMM register is 128 bits divided by the length in bits of an individual data element. Similarly, in packed data sequences relating to MMX and SSE technology, the number of data elements stored in an MMX register is 64 bits divided by the length in bits of an individual data element. Although the data types illustrated in FIG. 3A are 128 bits long, embodiments of the present invention can also operate with 64-bit wide, 256-bit wide, 512-bit wide, or other sized operands. The packed word format 320 of this example is 128 bits long and contains eight packed word data elements. Each packed word contains sixteen bits of information. The packed doubleword format 330 of FIG. 3A is 128 bits long and contains four packed doubleword data elements. Each packed doubleword data element contains 32 bits of information. A packed quadword is 128 bits long and contains two packed quadword data elements.
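The element counts and bit positions described above follow from simple division and multiplication, as the following sketch illustrates. The function names are hypothetical; the register and element widths are those given in the description.

```python
def element_count(register_bits: int, element_bits: int) -> int:
    """Number of packed data elements: register width divided by element width."""
    if element_bits <= 0 or register_bits % element_bits:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


def bit_range(index: int, element_bits: int) -> tuple[int, int]:
    """Return (high bit, low bit) occupied by the element at `index`."""
    low = index * element_bits
    return low + element_bits - 1, low
```

For a 128-bit register this yields sixteen bytes, eight words, four doublewords, or two quadwords, and places byte 15 in bit 127 through bit 120.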
FIG. 3B illustrates alternative in-register data storage formats. Each packed data can include more than one independent data element. Three packed data formats are illustrated: packed half 341, packed single 342, and packed double 343. One embodiment of packed half 341, packed single 342, and packed double 343 contains fixed-point data elements. For an alternative embodiment, one or more of packed half 341, packed single 342, and packed double 343 may contain floating-point data elements. One alternative embodiment of packed half 341 is 128 bits long and contains eight 16-bit data elements. One embodiment of packed single 342 is 128 bits long and contains four 32-bit data elements. One embodiment of packed double 343 is 128 bits long and contains two 64-bit data elements. It will be appreciated that such packed data formats may be further extended to other register lengths, for example, to 96 bits, 160 bits, 192 bits, 224 bits, 256 bits, 512 bits or more.
FIG. 3C illustrates various signed and unsigned packed data type representations in multimedia registers according to one embodiment of the present invention. Unsigned packed byte representation 344 illustrates the storage of an unsigned packed byte in a SIMD register. Information for each byte data element is stored in bit 7 through bit 0 for byte 0, bit 15 through bit 8 for byte 1, bit 23 through bit 16 for byte 2, and so on, and finally bit 127 through bit 120 for byte 15. Thus, all available bits in the register are used. This storage arrangement can increase the storage efficiency of the processor. As well, with sixteen data elements accessed, one operation can now be performed on sixteen data elements in parallel. Signed packed byte representation 345 illustrates the storage of a signed packed byte. Note that the eighth bit of every byte data element is the sign indicator. Unsigned packed word representation 346 illustrates how word 7 through word 0 are stored in a SIMD register. Signed packed word representation 347 is similar to the unsigned packed word in-register representation 346. Note that the sixteenth bit of each word data element is the sign indicator. Unsigned packed doubleword representation 348 shows how doubleword data elements are stored. Signed packed doubleword representation 349 is similar to the unsigned packed doubleword in-register representation 348. Note that the necessary sign bit is the thirty-second bit of each doubleword data element.
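The difference between the signed and unsigned representations may be illustrated with the sketch below, which reads the same register contents under both interpretations. The function name is hypothetical, and the sketch assumes two's-complement encoding for signed elements; the description states only that the most significant bit of each element is the sign indicator.

```python
def unpack_elements(register_value: int, element_bits: int,
                    register_bits: int = 128, signed: bool = False) -> list[int]:
    """Return the packed elements of a register, element 0 first."""
    mask = (1 << element_bits) - 1
    sign_bit = 1 << (element_bits - 1)  # e.g. the eighth bit of a byte element
    elements = []
    for index in range(register_bits // element_bits):
        raw = (register_value >> (index * element_bits)) & mask
        if signed and raw & sign_bit:
            raw -= 1 << element_bits  # two's-complement interpretation (assumed)
        elements.append(raw)
    return elements
```

With byte 0 holding 0xFF, the unsigned reading of that element is 255 and the signed reading is -1, while the stored bits are identical.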
FIG. 3D is a depiction of one embodiment of an operation encoding (opcode) format 360, having thirty-two or more bits, and register/memory operand addressing modes corresponding with a type of opcode format described in the "Intel® 64 and IA-32 Intel Architecture Software Developer's Manual Combined Volumes 2A and 2B: Instruction Set Reference A-Z," which is available from Intel Corporation, Santa Clara, California on the world-wide-web (www) at intel.com/products/processor/manuals/. In one embodiment, an instruction may be encoded by one or more of fields 361 and 362. Up to two operand locations per instruction may be identified, including up to two source operand identifiers 364 and 365. For one embodiment, destination operand identifier 366 is the same as source operand identifier 364, whereas in other embodiments they are different. For an alternative embodiment, destination operand identifier 366 is the same as source operand identifier 365, whereas in other embodiments they are different. In one embodiment, one of the source operands identified by source operand identifiers 364 and 365 is overwritten by the results of the instruction, whereas in other embodiments identifier 364 corresponds to a source register element and identifier 365 corresponds to a destination register element. For one embodiment, operand identifiers 364 and 365 may be used to identify 32-bit or 64-bit source and destination operands.
FIG. 3E is a depiction of another alternative operation encoding (opcode) format 370, having forty or more bits. Opcode format 370 corresponds with opcode format 360 and comprises an optional prefix byte 378. An instruction according to one embodiment may be encoded by one or more of fields 378, 371, and 372. Up to two operand locations per instruction may be identified by source operand identifiers 374 and 375 and by prefix byte 378. For one embodiment, prefix byte 378 may be used to identify 32-bit or 64-bit source and destination operands. For one embodiment, destination operand identifier 376 is the same as source operand identifier 374, whereas in other embodiments they are different. For an alternative embodiment, destination operand identifier 376 is the same as source operand identifier 375, whereas in other embodiments they are different. In one embodiment, an instruction operates on one or more of the operands identified by operand identifiers 374 and 375, and one or more operands identified by operand identifiers 374 and 375 are overwritten by the results of the instruction, whereas in other embodiments, operands identified by identifiers 374 and 375 are written to another data element in another register. Opcode formats 360 and 370 allow register to register, memory to register, register by memory, register by register, register by immediate, and register to memory addressing, specified in part by MOD fields 363 and 373 and by optional scale-index-base and displacement bytes.
Turning next to FIG. 3F, in some alternative embodiments, 64-bit (or 128-bit, or 256-bit, or 512-bit or more) single instruction multiple data (SIMD) arithmetic operations may be performed through a coprocessor data processing (CDP) instruction. Operation encoding (opcode) format 380 depicts one such CDP instruction having CDP opcode fields 382 and 389. For alternative embodiments, the type of CDP instruction operation may be encoded by one or more of fields 383, 384, 387, and 388. Up to three operand locations per instruction may be identified, including up to two source operand identifiers 385 and 390 and one destination operand identifier 386. One embodiment of the coprocessor can operate on 8-, 16-, 32-, and 64-bit values. For one embodiment, an instruction is performed on integer data elements. In some embodiments, an instruction may be executed conditionally, using condition field 381. For some embodiments, source data sizes may be encoded by field 383. In some embodiments, zero (Z), negative (N), carry (C), and overflow (V) detection can be done on SIMD fields. For some instructions, the type of saturation may be encoded by field 384.
Turning next to FIG. 3G, which is a depiction of another alternative operation encoding (opcode) format 397 to provide base register swap status verification functionality according to another embodiment, the opcode format 397 corresponds with a type of opcode format described in the "Intel® Advanced Vector Extensions Programming Reference," which is available from Intel Corporation, Santa Clara, California on the world-wide-web (www) at intel.com/products/processor/manuals/.
The original x86 instruction set provided for a 1-byte opcode with various formats of address syllable and immediate operand contained in additional bytes whose presence was known from the first "opcode" byte. Additionally, there were certain byte values that were reserved as modifiers to the opcode (called prefixes, as they had to be placed before the instruction). When the original palette of 256 opcode bytes (including these special prefix values) was exhausted, a single byte was dedicated as an escape to a new set of 256 opcodes. As vector instructions (e.g., SIMD) were added, a need for more opcodes was generated, and the "two byte" opcode map also was insufficient, even when expanded through the use of prefixes. To this end, new instructions were added in additional maps which use 2 bytes plus an optional prefix as an identifier.
Additionally, in order to facilitate additional registers in 64-bit mode, an additional prefix (called "REX") may be used in between the prefixes and the opcode (and any escape bytes necessary to determine the opcode). In one embodiment, the REX may have 4 "payload" bits to indicate use of additional registers in 64-bit mode. In other embodiments it may have fewer or more than 4 bits. The general format of at least one instruction set (which corresponds generally with format 360 and/or format 370) is illustrated generically by the following:
[prefixes] [rex] escape [escape2] opcode modrm (etc.)
Opcode format 397 corresponds with opcode format 370 and comprises optional VEX prefix bytes 391 (beginning with C4 hex in one embodiment) to replace most other commonly used legacy instruction prefix bytes and escape codes. For example, the following illustrates an embodiment using two fields to encode an instruction, which may be used when a second escape code is present in the original instruction, or when extra bits (e.g., the XB and W fields) in the REX field need to be used. In the embodiment illustrated below, the legacy escape is represented by a new escape value, legacy prefixes are fully compressed as part of the "payload" bytes, legacy prefixes are reclaimed and available for future expansion, the second escape code is compressed in a "map" field, with future map or feature space available, and new features are added (e.g., increased vector length and an additional source register specifier).
An instruction according to one embodiment may be encoded by one or more of fields 391 and 392. Up to four operand locations per instruction may be identified by field 391 in combination with source operand identifiers 374 and 375 and in combination with an optional scale-index-base (SIB) identifier 393, an optional displacement identifier 394, and an optional immediate byte 395. For one embodiment, VEX prefix bytes 391 may be used to identify 32-bit or 64-bit source and destination operands and/or 128-bit or 256-bit SIMD register or memory operands. For one embodiment, the functionality provided by opcode format 397 may be redundant with opcode format 370, whereas in other embodiments they are different. Opcode formats 370 and 397 allow register to register, memory to register, register by memory, register by register, register by immediate, and register to memory addressing, specified in part by MOD field 373 and by optional SIB identifier 393, an optional displacement identifier 394, and an optional immediate byte 395.
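For illustration, the three-byte VEX prefix beginning with C4 hex may be unpacked as sketched below. The bit layout (inverted R, X, B and a five-bit map field in the second byte; W, an inverted four-bit register specifier, a vector length bit, and a two-bit compressed legacy prefix in the third byte) follows the publicly documented AVX encoding and is an assumption with respect to this description; the function and key names are hypothetical.

```python
def decode_vex3(b0: int, b1: int, b2: int) -> dict[str, int]:
    """Unpack a three-byte VEX prefix; inverted fields are returned un-inverted."""
    if b0 != 0xC4:
        raise ValueError("a three-byte VEX prefix begins with C4 hex")
    return {
        "R": ((b1 >> 7) & 1) ^ 1,      # stored inverted
        "X": ((b1 >> 6) & 1) ^ 1,      # stored inverted
        "B": ((b1 >> 5) & 1) ^ 1,      # stored inverted
        "map": b1 & 0x1F,              # compressed escape code ("map" field)
        "W": (b2 >> 7) & 1,
        "vvvv": (~(b2 >> 3)) & 0xF,    # additional source register, stored inverted
        "L": (b2 >> 2) & 1,            # vector length selector
        "pp": b2 & 0x3,                # compressed legacy prefix
    }
```

Here the "map" and "pp" fields correspond to the compressed escape code and compressed legacy prefixes mentioned above, and "vvvv" and "L" to the additional source register specifier and the increased vector length.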
Turning next to FIG. 3H, which is a depiction of another alternative operation encoding (opcode) format 398 to provide base register swap status verification functionality according to another embodiment. Opcode format 398 corresponds with opcode formats 370 and 397 and comprises optional EVEX prefix bytes 396 (beginning with 62 hex in one embodiment) to replace most other commonly used legacy instruction prefix bytes and escape codes and to provide additional functionality. An instruction according to one embodiment may be encoded by one or more of fields 396 and 392. Up to four operand locations per instruction and a mask may be identified by field 396 in combination with source operand identifiers 374 and 375 and in combination with an optional scale-index-base (SIB) identifier 393, an optional displacement identifier 394, and an optional immediate byte 395. For one embodiment, EVEX prefix bytes 396 may be used to identify 32-bit or 64-bit source and destination operands and/or 128-bit, 256-bit or 512-bit SIMD register or memory operands. For one embodiment, the functionality provided by opcode format 398 may be redundant with opcode formats 370 or 397, whereas in other embodiments they are different. Opcode format 398 allows register to register, memory to register, register by memory, register by register, register by immediate, and register to memory addressing, with masks, specified in part by MOD field 373 and by optional SIB identifier 393, an optional displacement identifier 394, and an optional immediate byte 395. The general format of at least one instruction set (which corresponds generally with format 360 and/or format 370) is illustrated generically by the following:
evex1 RXBmmmmm WvvvLpp evex4 opcode modrm [sib] [disp] [imm]
For one embodiment, an instruction encoded according to the EVEX format 398 may have additional "payload" bits that may be used to provide base register swap status verification functionality with additional new features such as, for example, a user configurable mask register, or an additional operand, or selections from among 128-bit, 256-bit or 512-bit vector registers, or more registers from which to select, etc.
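Example instructions, some of which may be used to provide base register swap status verification functionality, are illustrated by way of example and not limitation in the following table: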
It will be appreciated that base register swap status verification instructions, as in the examples above, may be used to give an exception handler the ability to deduce, at execution time, whether or not a SwapGS instruction needs to be executed, without resorting to ad hoc methods such as complex and fragile "magic address checks," or setting the GS base register to a negative value in the kernel and disabling the user-space instructions used to set the FS base register and the GS base register. It will also be appreciated that base register swap status verification instructions, as in the examples above, may be used to avoid complexity in exception handler design, extra time-consuming checks, and unnecessary user restrictions.
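The benefit to an exception handler may be pictured with the toy model below. It is a sketch under stated assumptions and not the instruction semantics of any embodiment: the class, its `swapped` indicator, and the `swap_status` query are hypothetical stand-ins for a base register swap status verification instruction, and the addresses are arbitrary.

```python
class ToyCore:
    """Hypothetical model of a core with a GS base, a kernel GS base, and a swap status."""

    def __init__(self, user_gs: int, kernel_gs: int) -> None:
        self.gs_base = user_gs
        self.kernel_gs_base = kernel_gs
        self.swapped = False  # hypothetical indicator: True once the kernel base is active

    def swapgs(self) -> None:
        """Exchange the two base values and toggle the swap status."""
        self.gs_base, self.kernel_gs_base = self.kernel_gs_base, self.gs_base
        self.swapped = not self.swapped

    def swap_status(self) -> bool:
        """Stand-in for a swap status verification instruction."""
        return self.swapped


def exception_entry(core: ToyCore) -> bool:
    """Make the kernel GS base active; return True if the handler had to swap."""
    if core.swap_status():
        return False  # already swapped, e.g. an exception taken inside the kernel
    core.swapgs()
    return True
```

In this model the handler queries the status directly, so it needs neither an address comparison nor a convention about the sign of the GS base value to decide whether to swap.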
FIG. 4A is a block diagram illustrating an in-order pipeline and a register renaming stage, out-of-order issue/execution pipeline according to at least one embodiment of the invention. FIG. 4B is a block diagram illustrating an in-order architecture core and register renaming logic, out-of-order issue/execution logic to be included in a processor according to at least one embodiment of the invention. The solid lined boxes in FIG. 4A illustrate the in-order pipeline, while the dashed lined boxes illustrate the register renaming, out-of-order issue/execution pipeline. Similarly, the solid lined boxes in FIG. 4B illustrate the in-order architecture logic, while the dashed lined boxes illustrate the register renaming logic and out-of-order issue/execution logic.
In FIG. 4A, a processor pipeline 400 includes a fetch stage 402, a length decode stage 404, a decode stage 406, an allocation stage 408, a renaming stage 410, a scheduling (also known as a dispatch or issue) stage 412, a register read/memory read stage 414, an execute stage 416, a write back/memory write stage 418, an exception handling stage 422, and a commit stage 424.
In FIG. 4B, arrows denote a coupling between two or more units, and the direction of an arrow indicates the direction of data flow between those units. FIG. 4B shows a processor core 490 including a front end unit 430 coupled to an execution engine unit 450, both of which are coupled to a memory unit 470.
The core 490 may be a reduced instruction set computing (RISC) core, a complex instruction set computing (CISC) core, a very long instruction word (VLIW) core, or a hybrid or alternative core type. As yet another option, the core 490 may be a special-purpose core, such as, for example, a network or communication core, a compression engine, a graphics core, or the like.
The front end unit 430 includes a branch prediction unit 432 coupled to an instruction cache unit 434, which is coupled to an instruction translation lookaside buffer (TLB) 436, which is coupled to an instruction fetch unit 438, which is coupled to a decode unit 440. The decode unit, or decoder, may decode instructions and generate as an output one or more micro-operations, microcode entry points, microinstructions, other instructions, or other control signals, which are decoded from, or which otherwise reflect, or are derived from, the original instructions. The decoder may be implemented using various different mechanisms. Examples of suitable mechanisms include, but are not limited to, look-up tables, hardware implementations, programmable logic arrays (PLAs), microcode read only memories (ROMs), etc. The instruction cache unit 434 is further coupled to a level 2 (L2) cache unit 476 in the memory unit 470. The decode unit 440 is coupled to a rename/allocator unit 452 in the execution engine unit 450.
The execution engine unit 450 includes the rename/allocator unit 452 coupled to a retirement unit 454 and a set of one or more scheduler unit(s) 456. The scheduler unit(s) 456 represents any number of different schedulers, including reservation stations, a central instruction window, etc. The scheduler unit(s) 456 is coupled to the physical register file(s) unit(s) 458. Each of the physical register file(s) units 458 represents one or more physical register files, different ones of which store one or more different data types, such as scalar integer, scalar floating point, packed integer, packed floating point, vector integer, vector floating point, etc., and status (e.g., an instruction pointer that is the address of the next instruction to be executed). The physical register file(s) unit(s) 458 is overlapped by the retirement unit 454 to illustrate various ways in which register renaming and out-of-order execution may be implemented (e.g., using a reorder buffer and a retirement register file; using a future file, a history buffer, and a retirement register file; using a register map and a pool of registers; etc.). Generally, the architectural registers are visible from outside the processor or from a programmer's perspective. The registers are not limited to any known particular type of circuit. Various different types of registers are suitable as long as they are capable of storing and providing data as described herein. Examples of suitable registers include, but are not limited to, dedicated physical registers, dynamically allocated physical registers using register renaming, combinations of dedicated and dynamically allocated physical registers, etc. The retirement unit 454 and the physical register file(s) unit(s) 458 are coupled to the execution cluster(s) 460. The execution cluster(s) 460 includes a set of one or more execution units 462 and a set of one or more memory access units 464. The execution units 462 may perform various operations (e.g., shifts, addition, subtraction, multiplication) on various types of data (e.g., scalar floating point, packed integer, packed floating point, vector integer, vector floating point). While some embodiments may include a number of execution units dedicated to specific functions or sets of functions, other embodiments may include only one execution unit or multiple execution units that all perform all functions. The scheduler unit(s) 456, physical register file(s) unit(s) 458, and execution cluster(s) 460 are shown as being possibly plural because certain embodiments create separate pipelines for certain types of data/operations (e.g., a scalar integer pipeline, a scalar floating point/packed integer/packed floating point/vector integer/vector floating point pipeline, and/or a memory access pipeline that each have their own scheduler unit, physical register file(s) unit, and/or execution cluster; and, in the case of a separate memory access pipeline, certain embodiments are implemented in which only the execution cluster of this pipeline has the memory access unit(s) 464). It should also be understood that where separate pipelines are used, one or more of these pipelines may be out-of-order issue/execution and the rest in-order.
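One of the renaming options mentioned above, a register map together with a pool of registers, can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of units 452, 454, or 458: the class name, method names, and first-in-first-out free list are hypothetical.

```python
class RegisterRenamer:
    """Maps architectural register names onto a pool of physical registers."""

    def __init__(self, architectural_regs, num_physical):
        # Initially each architectural register owns one physical register.
        self.reg_map = {name: i for i, name in enumerate(architectural_regs)}
        self.free_pool = list(range(len(architectural_regs), num_physical))

    def rename(self, dest, sources):
        """Rename one instruction; returns (new_dest, physical_sources, old_dest)."""
        # Sources are read through the map before the destination is remapped.
        physical_sources = [self.reg_map[s] for s in sources]
        if not self.free_pool:
            raise RuntimeError("no free physical register; allocation must stall")
        new_dest = self.free_pool.pop(0)
        old_dest = self.reg_map[dest]
        self.reg_map[dest] = new_dest
        return new_dest, physical_sources, old_dest

    def retire(self, old_dest):
        """On retirement the previous mapping of the destination can be reused."""
        self.free_pool.append(old_dest)
```

Because each write to an architectural register receives a fresh physical register, two instructions that write the same architectural register no longer conflict, which is what allows them to issue and execute out of order.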
The set of memory access units 464 is coupled to the memory unit 470, which includes a data TLB unit 472 coupled to a data cache unit 474, which in turn is coupled to a level 2 (L2) cache unit 476. In one exemplary embodiment, the memory access units 464 may include a load unit, a store address unit, and a store data unit, each of which is coupled to the data TLB unit 472 in the memory unit 470. The L2 cache unit 476 is coupled to one or more other levels of cache and eventually to a main memory.
By way of example, the exemplary register renaming, out-of-order issue/execution core architecture may implement the pipeline 400 as follows: 1) the instruction fetch unit 438 performs the fetch and length decode stages 402 and 404; 2) the decode unit 440 performs the decode stage 406; 3) the rename/allocator unit 452 performs the allocation stage 408 and the renaming stage 410; 4) the scheduler unit(s) 456 performs the scheduling stage 412; 5) the physical register file(s) unit(s) 458 and the memory unit 470 perform the register read/memory read stage 414, and the execution cluster 460 performs the execute stage 416; 6) the memory unit 470 and the physical register file(s) unit(s) 458 perform the write back/memory write stage 418; 7) various units may be involved in the exception handling stage 422; and 8) the retirement unit 454 and the physical register file(s) unit(s) 458 perform the commit stage 424.
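The stage-to-unit assignment listed above can be written down as a small lookup table. The sketch below only restates the reference numerals from the text; the table name and the two helper functions are hypothetical, and the exception handling stage 422 is left with an empty tuple because the text says only that "various units" may be involved.

```python
# (stage number, stage name, units that perform it), in pipeline order.
PIPELINE_400 = [
    (402, "fetch", ("instruction fetch unit 438",)),
    (404, "length decode", ("instruction fetch unit 438",)),
    (406, "decode", ("decode unit 440",)),
    (408, "allocation", ("rename/allocator unit 452",)),
    (410, "renaming", ("rename/allocator unit 452",)),
    (412, "scheduling", ("scheduler unit 456",)),
    (414, "register read/memory read",
     ("physical register file unit 458", "memory unit 470")),
    (416, "execute", ("execution cluster 460",)),
    (418, "write back/memory write",
     ("memory unit 470", "physical register file unit 458")),
    (422, "exception handling", ()),  # "various units" in the text
    (424, "commit", ("retirement unit 454", "physical register file unit 458")),
]


def units_for(stage_number):
    """Return the units that perform the given pipeline stage."""
    for number, _name, units in PIPELINE_400:
        if number == stage_number:
            return units
    raise KeyError(stage_number)


def stages_for(unit):
    """Return the stage numbers in which the given unit takes part."""
    return [number for number, _name, units in PIPELINE_400 if unit in units]
```

Querying the table by unit shows, for example, that the physical register file unit takes part in the read, write back, and commit stages.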
The core 490 may support one or more instruction sets (e.g., the x86 instruction set (with some extensions that have been added with newer versions); the MIPS instruction set of MIPS Technologies of Sunnyvale, Calif.; the ARM instruction set (with optional additional extensions such as NEON) of ARM Holdings of Sunnyvale, Calif.).
It should be understood that the core may support multithreading (executing two or more parallel sets of operations or threads), and may do so in a variety of ways, including time sliced multithreading, simultaneous multithreading (where a single physical core provides a logical core for each of the threads that the physical core is simultaneously multithreading), or a combination thereof (e.g., time sliced fetching and decoding and simultaneous multithreading thereafter, such as in Intel® Hyperthreading technology).
While register renaming is described in the context of out-of-order execution, it should be understood that register renaming may be used in an in-order architecture. While the illustrated embodiment of the processor also includes separate instruction and data cache units 434/474 and a shared L2 cache unit 476, alternative embodiments may have a single internal cache for both instructions and data, such as, for example, a level 1 (L1) internal cache, or multiple levels of internal cache. In some embodiments, the system may include a combination of an internal cache and an external cache that is external to the core and/or the processor. Alternatively, all of the cache may be external to the core and/or the processor.
FIG. 5 is a block diagram of a single core processor and a multicore processor 500 with integrated memory controller and graphics according to embodiments of the invention. The solid lined boxes in FIG. 5 illustrate a processor 500 with a single core 502A, a system agent 510, and a set of one or more bus controller units 516, while the optional addition of the dashed lined boxes illustrates an alternative processor 500 with multiple cores 502A-N, a set of one or more integrated memory controller unit(s) 514 in the system agent unit 510, and an integrated graphics logic 508.
The memory hierarchy includes one or more levels of cache within the cores, a set of one or more shared cache units 506, and external memory (not shown) coupled to the set of integrated memory controller units 514. The set of shared cache units 506 may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, a last level cache (LLC), and/or combinations thereof. While in one embodiment a ring based interconnect unit 512 interconnects the integrated graphics logic 508, the set of shared cache units 506, and the system agent unit 510, alternative embodiments may use any number of well-known techniques for interconnecting such units.
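The order in which such a hierarchy is consulted can be illustrated with a minimal sketch. It assumes a simple inclusive fill policy (a value found at a lower level is copied into every level above it), which the text does not specify; the dictionaries standing in for caches and the function name `load` are likewise hypothetical.

```python
def load(address, cache_levels, external_memory):
    """Look up an address level by level; return (value, name of the level that hit).

    cache_levels is an ordered list of (name, dict) pairs, nearest level first.
    """
    for index, (name, cache) in enumerate(cache_levels):
        if address in cache:
            value, hit_level = cache[address], name
            break
    else:
        # Missed every cache level: fall through to external memory.
        value, hit_level, index = external_memory[address], "external memory", len(cache_levels)
    # Assumed fill policy: copy the value into every level above the hit.
    for _name, cache in cache_levels[:index]:
        cache[address] = value
    return value, hit_level
```

A first access that hits only in a shared cache fills the core-level cache, so a repeated access to the same address is then served by the nearer level.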
In some embodiments, one or more of the cores 502A-N are capable of multithreading. The system agent 510 includes those components coordinating and operating the cores 502A-N. The system agent unit 510 may include, for example, a power control unit (PCU) and a display unit. The PCU may be or include logic and components needed for regulating the power state of the cores 502A-N and the integrated graphics logic 508. The display unit is for driving one or more externally connected displays.
The cores 502A-N may be homogeneous or heterogeneous in terms of architecture and/or instruction set. For example, some of the cores 502A-N may be in-order while others are out-of-order. As another example, two or more of the cores 502A-N may be capable of executing the same instruction set, while others may be capable of executing only a subset of that instruction set or a different instruction set.
The processor may be a general-purpose processor, such as a Core™ i3, i5, i7, 2 Duo and Quad, Xeon™, Itanium™, XScale™ or StrongARM™ processor, which are available from Intel Corporation of Santa Clara, Calif. Alternatively, the processor may be from another company, such as ARM Holdings, MIPS, etc. The processor may be a special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, co-processor, embedded processor, or the like. The processor may be implemented on one or more chips. The processor 500 may be a part of and/or may be implemented on one or more substrates using any of a number of process technologies, such as, for example, BiCMOS, CMOS, or NMOS.
FIGS. 6-8 are exemplary systems suitable for including the processor 500, while FIG. 9 is an exemplary system on a chip (SoC) that may include one or more of the cores 502. Other system designs and configurations known in the art for laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network devices, network hubs, switches, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, microcontrollers, cell phones, portable media players, handheld devices, and various other electronic devices are also suitable. In general, a huge variety of systems or electronic devices capable of incorporating a processor and/or other execution logic as disclosed herein are generally suitable.
Referring now to FIG. 6, shown is a block diagram of a system 600 in accordance with one embodiment of the present invention. The system 600 may include one or more processors 610, 615, which are coupled to a graphics memory controller hub (GMCH) 620. The optional nature of the additional processors 615 is denoted in FIG. 6 with broken lines.
Each processor 610, 615 may be some version of the processor 500. However, it should be noted that it is unlikely that integrated graphics logic and integrated memory control units would exist in the processors 610, 615. FIG. 6 illustrates that the GMCH 620 may be coupled to a memory 640 that may be, for example, a dynamic random access memory (DRAM). The DRAM may, for at least one embodiment, be associated with a non-volatile cache.
The GMCH 620 may be a chipset, or a portion of a chipset. The GMCH 620 may communicate with the processors 610, 615 and control interaction between the processors 610, 615 and the memory 640. The GMCH 620 may also act as an accelerated bus interface between the processors 610, 615 and other elements of the system 600. For at least one embodiment, the GMCH 620 communicates with the processors 610, 615 via a multi-drop bus, such as a frontside bus (FSB) 695.
Furthermore, the GMCH 620 is coupled to a display 645 (such as a flat panel display). The GMCH 620 may include an integrated graphics accelerator. The GMCH 620 is further coupled to an input/output (I/O) controller hub (ICH) 650, which may be used to couple various peripheral devices to the system 600. Shown for example in the embodiment of FIG. 6 is an external graphics device 660, which may be a discrete graphics device coupled to the ICH 650, along with another peripheral device 670.
Alternatively, additional or different processors may also be present in the system 600. For example, the additional processors 615 may include additional processors that are the same as the processor 610, additional processors that are heterogeneous or asymmetric to the processor 610, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processor. There can be a variety of differences between the physical resources 610, 615 in terms of a spectrum of metrics of merit including architectural, micro-architectural, thermal, and power consumption characteristics, and the like. These differences may effectively manifest themselves as asymmetry and heterogeneity amongst the processors 610, 615. For at least one embodiment, the various processors 610, 615 may reside in the same die package.
Referring now to FIG. 7, shown is a block diagram of a second system 700 in accordance with an embodiment of the present invention. As shown in FIG. 7, the multiprocessor system 700 is a point-to-point interconnect system, and includes a first processor 770 and a second processor 780 coupled via a point-to-point interconnect 750. Each of the processors 770 and 780 may be some version of the processor 500, as may one or more of the processors 610, 615.
While shown with only two processors 770, 780, it is to be understood that the scope of the present invention is not so limited. In other embodiments, one or more additional processors may be present in a given processor.
The processors 770 and 780 are shown including integrated memory controller units 772 and 782, respectively. The processor 770 also includes, as part of its bus controller units, point-to-point (P-P) interfaces 776 and 778; similarly, the second processor 780 includes P-P interfaces 786 and 788. The processors 770, 780 may exchange information via a point-to-point (P-P) interface 750 using P-P interface circuits 778, 788. As shown in FIG. 7, the IMCs 772 and 782 couple the processors to respective memories, namely a memory 732 and a memory 734, which may be portions of main memory locally attached to the respective processors.
The processors 770, 780 may each exchange information with a chipset 790 via individual P-P interfaces 752, 754 using point-to-point interface circuits 776, 794, 786, 798. The chipset 790 may also exchange information with a high-performance graphics circuit 738 via a high-performance graphics interface 739.
A shared cache (not shown) may be included in either processor or outside of both processors, yet connected with the processors via the P-P interconnect, such that either or both processors' local cache information may be stored in the shared cache if a processor is placed into a low power mode.
The chipset 790 may be coupled to a first bus 716 via an interface 796. In one embodiment, the first bus 716 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or another third generation I/O interconnect bus, although the scope of the present invention is not so limited.
As shown in FIG. 7, various I/O devices 714 may be coupled to the first bus 716, along with a bus bridge 718 which couples the first bus 716 to a second bus 720. In one embodiment, the second bus 720 may be a low pin count (LPC) bus. Various devices may be coupled to the second bus 720 including, for example, a keyboard and/or mouse 722, communication devices 727, and a storage unit 728 such as a disk drive or other mass storage device, which may include instructions/code and data 730, in one embodiment. Further, an audio I/O 724 may be coupled to the second bus 720. Note that other architectures are possible. For example, instead of the point-to-point architecture of FIG. 7, a system may implement a multi-drop bus or other such architecture.
Referring now to FIG. 8, shown is a block diagram of a third system 800 in accordance with an embodiment of the present invention. Like elements in FIG. 7 and FIG. 8 bear like reference numerals, and certain aspects of FIG. 7 have been omitted from FIG. 8 in order to avoid obscuring other aspects of FIG. 8.
FIG. 8 illustrates that the processors 870, 880 may include integrated memory and I/O control logic ("CL") 872 and 882, respectively. For at least one embodiment, the CL 872, 882 may include integrated memory controller units such as those described above in connection with FIGS. 5 and 7. In addition, the CL 872, 882 may also include I/O control logic. FIG. 8 illustrates that not only are the memories 832, 834 coupled to the CL 872, 882, but also that I/O devices 814 are coupled to the control logic 872, 882. Legacy I/O devices 815 are coupled to the chipset 890.
Referring now to FIG. 9, shown is a block diagram of a SoC 900 in accordance with an embodiment of the present invention. Similar elements in FIG. 5 bear like reference numerals. Also, dashed lined boxes are optional features on more advanced SoCs. In FIG. 9, an interconnect unit 902 is coupled to: an application processor 910, which includes a set of one or more cores 502A-N and shared cache unit(s) 506; a system agent unit 510; a bus controller unit 516; an integrated memory controller unit 514; a set of one or more media processors 920, which may include integrated graphics logic 508, an image processor 924 for providing still and/or video camera functionality, an audio processor 926 for providing hardware audio acceleration, and a video processor 928 for providing video encode/decode acceleration; a static random access memory (SRAM) unit 930; a direct memory access (DMA) unit 932; and a display unit 940 for coupling to one or more external displays.
FIG. 10 illustrates a processor containing a central processing unit (CPU) and a graphics processing unit (GPU), which may perform at least one instruction according to one embodiment. In one embodiment, an instruction to perform operations according to at least one embodiment could be performed by the CPU. In another embodiment, the instruction could be performed by the GPU. In still another embodiment, the instruction may be performed through a combination of operations performed by the GPU and the CPU. For example, in one embodiment, an instruction in accordance with one embodiment may be received and decoded for execution on the GPU. However, one or more operations within the decoded instruction may be performed by a CPU and the result returned to the GPU for final retirement of the instruction. Conversely, in some embodiments, the CPU may act as the primary processor and the GPU as the co-processor.
In some embodiments, instructions that benefit from highly parallel, throughput processors may be performed by the GPU, while instructions that benefit from the performance of processors that benefit from deeply pipelined architectures may be performed by the CPU. For example, graphics, scientific applications, financial applications, and other parallel workloads may benefit from the performance of the GPU and be executed accordingly, whereas more sequential applications, such as operating system kernel or application code, may be better suited for the CPU.
In FIG. 10, the processor 1000 includes a CPU 1005, GPU 1010, image processor 1015, video processor 1020, USB controller 1025, UART controller 1030, SPI/SDIO controller 1035, display device 1040, High-Definition Multimedia Interface (HDMI) controller 1045, MIPI controller 1050, flash memory controller 1055, dual data rate (DDR) controller 1060, security engine 1065, and I2S/I2C (Integrated Interchip Sound/Inter-Integrated Circuit) interface 1070. Other logic and circuits may be included in the processor of FIG. 10, including more CPUs or GPUs and other peripheral interface controllers.
One or more aspects of at least one embodiment may be implemented by representative data stored on a machine-readable medium which represents various logic within the processor and which, when read by a machine, causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores," may be stored on a tangible, machine-readable medium ("tape") and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor. For example, IP cores, such as the Cortex™ family of processors developed by ARM Holdings and Loongson IP cores developed by the Institute of Computing Technology (ICT) of the Chinese Academy of Sciences, may be licensed or sold to various customers or licensees, such as Texas Instruments, Qualcomm, Apple, or Samsung, and implemented in processors produced by these customers or licensees.
FIG. 11 shows a block diagram illustrating the development of IP cores according to one embodiment. Storage 1130 includes simulation software 1120 and/or a hardware or software module 1110. In one embodiment, the data representing the IP core design can be provided to the storage 1130 via memory 1140 (e.g., a hard disk), a wired connection (e.g., the Internet) 1150, or a wireless connection 1160. The IP core information generated by the simulation tool and module can then be transmitted to a fabrication facility, where it can be fabricated by a third party to perform at least one instruction in accordance with at least one embodiment.
In some embodiments, one or more instructions may correspond to a first type or architecture (e.g., x86) and be translated or emulated on a processor of a different type or architecture (e.g., ARM). An instruction, according to one embodiment, may therefore be performed on any processor or processor type, including ARM, x86, MIPS, a GPU, or other processor type or architecture.
Figure 12 illustrates how an instruction of a first type is emulated by a processor of a different type, according to one embodiment. In Figure 12, program 1205 contains some instructions that may perform the same or substantially the same function as an instruction according to one embodiment. However, the instructions of program 1205 may be of a type and/or format that is different from or incompatible with processor 1215, meaning that instructions of the type in program 1205 may not be natively executable by processor 1215. With the help of emulation logic 1210, however, the instructions of program 1205 are translated into instructions that are natively executable by processor 1215. In one embodiment, the emulation logic is embodied in hardware. In another embodiment, the emulation logic is embodied in a tangible, machine-readable medium containing software to translate instructions of the type in program 1205 into the type natively executable by processor 1215. In other embodiments, the emulation logic is a combination of fixed-function or programmable hardware and a program stored on a tangible, machine-readable medium. In one embodiment, the processor contains the emulation logic, whereas in other embodiments the emulation logic exists outside of the processor and is provided by a third party. In one embodiment, the processor is capable of loading the emulation logic embodied in a tangible, machine-readable medium containing software by executing microcode or firmware contained in or associated with the processor.
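By way of illustration only, the translation step performed by emulation logic 1210 might be sketched in software as follows. The two instruction sets shown (a "foreign" set with `MOVI` and `ADD3`, and a "native" set with `load_imm`, `mov`, and `add`) are hypothetical and are not taken from this disclosure; they merely show one foreign instruction being expanded into a sequence of natively executable instructions.

```python
# Hypothetical instruction sets, invented for this sketch only.

def _load_imm(regs, dst, imm):
    regs[dst] = imm

def _mov(regs, dst, src):
    regs[dst] = regs[src]

def _add(regs, dst, src):
    regs[dst] = regs[dst] + regs[src]

# Operations the (hypothetical) native processor can execute directly.
NATIVE_OPS = {"load_imm": _load_imm, "mov": _mov, "add": _add}


def translate(foreign_insn):
    """Plays the role of the emulation logic: one foreign instruction in,
    a list of native instructions out."""
    op, *args = foreign_insn
    if op == "MOVI":                      # MOVI dst, imm
        return [("load_imm", *args)]
    if op == "ADD3":                      # ADD3 dst, a, b (no native equivalent)
        dst, a, b = args
        # A temporary register keeps the result correct even when dst == a or b.
        return [("mov", "tmp", a), ("add", "tmp", b), ("mov", dst, "tmp")]
    raise ValueError(f"unsupported foreign instruction: {op}")


def emulate(foreign_program):
    """Translate each foreign instruction and run the native result."""
    regs = {}
    for insn in foreign_program:
        for op, *args in translate(insn):
            NATIVE_OPS[op](regs, *args)
    return regs


regs = emulate([("MOVI", "r1", 2), ("MOVI", "r2", 40), ("ADD3", "r0", "r1", "r2")])
print(regs["r0"])
```

The same structure applies whether the translation is done ahead of time or instruction by instruction; the sketch translates lazily only to keep it short.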
Figure 13 is a block diagram contrasting the use of a software instruction converter to convert binary instructions in a source instruction set to binary instructions in a target instruction set, according to embodiments of the invention. In the illustrated embodiment, the instruction converter is a software instruction converter, although the instruction converter may alternatively be implemented in software, firmware, hardware, or various combinations thereof. Figure 13 shows that a program in a high-level language 1302 may be compiled using an x86 compiler 1304 to generate x86 binary code 1306 that may be natively executed by a processor with at least one x86 instruction set core 1316. The processor with at least one x86 instruction set core 1316 represents any processor that can perform substantially the same functions as an Intel processor with at least one x86 instruction set core by compatibly executing or otherwise processing (1) a substantial portion of the instruction set of the Intel x86 instruction set core or (2) object code versions of applications or other software targeted to run on an Intel processor with at least one x86 instruction set core, in order to achieve substantially the same result as an Intel processor with at least one x86 instruction set core. The x86 compiler 1304 represents a compiler operable to generate x86 binary code 1306 (e.g., object code) that can, with or without additional linkage processing, be executed on the processor with at least one x86 instruction set core 1316. Similarly, Figure 13 shows that the program in the high-level language 1302 may be compiled using an alternative instruction set compiler 1308 to generate alternative instruction set binary code 1310 that may be natively executed by a processor without at least one x86 instruction set core 1314 (e.g., a processor with cores that execute the MIPS instruction set of MIPS Technologies of Sunnyvale, California and/or the ARM instruction set of ARM Holdings of Sunnyvale, California). The instruction converter 1312 is used to convert the x86 binary code 1306 into code that may be natively executed by the processor without an x86 instruction set core 1314. This converted code is not likely to be the same as the alternative instruction set binary code 1310, because an instruction converter capable of this is difficult to make; however, the converted code will accomplish the general operation and be made up of instructions from the alternative instruction set. Thus, the instruction converter 1312 represents software, firmware, hardware, or a combination thereof that, through emulation, simulation, or any other process, allows a processor or other electronic device that does not have an x86 instruction set processor or core to execute the x86 binary code 1306.
In some embodiments, the high-level language 1302 may support the use of intrinsic functions, which are functions known to the x86 compiler 1304 that map directly to a sequence of one or more assembly language instructions, to provide base register swap status verification functionality via an opcode in the x86 binary code 1306. It will be appreciated that in some embodiments the alternative instruction set compiler 1308 and/or the instruction converter 1312 may recognize one or more intrinsic functions and/or one or more assembly language instructions, respectively, to provide base register swap status verification functionality, and may generate converted code that will accomplish the general operation and/or, through emulation, simulation, or any other process, allow a processor or other electronic device that does not have an x86 instruction set processor or core to execute the x86 binary code 1306 to provide base register swap status verification functionality.
Figure 14 illustrates an alternative embodiment of a system 1400 to translate instructions that provide base register swap status verification functionality. System 1400 is an example of mapping non-native x86 architectural state either to memory locations in a memory 1410 or to processor resources in a native processor 1414. The native processor 1414 includes a register file 1416 that contains the architected general registers of the native processor architecture. Any number of registers may be provided. In one embodiment of system 1400, all of the non-native x86 architectural state is mapped to the memory 1410. For example, descriptor tables 1432 (which may include a global descriptor table, a local descriptor table, and an interrupt descriptor table), page tables 1430 (which store virtual-to-physical address translations), task state segments 1428, general registers 1420 (not shown in the figure), segment registers 1426, control registers 1424, and other registers 1422 may represent non-native architectural state.
To access any of the non-native architectural state, a memory access may be performed. For example, if a non-native instruction has one of the general registers as an operand, the translator or the converted native instructions perform a memory access to the memory location mapped to that general register in order to access or update it. Registers in the register file 1416 may be used by the translator or converter 1412 as temporary registers to hold intermediate results; alternatively, a set of native registers in the register file 1416 may be mapped directly to the general registers of the non-native architectural state or used for other local translator/converter state. The general registers 1420 may include the 64-bit integer general registers of x86 (e.g., RAX, RBX, etc.) as well as the additional integer general registers defined using the REX prefix byte. The vector/FP registers 1418 may include the 80-bit floating point registers, the Streaming SIMD (single instruction, multiple data) Extensions (SSE) registers, and the additional SSE registers defined using the REX prefix byte. The segment registers 1426 may include storage locations corresponding to the six 16-bit x86 segment registers, which may also include the FS and GS segment registers as shown, and may optionally be mapped to memory 1410 and/or supported in the processor 1414. The control registers 1424 may include storage locations corresponding to the various control registers defined in the non-native x86 processor architecture. For example, the following control registers are shown: the extended feature enable register (EFER), which includes the LMA (long mode active) and LME (long mode enable) bits; CR0, which includes the PG (paging) and PE (protected mode enable) bits; the LDTR (local descriptor table register) and GDTR (global descriptor table register); and the CR3 register, which stores the base address of the page tables 1430. Other control registers (e.g., CR1, CR2, CR4, etc.) may likewise be included in the control registers 1424, which may optionally also include a control register to store a base register swap status field (e.g., a GS.base swap status and/or an FS.base swap status) as shown in Figure 16B. The other registers 1422 include any remaining x86 architected registers. For example, the EFLAGS register (which stores condition code information and optionally stores a base register swap status field, e.g., as shown in Figure 16B), the instruction pointer (IP) register (which stores the address of the instruction to be executed), and the model-specific registers (MSRs) may be included in the other registers 1422. One embodiment of the other registers 1422 includes a KernelGSBase MSR, a GS.base MSR, and an FS.base MSR, and may optionally include an MSR to store a base register swap status field (e.g., a GS.base swap status and/or an FS.base swap status) as shown in Figure 16B.
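As a minimal sketch of the memory-mapped approach described above, the following example keeps each non-native register in a fixed slot of a byte array standing in for memory 1410, so that every register operand of an emulated instruction becomes a memory access. The slot layout, the 8-byte slot size, and the names `MappedState` and `emulate_add` are assumptions made for this illustration only.

```python
MASK64 = (1 << 64) - 1
SLOT_BYTES = 8  # assumed: one 64-bit slot per non-native register

# Assumed layout: the offset of each non-native register within the mapped region.
STATE_SLOTS = {
    name: i * SLOT_BYTES
    for i, name in enumerate(
        ["RAX", "RBX", "RCX", "RDX", "FS.base", "GS.base", "KernelGSBase", "CR3"]
    )
}


class MappedState:
    """Non-native architectural state held entirely in memory."""

    def __init__(self):
        self.memory = bytearray(len(STATE_SLOTS) * SLOT_BYTES)

    def read(self, name):
        off = STATE_SLOTS[name]
        return int.from_bytes(self.memory[off:off + SLOT_BYTES], "little")

    def write(self, name, value):
        off = STATE_SLOTS[name]
        self.memory[off:off + SLOT_BYTES] = (value & MASK64).to_bytes(
            SLOT_BYTES, "little"
        )


def emulate_add(state, dst, src):
    """Emulate a non-native 64-bit 'ADD dst, src' with two loads and one store."""
    t0 = state.read(dst)   # t0 and t1 stand in for native temporary registers
    t1 = state.read(src)
    state.write(dst, t0 + t1)


state = MappedState()
state.write("RAX", 40)
state.write("RBX", 2)
emulate_add(state, "RAX", "RBX")
print(state.read("RAX"))
```

Mapping a subset of these registers directly onto native registers, as the text also contemplates, would replace the `read` and `write` calls for that subset with plain register operands.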
Figure 15A illustrates a flow diagram for one embodiment of a process 1501 to provide base register swap status verification functionality. Process 1501 and the other processes disclosed herein are performed by processing blocks that may comprise dedicated hardware, or software or firmware operation codes executable by general-purpose machines, by special-purpose machines, or by a combination of both. In one embodiment, a SwapGS instruction to swap the GS base register is decoded in processing block 1518. Then, in processing block 1520, a swap of a GS base register value and a kernel GS base register value is performed in response to the decoded SwapGS instruction. In processing block 1522, a determination is made whether the swap of the GS base register value and the kernel GS base register value completed successfully. If so, then in processing block 1524 a swap GS status is inverted in response to the determination that the swap completed successfully. Otherwise, processing ends in response to the failure to determine that the swap of the GS base register value and the kernel GS base register value completed successfully.
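The sequence of processing blocks 1520 through 1524 can be modeled in a few lines. This is a software sketch only, not the hardware implementation: the `Core` class, the field name `swap_gs_status`, and the `fault` parameter (a stand-in for any condition that prevents the swap from completing) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    gs_base: int                   # models the GS base register value
    kernel_gs_base: int            # models the kernel GS base register value
    swap_gs_status: bool = False   # hypothetical name for the swap GS status


def swapgs(core: Core, fault: bool = False) -> bool:
    """Model of blocks 1520-1524: swap, check completion, invert the status."""
    if fault:
        # Block 1522 determines the swap did not complete: processing ends
        # and the status is left as it was.
        return False
    core.gs_base, core.kernel_gs_base = core.kernel_gs_base, core.gs_base  # 1520
    core.swap_gs_status = not core.swap_gs_status                          # 1524
    return True


core = Core(gs_base=0x1000, kernel_gs_base=0xFFFF800000000000)
completed = swapgs(core)
print(completed, hex(core.gs_base), core.swap_gs_status)
```

Because the status changes only when the swap completes, software that later reads the status can rely on it reflecting which base value is currently in the GS base register.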
Figure 15B illustrates a flow diagram for an alternative embodiment of a process 1502 to provide base register swap status verification functionality on exception entry. In processing block 1510, application instructions are processed. In processing block 1512, a system call, interrupt, or exception occurs, requiring entry into a system handler process. In processing block 1514, the state of a swap GS status field is determined. Then, in processing block 1516, it is determined whether a SwapGS instruction is needed, e.g., according to the state of the swap GS status field. If a SwapGS instruction is needed, then a SwapGS instruction is decoded in processing block 1518. Then, in processing block 1520, a swap of a GS base register value and a kernel GS base register value is performed in response to the decoded SwapGS instruction. In processing block 1522, it is determined whether the swap of the GS base register value and the kernel GS base register value completed successfully. If so, a swap GS status is inverted in processing block 1524 in response to that determination. After processing block 1524, or if it was originally determined in processing block 1516 that no SwapGS instruction is needed, processing proceeds to processing block 1526, where system or system handler instructions are processed. In some embodiments, in response to a determination in processing block 1522 that the swap of the GS base register value and the kernel GS base register value did not complete successfully, processing may reiterate beginning in processing block 1514. Alternatively, in some embodiments, an interrupt or error handling routine may be invoked in response to such a determination.
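The entry path of process 1502 might look as follows when modeled in software. The sketch assumes, purely for illustration, that the status field is true while the kernel GS base is active, and that a failed swap is retried a bounded number of times before an error routine is invoked; the names `Cpu`, `swapped`, and `exception_entry` are hypothetical.

```python
class Cpu:
    def __init__(self, user_gs, kernel_gs):
        self.gs_base = user_gs            # application context is running
        self.kernel_gs_base = kernel_gs
        self.swapped = False              # assumed: True while kernel GS base is active

    def swapgs(self):
        """Blocks 1518-1524; this simplified model always completes."""
        self.gs_base, self.kernel_gs_base = self.kernel_gs_base, self.gs_base
        self.swapped = not self.swapped
        return True


def exception_entry(cpu, handler, max_attempts=3):
    """Blocks 1514-1526: swap only if the status says the swap is still needed."""
    for _ in range(max_attempts):
        if cpu.swapped:        # blocks 1514/1516: no SwapGS needed
            break
        if cpu.swapgs():       # blocks 1518-1524 completed
            break
    else:
        # One of the alternatives in the text: invoke an error handling routine.
        raise RuntimeError("SwapGS did not complete")
    return handler(cpu)        # block 1526: run the system handler


cpu = Cpu(user_gs=0x1000, kernel_gs=0xF000)
first = exception_entry(cpu, lambda c: c.gs_base)
nested = exception_entry(cpu, lambda c: c.gs_base)  # e.g. an exception inside the handler
print(hex(first), hex(nested))
```

The nested call illustrates the point of the status field: the second entry sees that the swap has already happened and does not swap back to the application value by mistake.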
Figure 15C illustrates a flow diagram for another alternative embodiment of a process 1503 to provide base register swap status verification functionality on exception exit. Beginning in processing block 1526, system or system handler instructions are processed. In processing block 1530, portions of an application state are restored for returning to an application process. In processing block 1534, the current state of the swap GS status field is determined, and processing proceeds to processing block 1536, where it is determined whether a SwapGS instruction is needed, e.g., according to the state of the swap GS status field. If so, a SwapGS instruction is decoded in processing block 1538. In processing block 1540, a swap of a GS base register value and a kernel GS base register value is performed in response to the decoded SwapGS instruction. In processing block 1542, it is determined whether the swap of the GS base register value and the kernel GS base register value completed successfully. If so, a swap GS status is inverted in processing block 1544 in response to that determination. In some embodiments, processing may reiterate beginning in processing block 1540 until the swap of the GS base register value and the kernel GS base register value completes successfully. Alternatively, in some embodiments, an interrupt or error handling routine may be invoked in response to a determination that the swap did not complete successfully. After processing block 1544, or if it was determined in processing block 1536 that no SwapGS instruction is needed, a return from the system call, interrupt, or exception takes place in processing block 1548. Then, in processing block 1550, processing of application instructions resumes.
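A corresponding sketch of the exit path of process 1503 is given below, including the retry loop beginning at block 1540. Two assumptions go beyond the text and are made only for illustration: the status field is true while the kernel GS base is active, and the handler knows whether it is returning to an application (the `returning_to_user` parameter), so that a return to interrupted system code leaves the kernel value in place. The `failures` counter simulates swaps that do not complete.

```python
class Cpu:
    def __init__(self, user_gs, kernel_gs, failures=0):
        # State as it would look inside a handler: kernel GS base is active.
        self.gs_base = kernel_gs
        self.kernel_gs_base = user_gs
        self.swapped = True        # assumed: True while kernel GS base is active
        self.failures = failures   # simulated number of swaps that will not complete

    def swapgs(self):
        """Blocks 1538-1544: the status is inverted only if the swap completes."""
        if self.failures > 0:
            self.failures -= 1
            return False
        self.gs_base, self.kernel_gs_base = self.kernel_gs_base, self.gs_base
        self.swapped = not self.swapped
        return True


def exception_exit(cpu, returning_to_user, max_attempts=3):
    """Blocks 1534-1548: swap back only when needed, retrying on failure."""
    if returning_to_user and cpu.swapped:      # blocks 1534/1536
        for _ in range(max_attempts):          # reiterate from block 1540
            if cpu.swapgs():
                break
        else:
            raise RuntimeError("SwapGS did not complete")
    return cpu.gs_base                         # value in effect at block 1548


cpu = Cpu(user_gs=0x1000, kernel_gs=0xF000, failures=1)
print(hex(exception_exit(cpu, returning_to_user=True)))
```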
It will be appreciated that although the examples above show a swap GS status being inverted, processes 1501-1503 may also be applied to other segment bases (e.g., the FS base), to combinations of segment bases (e.g., the FS base and the GS base), or to other segment attributes (e.g., an MSR for CS attributes). It will also be appreciated that a descriptor base and/or attribute register swap or modification status may be changed in ways other than inversion to indicate that the descriptor base and/or attribute register has been swapped or modified, without departing from the principles of the present disclosure or the scope of the accompanying claims.
Figure 16A illustrates elements of one embodiment of a processor micro-architecture 1601 to execute instructions that provide base register swap status verification functionality. Figure 16A is a block diagram illustrating an in-order pipeline processor core together with a register renaming stage and an out-of-order issue/execution pipeline, according to at least one embodiment of the invention. The solid-lined boxes in Figure 16A illustrate the in-order pipeline, while the dashed-lined boxes illustrate the register renaming, out-of-order issue/execution pipeline.
Processor micro-architecture 1601 includes a decode unit 1640 to decode at least a SwapGS instruction, an execution engine unit 1650, and a memory unit 1670. The decode unit 1640 is coupled to a rename/allocator unit 1652 in the execution engine unit 1650. The execution engine unit 1650 includes the rename/allocator unit 1652 coupled to a retirement unit 1654 and a set of one or more scheduler units 1656. The scheduler units 1656 represent any number of different schedulers, including reservation stations, a central instruction window, etc. The scheduler units 1656 may be coupled to the physical register files, which include vector physical registers 1684, mask physical registers 1682, floating point (FP) physical registers 1680, EFLAGS physical registers 1620, instruction pointer (IP) physical registers 1621, MSR physical registers 1622, segment physical registers 1626, control (CTL) physical registers 1624, and integer physical registers 1686. Each of the physical register files represents one or more physical register files, different ones of which store one or more different data types (e.g., scalar integer, scalar floating point, packed integer, packed floating point, vector integer, vector floating point), status (e.g., an instruction pointer that is the address of the next instruction to be executed), control, descriptor table registers, etc.
Some embodiments of the execution engine unit 1650 include a store data buffer 1699. A data element from a scalar register or from a descriptor register may be written into it for a store operation, or all of the data elements from a SIMD vector register for a scatter operation (or a streaming store under mask) may be written at the same time into multiple individual element storage locations of the store data buffer 1699 (e.g., using a single micro-operation). It will be appreciated that data elements stored in these individual storage locations of the store data buffer 1699 may then be forwarded to satisfy newer load operations without accessing external memory. Address generation logic 1694 generates an effective address 1606 from at least a base address 1604, an index 1605, and a displacement 1603 (e.g., supplied from the integer physical registers 1686 or as immediate data). Storage is allocated in the store data buffer 1699 to hold the data element corresponding to the generated effective address 1606 for storing to the corresponding memory location by the memory access unit 1664. The data element corresponding to the generated effective address 1606 is copied into the store data buffer 1699. The memory access unit 1664 is operatively coupled with the address generation logic 1694 to access, through the memory unit 1670, the memory location corresponding to an effective address 1606 generated by the address generation logic 1694 in response to a store instruction or a scatter instruction (in the case of a scatter instruction, for a corresponding mask 1608 element having an unmasked value), and to store a data element 1609 there. In one embodiment, data elements stored in the store data buffer 1699 may be accessed to satisfy newer load instructions out of sequential instruction order if their effective addresses 1606 correspond to the effective addresses of the newer load instructions.
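The address generation and store forwarding behavior described above can be illustrated with a small software model. The scale factor is the usual x86 addressing convention rather than something stated in this passage, and the class and function names are hypothetical; the model ignores element widths and partial overlaps.

```python
MASK64 = (1 << 64) - 1


def effective_address(base, index=0, scale=1, displacement=0):
    """Base + index * scale + displacement, truncated to 64 bits."""
    return (base + index * scale + displacement) & MASK64


class StoreDataBuffer:
    """Holds stored data elements until they are written to memory."""

    def __init__(self, memory):
        self.memory = memory      # dict: address -> value (stand-in for external memory)
        self.pending = {}         # address -> value, not yet written to memory

    def store(self, address, value):
        self.pending[address] = value

    def load(self, address):
        # A newer load whose effective address matches a buffered store is
        # satisfied from the buffer, without going to memory.
        if address in self.pending:
            return self.pending[address]
        return self.memory.get(address, 0)

    def drain(self):
        """Write all buffered elements to memory (done by the memory access unit)."""
        self.memory.update(self.pending)
        self.pending.clear()


def scatter(buf, base, indices, values, mask, scale=8):
    """Buffer only the elements whose mask element is unmasked (non-zero)."""
    for index, value, m in zip(indices, values, mask):
        if m:
            buf.store(effective_address(base, index, scale), value)


memory = {}
buf = StoreDataBuffer(memory)
scatter(buf, base=0x1000, indices=[0, 2, 5], values=[10, 20, 30], mask=[1, 0, 1])
print(buf.load(0x1000), buf.load(0x1028), len(memory))
```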
In some embodiments, the memory access unit 1664 is operatively coupled with a TLB 1672 of the memory unit 1670 to translate virtual addresses into physical addresses using page table information (e.g., from page tables 1430). In one embodiment of processor micro-architecture 1601, the x86 architectural state is supported in hardware by the processor micro-architecture 1601. For example, the descriptor tables 1432 (which may include a global descriptor table, a local descriptor table, and an interrupt descriptor table) are supported by registers, the page tables 1430 (which store virtual-to-physical address translations) are supported by a hardware page walker, the general registers 1420 are supported by the integer physical registers 1686, the segment registers 1426 are supported by the segment physical registers 1626, the control registers 1424 are supported by the control physical registers 1624, and the other registers 1422 are supported by the EFLAGS physical registers 1620, the IP physical registers 1621, the MSR physical registers 1622, and so on.
It will be appreciated that base register swap status verification, as in the embodiments described herein, may be used to give exception handlers the ability, for example, to infer at run time whether a SwapGS instruction needs to be executed without resorting to ad hoc methods such as a complicated and fragile "magic address check," or setting the GS base register to a negative value in the kernel and disabling the user-space instructions that set the FS base register and/or the GS base register.
Figure 16B illustrates detailed elements of one embodiment of a processor micro-architecture 1602 to execute instructions that provide base register swap status verification functionality. Processor micro-architecture 1602 includes integer physical registers 1686, address generation logic 1694, a store data buffer 1699, a memory access unit 1664, and so on, to perform functions similar to those described above with regard to Figure 16A. Embodiments of processor micro-architecture 1602 may also have an MSR physical register 1622 to store a first base address field (e.g., GS.base 1630) corresponding to a segment GS 1628 for a first execution context (e.g., an application). A second MSR physical register 1622 may store a second base address field (e.g., KernelGSBase 1632) corresponding to the segment GS 1628 for a second execution context (e.g., a system or interrupt handler). In one embodiment, a third register stores a base register swap status field 1615 for the segment GS 1628 corresponding to the first and second execution contexts. In alternative embodiments, the base register swap status field 1615 corresponds to the segment FS 1627, or to both GS 1628 and FS 1627. In some alternative embodiments, the third register storing the base register swap status field 1615 may be one of the control physical registers 1624 (e.g., CR 1625), one of the MSR physical registers 1622 (e.g., MSR 1635), a flags register (e.g., EFLAGS 1620 or a REX extended flags RFLAGS register), or some other register, e.g., an extended feature enable register (EFER).
Embodiments of the MSR physical registers 1622 may also store a 64-bit base address field (e.g., FS.base 1634 or GS.base 1630) corresponding to a segment (e.g., FS 1627 or GS 1628) for at least one execution mode (e.g., long mode). Embodiments of the MSR physical registers 1622 may further store a 32-bit base address field (e.g., base 1637 or base 1638) corresponding to a segment (e.g., FS 1627 or GS 1628) for at least one other execution mode (e.g., a compatibility mode). As shown in segments 1610, a segment in at least one execution mode (e.g., a compatibility mode) may use a 16-bit segment selector (e.g., FS 1627 or GS 1628) and a corresponding segment descriptor comprising a base address field (e.g., base 1637 or base 1638) and attribute fields (not shown in the figure). A segment in at least one other execution mode (e.g., a 64-bit mode), however, may use a 16-bit segment selector (e.g., FS 1627 or GS 1628) and a corresponding segment descriptor comprising only a 64-bit base address field (e.g., FS.base 1634 or GS.base 1630).
A decode unit (e.g., 1640) may decode a swap instruction (e.g., SwapGS), and an execution unit (e.g., 1650) performs a swap of the first MSR value and the second MSR value in response to the decoded swap instruction. If the swap of the first MSR value and the second MSR value completes successfully, a value of the base register swap status field 1615 is changed in response to a determination that the swap completed successfully. It will be appreciated that by providing a value in the base register swap status field 1615 to indicate whether the first MSR value and the second MSR value have been swapped, a system or interrupt handler context can infer at run time whether a SwapGS instruction needs to be executed.
It will be appreciated that base register swap status verification instructions may be used to avoid complexity, extra time-consuming checks, and unnecessary user restrictions in exception handler design. Other embodiments are also possible and contemplated.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of such implementation approaches. Embodiments of the invention may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and to generate output information. The output information may be applied to one or more output devices in known fashion. For purposes of this application, a processing system includes any system that has a processor, such as a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high-level procedural or object-oriented programming language to communicate with a processing system. The program code may also be implemented in assembly or machine language, if desired. In fact, the mechanisms described herein are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium that represent various logic within the processor and that, when read by a machine, cause the machine to fabricate logic to perform the techniques described herein. Such representations, known as "IP cores," may be stored on a tangible, machine-readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.
Such machine-readable storage media may include, without limitation, non-transitory, tangible arrangements of articles manufactured or formed by a machine or device, including storage media such as hard disks; any other type of disk, including floppy disks, optical disks, compact disc read-only memories (CD-ROMs), compact disc rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.
Accordingly, embodiments of the invention also include non-transitory, tangible machine-readable media containing instructions or containing design data, such as Hardware Description Language (HDL), which defines the structures, circuits, apparatuses, processors, and/or system features described herein. Such embodiments may also be referred to as program products.
In some cases, an instruction converter may be used to convert an instruction from a source instruction set to a target instruction set. For example, the instruction converter may translate (e.g., using static binary translation or dynamic binary translation including dynamic compilation), morph, emulate, or otherwise convert an instruction into one or more other instructions to be processed by the core. The instruction converter may be implemented in software, hardware, firmware, or a combination thereof. The instruction converter may be on the processor, off the processor, or partly on and partly off the processor.
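A software instruction converter of the simplest kind can be sketched as a table lookup in which each source instruction expands to one or more target instructions. The sketch below is illustrative only: the mnemonics in `TRANSLATION_TABLE` and the function name `convert` are invented for this example and do not correspond to any real instruction set or to the converter of any particular embodiment.

```python
from typing import Dict, List

# Hypothetical mapping from source-set mnemonics to target-set sequences.
TRANSLATION_TABLE: Dict[str, List[str]] = {
    "SRC_NOP": ["TGT_NOP"],
    "SRC_INC": ["TGT_LOAD_IMM 1", "TGT_ADD"],  # one source op becomes two target ops
}


def convert(source_program: List[str]) -> List[str]:
    """Statically translate a list of source instructions into target instructions."""
    target_program: List[str] = []
    for insn in source_program:
        if insn not in TRANSLATION_TABLE:
            raise ValueError(f"no translation for instruction: {insn}")
        # A source instruction may expand to one or more target instructions.
        target_program.extend(TRANSLATION_TABLE[insn])
    return target_program
```

This corresponds to the static case, where the whole program is converted before execution; a dynamic converter would apply the same kind of mapping to code regions as they are encountered at run time.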
Thus, techniques for performing one or more instructions according to at least one embodiment are disclosed. While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. In an area of technology such as this, where growth is fast and further advancements are not easily foreseen, the disclosed embodiments may be readily modifiable in arrangement and detail, as facilitated by enabling technological advancements, without departing from the principles of the present disclosure or the scope of the accompanying claims.
200‧‧‧Processor
201‧‧‧Front end
202‧‧‧Fast scheduler
203‧‧‧Out-of-order execution engine
204‧‧‧Slow/general floating point scheduler
206‧‧‧Simple floating point scheduler
208‧‧‧Integer register file/bypass network
210‧‧‧Floating point register file/bypass network
211‧‧‧Execution block
212‧‧‧Address generation unit (AGU)
214‧‧‧Address generation unit (AGU)
216‧‧‧Fast ALU
218‧‧‧Fast ALU
220‧‧‧Slow ALU
222‧‧‧Floating point ALU
224‧‧‧Floating point move unit
226‧‧‧Instruction prefetcher
228‧‧‧Instruction decoder
230‧‧‧Trace cache
232‧‧‧Microcode ROM
234‧‧‧Micro-op queue
Claims (27)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/138,054 US20150178078A1 (en) | 2013-12-21 | 2013-12-21 | Instructions and logic to provide base register swap status verification functionality |
Publications (1)
Publication Number | Publication Date |
---|---|
TW201738758A true TW201738758A (en) | 2017-11-01 |
Family
ID=53400111
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW106101885A TW201738758A (en) | 2013-12-21 | 2014-11-20 | Instructions and logic to provide base register swap status verification functionality |
TW103140213A TWI578159B (en) | 2013-12-21 | 2014-11-20 | Instructions and logic to provide base register swap status verification functionality |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW103140213A TWI578159B (en) | 2013-12-21 | 2014-11-20 | Instructions and logic to provide base register swap status verification functionality |
Country Status (2)
Country | Link |
---|---|
US (1) | US20150178078A1 (en) |
TW (2) | TW201738758A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI670721B (en) * | 2019-02-13 | 2019-09-01 | 睿寬智能科技有限公司 | Unusual power-off test method and device for storage device |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106940428B (en) * | 2016-01-04 | 2020-11-03 | 中兴通讯股份有限公司 | Chip verification method, device and system |
US20210026950A1 (en) * | 2016-03-07 | 2021-01-28 | Crowdstrike, Inc. | Hypervisor-based redirection of system calls and interrupt-based task offloading |
US10496311B2 (en) | 2017-01-19 | 2019-12-03 | International Business Machines Corporation | Run-time instrumentation of guarded storage event processing |
US10725685B2 (en) | 2017-01-19 | 2020-07-28 | International Business Machines Corporation | Load logical and shift guarded instruction |
US10452288B2 (en) | 2017-01-19 | 2019-10-22 | International Business Machines Corporation | Identifying processor attributes based on detecting a guarded storage event |
US10579377B2 (en) | 2017-01-19 | 2020-03-03 | International Business Machines Corporation | Guarded storage event handling during transactional execution |
US10732858B2 (en) * | 2017-01-19 | 2020-08-04 | International Business Machines Corporation | Loading and storing controls regulating the operation of a guarded storage facility |
US10496292B2 (en) | 2017-01-19 | 2019-12-03 | International Business Machines Corporation | Saving/restoring guarded storage controls in a virtualized environment |
US10908973B2 (en) * | 2017-07-31 | 2021-02-02 | Mitsubishi Electric Corporation | Information processing device |
CN109214149B (en) * | 2018-09-11 | 2020-04-21 | 中国人民解放军战略支援部队信息工程大学 | MIPS firmware base address automatic detection method |
CN115774574B (en) * | 2021-09-06 | 2024-06-04 | 华为技术有限公司 | Method and device for switching kernel of operating system |
US12014203B2 (en) * | 2021-11-23 | 2024-06-18 | VMware LLC | Communications across privilege domains within a central processing unit core |
US11714649B2 (en) * | 2021-11-29 | 2023-08-01 | Shandong Lingneng Electronic Technology Co., Ltd. | RISC-V-based 3D interconnected multi-core processor architecture and working method thereof |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6128727A (en) * | 1998-08-21 | 2000-10-03 | Advanced Micro Devices, Inc. | Self modifying code to test all possible addressing modes |
US6282633B1 (en) * | 1998-11-13 | 2001-08-28 | Tensilica, Inc. | High data density RISC processor |
EP1109096A3 (en) * | 1999-12-17 | 2004-02-11 | Fujitsu Limited | Processor and method of controlling the same |
US6901505B2 (en) * | 2001-08-09 | 2005-05-31 | Advanced Micro Devices, Inc. | Instruction causing swap of base address from segment register with address from another register |
US7490283B2 (en) * | 2004-05-13 | 2009-02-10 | Sandisk Corporation | Pipelined data relocation and improved chip architectures |
US8099448B2 (en) * | 2005-11-02 | 2012-01-17 | Qualcomm Incorporated | Arithmetic logic and shifting device for use in a processor |
US8296775B2 (en) * | 2007-01-31 | 2012-10-23 | Microsoft Corporation | Efficient context switching of virtual processors by managing physical register states in a virtualized environment |
US7752028B2 (en) * | 2007-07-26 | 2010-07-06 | Microsoft Corporation | Signed/unsigned integer guest compare instructions using unsigned host compare instructions for precise architecture emulation |
US8181003B2 (en) * | 2008-05-29 | 2012-05-15 | Axis Semiconductor, Inc. | Instruction set design, control and communication in programmable microprocessor cores and the like |
US9990201B2 (en) * | 2009-12-22 | 2018-06-05 | Intel Corporation | Multiplication instruction for which execution completes without writing a carry flag |
US8938606B2 (en) * | 2010-12-22 | 2015-01-20 | Intel Corporation | System, apparatus, and method for segment register read and write regardless of privilege level |
-
2013
- 2013-12-21 US US14/138,054 patent/US20150178078A1/en not_active Abandoned
-
2014
- 2014-11-20 TW TW106101885A patent/TW201738758A/en unknown
- 2014-11-20 TW TW103140213A patent/TWI578159B/en active
Also Published As
Publication number | Publication date |
---|---|
TW201531857A (en) | 2015-08-16 |
TWI578159B (en) | 2017-04-11 |
US20150178078A1 (en) | 2015-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI578159B (en) | Instructions and logic to provide base register swap status verification functionality | |
JP6207095B2 (en) | Instructions and logic to vectorize conditional loops | |
KR101555412B1 (en) | Instruction and logic to provide vector compress and rotate functionality | |
KR101842058B1 (en) | Instruction and logic to provide pushing buffer copy and store functionality | |
CN107092465B (en) | Instruction and logic for providing vector blending and permutation functions | |
JP6703707B2 (en) | Instructions and logic that provide atomic range operations | |
CN108369509B (en) | Instructions and logic for channel-based stride scatter operation | |
TWI659356B (en) | Instruction and logic to provide vector horizontal majority voting functionality | |
CN108292229B (en) | Instruction and logic for re-occurring neighbor aggregation | |
US20130339649A1 (en) | Single instruction multiple data (simd) reconfigurable vector register file and permutation unit | |
TWI697788B (en) | Methods, apparatus, instructions and logic to provide vector packed histogram functionality | |
JP2019050039A (en) | Methods, apparatus, instructions and logic to provide population count functionality for genome sequencing and alignment comparison | |
EP3391195A1 (en) | Instructions and logic for lane-based strided store operations | |
KR102472894B1 (en) | Methods, apparatus, instructions and logic to provide vector packed tuple cross-comparison functionality | |
TW201732564A (en) | Method and apparatus for user-level thread synchronization with a MONITOR and MWAIT architecture | |
EP2798454A1 (en) | Simd variable shift and rotate using control manipulation | |
US20170185402A1 (en) | Instructions and logic for bit field address and insertion | |
CN108292271B (en) | Instruction and logic for vector permutation | |
TWI729029B (en) | Instructions and logic for vector bit field compression and expansion | |
US10157063B2 (en) | Instruction and logic for optimization level aware branch prediction | |
TW201729081A (en) | Instructions and logic for vector-based bit manipulation | |
CN107408035B (en) | Apparatus and method for inter-strand communication |