TWI273485B - Pipeline microprocessor, apparatus, and method for generating early status flags - Google Patents
Pipeline microprocessor, apparatus, and method for generating early status flags Download PDFInfo
- Publication number
- TWI273485B TWI273485B TW93128090A TW93128090A TWI273485B TW I273485 B TWI273485 B TW I273485B TW 93128090 A TW93128090 A TW 93128090A TW 93128090 A TW93128090 A TW 93128090A TW I273485 B TWI273485 B TW I273485B
- Authority
- TW
- Taiwan
- Prior art keywords
- early
- instruction
- status flag
- flag
- microprocessor
- Prior art date
Links
Landscapes
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
1273485 九、發明說明: 【發明所屬之技術領域】 本發明係關於一種管線式微處理器,特別關於一種管 線式微處理器之早期指令執行。 【先前技術】 現今的微處理器一般係為管線式微處理器(Pipeline microprocessor)。即在微處理器不同的區塊中或是管線的 階層内,同時執行複數個指令。Hennessy和Patterson將 管線定義為“於一個執行動作中有複數個指令重疊執行 之一種執行技術。”他們對於管線式技術於Computer Architecture: A Quantitative Approach, 2nd edition, by John L· Hennessy and David A. Patterson, Morgan Kaufmann Publishers,San Francisco,CA,1966 —書中,提供以下說 明: 一管線就像一組裝線,以汽車組裝線為例:在汽車組 裝線中,包括了很多步驟,每一個步驟都對製造汽車提供 了一些貢獻,且雖然在不同的汽車上,但步驟間之運作上 都是平行的。在電腦管線中,管線中的每一個步驟係用以 完成指令的一部分,就像是組裝線,不同的步驟平行地完 成不同指令的不同部分。每一個步驟就稱為一管線階段 (pipe stage)或一管線區段(pipe segnient)。這些階段相互連 接以形成一管線,指令由管線之一端進入經過階段處理 後,由另一端出去,就像汽車在組裝線上完成一樣。 1273485 同步微處理11係依據時脈週期(elGek eyde)而運作。血 型的做法係於-時脈週期將1令從管線式微處理哭管 線之一階段膽線傳遞至下—個階段。於-汽車^i 中’若在組裝線的某—階段的卫作者因為沒有汽車可㈣1273485 IX. Description of the Invention: [Technical Field] The present invention relates to a pipeline microprocessor, and more particularly to an early instruction execution of a pipeline microprocessor. [Prior Art] Today's microprocessors are generally Pipeline microprocessors. That is, multiple instructions are executed simultaneously in different blocks of the microprocessor or in the hierarchy of the pipeline. Hennessy and Patterson define the pipeline as "an execution technique in which multiple instructions overlap in one execution." They use pipelined techniques in Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, CA, 1966 - The book provides the following instructions: A pipeline is like an assembly line, taking the car assembly line as an example: in the car assembly line, there are many steps, each step There are some contributions to the manufacture of cars, and although they are on different cars, the operations between the steps are parallel. In the computer pipeline, each step in the pipeline is used to complete a portion of the instruction, just like an assembly line, with different steps completing different parts of the different instructions in parallel. Each step is called a pipe stage or a pipe segnient. These stages are interconnected to form a pipeline that is routed from one end of the pipeline to the staged end and exited from the other end as if the car were completed on the assembly line. 1273485 Synchronous microprocessor 11 operates according to the clock cycle (elGek eyde). The blood type is based on a - clock cycle that transfers 1 from one stage of the pipelined micro-treatment crying line to the next stage. In the - car ^i 'if the guardian of the stage of the assembly line - because there is no car (4)
而閒置’則此組裝線的成果或產量會降低。同樣地,若I 微處理器、’階段在-時脈週期中因為沒有指令可執行而 閒置’則這個程序的成果會降低,—般的情形係稱為管線 氣泡(pipeline bubble)。 造成管線氣泡的一個潛在原因是因為分支指令 (branch instruction)。當分支指令發生時,處理器必須判斷 分支指令之目標位址’且開始在目標位址取出指令,而不 是取出分支指令的下-個連續位址。此外,若分支指令是 -條件分支指令(即-分支是否發生係取決於—明確條件 是否存在),處理器必須判斷分支指令是否發生,除此之 外,處理器還必須判斷目標位址。因為管線階段最後判斷 目標位址及/或分支結果(即分支是否發生)的時間,通常是 在此階段之後的許多階段取出指令,而產生氣泡。 針對這個問題,現代的微處理器通常利用分支預測機 制以在管線中提早預測目標位址與分支結果。而微處理器 設計者亦不停地努力設計具有較高準確性的分支預測 器。然而,分支錯誤預測將浪費許多時間。如上所述,錯 誤預測必須在分支預測階段的管線階段被察覺及修正。由 於與錯誤預測相關之損失係關係到在分支預測與分支錯 誤預測修正階段間許多管線階段的執行。因此,一種用來 1273485 修正在管線之前段所錯誤預測的條件分支指令之裝置和 方法實為當前所需。 此外,條件分支指令指示一分支條件,若成立,則指 示微處理器將分支條件轉移至分支目標位址;否則,微處 理器則繼續取出下一個連續位址的指令。微處理器包含儲 存微處理器狀態的狀態旗標,狀態旗標是被用來檢測以判 斷被條件分支指令所指示的條件是否成立。因此,最終為 了判斷一條件分支指令是否被錯誤預測,微處理器必須檢 測最新狀態旗標的狀態。然而在現今,狀態旗標會在管線 晚期才被檢測以判斷分支條件是否成立,以及分支預測是 否正確。因此,一種在管線早期產生狀態旗標的裝置與方 法實屬當前所需。 最後,狀態旗標的狀態通常是被先前的條件分支指令 指示之結果所影響。例如:條件可能被一種狀態旗標一進 位旗標(carry flag)所設定,其狀態可能被最新的加法指令 結果所決定。然而,影響狀態旗標的指令結果是於微處理 器較低管線階段的執行單元所產生。因此,一種能在管線 早期產生指令的裝置或方法實屬當前所需。 【發明内容】 有鑑於上述課題,本發明提供一種產生管線式微處理 器之早期狀態旗標的裝置及方法。 緣是,依本發明一實施例之產生早期狀態旗標之裝置 包括早期狀態旗標產生邏輯電路,其係用以接收一指令、 1273485 之'以及用以表示早期結果是否為有效 標。若指:二一期?標產生邏輯電路係產生早期狀態旗 被儲存二:、: =態旗標修正指令,則早期狀態旗標將 被產Γ目綱暫存器。若早期旗標係在早期結果之前 等待結構式狀二r二i後的條件指令所使用,用以在 線階段中的旗標修正指令係在早期旗標產生邏輯 掸被重ΐ雜之後,財騎齡被®新預估1早期旗 、重新預估則表示結構式㈣旗標已被更新。 、 侔期狀態㈣可能在管線式微處理时被用來執行 ㈣刀、支指令之早期修正。一分支預測器係預測管線中之 i遽指t之結果。此預測係隨著分支指令在管線中被 总:』刀支修正邏輯電路係於一管線階段被執行,此 隸錢期分支修正邏輯電財被執行之階段 分支指令抵達早期分支修正邏輯電路, ==早:分支指令邏輯電路會利用分支指; 查條件 生。若早输離^ 據支指令之前的指令而產 ’ 心、*係有效的且係錯誤預測的,則早期分 支修正邏輯電路會修正此錯誤_。 八± t鑑於Ί述課題’根據本發明之某些實施例,在條件 二1”之刖的指令之早期結果係由早期執行邏輯電路 早期執仃邏輯電路係於一管線階段被執行,且其 1273485 的處理括5¾、也H ,」,If idle, the results or output of this assembly line will decrease. Similarly, if the I microprocessor, the 'stage is in the -clock cycle, because there is no instruction executable and idle' then the result of this program will be reduced, the general situation is called the pipeline bubble. One potential cause of pipeline bubbles is due to branch instructions. When a branch instruction occurs, the processor must determine the target address of the branch instruction and start fetching the instruction at the destination address instead of fetching the next consecutive address of the branch instruction. In addition, if the branch instruction is a -conditional branch instruction (i.e., whether or not the branch occurs depends on whether the condition exists or not), the processor must determine whether the branch instruction has occurred. In addition, the processor must also determine the target address. Because the pipeline stage finally determines the time of the target address and/or the branch result (i.e., whether the branch occurs), it is usually the fetching of the instructions at many stages after this phase. In response to this problem, modern microprocessors typically utilize a branch prediction mechanism to predict target addresses and branch results early in the pipeline. Microprocessor designers are constantly working to design branch predictors with higher accuracy. However, branch mispredictions will waste a lot of time. As mentioned above, the error prediction must be detected and corrected during the pipeline phase of the branch prediction phase. The loss associated with mispredictions is related to the execution of many pipeline stages between the branch prediction and branch error prediction correction stages. Therefore, an apparatus and method for 1273485 to correct conditional branch instructions mispredicted in the early stages of the pipeline is currently required. In addition, the conditional branch instruction indicates a branch condition, and if so, the microprocessor is instructed to transfer the branch condition to the branch target address; otherwise, the microprocessor continues to fetch the instruction for the next consecutive address. The microprocessor contains a status flag that stores the state of the microprocessor, and the status flag is used to detect if the condition indicated by the conditional branch instruction is asserted. Therefore, in order to determine whether a conditional branch instruction is mispredicted, the microprocessor must check the status of the latest status flag. However, today, the status flag is detected at the end of the pipeline to determine if the branch condition is true and if the branch prediction is correct. Therefore, a device and method for generating a status flag early in the pipeline is currently required. Finally, the state of the status flag is usually affected by the result of the previous conditional branch instruction. For example, the condition may be set by a status flag, a carry flag, whose status may be determined by the most recent addition instruction result. However, the result of the instruction affecting the status flag is generated by the execution unit of the lower pipeline stage of the microprocessor. Therefore, a device or method that can generate instructions early in the pipeline is currently a requirement. SUMMARY OF THE INVENTION In view of the foregoing, the present invention provides an apparatus and method for generating an early state flag for a pipeline microprocessor. Accordingly, an apparatus for generating an early status flag in accordance with an embodiment of the present invention includes an early status flag generation logic circuit for receiving an instruction, 1273485' and to indicate whether an early result is a valid indicator. If you mean: Phase II? The flag generation logic generates an early state flag that is stored in the second:, : = state flag correction instruction, and the early status flag will be generated by the target register. If the early flag is used before the early results, the conditional instruction after the structural formula is used, the flag correction command used in the online phase is after the early flag generation logic is heavily noisy, The age of the new estimate 1 early flag, re-estimation indicates that the structural (four) flag has been updated. The flood season (4) may be used to perform (4) early correction of the knife and branch instructions during pipelined microprocessing. A branch predictor predicts the result of i遽 in the pipeline. This prediction is performed in the pipeline along with the branch instruction: the knives correction logic circuit is executed in a pipeline stage, and the branch period correction logic power is executed at the stage branch instruction to reach the early branch correction logic circuit, = Early: The branch instruction logic circuit will use the branch finger; The early branch correction logic will correct this error if the early, lost, and prior instructions are produced and the heart is valid and mispredicted. In view of the above-described subject matter, in accordance with certain embodiments of the present invention, the early results of the instructions after the condition two 1 are performed by the early execution logic circuit early execution logic circuit in a pipeline stage, and The processing of 1273485 includes 53⁄4, also H,",
亚不是被設計來執行該先前指令 二指令是一狀態旗標修J 二t執了單70早被執行,而執行單元係用以產生最終指令 及最終狀愁旗標值。早期執行邏輯電路是用以執行微 ^理器指令集之-指令子集中的多數指令。特別是,早期 :仃J輯電路係對應到管線之一位址階段且包含一位址 以’早期處理邏輯電路可増加對頻繁被執行之指令們The sub-instruction is not designed to execute the previous instruction. The second instruction is a status flag. The second unit 70 is executed early, and the execution unit is used to generate the final instruction and the final flag value. The early execution logic is used to execute most of the instructions in the subset of instructions in the processor instruction set. In particular, the early: 仃J circuit corresponds to one of the pipeline address stages and contains an address. The early processing logic can add to the frequently executed instructions.
丨热7L别扣令’則此早期結果是無效 狀態旗標修正指令,此早期狀態旗標 有鑑於上述課題,在本發明之某實施例中,早期結果 =儲存於_早期暫存檔巾,其暫存器係對應於微處理器之 。構式暫存樯,早期結果係與每—暫存器之有效指標一起 儲存於早期暫存财。早㈣存難供被當成運算元的早 ^結果至早期執行邏輯電路/位址產生器,以作為產生早期 、σ果之用。右早期暫存檔提供一無效的輸入運算元至早期 ,行邏輯電路,縣期結果之產生是無效的。又若指令是 -狀態旗標的修正指令,則早期狀態旗標是無效的。 以下將參照相關圖式,說明依本發明較佳實施例之產 生g線式微處理裔之早期狀態旗標的裝置及方法,其中相 11 1273485 同的元件係以相同的參照符號加以說明。 請參照圖1所示,依據本發明較佳實施例之一種管線 式微處理器100的方塊示意圖。管線式微處理器100包含 複數個階段在其管線中。圖丨係說明12種管線階段之管 線式微處理器。 管線式微處理器1〇〇包含一 I階段1〇2(指令取得階 段),其係用以取得指令,I階段1〇2包含一指令快取,係 用以快取程式指令,I階段102從指令快取或一連結管線 式微處理器100之系統記憶體取得程式指令,I階段1〇2 在一指令指標暫存器中之記憶位址取得指令,通常在指令 被取出之後指令指標會遞增,因此指令可以連續地被取 出。然而,指令指標值可能依據一分支指令轉變為一非連 續性記憶位址使得管線式微處理器100轉移至一分支目標 位址。I階段亦可包含一分支預測器132,其係預測一分支 指令是否出現於取得指令流之中、分支指令是否發生以及 當分支指令發生時決定分支指令的分支目標位址,因此, 分支預測器132預測當於管線式微處理器1〇〇最後被辨識 之分支指令,是否指示管線式微處理器1〇〇轉移至由分支 指令指示之一分支目標位址(發生),或是指示管線式微處 理器100去取得且執行在分支指令之後的下一個連續指令 (不發生),I階段102係接收第一與第二控制信號154及 156,如下所詳述,其係用以指示I階段102修正由分支預 測器132產生之一分支指令之一預測,於一實施例中,分 支預測态132包含一分支目標位址快取(branch target 12 1273485 address cache,BTAC),其係儲存先前被執行之分支指令之 位址及被辨識之分支目標位址。另外,分支目標位址快取 係儲存預測資訊係基於分支指令之歷史記錄,用於預測分 支指令是否將會發生。其餘實施例包括動態分支歷史記錄 表(dynamic branch history tables)、靜態分支預測器(static branch predictors)及混合靜態與動態分支預測器(hybrid static/dynamic branch predictors),此類技術係在分支預測 領域裡係眾所皆知的。於一實施例中,I階段1 〇2包含了 四個管線階段。 管線式微處理器100亦包括一 F階段1〇4(指令格式化 階段),其係與I階段102連結。於一實施例中,管線式微 處理100之指令集包括可變長度指令(variable iength instructions),例如x86結構式指令集,而不是固定長度指 令(fixed-length instructions),F 階段 104 包括一指令格式 器,係用以分析(parse) —指令位元組流之語法,且將指令 位元組流區分為不同的指令。尤其指令格式器決定在指令 位元組流裡的每一指令的長度與開始位置。於一實施例 中,F階段104包括用以儲存格式化指令之一指令佇列。 管線式微處理器100亦包括一 X階段1〇6(翻譯階 段),其係與F階段104連結。於一實施例中,管線式微處 理器100包括一簡單指令集(reduced instruction set computer,RISC)中心。簡單指令集係執行一自然指令集。 而自然指令(亦被稱為微指令)係比由I階段102所取得 之程式指令(巨集指令)較簡單且較有執行效率之指令。 13 !273485 舉例來說,x86結構式指令集包含巨集指令或是複雜指令 集(complex instruction set computer,CISC)指令。X 階段 l〇6包括一指令翻譯程式,用以將巨集指令翻譯成微指 令。指令翻譯程式係由F階段104之巨集指令佇列擷取格 式化巨集指令,以及將每一巨集指令翻譯成一或多個微指 令以提供給管線式微處理器100之其餘的階段,其通常係 指管線式微處理器100之執行階段。於一實施例中,X階 段106包括一微指令佇列以儲存被翻譯之微指令。於一實 施例中,X階段106包括兩個管線階段。 管線式微處理器100亦包括一 R階段1〇8(暫存階段) 及一 A階段112(位址產生階段),其中,r階段1〇8係與X 階段106連接,而A階段112係與R階段1〇8連接。R階 4又 108 包括一結構式暫存檔(architected register file,ARF) 134 及一早期暫存檔(early register file, ERF) 136。結構式 暫存槽134對應到複數個在管線式微處理器i〇〇執行程式 時可見的暫存器。尤其,一程式指令可指定結構式暫存檔 所對應之一暫存器為一來源運算元,藉以接收一輸入 仏號以運异產生一結果;相同地,程式指令可指示結構式 1存擒134之-對應的暫存器作為一目標運算元,使得指 7之結果可被寫入。指令可明確地或暗示地指示暫存器。 於貝靶例中,結構式暫存檔134對應到有χ86結構式暫 存才田之 ΕΑΧ、ΕΒΧ、ECX、EDX、EBP、ESI、EDI 以及 ESP暫存器(如圖3所示)。早期暫存檔136對應之暫存器, 1273485 其係對應於結構式暫存檔134中之每—暫存器(如圖 不)〇 叮 有鑒於將值儲存於早期暫存檔暫存器可能會導致总 線式微處理器⑽處於—不確定狀態,所以值係儲存於二 構式暫存槽134暫存器以反映管線式微處理器刚之使用 者可見狀S。目此,當管線賴處理_丨⑻之—階段產生 一指令結果以及指令指示將—結構式暫存槽134暫存器當 成1果目標’此結果係不允許寫人至結構式暫存槽134: 到指令不再是不確定的,即,直到指令係被確定是完整的 ^是撤㈣。如下所詳述,相較之下,—指令結果可能在 才曰令被確定是完整的之前,就被寫入至早期暫存檔136。 特別是藉由早期執行邏輯電路/位址產生器138所產生之 一指令結果(包含A階段112且如下所詳述)可能在指令 被確疋疋元整的之剷,就被寫入至早期暫存檔136。若管 線式微處理器100判斷指令本身及沒有其他在此指令之前 的指令之前係有能力產生一例外,則此指令係被確定是完 正的,以及所有在指令之前的分支指令最後將被消除,換 言之,管線式微處理器100最後會判斷每一個在指令之前 的分支指令(不論是否正確地發生),以及每一取得的分支 才曰々之刀支目標位址疋否係正確的。此外,有雲於早期暫 存檔136之值除了不確定之外還可以是有效或無效的,因 此結構式暫存檔134之值總是被確定為有效的,如下所詳 述。因此在早期暫存檔136之中的每一個暫存器亦包括一 相對應的有效位元218(如圖3所示),用以指示儲存於相對 1273485 應之暫存器的值是否有效。當管線式微處理器1〇〇被重新 設定,被初使化的早期暫存檔136是與被初始化的結構式 暫存檔134之值相同的。 管線式微處理器100管線之R階段108亦包含了一結The early result is an invalid state flag correction command. This early state flag is in view of the above problem. In an embodiment of the present invention, the early result is stored in the early temporary archive towel. The register corresponds to the microprocessor. After the construction is temporarily stored, the early results are stored in the early temporary savings together with the effective indicators of each temporary register. Early (four) is difficult to be used as the early result of the operand to the early execution logic circuit / address generator, in order to produce early, σ fruit. The right early temporary archive provides an invalid input operand to the early, row logic circuit, and the generation of the county result is invalid. If the instruction is a correction flag of the status flag, the early status flag is invalid. Hereinafter, an apparatus and method for generating an early state flag of a g-line type micro-processed person according to a preferred embodiment of the present invention will be described with reference to the accompanying drawings, wherein the same elements are denoted by the same reference numerals. Referring to Figure 1, a block diagram of a pipeline microprocessor 100 in accordance with a preferred embodiment of the present invention. The pipelined microprocessor 100 includes a plurality of stages in its pipeline. The diagram shows the pipeline microprocessors of the 12 pipeline stages. The pipelined microprocessor 1A includes an I stage 1〇2 (instruction fetch stage) for fetching instructions, and an I stage 1〇2 includes an instruction cache, which is used to cache program instructions, and the I stage 102 The instruction cache or a system memory of the pipelined microprocessor 100 acquires a program instruction, and the I stage 1〇2 acquires an instruction in a memory address in an instruction index register, and the instruction index is usually incremented after the instruction is fetched. Therefore the instructions can be taken out continuously. However, the instruction index value may be converted to a non-contiguous memory address in accordance with a branch instruction such that the pipelined microprocessor 100 is transferred to a branch target address. The I stage may also include a branch predictor 132, which predicts whether a branch instruction occurs in the fetch instruction stream, whether the branch instruction occurs, and determines the branch target address of the branch instruction when the branch instruction occurs. Therefore, the branch predictor 132 predicting whether the pipelined microprocessor 1 is finally recognized by the branch instruction, whether the pipelined microprocessor 1 is instructed to branch to a branch target address indicated by the branch instruction (occurring), or indicating a pipelined microprocessor 100 to obtain and execute the next consecutive instruction (not occurring) after the branch instruction, the I stage 102 receives the first and second control signals 154 and 156, as detailed below, to indicate that the I stage 102 correction is The branch predictor 132 generates one of the branch instruction predictions. In one embodiment, the branch prediction state 132 includes a branch target 12 1273485 address cache (BTAC) that stores the previously executed branch. The address of the instruction and the identified branch target address. In addition, the branch target address cache store prediction information is based on the history of the branch instruction and is used to predict whether the branch instruction will occur. The remaining embodiments include dynamic branch history tables, static branch predictors, and hybrid static/dynamic branch predictors, which are in the field of branch prediction. The inside is well known. In one embodiment, Phase I 1 〇 2 contains four pipeline stages. The pipelined microprocessor 100 also includes an F stage 1〇4 (instruction formatting stage) that is coupled to the I stage 102. In one embodiment, the instruction set of pipelined microprocessing 100 includes variable iength instructions, such as x86 structured instruction sets, rather than fixed-length instructions, and F stage 104 includes an instruction format. The parser is used to parse the syntax of the instruction byte stream and to separate the instruction byte stream into different instructions. In particular, the instruction formatter determines the length and start position of each instruction in the instruction byte stream. In one embodiment, the F stage 104 includes an instruction queue for storing one of the formatting instructions. The pipelined microprocessor 100 also includes an X stage 1 〇 6 (translation stage) that is coupled to the F stage 104. In one embodiment, pipelined microprocessor 100 includes a reduced instruction set computer (RISC) center. A simple instruction set executes a natural instruction set. Natural instructions (also known as microinstructions) are simpler and more efficient instructions than program instructions (macro instructions) obtained by I stage 102. 13 !273485 For example, the x86 structured instruction set contains macro instructions or complex instruction set computer (CISC) instructions. The X stage l〇6 includes an instruction translation program for translating macro instructions into microinstructions. The instruction translation program retrieves the format macro instruction from the macro instruction queue of the F stage 104, and translates each macro instruction into one or more micro instructions to provide the remaining stages of the pipelined microprocessor 100. Generally referred to as the execution phase of the pipelined microprocessor 100. In one embodiment, the X stage 106 includes a microinstruction queue to store the translated microinstructions. In one embodiment, the X stage 106 includes two pipeline stages. The pipelined microprocessor 100 also includes an R phase 1〇8 (temporary phase) and an A phase 112 (address generation phase), wherein the r phase 1〇8 is connected to the X phase 106, and the A phase 112 is coupled to R stage 1〇8 connection. The R-order 4 and 108 include an architecturally registered register file (ARF) 134 and an early register file (ERF) 136. The structured scratchpad 134 corresponds to a plurality of registers visible when the pipelined microprocessor executes the program. In particular, a program instruction may specify that one of the scratchpads corresponding to the structured temporary archive is a source operand, thereby receiving an input nickname to generate a result by operation; and, similarly, the program instruction may indicate the structure 1 134. The corresponding register is used as a target operand, so that the result of the finger 7 can be written. The instructions may indicate the register explicitly or implicitly. In the Yubei target case, the structural temporary archive 134 corresponds to the χ86 structured temporary storage of the 才, ΕΒΧ, ECX, EDX, EBP, ESI, EDI, and ESP registers (as shown in Figure 3). The temporary temporary archive 136 corresponds to the scratchpad, 1273485 which corresponds to each of the temporary archives in the structured temporary archive 134 (as shown in the figure), in view of the fact that storing the value in the early temporary archive register may result in a total The line microprocessor (10) is in an indeterminate state, so the value is stored in the two-configuration temporary storage slot 134 register to reflect the user-visible S of the pipelined microprocessor. For this reason, when the pipeline processing _丨(8)-stage generates an instruction result and the instruction indicates that the structural temporary storage slot 134 register is regarded as a target, this result does not allow the writer to write to the structural temporary storage slot 134. : The instruction is no longer undefined, that is, until the instruction is determined to be complete ^ is removed (four). As detailed below, in contrast, the result of the instruction may be written to the early temporary archive 136 before the order is determined to be complete. In particular, the result of one of the instructions generated by the early execution logic/address generator 138 (including phase A 112 and detailed below) may be written to the early stage when the instruction is slashed. Temporary filing 136. If the pipelined microprocessor 100 determines that the instruction itself and other instructions prior to the instruction are capable of generating an exception, then the instruction is determined to be complete, and all branch instructions prior to the instruction are eventually eliminated. In other words, the pipelined microprocessor 100 will ultimately determine each branch instruction prior to the instruction (whether or not it occurs correctly), and whether each acquired branch is the correct target address. In addition, the value of the cloud in the early temporary archive 136 may be valid or invalid, except for the uncertainty, so the value of the structured temporary archive 134 is always determined to be valid, as detailed below. Therefore, each of the early temporary archives 136 also includes a corresponding valid bit 218 (shown in Figure 3) for indicating whether the value stored in the register corresponding to 1273485 is valid. When the pipelined microprocessor 1 is reset, the initial temporary archive 136 that was initialized is the same value as the initialized structured temporary archive 134. The R phase 108 of the pipelined microprocessor 100 pipeline also contains a knot
構式狀態暫存器(architected EFLAGS register) 162,以及 A 階段112包含了一早期狀態暫存器(early eflags reglSter)142。結構式狀態暫存器162及早期狀態暫存器i42 係配置狀態旗標以指出指令結果之屬性,像是結果是否為 零、是否產生進位或為負值。於一實施例中,每—狀態旗 標係由一位元表示。於一實施例中,結構式狀態暫存器^62 包括一 X 8 6結構式狀態暫存器,X 8 6結構式狀態暫存器含 有以下狀恝旗標:溢位旗標(〇verfl〇w flag,〇F)、正負旗標 (S1gn flag,SF)、零位旗標(zer。flag,ZF)、同位旗標加 flag’PF)以及進位旗標(carryflag CF),如目3所示。 式微處理If 1GG包括在指示條件碼之設定條件指令之;旨、 令。-條件碼係指示—或多個狀態旗標之—狀態。若曰% 旗標之現行狀態係等同於在條件碼中之狀態,則此= 真且管線式微處理器1GG執行-被條件指令所指示之’、、、 算’否則,被指示之運算Μ會被執行的。以 指令為一條件分支指令的例子,條件分支指令二 中若指令遇到㈣之情況,則此條件係一跳躍指〜、、、。構 件指令係指示一條件碼及一位移以計算一分支目;而條 址。- W之例子係-皿㈣⑽—㈣指令;^ 16 1273485 零條件’’為真),則管線式微處理器100轉移至被分支指令 (即條件分支指令發生)所指示之分支目標位址;然而,若 零j旗標(ZF)被設定(即,,非零,,條件為假),%管線式微處 理裔100取出在條件分支指令之後之連續指令。其他以條 件x86 |曰7的例子有SETcc、L〇〇pcc、及〔Μονά等 指令。 ,、、、"構式狀恶暫存器162包括在管線式微處理器100執 行的私式可看見的狀態旗標。尤其,—條件程式指令可依 據在結構式狀態暫存器162裡之狀態指標以指出一條件 碼。、早期㈣暫存11 142係包括之狀態旗標仙當於在結 構式狀暫存器162中之每-狀態旗標,如圖3所示。與 ,構式暫存檔134之間的關係及早期暫存稽136相似的 疋μ配置於結構式狀態暫存器162的值反應出管線式微處 理為1〇0使用者可見之狀態。配置於早期狀態暫存器142 之值可反映出一管線式微處理器100之不確定狀態。因 此,*官線式微處理器執行一修正一或多個狀態旗標之指 令時,狀態旗標在結構式狀態暫存器162中不會被更新, 直至私令不再為不確定的。反之,如下所詳述,狀態旗標 可於扣令確定完成之前,在早期狀態暫存器142中被更 新。尤其,在指令確定被完成之前,依據一被早期執行邏 輯電路/㈣處理|| 138所執行之指令,管線式微處理器 可在早期狀態暫存器I42中被更新,亦被稱為早期執 行單元146。此外,有鑒於早期狀態暫存器ι42值可能為 有效或無致的,結構式狀態暫存器162之值係永遠有效 17 Ϊ273485 ^如下之詳細討論。因此,如圖3所示,早期狀態暫存 ^42亦包括-有效位元246,以表示配置於早期狀態暫 存益142之值是否有效。早期狀態暫存器142技初始化 為相同於管線式微處理器·被重新設定時之 暫存器162被初始化之值。於圖3之實施例表示早期狀= ==器i42之單-有效位元246。然而其餘實施錄 ,考慮在早期狀態暫存器142中哪些有效值係為每一贿 態旗標而被維持的。 AP皆段112之早期執行賴電路/位址產生器138係產 =-早期結果及有效難,錢被提供經由—早期結果匯 =排152回到卩階段⑽’如下所詳述。早期執行邏輯電 ^位二止產生1丨38亦依據輸人運算元並為記憶體存取而 立址計生記憶位址,就如結構式暫存檔 供運异兀,早期暫存檔136及/或指令所提供之 ,异疋,如-位移或偏移量。記憶位址係可被指令暗示地 “。讀位址亦可由指令暗示地指示,> :堆疊 :-位置之位址;而上述動作乃是根據暗示地指:堆二 標暫存器(即ESP)或區塊指標暫存器(即 且曰 或彈跳指令巾。 (卩咖),如在一發送 管線式微處理器議亦包括一 j階段114 連結。J階…含早期分支修正邏二A 化攄早期刀支修正邏輯電路ΜΑ藉由第"控制信號W, ^據早期狀態暫存器142選擇性地修正分支預測,如下詳 18 1273485 管線式微處理器100亦包括 匕括一 D階段116,1An architected EFLAGS register 162, and an A phase 112 includes an early state register (early eflags reglSter) 142. The structured state register 162 and the early state register i42 configure a status flag to indicate the attributes of the result of the instruction, such as whether the result is zero, whether a carry occurs or a negative value. In one embodiment, each state flag is represented by a single bit. In one embodiment, the structured state register 62 includes an X 8 6 structured state register, and the X 8 6 structured state register contains the following flag: an overflow flag (〇verfl〇) w flag, 〇F), positive and negative flags (S1gn flag, SF), zero flag (zer.flag, ZF), parity flag plus flag 'PF), and carry flag CF (carryflag CF), as shown in item 3 Show. The micro-processing If 1GG is included in the instruction condition setting instruction condition code; - The condition code indicates - or status of multiple status flags. If the current state of the 曰% flag is equivalent to the state in the condition code, then this = true and the pipelined microprocessor 1GG executes - as indicated by the conditional instruction, ',, 'calculate' otherwise, the indicated operation will be Being executed. In the case where the instruction is a conditional branch instruction, if the instruction encounters (4) in the conditional branch instruction 2, then the condition is a jump pointer 〜, ,,. The component command indicates a condition code and a displacement to calculate a branch; and the address. - W example - dish (four) (10) - (four) instruction; ^ 16 1273485 zero condition ''true", the pipelined microprocessor 100 is transferred to the branch target address indicated by the branch instruction (ie, the conditional branch instruction occurs); however If the zero j flag (ZF) is set (ie, non-zero, the condition is false), the % pipelined microprocessor 100 takes the consecutive instructions following the conditional branch instruction. Other examples of the condition x86 |曰7 are SETcc, L〇〇pcc, and [Μονά]. The , , , " configuration cache register 162 includes a privately visible status flag that is executed by the pipelined microprocessor 100. In particular, the conditional program instructions can be based on a status indicator in the structured state register 162 to indicate a condition code. The early (four) temporary storage 11 142 is included in the state flag in the structural state register 162, as shown in FIG. The relationship between the configuration and the temporary archive 134 is similar to that of the early temporary storage 136. The value of the configuration in the structured state register 162 reflects the state in which the pipelined microprocessing is visible to the user of 1〇0. The value configured in the early state register 142 may reflect the indeterminate state of a pipelined microprocessor 100. Thus, when the *offline microprocessor executes an instruction to correct one or more status flags, the status flag is not updated in the structured status register 162 until the private order is no longer indeterminate. Conversely, as detailed below, the status flag can be updated in the early status register 142 before the deduction is completed. In particular, the pipelined microprocessor can be updated in the early state register I42, also referred to as the early execution unit, in accordance with an instruction executed by the early execution logic/(4) processing|| 138 before the instruction determination is completed. 146. In addition, the value of the structured state register 162 is always valid in view of the fact that the early state register ι42 value may be valid or not. 17 Ϊ 273485 ^ as discussed in detail below. Thus, as shown in Figure 3, the early state temporary storage ^42 also includes a valid bit 246 to indicate whether the value of the early state temporary benefit 142 is valid. The early state register 142 is initialized to the same value as the register 162 that was initialized when the pipelined microprocessor was reset. The embodiment of FIG. 3 represents the single-effective bit 246 of the early state === the device i42. However, the rest of the implementation records consider which of the valid values in the early state register 142 are maintained for each bribe flag. The early execution of the AP segment 112 relies on the circuit/address generator 138 system. = - Early results and effective difficulties, money is provided via the early results sink = row 152 back to the 卩 phase (10)' as detailed below. The early execution of the logic circuit 2 generates 1丨38, which is also based on the input operator and addresses the memory address for memory access, such as the structured temporary archive delivery, early temporary archive 136 and/or instructions. What is provided, such as - displacement or offset. The memory address can be implicitly indicated by the instruction ". The read address can also be implicitly indicated by the instruction, >: the address of the stack: - location; and the above actions are implicitly referred to as: the heap binary register (ie ESP) or block indicator register (ie, and 弹 or bounce command towel. (卩咖), such as a send pipeline microprocessor also includes a j-stage 114 link. J-stage... with early branch correction logic II The early knives correction logic circuit 选择性 selectively corrects the branch prediction according to the early state register 142 by the "control signal W, as follows. 18 1273485 The pipelined microprocessor 100 also includes a D stage 116. ,1
階段114連接,一 G階段118係蛊彳、係與J 、” D『白段116連接,以;5 一 Η階段122係與G階段118遠杻^ 逆楼U及 逆接。D階段116、 118與Η階段122包含一資料儲存 快取資料。於-實施例中,資料儲 ^統記憶體中 ί含三個管線階段及需要三時脈週期之存取時储 存區係由早純行邏輯電路/位址產生器138產 址而被取出。執行記龍之資料载人運算的指令係於㈣ 段116、G階段118及Η階段122中執行。 管線式微處理器100亦包括一 Ε階段124(執行階段), 其係與η階段m連接。Ε階段m包含執行單元146以 執行,令運算。執行單元146可能包含電路,例如加法器、 ί去裔、乘法器、除法器、位移器、旋轉器、執行布林運 ^之邏輯電路、執行超越及對數功能之邏輯電路等,以產 生指令之最終結果。於一實施例中,執行單元146包含一 ι數單元、一浮點數單元、一 ΜΜΧ單元及一 SSE單元。 才曰々結果藉由執行單元146來產生成為永遠正確之指令結 果。執行單元146接收之輸入運算元或來源運算元永遠係 $效的。執行單元146輸入運算元之來源包含結構式暫存 祂134、結構式狀態暫存器162、資料存取之運算元、指 7中指示之直接或固定運算元以及來自其他管線所發送 之運算元。特別的是,執行單元146不接收來自早期暫存 檔136或早期狀態暫存器142之不明確的運算元。 I273485 /管線式微處理器100亦包含一 s階段126(儲存階段), 二係與E階段m連接。s階段⑶執行儲存運算以儲存 貝料至資料儲存區及/或系統記憶體,如由執行單元146所 產生之指令結果。另外,s階段126包括晚期分支修正邏 輯電路148,依據結構式狀態暫存器、162並藉由第二控制 信號156來修正分支預測,如下所詳述。 管線式微處理器1〇〇亦包括一 w階段128(結果寫回 階段),其係與s階段126連接。w階段128係藉由結果 匯流排158來寫入指令結果至結構式暫存檔134及結構式 狀態暫存器162以更新管線式微處理器1〇〇之結構式狀 態。 於一實施例中,管線式微處理器100可以是單一指令 派送微處理器、純量微處理器、或單一執行微處理器。即, 自指令派送處之管線式微處理器100的每一時脈週期所派 送之一指令,或指令產生階段(I階段102通過X階段106) 至指令執行階段(R階段108通過W階段128),與超純量 微處理器相比較,它可在每一時脈週期派送一或多個執行 指令。然而,在這裡描述之方法及裝置並不限於一純量微 處理器。於一實施例中,管線式微處理器100包含一依次 派送微處理器。即,指令係於程式中被指示之次序被派送 以執行,不像一些微處理器具有不依次序派送指令206的 能力。 請參照圖2所示,依本發明之一方塊示意圖以詳細說 明圖1之管線式微處理器100之R階段108、A階段丨12、 1273485 及J階段114。如圖1所示,R階段1〇8包括結構式暫存 祂134、早期暫存檔136、及結構式狀態暫存器162。結構 式狀態暫存器162係藉由結果匯流排158被w階段128 所更新。R階段108係接收來自χ階段1〇6之指令2〇6。 除了本身之指令位元組外,指令2〇6可包括解碼資訊。指 々206私示一指令之型態,如—加法或分支等。指令206 亦可指不一條件碼。指令2〇6亦可藉由一標籤指示一目標 運算元位置。尤其,目標運算元可指示其中—位於結構式 暫存彳田134之暫存器以作為—指令206結果之目的位置。 圖b 1,結構式暫存檔134透過結果匯流排158接收來自w 階段12 8之指令結果。一目標運算元標籤2 7 8係被提供以 作為一選擇器,以透過匯流排輪入至結構式暫存檔134, 來選擇某個將被與來自結果Μ流排158之指令結果一同更 新之暫存器。如圖丨所示,早期暫存檔136藉由早期結果 匯流排152接收來自A階段112之早期結果242。指令2〇6 包括目"^運算元標籤216,其係被提供當作一選擇器作 為輸入早期暫存檔136以選擇某一將與早期結果242 一同 被更新之暫存器。被提供至早期暫存檔136之目標運算元 標籤216係從將指令2〇6傳送至a階段112之一管線暫存 器232而來的。 指令206亦可經由來源運算元標籤214以指示一或多 個來源運算元。於一實施例中,指令206可指示三個來源 運异元。指令206係經由來源運算元標籤214以指示暫存 器來源運算元。來源運算元標籤214被當作一選擇器以輸 21 1273485 入至結構式暫存檔134及早期暫存檔136,用以選擇哪一 個暫存器將被提供為來源運算元至指令2〇6。指令2〇6可 能亦指示一直接/固定(例如取代或抵銷)運算元222。 亦如圖3所示,R階段108亦包括用於每一早期暫存 檔136暫存器之有效位元218。早期暫存檔有效位元2以 ,由A階段112經由早期結果匯流排接收一早期結果有效 L號244。早期結果有效信號244係用以更新對應於藉由 目標運算元標籤216所選擇之早期暫存檔136暫存器之 效位元218。 一 R階段108亦包括-多工器226,其係選擇來源運算 元以使指令206進入R階段。多工器226係接收由結構= ,存彳曰134及早期暫存檔136輸入之來源運算元。於一實 ^例中’結構式暫存檔134包括三個讀取#,它們的兩二 輪出係用以作為多工器226之輸入以於每一時脈週期脖 兩個來源運算元。於—實施例中,早期暫存槽136包含兩 固頃取琿’它們的輸出係用以作為多工器226之輸入以於 每一時脈週期提供㈣來源運算元。多卫器挪亦接收 直接/固^運算元222’其可能包括於指令寫之中。多工 器226亦由a階段112接收早期結果242。 另外,多工器226接收與每一運算元輸入相關之一有 效位元輸人。有效位元與由早期暫存财效位元218所提 供之早期暫存檔!36所接收之運算元相關。由讀段ιΐ2 所接收的早期結果242運算元相_有效 有效信號冰。㈣構式暫存㈣錢直細^算 22 1273485 所提供之運算元之有效位_、、土、 134及直接/固定222運曾水遠為真,即結構式暫存檔 所選擇的來源運算元及係永遠有效的。藉由多工器226 -來源運算元管:暫存器有效位元’係被分別提供至 236,用以供應至讀段8及—有效位元管線暫存器 管線暫存器238係用以。於一實施例中,來源運算元 暫存器係用以儲存:個—個來源運异兀,且有效位元 於-實ml 的有效位元。 ^ 1如例中,管綠4Phase 114 is connected, a G phase 118 system is connected, and a system is connected to J, "D" white segment 116; 5 a phase 122 system and a G phase 118 are far away ^ reverse building U and reverse connection. D phase 116, 118 And the data phase cache 122 includes a data storage cache data. In the embodiment, the data storage system memory has three pipeline stages and requires three clock cycles to access the memory area by the early pure logic circuit. The address generator 138 is taken out of the address and executed. The instructions for performing the data maneuver calculation are recorded in (4) segment 116, G phase 118, and Η phase 122. The pipelined microprocessor 100 also includes a stage 124 ( The execution phase) is connected to the η phase m. The m phase m includes an execution unit 146 to perform operations, and the execution unit 146 may include circuitry such as adders, Descendants, multipliers, dividers, shifters, rotations. The logic circuit of the Boolean operation, the logic circuit for performing the transcendental and logarithmic functions, etc., to generate the final result of the instruction. In an embodiment, the execution unit 146 includes an integer unit, a floating point unit, and a ΜΜΧ unit and an SSE unit. The result is an instruction result that is always correct by the execution unit 146. The input operand or source operand received by the execution unit 146 is always valid. The execution unit 146 inputs the source of the operand including the structural temporary storage 134, The structured state register 162, the data access operand, the direct or fixed operand indicated in the finger 7, and the operands sent from other pipelines. In particular, the execution unit 146 does not receive the early temporary archive 136 or The ambiguous operand of the early state register 142. The I273485/pipelined microprocessor 100 also includes an s phase 126 (storage phase), the second phase is connected to the E phase m. The s phase (3) performs a storage operation to store the material to The data storage area and/or system memory, such as the result of the instructions generated by execution unit 146. Additionally, stage 126 includes late branch correction logic 148, according to structured state register, 162 and by second control signal 156 to correct the branch prediction, as detailed below. The pipelined microprocessor 1〇〇 also includes a w phase 128 (result write back phase), which is linked to the s phase 126 The w stage 128 is used to write the result of the instruction to the structured temporary archive 134 and the structured state register 162 by the result bus 158 to update the structural state of the pipelined microprocessor 1 。. In an embodiment, The pipelined microprocessor 100 can be a single instruction dispatch microprocessor, a scalar microprocessor, or a single execution microprocessor. That is, one instruction is dispatched per clock cycle of the pipelined microprocessor 100 from the instruction dispatch, Or the instruction generation phase (I phase 102 through X phase 106) to the instruction execution phase (R phase 108 through W phase 128), which can deliver one or more executions per clock cycle compared to a super-scalar microprocessor instruction. However, the methods and apparatus described herein are not limited to a scalar microprocessor. In one embodiment, the pipelined microprocessor 100 includes a sequential dispatch microprocessor. That is, the instructions are dispatched for execution in the order indicated in the program, unlike some microprocessors having the ability to not dispatch instructions 206 sequentially. Referring to FIG. 2, a R-phase 108, an A-stage 丨12, 1273485, and a J-stage 114 of the pipelined microprocessor 100 of FIG. 1 are illustrated in detail in accordance with a block diagram of the present invention. As shown in FIG. 1, R stage 1 〇 8 includes a structured temporary storage 134, an early temporary archive 136, and a structured state register 162. The structured state register 162 is updated by the w stage 128 by the result bus 158. The R stage 108 receives the command 2〇6 from the stage 1〇6. In addition to its own instruction byte, instruction 2〇6 may include decoding information. The index 206 privately indicates the type of an instruction, such as - addition or branching. The instruction 206 can also refer to a different condition code. The instruction 2〇6 can also indicate a target operand position by a tag. In particular, the target operand may indicate the location in which the temporary register of the structured temporary storage 134 is used as the result of the instruction 206. Figure b1. The structured temporary archive 134 receives the result of the instruction from the w stage 12 8 through the result bus 158. A target operand tag 2 7 8 is provided as a selector to route through the bus to the structured temporary archive 134 to select a temporary update to be updated with the result of the command from the result bus 158. Save. As shown in the figure, the early temporary archive 136 receives the early results 242 from the A stage 112 by the early results bus 152. Instruction 2〇6 includes a "^ operand tag 216, which is provided as a selector as input early temporary archive 136 to select a register to be updated with earlier results 242. The target operand tag 216 provided to the early temporary archive 136 is transferred from the instruction 2〇6 to the pipeline register 232 of the a-stage 112. The instructions 206 can also be via the source operand tag 214 to indicate one or more source operands. In one embodiment, the instructions 206 may indicate three source transport elements. The instruction 206 is via the source operand tag 214 to indicate the scratchpad source operand. The source operand tag 214 is treated as a selector to input 21 1273485 into the structured temporary archive 134 and the early temporary archive 136 for selecting which register will be provided as the source operand to the instruction 2〇6. Instruction 2〇6 may also indicate a direct/fixed (e.g., replace or offset) operand 222. As also shown in FIG. 3, R stage 108 also includes a valid bit 218 for each early temporary file 136 register. The early temporary valid bit 2 is received by the A phase 112 via the early result bus to receive an early result valid L number 244. The early result valid signal 244 is used to update the effect bit 218 corresponding to the early temporary archive 136 register selected by the target operand tag 216. An R stage 108 also includes a multiplexer 226 that selects the source operand to cause the instruction 206 to enter the R stage. The multiplexer 226 receives the source operands input by the structure = , memory 134 and early temporary archive 136. In a practical example, the structured temporary archive 134 includes three read #, and their two rounds are used as inputs to the multiplexer 226 to neck two source operands per clock cycle. In the embodiment, the early temporary storage slot 136 includes two solids 珲' their outputs are used as inputs to the multiplexer 226 to provide (d) source operands per clock cycle. The multi-guard is also received by the direct/solid operator 222' which may be included in the instruction write. The multiplexer 226 also receives the early results 242 from the a stage 112. Additionally, multiplexer 226 receives one of the valid bit inputs associated with each operand input. The effective bit and the early temporary archive provided by the early temporary savings bit 218! 36 received operand related. The early result 242 operand phase _ effective valid signal ice received by the read ιΐ2. (4) Construction temporary storage (4) Money is straightforward ^ Calculation 22 1273485 The effective bits of the operation elements provided by _,, soil, 134 and direct/fixed 222 are Zeng Shuiyuan true, that is, the structural source and system selected by the structural temporary archive Always effective. The multiplexer 226 - source arithmetic unit: register valid bits are provided to 236 for supply to the read 8 and the valid bit pipeline register pipeline register 238 is used. . In one embodiment, the source operand register is used to store: one source and one source, and the effective bits are valid bits of the real ml. ^ 1 As in the example, tube green 4
排(圖中未顯示),盆将脾:娀處理器100亦包括轉遞匯流 階段124及S階段126行單元146所產生之結果,由E 轉遞匯流排係被提供,至R階段108以供應運算元。 户人#千一十、 為夕工器226的輸入。若R階段108 Λ源運异兀,其標籤係與由A階段112直到Η 1¾半又122中之一日沾栌敛 的知戴不匹配,但其係與E階段124或 二二目的標鐵匹配,接著’早期旗標產生/控 田璉’、路212控制多卫器、226以選擇轉遞匯流排以提供 取新結果作為運算元。若一運算元係在轉遞匯流排上,其 係永遠有效的。 官線式微處理器1〇〇亦包括早期旗標產生/控制邏輯 電路212。早期旗標產生/控制邏輯電路212係接收結構式 狀態暫存器162作為輸入。早期旗標產生/控制邏輯電路 212亦接收在R階段1〇8以及a階段112中之指令206作 為輸入。早期旗標產生/控制邏輯電路212亦接收多工器 226的輸出作為輸入,即R階段1〇8來源運算元以及關聯 式有效位元。早期旗標產生/控制邏輯電路212亦接收管線 23 1273485 暫存器232、有效位元管線暫存器236及來 — 暫存器238之輸出作為輸入,即化段112;旨令之 =及關聯式有效位元。早期旗標產生/控制邏輯電路212 早期緒果—242作為輸入。早期旗標產生/控制邏輯電 路12亦接收由母-管線階段之下的a階段ii2之— 修正指令存在信號2G2作為輸人。旗標修正指令存在^ -真值係指示對應階段具有修正結構 二之-指令,且結構式狀態暫存請尚她 ” 修正指令存在信號搬係早期 =電…接收新的分支修正晚期: 輸入。在新的分支修正晚期信號268中之一直 ” 圖1所示之新的晚期分支修正邏輯電路148修正 :預:二其係暗示管線式微處理器1〇〇已清除。如圖5; 速’新的分支修正晚期信號268亦用以判 ==被恢復及確認。早期旗標產生/控二 =2亦接收存在於切段112之下的每—管線階段的指 1目的標戴204作為輸入。目的標鐵綱 ㈣請之中由指令206指示之_來源運算元是否= 結構式暫存檔134所提供,早期暫存檔136或A階段112 早期結果242係參照圖4且如下所述。於一實施例中,指 令篇:指不一記憶體來源運算元,即一運算元之位置係 位:斤,的。通常記憶運算元係存在於管線 式你i:處理斋1 〇〇 f線之柄u比以 吕踝之低P白奴,例如可能是有資料的情 24 1273485 況,那是具有目的記憶位址與R階段1〇8指令記憔 一 位址相匹配的一先前的儲存指令的主題。雖然未^示异= 亦接收記憶運算元作為輸入。早期旗標產:控多 璉輯電路212亦接收依記憶運算元信號266作為輪入二」 係指不由指令所指示之記憶體來源運算元是否疒 其 ,以提供至多工器226。記憶運算元信號加‘二: 疋否必須關閉R階段108指令206以及早期結果2 斲 有效,如圖4所描述。 ϋ 2是否 依據其輸入,早期旗標產生/控制邏輯電路212 同的控制信號。早期旗標產生/控制邏輯電路212產生不 ,器控制信號282,其係控制多工器226以適當的== 々206的來源運算元。早期旗標產生/控制邏輯電路2 =钿 二生早期結果有效信號244。早期旗標產生/控制邏輯:: 亦產生用以儲存於早期狀態暫存器142中之一 態暫存器1 262,及用以更新在早期狀態有效暫 之值之一控制信號264。在早期狀態有效暫存器中之值4曰6 ,當微處理器重新設文時被初始化成為—有效值。早期= ^暫存器142及早期狀態有效暫存器246作為管線暫存器 提供早斯狀態暫存器及有效位元至j階段114。早期旗標 ,生/控制邏輯電路212亦產生一暫停信號228,其係提供 g線暫存器232、234、有效位元管線暫存器236及來源運 算元管線暫存器238關閉R階段1〇8。於一實施例中,管 線暫存器232、234、236及238包括多路傳輸暫存器,其 係用以於暫停信號228為真時保持它們目前的狀態直到下 25 1273485 一個時脈週期。暫停信號228之操作以下將參照圖4以詳 細說明。 A階段112包括早期執行邏輯電路/位址產生器138, 其係由來源運算元管線暫存器238接收來源運算元,且依 據來源運算元管線暫存器238而產生早期結果242。早期 執行邏輯電路/位址產生器138包含算數邏輯電路272、布 林運算邏輯電路274及位移邏輯電路276。早期執行邏輯 電路/位址產生器138係用以產生實際的位址給記憶體操 作。早期執行邏輯電路/位址產生器138亦用以執行管線式 微處理器100之指令集之指令所需要之運算之一子集。於 是,早期執行邏輯電路/位址產生器138係可由如圖1所示 之執行單元146執行運算之一子集。早期執行邏輯電路/ 位址產生器138係用以執行運算之一子集,其係最一般執 行的運算。於一實施例中,最通常執行的運算在實質上亦 與最快速執行的運算重疊(即需要一相對的短時間去執 行,因此他們可以在一單一時脈週期被執行)且需要相對少 量的硬體,特別是在硬體以外已經要求產生記憶位址。於 一實施例中,算數邏輯電路272係用以執行一加法、減法、 增量及減量;然而,算數邏輯電路272不是用以執行進位 的加法或借位的減法。於一實施例中,布林運算邏輯電路 274係用以執行一布林及(AND)、或(OR)、互斥或(XOR)、 反及(NAND)、具正負延伸之移動及具零位延伸之移動; 然而,布林運算邏輯電路274不是用以執行一位元組替 換。於一實施例中,位移邏輯電路276係用以執行一左位 26 1273485 移或右位移;然而,位移邏輯電路276不是用以執行旋轉 或旋轉進位運算。雖然具體實施例被描述,但早期執行邏 輯電路/位址產生器138裡進行運算的具體子集,本發明不 侷限於特別的實施例,並且熟習微處理器設計之技術者, 可能容易體會早期執行邏輯電路/位址產生器138可基於 管線式微處理器100之特定指令集用以執行一運算之特定 子集以及目標執行及電路實際終點。由早期執行邏輯電路 /位址產生器138所產生之早期結果242係被提供至一早期 結果管線暫存器254以儲存且隨後供應至j階段114。 J階段包括如圖1所示之早期分支修正邏輯電路144。 早期分支修正邏輯電路144係接收早期狀態暫存器142、 早期狀悲有效暫存器246及早期結果管線暫存器254之輸 出。早期分支修正邏輯電路144亦由一營線暫存器248接 收J階段114中之指令用以由管線暫存器232以管線傳送 指令206。管線式微處理器1〇〇亦包含一分支預測發生信 號208,其係由I階段1〇2、F階段1〇4及χ階段1〇6而通 過於R階段108之管線暫存器234及於A階段112之管線 暫存裔252,由管線傳輸而來,且被提供至早期分支修正 远輯電路144。於分支預測發生信號2〇8上之一真值係指 不於對應階段中之指令係—分支指令,其係藉由圖i之分 支預測為132而預测發生,即,管線式微處理器1〇〇轉移 之前係依據分支預測器132所作之預測。 依據其輸入,早期分支修正邏輯電路144係產生第一 控制^说154,其係用以提供圖1之I階段102。第一控制 27 1273485 信號154係包括一早期分支修正信號258,其係用以提供 至圖1之晚期分支修正邏輯電路148。當早期分支修正邏 輯電路144修正一分支預測,則早期分支修正信號258係 為真。有關早期分支修正邏輯電路144之運作請參照圖6 並如下詳細說明。 請參照圖4所示,係依本發明圖2之產生早期結果及 早期狀態暫存器之一裝置運作流程圖。圖4之流程圖包括 兩個圖面,分別為圖4A及圖4B,且流程開始於方塊402。 於方塊402中,一指令206係指示一來源運算元由結 構式暫存檔134抵達R階段108。指令係指示一或多個來 源暫存運算元經由來源運算元標籤214。流程繼續至判斷 方塊404。 於判斷方塊404中,早期旗標產生/控制邏輯電路212 會檢測指令型態且判斷不論指令是何種型態皆必須於A階 段112執行。於一實施例中,當必須被提供至D階段116 中之資料快取之指令需要一記憶體位址之產生,則指令必 須於A階段112中執行。若指令必須於A階段112中執行, 則流程繼續至方塊406,否則流程繼續至方塊412。 於判斷方塊406中,早期旗標產生/控制邏輯電路212 係判斷所有的由指令所指示之來源運算元是否存在且係 有效的。若指令指示一記憶運算元,早期旗標產生/控制邏 輯電路212會檢測記憶運算元存在信號266以判斷被指示 之記憶運算元是否存在且係有效的。若指令係指示一直接 運算元,直接運算元係永遠存在且有效的。若指令指示一 28 1273485 暫存運算元,早期旗標產生/控制邏輯電路212會比較由低 階管線來的目的標籤204及來源運算元標籤214以判斷管 線式微處理器100中之舊的指令產生一結果,其係預定1 用以藉由來源運算元標籤214指示結構式暫存檔134暫存 器。如果是如此,結果係存在於早期暫存檔之中,無論何 種情況下,早期產生/控制邏輯電路212會為被指示的運算 元檢測有效位元218以判斷運算元是否有效。如果在管線 式被處理斋1 〇〇裡的一舊的指令沒有產生一個結果,其係 預疋被用以藉由來源運算元標籤214指示結構式暫存檔 134暫存器,然後,運算元係存在且有效的,因為其將會 由結構式暫存檔134被提供。若由指令所指示的所有的來 源運异几係存在且有效的,則流程繼續至判斷方塊412 ; 否則流程繼續至方塊4〇8。 於方塊408中,早期旗標產生/控制邏輯電路212係於 暫停信號228上產生-真值,用以在#前的時脈週期中等 =由心體到達或被寫回結構式暫存冑134或經由轉遞匯 後=所獲彳于之來源運算元,暫停在r階段中之指令。 <程繼續由方塊408回到方塊4〇6。 ;判斷方塊412巾,早期旗標產生/控制邏輯電路212 ς:來源運算元標籤214與由較低管_^^ 是作比較,以判斷管線式微處理器1〇〇中之一舊的指令 2U ^生一結果,其係預定被用以藉由來源運算元標籤 定曰=結構式暫存檔134暫存11。#此,賴結果不一 $效的,但結果會存在於早期暫存槽136之中。若於 29 1273485 管線式微處理器100中之一舊的指令沒有產生―、社果 係被預定用以藉由來源運算元標籤214指示結構式暫 134暫存器,流程繼續至方塊416 ;否則流程繼於 子才田 414。 、、王万塊 於方塊414中,早期旗標產生/控制邏輯電路 z 12會於 控制信號上產生一值,以使多工器226去選摆兹山+ 、 μ 一 伟错由來源運 异元標籤214所指示之暫存器來源運算元,其係由 存播136所提供。若早期結果242於同一時脈週期之a匕 段112中被產生,則指令到達R階段·ι〇8且需I^ 而要來源運曾 元,然後多工器226將會選擇早期結果242輪入以將來: 運算元提供給指令。多工器226可能會由早期暫存伊〜 選擇多重暫存運算元給一指令,其係指示多重暫存運# 元。流程繼續至方塊418。 # 於方塊416中,早期旗標產生/控制邏輯電路212會於 選擇态控制k號282上產生一值,以使多工器226去選^ 由來源運异元標籤214所指示之暫存器來源運算元,其係 由結構式暫存檔134所提供。另外,早期旗標產生/控制邏 輯電路212會控制多工器226去選擇由指令所指示之非暫 存運算元,若有的話,例如直接/固定運算元222或有效轉 遞運算元。流程繼續至方塊418。 於方塊418中,指令繼續至A階段112,於a階段中, 早期執行邏輯電路/位址產生器138會由多工器226所選擇 之來源運算元而產生早期結果242。特別的是,適當的算 術邏輯電路272、布林運算邏輯電路274或位移邏輯電路 30 1273485 276之其中之一會依據其上之指令型態而產生早期結果 242。流程繼續至判斷方塊422。 於判斷方塊422中,早期旗標產生/控制邏輯電路212 會檢測指令之目標運算元標籤,以判斷早期結果242是否 是預定要給結構式暫存檔中之一暫存器的。如果是,流程 繼續至判斷方塊424 ;否則,流程繼續至方塊434。 於判斷方塊424中,早期旗標產生/控制邏輯電路212 會檢測指令型態以判斷指令是否為可由早期執行邏輯電 路/位址產生器138所執行之型態。即假設來源運算元係有 效的,則早期旗標產生/控制邏輯電路212會判斷指令是否 於指令子集之範圍内,以使早期執行邏輯電路/位址產生器 138產生正確的早期結果。如果是,流程繼續至判斷方塊 428,否則流程繼績至方塊426。 於方塊426中,早期旗標產生/控制邏輯電路212係於 早期結果有效信號244產生假值且藉由含假值之目標運算 元標籤216自早期執行邏輯電路/位址產生器138沒有產生 一有效早期結果242之指令型態以更新符合於早期暫存 檔136暫存器之有效位元218。流程繼續至判斷方塊434。 於判斷方塊428中,自從早期執行邏輯電路/位址產生 器138產生一修正早期結果242,早期旗標產生/控制邏輯 電路212判斷所有用於產生早期結果242之運算元是否是 有效的。其係指示若早期旗標產生/控制邏輯電路212於判 斷方塊404判斷指令不需於a階段被執行,則指令係因缺 少一運算元而沒有被儲存於R階段1〇8。因此,就算當由 31 1273485 早期暫存播136而來之一暫存運算元係無效的,指令係不 :被储存於R階段108。相同地,當記憶運算元尚未由記 ^體被載入’指令係不會被儲存於R階段108。相同地, 备早期執行邏輯電路/位址產生器138沒有產生一有效早 期而求之指令型態,指令係不會被儲存於r階段1〇8。相 反的,早期結果242於方塊420中被標記為無效的,且修 =結果係於E階段124之執行單元146被估算。相較之下, 田運异元尚未獲得、或是無效的,則必須於A階段112中 被執行之扣令(例如用以計算一位址之一指令於d階段 中使用資料快取)儲存於R階段⑽巾。當所有被用以產 生早期結果242之運算元係有效的,流程繼續之方塊432; 否則流程繼續至方塊426。 於方塊432中,早期旗標產生/控制邏輯電路212於早 期^果有效信號244上產生-真值,且藉由含真值之目標 運算元標籤216由方塊418所產生之—有效早期結果指示 以更新對應於早期暫存檔136暫存器之有效位元218。同 時地,藉由目標運算元標藏216所指示之早期暫存檔136 暫存器係由有效早㈣果242較新。流程_至判斷方 塊 434 〇 於判斷方塊物中,早期旗標產生/控制邏輯電路川 會檢測指令以判斷指令是否為—修正結構式狀態暫存器 162之型態。於-實施财,修正狀態旗標之指令係依& x86結構式指令集而被指示。當指令修正狀態暫存器⑹ 之内容,則流程繼續至方塊436;否則流程結束。° 32 1273485 於方塊436中,早期旗標產生/控制邏輯電路212係產 生早期狀態暫存器值262,其係依據由早期執行邏輯電路/ 位址產生器138所產生之早期結果242,及依據指令206, 且由早期狀態暫存器值262以更新早期狀態暫存器142。 於一實施例中,當由早期執行邏輯電路/位址產生器138所 執行之有號整數二的補數運算以產生早期結果242而導致 一溢位情況(即早期結果太大或太小以致於無法符合於目 標運算元之中),則早期旗標產生/控制邏輯電路212產生 一真值給溢位旗標(overflow flag,OF),否則會產生一假 值;早期旗標產生/控制邏輯電路212設定正負旗標(sign flag,SF)給早期結果242之最高有效位元值;當早期結果 為零時,早期旗標產生/控制邏輯電路212會產生一真值給 零位旗標(zero flag, ZF),否則則產生一假值;當早期結果 242之最小有效數位元組包含1位元的偶數,則早期旗標 產生/控制邏輯電路212會產生一真值給同位旗標(parity flag,PF),否則會產生一假值;當由早期執行邏輯電路/位 址產生器138所執行之無號整數運算導致一溢位情況(即 算術運算元產生一進位或由早期結果242之最高位元借 位),則早期旗標產生/控制邏輯電路212會產生一真值給 進位旗標(carry flag,CF),否則會產生一假值。於一實施例 中,確切的說,產生一完整的狀態旗標係為了寫入早期狀 態暫存器142,早期旗標產生/控制邏輯電路212只更新會 被早期結果242所影響之特定的狀態旗標。於另一實施例 中,早期旗標產生/控制邏輯電路212會累積先前指令之狀 33 1273485 態旗標直到他們由結構式狀態暫存器162被複製,請參照 圖5並如下所述。流程繼續至判斷方塊438。 於判斷方塊438中,早期旗標產生/控制邏輯電路212 判斷狀態旗標之修正是否取決於指令之結果。舉例來說, 於一實施例中,某一指令直接修正狀態暫存器(例如x86 結構式 STC(set carry)、CLC(clear carry)或 CMC(complent carry)指令),且因為除了狀態旗標之修正指令之外,修正 不是取決於指令結果。當狀態暫存器之修正係依據指令之 結果,則流程繼續至判斷方塊442 ;否則流程結束。 一於判斷方塊442中,早期旗標產生/控制邏輯電路212 會檢測早期結果有效信號244以判斷早期結果242是否有 效。如承是的話’流程結束;否則流程繼續至方塊444。 於方塊444中,早期旗標產生/控制邏輯電路犯於控 + L 5虎264上產生-值’以更新早期狀態有效暫存器施 =值,以指補存於早態暫_ 142中之狀態旗標 狀態=二塊43 4至方塊444運算以累積早期 142 “、、效記錄―:欠無效直到早紐態暫存器 中,tir修請參照圖5所示並如下所述。於一實施例 因為直接修正狀態暫在哭♦ &人 簡化早期麟纽㈣^對雜倾執行(以 旗榡之—指A被遇至,丨輯電路212),當直接更新-狀態 請參照圖5所示,复焱 之裝置之運算m 本發明之一流程圖以說明围 ^ 早期狀態暫存器142恢復且有效。 34 !273485 :5包括兩個區別的流程圖。每—流程圖係依據不同的觸 U灰復及有效情況’以描述早期狀態暫存器⑷之恢復及 使有效。參照第-種情況’流程開始於方塊5〇2。 t於方塊502中,一分支指令抵達S階段126。當分支 ^ v需要修正(即當分支預測器132錯誤預測分支指令,不 管分支是否發生錯誤預測或是分支目標位址錯誤預測),然 後晚期分支修正邏輯電路148會清除於修正錯誤預測之過 秩中之管線式微處理器100,請參照圖7並如下所述。清 除官線式微處理器100表示沒有結構式狀態暫存修正指令 係存在於R階段108下之管線階段,或是位於任何在管線 中修正結構式狀態暫存器162之指令已更新結構式狀態暫 存器162或已被清除。因此,結構式狀態暫存器162包括 最新之狀態。其係指明分支指令修正可能為一條件或非條 件型態之分支指令。此外,早期狀態暫存器142可能依據 導致管線被清除之其他事件除了分支指令之修正之外而 被恢復。流程繼續至判斷方塊504。 於判斷方塊504中,早期旗標產生/控制邏輯電路212 會檢測分支修正晚期信號268以決定S階段126分支指令 是否被晚期分支修正邏輯電路148所修正,因此表示管線 式微處理器100被清除用以產生錯誤預測之修正,如果 是,流程繼續至方塊506 ;否則流程結束。 於方塊506中,早期旗標產生/控制邏輯電路212係複 製結構式狀態暫存器162之值經由早期狀態暫存器值262 至早期狀態暫存器142,且經由控制信號264標示早期狀 35 Ϊ273485 怨暫存器142係有效的,因此,恢復早期狀態暫存器i42 為—有效狀態,流程繼續至方塊506。 請參照於圖5中之第二情況,流程開始於方塊512。 於方塊512中,早期旗標產生/控制邏輯電路212會檢 測旗標修正指令存在信號搬以麟所有結構式狀態^存 修正指令是否存在於Α階段112下,若有的話,更新結構 式狀態暫存器162,如果是,流程繼續至方塊514 ;否 流程結束。 、 於方塊514中,早期旗標產生/控制邏輯電路212係經 由早期狀態暫存器值262以複製結構式狀態暫存器162: 值至早期狀態暫存器142,且經由控制信號264以標示早 』狀悲暫存态142為有效的,因此,恢復早期狀態暫^界 142為一有效狀態。流程結束於方塊514。 其係指示管線式微處理器100之清除係由於在方塊 504中判斷之S階段126中之一分支修正,係一事件,其 係創造條件給方塊512產生一判斷。 、 請參照圖6所示,依本發明之一流程圖說明管線式微 處理态100之運算以執行早期分支修正,流程開始於方塊 602。 “ 於方塊602中,一條件分支指令抵達j階段114,流 程繼續至判斷方塊604。 於方塊604中,早期分支修正邏輯電路144會檢測早 期狀恶有效暫存器246之輸出以判斷早期狀態暫存哭是否 係有效的,如果是,流程繼續至方塊606 ;否則,流程於 36 1273485 束。因此,當早期狀態暫存器 行早期條件分支修正。 σ 142係無效的,則装置不執 期狀:::606中’早期分支修正邏輯電路144會檢測早 匕一心暫存态142内容以判斷由條件分支指令之條件碼所 才曰不之條件是否係成立的。流程繼續至判斷方塊$卯。The row (not shown), the basin spleen: processor 100 also includes the results of the transfer confluence stage 124 and the S stage 126 line unit 146, provided by the E-transfer bus system, to the R stage 108 Supply operands. Households #千一十, input for the 夕工器226. If the R phase 108 is different from the source, the label is not matched with the one that is smothered by the A phase 112 until the ⁄ 13⁄4 half and 122, but it is associated with the E stage 124 or 22 target. The match, followed by 'early flag generation/control field', road 212 controls the multi-guard, 226 to select the transfer bus to provide a new result as an operand. If an operand is on the forwarding bus, it is always valid. The official line microprocessor 1 also includes an early flag generation/control logic circuit 212. The early flag generation/control logic circuit 212 receives the structured state register 162 as an input. Early flag generation/control logic 212 also receives instructions 206 in R stage 1 〇 8 and a stage 112 as inputs. The early flag generation/control logic circuit 212 also receives the output of the multiplexer 226 as an input, i.e., the R stage 1 〇 8 source operand and the associated significant bit. The early flag generation/control logic circuit 212 also receives the output of the pipeline 23 1273485 register 232, the effective bit line register 236, and the register 238 as input, ie, the segment 112; Effective bit. The early flag generation/control logic circuit 212 is early as an input. The early flag generation/control logic circuit 12 also receives the a-phase ii2 under the parent-pipeline stage - the correction command presence signal 2G2 as the input. The flag correction command exists ^ - the true value indicates that the corresponding phase has the correction structure II - the instruction, and the structural state temporary storage is still her". The correction instruction exists in the signal transmission system early = electricity ... receives the new branch correction late: input. In the new branch correction late signal 268, the new late branch correction logic circuit 148 shown in Figure 1 is modified: Pre-two: The line indicates that the pipelined microprocessor 1 has been cleared. As shown in Fig. 5; the speed 'new branch correction late signal 268 is also used to judge == is restored and confirmed. Early flag generation/control 2 = 2 also receives the finger 1 of the finger 1 per pipeline stage present under the segment 112 as input. The target target class (4) is indicated by the instruction 206 indicating whether the source operand = the structured temporary archive 134, the early temporary archive 136 or the A phase 112 early result 242 is as described below with reference to FIG. In one embodiment, the instruction: refers to a memory source-derived element, that is, the location of an operand: kilograms. Usually the memory operation element exists in the pipeline type. You i: handle the handle of the fasting line 1 比f line than the low white P slave of the Lv踝, for example, there may be information about the situation 24 1273485, that is the purpose of the memory address The subject of a previous store instruction that matches the address of the R phase 1〇8 instruction. Although the difference = does not show the memory operand as an input. The early flag production: control circuit 212 also receives the memory operation element signal 266 as a turn-in" to indicate whether the memory source operation element not indicated by the instruction is to be supplied to the multiplexer 226. The memory operation signal plus ‘two: 疋no must turn off the R stage 108 instruction 206 and the early result 2 斲 is valid, as described in FIG. Whether ϋ 2 is based on its input, the early flag generation/control logic circuit 212 has the same control signal. The early flag generation/control logic circuit 212 generates a device control signal 282 that controls the multiplexer 226 to source the appropriate == 々 206 source. Early flag generation/control logic 2 = 钿 Early birth result valid signal 244. Early flag generation/control logic:: A state register 1 262 for storing in the early state register 142 and a control signal 264 for updating the value asserted in the early state. The value in the early state valid register is 4曰6, which is initialized to a valid value when the microprocessor re-sets the text. The early = ^ register 142 and the early state valid register 246 serve as a pipeline register to provide the early state register and the valid bit to the j stage 114. The early flag, raw/control logic circuit 212 also generates a suspend signal 228 that provides g line registers 232, 234, a valid bit line register 236, and a source operand line register 238 to turn off R stage 1 〇 8. In one embodiment, the pipeline registers 232, 234, 236, and 238 include multiplex registers that are used to maintain their current state until the pause signal 228 is true until a clock cycle of 25 1273485. The operation of the pause signal 228 will be described in detail below with reference to FIG. Phase A 112 includes an early execution logic/address generator 138 that receives source operands from source operand pipeline register 238 and produces early results 242 based on source operand pipeline register 238. The early execution logic/address generator 138 includes an arithmetic logic circuit 272, a Boolean operation logic circuit 274, and a displacement logic circuit 276. The early execution logic/address generator 138 is used to generate the actual address for the memory gymnastics. The early execution logic/address generator 138 is also used to execute a subset of the operations required by the instructions of the instruction set of the pipelined microprocessor 100. Thus, the early execution logic/address generator 138 can perform a subset of the operations by the execution unit 146 as shown in FIG. The early execution logic/address generator 138 is used to perform a subset of the operations, which are the most commonly performed operations. In one embodiment, the most commonly performed operations are also substantially overlapping with the fastest performing operations (ie, requiring a relatively short time to execute, so they can be executed in a single clock cycle) and require a relatively small amount of Hardware, especially in hardware, has been required to generate memory addresses. In one embodiment, the arithmetic logic circuit 272 is operative to perform an addition, subtraction, increment, and decrement; however, the arithmetic logic circuit 272 is not a subtraction to perform the addition or borrowing of the carry. In one embodiment, the Boolean logic circuit 274 is configured to perform an AND and OR, OR, XOR, NAND, positive and negative extension, and zero The movement of the bit extension; however, the Boolean logic circuit 274 is not used to perform a one-tuple replacement. In one embodiment, the displacement logic circuit 276 is configured to perform a left bit 26 1273485 shift or right shift; however, the shift logic circuit 276 is not used to perform a spin or rotate carry operation. Although the specific embodiment has been described, the specific subset of operations performed in the logic/address generator 138 is performed earlier, and the present invention is not limited to a particular embodiment, and those skilled in microprocessor design may readily appreciate the early stages. Execution logic/address generator 138 may be based on a particular set of instructions of pipelined microprocessor 100 to perform a particular subset of operations and target execution and circuit actual endpoints. The early results 242 generated by the early execution logic/address generator 138 are provided to an early result pipeline register 254 for storage and subsequent supply to the j stage 114. The J stage includes an early branch correction logic circuit 144 as shown in FIG. The early branch correction logic circuit 144 receives the output of the early state register 142, the early history valid register 246, and the early result pipeline register 254. The early branch correction logic circuit 144 also receives an instruction from the J-stage 114 by a camp line register 248 for pipelined instructions 206 to be pipelined by the pipeline register 232. The pipelined microprocessor 1A also includes a branch prediction generation signal 208, which is passed through the I stage 〇2, the F stage 1〇4, and the χ stage 〇6, and passes through the pipeline register 234 of the R stage 108. The pipeline stage 252 of phase A 112 is transmitted from the pipeline and is provided to the early branch correction remote circuit 144. The true value of the branch prediction occurrence signal 2〇8 refers to the instruction system-branch instruction in the corresponding stage, which is predicted to occur by the branch prediction of FIG. i being 132, that is, the pipelined microprocessor 1 The prediction is based on the prediction made by the branch predictor 132. Based on its input, early branch correction logic 144 generates a first control 154 that is used to provide stage I 102 of FIG. The first control 27 1273485 signal 154 includes an early branch correction signal 258 that is provided to the late branch correction logic circuit 148 of FIG. When the early branch correction logic circuit 144 corrects a branch prediction, the early branch correction signal 258 is true. Please refer to Figure 6 for the operation of the early branch correction logic circuit 144 and as explained in detail below. Referring to FIG. 4, it is a flow chart of the operation of one of the early results and the early state register of FIG. 2 according to the present invention. The flow chart of Figure 4 includes two drawings, Figures 4A and 4B, respectively, and the flow begins at block 402. In block 402, an instruction 206 indicates that a source operand has arrived at the R stage 108 by the structured temporary archive 134. The instruction indicates that one or more of the source temporary operands are via the source operand tag 214. Flow continues to decision block 404. In decision block 404, early flag generation/control logic 212 detects the command type and determines that it must be executed at stage A 112 regardless of the type of instruction. In one embodiment, when an instruction that must be supplied to the data cache in D stage 116 requires the generation of a memory address, the instruction must be executed in stage A 112. If the instruction must be executed in stage A 112, then flow continues to block 406, otherwise flow continues to block 412. In decision block 406, the early flag generation/control logic circuit 212 determines whether all of the source operands indicated by the instruction are present and valid. If the instruction indicates a memory operand, the early flag generation/control logic circuit 212 detects the memory operand presence signal 266 to determine if the indicated memory operand is present and valid. If the instruction indicates a direct operand, the direct operand is always present and valid. If the instruction indicates a 28 1273485 temporary operand, the early flag generation/control logic 212 compares the destination tag 204 and the source operand tag 214 from the lower order pipeline to determine the old instruction generation in the pipelined microprocessor 100. As a result, it is predetermined 1 to indicate the structured temporary archive 134 register by the source operand tag 214. If so, the result is in the early temporary archive, in which case the early generation/control logic 212 will detect the valid bit 218 for the indicated operand to determine if the operand is valid. If an old instruction in the pipelined process is not producing a result, it is used to indicate the structured temporary archive 134 register by the source operand tag 214, and then the operation element It exists and is effective as it will be provided by the structured temporary archive 134. If all of the sources indicated by the instruction are present and valid, then flow continues to decision block 412; otherwise the flow continues to block 4-8. In block 408, the early flag generation/control logic circuit 212 generates a true value on the pause signal 228 for the medium clock period before #= to be reached by the heart or written back to the structural temporary storage 134 Or suspend the instruction in the r stage by forwarding the sink = the source operand obtained. <Course continues from block 408 back to block 4-6. The decision block 412, the early flag generation/control logic circuit 212: the source operand tag 214 is compared with the lower pipe _^^ to determine one of the pipelined microprocessors 1 A result is intended to be used by the source operand tag = structured temporary archive 134 temporary store 11. #这, The results are not valid, but the results will exist in the early temporary storage slot 136. If an old instruction in the pipelined microprocessor 100 is not generated at 29 1273485, the system is scheduled to indicate the structured temporary 134 register by the source operand tag 214, and the flow continues to block 416; otherwise, the flow Following Yu Caitian 414. , Wang Wan block in block 414, the early flag generation/control logic circuit z 12 will generate a value on the control signal, so that the multiplexer 226 selects the Zhazishan +, μ a Wei error from the source The scratchpad source operand indicated by meta tag 214 is provided by store 136. If the early result 242 is generated in the a segment 112 of the same clock cycle, then the instruction reaches the R phase · ι 〇 8 and I ^ is required to source the Zeng Yuan, then the multiplexer 226 will select the early result 242 rounds Into the future: The operand is supplied to the instruction. The multiplexer 226 may be assigned an instruction by an early temporary storage unit to select a multiple temporary storage operation unit, which indicates a multiple temporary storage unit. Flow continues to block 418. # In block 416, the early flag generation/control logic circuit 212 generates a value on the select state control k number 282 to cause the multiplexer 226 to deselect the register indicated by the source transport element label 214. The source operand is provided by the structured temporary archive 134. In addition, the early flag generation/control logic circuit 212 controls the multiplexer 226 to select the non-storing operands indicated by the instructions, such as direct/fixed operands 222 or valid transfer operands, if any. Flow continues to block 418. In block 418, the instruction continues to stage A 112, in which stage early execution logic/address generator 138 produces an early result 242 from the source operand selected by multiplexer 226. In particular, one of the appropriate arithmetic logic circuit 272, Boolean logic circuit 274, or displacement logic circuit 30 1273485 276 will produce an early result 242 based on the command pattern thereon. Flow continues to decision block 422. In decision block 422, the early flag generation/control logic 212 detects the target operand tag of the instruction to determine if the early result 242 is intended to be a register in the structured temporary archive. If so, the flow continues to decision block 424; otherwise, the flow continues to block 434. In decision block 424, the early flag generation/control logic 212 detects the instruction type to determine if the instruction is of a type that can be executed by the early execution logic circuit/address generator 138. That is, assuming that the source operand is valid, the early flag generation/control logic 212 will determine if the instruction is within the subset of the instruction so that the early execution logic/address generator 138 produces the correct early results. If so, the flow continues to decision block 428, otherwise the process continues to block 426. In block 426, the early flag generation/control logic circuit 212 generates a false value from the early result valid signal 244 and does not generate one from the early execution logic/address generator 138 by the target operand tag 216 containing the false value. The instruction pattern of the valid early result 242 is updated to match the valid bit 218 of the early temporary archive 136 register. Flow continues to decision block 434. In decision block 428, since the early execution logic/address generator 138 produces a modified early result 242, the early flag generation/control logic circuit 212 determines whether all of the operands used to generate the early result 242 are valid. It is indicated that if the early flag generation/control logic circuit 212 determines in decision block 404 that the instruction does not need to be executed in phase a, then the instruction is not stored in R stage 1〇8 due to the lack of an operand. Therefore, even if one of the temporary operands is invalid from the early temporary broadcast 136 of 31 1273485, the command is not: stored in the R stage 108. Similarly, when the memory operand has not been loaded by the record, the instruction system will not be stored in the R stage 108. Similarly, the early execution logic/address generator 138 does not generate an effective early instruction type, and the instruction system is not stored in the r stage 1〇8. In contrast, early results 242 are marked as invalid in block 420, and the results of the repairs are evaluated in execution unit 146 of stage E 124. In contrast, if the field transport has not been obtained or is invalid, it must be executed in the A phase 112 (for example, to calculate one of the addresses and use the data cache in the d stage) In the R stage (10) towel. When all of the operands used to generate the early results 242 are valid, the flow continues to block 432; otherwise the flow continues to block 426. In block 432, the early flag generation/control logic circuit 212 generates a true value on the early valid signal 244 and is generated by block 418 by the target operand tag 216 containing the true value - an effective early result indication To update the valid bit 218 corresponding to the early temporary archive 136 register. At the same time, the early temporary archive 136 register indicated by the target operand 216 is validated by the early (four) 242. Flow _ to decision block 434 判断 In the decision block, the early flag generation/control logic will detect the command to determine if the command is a modified structural state register 162. The instruction to correct the status flag is instructed according to the & x86 structured instruction set. When the instruction corrects the contents of the status register (6), the flow continues to block 436; otherwise the flow ends. ° 32 1273485 In block 436, the early flag generation/control logic circuit 212 generates an early state register value 262 that is based on the early results 242 generated by the early execution logic circuit/address generator 138, and Instruction 206, and is updated by early state register value 262 to early state register 142. In one embodiment, a signed integer two-bit complement operation performed by the early execution logic/address generator 138 to produce an early result 242 results in an overflow condition (ie, the early result is too large or too small) Therefore, the early flag generation/control logic circuit 212 generates a true value for the overflow flag (OF), otherwise a false value is generated; early flag generation/ Control logic circuit 212 sets the positive and negative flag (SF) to the most significant bit value of early result 242; when the early result is zero, early flag generation/control logic 212 generates a true value for the zero flag Zero flag (ZF), otherwise a false value is generated; when the least significant digit of the early result 242 contains an even number of 1 bit, the early flag generation/control logic circuit 212 generates a true value for the flag Parity flag (PF), otherwise a false value is generated; when an unsigned integer operation performed by the early execution logic/address generator 138 results in an overflow condition (ie, the arithmetic operand produces a carry or is early Results 242 Borrow bit maximum), the early flag generation / control logic 212 generates a true value for carry flag (carry flag, CF), or will cause a false value. In one embodiment, to be precise, a complete status flag is generated for writing to the early state register 142, and the early flag generation/control logic 212 only updates the particular state that would be affected by the early result 242. Flag. In another embodiment, early flag generation/control logic 212 accumulates the status of the previous instruction 33 1273485 state flag until they are copied by structured state register 162, see Figure 5 and described below. Flow continues to decision block 438. In decision block 438, the early flag generation/control logic circuit 212 determines if the correction of the status flag is dependent on the result of the instruction. For example, in one embodiment, an instruction directly corrects a state register (eg, an x86 structured set STC, a CLC (clear carry), or a CMC (complent carry) instruction, and because, in addition to the status flag In addition to the correction instructions, the correction does not depend on the result of the instruction. When the state register is modified based on the result of the instruction, then flow continues to decision block 442; otherwise the process ends. In decision block 442, early flag generation/control logic 212 detects early result valid signal 244 to determine if early result 242 is valid. If the answer is yes, the process ends; otherwise the process continues to block 444. In block 444, the early flag generation/control logic circuit commits a value-value on the control + L 5 tiger 264 to update the early state valid register register value to refer to the early state temporary 142 Status flag status = two blocks 43 4 to block 444 operation to accumulate early 142 ", effective record": under invalid until the early state register, tir repair as shown in Figure 5 and as follows. The embodiment is because the direct correction state is temporarily crying ♦ & simplifies the early nucleus (four) ^ on the miscellaneous execution (in the flag — - refers to A is encountered, 丨 circuit 212), when the direct update - state please refer to Figure 5 As shown, the operation of the apparatus for reclamation is a flowchart of the present invention to illustrate that the early state register 142 is restored and valid. 34 !273485 : 5 includes two different flowcharts. Each flow chart is different. The touch of U and the effective condition 'to describe the recovery and validity of the early state register (4). The flow of the reference to the first case begins at block 5 〇 2. In block 502, a branch instruction arrives at stage S 126 When the branch ^ v needs to be corrected (ie when the branch predictor 132 mispredicts the branch finger) Regardless of whether the branch has an erroneous prediction or a branch target address error prediction, then the late branch correction logic circuit 148 will clear the pipelined microprocessor 100 in the over-rank of the modified error prediction, please refer to FIG. 7 and described below. The clear-line microprocessor 100 indicates that there is no structural state temporary memory correction instruction that exists in the pipeline phase under the R phase 108, or that any instruction in the pipeline that corrects the structural state register 162 has updated the structural state. The memory 162 has either been cleared. Therefore, the structured state register 162 includes the most recent state, which indicates that the branch instruction may be a branch instruction that may be a conditional or non-conditional type. In addition, the early state register 142 may Other events that cause the pipeline to be cleared are resumed in addition to the correction of the branch instruction. Flow continues to decision block 504. In decision block 504, early flag generation/control logic 212 detects branch correction late signal 268 to determine Whether the S phase 126 branch instruction is modified by the late branch correction logic circuit 148, thus indicating the pipeline type micro The processor 100 is cleared to generate a correction for the erroneous prediction, and if so, the flow continues to block 506; otherwise the flow ends. In block 506, the early flag generation/control logic 212 copies the structural state register 162. The value passes through the early state register value 262 to the early state register 142, and the early state 35 Ϊ 273 485 complaint register 142 is valid via the control signal 264, thus restoring the early state register i42 to the active state, Flow continues to block 506. Referring to the second scenario in Figure 5, the flow begins at block 512. In block 512, the early flag generation/control logic circuit 212 detects the flag correction command presence signal to move all structures. Whether the state correction instruction exists in the Α stage 112, if any, updates the structured state register 162, and if so, the flow continues to block 514; In block 514, the early flag generation/control logic circuit 212 copies the structural state register 162 via the early state register value 262: the value to the early state register 142 and is indicated via the control signal 264. The early sinusoidal temporary state 142 is valid, and therefore, the recovery of the early state temporary boundary 142 is an effective state. Flow ends at block 514. It indicates that the clearing of the pipelined microprocessor 100 is due to one of the branch corrections in the S phase 126 as determined in block 504, which is an event that creates a condition for block 512 to produce a determination. Referring to Figure 6, the operation of the pipelined micro-processing state 100 is illustrated in accordance with one of the flowcharts of the present invention to perform early branch correction, and the flow begins at block 602. "In block 602, a conditional branch instruction arrives at stage j 114, and the flow continues to decision block 604. In block 604, early branch correction logic 144 detects the output of early invalid register 246 to determine the early state. Whether crying is valid, if yes, the flow continues to block 606; otherwise, the process is bundled at 36 1273485. Therefore, when the early state register line is modified, the early conditional branch is corrected. If the σ 142 system is invalid, the device is not in compliance. The ::ear branch correction logic circuit 144 in the :::606 will detect the content of the early heart state temporary state 142 to determine whether the condition of the conditional branch instruction condition is not met. The flow continues to the decision block $卯.
η於判斷方塊608中,早期分支修正邏輯電路144判斷 是=需依據方塊606以修正條件分支指令之預測。條件分 支指令需被修正,當條件係滿足於有效早期狀態暫存器 142,像疋條件分支指令應該發生,而分支預測器預 測分支不應該發生(如由分支預測發生信號2〇8管線下了階 #又114版本之一假值所指示),即導致管線式微處理器1〇〇 去取出下一個連續的指令;相反的,當條件係不滿足於有 效早期狀態暫存器142則條件分支指令需要被修正,像是 條件分支指令應該不發生,而分支預測器132預測分支應 該發生(如由分支預測發生信號2〇8管線下;階段η4版本 的一真值所指示),即導致管線式微處理器1〇〇轉移至預測 分支目標位址。若修正條件分支指令之預測為必須的,流 程繼續至判斷方塊612 ;否則,流程結束。 於判斷方塊612中,已判斷條件分支指令需要修正, 早期分支修正邏輯電路144會檢測分支預測發生信號2〇8 之管線下J階段114版本,以判斷條件分支指令是否被預 測發生,如果是,流程繼續至方塊616 •,否則流程繼續至 方塊614。 37 1273485 於方塊614中,早期分支修正邏輯電路144經由第一 控制信號154指示I階段1〇2清除位於j階段ι14上之管 線式微處理器100以及分支管線式微處理器1〇〇至條件分 支指令的分支目標位址,於一實施例中,條件分支指令之 分支目標位址係由A階段112中之早期執行邏輯電路/位 址產生裔138所產生。此外,早期分支修正邏輯電路144 在由官線下傳送至晚期分支修正邏輯電路148之早期分支 修正信號258產生一真值,其使用係參照圖7並如下所 述,流程結束於方塊614。 於方塊616中,早期分支修正邏輯電路144經由第一 控制信號154指示I階段1〇2清除位於】階段114上之管 線式微處理斋100,以及分支管線式微處理器1〇〇至條件 分支指令後的下-個連續的指令。此外,早期分支修正邏 輯電路144在由官線下傳送至晚期分支修正邏輯電路148 之早期分支修正信號258產生-真值,其使用係參照圖7 並如下所述,流程結束於方塊616。 睛參照圖7所不,依本發明之一流程圖說明管線式微 處理器1GG之運算以進行晚期分支修正。流程開始於方塊 702 〇 於方塊702中,一條件分支指令抵it S階段126。流 程繼續至方塊704。 於方塊704中,晚期分支修正邏輯電路會檢測結 構式狀態暫存器162内容以判斷由條件分支指令之條件碼 所指示之條件是否係成立的。流㈣續至判斷方塊7〇6。 38 1273485 於方塊706中,晚期分支修正邏輯電路148判斷是否 =要依據方塊704以修正條件分支指令之預測。條件分支 指=需被修正,當條件係滿足於結構錄㈣存器⑹, 像是條件分支齡㈣H❿分支賴ϋ 132預測分支 不應該發生(如由分支預測發生信號2〇8管線下s階段 版本之一假值所指示),即導致管線式微處理器1〇〇去取出 :一個連續的指令;相反的,當條件係不滿足於結構式狀 悲,存器162,則條件分支指令需要被修正,像是條件分 支指令不應該發生’而分支預測器132賴分支應該發生 (如由分支預測發生指令2 〇 8管線下s階段丨2 6版本的二真 值指示)’即導致管線式微處理器1〇〇轉移至預測分支目j罕 位址。若修正條件分支指令之預測為必須的,流程繼續^ 判斷方塊708 ;否則流程結束。 於判斷方塊708中,晚期分支修正邏輯電路148會檢 7早期分支修正信號2 5 8以判斷條件分支指令的錯誤預^ 疋否已由早期分支修正邏輯電路144修正,如果是,流程 結束;否則流程繼續至判斷方塊712。 於判斷方塊712中,已判斷條件分支指令需要修正, 晚期分支修正邏輯電路148會檢測分支預測發生信號2〇8 之官線下S階段126版本,以判斷條件分支指令是否被預 測發生,如果是,流程繼續至方塊716 ;否則流程繼續至 方塊714 〇 於方塊714中,晚期分支修正邏輯電路148經由第一 控制信號154指示I階段102清除S階段126上之管線式 39 1273485 微處理器loo,且分支管線式微處理器10()至條件分支指 令之分支目標位址。流程結束於方塊714。 於方塊716中,晚期分支修正邏輯電路148經由信號 154指示I階段102清除s階段126上之管線式微處理器 100,且分支管線式微處理器100至條件分支指令之後之 下一個連續的指令。流程結束於方塊716。 如上所述,被描述之管線式微處理器1〇〇係相較於沒 有早期執行邏輯電路/位址產生器138及早期暫存檔136所 提供優點之微處理器,有能力提早多個時脈週期至一先前 指令之一結果作為一暫存運算元給一連續的位址產生或 非位址產生和令,因此減少所招致之管線氣泡數量。降低 管線氣泡減少微處理器執行之主要組成每一指令之平均 週期。此外,早期結果可被使用以比先前更快速的更新狀 態旗標,因此可能比先前更快速的啟動條件指令之執行。 又,被描述之管線式微處理器100係相較於沒有早期分支 修正邏輯電路144所提供優點之微處理器,有能力提早多 個時脈週期修正一錯誤地預測之條件分支指令。最後,高 時脈頻率微處理器之需求係導致微處理設計者增加管線 P白#又之數里。g管線階段之數量增加,管線氣泡之數量導 致可能增加等待指令結果及/或狀態旗標更新,相同地,當 管線階段數量增加,修正錯誤預測之分支指令之延遲亦可 能增加。這些事實增加了於此描述之微處理器、裝置以及 方法之優點。 1273485 雖然本發明及其目的、特徵及優點已詳細描述,其餘 任何未脫離本發明之精神與範疇之實施例亦應被包含,舉 例來說,雖然所描述之一微處理器實施例大體上與x86結 構符合’但所描述的裝置及方法並不限於x86結構而且可 被使用於不同的微處理器結構。另外,雖然所描述之一實 施例於其中一條件分支指令被提早修正,早期產生狀態旗 _示的優點可被利用以提早執行其他指令,像是,以 L00PCC指令為例,於此所描述的優點同樣發生在Jcc指 令;或x86 SETcc及CMOVcc指令可被提早執行,因此其 結果可讓隨後依靠的指令利用。此外,除了利用早期結果 產生早期狀態旗標值外,早期結果亦可用以執行間接分支 才曰令之早期分支修正,通常係指jump thr〇ugh 令,其係指示分支目標位址為一來源暫存運算元值。 又,除了利用硬體實現本發明之外,本發明可以實現 於儲存於一電腦可用(例如可讀取)媒體中之電腦可讀碼 (例如電腦可讀取碼、資料等),電腦碼導致本發明之功用 或構造或兩者得以實施。舉例來說,其係可藉由利用一般 的程式語言(例如C語言,c++語言,JAva,及其他類似的語 言)而實現;GDSII資料庫;硬體描述語言(HDL)包括In the decision block 608, the early branch correction logic circuit 144 determines YES = the block 606 is required to correct the prediction of the conditional branch instruction. The conditional branch instruction needs to be modified. When the condition is satisfied with the valid early state register 142, the conditional branch instruction should occur, and the branch predictor predicts that the branch should not occur (eg, the branch prediction signal 2〇8 is under the pipeline) The step ##114 version of one of the false values is indicated), which causes the pipelined microprocessor 1 to pick up the next consecutive instruction; conversely, when the condition is not satisfied with the valid early state register 142, the conditional branch instruction Need to be corrected, such as the conditional branch instruction should not occur, and the branch predictor 132 predicts that the branch should occur (as indicated by the branch prediction generation signal 2〇8 pipeline; the true value of the phase η4 version), which leads to the pipeline type micro The processor 1〇〇 transfers to the predicted branch target address. If the prediction of the modified conditional branch instruction is necessary, the flow continues to decision block 612; otherwise, the flow ends. In decision block 612, it has been determined that the conditional branch instruction needs to be corrected, and the early branch correction logic circuit 144 detects the sub-stage J stage 114 version of the branch prediction occurrence signal 2〇8 to determine whether the conditional branch instruction is predicted to occur, and if so, Flow continues to block 616 •, otherwise flow continues to block 614. 37 1273485 In block 614, the early branch correction logic circuit 144 instructs the I-stage 1〇2 to clear the pipelined microprocessor 100 located on the j-phase i14 and the branch-line microprocessor 1 to the conditional branch instruction via the first control signal 154. The branch target address, in one embodiment, the branch target address of the conditional branch instruction is generated by the early execution logic/address generation 138 in stage A. In addition, the early branch correction logic circuit 144 generates a true value at the early branch correction signal 258 transmitted by the official line to the late branch correction logic circuit 148, the use of which is described below with reference to Figure 7 and the flow ends at block 614. In block 616, the early branch correction logic circuit 144 instructs the I-phase 1〇2 to clear the pipelined microprocessor 100 located on the stage 114 via the first control signal 154, and the branch-line microprocessor 1 to the conditional branch instruction. The next - a continuous instruction. In addition, the early branch correction logic circuit 144 generates a true value at the early branch correction signal 258 transmitted by the official line to the late branch correction logic circuit 148, the use of which is described below with reference to Figure 7 and as follows, and the flow ends at block 616. Referring to Figure 7, the operation of the pipelined microprocessor 1GG is illustrated in accordance with one of the flowcharts of the present invention for late branch correction. The flow begins in block 702 方块 in block 702, a conditional branch instruction is directed to the S stage 126. Flow continues to block 704. In block 704, the late branch correction logic circuit detects the contents of the structured state register 162 to determine if the condition indicated by the condition code of the conditional branch instruction is valid. Flow (4) continues to decision block 7〇6. 38 1273485 In block 706, late branch correction logic 148 determines if = is to rely on block 704 to correct the prediction of the conditional branch instruction. Conditional branching = need to be corrected, when the condition is satisfied with the structure record (4) register (6), such as the conditional branch age (four) H❿ branch Lai ϋ 132 prediction branch should not occur (such as the branch prediction signal 2 〇 8 pipeline s phase version One of the false values is indicated), which causes the pipelined microprocessor 1 to remove: a continuous instruction; conversely, when the condition is not satisfied with the structural sorrow, the memory 162, the conditional branch instruction needs to be corrected. , as if the conditional branch instruction should not occur 'and the branch predictor 132 should branch (as indicated by the branch prediction, the instruction 2 〇8 pipeline s phase 丨26 version of the two true value indication)', resulting in a pipelined microprocessor 1〇〇 Transfer to the predicted branch j. If the prediction of the modified conditional branch instruction is necessary, the flow continues to determine block 708; otherwise the process ends. In decision block 708, the late branch correction logic circuit 148 checks 7 the early branch correction signal 2 58 to determine if the error pre-condition of the conditional branch instruction has been corrected by the early branch correction logic circuit 144, and if so, the flow ends; otherwise Flow continues to decision block 712. In decision block 712, it has been determined that the conditional branch instruction needs to be corrected, and the late branch correction logic circuit 148 detects the sub-stage S-stage 126 version of the branch prediction occurrence signal 2〇8 to determine whether the conditional branch instruction is predicted to occur, if The flow continues to block 716; otherwise the flow continues to block 714, where block 714 indicates that the late branch correction logic 148 instructs the I phase 102 to clear the pipelined 39 1273485 microprocessor loo on the S phase 126 via the first control signal 154. And branching the pipelined microprocessor 10 () to the branch target address of the conditional branch instruction. Flow ends at block 714. In block 716, late branch correction logic 148 instructs I stage 102 to clear pipelined microprocessor 100 on stage 126 via signal 154 and branch pipelined microprocessor 100 to the next consecutive instruction following the conditional branch instruction. Flow ends at block 716. As described above, the described pipelined microprocessor 1 has the ability to advance multiple clock cycles compared to microprocessors that do not have the advantages of early execution logic/address generator 138 and early temporary archive 136. The result of one of the previous instructions is used as a temporary operand to generate or generate a continuous address, thus reducing the number of pipeline bubbles incurred. Reducing the pipeline bubble reduces the average period of each instruction executed by the microprocessor. In addition, early results can be used to update status flags faster than previously, so it is possible to execute conditional instructions faster than previously. Moreover, the described pipelined microprocessor 100 is capable of correcting a erroneously predicted conditional branch instruction by an earlier plurality of clock cycles than a microprocessor without the advantages provided by the early branch correction logic circuit 144. Finally, the demand for high-clock-frequency microprocessors has led micro-processor designers to increase pipelines. The number of pipeline stages increases, and the number of pipeline bubbles results in an increase in the waiting command result and/or status flag update. Similarly, as the number of pipeline stages increases, the delay in correcting the mispredicted branch instruction may also increase. These facts add to the advantages of the microprocessors, devices, and methods described herein. 1273485 While the invention and its objects, features and advantages have been described in detail, any embodiments that do not depart from the spirit and scope of the invention should be included, for example, although one of the microprocessor embodiments described is generally The x86 architecture conforms to 'but the devices and methods described are not limited to x86 architectures and can be used with different microprocessor architectures. Additionally, while one of the described embodiments is modified early, one of the advantages of the early generation status flag can be utilized to execute other instructions early, as in the case of the L00PCC instruction, as described herein. The advantages also occur with Jcc instructions; or x86 SETcc and CMOVcc instructions can be executed early, so the results can be used by subsequent instructions. In addition, in addition to using early results to generate early state flag values, early results can also be used to perform early branch corrections, usually referred to as jump thr〇ugh orders, which indicate that the branch target address is a source. Store the operand value. Moreover, in addition to implementing the invention by hardware, the invention can be implemented in a computer readable code (eg, computer readable code, data, etc.) stored in a computer usable (eg, readable) medium, resulting in a computer code The utility or construction of the invention or both are implemented. For example, it can be implemented by using a general programming language (e.g., C language, C++ language, JAVA, and the like); GDSII database; hardware description language (HDL) includes
VerilogHDL’VHDL,Altera HDL(AHDL),等等;或可在本 技藝中獲得的其他程式設計及/或電路(即圖示)擷取工 具。電腦程式碼可設置於任何可用的電腦(例如可讀取)媒 體包括半^體記憶體,磁碟,光碟(例如cd_r〇m、 DVD-ROM及其他類似的媒體),及作為可利用傳輸媒體(例 1273485 如載波或任何其他媒體包括數位、光學或類比媒體)儲存 (例如可讀取)於電腦中之電腦資料信號。因此,電腦程式 碼可以被傳輸於通訊網路上,包括網際網路及内部網路。 因此可了解本發明可包含在電腦程式碼(例如作為一 IP(智 慧財產)核心之部分,像一微處理器核心,或作為一系統階 層設計,像一系統晶片(SOC)以及轉化為硬體以作為整體 電路生產之部分。又,本發明可以具體化為一硬體與電腦 程式碼之組合。 最後,應感謝熟習本項技藝者可以立即利用本發明揭 露之概念以及具體實施例作為設計或修改其他結構之基 礎以實現與本發明相同之目的,任何未脫離本發明之精神 與範脅係由後附之申請專利範圍定義之。 【圖式簡單說明】 圖1為顯示依本發明之一管線式微處理器之一方塊示 意圖, 圖2為顯示依本發明圖1之管線式微處理器之R階 段、A階段及J階段之一方塊示意圖詳細說明; 圖3為顯示依本發明圖1及圖2之結構式暫存檔、早 期暫存檔、結構式狀態暫存器及早期狀態暫存器之一方塊 圖詳細說明; 圖4為顯示依本發明圖2之產生早期結果及早期狀態 暫存器之一裝置運作流程圖; 圖5為顯示依本發明圖2之裝置之恢復及確認早期狀 42 1273485 態暫存器之一運算流程圖; 圖6為顯示依本發明之管線式微處理器之執行早期分 支修正操作流程圖;以及 圖7為顯示依本發明執行晚期分支修正之微處理器之 操作流程圖 〇 元件符號說明: 100 管線式微處理器 102 I階段 104 F階段 106 X階段 108 R階段 112 A階段 114 J階段 116 D階段 118 G階段 122 Η階段 124 Ε階段 126 S階段 128 W階段 132 分支預測器 134 結構式暫存稽 136 早期暫存檔 138 早期執行邏輯電路/位址產生器 43 1273485 142 早期狀態暫存器 144 早期分支修正邏輯電路 146 執行單元 148 晚期分支修正邏輯電路 152 早期結果匯流排 154 第一控制信號 156 第二控制信號 158 結果匯流排 162 結構式狀態暫存器 202 旗標修正指令存在信號 204 目的標籤 206 指令 208 分支預測發生信號 212 早期旗標產生/控制邏輯電路 214 來源運算元標籤 216 目標運算元標籤 218 有效位元 222 直接/固定運算元 226 多工器 228 暫停信號 232 、 234 管線暫存器 236 有效位元管線暫存器 238 來源運算元管線暫存器 242 早期結果 44 1273485 244 早期結果有效信號 246 早期狀態有效暫存器 248〜252 管線暫存器 254 早期結果管線暫存器 258 早期分支修正信號 262 早期狀態暫存器值 264 控制信號 266 記憶運算元信號 268 分支修正晚期信號 272 算術邏輯電路 274 布林運算邏輯電路 276 位移邏輯電路 278 目標運算元標籤 282 選擇器控制信號 402〜444 早期結果與早期旗標運算之流程 502〜514 早期旗標恢復運算之流程 602〜616 早期分支修正之流程 702〜716 晚期分支修正之流程 45VerilogHDL'VHDL, Altera HDL (AHDL), etc.; or other programming and/or circuitry (i.e., pictorial) capture tools available in the art. The computer code can be set to any available computer (eg, readable) media including half-body memory, disk, optical disk (eg cd_r〇m, DVD-ROM and other similar media), and as a usable transmission medium (Example 1273485, such as a carrier or any other medium including digital, optical or analog media) stores (for example, readable) computer data signals in a computer. As a result, computer code can be transmitted over the communications network, including the Internet and the internal network. Thus, it can be appreciated that the present invention can be embodied in computer code (eg, as part of an IP (Intellectual Property) core, like a microprocessor core, or as a system hierarchy design, like a system chip (SOC) and converted to hardware As a part of the overall circuit production, the present invention can be embodied as a combination of a hardware and a computer program. Finally, those skilled in the art can immediately use the concept disclosed by the present invention as well as specific embodiments as a design or The basis of the other structures is modified to achieve the same objectives as the present invention, and the spirit and scope of the present invention are defined by the scope of the appended claims. FIG. 1 is a view showing one of the present invention. FIG. 2 is a block diagram showing one of the R phase, the A phase, and the J phase of the pipeline microprocessor of FIG. 1 according to the present invention; FIG. 3 is a diagram showing FIG. 1 and FIG. 2 block diagram of structural temporary archive, early temporary archive, structured state register and early state register; Figure 4 shows FIG. 2 is a flow chart showing the operation of one of the early results and the early state register of the present invention; FIG. 5 is a flow chart showing the operation of recovering and confirming the early state 42 1273485 state register of the device of FIG. 2 according to the present invention; 6 is a flow chart showing the operation of the early branch correction operation of the pipeline type microprocessor according to the present invention; and FIG. 7 is a flow chart showing the operation of the microprocessor performing the late branch correction according to the present invention. 〇 Component symbol description: 100 Pipeline type micro processing 102 I Phase 104 F Phase 106 X Phase 108 R Phase 112 Phase A 114 J Phase 116 D Phase 118 G Phase 122 Η Phase 124 Ε Phase 126 S Phase 128 W Phase 132 Branch Predictor 134 Structured Temporary 136 Early Stage Archive 138 Early Execution Logic/Address Generator 43 1273485 142 Early State Register 144 Early Branch Correction Logic 146 Execution Unit 148 Late Branch Correction Logic 152 Early Result Bus 154 First Control Signal 156 Second Control Signal 158 Result Bus 162 Structured State Register 202 Flag Correction Instruction Existence Letter No. 204 destination tag 206 instruction 208 branch prediction generation signal 212 early flag generation/control logic circuit 214 source operand tag 216 target operand tag 218 valid bit 222 direct/fixed operand 226 multiplexer 228 pause signal 232, 234 Pipeline Register 236 Valid Bit Pipeline Register 238 Source Operation Element Pipeline Register 242 Early Results 44 1273485 244 Early Result Valid Signal 246 Early State Valid Scratchpad 248~252 Pipeline Register 254 Early Result Pipeline Staging 258 Early branch correction signal 262 Early state register value 264 Control signal 266 Memory operand signal 268 Branch correction late signal 272 Arithmetic logic circuit 274 Boolean logic circuit 276 Displacement logic circuit 278 Target operand tag 282 Selector control signal 402~444 Early results and early flag operation flow 502~514 Early flag recovery operation flow 602~616 Early branch correction process 702~716 Late branch correction process 45
Claims (1)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/771,678 US7100024B2 (en) | 2003-02-04 | 2004-02-04 | Pipelined microprocessor, apparatus, and method for generating early status flags |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200527287A TW200527287A (en) | 2005-08-16 |
TWI273485B true TWI273485B (en) | 2007-02-11 |
Family
ID=34860772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW93128090A TWI273485B (en) | 2004-02-04 | 2004-09-16 | Pipeline microprocessor, apparatus, and method for generating early status flags |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN100343799C (en) |
TW (1) | TWI273485B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8521996B2 (en) * | 2009-02-12 | 2013-08-27 | Via Technologies, Inc. | Pipelined microprocessor with fast non-selective correct conditional branch instruction resolution |
US9052890B2 (en) * | 2010-09-25 | 2015-06-09 | Intel Corporation | Execute at commit state update instructions, apparatus, methods, and systems |
CN105993000B (en) * | 2013-10-27 | 2021-05-07 | 超威半导体公司 | Processor and method for floating point register aliasing |
CN107193768B (en) * | 2016-03-15 | 2021-06-29 | 厦门旌存半导体技术有限公司 | Method and device for inquiring queue state |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5881305A (en) * | 1996-12-13 | 1999-03-09 | Advanced Micro Devices, Inc. | Register rename stack for a microprocessor |
US6647489B1 (en) * | 2000-06-08 | 2003-11-11 | Ip-First, Llc | Compare branch instruction pairing within a single integer pipeline |
US7134005B2 (en) * | 2001-05-04 | 2006-11-07 | Ip-First, Llc | Microprocessor that detects erroneous speculative prediction of branch instruction opcode byte |
US7130988B2 (en) * | 2002-11-15 | 2006-10-31 | Via-Cyrix, Inc. | Status register update logic optimization |
-
2004
- 2004-09-16 TW TW93128090A patent/TWI273485B/en active
-
2005
- 2005-01-28 CN CNB2005100051447A patent/CN100343799C/en active Active
Also Published As
Publication number | Publication date |
---|---|
TW200527287A (en) | 2005-08-16 |
CN1629802A (en) | 2005-06-22 |
CN100343799C (en) | 2007-10-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100377078C (en) | Pipeline work micro processor, apparatus and method for performing early correction of conditional branch instruction mispredictions | |
EP1296229B1 (en) | Scoreboarding mechanism in a pipeline that includes replays and redirects | |
CN101395573B (en) | Distributive scoreboard scheduling in an out-of order processor | |
EP1296230B1 (en) | Instruction issuing in the presence of load misses | |
CN100468323C (en) | Pipeline type microprocessor, device and method for generating early stage instruction results | |
US7024537B2 (en) | Data speculation based on addressing patterns identifying dual-purpose register | |
TWI436275B (en) | Microprocessor and method for immediately executing instructions of call and return instruction types using the same | |
TW201042543A (en) | Out-of-order execution microprocessor and operation method thereof | |
TWI411957B (en) | Out-of-order execution microprocessor that speculatively executes dependent memory access instructions by predicting no value change by older instruction that load a segment register | |
WO2004021174A2 (en) | Scheduler for use in a microprocessor that supports data-speculative-execution | |
EP1504340A1 (en) | System and method for linking speculative results of load operations to register values | |
TWI273485B (en) | Pipeline microprocessor, apparatus, and method for generating early status flags | |
CN102163139B (en) | Microprocessor fusing loading arithmetic/logic operation and skip macroinstructions | |
US7269714B2 (en) | Inhibiting of a co-issuing instruction in a processor having different pipeline lengths | |
TWI284282B (en) | Processor including branch prediction mechanism for far jump and far call instructions | |
TWI231450B (en) | Processor including fallback branch prediction mechanism for far jump and far call instructions | |
US11537402B1 (en) | Execution elision of intermediate instruction by processor | |
US7783692B1 (en) | Fast flag generation | |
Richardson et al. | Precise exception handling for a self-timed processor | |
US7100024B2 (en) | Pipelined microprocessor, apparatus, and method for generating early status flags | |
KR20070019750A (en) | System and method for validating a memory file that links speculative results of load operations to register values | |
CN118012509A (en) | Method for emptying pipeline, processor, chip and electronic equipment |