TWI281121B - Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence - Google Patents

Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence Download PDF

Info

Publication number
TWI281121B
TWI281121B TW093122812A TW93122812A TWI281121B TW I281121 B TWI281121 B TW I281121B TW 093122812 A TW093122812 A TW 093122812A TW 93122812 A TW93122812 A TW 93122812A TW I281121 B TWI281121 B TW I281121B
Authority
TW
Taiwan
Prior art keywords
return
branch
target address
instruction
prediction
Prior art date
Application number
TW093122812A
Other languages
Chinese (zh)
Other versions
TW200513961A (en
Inventor
Glenn G Henry
Thomas Mcdonald
Original Assignee
Ip First Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/679,830 external-priority patent/US7237098B2/en
Application filed by Ip First Llc filed Critical Ip First Llc
Publication of TW200513961A publication Critical patent/TW200513961A/en
Application granted granted Critical
Publication of TWI281121B publication Critical patent/TWI281121B/en

Links

Abstract

A microprocessor for predicting a target address of a return instruction is disclosed. The microprocessor includes a BTAC and a return stack that each makes a predition of the target address. Typically the return stack is more accurate. However, If the return stack mispredicts, update logic sets an override flag associated with the return instruction in the BTAC. The next time the return instruction is encountered, if the override flag is set, branch control logic branches the microprocessor to the BTAC predition. Otherwise, the microprocessor branches to the return stack predition. If the BTAC mispredicts, then the update logic clears the override flag. In one embodiment, the return stack predicts in response to decode of the return instruction. In another embodiment, the return stack predicts in response to the BTAC predicting the return instruction is present in an instruction cache line. Another embodiment includes a second, BTAC-based return stack.

Description

1281121 九、發明說明: 【優先權資訊】 [0001]本發明案剌訂狀細專糧時㈣案與美國專利申 靖案之優先權··1281121 IX. Invention Description: [Priority Information] [0001] The priority of the case of the invention in the case of the fine-grained special grain (IV) and the US patent application case··

曰期 h 案號 名稱 9/8/2003 60/501203 APPARATUS AND METHOD FOR OVERRIDING RETURN STACK PREDICTION IN RESPONSE TO DETECTION OF NON-STANDARD RETURN 10/6/2003 10/679830 APPARATUS AND METHOD FOR SELECTIVELY OVERRIDING RETURN STACK PREDICTION IN RESPONSE TO DETECTION OF NON-STANDARD RETURN SEQUENCE 7 1281121 【發明所屬之技術領域】 [0002] 本發明係有關於一種微處理器中的分支預測之相關技術領 域,尤指一種有關於使用返回堆疊及分支目標位址快取記憶體之返回 指令目標位址預測。 【先前技術】h期 h Case No. Name 9/8/2003 60/501203 APPARATUS AND METHOD FOR OVERRIDING RETURN STACK PREDICTION IN RESPONSE TO DETECTION OF NON-STANDARD RETURN 10/6/2003 10/679830 APPARATUS AND METHOD FOR SELECTIVELY OVERRIDING RETURN STACK PREDICTION IN RESPONSE TO DETECTION OF NON-STANDARD RETURN SEQUENCE 7 1281121 [Technical Field] [0002] The present invention relates to a related art field of branch prediction in a microprocessor, and more particularly to a use of a return stack and a branch target bit The address of the address cache memory is returned to the target address prediction. [Prior Art]

[0003] 微處理器為一種執行電腦程式所指定的指令之數位裝置。現 代的微處理器一般為管線式。亦即,在微處理器的不同區塊或管線階 段内’可同時使許多指令運作。Hennessy與Patterson將管線定義為「一 種多種指令可同時執行的實施技術」。在”Computer Architecture: A[0003] A microprocessor is a digital device that executes instructions specified by a computer program. Modern microprocessors are generally pipelined. That is, many instructions can be operated simultaneously in different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define the pipeline as "an implementation technique in which multiple instructions can be executed simultaneously." At "Computer Architecture: A

Quantitatve Approach”(第二版),1996 年由加州舊金山的 Morgan Kaufmann 出版商出版,John L· Hennessy 與 david A· Patterson 所著。 他們提供了以下的管線佳例: 管線與組裝線類似。在汽車組裝線中,有許多步驟,每個步驟對於 汽車的建造都有某些貢獻。雖然在不同的汽車上,但是每個步驟會與 其他步驟並行地運作。在電腦管線中,管線中的每個步驟會完成指令 中的一部份。如同組裝線,不同步驟會並行地完成不同指令中的不同 部份。這些步驟中的每一個稱為管線階段或管線區段。這些階段會將 -個階段連接至下鑛段,_錄道_齡雜_端進人,經由^些 階段來進行,並且從另一端離開,就如同組裝線中的汽車。 口口 [0004]微處理器係根據時脈週期來運作。通常,以一個時脈週期為 單位,指令會從微處理n管線的_個階段傳至另—階段。在汽車岭 線中,若在此線的-個階段中缸作者因為未有汽車f運作而處於閒 置狀態,則此線的生產或效能會降低。同樣地,若微處理器階段因為 未有指令統作祕-雜職_,處於織(通霜為管線泡 沫(plpelinebUbble)的事件),則處理器的效能會降低。 1281121 [0005] 管線泡沫的潛在原因是分支指令。當遭遇分支指令時,處理 器必須決定分支指令的目標位址,並且開始在目標位址(而不是在分支 指令之後的下個循序位址)處提取指令。因為明確地決定目標位址之管 線階段係剛好位於提取指令的階段之後,所以泡沫係由分支指令所產 生。如底下更多的討論,微處理器通常包括分支預測機制,以降低由 分支指令所產生之泡沫的數目。 [0006] 分支指令的一種特定型式是返回指令。為了使程式流程恢復 到呼叫常式(其為使程式控制轉移到副常式之常式)之目的,返回指令通 常是由副常式所執行的上一個指令。在典型的程式序列中,呼叫常式 會執行呼叫指令。呼叫指令會指示微處理器,將返回位址推入記憶體 中的堆璺,然後使副常式的位址分支。推入堆疊的返回位址為緊接於 呼叫常式中的呼叫指令之後的指令之位址。副常式最終會執行返回指 令,其會使返回位址離開堆疊(其係先前藉由呼叫指令所推入),並且會 使返回位址分支,其為返回指令的目標位址。返回指令的一例為Χ86 RET指令。呼叫指令的一例為χ86呼叫指令。 [0007] 執行呼叫/返回序列的優點之一是能使副常式呼叫巢狀化。例 如,=常式會呼叫副常式A,用以推入返回位址;以及副常式Α會呼 叫副常式B,用以推入返回位址;然後副常式B會執行返回指令,用 =將由副常式A所推入的返回位址推出;然後副常式a會執行返回指 t,I以將由主常式所推入的返回位址推出。巢狀式副常式呼叫的概 ^非常有用,並且上例可延伸至與可支援的堆疊大小一樣多之呼叫深 用芈因為呼叫/返回指令序列的規則特性,所以現代微處理器會使 、“ 4稱為返回堆疊的分支預測機制,來預測返回指令的目 =堆豐為小緩衝器,係以後進先出的方式,來快取返回位址。每次 遇到呼叫指令時,推人記憶體堆疊的返龍址也會推人返轉疊。每 1281121 火遇到返回彳日令時’在返回堆疊的頂端之返回位址會推出,並且用來 當作返回指令的剩目標位址。因為微處理器不必等待從記憶體堆疊 中所提取的返回位址,所以此運作會降低泡沫。 [0009]由於呼叫/返回序列的規則特性,所以返回堆疊預測返回指令 目標位址通f非常精確。_,本發财已發現某錄式(如某些作業 系統)不會總是以標準形式來執行呼叫/返回指令。例如,執行位於χ86 微處理杰之上的程式碼(e〇de)可包括呼叫(call),然後推人(ρ腦), 用以將不同的返回位址置於堆疊上,然後返回(RET),其會導致回到推 入的返回位址,而不是_ CALL後的齡之練(其储由call而 推^堆疊)。在另一例中,程式碼會執行PUSH,肖以將返回位址置於 堆’然後會執行CALL,然後會執行二個拙丁指令,其會導致回 J第RET的事件巾之推人的返回位址,而不是回到位於卩。紐之前 的L後之才曰7。這種行為會因為返回堆疊,而導致預測錯誤。 、_〇]因此’需要-種能更精確預測返回指令目標位址,特別是用 以執行非鮮呼叫/返回的程式碼之裝置。 【發明内容】 辦入μ “ 的覆載(GV e)旗標,以致於在下次出現的返 r ^ 刀支目軚位址快取記憶體(Branch Target Addrcss 11,,賴BTAC)仙來齡職於獅齡的賴旗標。在一實 〜叫&quot;用以制返回指令的目標位址之其他機制為BTAC,其在正 1在用ΓΓ序?,事件中’或許通常會比返回堆疊的精確度低,但是 精確。订1‘準呼叫/返回序列之程式碼(eode)的事件中,會較為 1281121 _2]在本發明的-觀點中,係提出 括返回堆疊,顧產生返回指令 :™此讀理益包 還包括分支目標細棘記,_(===;;酬。此微處理器 址之第二腳j,以及肋產生覆 返回齡的目標位 件的返回指令之目標位址,則===錯誤預測第-事 包括分支控_,_至返尋^ = 定值,則對於第二事件的返回指令 不預 第二預測的目標位址,Μ會分支到第_=錢此赠理器分支到 支預[=:Ϊ‘Γ,係提出一種用以改善微處理器㈣ 支預測精確度之裝置,此微處理料有 (職C)及返回堆疊,每個返回堆疊會產生返; ^此_括_標。此__新邏輯,耦接==預 ^由^堆疊所產生的預測錯誤預·—出_返回指令之目桿位 址’則其践使魏鋪錢為纽。域 ^ 输至麵#爾細㈣㈣轉:㈣; =以選擇由BTAC魅生的_,而不會選擇由返_疊所產生的 八陶4]在另—觀點中,本發明係提出—種預測微處理器中之返回指 7的目標位址之方法。此方法包括回應於返回堆疊錯誤預測返回指令 ,目標位址’而將«減更新為真值。此方法還包括在更新之後, ,由分支目標位址快取記舰(BTAC)來產生目標位址之_。此方法 $包括在BTAC產生刪讀,_韻減衫具有餘。此方法 還包括若覆載指標具有真值’則使此微處理器分支到纟btac所產生 的預測。 [0015]在另-觀點中,本發明係提出一種用以改善微處理器中的分 支預測精確度之裝置,此微處理器具有返回堆疊及另一種預測裝置, 1281121 ㈣錄^糊’从衫目版快取記憶 靜俨1〇括覆載指標。此震置還包括更新邏輯,耗接至f =右=堆疊所產生的預測錯誤預測第-出現的返回指令: 址’職用以使BTAC巾的覆載指標更新為真值。此 括分支控制邏輯’ _至覆載指標,若覆載指標為真_ ^ 現:返回指令而言’係用以選擇由另—種預 生預= 不會選擇由返回堆疊所產生的預測。 生々預測而 [6]在另-觀財’本發明係提出—種傳輪媒體巾所包含 :貝料訊號,包括電腦可讀取程式碼,用以提供給微處姆。此^ 包括弟-程式碼’用以提供給返_疊,其用以產生返回指令的此 位址之第-刪。此程式碼還包括第二程式碼,肋提供給分支: 位址快取記紐(BTAC),其用喊生如齡的目標絲之第二預 測,以及用以產生覆载指標。若第一預測錯誤預測第_事件的返_ 令之目標位址,職餘標會顯示預定值。此程式碼還包括第三程式 碼,用以提供給分支控制邏輯,其耦接至返回堆疊及BTAC,若覆載 指標顯示,值,麟於第二事件的返回指令而言,伽以使微^理 器分支到第二預測的目標位址,而不會分支到第一預測。 [0017]本發明的一優點是可潛在地改善執行非標準呼叫熥回序列 之程式的分支預測精確度。當使用如在此實施例中所述之覆載機制 時,所執行的模擬已顯示benchmark分數的效能改善。此外,若微處 理器已包括BTAC及另一種返回指令目標位址預測機制,則會以增加 小量硬體,來實現此優點。 [0018]在研讀說明書的其餘部分及圖式之後,本發明的其他特性及 優點將立即會變成顯然可知。 12 1281121 【實施方式】 [0025]現在參照圖1,係顯示根據本發明之管線式微處理器ι〇〇 的方塊圖。在此一實施例中,微處理器100包括指令集實質上符合x86 架構指令集(包括x86呼叫(CALL)及返回(RET)指令)之微處理器。然 而本毛月不X限於x86架構的微處理裔,而是可用於使用返回堆疊, 來預測返回指令的目標位址之任何微處理器中。 [0026]微處理器1〇〇包括指令快取記憶體1〇8。指令快取記憶體1〇8 會從耦接至微處理器1〇〇的系統記憶體中,快取指令位元組。指令快 取a己憶體108會快取數條線的指令位元組。在一實施例中,快取線包 括32個位元組的指令位元組。指令快取記憶體1〇8會從多工器懸 中’接收提取位址132。若提取位址132命中指令快取記憶體1〇8,則 指令快取記憶體108會輸出由提取位址132所指定之一快取線的指令 位兀組186。特別而言,由提取位址132所指定之此快取線的指令位 元組186會包括一個或多個返回指令。指令位元組186會經由管線暫 存器⑵及I23而沿著微處理器100管線往下傳送,如圖所示。雖然 只有二個管線暫存H m及⑵係顯示驗往下傳送的指令位元組 186 ’所以其他實施例會包括更多的管線階段。 ⑼)27]微處理器100魏括減至管線暫存器123的輸出之指令解 碼器(稱為F-階段指令解碼器114)。指令解碼器114會接收指令位元組 186及相關資訊,以及將指令位元組解碼。在一實施例中,微處理哭 1〇〇會支援可變長度的指令。指令解碼器、114會接收串流指令位元組: 並且會將指令格式化為分離指令,以判斷每健令的長度。特別而古, 才曰令解碼114會使返回㈣峨154赶真值,峨轉已解 回指令。在-實施例中,«理器、⑽包括用以執行微指令的精 令集電腦(RISC)核心、’以及指令解碼器m會將巨集指令(如幼隹 指令)轉譯成原有RISC指令集的微指令。微指令會經由管線暫存器^ 13 1281121 及127而沿著微處理器1〇〇管線往下傳送,如所示。雖然只有二個管 線暫存器U5及127係顯示用於往下傳送的微指♦,所以其他實施二 會包括更多的管線階段。例如,這些階段可包括暫存器檔案、位址產 生器、資料載入/儲存單元、整數執行單元、浮點執行單元、ΜΜχ執 行單元、SSE執行單元、以及SSE_2執行單元。 [0028] 微處理器1〇〇還包括耦接至管線暫存器127的輸出之分支解 決邏輯(稱為E-階段分支解決邏輯124)。當分支指令沿著微處理^^㈨ 管線往下傳送時,分支解決邏輯124會接收分支指令(包括返回指°令), 以及最後會決定出所有分支指令的目標位址。分支解決邏輯ΐ24θ^將 正確分支指令目標位址提供給多工器1〇6之輸入的£_階段目標位址訊 號148。此外,若目標位址係用來預測分支指令,則分支解決&amp;輯以 會接收預測目標位址。分支解決邏輯124會比較預測目標位址與正確 目標位址148,並且判斷是否做出目標位址的錯誤預測(如因為分支目 才不位址快取a己憶體(Branch Target Address Cache,簡稱陣列 102、BTAC返回堆疊104、或F_階段返回堆疊116),其全部會於底下 詳細揭露。若做出目標位址的錯誤預測,則分支解決邏輯124會產生 預測錯誤訊號158之真值。 [0029] 微處理器100還包括分支控制邏輯112,其耦接至多工器 106。分支控制邏輯112會產生多工(mux)選擇訊號168,用以控制多工 器106選擇多種輸入位址其中之一(如底下所述),而輸出當作提取位址 132。分支控制邏輯in的運作會於底下進行更詳細地說明。 [0030] 微處理器100還包括加法器182,用以接收提取位址132, 以及使提取位址132增加,而產生下個循序提取位址162,來當作多 工器106的輸入。若在已知時脈週期的期μ,未綱或執行分支指令, 則分支控制邏輯112會控制多工器1()6選擇下個循序提取位址162。 [0031] ¼處理器1〇〇還包括分支目標位址快取記憶體(btac)陣列 14 1281121 102,其耦接用以接收提取位址132。BTAC陣列102包括複數個儲存 元件,或項目(entry),每個係用以快取分支指令目標位址及相關的分支 預測資訊。當將提取位址132輸入至指令快取記憶體刚且指令快取 記憶體108回應地產生此線的指令位元組186時,btac陣列1〇2實 質上會同時產生分支指令是否存在於快取線186中的預測、分支指令 的預測目標位址、以及分支指令是否為返回指令。有助益的是,根據 本發明,BTAC陣列102也會產生覆載指標,用以指示返回指令的目 標位址應該由BTAC陣列102而不是由返回堆疊來預測,如底下詳細 地說明。 [0032] 由BTAC陣列102所預測的返回指令之目標位址164係用來 當作第H I26的輸人。多工器m的輸出(目標位址叫係用來 當作多工器106的輸入。目標位址144也會經由管線暫存器ln及113 而沿著微處理器励管線往下傳送,如所示。管線暫存器113的輸出 係稱為目標位址176。雖然只有二個管線暫存器⑴及113係顯示用 於往下傳送的目標位址144,所以其他實施例會包括好的管線階段。 [0033] 在一貝施例中,BTAC陣列102係配置為可儲存4〇96個目 標位址及_資訊之2向集合組合式(way set嶋咖㈣快取記憶體。 然而,本發明不受限於一特定實施例之bTAC陣列1〇2。在一實施例 中,提取位址132的較低位元會選擇BTAC陣列1〇2中的一组,或列。 位址標籤係儲存用於BTAC陣列102中的每個項目,用以顯示分支指 令(其目標位址係儲存於對應的項目中)的位址之較高位址位元。提取位 址132的較高位元會與選擇組中的每個項目之位址標籤進行比較。若 提取位址132的較高位元與選擇組中的有效位址標藏匹配,則btac 陣列102中的命中會發生,#係顯示BTAC陣列1〇2會預測分支指令 係存在於由提取位址132所選擇的指令快取線186中,並且係藉由^ 質上與目標位址預測164同時之指令快取記憶體1〇8而輸出。 15 1281121 [0〇34]BTAC陣列1〇2中的每個項目也會儲存存在於 m所就的指令快取線186中之分支指令的型式之指示The Quantitatve Approach (Second Edition), published in 1996 by the Morgan Kaufmann publisher in San Francisco, California, by John L. Hennessy and David A. Patterson. They provide the following examples of pipelines: Pipelines are similar to assembly lines. There are many steps in the assembly line, each of which has some contribution to the construction of the car. Although on different cars, each step will work in parallel with the other steps. In the computer pipeline, each in the pipeline The step will complete a part of the instruction. As with the assembly line, the different steps will complete different parts of the different instructions in parallel. Each of these steps is called a pipeline phase or a pipeline segment. These phases will be - one phase Connected to the lower mining section, _ recording _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ The cycle operates. Usually, in one clock cycle, the command is passed from the _ phase of the microprocessor n pipeline to the other phase. In the car ridge line, if it is in this line - In the stage, the cylinder author is idle because there is no car f operation, then the production or performance of the line will be reduced. Similarly, if the microprocessor stage is not commanded as a secret - miscellaneous _ For the pipeline bubble (plpelinebUbble) event, the performance of the processor will be reduced. 1281121 [0005] The potential cause of the pipeline bubble is the branch instruction. When encountering the branch instruction, the processor must determine the target address of the branch instruction and start The instruction is fetched at the target address (rather than the next sequential address after the branch instruction). Since the pipeline stage of the target address is explicitly determined to be just after the stage of fetching the instruction, the bubble is generated by the branch instruction. As discussed further below, microprocessors typically include a branch prediction mechanism to reduce the number of bubbles generated by branch instructions. [0006] A specific type of branch instruction is a return instruction. In order to restore the program flow to the call routine (For the purpose of transferring program control to the routine of the subroutine), the return instruction is usually executed by the subroutine. In a typical program sequence, the call routine executes a call instruction. The call instruction instructs the microprocessor to push the return address into the stack in memory and then branch the sub-normal address. The return address of the stack is the address of the instruction immediately following the call instruction in the call routine. The secondary routine will eventually execute the return instruction, which will cause the return address to leave the stack (which was previously pushed by the call instruction) Enter), and will return the address branch, which is the target address of the return instruction. An example of the return instruction is the Χ86 RET instruction. An example of the call instruction is the χ86 call instruction. [0007] One of the advantages of performing the call/return sequence It is possible to make the sub-normal call nested. For example, the = routine will call the secondary routine A to push the return address; and the secondary routine will call the secondary routine B to push the return address; then the secondary routine B will execute the return instruction. Use = to push the return address pushed by the secondary routine A; then the secondary routine a will execute the return pointer t, I to push the return address pushed by the main routine. The nested deputy call is very useful, and the above example can be extended to as many calls as the supported stack size. Because of the regular nature of the call/return sequence, modern microprocessors will "4 is called the branch prediction mechanism that returns to the stack, to predict the return instruction's destination = heap buffer is a small buffer, which is the way of the last in, first out, to cache the return address. Every time a call command is encountered, push people The memory stack's return address will also be pushed back and forth. Every 12811121 fire encounters a return time. 'The return address at the top of the return stack will be pushed out and used as the remaining target address for the return instruction. Because the microprocessor does not have to wait for the return address extracted from the memory stack, this operation will reduce the bubble. [0009] Due to the regular nature of the call/return sequence, the return stack prediction return instruction target address is very f Accurate. _, this is a fortune has found that a recording (such as some operating systems) will not always execute the call / return instructions in a standard form. For example, execute the code located on the χ86 micro-processing (e〇 De) may include a call, then push (p brain) to place a different return address on the stack and then return (RET), which will cause the return address to be pushed back instead of _ CALL after the age of practice (the store is pushed by the call ^ stack). In another case, the code will execute PUSH, Xiao will put the return address in the heap ' then CALL will be executed, then two 拙 will be executed Ding instructions, which will lead back to the return address of the J RET event towel, instead of returning to the 之前 before the 卩. New before L 之 7. This behavior will cause prediction errors due to returning to the stack , _〇] Therefore, the 'required' type can more accurately predict the return instruction target address, especially the device for executing the non-fresh call/return code. [Summary of the invention] The flag is so that the next time the return of the ^ ^ knife branch address cache memory (Branch Target Addrcss 11, Lai BTAC) Xian Lai Ling worked for the Lion Age Lai flag. In the real ~ call &quot; other mechanism used to make the return address of the instruction is BTAC, which is in the positive order, the event 'may be usually lower than the accuracy of the return stack, but accurate. In the event of a 1' quasi-call/return sequence code (eode), it will be more than 1281121 _2] In the present invention, it is proposed to return to the stack, and generate a return instruction: TM This read benefit package also includes Branch target fine-grained, _ (===;; reward. The second foot j of the microprocessor address, and the target address of the return instruction of the target position of the rib to generate the return age, then === error prediction The first thing includes branch control _, _ to return to find ^ = fixed value, then the return instruction of the second event does not pre-predict the target address of the second prediction, and the branch will branch to the _= money branch of this processor to the branch Pre-[=:Ϊ'Γ, a device for improving the prediction accuracy of the microprocessor (4) is proposed. The micro-processing material has a (C) and return stack, and each return stack will generate a return; _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Face #尔细(四)(四)转:(四); =Select _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ A method of predicting the target address of the return finger 7 in the microprocessor. The method includes responding to the return stack error prediction return instruction, the target address 'and the minus to the true value. This method is also included in the update. After that, the branch target address cache (BTAC) is used to generate the target address. This method includes the deletion of the BTAC, and the method also includes if the overlay indicator has true. The value 'branches this microprocessor to the prediction produced by 纟btac. [0015] In another aspect, the present invention provides a device for improving the accuracy of branch prediction in a microprocessor. With a return stack and another predictive device, 1281121 (four) recorded ^ paste 'from the eye version of the cache memory static 1 including the overlay indicator. This shock also includes update logic, consumed to f = right = stacked Predictive error prediction first-occurring return instruction: The address 'location is used to update the BTAC towel's overlay indicator to true value. This includes the branch control logic '_ to the overlay indicator, if the overlay indicator is true _ ^ now: return In terms of instructions, 'used to choose from another species Pre-production pre-= will not choose the forecast generated by the return stack. [6] In the other----------------------------------------------------------------------------------------------------------------------------------------------------------------- The code is provided to the micro-segment. The ^-code-code is provided to the back-stack, which is used to generate the first-deletion of the address of the return instruction. The code further includes the second code. , ribs are provided to the branch: Address Cache Access (BTAC), which uses the second prediction of the target of the ageing target, and is used to generate the overlay indicator. If the first prediction incorrectly predicts the return of the _ event For the target address, the job title will display the predetermined value. The code also includes the third code, which is provided to the branch control logic, which is coupled to the return stack and BTAC, if the overlay indicator is displayed, the value is In the case of the return instruction of the second event, the gamma branches the microprocessor to the second predicted target address without branching to the first prediction. [0017] One advantage of the present invention is that it can potentially improve the branch prediction accuracy of programs that execute non-standard call round-trip sequences. When using the overlay mechanism as described in this embodiment, the simulations performed have shown an improvement in the performance of the benchmark score. In addition, if the microprocessor already includes BTAC and another return instruction target address prediction mechanism, this advantage will be achieved by adding a small amount of hardware. Other features and advantages of the present invention will become apparent immediately upon review of the <RTIgt; 12 1281121 [Embodiment] Referring now to Figure 1, there is shown a block diagram of a pipelined microprocessor ι according to the present invention. In this embodiment, microprocessor 100 includes a microprocessor having an instruction set that substantially conforms to an x86 architecture instruction set, including x86 call (CALL) and return (RET) instructions. However, this month is not limited to the x86 architecture of the microprocessor, but can be used in any microprocessor that uses the return stack to predict the target address of the return instruction. The microprocessor 1 includes an instruction cache memory 1〇8. The instruction cache memory 1〇8 caches the instruction byte from the system memory coupled to the microprocessor 1〇〇. The instruction caches a memory block that will cache a number of lines of instruction bytes. In one embodiment, the cache line includes 32 byte instruction byte groups. The instruction cache memory 1〇8 will receive the extracted address 132 from the multiplexer. If the fetch address 132 hits the instruction cache 1-8, the instruction cache 108 outputs the instruction bit group 186 of the cache line specified by the fetch address 132. In particular, the instruction byte 186 of the cache line designated by the fetch address 132 will include one or more return instructions. The instruction byte 186 is transferred down the microprocessor 100 pipeline via pipeline registers (2) and I23 as shown. Although only two pipelines temporarily store Hm and (2) show the instruction byte 186' transmitted down, other embodiments will include more pipeline stages. (9)) 27] The microprocessor 100 includes an instruction decoder (referred to as an F-stage instruction decoder 114) that is reduced to the output of the pipeline register 123. Instruction decoder 114 receives instruction byte 186 and associated information, and decodes the instruction byte. In one embodiment, the micro-processing cry will support variable length instructions. The instruction decoder, 114 receives the stream instruction byte: and formats the instruction as a separate instruction to determine the length of each command. In particular, it is only after the decoding 114 that the return (four) 峨 154 catches the true value, and the call has been decoded. In an embodiment, the processor, (10) includes a fine set computer (RISC) core for executing microinstructions, and the instruction decoder m translates macro instructions (eg, cub instructions) into the original RISC instructions. Set of microinstructions. The microinstructions are transferred down the microprocessor 1 via the pipeline registers ^ 13 1281121 and 127 as shown. Although only two of the pipeline registers U5 and 127 show the microfinger ♦ for downward transfer, the other implementations 2 will include more pipeline stages. For example, these stages may include a scratchpad file, an address generator, a data load/store unit, an integer execution unit, a floating point execution unit, an execution unit, an SSE execution unit, and an SSE_2 execution unit. [0028] The microprocessor 1A also includes branch resolution logic (referred to as E-stage branch resolution logic 124) coupled to the output of the pipeline register 127. When the branch instruction is transferred down the microprocessor (9) pipeline, the branch resolution logic 124 receives the branch instruction (including the return instruction) and finally determines the target address of all the branch instructions. The branch resolution logic ΐ 24θ^ provides the correct branch instruction target address to the £_stage target address signal 148 of the input of the multiplexer 1〇6. In addition, if the target address is used to predict a branch instruction, the branch resolution &amp; will receive the predicted target address. The branch resolution logic 124 compares the predicted target address with the correct target address 148 and determines whether to make a false prediction of the target address (eg, because the branch target does not address the address of the buffer (Branch Target Address Cache, referred to as The array 102, the BTAC returns to the stack 104, or the F_phase returns to the stack 116), all of which are disclosed in detail below. If a false prediction of the target address is made, the branch resolution logic 124 will generate the true value of the prediction error signal 158. [0029] The microprocessor 100 further includes branch control logic 112 coupled to the multiplexer 106. The branch control logic 112 generates a multiplex (mux) selection signal 168 for controlling the multiplexer 106 to select a plurality of input addresses. One (as described below) and the output is taken as the extracted address 132. The operation of the branch control logic in will be explained in more detail below. [0030] The microprocessor 100 further includes an adder 182 for receiving the extraction. The address 132, and the extracted address 132 are incremented, and the next sequential extracted address 162 is generated to serve as an input to the multiplexer 106. If the period of the known clock cycle is μ, the branch instruction is not executed or executed. then The branch control logic 112 controls the multiplexer 1 () 6 to select the next sequential fetch address 162. [0031] The processor 1 〇〇 further includes a branch target address cache memory (btac) array 14 12811121 102, which The BTAC array 102 includes a plurality of storage elements, or entries, each of which is used to cache the branch instruction target address and associated branch prediction information. When the address is to be extracted 132 When inputting to the instruction cache memory and the instruction cache 108 responds to generate the instruction byte 186 of the line, the btac array 1 实质上 2 substantially simultaneously generates a prediction whether the branch instruction exists in the cache line 186. The predicted target address of the branch instruction, and whether the branch instruction is a return instruction. Advantageously, in accordance with the present invention, the BTAC array 102 also generates an overlay indicator to indicate that the target address of the return instruction should be from the BTAC array. 102 is instead predicted by the return stack, as explained in detail below. [0032] The target address 164 of the return command predicted by the BTAC array 102 is used as the input of the H I 26. The output of the multiplexer m Target position The call is used as an input to the multiplexer 106. The target address 144 is also transferred down the microprocessor line via the pipeline registers ln and 113, as shown. The output of the pipeline register 113 This is referred to as the target address 176. Although only two pipeline registers (1) and 113 show the target address 144 for downstream transfer, other embodiments would include a good pipeline stage. [0033] The BTAC array 102 is configured to store 4 〇 96 target addresses and a 2-way set combination of _ information (way set ( (4) cache memory. However, the invention is not limited to the bTAC array 1〇2 of a particular embodiment. In one embodiment, extracting the lower bits of the address 132 selects one of the BTAC arrays 1 〇 2, or a column. The address tag stores each item in the BTAC array 102 for displaying the higher address bits of the address of the branch instruction whose target address is stored in the corresponding item. The higher bit of the extracted address 132 is compared to the address tag of each item in the selected group. If the higher bit of the extracted address 132 matches the valid address in the selected group, the hit in the btac array 102 will occur, and the # system shows that the BTAC array 1〇2 predicts that the branch instruction exists in the extracted address. The selected instruction cache line 186 is output by the instruction cache 1 〇 8 at the same time as the target address prediction 164. 15 12811121 [0〇34] Each entry in the BTAC array 1〇2 also stores an indication of the type of branch instruction present in the instruction cache line 186 of m.

個BTAC陣列1〇2項目中所儲存的分支指令型式節立包括二個位元 其係以如表1中所顯示的來進行編碼。The branch instruction type of the BTAC array 1〇2 project includes two bits which are encoded as shown in Table 1.

良於由提取位址 示。亦即,BTAC 4係藉由BTAC 00 不返回或呼叫 01 呼叫 10正常呼叫 11覆載返回 表1 [0035]在一實把例中,为支型式攔位的最大有效位元係位於返回訊 號138上,而覆載訊號的最小有效位元係位於覆載訊號136上。在呼 叫指令的事件中,不會使用覆載訊號136。如可觀察到的是,因為型 式攔位已經是二個位元,並且四個可能狀態中,只使用三個,所以不 需額外的儲存元件來容納覆載位元。覆載訊號136會經由管線暫存器 1〇1、103、1〇5、以及107而沿著微處理器100管線往下傳送,如所示。 特別而言,管線暫存器103的輸出(稱為覆載(oveiTide:LF訊號172)會傳 送到分支控制邏輯112。此外,管線暫存器1〇7的輸出係稱為oveiTide_E 訊號174。雖然只有四個管線暫存器l(U、1〇3、1〇5、以及107係顯示 用於往下傳送的覆载訊號136,但是其他實施例苦包括更多的管線階 段。 ' [0036]在一實施例中,當分支解決邏輯124解決新呼叫指令時,呼 16 1281121 叫指令的目標位址,以及用以顯示呼叫指令的型式欄位值合 BTAC陣列102中。同樣地’當分支解決邏輯124解決新返回指令時、, 返回指令的目標位址,以及用以顯示正常返邮旨令的型 ^ 取於BTAC陣列102中。 曰 [0037] 微處理H 還包括返回堆疊1G4(稱為btac返回堆最 1〇4) ’餘接用以接收來自於BTAC陣列1〇2的返回訊號138。 返回堆疊Η)4會讀魏㈣方式,綠取由呼叫齡触定的返回 位址。在-實施例中,當分支解決邏輯124解決新呼叫指令時,由呼 叫指令所指定的返回位址會推人BTAC返回堆疊1〇4 _端。# MM 陣列1〇2經由返回訊號138來顯示返回指令係存在於由提取位址成 所指定的快取線186中時,位於BTAC返回堆疊1〇4的頂端之返回位 址會推出’並且用來當作多工器126的目標位址142。若返回訊號138 為真且覆載減D6域’齡紐鑛輯m會_ 來控制多工器126選擇由BTAC返回堆疊1Q4所預測的目標恤142。 此外’分支控制邏輯112會經由控制訊號184,來控制多工器丨%選 擇由BTAC陣列1〇2所預測的目標位址164。 [0038] 微處理器励還包括第二返回堆疊116(稱為階段返回堆疊 116),^馬接用以接收來自於指令解碼㈣114的返回訊號154。階段 返回堆f 116會以後進先㈣方式,來快取由呼叫指令所指定的返回 位=。在-實施例中,當分支解決邏輯124解決新呼叫指令時,由呼 叫指令所指定的返回位址會推入卜階段返回堆^ 116的頂端。當指令 解碼器丄14經由返回訊號154來顯示返回指令已解碼時,位於f一階段 返回堆疊116的頂端之返回位址會推出,並且用來當作多工器126的 目標位址146。 [〇〇39]M處理器1〇〇還包括比較器118。比較器118會比較卜階段 返回堆豐116 _標位址146與往下傳送的目標位址1?6。若卜階段 1281121 返回堆疊116的目標位址146與往下傳送的目標位址176未匹配,則 比較器118會使未匹配訊號152(其會傳送到分支控制邏輯112)產生真 值。若返回訊號154為真、若override一F 172為偽、以及若未匹配訊號 152為真,則分支控制邏輯112會經由控制訊號168,來控制多工器 106選擇F-階段返回堆疊116的目標位址146。此外,分支控制邏輯 112會經由控制訊號168,來控制多工器106選擇其另外輸入其中之一。 [0040] 微處理器1〇〇還包括BTAC更新邏輯122,其耦接至分支解 決邏輯124及BTAC陣列102。BTAC更新邏輯122會接收來自於分 支解決邏輯124的預測錯誤訊號158〇BTAC更新邏輯122會接收为來 自於管線暫存器107的〇verride_F訊號174°BTAC更新邏輯122會產 生BTAC更新請求訊號134,其會傳送到BTAC陣列102。BTAC更 新請求訊號134包括用以更新BTAC陣列102的項目之資訊。在一實 施例中,BTAC更新請求訊號134包括分支指令的目標位址、分支指 令的位址、以及形式攔位之值。 [0041] 當分支解決邏輯124解決新分支指令時,BTAC更新邏輯122 會產生BTAC更新請求134,而以用以預測其後出現的新分支指令之 目標位址及型式,或例如是由提取位址132所指定的指令快取線中之 分支指令的目標位址及型式之資訊,來更新BTAC陣列102。此外, 若錯誤預測訊號158為真,則BTAC更新邏輯122會產生BTAC更新 請求134,來更新對應於分支指令之BTAC陣列1〇2中的項目。特別 而言,若分支指令為由BTAC返回堆疊1〇4或由F-階段返回堆疊n6 所錯誤預測的返回指令,則BTAC更新邏輯122會將BTAC陣列102 項目中的覆載位元指定成預定值,以表示BTAC返回堆疊1〇4的預測 142及F-階段返回堆疊116的預測146應該在返回指令的下次出現或 事件時,藉由BTAC陣列1〇2的預測164來覆載。在一實施例中,型 式攔位係設定為覆載返回值,或11,如以上的表1所指定。反之,若 18 1281121 因為已設定覆載位元,所以分支指令為由BTAC陣列102所錯誤預測 的返回指令,則BTAC更新邏輯122會將BTAC陣列102項目中的覆 載位元指定成預定值,以表示應該選擇BTAC返回堆疊104的預測 142,以及若必要的話,F-階段返回堆疊116的預測146,而不會選擇 在返回指令的下次出現或事件時之BTAC陣列102的預測164。在— 實施例中,型式攔位係設定為正常返回值,或10,如以上的表1所指 定。微處理器1〇〇的運作現在將配合圖2到4,做更完整地說明。 [0042] 現在參考圖2 ’所顯示的是根據本發明之圖1的微處理器 100之運作狀態一流程圖。圖2係描述微處理器1〇〇回應於藉由圖1 的BTAC陣列102及BTAC返回堆疊104來預測返回指令之運作。流 程係從方塊202開始。 [0043] 在方塊202,圖1的提取位址132係用於圖1的指令快取記 憶體108及並行的BTAC陣列102。回應時,指令快取記憶體1〇8會 回應提取位址132,而將圖1之快取線的指令位元組186,傳送到微處 理器100管線。流程會繼續進行方塊204。 [0044] 在方塊204,BTAC陣列102會基於提取位址132,而經由 返回訊號138,來預測返回指令係存在於指令快取記憶體1〇8送到微 處理器100的指令快取線186中,並且BTAC陣列102會將目標位址 164傳送到多工器126。流程會繼續進行判斷方塊206。 [0045] 在判斷方塊206,分支控制邏輯112會判斷是否已設定覆載 指標136。若如此,流程會繼續進行方塊212 ;否則,流程會繼續進行 方塊208。 [0046] 在方塊208,分支控制邏輯112會控制多工器126及多工器 106,選擇BTAC返回堆疊目標位址142來當作提取位址132,而使微 處理器100到此時會分支。在方塊2〇8,流程會結束。 [0047] 在方塊212,分支控制邏輯112會控制多工器126及多工器 1281121 l〇6 ’選擇BTAC陣列目標位址⑹來當作提 m 器卿_嫩。蝴212,_料。使韻理 师]如可 2觀察得知,若已設定覆載指標叫如底下配合方 塊4〇8所述之先前出現返回指令的期間),分支控制賴山會有助於 覆載BTAC返回堆疊104,並且另一種可選擇由BTAC陣列碰所預 測的目標位址164,藉此,若執行程式正執行非標準呼叫/返回序列, 則幾乎可避免由BTAC返瞒疊1()4職生的某種錯誤預測。 [0049]=見在參照圖3,所顯示的是根據本發明之圖丨的微處理器励 之運作狀態-流程@。® 3係描述微處理器刚回應於酬返回指令 (如藉由圖1的F_階段返回堆疊116之圖2中所預測的返回指令)之運 作。流程係從方塊302開始。 [0050]在方塊302,圖1的F-階段指令解碼器114會回應於施加至 圖2的方塊202中之BTAC陣列102,而將存在於由指令快取記憶體 108所輸出的指令快取線186中,並且接下來會由BTAC陣列1〇2及 BTAC返回堆疊104來預測之返回指令進行解碼,如配合圖2所述。 回應於經由返回訊號154,來表示返回指令已解碼之F_階段指令解碼 器114,F_階段返回堆疊116會將其預測的目標位址146傳送到多工器 106。流程會繼續進行方塊304。 [0051] 在方塊304,圖1的比較器118會比較圖1之心階段返回堆 疊所預測的目標位址146及目標位址176。若位址H6及176未匹配, 則比較器118會使圖1的未匹配訊號152產生真值。流程會繼續進行 判斷方塊306。 [0052] 在判斷方塊306,分支控制邏輯112會檢查未匹配訊號152, 來判斷是否發生未匹配。若如此,則流程會繼續進行判斷方塊308 ; 否則,流程會結束。 [0053] 在判斷方塊3⑽’分支控制邏輯112會檢查圖1的〇verride ρ 1281121 訊號172,來判斷是否已設定〇verride_F位元172。若如此,流程會結 束(亦即’在圖2的方塊212所執行之BTAC陣列目標位址164之分支 不會由F_階段返回堆疊所預測的目標位址146來取代)。若清除 ovemdej?位元172,則流程會繼續進行方塊312。 [0054] 在方塊312,分支控制邏輯112會控制多工器106,選擇F_ 階段返回堆疊所預測的目標位址146,而使微處理器1〇〇到此時會分 支。在一實施例中,在F_階段返回堆疊所預測的目標位址146之分支 之前’微處理器100會刷新F_階段之上的階段中之指令。在方塊312, 流程會結束。 [0055] 如可從圖3觀察得知,若已設定〇verride_F指標172(如底下 配合方塊408所述之先前出現返回指令的期間),分支控制邏輯112會 有助於覆載F-階段返回堆疊116,並且另一種可維持由BTAC陣列搬 所預測的目標位址164,藉此,若執行程式正執行非標準呼叫/返回序 列,則幾乎可避免由F-P皆段返回堆疊116所產生的某種錯誤預測。 [0056] 現在參照目4 ’彳輸示的是繪示根據本發明之圖丨的微處理 器励之運作的流程圖。圖4係描述微處理器、1〇〇回應於解決返回指 令(如圖2及3中所酬及解碼之絲事件的返回指令)之運作。流程係 從方塊402開始。 [0057] 在方塊4〇2,圖1的E-P皆段分支解決賴m會解決返回指 令。亦即’分支解決邏輯124最後會決定返回指令之圖丨的正_目# 位址148。特別而言,若使微處理$刚分支到返回指令的不正確目 標位址,則分支解決邏輯124會使圖丨的錯誤酬訊號W產生真值。 流程會繼續進行判斷方塊4〇4。 [觀]在靖方塊4〇4,BTAC更新邏輯m會檢查未匹配訊號 158,來判斷返回指令目標位歧否預測錯誤。若如此,則流程會繼續 進行判斷方塊406 ;否則,流程會結束。 21 1281121 [0059]在判斷方塊406,BTAC更新邏輯122會檢查〇肅咖 號Π4,來判斷是否已設定㈣^_ρ位元174。若如此,流程會繼浐 進行方塊408 ;否則,流程會繼續進行方塊412。 只 _0]在方塊408,BTAC更新邏輯122會產生btac更新請 134 ’而清除錯誤制返回指令的項目之覆載位元 = 已知械㈣她娜w恤目== 式麟徑達成(如上述的程式碼路鮮中之―),其總是會使返回堆疊錯 誤預測返回指令的目標錄;然而’姻返回指令也可由構成桿準坪 叫/返回對序列之程式碼路徑達成。在後者的事件中,返回堆疊一般能 更精確地預測返回指令的目標位址。因此,若當已設定覆載位元f 發生錯誤預測’翻為可預期的是,係以標準呼叫/返回對序列為主, 所以BTAC更新邏輯m會清除方塊中的覆載位元。流程 進行方塊414。 9 $ [0061] 在方塊412 ’因為F_階段返回堆疊m錯誤預測返回指令的 目標位址,所以BTAC更新邏輯122會產生BTAC更新請求m,以 設定BTAC陣列102中的合適項目中之覆載位元。藉由設定用以儲存 返回指令之刪的項目之BTAC 102的負載位元,本發明係有助於解 決由非標準呼叫/返回序列所產生的問題。亦即,會分支到btac陣列 的目標位址164,而不會分支到btac返回堆疊的目標位址142或F_ 階段返回堆疊所預測的目標位址146,其預測返回指令的目標位址將 會不正確。流程會繼續進行方塊414。 [0062] 在方塊414,因為不正確指令所導致之錯誤預測的返回指令 目標位址會從指令快取記憶體1〇8提取到微處理器1〇〇管線,所以微 處理器100會刷新其管線;因此,不必執行那些指令。接下來,分支 控制邏輯112會控制多卫n 1G6,選擇E階段目標位址146,而使微處 理器1〇〇齡分支,以提取正確目標指令。在方塊414,流程會結束。 22 1281121 [0063]在-實施例中,根據以上的表卜方塊412會以二進位值^ 來更新BTAC陣列102項目的型式攔位,方塊概會以二進位值ι〇 來更新BTAC陣列1〇2項目的型式攔位。 [0〇64]如可從圖2到4觀察得知,覆勸旨標可潛在地改善返回指令 的預測精確度。右微處理II制到因為返回堆疊錯誤預騎回指令的 目標位址,而使返回指令已執行一部份非標準呼叫/返回糊,則微處 理器會設定對應於BTAC中的返回指令之覆載指標,以及在返回指令 的下個事射,因為微處職會賴載指標中,觸返輯疊可能錯 誤預測目前出現的返回指令之目標位址,所以微處理器會使用除了^ 回堆疊之外的賴鋪,來鋼返回齡的目標健。狀,雜返 回指令先前已執行-部份的非標準呼叫/返回序列,但是若微處理器感 測到因為BTAC陣列錯誤預測返回指令的目標位址,而使返回指令接 下來已執行-部份標準呼叫/返回相,職處理器會清除對應於 BTAC中的返回才曰令之覆載指標,以及在返回指令的下個事件中,因 為微處理H會從覆載指標巾,騎返回堆疊可能正確地糊目前出現 的返回指令之目標紐,所峨處理器會賴返回堆疊,來預測返回 指令的目標位址。 ' [0065]現在參照目5,所顯示的是樹康本發明的另一實施例之管線 式微處理H 的方塊圖。圖5的微處,係與圖丨的微處理器 1〇〇類似,除了不包括BTAC返回堆疊1〇4或多工器126之外。因此, 由BTAC陣列102所輸出之預測的目標位址164會直接傳送到多工器 106,而不會經由多工器126。此外,BTAC陣列1〇2的目標位址(而不 是圖1的目標位址144)會用來當作管線暫存器ln的輸入,並且會往 下傳送,當作目標位址176。 [0066]現在參照圖6,所顯示的是繪示根據本發明的另—實施例之 圖5的微處理器500之運作的流程圖。圖6係與圖2類似,除了判斷 23 1281121 方塊206及方塊208不存在之外;因此,流程會從方塊204進行到方 塊212。因此,因為圖1的微處理器ι〇〇之BTAC返回堆疊104及多 工器126並不存在於圖5的微處理器5〇〇,所以當BTAC陣列102經 由返回訊號138來預測返回指令時,分支控制邏輯112總會用來使微 處理器500分支到由BTAC陣列102所預測的目標位址164。 [0067]圖5的微處理器5〇〇也會根據圖3及4的流程圖來運作。要 注意的是,因為BTAC返回堆疊1〇4不存在於微處理器500中,所以 往下傳送的目標位址176總是為BTAC陣列102的目標位址164 ;因 此,在F-階段返回堆疊丨16的目標位址146與目標位址176之間的方 塊304中所執行的比較會與往下傳送之BTAC陣列1〇2的目標位址 164進行比較。 [0068]雖然本發明及其目的、特性、以及優點已詳細地說明、但是 本發明包含其他的實施例。例如,雖然實施例已說明微處理器具有二 個返回堆疊,但是微處理器可具有其他數目的返回堆疊,如只^單一 返回堆疊,或超過二個返回堆疊。另外,雖然實施例已說明除了儲存 ,應於由返回堆疊所錯誤删的返回指令之覆載位元之外,btac還 疋用以覆載返回堆疊之另-種目標位址預測機制,但是可使用其他另 一種目標位址預測機制,如分支目標緩衝器。 /0069]再者’軸本個及其目的、躲、以及伽已詳細地說明、 但疋本發明包含其他的實施例。本發赚了麵硬體來實施之外,本 發明也可實祕可個⑽如,可)舰传包含的電腦可讀取 ,電腦可讀取程式碼’資料等〉中。電腦程式碼可使在此所揭露 的本發明之魏或製造可行,或者是二者皆可行。例如,這可瘦 用將Γ ^ ΓΓκΓ++、_、以及類_程式語言)義It is better to extract the address. That is, BTAC 4 is not returned by BTAC 00 or call 01 call 10 normal call 11 overlay return table 1 [0035] In a practical example, the most significant bit of the typed block is located at return signal 138 The least significant bit of the overlay signal is located on the overlay signal 136. The overlay signal 136 is not used in the event of a call instruction. As can be observed, because the pattern is already two bits and only three of the four possible states are used, no additional storage elements are needed to accommodate the overlay. The overlay signal 136 is transmitted down the microprocessor 100 pipeline via pipeline registers 1〇1, 103, 1〇5, and 107, as shown. In particular, the output of the pipeline register 103 (referred to as the overlay (oveiTide: LF signal 172) is transferred to the branch control logic 112. In addition, the output of the pipeline register 1〇7 is referred to as the oveiTide_E signal 174. Only four pipeline registers 1 (U, 1〇3, 1〇5, and 107 show the overlay signal 136 for downstream transmission, but other embodiments suffer from more pipeline stages. '[0036] In one embodiment, when branch resolution logic 124 resolves a new call instruction, call 16 1281121 is called the target address of the instruction, and the type field value used to display the call instruction is in BTAC array 102. Similarly, when the branch resolves When the logic 124 resolves the new return instruction, the target address of the return instruction, and the type used to display the normal return order are taken from the BTAC array 102. 曰[0037] The microprocessor H also includes a return stack 1G4 (called Btac returns to the heap most 1〇4) 'Remaining to receive the return signal 138 from the BTAC array 1〇2. Return to the stack Η) 4 will read the Wei (four) mode, green takes the return address determined by the call age. In an embodiment, when branch resolution logic 124 resolves a new call At the time, the return address specified by the call instruction will push the BTAC back to the stack 1〇4 _ end. # MM Array 1〇2 via the return signal 138 to display the return command system exists in the fast specified by the extracted address When the line 186 is taken, the return address located at the top of the BTAC return stack 1 会 4 will be pushed out 'and used as the target address 142 of the multiplexer 126. If the return signal 138 is true and the offset D6 field is overwritten' The multiplexer _ selects the multiplexer 126 to select the target shirt 142 predicted by the BTAC to return to the stack 1Q4. In addition, the 'branch control logic 112 controls the multiplexer 经由% by the control signal 184% selected by the BTAC array 1 The predicted target address 164 is 〇 2. [0038] The microprocessor excitation further includes a second return stack 116 (referred to as phase return stack 116) for receiving the return signal 154 from the instruction decode (four) 114. Returning to heap f 116 will advance the first (four) mode to cache the return bit specified by the call instruction =. In the embodiment, when branch resolution logic 124 resolves the new call instruction, the return address specified by the call instruction Will push the stage into the top of the heap ^ 116 When the instruction decoder 丄14 displays via the return signal 154 that the return instruction has been decoded, the return address at the top of the f-stage return stack 116 is pushed out and used as the target address 146 of the multiplexer 126. [〇〇39] The M processor 1〇〇 further includes a comparator 118. The comparator 118 compares the back-up phase 116 to the target address 146 and the target address to be transmitted 1 to 6. The phase is 1281121. Returning to the target address 146 of the stack 116 does not match the target address 176 being transferred down, the comparator 118 causes the unmatched signal 152 (which will be passed to the branch control logic 112) to produce a true value. If the return signal 154 is true, if the override F 172 is false, and if the unmatched signal 152 is true, the branch control logic 112 controls the multiplexer 106 to select the F-phase to return to the target of the stack 116 via the control signal 168. Address 146. In addition, branch control logic 112 controls control multiplexer 106 to select one of its additional inputs via control signal 168. [0040] The microprocessor 1 further includes BTAC update logic 122 coupled to the branch resolution logic 124 and the BTAC array 102. The BTAC update logic 122 receives the predicted error signal 158 from the branch resolution logic 124. The BTAC update logic 122 receives the 〇verride_F signal 174° BTAC update logic 122 from the pipeline register 107 to generate the BTAC update request signal 134. It will be transmitted to the BTAC array 102. The BTAC Update Request Signal 134 includes information to update the items of the BTAC Array 102. In one embodiment, the BTAC update request signal 134 includes the target address of the branch instruction, the address of the branch instruction, and the value of the formal block. [0041] When the branch resolution logic 124 resolves the new branch instruction, the BTAC update logic 122 generates a BTAC update request 134 to predict the target address and pattern of the new branch instruction that subsequently appears, or for example, by extracting the bit. The BTAC array 102 is updated by the information of the target address and the type of the branch instruction in the instruction cache line specified by the address 132. In addition, if the error prediction signal 158 is true, the BTAC update logic 122 will generate a BTAC update request 134 to update the entries in the BTAC array 1〇2 corresponding to the branch instructions. In particular, if the branch instruction is a return instruction that is incorrectly predicted by the BTAC returning to the stack 1〇4 or by the F-phase back to the stack n6, the BTAC update logic 122 will specify the overlay bit in the BTAC array 102 item as a predetermined order. The value, to predict BTAC to return to stack 142, and the prediction 146 of F-phase return stack 116 should be overridden by prediction 164 of BTAC array 1 在 2 at the next occurrence or event of the return instruction. In one embodiment, the pattern retaining system is set to override the return value, or 11, as specified in Table 1 above. Conversely, if 18 1281121 is a reset bit that is incorrectly predicted by the BTAC array 102 because the overlay bit has been set, the BTAC update logic 122 will specify the overlay bit in the BTAC array 102 item to a predetermined value. To indicate that the BTAC should be selected to return to the prediction 142 of the stack 104, and if necessary, the F-phase returns to the prediction 146 of the stack 116 without selecting the prediction 164 of the BTAC array 102 at the next occurrence or event of the return instruction. In the embodiment, the pattern retaining system is set to a normal return value, or 10, as specified in Table 1 above. The operation of the microprocessor 1 will now be more fully illustrated in conjunction with Figures 2 through 4. [0042] Referring now to Figure 2' is a flow chart showing the operational state of the microprocessor 100 of Figure 1 in accordance with the present invention. 2 illustrates the operation of the microprocessor 1 to predict the return command in response to the BTAC array 102 and the BTAC returning to the stack 104 of FIG. The process begins at block 202. At block 202, the fetch address 132 of FIG. 1 is used for the instruction cache body 108 of FIG. 1 and the parallel BTAC array 102. In response, the instruction cache 1 8 responds to the fetch address 132 and the instruction byte 186 of the cache line of FIG. 1 is transferred to the microprocessor 100 pipeline. The process continues with block 204. At block 204, the BTAC array 102 predicts that the return instruction is present on the instruction cache line 186 that the instruction cache memory 1 送到 8 sends to the microprocessor 100 via the return signal 138 based on the extracted address 132. The BTAC array 102 transmits the target address 164 to the multiplexer 126. The process continues with decision block 206. [0045] At decision block 206, the branch control logic 112 determines if the overlay indicator 136 has been set. If so, the process continues with block 212; otherwise, the process continues with block 208. At block 208, the branch control logic 112 controls the multiplexer 126 and the multiplexer 106 to select the BTAC to return to the stack target address 142 as the extracted address 132, causing the microprocessor 100 to branch at this time. . At block 2〇8, the process ends. At block 212, the branch control logic 112 controls the multiplexer 126 and the multiplexer 1281121 l 〇 6 ' to select the BTAC array target address (6) to be used as the clarifier. Butterfly 212, _ material. Let the rhyme] observe that if the overload indicator has been set as the period of the previous return instruction as described in the box 4〇8, the branch control Laishan will help to cover the BTAC and return to the stack. 104, and the other can select the target address 164 predicted by the BTAC array, whereby if the execution program is performing a non-standard call/return sequence, the BTAC can be prevented from being folded back to the first (4) Some kind of wrong prediction. [0049] = See, referring to Fig. 3, shown is a microprocessor-operated operational state - flow @ according to the diagram of the present invention. The ® 3 describes the operation of the microprocessor just responding to the reward return instruction (such as the return instruction predicted in Figure 2 of the stack 116 returned by the F_ phase of Figure 1). The process begins at block 302. At block 302, the F-stage instruction decoder 114 of FIG. 1 will respond to the BTAC array 102 applied to block 202 of FIG. 2, and will cache the instructions present in the instruction cache memory 108. Line 186, and then BTAC array 1 及 2 and BTAC return to stack 104 to predict the return instruction for decoding, as described in conjunction with FIG. In response to the return signal 154, the F_phase instruction decoder 114, which returns the decoded instruction, is returned to the stack 116 and its predicted target address 146 is transmitted to the multiplexer 106. The flow continues to block 304. At block 304, the comparator 118 of FIG. 1 compares the target phase address 146 and the target address 176 predicted by the core phase of FIG. If the addresses H6 and 176 do not match, the comparator 118 causes the unmatched signal 152 of FIG. 1 to generate a true value. The process continues with decision block 306. [0052] At decision block 306, the branch control logic 112 checks the unmatched signal 152 to determine if a mismatch has occurred. If so, the process continues with decision block 308; otherwise, the process ends. [0053] At decision block 3 (10), the branch control logic 112 checks the 〇verride ρ 1281121 signal 172 of FIG. 1 to determine if the 〇verride_F bit 172 has been set. If so, the process will end (i.e., the branch of the BTAC array target address 164 executed at block 212 of FIG. 2 will not be replaced by the F_phase back to the predicted target address 146 of the stack). If ovemdej? bit 172 is cleared, then flow continues to block 312. At block 312, the branch control logic 112 controls the multiplexer 106 to select the F_ stage to return to the predicted target address 146 of the stack, causing the microprocessor 1 to branch at this time. In one embodiment, the microprocessor 100 will refresh the instructions in the phase above the F_ phase before the F_phase returns to the branch of the predicted target address 146. At block 312, the process ends. [0055] As can be seen from FIG. 3, if the 〇verride_F indicator 172 has been set (as during the previous occurrence of the return instruction as described below in conjunction with block 408), the branch control logic 112 will help to override the F-phase return. Stack 116, and another can maintain the target address 164 predicted by the BTAC array, whereby if the execution program is executing a non-standard call/return sequence, then some of the FP segments are returned to the stack 116. Mispredictions. [0056] Referring now to Figure 4, a flowchart showing the operation of the microprocessor excitation in accordance with the Figure of the present invention is shown. Figure 4 illustrates the operation of the microprocessor in response to resolving the return instruction (such as the return instructions for the reconciled and decoded silk events in Figures 2 and 3). The flow begins at block 402. [0057] At block 4〇2, the E-P segmentation branch of FIG. 1 resolves the return instruction. That is, the 'branch resolution logic 124 will eventually determine the positive _ directory # address 148 of the return instruction. In particular, if the microprocessor $ is just branched to the incorrect target address of the return instruction, the branch resolution logic 124 will cause the error compensation number W of the map to produce a true value. The process will continue with decision block 4〇4. [View] In Jing Box 4〇4, the BTAC update logic m will check the unmatched signal 158 to determine whether the return instruction target bit is not predicted. If so, the process continues with decision block 406; otherwise, the process ends. 21 12811121 [0059] At decision block 406, the BTAC update logic 122 checks the 咖4 ,4 to determine if the (4)^_ρ bit 174 has been set. If so, the process proceeds to block 408; otherwise, the process continues to block 412. Only _0] At block 408, the BTAC update logic 122 will generate a btac update 134' and clear the error-based return command for the item's overwrite bit = known weapon (four) her na w-shirt == In the above code, the ") will always return the target record of the stack error prediction return instruction; however, the 'marriage return instruction can also be achieved by the code path that constitutes the bar call/return pair sequence. In the latter case, the return stack generally predicts the target address of the return instruction more accurately. Therefore, if an error prediction is made when the overlay bit f has been set, it is expected that the standard call/return pair sequence is dominant, so the BTAC update logic m will clear the overlay bit in the block. Flow proceeds to block 414. 9 $ [0061] At block 412 'because the F_ phase returns to the target address of the stack m error prediction return instruction, the BTAC update logic 122 generates a BTAC update request m to set the overlay in the appropriate entry in the BTAC array 102. Bit. The present invention is useful in solving the problems caused by non-standard call/return sequences by setting the load bits of the BTAC 102 for storing deleted items of the return instruction. That is, it will branch to the target address 164 of the btac array without branching to the target address 142 of the btac return stack or the F_ phase returning the predicted target address 146 of the stack, and the target address of the predicted return instruction will be Incorrect. The process continues with block 414. [0062] At block 414, the error-predicted return instruction target address resulting from the incorrect instruction is fetched from the instruction cache 1 8 to the microprocessor 1 pipeline, so the microprocessor 100 refreshes it. Pipeline; therefore, you do not have to execute those instructions. Next, branch control logic 112 controls the multi-node n 1G6, selects the E-phase target address 146, and causes the microprocessor 1 to branch to extract the correct target instruction. At block 414, the process ends. 22 1281121 [0063] In the embodiment, according to the above table block 412, the type of the BTAC array 102 item is updated with the binary value ^, and the block will update the BTAC array with the binary value ι〇. Type 2 block of the project. [0〇64] As can be seen from Figures 2 through 4, the advisory target can potentially improve the prediction accuracy of the return instruction. Right micro-processing II to return to the target address of the stack error pre-rideback instruction, so that the return instruction has executed a part of the non-standard call/return paste, the microprocessor will set the corresponding return command corresponding to the BTAC. The indicator, as well as the next event in the return instruction, because the micro-inspection index, the touch-back stack may mispredict the target address of the current return instruction, so the microprocessor will use the stack back. In addition to the Laipu, the goal of returning to the steel is healthy. The miscellaneous return instruction has previously been executed - part of the non-standard call/return sequence, but if the microprocessor senses that the BTAC array incorrectly predicts the target address of the return instruction, the return instruction is executed next - part The standard call/return phase, the job processor will clear the overlay indicator corresponding to the return command in the BTAC, and in the next event of the return command, because the microprocessor H will return from the overlay to the index towel, and the ride back to the stack may Correctly paste the target of the current return instruction, the processor will return to the stack to predict the target address of the return instruction. Referring now to Figure 5, a block diagram of a pipelined microprocessor H of another embodiment of the present invention is shown. The micro-section of Figure 5 is similar to the microprocessor of Figure 1, except that BTAC is not included in the stack 1 or 4 or multiplexer 126. Therefore, the predicted target address 164 output by the BTAC array 102 is directly transmitted to the multiplexer 106 without passing through the multiplexer 126. In addition, the target address of the BTAC array 1〇2 (instead of the target address 144 of Figure 1) is used as an input to the pipeline register ln and will be transmitted down as the target address 176. Referring now to Figure 6, shown is a flow chart showing the operation of the microprocessor 500 of Figure 5 in accordance with another embodiment of the present invention. Figure 6 is similar to Figure 2 except that it is determined that the block 128 12811 block 206 and block 208 are not present; therefore, the flow proceeds from block 204 to block 212. Therefore, since the BTAC return stack 104 and the multiplexer 126 of the microprocessor of FIG. 1 are not present in the microprocessor 5 of FIG. 5, when the BTAC array 102 predicts the return command via the return signal 138, The branch control logic 112 will always be used to branch the microprocessor 500 to the target address 164 predicted by the BTAC array 102. The microprocessor 5 of FIG. 5 also operates in accordance with the flowcharts of FIGS. 3 and 4. It is to be noted that since the BTAC return stack 1 〇 4 is not present in the microprocessor 500, the previously transmitted target address 176 is always the target address 164 of the BTAC array 102; therefore, the stack is returned in the F-phase. The comparison performed in block 304 between the target address 146 of the 丨16 and the target address 176 is compared to the target address 164 of the BTAC array 〇2 being transmitted downstream. While the invention and its objects, features, and advantages have been described in detail, the invention includes other embodiments. For example, although the embodiment has illustrated that the microprocessor has two return stacks, the microprocessor can have other numbers of return stacks, such as only a single return stack, or more than two return stacks. In addition, although the embodiment has been described, in addition to storage, btac is used to overlay another target address prediction mechanism that returns to the stack, in addition to the overlay bit of the return instruction that was incorrectly deleted by the return stack. Use another alternative target address prediction mechanism, such as a branch target buffer. [0069] Further, the present invention, its purpose, hiding, and gamma have been described in detail, but the present invention encompasses other embodiments. In addition to the implementation of the hardware to achieve the implementation of the hardware, the present invention can also be a secret (10), for example, the computer can be read by the ship, the computer can read the code 'data, etc.>. The computer code can make the invention or the manufacturing of the invention disclosed herein feasible, or both. For example, this can be used to Γ ^ ΓΓ Γ Γ ++, _, and class _ programming language

AlteraHI&quot;(AHD 聯的硬體描 心。5(狐),或此項技術中可用的其他程式化及/或電路(例如, 24 1281121 記錄工具來達成。電腦程式碼可置於任何已知的電腦可使用(例如,可 讀取)媒體(包括半導體記憶體、磁碟、光碟(例如,cd_r〇m、 DVD遍、以及類似之物)、以及㈣腦可使用(例如,可讀取)傳送 媒體(例如,餘’或包括數位、絲、或祕體之任意轉 的媒體)所包含的電腦資料訊號中。就本身而言,電腦程式碼可在触 網路(包括網際網路及内部網路)上傳輪。要了解到的是,本發明可實施 於電腦程式糊如,如智慧職雜)核心(如微處顧如)的部份, 或如系統階層式設計(如系統單晶片(System〇nChip,簡稱 並且會轉換成賴,當作频電㈣造的—部份。再者,本發明可實 為硬體及電腦程式瑪的纟且合。 、 術者_了_岐,在不麟後附的申 =專利範_絲之本發_鱗及制之下,為了進行與本發明相 =的目的,其可立即使闕露的概念及狀的實_,來當作設計或 修改其他的結構之基礎。 ^ 【圖式簡單說明】 [0019] 圖1係根據本發明之管線式微處理器的方塊圖; [0020] ® 2細康本發明之圖j的微處理器之運作流程圖; [0021] 圖3係根據本發明之圖1的微處理ϋ之運作流程圖; [〇〇22]圖4係根據本發明之圖1的微處理器之運作的流程圖; 圖^=]圖5練據本發_另—實施例之管線式微處理器之方塊 流程[Γ。4]圖6係根據本發明的另—實施例之圖5的微處理器之運作的 25 1281121 圖式標示說明: 100,500 :管線式微處理器 101、103、105、107、111、113、121、123、125、127 ··管線暫存器 102 : BTAC 陣列 104 : BTAC返回堆疊 106,126 :多工器 108 : 指令快取記憶體 112 : 分支控制邏輯 114 : 指令解碼器 116 : F-階段返回堆疊 118 : 比較器 122 : BTAC更新邏輯 124 : 分支解決邏輯 132 : 提取位址 134 : BTAC更新請求訊號 136 : 覆載訊號 138,154 ··返回(ret)訊號 142、144、146、164、176 :目標位址 148 : E-階段目標位址訊號 152 : 未匹配訊號 158 : 預測錯誤訊號 162 : 下個循序提取位址 164 : 預測目標位址 168 : 多工(mux)選擇訊號 172 : override_F 訊號 174 : override E 訊號 26 1281121 182 :加法器 184 :控制訊號 186 :指令位元組Altera HI&quot; (AHD combined hardware description. 5 (fox), or other stylized and/or circuits available in this technology (for example, 24 1281121 recording tools to achieve. Computer code can be placed in any known The computer can use (eg, readable) media (including semiconductor memory, disk, optical disk (eg, cd_r〇m, DVD pass, and the like), and (iv) brain usable (eg, readable) transfer The computer (for example, the media that contains the digits, the silk, or the arbitrarily transferred media) contains computer data signals. In its own right, the computer code can be touched on the Internet (including the Internet and intranet). Road) uploading wheel. It is to be understood that the present invention can be implemented in a computer program such as a smart (such as a smart) core (such as a micro-location), or as a system hierarchical design (such as a system single chip ( System〇nChip, abbreviated and converted into Lai, is part of the frequency (4). In addition, the present invention can be implemented as a combination of hardware and computer programs. The application of the patent = _ _ _ _ _ _ _ _ _ _ _ _ _ _ The purpose of the present invention is to immediately make the concept and shape of the dew as the basis for designing or modifying other structures. ^ [Simple Description of the Drawings] [0019] FIG. 1 is based on this A block diagram of a pipelined microprocessor of the invention; [0020] a flow chart of the operation of the microprocessor of FIG. 1 of the present invention; [0021] FIG. 3 is a flowchart of the operation of the microprocessor of FIG. 1 according to the present invention FIG. 4 is a flow chart showing the operation of the microprocessor of FIG. 1 according to the present invention; FIG. 5 is a block flow diagram of a pipeline microprocessor according to the present invention. 4] FIG. 6 is a schematic diagram of the operation of the microprocessor of FIG. 5 in accordance with another embodiment of the present invention. 25 12811: FIG. 100:500: pipelined microprocessors 101, 103, 105, 107, 111, 113, 121, 123, 125, 127 · Pipeline register 102: BTAC array 104: BTAC return stack 106, 126: multiplexer 108: instruction cache memory 112: branch control logic 114: instruction decoder 116: F-stage returns to stack 118: Comparator 122: BTAC Update Logic 124: Branch Resolution Logic 132: Extract address 134: BTAC update request signal 136: Overwrite signal 138, 154 · Return (ret) signal 142, 144, 146, 164, 176: Target address 148: E-stage target address signal 152: Unmatched signal 158: Predictive error signal 162: Next sequential extraction address 164: Prediction target address 168: multiplex (mux) selection signal 172: override_F signal 174: override E signal 26 12811121 182: adder 184: control signal 186: instruction byte

2727

Claims (1)

1281121 案號93122812 95年l〇月26日十、申請專利範圍:⑽ 修正本 1 · 一種微處理器,包括〜 返回堆®,用以產生一返回指令的 測; 目標位址之一第一預 一分支目標位址快取記憶體,用以吝 產生该返回指令的該目標 位址之一弟二預測,以及用以產生_ 後戰知標,其中若該第_預 測錯誤預測一第一事件的該返回指令 、 μ目軚位址,則該覆载指 標會顯不一預定值;以及 -分支控健輯,祕至該返回堆疊及該分支目標位址快取 兒憶體,右«載指標顯示該預定值,則對於—第二事件的該返 回指令而言,係用以使該微處理器分支 刀文到5亥弟二預測的該目標位 址,而不會分支到該第一預測。 2·如申請專利範圍第1項所述之微處理ϋ,其中該第二事件 係緊接於該第一事件之後。 3.如申請專利範圍第1項所述之微處理器,更包括: -更新邏輯,雛至該分支目標她快取記憶體,若該第一 預測錯誤顧m第-事件_返回齡之該目標位址,則用以將 該覆載指標更新為該預定值。 4·如申請專利範圍第3項所述之微處理器,其中若該第二預 測錯誤預測-第三事件的該返回指令之該目標位址,則該更新邏 輯會將該賴指標更新為-第二預定值,財該第二秋值與該 預定值不同。 TT^ Docket No: 0608-A40751-TW/Finall/Rita/2〇〇6/〇9/22 28 \T 1281121 5·如申請專利範圚笙 標顯示該第二敢值=、所4之微處職’射若該覆载指 分支控制邏輯會使該微::= 6.如申請__二物—預測的該目標位址。 -比##,/ 所述之微處理器、,更包括: Χσ° 11接錢分支㈣_,_比較該f 1、_ ::=第其中___第二預定值丄=: 二::=預_配’該分支控制邏輯會使 q弟—預測的該目標位址。 ㈣卿4酬切,綱返回指令 的該第二事件贿接_第二事件之後。 =物瓣4 _㈣娜,綱 的該第三事件係聽第二事件。 I 會產項所述之微處理器,其中該返回堆疊 十%A 、測’緊接著該分支目標位址快取記憶體會產生該 第二預測。 Μ 1如申請專利範圍第!項所述之微處理器,更包括: 一指令解碼邏輯,至該分支控贿輯,用以將該返财 令解碼,其中該返回堆疊會回應該指令解碼邏輯將該返回指令: 碼而產生該第一預測。 田11.如申請專利範圍第10項所述之微處理器,其中該返回堆 疊會回應肋將-呼叫指令解碼之令解碼邏輯而儲存該目標 TT^ Docket No: 〇608-A40751-TW/Finall/Rita/2006/09/22 29 位址的n二。一 -J ^ Ί月專利圍第:L項所述之微處理器,其中該返回堆 锋一生該帛預I倾齡支目標魏快取記憶體產生該 第一預測同時發生。 .士申》月專利$ϋ圍第;!_項所述之微處理器,其中該分支目 標位址快取記贿更配置収產生—指示,該指示係顯示該返回 扎令係存在於由-指令快取記憶體所提供之—快取線的指令位元 組中。 I 14·如申請專利範圍第13項所述之微處理器,其中該返回堆 宜會回應_分支目標位址快取記髓產生魏回指令係存在於 該快取線中之該指示,而產生該第一預測。 15·如申請專利範圍第13項所述之微處理器,其中該分支目 標位址快取記憶體會回制以指定該齡快取記紐中的該快取 線’而產生該返回指令係存在於該快取線中之該指示。 田16·如申請專利範圍第工項所述之微處理器,其中該返回堆 疊會回應於該分支目標位址快取記憶體產生—呼叫齡係存在於 一指令快取線中之—指示,而儲存該該目標位址的第-預測。、 I7·如申請專利範圍第i項所述之微處理器,更包括·· 一第二返回堆叠,祕至該分支控制邏輯,用以產生該返回 指令的該目標位址之一第三預測。 Ό 18.如申請專利範圍fl7項所述之微處理器,其中若該覆载 TT^s Docket No: 〇608.A40751.TW/FinaIl/Riia/2006/09/22 30 1281lit” *· t':; 年 指標顯示5¾¾]¾於該第二事件的該 支控制邏輯會使該微處理器分支到 ^而言’該分 不合分,丨兮笸一 @、日, 預领的该目標位址’而 +曰刀支到該第二預测的該目標位址。 器,其中若該覆載 19.如申請專利範圍第則所述之微處理 目標位址。 2〇.如申請專利範圍第19項所述之微處理器,更包括: -比較器’祕至該分支控制邏輯,㈣味該第—預測鱼 該第三預測。 η 21 ·如申請專利範圍帛2 0項所述之微處理器,其中若該比較 顯搞第-酬與該第三細未匹配,以及若域載指標顯示 除了該預定值之外的值,則在分支_第三賴之後,該分支控 制邏輯會使該微處理器分支到該第一預測。 22·如申請專利範圍第1項所述之微處理器,其中該分支控 制邏輯包括—多卫器,肋選擇該第-酬與該第二預測中的— 個,而傳送到一指令快取記憶體,當作用以使該微處理器分支到 該第一預測與該第二預測中的該選擇一個之一提取位址。 23 · 一種用以改善微處理器中的分支預測精確度之裝置,其 中’該微處理器具有一分支目標位址快取記憶體及一返回堆疊, 每個會產生一返回指令的一目標位址之一預測,該裝置包括: TT^S Docket No: 〇6〇8-A40751^TW/Finall/Rita/2006/09/22 31 1281121 覆載指標 HPii新補祕至錢載指標,若由該返回堆叠所產生的 顧測錯誤酬H現的該返触衫該目標她,則用以 使該覆載指標更新為一真值;以及 ,刀支控制邏輯,耗接至該覆载指標,芳該覆·標為真, 則對於-第二出現的該返回指令而言,㈣以選擇由該分支目標 位止决取技、體所產生_糊,而不會選擇蝴返轉疊所產 生的該預測。 …24·如中轉利範圍第23項所述之裝f,其巾魏回指令的 該第二出現係緊接於該第_出現之後。 &gt; 25·如申請專利範圍第23項所述之裝置,其中該覆載指標係 由該分支目標位址快取記憶體所產生。 26 ·如申請專利範圍第25項所述之裝置,其中該分支目標位 址陕取3己憶體係用以儲存複數個返回指令的複數個覆載指標,其 中右該些返回齡其巾之—為魏回指令,麟分支目標位址快 取讀、體會將該返回指令之該賴軸標其巾之—#作該覆載指 標0 27.如申請專利範圍第26項所述之裝置,其中該分支目標位 址快取記憶體絲於—提取位址輸人,來觸該些返回指令其中 之是否為该返回指令,其中該提取位址為該微處理器之一指令 快取記憶體的一位址輸入。 TT s Docket No: 〇608-A40751-TW/Finall/Rita/2006/09/22 32 Of- 1281121 如申請專利範圍第23項 K衣置’其中若由該分支目 #位址快取記賴所產生的該獅 指今&gt; # s y 、預測该弟一出現的該返回 才曰7之该目標位址,則該更新邏輯會 9q A由一 I亥覆载指標更新為-偽值。 M·如申請專利範圍第幻項所述 ΑΆ , t 义之衣置,其中若該覆载指標 馬偽,則對於該第二出現的該返回 丁 選盤士#、 7 s ’该分支控制邏輯會 k擇由该返回堆疊所產生的該預测, 不έ選擇由該分支目標位 址决取圮憶體所產生的該預測。 3〇·如申請專利範圍第23項所述 _u, 义〈衣置,更包括: 一比較裔,耦接至該分支控制邏輯 ^ + ^ 餌用以比較由該分支目標 位址快取魏體職生之對該第 血Μ π ^ 兄的该返回指令之該預測, -由该返回堆疊所產生之對該第二出現的該返回指令之該預測。 η如申請專利_3(3_述之数,其_覆載指標 為偽,則該分支控繼輯會麵由該分支目址快取記憶體所 赶之對該第二出現的該返回指令之該預測,而接下來若該比較 益顯不由該返回堆疊所產生之該酬,與由該分支目標位址快取 魏體所產生之該侧不匹配,齡轉_返㈣疊所產生之 對該第二出現的該返回指令之該預測。 32 ·如申請專利範圍第31項所述之裝置,其中在-第一時脈 週期中’該分支控制邏輯會接收由該分支目標位址快取記憶體所 產生之該預測,緊接著在一第二時脈週期中,該分支控制賴會 接收由該返回堆疊所產生之該預測。 TT s Docket No: 〇608-A40751-TW/Finall/Rita/2006/09/22 33 牟 1281121 33. 一種酬微處理器中之返回指令的目標位址之方法,包括 下列步驟: 回應於-返回堆疊錯誤糊該返回指令的該目標位址,而將 一覆載指標更新為一真值; 在該更新之後,藉由-分支目標位址快取記來產生該目 標位址之一預測; 在該分支目標概快取記憶·生該酬讀,觸該覆載 指標是否具有一真值;以及 若該覆載指標具有-真值,則使該微處理器分支到由該分支 目標位址快取記憶體所產生的該預測。 34·如申請專利範圍帛33項之方法,更包括: 回應於該分支目標魏快取記鍾錯誤酬該返回指令的該 目標位址,而將該覆載指標更新為一偽值。 35.如申請專利範圍第33項之方法,更包括: 使該微處麵分支到由該分支目標位址快取記憶體所產生的 該預測; 在該分支目標位址快取記憶體產生該目標位址的該預測之 後’藉由該返回堆疊來產生該目標位址之一預測;以及 在使該微處理器分支到由該分支目標位址快取記憶體所產生 的該預測之後,會比較由該分支目標位址快取記憶體所產生的該 預測,與由該返回堆疊來所產生的該預測。 TT^s Docket No: 〇6〇8.A40751-TW/Fmall/Rita/2006/09/22 341281121 Case No. 93712812 95年月〇26日10, the scope of application for patents: (10) Amendment 1 · A microprocessor, including ~ return to heap®, used to generate a return instruction; one of the target addresses a branch target address cache memory for generating a second prediction of the target address of the return instruction, and for generating a _ post-war knowledge, wherein the first event is predicted by the first prediction error The return instruction, μ target address, the overlay indicator will not be a predetermined value; and - branch control, the secret to the return stack and the branch target address cache memory, right « The indicator displays the predetermined value, and for the return instruction of the second event, is used to cause the microprocessor to branch the knife to the target address predicted by the 5th brother, without branching to the first prediction. 2. The microprocessing cartridge of claim 1, wherein the second event is immediately after the first event. 3. The microprocessor of claim 1, further comprising: - updating logic to the branch target her cache memory, if the first prediction error is m-event_return age The target address is used to update the overlay indicator to the predetermined value. 4. The microprocessor of claim 3, wherein if the second prediction error predicts the target address of the return instruction of the third event, the update logic updates the Lai indicator to - The second predetermined value is different from the predetermined value. TT^ Docket No: 0608-A40751-TW/Finall/Rita/2〇〇6/〇9/22 28 \T 1281121 5·If the patent application standard indicates that the second value is 4, the 4 is the slightest The job 'shooting if the overlay refers to the branch control logic will cause the micro::= 6. If the application __ two objects - the predicted target address. - than the ##, / the microprocessor, including: Χσ° 11 money branch (four) _, _ compare the f 1, _ ::= where ___ second predetermined value 丄 =: two:: = pre-configured 'this branch control logic will make q-different-predicted target address. (4) Qing 4 rewards, the second return of the order return bribes _ after the second incident. = flap 4 _ (four) Na, the third event of the class is listening to the second event. I. The microprocessor of the production item, wherein the return stack is 10% A, and the subsequent target memory address cache generates the second prediction. Μ 1 As for the scope of patent application! The microprocessor of the present invention further includes: an instruction decoding logic, to the branch control bribe, for decoding the return order, wherein the return stack is returned to the instruction decoding logic to generate the return instruction: The first prediction. The microprocessor of claim 10, wherein the return stack responds to the rib-call instruction decoding decoding logic to store the target TT^Docket No: 〇608-A40751-TW/Finall /Rita/2006/09/22 29 addresses n two. A -J ^ Ί月专利围: The microprocessor described in item L, wherein the returning to the stacking life of the 帛 I 倾 倾 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏 魏The patent described in the syllabus of the syllabus of the syllabus of the syllabus of the syllabus of the syllabus of the syllabus - The instruction cache is provided in the instruction byte of the cache line. The microprocessor of claim 13, wherein the return stack should respond to the indication that the branch target address cache is generated in the cache line, and the Wei return command is present in the cache line. This first prediction is generated. 15. The microprocessor of claim 13, wherein the branch target address cache memory is returned to specify the cache line in the age cache entry to generate the return instruction system. The indication in the cache line. [16] The microprocessor of claim 1, wherein the return stack is responsive to the branch target address cache memory generation - the call age is present in an instruction cache line - indication, And store the first prediction of the target address. I7. The microprocessor of claim i, further comprising: a second return stack, secret to the branch control logic for generating a third prediction of the target address of the return instruction . Ό 18. The microprocessor of claim 56, wherein if the overlay is TT^s Docket No: 〇608.A40751.TW/FinaIl/Riia/2006/09/22 30 1281lit” *·t' :; The annual indicator shows that the control logic of the second event causes the microprocessor to branch to ^, 'the score is not consistent, 丨兮笸一@,日, the pre-collected target address 'And + the knives branch to the second predicted target address. If the overlay 19. The micro-processing target address as described in the scope of the patent application. 2 〇. The microprocessor of the 19th item further comprises: - a comparator 'secret to the branch control logic, (4) a taste of the first - predicting the third prediction of the fish. η 21 · as claimed in the patent scope 帛20 item a processor, wherein if the comparison shows that the first reward does not match the third detail, and if the domain indicator displays a value other than the predetermined value, the branch control logic will be after the branch_third dependency The microprocessor branches to the first prediction. The microprocessor of claim 1, wherein The branch control logic includes a multi-guard, the rib selecting the one of the first reward and the second prediction, and transmitting to an instruction cache to act as a branch to the first prediction and The selected one of the second predictions extracts an address. 23 - A device for improving branch prediction accuracy in a microprocessor, wherein the microprocessor has a branch target address cache memory and a Returning to the stack, each of which generates a prediction of a target address of a return instruction, the device includes: TT^S Docket No: 〇6〇8-A40751^TW/Finall/Rita/2006/09/22 31 1281121 The indicator HPii is newly added to the money-bearing indicator. If the return error generated by the return stack is the target of the returning shirt, the target is updated to a true value; The knife control logic is used to consume the overlay indicator, and the flag is true. For the second occurrence of the return instruction, (4) to select the branch target to stop the technique and the body. _ paste, and will not choose the forecast produced by the butterfly back stack. 24. If the f is described in item 23 of the transfer range, the second occurrence of the towel Wei return instruction is immediately after the occurrence of the first occurrence. &gt; 25. As described in claim 23 The device, wherein the overlay indicator is generated by the branch target address cache memory. 26. The device of claim 25, wherein the branch target address is a 3 recall system for storing A plurality of returning indicators of the plurality of return instructions, wherein the right ones return the age of the towel - for the Wei return instruction, the branch target address of the branch is quickly read, and the return axis of the return instruction is marked by the towel -# 27. The apparatus of claim 26, wherein the device of claim 26, wherein the branch target address caches the memory to the input address, and touches the return instruction whether the Returning an instruction, wherein the fetch address is an address input of one of the microprocessors that instructs the cache memory. TT s Docket No: 〇608-A40751-TW/Finall/Rita/2006/09/22 32 Of- 1281121 If you apply for the scope of the patent, the 23rd item, "Where by the branch # address cache The generated lion finger today &gt;# sy , predicting the target address of the returning 曰 7 of the younger brother, the update logic 9q A is updated from an I hai coverage index to a pseudo value. M. As stated in the illusion of the scope of application for patents, t is the clothing of the right, wherein if the overlay indicator is falsified, then the branch control logic for the second occurrence of the returning singer #, 7 s ' The prediction generated by the return stack is selected, and the prediction generated by the branch target address is selected. 3〇· As stated in the 23rd item of the patent application scope, _u, Yiyi, including: a comparator, coupled to the branch control logic ^ + ^ bait is used to compare the target address of the branch The prediction of the return instruction of the blood Μ π ^ brother of the physical student, - the prediction of the return instruction for the second occurrence generated by the return stack. ηIf the patent application _3 (3_ stated in the number, the _ overlay index is false, then the branch control relay meeting is rushed by the branch directory cache memory to the second occurrence of the return instruction The prediction, and if the comparison is not affected by the return to the stack, the side is not matched with the side generated by the branch target address, and the generation is converted to (four) stack. The device of the second occurrence of the returning instruction. The device of claim 31, wherein in the first clock cycle, the branch control logic receives the target address by the branch. Taking the prediction generated by the memory, the branch control will receive the prediction generated by the return stack in a second clock cycle. TT s Docket No: 〇608-A40751-TW/Finall/ Rita/2006/09/22 33 牟1281121 33. A method for retrieving a target address of a return instruction in a microprocessor, comprising the steps of: responding to - returning a stack error to the target address of the return instruction, and An overlay indicator is updated to a true value; in the update And generating, by the branch target address cache, a prediction of the target address; obtaining, in the branch target, the memory, the reward, whether the overlay indicator has a true value; and if The overlay indicator has a true value, which causes the microprocessor to branch to the prediction generated by the branch target address cache memory. 34. The method of claim 33, further comprising: responding to the The branch target Wei fast takes the clock to return the target address of the return instruction, and updates the overlay indicator to a pseudo value. 35. The method of claim 33, further includes: making the micro-face Branching to the prediction generated by the branch target address cache memory; after the branch target address cache memory generates the prediction of the target address, 'the target address is generated by the return stack a prediction; and after causing the microprocessor to branch to the prediction generated by the branch target address cache memory, comparing the prediction generated by the branch target address cache memory, and Return to the stack The generated prediction TT ^ s Docket No: 〇6〇8.A40751-TW / Fmall / Rita / 2006/09/22 34 1281121 fiTGi: !' f ^ ί I·'; i- ^—, 36·如申請專利範圍第35項之方法,更包括: 若由該分支目標位址快取記憶體所產生的該糊,斑由該返 回堆疊來所產生的該綱未匹配,則會使該微處理器分摘倾 返回堆疊所產生的該預測。 V 37 ·如申請專利範圍第33項之方法,更包括: 在該更新之後’藉由-返回堆疊來產生該返回指令的該目標 位址之一預測;以及 不 Θ若該覆餘標具有-倾,職該微處理奸心丨由該返回 堆疊所產生的該預測。 38 ·如申請專利範圍第37項之方法,更包括: 、回應於-提取位址’會藉由該分支目標位域取記憶體,來 預測該返回齡係存在於由—齡快取記憶體所提供之—快取線 中’其中該提取位址係用以指定由該指令快取記憶體所提供之該 快取線。 ~ 39 .如申請細刪38項之方法,其中該分支目標位址快 取記憶體魅該返回齡的該目標位址的_,包細應於該分 支目標位址快取記憶_測該返回指令係存在於由該快取線中, 而產生該目標位址。 40·如申請專利範圍帛38項之方法,其中該返回堆疊產生該 返回指令的該目標位址之該預測的動作,包括回應於該分支目標 位址快取記紐酬該相指令健在於由該錄線巾,而產生 TT5s Docket No: 0608-A40751-TW/Finall/Rita/2006/09/22 35 1281121 該目標位址。 如申請專利範圍第33項之方法,更包括: 後,將該亥練記輪生雜址之該預測之 返^==4W射軸堆疊細 令解碼,而她目標健預简作,包械婦該返回指 寸=申料_第41項之方法,射在該返回指令係由 ^快取記嶋_,她蝴恤軸。 * 翻以改善微處理器Μ分支預測精確度之裝置,1 生-返 具有—返尋纽種测裝置,每個會產 生返回指令的-目標位址之一預測,以及 2 記憶體,1¾裝置包括:支目払位址快取 i载指標’域分支目標位址快取記紐所提供. 一更新邏輯’_⑽魏指標,若域返回堆疊所產 :預測錯誤侧-第__該返回指令之該目標位址,則用以 2該分支目標位址快取記憶财的該難指標靖為—真值;以 -分支控繼輯,输至該賴指標,若該賴指標為直, 携於-第二出現的該返回指令而言,係用以選擇由該另一種預 測裝置所產钱該預測,而不會選擇由該返轉疊所赴的該預 TT^s Docket No: 〇6〇8-A4075 l-TW/Finall/Rita/2006/09/22 36 1281121 測01281121 fiTGi: !' f ^ ί I·'; i- ^-, 36 · The method of claim 35, further includes: if the paste is generated by the branch target address cache memory, the spot The unmatched result of the return stack will cause the microprocessor to pick up the prediction resulting from the dump back to the stack. V 37. The method of claim 33, further comprising: "after the update, by using - returning to the stack to generate a prediction of the target address of the return instruction; and if the overlay has - Pour, the job should be micro-processing, and the prediction generated by the return stack. 38. The method of claim 37, further comprising: responding to - extracting the address by using the branch target bit field to predict that the return age is present in the age-old cache memory Provided - in the cache line 'where the extracted address is used to specify the cache line provided by the instruction cache. ~ 39. If you apply for a detailed deletion of 38 items, the branch target address cache memory MM should return the age of the target address _, the packet should be in the branch target address cache memory _ test the return The instruction system exists in the cache line to generate the target address. 40. The method of claim 38, wherein the returning stack generates the predicted action of the target address of the return instruction, including responding to the branch target address cache, and the phase command is The line towel is produced, and the TT5s Docket No is generated: 0608-A40751-TW/Finall/Rita/2006/09/22 35 1281121 The target address. For example, if the method of applying for the scope of patents is included in the 33rd paragraph, the method further includes: after the prediction of the rounding of the miscellaneous miscellaneous address, the ===4W shot stacking fine decoding, and her target is pre-made, The woman should return the finger = the method of claim _ item 41, shot in the return instruction is ^ quick to remember _, she is on the axis. * A device that improves the accuracy of the microprocessor's branch prediction. A raw-return-return-to-search device, each of which generates a return instruction - one of the target addresses, and 2 memory, 13⁄4 devices Including: branch 払 address cache i load indicator 'domain branch target address cache record provided. One update logic '_ (10) Wei indicator, if the domain returns to the stack produced: prediction error side - __ the return instruction The target address is used to capture the difficult indicator of the branch target address cache memory as the true value; the branch control succeeds to the target, and if the target is straight, carry The second occurrence of the return instruction is for selecting the prediction produced by the other prediction device, and does not select the pre-TT^s Docket No: 〇6 〇8-A4075 l-TW/Finall/Rita/2006/09/22 36 1281121 Measure 0 TT5s Docket No: 0608-A40751-TW/Finall/Rita/2006/09/22 37 1281121 案號93122812 95年10月26日 成厶d 、 年月Q窗#王$紐丨 修正頁 132 00/1\ 34 Η 104 142 BTAC返回堆疊 / \ \ —&gt; 陣列 -&amp;1 102 164 138 \/TT5s Docket No: 0608-A40751-TW/Finall/Rita/2006/09/22 37 1281121 Case No. 93122812 October 26, 1995 Cheng Cheng d, Year Month Q Window #王$纽丨修正页132 00/1\ 34 Η 104 142 BTAC Back to Stack / \ \ —&gt;Array-&amp;1 102 164 138 \/ v 一——^ 101 136 103 172 ψ 111·~~〉 v 113 176v 一——^ 101 136 103 172 ψ 111·~~〉 v 113 176 182 U U 15: 118182 U U 15: 118 分支控制邏輯 VBranch control logic V 丄 F-階段返 回堆疊 八 V F-階段指 令解碼器 \/ v丄 F-stage back to stacking Eight V F-stage instruction decoder \/ v 〆—、—〆 154 107 174 158 116 έ 125 127 \/ ν BTAC .)1- Ε-階段分支 更新邏輯 解決邏輯 122 第1圖 124 1281121 案號93122812 95年10月26曰 修正頁〆—, —〆 154 107 174 158 116 έ 125 127 \/ ν BTAC .)1- Ε-stage branch Update logic Solution logic 122 Figure 1 128 128111 Case number 93122912 October 26, 1995 Revision page 第2圖 1281121Figure 2 1281121 第3圖 1281121Figure 3 1281121 50012811 1: 一 13250012811 1: One 132 102102 ,——^,——^ 101 136 103 172 \/ 108 164 138101 136 103 172 \/ 108 164 138 指令快取 記憶體 114 154 八 \/ F-階段指 令解碼器 v —-— 丄 一 107 174 116 125 127 158 BTAC .)- E-階段分支 更新邏輯 解成邏輯 122 第5圖 124Instruction Cache Memory 114 154 VIII \/ F-stage Instruction Decoder v —- — 丄 One 107 174 116 125 127 158 BTAC .)- E-Stage Branch Update Logic Solution into Logic 122 Figure 5 124
TW093122812A 2003-10-06 2004-07-30 Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence TWI281121B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/679,830 US7237098B2 (en) 2003-09-08 2003-10-06 Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence

Publications (2)

Publication Number Publication Date
TW200513961A TW200513961A (en) 2005-04-16
TWI281121B true TWI281121B (en) 2007-05-11

Family

ID=34394250

Family Applications (1)

Application Number Title Priority Date Filing Date
TW093122812A TWI281121B (en) 2003-10-06 2004-07-30 Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence

Country Status (2)

Country Link
CN (1) CN1291311C (en)
TW (1) TWI281121B (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9678882B2 (en) 2012-10-11 2017-06-13 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US9678755B2 (en) 2010-10-12 2017-06-13 Intel Corporation Instruction sequence buffer to enhance branch prediction efficiency
US9710399B2 (en) 2012-07-30 2017-07-18 Intel Corporation Systems and methods for flushing a cache with modified data
US9720839B2 (en) 2012-07-30 2017-08-01 Intel Corporation Systems and methods for supporting a plurality of load and store accesses of a cache
US9720831B2 (en) 2012-07-30 2017-08-01 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9733944B2 (en) 2010-10-12 2017-08-15 Intel Corporation Instruction sequence buffer to store branches having reliably predictable instruction sequences
US9766893B2 (en) 2011-03-25 2017-09-19 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US9767038B2 (en) 2012-03-07 2017-09-19 Intel Corporation Systems and methods for accessing a unified translation lookaside buffer
US9811377B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9823930B2 (en) 2013-03-15 2017-11-21 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9858080B2 (en) 2013-03-15 2018-01-02 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9886416B2 (en) 2006-04-12 2018-02-06 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9898412B2 (en) 2013-03-15 2018-02-20 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9916253B2 (en) 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9921845B2 (en) 2011-03-25 2018-03-20 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9934042B2 (en) 2013-03-15 2018-04-03 Intel Corporation Method for dependency broadcasting through a block organized source view data structure
US9940134B2 (en) 2011-05-20 2018-04-10 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US9965281B2 (en) 2006-11-14 2018-05-08 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US10031784B2 (en) 2011-05-20 2018-07-24 Intel Corporation Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US10146548B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for populating a source view data structure by using register template snapshots
US10169045B2 (en) 2013-03-15 2019-01-01 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
US10198266B2 (en) 2013-03-15 2019-02-05 Intel Corporation Method for populating register view data structure by using register template snapshots
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
US10521239B2 (en) 2011-11-22 2019-12-31 Intel Corporation Microprocessor accelerated code optimizer

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100442226C (en) * 2007-07-02 2008-12-10 美的集团有限公司 Setting method for microwave oven return key

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10289605B2 (en) 2006-04-12 2019-05-14 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US9886416B2 (en) 2006-04-12 2018-02-06 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US11163720B2 (en) 2006-04-12 2021-11-02 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US9965281B2 (en) 2006-11-14 2018-05-08 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US10585670B2 (en) 2006-11-14 2020-03-10 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
US9733944B2 (en) 2010-10-12 2017-08-15 Intel Corporation Instruction sequence buffer to store branches having reliably predictable instruction sequences
US10083041B2 (en) 2010-10-12 2018-09-25 Intel Corporation Instruction sequence buffer to enhance branch prediction efficiency
US9921850B2 (en) 2010-10-12 2018-03-20 Intel Corporation Instruction sequence buffer to enhance branch prediction efficiency
US9678755B2 (en) 2010-10-12 2017-06-13 Intel Corporation Instruction sequence buffer to enhance branch prediction efficiency
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9921845B2 (en) 2011-03-25 2018-03-20 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9766893B2 (en) 2011-03-25 2017-09-19 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US10564975B2 (en) 2011-03-25 2020-02-18 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US11204769B2 (en) 2011-03-25 2021-12-21 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9990200B2 (en) 2011-03-25 2018-06-05 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US9934072B2 (en) 2011-03-25 2018-04-03 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US10031784B2 (en) 2011-05-20 2018-07-24 Intel Corporation Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines
US10372454B2 (en) 2011-05-20 2019-08-06 Intel Corporation Allocation of a segmented interconnect to support the execution of instruction sequences by a plurality of engines
US9940134B2 (en) 2011-05-20 2018-04-10 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
US10521239B2 (en) 2011-11-22 2019-12-31 Intel Corporation Microprocessor accelerated code optimizer
US10310987B2 (en) 2012-03-07 2019-06-04 Intel Corporation Systems and methods for accessing a unified translation lookaside buffer
US9767038B2 (en) 2012-03-07 2017-09-19 Intel Corporation Systems and methods for accessing a unified translation lookaside buffer
US10698833B2 (en) 2012-07-30 2020-06-30 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9720839B2 (en) 2012-07-30 2017-08-01 Intel Corporation Systems and methods for supporting a plurality of load and store accesses of a cache
US10210101B2 (en) 2012-07-30 2019-02-19 Intel Corporation Systems and methods for flushing a cache with modified data
US9720831B2 (en) 2012-07-30 2017-08-01 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9858206B2 (en) 2012-07-30 2018-01-02 Intel Corporation Systems and methods for flushing a cache with modified data
US9740612B2 (en) 2012-07-30 2017-08-22 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9916253B2 (en) 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9710399B2 (en) 2012-07-30 2017-07-18 Intel Corporation Systems and methods for flushing a cache with modified data
US10346302B2 (en) 2012-07-30 2019-07-09 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US10585804B2 (en) 2012-10-11 2020-03-10 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US9678882B2 (en) 2012-10-11 2017-06-13 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US9842056B2 (en) 2012-10-11 2017-12-12 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US9811377B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US9823930B2 (en) 2013-03-15 2017-11-21 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US10248570B2 (en) 2013-03-15 2019-04-02 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US10255076B2 (en) 2013-03-15 2019-04-09 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US10169045B2 (en) 2013-03-15 2019-01-01 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US10146576B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US10198266B2 (en) 2013-03-15 2019-02-05 Intel Corporation Method for populating register view data structure by using register template snapshots
US10503514B2 (en) 2013-03-15 2019-12-10 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US10146548B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for populating a source view data structure by using register template snapshots
US9858080B2 (en) 2013-03-15 2018-01-02 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9898412B2 (en) 2013-03-15 2018-02-20 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US10740126B2 (en) 2013-03-15 2020-08-11 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9934042B2 (en) 2013-03-15 2018-04-03 Intel Corporation Method for dependency broadcasting through a block organized source view data structure
US11656875B2 (en) 2013-03-15 2023-05-23 Intel Corporation Method and system for instruction block to execution unit grouping

Also Published As

Publication number Publication date
CN1581070A (en) 2005-02-16
TW200513961A (en) 2005-04-16
CN1291311C (en) 2006-12-20

Similar Documents

Publication Publication Date Title
TWI281121B (en) Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US6877089B2 (en) Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
CN101876891B (en) Microprocessor and method for quickly executing conditional branch instructions
TWI223195B (en) Optimized branch predictions for strongly predicted compiler branches
TWI519955B (en) Prefetcher, method of prefetch data and computer program product
CN101876889B (en) Method for performing a plurality of quick conditional branch instructions and relevant microprocessor
US20130073833A1 (en) Reducing store-hit-loads in an out-of-order processor
US20130152048A1 (en) Test method, processing device, test program generation method and test program generator
US11416256B2 (en) Selectively performing ahead branch prediction based on types of branch instructions
US10592248B2 (en) Branch target buffer compression
TW200525355A (en) Microprocessor and apparatus for performing speculative load operation from a stack memory cache
TWI251776B (en) Pipelined microprocessor, apparatus, and method for generating early instruction results
TW312775B (en) Context oriented branch history table
US20190369999A1 (en) Storing incidental branch predictions to reduce latency of misprediction recovery
US20030204705A1 (en) Prediction of branch instructions in a data processing apparatus
US7603545B2 (en) Instruction control method and processor to process instructions by out-of-order processing using delay instructions for branching
US11397685B1 (en) Storing prediction entries and stream entries where each stream entry includes a stream identifier and a plurality of sequential way predictions
CN102163139A (en) Microprocessor fusing loading arithmetic/logic operation and skip macroinstructions
US7900027B2 (en) Scalable link stack control method with full support for speculative operations
KR20230084140A (en) Restoration of speculative history used to make speculative predictions for instructions processed by processors employing control independence techniques
US20230195468A1 (en) Predicting upcoming control flow
JP2886838B2 (en) Apparatus and method for parallel decoding of variable length instructions in super scalar pipelined data processor
JP3851235B2 (en) Branch prediction apparatus and branch prediction method
JP3967363B2 (en) Branch prediction apparatus and branch prediction method