TWI283827B - Apparatus and method for efficiently updating branch target address cache - Google Patents

Apparatus and method for efficiently updating branch target address cache

Info

Publication number
TWI283827B
Authority
TW
Taiwan
Prior art keywords
instruction
branch
btac
target address
cache
Prior art date
Application number
TW093100409A
Other languages
Chinese (zh)
Other versions
TW200414034A (en)
Inventor
Thomas C Mcdonald
Original Assignee
Ip First Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ip First Llc
Publication of TW200414034A
Application granted
Publication of TWI283827B

Landscapes

  • Advance Control (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A microprocessor with a write queue for a branch target address cache (BTAC) is disclosed. The BTAC is read in parallel with an instruction cache in order to predict a target address of a branch instruction in the accessed cache line. In one embodiment, the BTAC is single-ported; hence, the single port must be shared for reading and writing. When the BTAC needs updating, such as when a branch target address is resolved, the microprocessor stores the branch target address and related information in the write queue. Thus, the write queue potentially enables updating of the BTAC to be delayed until the BTAC is not being read, such as when the instruction cache is idle, a misprediction by the BTAC is being corrected, or the prediction by the BTAC is being overridden. If the write queue becomes full, then it updates the BTAC anyway.

Description

IX. Description of the Invention:

The present invention relates to branch prediction in microprocessors, and in particular to branch prediction using a speculative branch target address cache.

Prior Art

Modern microprocessors are pipelined microprocessors. That is, several instructions can be operated on simultaneously in different blocks, or pipeline stages, of the microprocessor. John L. Hennessy and David A. Patterson, in Computer Architecture: A Quantitative Approach (second edition, Morgan Kaufmann Publishers, San Francisco, California, 1996), define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution." They provide an excellent description of pipelining:

A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.

Synchronous microprocessors operate according to clock cycles. Typically, on each clock cycle an instruction advances from one stage of the microprocessor pipeline to another stage.

In a vehicle assembly line, if the workers at one step sit idle because there is no car to work on, the output, or performance, of the line decreases. Similarly, if a microprocessor pipeline stage is idle during a clock cycle because it has no instruction to operate on, a condition commonly referred to as a pipeline bubble, the performance of the microprocessor decreases.

One potential cause of pipeline bubbles is branch instructions. When a branch instruction is encountered, the processor must determine the target address of the branch instruction and begin fetching instructions at the target address rather than at the next sequential address after the branch instruction. Furthermore, if the branch instruction is a conditional branch instruction (that is, a branch that may or may not be taken depending on whether a specified condition is present), the processor must determine whether the branch will be taken in addition to determining the target address. Because the pipeline stages that ultimately determine the target address and/or the branch outcome (that is, whether the branch is taken) are typically well below the stages that fetch instructions, bubbles may be created.
To address this problem, modern microprocessors typically employ branch prediction mechanisms to predict the target address and the branch outcome early in the pipeline. One example of a branch prediction mechanism is a branch target address cache (BTAC), which predicts the branch outcome and target address in parallel with the fetch of instructions from the microprocessor's instruction cache. When the microprocessor executes a branch instruction and finally determines that the branch is taken and resolves its target address, the address of the branch instruction and its target address are written into the BTAC. The next time the branch instruction is fetched from the instruction cache, the branch instruction address hits in the BTAC, and the BTAC supplies the branch target address early in the pipeline.

An effective BTAC improves processor performance by eliminating or reducing the number of bubbles that would otherwise be incurred waiting for the branch instruction to be resolved. However, when the BTAC mispredicts, the portion of the pipeline holding the incorrectly fetched instructions must be flushed and the correct instructions fetched, which creates bubbles in the pipeline while the flush and fetch take place. As microprocessor pipelines get deeper, the effectiveness of the BTAC becomes more critical to performance.

The effectiveness of the BTAC is largely a function of its hit rate. One factor that affects the BTAC hit rate is the number of different branch instructions for which it stores target addresses: the more branch target addresses stored, the more effective the BTAC. However, microprocessor die area is always limited, so there is pressure to make a given functional block, such as the BTAC, as small as possible. One factor that affects the physical size of the BTAC is the size of the storage cells that hold the target addresses and related information.
In particular, a single-ported cell is smaller than a multi-ported cell. A BTAC composed of single-ported cells can only be read or written, but not both, in a given clock cycle, whereas a BTAC composed of multi-ported cells can be read and written in the same clock cycle. However, a multi-ported BTAC is physically larger than a single-ported BTAC. This means that, for a given allowable physical size, a multi-ported BTAC must store fewer target addresses than a single-ported BTAC could, which reduces its effectiveness. From this perspective, a single-ported BTAC is preferable.

However, the fact that a single-ported BTAC can only be read or written in a given clock cycle can itself reduce BTAC effectiveness through false misses. A false miss occurs when the single-ported BTAC is being written, for example to update it with a new target address or to invalidate a target address, during a cycle in which it needs to be read. In that case the BTAC must signal a miss for the read because, while it is being written, it cannot supply the target address even though the address may be present in the BTAC.

Therefore, what is needed is an apparatus and method for reducing false misses in a single-ported BTAC.

Another phenomenon that can reduce BTAC effectiveness is storing the target address of the same branch instruction more than once. This can occur in a multi-way set-associative BTAC. Because BTAC space is limited, redundant target address storage reduces BTAC effectiveness, since the redundant entry could otherwise hold the target address of another branch instruction. The longer the pipeline, that is, the greater the number of stages, the more likely it is that redundant target addresses will be stored in the BTAC.
The most common situation in which the same branch instruction is cached multiple times in the BTAC is a tight loop of code. The branch instruction executes for the first time and its target address is written into the BTAC, say into way 2, because way 2 is the least recently used. However, before the target address is written into the BTAC, the branch instruction is encountered again; that is, the BTAC is looked up with the instruction cache fetch address and misses, because the target address has not yet been written. Consequently, the target address is written into the BTAC a second time. If an intervening BTAC read for a different branch instruction in the same set has caused way 2 to no longer be the least recently used, another way, say way 1, is chosen for the second write. The target address of the same branch instruction is now present twice in the BTAC. This wastes BTAC space and reduces BTAC effectiveness, because the second write likely overwrote a valid target address of another branch instruction.

Therefore, what is needed is an apparatus and method for avoiding the waste of useful BTAC space caused by redundantly caching the target address of the same branch instruction.

Furthermore, a combination of certain conditions associated with the speculative nature of the BTAC can cause a deadlock in the microprocessor. The combination of BTAC branch prediction, branch instructions that span instruction cache line boundaries, and the fact that processor bus transactions for speculative instruction fetches can produce error conditions may result in deadlock in certain cases.

Therefore, what is needed is an apparatus and method for avoiding deadlock conditions in a microprocessor that employs a speculative BTAC.

Summary of the Invention

The present invention provides a write queue for delaying writes to the BTAC until the BTAC is not being read, thereby reducing the false miss rate.
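The effect of deferring writes can be illustrated with a small cycle-level model. The sketch below is not taken from the patent: the class name `SinglePortBtac`, the default queue capacity of 4, and the dictionary used as storage are all assumptions made for illustration. It models a single-ported BTAC that services either a read or a write in a cycle, queues update requests, drains one request on any cycle without a read, and forces a write (so that cycle's read reports a miss) only when the queue is full, as the abstract describes.

```python
from collections import deque


class SinglePortBtac:
    """Toy model of a single-ported BTAC fronted by a write queue (illustrative only)."""

    def __init__(self, queue_capacity=4):  # capacity is an assumed value
        self.entries = {}             # branch address -> target address
        self.write_queue = deque()    # pending (branch address, target) updates
        self.queue_capacity = queue_capacity
        self.false_misses = 0

    def request_update(self, branch_addr, target_addr):
        """Queue an update instead of writing the BTAC immediately."""
        self.write_queue.append((branch_addr, target_addr))

    def _write_oldest(self):
        branch_addr, target_addr = self.write_queue.popleft()
        self.entries[branch_addr] = target_addr

    def cycle(self, read_addr=None):
        """Advance one clock; return the predicted target, or None on a miss."""
        queue_full = len(self.write_queue) >= self.queue_capacity
        if read_addr is None:
            # The port is free this cycle: drain one pending update, if any.
            if self.write_queue:
                self._write_oldest()
            return None
        if queue_full:
            # Forced write: the port is busy, so the read must report a miss.
            # It counts as a false miss only if the entry was actually present.
            if read_addr in self.entries:
                self.false_misses += 1
            self._write_oldest()
            return None
        return self.entries.get(read_addr)
```

In this model an update queued while reads are in flight simply waits, so a read finds its entry on the port rather than a write in progress; a false miss is counted only in the forced-write case. The model does not attempt to capture back-pressure on the requester or the specific idle conditions the patent enumerates later.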
In one aspect, the invention provides a write queue for improving the efficiency of a branch target address cache (BTAC) in a microprocessor. The write queue includes a request input that receives a request to update the BTAC; the request includes a branch instruction target address. The write queue also includes a plurality of storage elements that store the requests received on the request input. The write queue also includes control logic, coupled to the storage elements, that writes one of the requests stored in the storage elements to the BTAC in response to one or more predetermined conditions.

In another aspect, the invention provides a microprocessor. The microprocessor includes an instruction cache that provides a cache line of instruction bytes in response to an instruction fetch address. The microprocessor also includes a branch target address cache (BTAC), coupled to the instruction cache, that predicts a branch target address of a branch instruction present in the cache line. The microprocessor also includes a write queue, coupled to the BTAC, that stores branch target addresses for updating the BTAC.

In another aspect, the invention provides a method of updating a branch target address cache (BTAC) in a microprocessor. The method includes generating a request to update the BTAC and storing the request in a queue.

After the storing step, the method updates the BTAC according to the request.

In another aspect, the invention provides a computer data signal embodied in a transmission medium, comprising computer-readable program code for providing a microprocessor. The program code includes first program code for providing an instruction cache that provides a cache line of instruction bytes in response to an instruction fetch address. The program code includes second program code for providing a branch target address cache (BTAC), coupled to the instruction cache, for predicting a branch target address of a branch instruction present in the cache line. The program code includes third program code for providing a write queue, coupled to the BTAC, for storing branch target addresses for updating the BTAC.

An advantage of the invention is that it increases BTAC efficiency by reducing the number of false misses caused by writes to the BTAC that coincide with reads. In addition, the invention allows a single-ported BTAC to be used in place of a larger multi-ported BTAC, reducing the area of the BTAC. Furthermore, the invention enables the BTAC to store more target addresses, and hence be more efficient, than a multi-ported BTAC of similar size.

To make the above and other objects, features, and advantages of the invention more readily apparent, a preferred embodiment is described in detail below in conjunction with the accompanying drawings.

Detailed Description of the Embodiments

Referring now to Figure 1, a block diagram of a microprocessor 100 according to the present invention is shown. The microprocessor 100 comprises a pipelined microprocessor.

The microprocessor 100 includes an instruction fetcher 102. The instruction fetcher 102 fetches instructions 138 from a memory, such as system memory, coupled to the microprocessor 100. In one embodiment, the instruction fetcher 102 fetches instructions from memory at the granularity of a cache line.

In one embodiment, the instructions are variable-length instructions; that is, not all instructions in the instruction set of the microprocessor 100 have the same length. In one embodiment, the microprocessor 100 has an instruction set substantially compatible with the x86 architecture instruction set, whose instructions vary in length.

The microprocessor 100 also includes an instruction cache 104, coupled to the instruction fetcher 102. The instruction cache 104 receives cache lines of instruction bytes from the instruction fetcher 102 and caches them for subsequent use by the microprocessor 100. In one embodiment, the instruction cache 104 is a 64 KB, 4-way set-associative level-1 (L1) cache. When an instruction misses in the instruction cache 104, the instruction cache 104 notifies the instruction fetcher 102, which responsively fetches from memory the cache line containing the missing instruction. A current fetch address 162 is input to the instruction cache 104 to select a cache line. In one embodiment, a cache line in the instruction cache 104 comprises 32 bytes. The instruction cache 104 also generates an instruction cache idle signal 158.
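As a worked example of the geometry given above, a 64 KB, 4-way set-associative cache with 32-byte lines has 64 KB / (4 × 32 B) = 512 sets. The sketch below assumes a 32-bit fetch address and conventional indexing, in which the low-order bits form the byte offset and the next bits select the set. The source does not state which address bits the instruction cache 104 actually uses, so the field layout and the function name `split_fetch_address` are illustrative assumptions only.

```python
CACHE_BYTES = 64 * 1024   # 64 KB instruction cache (from the text)
WAYS = 4                  # 4-way set associative (from the text)
LINE_BYTES = 32           # 32-byte cache line (from the text)

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # 5 bits of byte offset
INDEX_BITS = NUM_SETS.bit_length() - 1          # 9 bits of set index


def split_fetch_address(addr):
    """Split a 32-bit fetch address into (tag, set index, byte offset).

    Assumes conventional low-order-bit indexing; this layout is an
    assumption, not something the source specifies.
    """
    addr &= 0xFFFFFFFF
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under these assumptions, two fetch addresses that differ only in their low five bits fall in the same cache line, and therefore select the same line in a single lookup.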
The instruction cache 104 generates a true value on the instruction cache idle signal 158 when it is idle, that is, when it is not being read. In one embodiment, if the instruction cache 104 is not being read, then the BTAC 142 of the microprocessor 100, discussed in detail below, is not being read either.

The microprocessor 100 also includes an instruction buffer 106, coupled to the instruction cache 104. The instruction buffer 106 receives cache lines of instruction bytes from the instruction cache 104 and buffers them until they are formatted into distinct instructions for execution by the microprocessor 100. In one embodiment, the instruction buffer 106 has four entries, storing up to four cache lines. The instruction buffer 106 generates an instruction buffer full signal 156.


The instruction buffer full signal 156 is true when the instruction buffer 106 is full. In one embodiment, if the instruction buffer 106 is full, the BTAC 142 is not read.

The microprocessor 100 also includes an instruction formatter 108, coupled to the instruction buffer 106. The instruction formatter 108 receives instruction bytes from the instruction buffer 106 and generates formatted instructions from them. That is, the instruction formatter 108 examines a string of instruction bytes in the instruction buffer 106, determines which bytes constitute the next instruction and its length, and outputs the next instruction and its length. In one embodiment, the formatted instructions are substantially compatible with the x86 architecture instruction set.

The instruction formatter 108 also includes logic for generating a branch target address, referred to as the override predicted target address 174. In one embodiment, the branch target address generation logic includes an adder that adds the displacement of a relative branch instruction to the branch instruction address to generate the override predicted target address 174.
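The adder described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `override_target`, the 32-bit address width, and the sign extension of the displacement field are not given in the source. The sketch follows the text in adding the displacement to the branch instruction address; an implementation that measures displacements from a different base address would pass that base instead.

```python
def override_target(branch_addr, displacement, displacement_bits=32):
    """Add a signed relative-branch displacement to the branch address.

    Hypothetical helper: a 32-bit address space and a two's-complement
    displacement field of `displacement_bits` bits are assumed.
    """
    sign_bit = 1 << (displacement_bits - 1)
    mask = (1 << displacement_bits) - 1
    disp = displacement & mask
    # Sign-extend the raw displacement field.
    if disp & sign_bit:
        disp -= 1 << displacement_bits
    # Wrap the sum to the assumed 32-bit address width.
    return (branch_addr + disp) & 0xFFFFFFFF
```

A backward branch is then simply a displacement whose sign bit is set, for example an 8-bit field of 0xF0 representing -16.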
In one embodiment, the logic includes a branch target buffer for generating target addresses of indirect branch instructions. In one embodiment, the logic includes a call/return stack for generating target addresses of call and return instructions. The instruction formatter 108 also generates a prediction override signal 154. The instruction formatter 108 generates a true value on the prediction override signal 154 to override a branch prediction made by the BTAC 142 of the microprocessor 100, as described in detail below. That is, if the target address generated by the logic in the instruction formatter 108 does not match the target address generated by the BTAC 142, the instruction formatter 108 generates a true value on the prediction override signal 154, which causes the instructions fetched because of the BTAC 142 prediction to be flushed and causes the microprocessor 100 to branch to the override predicted target address 174.

In one embodiment, the BTAC 142 is not read during the time in which the instructions are flushed and the microprocessor 100 branches to the override predicted target address 174.

The microprocessor 100 also includes a formatted instruction queue 112, coupled to the instruction formatter 108. The formatted instruction queue 112 receives formatted instructions from the instruction formatter 108 and buffers them until they are translated into microinstructions. In one embodiment, the formatted instruction queue 112 has entries for storing up to twelve formatted instructions, although Figure 12 shows only four entries.

The microprocessor 100 also includes an instruction translator 114, coupled to the formatted instruction queue 112. The instruction translator 114 translates the formatted instructions stored in the formatted instruction queue 112 into microinstructions. In one embodiment, the microprocessor 100 includes a reduced instruction set computer (RISC) core that executes the microinstructions of its native, or reduced, instruction set.

The microprocessor 100 also includes a translated instruction queue 116, coupled to the instruction translator 114. The translated instruction queue 116 receives translated microinstructions from the instruction translator 114 and buffers them until they can be executed by the remainder of the microprocessor pipeline.

The microprocessor 100 also includes a register stage 118, coupled to the translated instruction queue 116. The register stage 118 comprises a plurality of registers for storing instruction operands and results, including a user-visible register file that holds the user-visible state of the microprocessor 100.

The microprocessor 100 also includes an address stage 122, coupled to the register stage 118. The address stage 122 includes address generation logic that generates memory addresses for instructions that access memory, such as load or store instructions and branch instructions.

The microprocessor 100 also includes a data stage 124, coupled to the address stage 122. The data stage 124 includes logic for loading data from memory and one or more caches for caching the data loaded from memory.

The microprocessor 100 also includes an execute stage 126, coupled to the data stage 124. The execute stage 126 includes execution units that execute instructions, such as arithmetic and logic units that execute arithmetic and logic instructions. In one embodiment, the execute stage 126 includes an integer execution unit, a floating-point execution unit, an MMX execution unit, and an SSE execution unit. The execute stage 126 also includes logic for resolving branch instructions. In particular, the execute stage 126 determines whether a branch instruction is taken and whether the BTAC 142 previously mispredicted whether the branch instruction would be taken. In addition, the execute stage 126 determines whether the branch target address previously predicted by the BTAC 142 was mispredicted, that is, was incorrect.
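The two checks described above, on branch direction and on branch target, can be sketched as follows. The `Prediction` tuple and the `is_mispredicted` function are hypothetical names introduced for illustration; the source describes hardware resolution logic and does not specify how the comparison is implemented.

```python
from typing import NamedTuple, Optional


class Prediction(NamedTuple):
    """What the BTAC predicted for a branch (illustrative structure)."""
    taken: bool
    target: Optional[int]   # predicted target; meaningful only if taken


def is_mispredicted(pred, actual_taken, actual_target):
    """Return True if the resolved branch disagrees with the prediction."""
    if pred.taken != actual_taken:
        return True   # wrong direction
    # Same direction: a taken branch must also have the right target.
    return actual_taken and pred.target != actual_target
```

A not-taken branch that was predicted not taken is never a misprediction in this sketch, regardless of any stale target value.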
If the execute stage 126 determines that a previous branch prediction was incorrect, it generates a true value on the branch misprediction signal 152, which causes the instructions fetched because of the BTAC 142 misprediction to be flushed and causes the microprocessor 100 to branch to the correct address 172. In one embodiment, the BTAC 142 is not read during the time in which the instructions are flushed and the microprocessor 100 branches to the correct address 172.

The microprocessor 100 also includes a store stage 128, coupled to the execute stage 126. The store stage 128 includes logic for storing data to memory in response to store microinstructions. The store stage 128 generates the correct address 172, which is the correct branch target address of a branch instruction; that is, the correct address 172 is the non-speculative target address of the branch instruction. The correct address 172 is also written to the BTAC 142 when a branch instruction is executed and resolved, as described in detail below. The store stage 128 also generates a BTAC write request 176 for updating the BTAC 142. The BTAC write request 176 is described in detail with reference to Figure 7.

95-9-13 細描述。 微處理器100也包括一寫回階段132,耦合至儲存階 段128。寫回階段132包括將指令結果寫至暫存器階段118 之邏輯電路。 微處理器100也包括BTAC142。BTAC142包括可快 取目標位址與其他分支預測資訊之快取記憶體。BTAC142 回應於從一多工器148接收之一位址182而產生一預測目 標位址164。在一實施例中,BTAC142包括單埠快取記憶 體,被BTAC142之讀取與寫入存取所共享,因而使得 BTAC142有僞性落空(false miss)之機率。BTAC142與多工 器148將於底下詳述。 微處理器1〇〇也包括一第二多工器136,耦合至 BTAC142。多工器136選擇6個輸入之一以輸出成一目前 擷取位址162。輸入之一是由一加法器134所產生之一下一 擷取位址166,加法器134對目前擷取位址162加上快取線 之大小以產生該下一擷取位址166。在從指令快取104正常 擷取一快取線後,多工器136選擇該下一擷取位址166以 輸出成該目前擷取位址162。另一輸入是目前擷取位址 162。另一輸入是BTAC預測目標位址164,如果BTAC142 指示一分枝指令存在於從該指令快取104之該目前擷取位 址162所擇出之該快取線內且BTAC142預測出該分支指令 要被執行,則多工器136選擇BTAC預測目標位址164。另 一輸入是從儲存階段128接收之正確位址172,多工器136 選擇正確位址172以校正一分支誤測。另一輸入是從指令 規格化器108接收之取代預測目標位址174,多工器136 9-12838¾ 828twfl .doc/006 95-9-13 選擇該取代預測目標位址174以取代該BTAC測試目標位 址164。另一輸入是一目前指令指標168,其指向目前正被 該指令規格化器1〇8規格化之指令之位址。多工器136選 擇該目前指令指標168以避免死結情況,如下述。 微處理器1〇〇也包括一 BTAC寫入佇列(BWQ)144,耦 合至BTAC142。BTAC寫入佇列144包括複數儲存元件以 暫存BTAC寫入要求176,直到其可被寫入至BTAC142爲 止。BTAC寫入佇列144接收該分支誤測信號152,該預測 取代信號154,該指令緩衝器全滿信號156,與該指令快取 閒置信號158。有利的是,BTAC寫入佇列144能利用BTAC 寫入要求176來延遲BTAC142之更新,直到輸入信號 152〜158所指示之適當時間,亦即BTAC142未被讀取之時 間,以增加BTAC142之效率,將於底下詳述。 BTAC寫入佇列144產生一 BTAC寫入佇列位址178, 其輸入至多工器148。BTAC寫入佇列144也包括儲存一目 前佇列深度146之一暫存器。佇列深度146指出目前存於 BWQ144內之有效BTAC寫入要求176之數量。佇列深度 146之初始値爲0。每次將一 BTAC寫入要求176存至BTAC 寫入佇列144內,佇列深度146都會增加。每次將一 BTAC 寫入要求176從BWQ144移走,佇列深度146都會減少。 BTAC寫入佇列144將於底下詳述。 現參考第2圖,顯示根據本發明之第1圖之微處理器 之部份詳細方塊圖。第2圖顯示BTAC寫入佇列144 ’ BTAC142與第1圖之多工器148,另增加一仲裁器202,以 及耦合於該BTAC寫入佇列144與該BTAC142間之3-輸入 I283827s28 .「Π V \ vy twfl .doc/00695-9-13 Detailed description. Microprocessor 100 also includes a write back stage 132 coupled to storage stage 128. The write back phase 132 includes logic to write the result of the instruction to the scratchpad stage 118. Microprocessor 100 also includes BTAC 142. The BTAC 142 includes a cache memory that can cache the target address and other branch prediction information. The BTAC 142 generates a predicted target address 164 in response to receiving one of the addresses 182 from a multiplexer 148. 
In one embodiment, the BTAC 142 includes a cache memory that is shared by the read and write accesses of the BTAC 142, thereby causing the BTAC 142 to have a false miss probability. BTAC 142 and multiplexer 148 will be detailed below. The microprocessor 1A also includes a second multiplexer 136 coupled to the BTAC 142. Multiplexer 136 selects one of the six inputs to output a current capture address 162. One of the inputs is a next fetch address 166 generated by an adder 134, and the adder 134 adds the size of the cache line to the current fetch address 162 to generate the next fetch address 166. After a cache line is normally retrieved from the instruction cache 104, the multiplexer 136 selects the next capture address 166 to output the current capture address 162. Another input is the currently retrieved address 162. The other input is the BTAC prediction target address 164, if the BTAC 142 indicates that a branch instruction exists in the cache line selected from the current capture address 162 of the instruction cache 104 and the BTAC 142 predicts the branch instruction To be executed, multiplexer 136 selects BTAC prediction target address 164. The other input is the correct address 172 received from the storage phase 128, and the multiplexer 136 selects the correct address 172 to correct for a branch misdetection. The other input is the substitute predicted target address 174 received from the instruction normalizer 108, and the multiplexer 136 9-128383⁄4 828 twfl.doc/006 95-9-13 selects the replacement predicted target address 174 to replace the BTAC test target. Address 164. The other input is a current command indicator 168 that points to the address of the instruction currently being normalized by the instruction normalizer 1-8. Multiplexer 136 selects the current command indicator 168 to avoid deadlock conditions, as described below. The microprocessor 1A also includes a BTAC write queue (BWQ) 144 coupled to the BTAC 142. 
The BTAC write queue 144 includes a plurality of storage elements that buffer BTAC write requests 176 until they can be written to the BTAC 142. The BTAC write queue 144 receives the branch misprediction signal 152, the prediction override signal 154, the instruction buffer full signal 156, and the instruction cache idle signal 158. Advantageously, the BTAC write queue 144 delays updating the BTAC 142 with BTAC write requests 176 until an opportune time indicated by input signals 152 through 158, namely a time when the BTAC 142 is not being read, which increases the efficiency of the BTAC 142, as described in detail below.

The BTAC write queue 144 generates a BTAC write queue address (BWQ address) 178 that is provided as an input to multiplexer 148. The BTAC write queue 144 also includes a register that stores a current queue depth 146. The queue depth 146 specifies the number of valid BTAC write requests 176 currently stored in the BWQ 144. The queue depth 146 is initialized to zero. Each time a BTAC write request 176 is stored into the BTAC write queue 144, the queue depth 146 is incremented. Each time a BTAC write request 176 is removed from the BWQ 144, the queue depth 146 is decremented. The BTAC write queue 144 is described in detail below.

Referring now to Figure 2, a block diagram illustrating portions of the microprocessor 100 of Figure 1 in more detail according to the present invention is shown. Figure 2 shows the BTAC write queue 144, BTAC 142, and multiplexer 148 of Figure 1, along with an arbiter 202 and a three-input multiplexer 206 coupled between the BTAC write queue 144 and the BTAC 142.

Although Figure 1 shows multiplexer 148 receiving only two inputs, multiplexer 148 is a four-input multiplexer, as shown in Figure 2. As shown in Figure 2, the BTAC 142 includes a read/write input, an address input, and a data input.

As shown in Figure 1, multiplexer 148 receives the current fetch address 162 and the BWQ address 178. In addition, multiplexer 148 receives a redundant TA address 234 and a deadlock address 236, which are described in detail with respect to Figures 10-11 and Figures 12-13, respectively. Multiplexer 148 selects one of its four inputs, based on a control signal 258 generated by the arbiter 202, to output as the address 182 of Figure 1, which is provided to the address input of the BTAC 142.

Multiplexer 206 receives a redundant TA data signal 244 and a deadlock data signal 246, which are described in detail with respect to Figures 10-11 and Figures 12-13, respectively. Multiplexer 206 also receives a BWQ data signal 248 from the BTAC write queue 144, which carries the data of the current BTAC write queue 144 request to update the BTAC 142.
Multiplexer 206 selects one of its three inputs, based on a control signal 262 generated by the arbiter 202, to output as a data signal 256 that is provided to the data input of the BTAC 142.

The arbiter 202 arbitrates among the multiple sources that request access to the BTAC 142. The arbiter 202 generates a signal 252 to the read/write input of the BTAC 142 to control when the BTAC 142 is read or written. The arbiter 202 receives a BTAC read request signal 212, which indicates a request to read the BTAC 142 using the current fetch address 162, in parallel with a read of the instruction cache 104 that also uses the current fetch address 162. The arbiter 202 also receives a redundant target address (TA) request signal 214, which indicates a request to invalidate a redundant entry in the BTAC 142 for the same branch instruction within the set selected by the redundant TA address 234, as described below.

The arbiter 202 also receives a deadlock request signal 216, which indicates a request to invalidate an entry in the BTAC 142, within the set selected by the deadlock address 236, that mispredicted that a branch instruction does not wrap across a cache line boundary, as described below. The arbiter 202 also receives a BWQ not-empty signal 218 from the BTAC write queue 144, which indicates that at least one request is pending to update an entry in the BTAC 142 within the set selected by the BWQ address 178, as described below. The arbiter 202 also receives a BWQ full signal 222 from the BTAC write queue 144, which indicates that the BTAC write queue 144 is full of pending requests to update an entry in the BTAC 142 within the set selected by the BWQ address 178, as described below.

In one embodiment, the arbiter 202 assigns priority as shown in Table 1 below, where 1 is the highest priority and 5 is the lowest:

1- deadlock request 216
2- BWQ full 222
3- BTAC read request 212
4- redundant TA request 214
5- BWQ not empty 218

Referring now to Figure 3, a block diagram illustrating the BTAC 142 of Figure 1 in more detail according to the present invention is shown. As shown in Figure 3, the BTAC 142 includes a target address array 302, a tag array 304, and a counter array 306. Each of the arrays 302, 304, and 306 receives the address 182 of Figure 1. The embodiment of Figure 3 shows a 4-way set-associative BTAC 142 cache memory. In another embodiment, the BTAC 142 includes a 2-way set-associative cache memory.
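The fixed priority of Table 1 can be illustrated with a short sketch. The following Python model is only an illustration under stated assumptions: the enum and function names are hypothetical, and the description gives only the priority order and states that signal 212 is a read request while the other four sources update or invalidate entries.

```python
from enum import IntEnum


class Request(IntEnum):
    """Requesters seen by arbiter 202; a lower value means higher priority (Table 1)."""
    DEADLOCK = 1       # deadlock request 216
    BWQ_FULL = 2       # BWQ full 222
    BTAC_READ = 3      # BTAC read request 212
    REDUNDANT_TA = 4   # redundant TA request 214
    BWQ_NOT_EMPTY = 5  # BWQ not empty 218


def arbitrate(pending):
    """Return the highest-priority pending request, or None if nothing is pending."""
    return min(pending, default=None)


def is_write(request):
    """Every requester except the BTAC read request writes the BTAC (assumed model)."""
    return request is not Request.BTAC_READ
```

In this model a read always wins over a non-full write queue, which is why queued writes normally wait for a cycle in which no read is pending, while a full queue takes the port from a pending read.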
In one embodiment, the target address array 302 and the tag array 304 are single-ported, whereas the counter array 306 is dual-ported, with one read port and one write port, because the counter array 306 is updated more frequently than the target address array 302 and the tag array 304.

The target address array 302 includes an array of storage elements that store target address array entries 312, which cache branch target addresses and related branch prediction information. The contents of a target address array entry 312 are described below with respect to Figure 4. The tag array 304 includes an array of storage elements that store tag array entries 314, which hold address tags and related branch prediction information. The contents of a tag array entry 314 are described below with respect to Figure 5. The counter array 306 includes an array of storage elements that store counter array entries 316, which hold branch outcome prediction information. The contents of a counter array entry 316 are described below with respect to Figure 6.

Each of the target address array 302, tag array 304, and counter array 306 is organized into four ways, shown as way 0, way 1, way 2, and way 3. Preferably, each way of the target address array 302 stores two entries, or portions, for caching a branch target address and speculative branch information, denoted A and B, so that if two branch instructions are present in a cache line, the BTAC 142 can make a prediction for the appropriate branch instruction.

Each of the arrays 302-306 is indexed by the address 182 of Figure 1. The lower bits of the address 182 select a line within each of the arrays 302-306. In one embodiment, each of the arrays 302-306 includes 128 sets.
Therefore, the BTAC 142 can cache up to 1024 target addresses: two for each of the four ways of each of the 128 sets. Preferably, the arrays 302-306 are indexed by bits [11:5] of the address 182 to select a 4-way set within the BTAC 142.

Referring now to Figure 4, the contents of a target address array entry 312 of Figure 3 according to the present invention are shown.

The target address array entry 312 includes a branch target address (TA) 402. In one embodiment, the target address 402 is a 32-bit address cached from a previous execution of a branch instruction. The BTAC 142 provides the target address 402 on the predicted TA output 164.

The target address array entry 312 also includes a start field 404. The start field 404 specifies the byte offset of the first byte of the branch instruction within a cache line output by the instruction cache 104 in response to the current fetch address 162. In one embodiment, a cache line is 32 bytes; hence, the start field 404 is 5 bits.

The target address array entry 312 also includes a wrap bit 406. The wrap bit 406 is true if the predicted branch instruction wraps across two cache lines of the instruction cache 104. The BTAC 142 provides the wrap bit 406 on a B_wrap signal 1214, discussed below with respect to Figure 12.

Referring now to Figure 5, the contents of a tag array entry 314 of Figure 3 according to the present invention are shown.

The tag array entry 314 includes a tag 502. In one embodiment, the tag 502 is the upper 20 bits of the address of the branch instruction for which the corresponding entry in the target address array 302 stores a predicted target address 402. If the entry is valid, the BTAC 142 compares the tag 502 with the upper 20 bits of the address 182 of Figure 1 to determine whether the entry matches the address 182, that is, whether the address 182 hits in the BTAC 142.
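The bit fields named above (a 20-bit tag, the 7-bit index [11:5], and a 5-bit byte offset within a 32-byte line) and the 1024-entry capacity can be checked with a small worked example. This is a minimal sketch assuming a 32-bit address; the function and constant names are illustrative and do not come from the description.

```python
NUM_SETS = 128          # sets per array in the described embodiment
NUM_WAYS = 4            # 4-way set-associative
PORTIONS_PER_WAY = 2    # the A and B portions of each way


def split_address(address):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = address & 0x1F          # bits [4:0]: byte within a 32-byte cache line
    index = (address >> 5) & 0x7F    # bits [11:5]: selects one of 128 sets
    tag = (address >> 12) & 0xFFFFF  # bits [31:12]: upper 20 bits, compared with tag 502
    return tag, index, offset


def capacity():
    """Maximum number of cached target addresses."""
    return NUM_SETS * NUM_WAYS * PORTIONS_PER_WAY
```

For example, `split_address(0x12345678)` yields tag `0x12345`, set index 51, and byte offset 24, and `capacity()` yields 1024.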
The tag array entry 314 also includes an A valid bit 504, which is true if the target address 402 in the A portion of the corresponding entry in the target address array 302 is valid.

The tag array entry 314 also includes a B valid bit 506, which is true if the target address 402 in the B portion of the corresponding entry in the target address array 302 is valid.

The tag array entry 314 also includes a 3-bit lru field 508, which indicates which of the four ways of the selected set is least recently used (lru). In one embodiment, the BTAC 142 updates the lru field 508 only when a BTAC branch is performed; that is, only when the BTAC 142 predicts that a branch instruction will be taken and the microprocessor 100 branches, based on that prediction, to the predicted target address 164 provided by the BTAC 142. The BTAC 142 updates the lru field 508 while the BTAC branch is being performed, during which time the BTAC 142 is not being read, so the update does not require the BTAC write queue 144.

Referring now to Figure 6, the contents of a counter array entry 316 of Figure 3 according to the present invention are shown.

The counter array entry 316 includes a prediction state A counter 602.
In one embodiment, the prediction state A counter 602 is a 2-bit saturating counter that counts up each time the microprocessor 100 determines that the associated branch instruction is taken and counts down each time the associated branch instruction is not taken. The prediction state A counter 602 saturates at the binary value b'11 when counting up and at the binary value b'00 when counting down. In one embodiment, if the value of the prediction state A counter 602 is b'11 or b'10, the BTAC 142 predicts that the branch instruction associated with the A portion of the selected target address array entry 312 will be taken; otherwise, the BTAC 142 predicts that the branch instruction will not be taken. The counter array entry 316 also includes a prediction state B counter 604, which operates like the prediction state A counter 602 but is associated with the B portion of the selected target address array entry 312.
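The behavior of the 2-bit saturating counters 602 and 604 can be sketched as follows. The function names are illustrative assumptions; the saturation values and the taken threshold are those stated above.

```python
def update_counter(counter, taken):
    """Return the next value of a 2-bit saturating prediction counter."""
    if taken:
        return min(counter + 1, 0b11)  # saturate at b'11 when counting up
    return max(counter - 1, 0b00)      # saturate at b'00 when counting down


def predicts_taken(counter):
    """The branch is predicted taken when the counter holds b'10 or b'11."""
    return counter >= 0b10
```

Because the counter must move two steps to cross the threshold from a saturated state, a single atypical outcome does not flip a strongly established prediction.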

The counter array entry 316 also includes an A/B lru bit 606. A binary value of b'1 in the A/B lru bit 606 indicates that the A portion of the selected target address array entry 312 is least recently used; otherwise, the B portion of the selected target address array entry 312 is least recently used. In one embodiment, the A/B lru bit 606 is updated, along with the prediction state A and B counters 602 and 604, when the branch instruction reaches the store stage 128, where the branch outcome (that is, whether the branch is taken) is determined. In one embodiment, updating a counter array entry 316 does not require the BTAC write queue 144, because the counter array 306 has one read port and one write port, as described above with respect to Figure 3.

Referring now to Figure 7, the contents of a BTAC write request 176 of Figure 1 according to the present invention are shown. Figure 7 shows the information for updating an entry in the BTAC 142 that the store stage 128 generates on the BTAC write request signal 176 provided to the BTAC write queue 144; the same information forms the contents of an entry stored in the BTAC write queue 144, as shown in Figure 8.

The BTAC write request 176 includes a branch instruction address field 702, which is the address of the previously executed branch instruction for which the BTAC 142 is to be updated. When the write request 176 subsequently updates the BTAC 142, the upper 20 bits of the branch instruction address 702 are stored into the tag field 502 of the tag array entry 314 of Figure 5. The seven bits [11:5] of the branch instruction address 702 are used as the index into the BTAC 142. In one embodiment, the branch instruction address 702 is a 32-bit field.

The BTAC write request 176 also includes a start field 708, which is stored into the start field 404 of Figure 4.


The BTAC write request 176 also includes a wrap bit 712, which is stored into the wrap bit 406 of Figure 4.

The BTAC write request 176 also includes a write-enable-A field 714, which specifies whether the A portion of the selected target address array entry 312 is to be updated with the information specified in the BTAC write request 176.
The BTAC write request 176 also includes a write-enable-B field 716, which specifies whether the B portion of the selected target address array entry 312 is to be updated with the information specified in the BTAC write request 176.

The BTAC write request 176 also includes an invalidate-A field 718, which specifies whether the A portion of the selected target address array entry 312 is to be invalidated. Invalidating the A portion of the selected target address array entry 312 comprises clearing the A valid bit 504 of Figure 5. The BTAC write request 176 also includes an invalidate-B field 722, which specifies whether the B portion of the selected target address array entry 312 is to be invalidated. Invalidating the B portion of the selected target address array entry 312 comprises clearing the B valid bit 506 of Figure 5.

The BTAC write request 176 also includes a 4-bit way field 724, which specifies which of the four ways of the selected set is to be updated. The way field 724 is fully decoded. In one embodiment, when the microprocessor 100 reads the BTAC 142 to obtain a branch prediction, the microprocessor 100 determines the value to be placed in the way field 724 and passes the value down through the pipeline stages to the store stage 128 for inclusion in the BTAC write request 176. If the microprocessor 100 is updating an existing entry in the BTAC 142, that is, if the current fetch address 162 hit in the BTAC 142, the microprocessor 100 places the way of the existing entry in the way field 724. If the microprocessor 100 is writing a new entry into the BTAC 142, such as for a new branch instruction, the microprocessor 100 places the least recently used way of the selected BTAC 142 set in the way field 724. In one embodiment, the microprocessor 100 determines the least recently used way from the lru field 508 of Figure 5 when it reads the BTAC 142 to obtain a branch prediction.
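The write request fields and the choice of way can be summarized in a short sketch. It assumes that "fully decoded" means one bit per way (a one-hot encoding); that reading, the Python names, and the representation of a miss as `None` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BtacWriteRequest:
    """Fields of BTAC write request 176 (Python names are illustrative)."""
    branch_address: int    # branch instruction address 702
    target_address: int    # target address 706, referenced elsewhere in the description
    start: int             # start field 708
    wrap: bool             # wrap bit 712
    write_enable_a: bool   # write-enable-A field 714
    write_enable_b: bool   # write-enable-B field 716
    invalidate_a: bool     # invalidate-A field 718
    invalidate_b: bool     # invalidate-B field 722
    way: int               # way field 724: 4 bits, assumed one bit per way


def way_field(hit_way: Optional[int], lru_way: int) -> int:
    """Pick the way to update: the hit way on a hit, else the least recently used way."""
    way = lru_way if hit_way is None else hit_way
    return 1 << way  # one-hot encoding of the chosen way (assumed)
```

A hit in way 2 therefore gives a way field of `0b0100`, while a miss with way 3 least recently used gives `0b1000`.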
Referring now to Figure 8, a block diagram of the BTAC write queue 144 of Figure 1 according to the present invention is shown.

The BTAC write queue 144 includes a plurality of storage elements 802 that store the BTAC write requests 176 of Figure 7. In one embodiment, the BTAC write queue 144 includes six storage elements 802 that store six BTAC write requests 176, as shown.

The BTAC write queue 144 also includes a valid bit 804 associated with each BTAC write request entry 802. The valid bit 804 is true if the associated entry is valid and false if the associated entry is invalid.

The BTAC write queue 144 also includes control logic 806 coupled to the storage elements 802 and the valid bits 804. Control logic 806 is also coupled to the queue depth register 146. Control logic 806 increments the queue depth 146 when a BTAC write request 176 is loaded into the BTAC write queue 144 and decrements the queue depth 146 when a BTAC write request 176 is removed from the BTAC write queue 144. Control logic 806 receives the BTAC write request signal 176 from the store stage 128 of Figure 1 and stores the received request into an entry 802. Control logic 806 also receives the branch misprediction signal 152, prediction override signal 154, instruction buffer full signal 156, and instruction cache idle signal 158 of Figure 1. When the queue depth 146 is greater than zero, control logic 806 generates a true value on the BWQ not-empty signal 218 of Figure 2. When the queue depth 146 equals the total number of entries 802 (six in the embodiment of Figure 8), control logic 806 generates a true value on the BWQ full signal 222 of Figure 2.

When control logic 806 generates a true value on the BWQ not-empty signal 218, it places the branch instruction address 702 of the oldest (bottom) entry 802 of the BTAC write queue 144 on the BWQ address signal 178 of Figure 1. In addition, when control logic 806 generates a true value on the BWQ not-empty signal 218, it places fields 706 through 724 of Figure 7 of the oldest (bottom) entry 802 of the BTAC write queue 144 on the BWQ data signal 248.

Referring now to Figure 9, a flowchart illustrating operation of the BTAC write queue 144 of Figure 1 according to the present invention is shown. Flow begins at decision block 902.

At decision block 902, the BTAC write queue 144 determines whether it is full by determining whether the queue depth 146 of Figure 1 equals the total number of entries in the BTAC write queue 144. If so, flow proceeds to block 918 to update the BTAC 142; otherwise, flow proceeds to decision block 904.

At decision block 904, the BTAC write queue 144 determines whether the instruction cache 104 of Figure 1 is idle by examining the instruction cache idle signal 158. If so, flow proceeds to decision block 922 to update the BTAC 142 if necessary, since the BTAC 142 is likely not being read; otherwise, flow proceeds to decision block 906.

At decision block 906, the BTAC write queue 144 determines whether the instruction buffer 106 of Figure 1 is full by examining the instruction buffer full signal 156. If so, flow proceeds to decision block 922 to update the BTAC 142 if necessary, since the BTAC 142 is likely not being read; otherwise, flow proceeds to decision block 908.

At decision block 908, the BTAC write queue 144 determines whether a BTAC 142 branch prediction has been overridden by examining the prediction override signal 154. If so, flow proceeds to decision block 922 to update the BTAC 142 if necessary, since the BTAC 142 is likely not being read; otherwise, flow proceeds to decision block 912.

At decision block 912, the BTAC write queue 144 determines whether a BTAC 142 branch misprediction is being corrected by examining the branch misprediction signal 152. If so, flow proceeds to decision block 922 to update the BTAC 142 if necessary, since the BTAC 142 is likely not being read; otherwise, flow proceeds to decision block 914.
At decision block 914, the BTAC write queue 144 determines whether a BTAC write request 176 has been generated. If not, flow returns to decision block 902; otherwise, flow proceeds to block 916.

At block 916, the BTAC write queue 144 loads the BTAC write request 176 and increments the queue depth 146. The BTAC write request 176 is loaded into the top-most entry of the BTAC write queue 144 that is not valid, and that entry is then marked valid. Flow returns to decision block 902.

At block 918, the BTAC write queue 144 updates the BTAC 142 with the oldest (bottom) entry in the BTAC write queue 144 and decrements the queue depth 146. The BTAC write queue 144 is then shifted down by one entry. The BTAC write queue 144 updates the BTAC 142 with its oldest entry by placing the value of the branch instruction address field 702 of Figure 7 of the oldest entry on the BWQ address signal 178 and placing the remainder of the oldest BTAC write request 176 on the BWQ data signal 248. In addition, the BTAC write queue 144 asserts a true value on the BWQ not-empty signal 218 to the arbiter 202 of Figure 2.


If flow reached block 918 from decision block 902, the BTAC write queue 144 also asserts a true value on the BWQ full signal 222 to the arbiter 202 of Figure 2. Flow proceeds from block 918 to decision block 914.

Note that if the BTAC write queue 144 asserts the BWQ full signal 222 while a BTAC read request signal 212 is also pending, and the arbiter 202 grants the BTAC write queue 144 access to the BTAC 142, then the BTAC 142 signals a miss. That miss is a false miss if a valid target address for a branch instruction that the BTAC 142 would have predicted is present in the BTAC 142 for the cache line specified by the current fetch address 162. Advantageously, however, the BTAC write queue 144 reduces the likelihood of a false miss in the BTAC 142 by delaying writes of the BTAC 142, in most cases, until the BTAC 142 is not being read, as shown in Figure 9.

At decision block 922, control logic 806 determines whether the BTAC write queue 144 is empty by determining whether the queue depth 146 equals zero. If so, flow proceeds to decision block 914; otherwise, flow proceeds to block 918 to update the BTAC 142, since the BTAC 142 is likely not being read.
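The flow of Figure 9 can be modeled in software as one pass per call. The sketch below is a simplified behavioral model under stated assumptions, not the circuit: the class and parameter names are hypothetical, the default capacity of six follows the embodiment of Figure 8, and the four "BTAC not being read" conditions are OR-ed together because each of them leads to decision block 922.

```python
from collections import deque


class BtacWriteQueue:
    """Behavioral model of BTAC write queue 144 following the Figure 9 flow."""

    def __init__(self, capacity=6):
        self.capacity = capacity
        self.entries = deque()  # left end holds the oldest (bottom) entry

    @property
    def depth(self):
        return len(self.entries)  # queue depth 146

    @property
    def not_empty(self):
        return self.depth > 0  # BWQ not-empty signal 218

    @property
    def full(self):
        return self.depth == self.capacity  # BWQ full signal 222

    def step(self, *, icache_idle=False, buffer_full=False,
             overridden=False, mispredicted=False, request=None):
        """Run one pass of the flow; return the entry written to the BTAC, or None."""
        written = None
        btac_not_read = icache_idle or buffer_full or overridden or mispredicted
        if self.full:                             # block 902 -> block 918
            written = self.entries.popleft()
        elif btac_not_read and self.not_empty:    # blocks 904-912 -> 922 -> 918
            written = self.entries.popleft()
        if request is not None:                   # block 914 -> block 916
            self.entries.append(request)
        return written
```

In this model a queued write drains only when the queue is full or when one of the four signals suggests that the BTAC is not being read, which mirrors the stated goal of reducing false misses.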
Referring now to Figure 10, a block diagram of logic within the microprocessor 100 of Figure 1 for invalidating a redundant target address in the BTAC 142 according to the present invention is shown.

Figure 10 shows the tag array 304 of the BTAC 142 of Figure 3 receiving the address 182 of Figure 1 and responsively generating four tags, denoted tag0 1002A, tag1 1002B, tag2 1002C, and tag3 1002D, referred to collectively as tags 1002. The tags 1002 comprise the tag 502 of Figure 5 from each of the four ways of the tag array 304. In addition, the tag array 304 responsively generates eight valid bits [7:0], denoted 1004, which are the A valid bit 504 and B valid bit 506 from each of the four ways of the tag array 304.


Microprocessor 100 also includes comparators 1012, coupled to the tag array 304, that receive the address 182. In the embodiment of Figure 10, the comparators 1012 comprise four 20-bit comparators, each of which compares the upper 20 bits of the address 182 with a corresponding one of the tags 1002 to generate four match signals, denoted match0 1006A, match1 1006B, match2 1006C, and match3 1006D, referred to collectively as match signals 1006. A comparator 1012 generates a true value on its match signal 1006 if the address 182 matches the corresponding tag 1002.

Microprocessor 100 also includes control logic 1014, coupled to the comparators 1012, that receives the match signals 1006 and the valid bits 1004. If more than one way of the selected set of the tag array 304 has both a true match signal 1006 and at least one true valid bit 1004, control logic 1014 stores a true value in a redundant TA flag register 1024 to indicate that more than one valid target address for the same branch instruction is stored in the BTAC 142. In addition, control logic 1014 causes the address 182 to be loaded into a redundant TA address register 1026.
Finally, the control logic 1014 loads redundant TA invalidate data into the redundant TA invalidate data register 1022. In one embodiment, the data stored in the redundant TA invalidate data register 1022 is similar to the BTAC write request 176 of FIG. 7, except that the branch instruction address 702 is not stored, because the address of the branch instruction is held in the redundant TA address register 1026; and the target address 706, start bit 708, and wrap bit 712 are not stored, because they are don't-cares in an invalidated BTAC 142 entry. Consequently, when a redundant TA invalidation is performed, the target address array 302 is not written; only the tag array 304 is updated to invalidate the redundant BTAC 142 entries.

The output of the redundant TA invalidate data register 1022 comprises the redundant TA invalidate data signal 244 of FIG. 2. The output of the redundant TA flag register 1024 comprises the redundant TA request 214 of FIG. 2. The output of the redundant TA address register 1026 comprises the redundant TA address 234 of FIG. 2.

In one embodiment, the equations that generate the way value 724 stored in the redundant TA invalidate data register 1022 and the value stored in the redundant TA flag register 1024 are shown in Table 2 below. In Table 2, valid[3] is the logical OR of A valid[3] 504 and B valid[3] 506; valid[2] is the logical OR of A valid[2] 504 and B valid[2] 506; valid[1] is the logical OR of A valid[1] 504 and B valid[1] 506; and valid[0] is the logical OR of A valid[0] 504 and B valid[0] 506.

Table 2

    RedundantInvalWay[3] = (valid[3] & match[3]) &
                           ((valid[0] & match[0]) | (valid[1] & match[1]) | (valid[2] & match[2]));
    RedundantInvalWay[2] = (valid[2] & match[2]) &
                           ((valid[0] & match[0]) | (valid[1] & match[1]));
    RedundantInvalWay[1] = (valid[1] & match[1]) & (valid[0] & match[0]);
    RedundantInvalWay[0] = 0; /* Way 0 is never invalidated */
    RedundantTAFlag = ((valid[3] & match[3]) & (valid[2] & match[2])) |
                      ((valid[3] & match[3]) & (valid[1] & match[1])) |
                      ((valid[3] & match[3]) & (valid[0] & match[0])) |
                      ((valid[2] & match[2]) & (valid[1] & match[1])) |
                      ((valid[2] & match[2]) & (valid[0] & match[0])) |
                      ((valid[1] & match[1]) & (valid[0] & match[0]));
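The equations of Table 2 can be restated in software. The Python below is a minimal sketch for illustration only, not the circuit itself: the function names, the list-based representation of the ways, and the 32-bit address width are assumptions introduced here, while the four ways, the 20-bit tag comparison, and the invalidation rule are taken from the description above.

```python
from typing import List, Tuple

NUM_WAYS = 4    # FIG. 10 compares four ways
TAG_BITS = 20   # the upper 20 bits of address 182 are compared with each tag 1002
ADDR_BITS = 32  # assumption: 32-bit address (not stated in this section)


def way_matches(address: int, tags: List[int]) -> List[bool]:
    """match[i] is true when the upper 20 address bits equal the tag of way i."""
    upper = (address >> (ADDR_BITS - TAG_BITS)) & ((1 << TAG_BITS) - 1)
    return [tag == upper for tag in tags]


def redundant_ta(valid_a, valid_b, match) -> Tuple[List[bool], bool]:
    """Return (RedundantInvalWay[0..3], RedundantTAFlag) as in Table 2."""
    # valid[i] is the OR of A valid bit 504 and B valid bit 506 of way i.
    hit = [bool((valid_a[i] or valid_b[i]) and match[i]) for i in range(NUM_WAYS)]
    # A way is invalidated when it hits and some lower-numbered way also hits,
    # so way 0 is never invalidated.
    inval = [hit[i] and any(hit[:i]) for i in range(NUM_WAYS)]
    # The flag is the OR of every pair of hits, i.e. at least two ways hit.
    flag = sum(hit) >= 2
    return inval, flag
```

Writing the rule as "a hit with a lower-numbered hit" keeps exactly one copy (the lowest-numbered matching way) regardless of how many duplicates exist, which is the behaviour the four RedundantInvalWay equations encode.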

To appreciate the operation of the redundant target address invalidation logic of FIG. 10, which is illustrated in FIG. 11, consider an example sequence of instruction executions that can create redundant target address entries for the same branch instruction in the BTAC 142.

A first current fetch address 162 of FIG. 1 is provided to the instruction cache 104 and the BTAC 142. The cache line selected by the first current fetch address 162 includes a branch instruction, referred to as branch-A. The first current fetch address 162 selects a set in the BTAC 142, referred to as set N. None of the tags 1002 in the ways of set N matches the first current fetch address 162; therefore, the BTAC 142 generates a miss. In this example, the least recently used way indicated by the lru value 508 is way 2. Consequently, the information for updating the BTAC 142 for branch-A is sent down the pipeline along with branch-A, together with an indication that way 2 is to be updated.

Next, a second current fetch address 162 is provided to the instruction cache 104 and the BTAC 142. The cache line selected by the second current fetch address 162 includes a branch instruction, referred to as branch-B.
The second current fetch address 162 also selects set N and hits in way 3 of set N; hence, the BTAC 142 generates a hit. In addition, the BTAC 142 updates the lru value 508 of set N to way 1.

Next, because branch-A is part of a tight loop of code, the first current fetch address 162 is again provided to the instruction cache 104 and the BTAC 142, and again selects set N. Because the first execution of branch-A has not yet reached the store stage 128 of FIG. 1, the BTAC 142 has not yet been updated with the target address of branch-A. Consequently, the BTAC 142 again generates a miss. This time, however, the least recently used way indicated by the lru value 508 is way 1, because the lru value 508 was updated in response to the hit on branch-B. Therefore, the information for updating the BTAC 142 for the second execution of branch-A is sent down the pipeline along with that second execution, together with an indication that way 1 is to be updated.

Next, the first branch-A reaches the store stage 128 and generates a BTAC write request 176 to update way 2 of set N with the target address of branch-A, which is subsequently performed.

Next, the second branch-A reaches the store stage 128 and generates a BTAC write request 176 to update way 1 of set N with the target address of branch-A, which is subsequently performed. Consequently, two valid entries for the same branch instruction, branch-A, exist in the BTAC 142. One of these entries is redundant and makes use of the BTAC 142 less efficient, because the redundant entry could be used by another branch instruction and/or may have displaced a valid target address of another branch instruction.

Referring now to FIG. 11, a flowchart illustrating operation of the redundant target address apparatus of FIG. 10 according to the present invention is shown. Flow begins at block 1102.
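The branch-A and branch-B sequence described above can be replayed in a small simulation. This is a hedged sketch, not a model of the actual hardware: the `BtacSet` class, the use of branch names as tags, and the way the lru value is simply set to the values the text gives (way 2, then way 1) are assumptions made for illustration; the real replacement policy is not described in this section.

```python
class BtacSet:
    """One four-way set; tags[i] is None when way i is invalid (hypothetical model)."""

    def __init__(self):
        self.tags = [None, None, None, "branch-B"]  # branch-B already cached in way 3
        self.lru = 2                                # least recently used way, per the text

    def lookup(self, branch):
        """Return the list of valid ways whose tag matches the branch."""
        return [way for way, tag in enumerate(self.tags) if tag == branch]

    def write(self, branch, way):
        self.tags[way] = branch


btac = BtacSet()
pending = []  # write requests still travelling down the pipeline

# First fetch of branch-A: miss, so an update aimed at the LRU way (2) is sent down.
assert btac.lookup("branch-A") == []
pending.append(("branch-A", btac.lru))

# Fetch of branch-B: hit in way 3; per the text the lru value then becomes way 1.
assert btac.lookup("branch-B") == [3]
btac.lru = 1

# Second fetch of branch-A before the first reaches the store stage: miss again.
assert btac.lookup("branch-A") == []
pending.append(("branch-A", btac.lru))

# Both executions reach the store stage and their writes are performed.
for branch, way in pending:
    btac.write(branch, way)

duplicates = btac.lookup("branch-A")  # ways 1 and 2 now both hold branch-A

# Redundant TA invalidation: keep the lowest-numbered matching way, clear the rest.
for way in duplicates[1:]:
    btac.tags[way] = None
remaining = btac.lookup("branch-A")
```

The duplicate arises only because the update is deferred until the store stage while the lru value changes in between, which is why the invalidation check is performed on a later read of the same set.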
At block 1102, the arbiter 202 grants the BTAC read request 212 of FIG. 2 access to the BTAC 142, causing the multiplexer 148 to select the current fetch address 162 for provision on the address signal 182 of FIG. 1, and generates the control signal 252 of FIG. 2 to indicate a read of the BTAC 142. Consequently, the lower bits of the current fetch address 162 serve, via the address 182, as the index that selects a set of the BTAC 142. Flow proceeds to block 1104.

At block 1104, the comparators 1012 compare the tags 1002 of FIG. 10 of all four ways of the selected BTAC 142 set with the upper bits of the current fetch address 162 provided on the address signal 182 to generate the match signals 1006 of FIG. 10. The control logic 1014 receives the match signals 1006 and the valid bits 1004 of FIG. 10. Flow proceeds to block 1106.

At block 1106, the control logic 1014 determines whether more than one valid tag match occurred. That is, based on the valid bits 1004 and the match signals 1006, the control logic 1014 determines whether two or more ways in the BTAC 142 set selected by the current fetch address 162 have a valid, matching tag 1002. If so, flow proceeds to block 1108; otherwise, flow ends.

At block 1108, the control logic 1014 stores a true value in the redundant TA flag register 1024, stores the address 182 in the redundant TA address register 1026, and stores invalidate data in the redundant TA invalidate data register 1022. In particular, the control logic 1014 stores true values for the write-enable-A field 714, the write-enable-B field 716, the invalidate-A field 718, and the invalidate-B field 722 in the redundant TA invalidate data register 1022. In addition, the control logic 1014 stores in the redundant TA invalidate data register 1022 the value of the way field 724 given by Table 2, described with respect to FIG. 10.
Flow proceeds to block 1112.

At block 1112, the arbiter 202 grants the redundant TA request 214 of FIG. 2 access to the BTAC 142, causing the multiplexer 148 to select the redundant TA address 234 for provision on the address signal 182, and generates the control signal 252 of FIG. 2 to indicate a write of the BTAC 142. Consequently, the lower bits of the redundant TA address 234 serve, via the address 182, as the index that selects a set of the BTAC 142. The BTAC 142 receives the redundant TA invalidate data signal 244 output by the redundant TA invalidate data register 1022 and invalidates the ways of the selected set specified by the way field 724. Flow ends at block 1112.

Referring now to FIG. 12, a block diagram illustrating deadlock avoidance logic within the microprocessor 100 according to the present invention is shown. FIG. 12 shows the BTAC 142, instruction cache 104, instruction buffer 106, instruction formatter 108, formatted instruction queue 112, and multiplexer 136 of FIG. 1, and the control logic 1014 of FIG. 10. As shown in FIG. 12, the microprocessor 100 also includes a deadlock invalidate data register 1222, a deadlock flag register 1224, and a deadlock address register 1226.

The instruction formatter 108 decodes the instructions stored in the instruction buffer 106 and generates a true value on an F_wrap signal 1202 if it decodes a branch instruction that wraps across two cache lines. In particular, the instruction formatter 108 generates a true value on the F_wrap signal 1202 once it has decoded the first portion of a wrapping branch instruction held in a first cache line in the instruction buffer 106, regardless of whether it has yet decoded the remainder of the wrapping branch instruction in the second cache line, which may not yet be present in the instruction buffer 106. The F_wrap signal 1202 is provided to the control logic 1014.

The instruction cache 104 generates a true value on a miss signal 1206 when the current fetch address 162 misses in the instruction cache 104. The miss signal 1206 is provided to the control logic 1014.

The control logic 1014 generates a true value on a speculative signal 1208 when the current fetch address 162 provided to the instruction cache 104 is speculative, that is, when the current fetch address 162 is a predicted address, such as when the multiplexer 136 selects the BTAC predicted target address 164 as the current fetch address 162. The speculative signal 1208 is provided to the instruction cache 104. In one embodiment, the instruction cache 104 forwards the speculative signal 1208 to the instruction fetcher 102 of FIG. 1, so that the instruction fetcher 102 refrains from fetching from memory a cache line that misses in the instruction cache 104 at a speculative memory address, for reasons described below with respect to FIG. 13.

The BTAC 142 generates a taken/not taken (T/NT) signal 1212, which is provided to the control logic 1014. A true value on the T/NT signal 1212 indicates that the address 182 hit in the BTAC 142, that the BTAC 142 predicts a branch instruction is contained in the cache line provided by the instruction cache 104 in response to the current fetch address 162, that the branch instruction will be taken, and that the BTAC 142 is providing the target address of the branch instruction on the BTAC predicted target address signal 164. The BTAC 142 generates the T/NT signal 1212 based on the value of the prediction state A 602 or the prediction state B 604 of FIG. 6, depending on whether the BTAC 142 used the A or the B portion of the entry in making the prediction.

The BTAC 142 also generates a B_wrap signal 1214, which is provided to the control logic 1014. The value of the wrap bit 406 of FIG. 4 of the selected BTAC target address array entry 312 is provided on the B_wrap signal 1214. Hence, a false value on the B_wrap signal 1214 indicates that the BTAC 142 predicts the branch instruction does not wrap across two cache lines.
In one embodiment, the control logic 1014 registers the B_wrap signal 1214 to retain the value of the B_wrap signal 1214 from the previous BTAC 142 access.

The control logic 1014 also generates the current instruction pointer 168 of FIG. 1. The control logic 1014 also generates a control signal 1204, which is the input select signal of the multiplexer 136.

If the control logic 1014 detects a deadlock condition (that is, the registered B_wrap signal 1214 is false and the F_wrap signal 1202, the miss signal 1206, and the speculative signal 1208 are true), as described in detail below, the control logic 1014 stores a true value in the deadlock flag register 1224 to indicate that a deadlock condition exists, so that the entry in the BTAC 142 that caused the deadlock condition is invalidated. In addition, the control logic 1014 loads deadlock invalidate data into the deadlock invalidate data register 1222. In one embodiment, the data stored in the deadlock invalidate data register 1222 is similar to the BTAC write request 176 of FIG. 7, except that the branch instruction address 702 is not stored, because the address of the branch instruction is held in the deadlock address register 1226; and the target address 706, start bit 708, and wrap bit 712 are not stored, because they are don't-cares in an invalidated BTAC 142 entry. Consequently, when a deadlock invalidation is performed, the target address array 302 is not written; only the tag array 304 is updated to invalidate the mispredicting BTAC 142 entry.

The output of the deadlock invalidate data register 1222 comprises the deadlock data signal 246 of FIG. 2. The output of the deadlock flag register 1224 comprises the deadlock request 216 of FIG. 2. The output of the deadlock address register 1226 comprises the deadlock address 236 of FIG. 2. The way value 724 stored in the deadlock invalidate data register 1222 is populated with the way of the BTAC 142 that caused the deadlock condition.
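The deadlock detection condition and the resulting invalidation request can be summarized in a few lines. The Python below is a minimal sketch under stated assumptions: the `FetchSignals` and `DeadlockRequest` types and the `make_deadlock_request` helper are hypothetical names introduced here, and only the four-signal condition and the three register contents come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchSignals:
    b_wrap_registered: bool  # B_wrap 1214 as registered from the previous BTAC access
    f_wrap: bool             # F_wrap 1202 from the instruction formatter
    miss: bool               # miss signal 1206 from the instruction cache
    speculative: bool        # speculative signal 1208


@dataclass
class DeadlockRequest:
    flag: bool    # value stored in the deadlock flag register 1224
    address: int  # branch address stored in the deadlock address register 1226
    way: int      # way value 724 stored in the deadlock invalidate data register 1222


def deadlock_detected(s: FetchSignals) -> bool:
    """The BTAC predicted 'no wrap', yet the formatter saw a wrapping branch,
    and the speculative target fetch missed in the instruction cache."""
    return (not s.b_wrap_registered) and s.f_wrap and s.miss and s.speculative


def make_deadlock_request(s: FetchSignals, branch_address: int,
                          mispredicting_way: int) -> Optional[DeadlockRequest]:
    """Build the invalidation request only when the deadlock condition holds."""
    if not deadlock_detected(s):
        return None
    return DeadlockRequest(True, branch_address, mispredicting_way)
```

All four signals are required: if any one is absent, either the prediction was correct, the needed line is already available, or the fetcher is free to go to memory, so the pipeline makes progress without intervention.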
If the control logic 1014 detects a deadlock condition, then after invalidating the mispredicting entry, the control logic 1014 also generates a value on the control signal 1204 that causes the multiplexer 136 to select the current instruction pointer 168, causing the microprocessor 100 to branch so that the cache line containing the mispredicted branch instruction is fetched again.

Referring now to FIG. 13, a flowchart illustrating operation of the deadlock avoidance logic of FIG. 12 according to the present invention is shown. Flow begins at block 1302.

At block 1302, the current fetch address 162 is provided, via the address signal 182, to the instruction cache 104 and to the BTAC 142. In FIG. 13, this current fetch address 162 is denoted fetch address A. Flow proceeds to block 1304.

At block 1304, the instruction cache 104 provides the cache line specified by fetch address A (referred to as cache line A) to the instruction buffer 106. Cache line A contains the first portion of a branch instruction, but not the entire branch instruction. Flow proceeds to block 1306.

At block 1306, in response to fetch address A, the BTAC 142 predicts that the branch instruction in cache line A will be taken and indicates this on the T/NT signal 1212, generates a false value on the B_wrap signal 1214, and provides a predicted target address on the BTAC predicted target address 164.
Flow proceeds to block 1308.

At block 1308, the control logic 1014 controls the multiplexer 136 to select the BTAC predicted target address 164 as the next current fetch address 162, denoted fetch address B. The control logic 1014 also generates a true value on the speculative signal 1208, because the BTAC predicted target address 164 is speculative. Flow proceeds to block 1312.

At block 1312, the instruction cache 104 generates a true value on the miss signal 1206 to indicate that fetch address B misses in the instruction cache 104. Normally, the instruction fetcher 102 would fetch the missing cache line from memory; however, because the speculative signal 1208 is true, the instruction fetcher 102 does not fetch the missing cache line from memory, for reasons described below. Flow proceeds to block 1314.

At block 1314, the instruction formatter 108 decodes cache line A in the instruction buffer 106 and generates a true value on the F_wrap signal 1202, because the branch instruction wraps across two cache lines. The instruction formatter 108 waits for the next cache line to be stored in the instruction buffer 106 so that it can finish formatting the branch instruction for provision to the formatted instruction queue 112. Flow proceeds to block 1316.

At block 1316, the control logic 1014 determines whether the registered B_wrap signal 1214 is false, the F_wrap signal 1202 is true, the miss signal 1206 is true, and the speculative signal 1208 is true, which together constitute the deadlock condition described below. If so, flow proceeds to block 1318; otherwise, flow ends.

At block 1318, the control logic 1014 invalidates the BTAC 142 entry that caused the deadlock condition, as described with respect to FIG. 12. Consequently, the next time fetch address A is provided to the BTAC 142, the BTAC 142 will generate a miss, because the entry that caused the deadlock condition is now invalid. Flow proceeds to block 1322.
At block 1322, the control logic 1014 controls the multiplexer 136 to branch to the current instruction pointer 168, as described with respect to FIG. 12. In addition, when the control logic 1014 controls the multiplexer 136 to select the current instruction pointer 168, it generates a false value on the speculative signal 1208, because the current instruction pointer 168 is not a speculative memory address. The current instruction pointer 168 will most likely hit in the instruction cache 104; if it does not, however, the instruction fetcher 102 fetches the cache line specified by the current instruction pointer 168 from memory, because the speculative signal 1208 indicates that the current instruction pointer 168 is not speculative. Flow ends at block 1322.

A deadlock condition exists when decision block 1316 is true because the conditions necessary for a deadlock are all present. The first condition is a multi-byte branch instruction that wraps across two different cache lines. That is, the first portion of the branch instruction bytes is at the end of a first cache line, and the second portion is at the beginning of the next cache line. Because branch instructions can wrap, the BTAC 142 must store information predicting whether a branch instruction wraps across cache lines, so that the control logic 1014 knows whether to fetch the next cache line, in order to obtain the second half of the branch instruction bytes, before fetching the cache line at the target address 164. If the BTAC 142 has stored incorrect prediction information, it may incorrectly predict that the branch instruction does not wrap when in fact it does.
In this case, the instruction formatter 108 decodes the cache line containing the first half of the branch instruction and detects that a branch instruction is present, but that not all of its bytes are available for decoding. The instruction formatter 108 then waits for the next cache line, and the pipeline waits for more formatted instructions to execute.

The second condition is that, because the BTAC 142 predicted that the branch instruction does not wrap, the branch control logic 1014 fetches the cache line implied by the target address 164 output by the BTAC 142, without fetching the next sequential cache line. However, the target address 164 misses in the instruction cache 104. Consequently, the next cache line, for which the instruction formatter 108 is waiting, must be fetched from memory.

The third condition is that the chipset of the microprocessor does not expect instruction fetches from certain memory address ranges, and may hang the system or create other undesirable system conditions if the microprocessor generates an instruction fetch from an unexpected memory address range. A speculative address, such as the target address 164 output by the BTAC 142, could cause an instruction fetch from an unexpected memory address range. Therefore, the microprocessor 100 does not fetch from memory a missing cache line at a speculative BTAC predicted target address 164.

Consequently, the instruction formatter 108 and the rest of the pipeline wait for another cache line. Meanwhile, the instruction fetcher 102 waits for the pipeline to tell it to perform a non-speculative fetch.
In a non-deadlock situation, for example if the target address 164 had hit in the instruction cache 104, the instruction formatter 108 would format the branch instruction (albeit with incorrect bytes) and provide the formatted branch instruction to the execution stages, which would detect the misprediction and correct for the BTAC 142 misprediction, thereby causing the speculative signal 1208 to become false. In the deadlock situation, however, the execution stages never detect the misprediction, because the instruction formatter 108 never provides the formatted branch instruction to them; it is still waiting for the next cache line. Hence, a deadlock occurs. The deadlock avoidance logic of FIG. 12 prevents the deadlock, as described with respect to FIGS. 12 and 13, thereby enabling the microprocessor 100 to operate properly.

Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention. For example, although the write queue has been described with respect to a single-ported BTAC, false misses may also occur in a multi-ported BTAC in some microprocessor architectures, albeit less frequently. Hence, the write queue may be employed to reduce the false miss rate of a multi-ported BTAC. In addition, in some microprocessors there may be situations other than those described herein in which the BTAC is not being read and in which requests queued in the write queue can therefore be written to the BTAC.

Furthermore, in addition to implementations of the invention using hardware, the invention can be embodied in computer readable code (e.g., computer readable program code, data, etc.) embodied in a computer usable (e.g., readable) medium.
The computer code enables the function or fabrication, or both, of the invention disclosed herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA, etc.); GDSII databases; hardware description languages (HDL), including Verilog HDL, VHDL, Altera HDL (AHDL), and so on; or other programming and/or circuit (i.e., schematic) capture tools available in the art. The computer code can be disposed in any known computer usable (e.g., readable) medium, including semiconductor memory, magnetic disk, and optical disk (e.g., CD-ROM, DVD-ROM, etc.), and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium, including digital, optical, or analog-based media). As such, the computer code can be transmitted over communication networks, including the Internet and intranets. It is understood that the invention can be embodied in computer code (e.g., as part of an IP (intellectual property) core, such as a microprocessor core, or as a system-level design, such as a System on Chip (SOC)) and transformed to hardware as part of the production of integrated circuits. Also, the invention may be embodied as a combination of hardware and computer code.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the appended claims.

Brief description of the drawings

FIG. 1 is a block diagram of a microprocessor according to the present invention.
FIG. 2 is a block diagram showing portions of the microprocessor of FIG. 1 in more detail.
FIG. 3 is a block diagram showing the BTAC of FIG. 1 in more detail.
FIG. 4 is a block diagram showing the contents of a target address array entry of FIG. 3.
FIG. 5 is a block diagram showing the contents of a tag array entry of FIG. 3.
FIG. 6 is a block diagram showing the contents of a counter array entry of FIG. 3.
FIG. 7 is a block diagram showing the contents of a BTAC write request of FIG. 1.
FIG. 8 is a block diagram of the BTAC write queue of FIG. 3.
FIG. 9 is a flowchart of the operation of the BTAC write queue of FIG. 1.
FIG. 10 is a block diagram of the logic in the microprocessor of FIG. 1 for invalidating redundant target addresses in the BTAC.
FIG. 11 is a flowchart of the operation of the redundant target address apparatus of FIG. 10.
FIG. 12 is a block diagram of the deadlock avoidance logic in the microprocessor of FIG. 1.
FIG. 13 is a flowchart of the operation of the deadlock avoidance logic of FIG. 12.

Description of reference numerals:
100: microprocessor
102: instruction fetcher
104: instruction cache
106: instruction buffer
108: instruction formatter
112: formatted instruction queue
114: instruction translator

116: translated instruction queue
118: register stage
122: address stage
124: data stage
126: execute stage
128: store stage
132: write-back stage
134: adder
136, 148, 206: multiplexers
138: instruction

142: BTAC
144: BTAC write queue (BWQ)
146: queue depth
152: branch misprediction signal
154: prediction override signal
156: instruction buffer full signal
158: instruction cache idle signal
162: current fetch address
164: predicted target address
166: next fetch address
168: current instruction pointer
172: correct address
174: override predicted target address
176: BTAC write request
178: BTAC write queue address

182: address
202: arbiter
212: BTAC read request signal
214: redundant target address (TA) request signal
216: deadlock request signal
218: BWQ not empty signal
222: BWQ full signal
234: redundant TA address
236: deadlock address
244: redundant TA data signal
246: deadlock data signal
248: BWQ data signal
252, 258, 262, 1204: control signals
256: data signal
302: target address array
304: tag array
306: counter array
312: target address array entry
314: tag array entry
316: counter array entry
402: branch target address
404, 708: start field
406: wrap bit
502: tag
504: A valid bit

506: B valid bit
508: lru field
602: prediction state A counter
604: prediction state B counter
606: A/B lru bit
702: branch instruction address field
706: target address
712: wrap bit
714: write enable A field
716: write enable B field
718: invalidate A field
722: invalidate B field
724: way field
802: storage elements
804, 1004: valid bit
806, 1014: control logic
1002: tag
1006: match indicator
1012: comparator
1022: redundant TA invalidate data register
1024: redundant TA flag register
1026: redundant TA address register
1202: F_wrap signal
1206: miss signal
1208: predict signal
1212: taken/not taken (T/NT) signal
1214: B_wrap signal
1222: deadlock invalidate data register
1224: deadlock flag register
1226: deadlock address register
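The numerals 702 through 724 in the list above appear to name the fields of the BTAC write request (176) shown in FIG. 7. As a rough illustration only, those fields could be grouped into a single record as sketched below. The Python names, types, and the way the `way` field is encoded are assumptions made for this sketch; the numeral list gives no bit widths or encodings, and the patent describes hardware rather than software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtacWriteRequest:
    """Hypothetical software model of a BTAC write request (numeral 176)."""
    branch_instr_addr: int   # 702: branch instruction address field
    target_addr: int         # 706: target address
    start: int               # 708: start field (width not given in the list)
    wrap: bool               # 712: wrap bit
    write_enable_a: bool     # 714: write enable A field
    write_enable_b: bool     # 716: write enable B field
    invalidate_a: bool       # 718: invalidate A field
    invalidate_b: bool       # 722: invalidate B field
    way: int                 # 724: way field (plain index assumed here)


# Example request that would write entry A of way 1 for a branch at 0x1000.
example_request = BtacWriteRequest(
    branch_instr_addr=0x1000,
    target_addr=0x2040,
    start=0,
    wrap=False,
    write_enable_a=True,
    write_enable_b=False,
    invalidate_a=False,
    invalidate_b=False,
    way=1,
)
```

Making the record immutable (`frozen=True`) mirrors the idea that a queued request is a snapshot taken when the branch is resolved; that choice is also an assumption of the sketch.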

Claims (1)

X. Claims:

1. A write queue apparatus for improving the efficiency of a branch target address cache in a microprocessor, the write queue apparatus comprising:
a request input, receiving a request to update the branch target address cache, the request including a branch instruction target address;
a plurality of storage elements, storing the requests received by the request input; and
control logic, coupled to the storage elements, which writes one of the requests stored in the storage elements to the branch target address cache in response to one or more predetermined conditions.

2. The write queue apparatus of claim 1, further comprising:
a cache idle input, coupled to the control logic, which specifies one of the one or more predetermined conditions, in which the branch target address cache is not being read, when an instruction cache accessed in parallel with the branch target address cache is idle.

3. The write queue apparatus of claim 1, further comprising:
an instruction buffer full input, coupled to the control logic, which specifies one of the one or more predetermined conditions, in which the branch target address cache is not being read because an instruction buffer is full, wherein the instruction buffer receives instructions from an instruction cache accessed in parallel with the branch target address cache.

4. The write queue apparatus of claim 1, further comprising:
a prediction override input, coupled to the control logic, which specifies one of the one or more predetermined conditions, in which the branch target address cache is not being read because a first branch instruction prediction made by the branch target address cache is overridden by a second branch instruction prediction made by branch prediction logic in the microprocessor.

5. The write queue apparatus of claim 1, further comprising:
a branch misprediction input, coupled to the control logic, which specifies one of the one or more predetermined conditions, in which the branch target address cache is not being read because a branch instruction misprediction made by the branch target address cache has been detected.

6. The write queue apparatus of claim 1, further comprising:
a queue full signal register, coupled to the control logic, which specifies one of the one or more predetermined conditions, in which all of the storage elements are storing a request to be written to the branch target address cache.

7. The write queue apparatus of claim 1, further comprising:
a plurality of valid bit registers, coupled to the control logic, each valid bit register indicating whether the request stored in the corresponding storage element is valid.

8. The write queue apparatus of claim 1, wherein the request further includes a memory address of the branch instruction.

9. The write queue apparatus of claim 1, wherein the branch target address cache is an N-way set-associative cache, and wherein the request further includes information specifying which of the N ways of the branch target address cache the request is to be written to.

10. A microprocessor, comprising:
an instruction cache, providing a cache line of instruction bytes in response to an instruction fetch address;
a branch target address cache, coupled to the instruction cache, predicting a branch target address of a branch instruction present in the cache line; and
a write queue, coupled to the branch target address cache, storing branch target addresses for updating the branch target address cache.

11. The microprocessor of claim 10, wherein, if the write queue is not empty, the write queue updates the branch target address cache with one of the branch target addresses when the instruction cache is idle.

12. The microprocessor of claim 10, further comprising:
an instruction buffer, coupled to the instruction cache, storing zero or more cache lines received from the instruction cache.

13. The microprocessor of claim 12, wherein, if the write queue is not empty, the write queue updates the branch target address cache with one of the branch target addresses when the instruction buffer indicates that it is full.

14. The microprocessor of claim 10, further comprising:
branch prediction logic, coupled to the write queue, wherein, after the branch target address cache makes a first prediction of a branch instruction, the branch prediction logic makes a second prediction of the branch instruction, and wherein the microprocessor overrides the first prediction with the second prediction.

15. The microprocessor of claim 14, wherein, if the write queue is not empty, the write queue updates the branch target address cache with one of the branch target addresses when the microprocessor overrides the first prediction with the second prediction.

16. The microprocessor of claim 10, further comprising:
branch resolution logic, coupled to the write queue, which corrects a misprediction of a branch instruction made by the branch target address cache.

17. The microprocessor of claim 16, wherein, if the write queue is not empty, the write queue updates the branch target address cache with one of the branch target addresses when the microprocessor corrects the misprediction of the branch instruction made by the branch target address cache.

18. The microprocessor of claim 10, wherein, if the write queue becomes full, the write queue updates the branch target address cache with one of the branch target addresses.

19. The microprocessor of claim 10, wherein, if the branch target address cache is read while the write queue is writing to it, the branch target address cache generates a miss.

20. The microprocessor of claim 10, wherein the branch target address cache includes a single-ported memory array for storing a plurality of branch target addresses.

21. The microprocessor of claim 10, wherein the branch target address cache includes a single-ported memory array for storing a plurality of address tags of branch instructions.

22. A method for updating a branch target address cache in a microprocessor, the method comprising the steps of:
generating a request to update the branch target address cache;
storing the request in a queue; and
after the storing step, updating the branch target address cache according to the request.

23. The method of claim 22, wherein the step of updating the branch target address cache is performed in a clock cycle of the microprocessor subsequent to the storing step.

24. The method of claim 22, further comprising:
determining whether the branch target address cache is not being read;
wherein the updating step is performed if the branch target address cache is not being read.

25. The method of claim 24, further comprising:
determining whether the branch target address cache is not being read because an instruction cache coupled to the branch target address cache is idle.

26. The method of claim 24, further comprising:
determining whether the branch target address cache is not being read because an instruction buffer is full, wherein the instruction buffer receives instructions output by an instruction cache coupled to the branch target address cache.

27. The method of claim 22, further comprising:
determining whether a first branch instruction prediction made by the branch target address cache is overridden by a second branch instruction prediction made by other branch prediction logic in the microprocessor;
wherein the updating step is performed if the first branch instruction prediction made by the branch target address cache is overridden by the second branch instruction prediction.

28. The method of claim 22, further comprising:
determining whether the branch target address cache has mispredicted a branch instruction;
wherein the updating step is performed if the branch target address cache has mispredicted a branch instruction.

29. The method of claim 22, further comprising:
determining whether the queue is full;
wherein the updating step is performed if the queue is full.

VII. Designated representative drawing:
(1) The designated representative drawing of this case is FIG. 1.
(2) Brief description of the reference numerals in the representative drawing:
100: microprocessor
102: instruction fetcher
104: instruction cache
106: instruction buffer
108: instruction formatter
112: formatted instruction queue
114: instruction translator
116: translated instruction queue
118: register stage
122: address stage
124: data stage
126: execute stage
128: store stage
132: write-back stage
134: adder
136: multiplexer
138: instruction
142: BTAC
144: BTAC write queue (BWQ)
146: queue depth
148: multiplexer
152: branch misprediction signal
154: prediction override signal
156: instruction buffer full signal
158: instruction cache idle signal
162: current fetch address
164: predicted target address
166: next fetch address
168: current instruction pointer
172: correct address
174: override predicted target address
176: BTAC write request
178: BTAC write queue address
182: address

VIII. If this case includes a chemical formula, disclose the chemical formula that best shows the features of the invention:
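The update policy recited in the claims (queue the request, then write it when the BTAC is not being read, or when the queue is full) can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented circuit: the class and method names are hypothetical, the BTAC is modeled as a plain dictionary keyed by branch instruction address, requests are drained oldest first, and the default depth of 4 is arbitrary.

```python
from collections import deque


class BtacWriteQueue:
    """Toy cycle-level model of a write queue in front of a single-ported BTAC."""

    def __init__(self, depth=4):           # depth is an arbitrary assumption
        self.depth = depth
        self.entries = deque()             # pending (branch_addr, target_addr) pairs

    def is_full(self):
        return len(self.entries) >= self.depth

    def enqueue(self, request):
        """Store an update request; return False if no storage element is free."""
        if self.is_full():
            return False
        self.entries.append(request)
        return True

    def should_write(self, icache_idle=False, ibuf_full=False,
                     override=False, mispredict=False):
        """True when a queued request may be written to the BTAC this cycle."""
        if not self.entries:
            return False
        # Conditions under which the BTAC is not being read (claims 2 to 5).
        btac_not_read = icache_idle or ibuf_full or override or mispredict
        # A full queue forces the write regardless (claims 6, 18, 29).
        return btac_not_read or self.is_full()

    def step(self, btac, **conditions):
        """Model one clock: write the oldest request if allowed, else do nothing."""
        if not self.should_write(**conditions):
            return None
        branch_addr, target_addr = self.entries.popleft()
        btac[branch_addr] = target_addr
        return (branch_addr, target_addr)


# Usage: the update is deferred while the BTAC is busy, then written when idle.
btac = {}
bwq = BtacWriteQueue(depth=2)
bwq.enqueue((0x1000, 0x2040))
deferred = bwq.step(btac)                      # BTAC being read, queue not full
written = bwq.step(btac, icache_idle=True)     # instruction cache idle
```

In this model a forced write on a full queue simply happens; the claims additionally state that a read of the BTAC during such a write produces a miss, which the sketch does not model.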
TW093100409A 2003-01-14 2004-01-08 Apparatus and method for efficiently updating branch target address cache TWI283827B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US44006503P 2003-01-14 2003-01-14

Publications (2)

Publication Number Publication Date
TW200414034A TW200414034A (en) 2004-08-01
TWI283827B TWI283827B (en) 2007-07-11

Family

ID=34375165

Family Applications (1)

Application Number Title Priority Date Filing Date
TW093100409A TWI283827B (en) 2003-01-14 2004-01-08 Apparatus and method for efficiently updating branch target address cache

Country Status (2)

Country Link
CN (1) CN1282930C (en)
TW (1) TWI283827B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870249B (en) * 2014-04-01 2017-08-25 龙芯中科技术有限公司 IA acquisition methods and instant compiler
CN106227676B (en) * 2016-09-22 2019-04-19 大唐微电子技术有限公司 A kind of cache and the method and apparatus that data are read from cache
CN111459551B (en) * 2020-04-14 2022-08-16 上海兆芯集成电路有限公司 Microprocessor with highly advanced branch predictor
CN111459550B (en) * 2020-04-14 2022-06-21 上海兆芯集成电路有限公司 Microprocessor with highly advanced branch predictor
CN112181497B (en) * 2020-09-28 2022-07-19 中国人民解放军国防科技大学 Method and device for transmitting branch target prediction address in pipeline

Also Published As

Publication number Publication date
CN1282930C (en) 2006-11-01
TW200414034A (en) 2004-08-01
CN1542625A (en) 2004-11-03

Similar Documents

Publication Publication Date Title
US10430340B2 (en) Data cache virtual hint way prediction, and applications thereof
JP4699666B2 (en) Store buffer that forwards data based on index and optional way match
JP3907809B2 (en) A microprocessor with complex branch prediction and cache prefetching.
US6393536B1 (en) Load/store unit employing last-in-buffer indication for rapid load-hit-store
US6523109B1 (en) Store queue multimatch detection
US7165168B2 (en) Microprocessor with branch target address cache update queue
US6622237B1 (en) Store to load forward predictor training using delta tag
US6651161B1 (en) Store load forward predictor untraining
US9009445B2 (en) Memory management unit speculative hardware table walk scheme
US6481251B1 (en) Store queue number assignment and tracking
TWI238966B (en) Apparatus and method for invalidation of redundant branch target address cache entries
US9131899B2 (en) Efficient handling of misaligned loads and stores
US20050268076A1 (en) Variable group associativity branch target address cache delivering multiple target addresses per cache line
US6694424B1 (en) Store load forward predictor training
JP2003514299A5 (en)
US20080086623A1 (en) Strongly-ordered processor with early store retirement
US10713172B2 (en) Processor cache with independent pipeline to expedite prefetch request
CN107992331B (en) Processor and method for operating processor
US7185186B2 (en) Apparatus and method for resolving deadlock fetch conditions involving branch target address cache
US6704854B1 (en) Determination of execution resource allocation based on concurrently executable misaligned memory operations
TWI283827B (en) Apparatus and method for efficiently updating branch target address cache
TWI242744B (en) Apparatus, pipeline microprocessor and method for avoiding deadlock condition and storage media with a program for avoiding deadlock condition
CN115380273A (en) Fetch stage handling of indirect jumps in a processor pipeline

Legal Events

Date Code Title Description
MK4A Expiration of patent term of an invention patent