TW200414034A

TW200414034A - Apparatus and method for efficiently updating branch target address cache

Info

Publication number: TW200414034A
Application number: TW093100409A
Authority: TW
Inventors: Thomas C Mcdonald
Original assignee: Ip First Llc
Priority date: 2003-01-14
Filing date: 2004-01-08
Publication date: 2004-08-01
Also published as: CN1542625A; CN1282930C; TWI283827B

Abstract

A microprocessor with a write queue for a branch target address cache (BTAC) is disclosed. The BTAC is read in parallel with an instruction cache in order to predict a target address of a branch instruction in the accessed cache line. In one embodiment, the BTAC is single-ported; hence, the single port must be shared for reading and writing. When the BTAC needs updating, such as when a branch target address is resolved, the microprocessor stores the branch target address and related information in the write queue. Thus, the write queue potentially enables updating of the BTAC to be delayed until the BTAC is not being read, such as when the instruction cache is idle, a misprediction by the BTAC is being corrected, or the prediction by the BTAC is being overridden. If the write queue becomes full, then it updates the BTAC anyway.

Description

Technical Field

The present invention relates to branch prediction in microprocessors, and in particular to branch prediction using a speculative branch target address cache.

Prior Art

Modern microprocessors are pipelined; that is, several instructions can be in progress simultaneously in different blocks, or pipeline stages, of the microprocessor. In their book Computer Architecture: A Quantitative Approach (2nd edition, Morgan Kaufmann, San Francisco, CA, 1996), John L. Hennessy and David A. Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution." They also give an excellent description of pipelining: a pipeline is like an assembly line. In an automobile assembly line there are many steps, each contributing something to the construction of the car, and each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step completes part of an instruction. Like the assembly line, different steps complete different parts of different instructions in parallel. Each of these steps is called a pipe stage or pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line. Microprocessors operate according to clock cycles. Generally, in each clock cycle an instruction advances from one stage of the microprocessor's pipeline to the next.
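As a textbook idealization (not a result from this patent), assume a pipeline of k stages in which every stage takes one clock cycle of length t and nothing ever stalls. The first instruction then needs k cycles to emerge and each later instruction emerges one cycle after its predecessor, which gives the following timing for n instructions:

```latex
T_{\text{pipelined}} = (k + n - 1)\,t, \qquad
T_{\text{unpipelined}} = n\,k\,t, \qquad
\text{speedup} = \frac{n\,k}{k + n - 1} \;\longrightarrow\; k \quad (n \to \infty)
```

Every cycle in which a stage sits idle adds one cycle to T_pipelined, which is why idle stages matter for throughput.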


In an automobile assembly line, if the workers on the line sit idle because there is no car to work on, the production, or performance, of the line drops. Similarly, if a stage of a microprocessor's pipeline sits idle during a clock cycle because it has no instruction to operate on (a condition commonly referred to as a pipeline bubble), the performance of the microprocessor drops. One potential cause of pipeline bubbles is branch instructions. When a branch instruction is encountered, the processor must determine the target address of the branch instruction and begin fetching instructions at the target address rather than at the next sequential address after the branch instruction. Furthermore, if the branch instruction is a conditional branch (that is, a branch that may or may not be taken depending on whether a specified condition is present), the processor must decide whether the branch will be taken in addition to determining the target address. Because the pipeline stages that ultimately resolve the target address and/or the branch outcome (that is, whether the branch is taken) are typically well below the stage that fetches instructions, bubbles may be created.
To address this problem, modern microprocessors typically employ branch prediction mechanisms to predict the target address and branch outcome early in the pipeline. One example of a branch prediction mechanism is a branch target address cache (BTAC), which predicts the branch outcome and target address in parallel with the fetch of instructions from the microprocessor's instruction cache. When the microprocessor executes a branch instruction and finally resolves that the branch is taken and what its target address is, the address of the branch instruction and its target address are written into the BTAC. The next time the branch instruction is fetched from the instruction cache, the branch instruction address hits in the BTAC, and the BTAC supplies the branch target address early in the pipeline. An effective BTAC eliminates or reduces the number of bubbles spent waiting for branch instructions to be resolved, thereby improving processor performance.
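The write-then-hit behavior just described can be illustrated with a small Python model. This is a minimal sketch under stated assumptions: the table is direct-mapped, it is indexed by cache-line address with 32-byte lines, and it stores only a tag and a target. The class and method names are hypothetical; a real BTAC may be set-associative and keeps additional prediction information.

```python
from __future__ import annotations

from typing import Optional, Tuple

LINE_SIZE = 32  # bytes per cache line (assumed, matching one embodiment)


class SimpleBTAC:
    """Hypothetical direct-mapped BTAC indexed by cache-line address."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        # Each entry holds (tag, target), or None when the entry is invalid.
        self.entries: list[Optional[Tuple[int, int]]] = [None] * num_entries

    def _index_and_tag(self, addr: int) -> Tuple[int, int]:
        line = addr // LINE_SIZE  # every byte of a cache line maps to one entry
        return line % self.num_entries, line // self.num_entries

    def lookup(self, fetch_addr: int) -> Optional[int]:
        """Fetch-time read: predicted target on a hit, None on a miss."""
        index, tag = self._index_and_tag(fetch_addr)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, branch_addr: int, target: int) -> None:
        """Write a resolved taken branch's address and target into the BTAC."""
        index, tag = self._index_and_tag(branch_addr)
        self.entries[index] = (tag, target)


btac = SimpleBTAC()
print(btac.lookup(0x1000))       # None: the first fetch of this line misses
btac.update(0x1008, 0x2000)      # branch at 0x1008 resolves taken to 0x2000
print(hex(btac.lookup(0x1000)))  # 0x2000: the next fetch of the same line hits
```

Indexing by cache line rather than by exact branch address mirrors the fact that the BTAC is read with the same fetch address as the instruction cache.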


However, when the BTAC mispredicts, the portion of the pipeline holding the incorrectly fetched instructions must be flushed and the correct instructions fetched, and bubbles are created while the flush and refetch take place. As microprocessor pipelines grow deeper, the effectiveness of the BTAC becomes an increasingly critical factor in processor performance.

One factor affecting the effectiveness of a BTAC is its hit rate, and one factor affecting the hit rate is the number of branch target addresses the BTAC can store: the more target addresses it stores, the more effective it is. However, chip area is always limited, so functional blocks such as the BTAC should be kept as small as possible. One factor in the area of a given functional block such as a BTAC is the size of the storage cells that hold the target addresses and related information. In particular, a single-ported cell is smaller than a multi-ported cell. A BTAC built from single-ported cells can be either read or written in a given clock cycle, but not both, whereas a BTAC built from multi-ported cells can be read and written in the same clock cycle. Because multi-ported cells are larger, a multi-ported BTAC occupying a given area must store fewer target addresses than a single-ported BTAC of the same area, which lowers its hit rate. From this standpoint, a single-ported BTAC is preferable.

However, because a single-ported BTAC can only be read or written, not both, in a given clock cycle, its effectiveness is reduced by false misses. A false miss occurs when the single-ported BTAC is being written, for example to update it with a new target address or to invalidate a target address, in a cycle in which it also needs to be read. In this case, the BTAC must report a miss for the read, because it cannot supply a target address that may already be present in it while it is being written.
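The false-miss effect can be made concrete by replaying a cycle-by-cycle trace against a single-ported array that performs either one read or one write per cycle. This Python sketch assumes, hypothetically, that every update is written in the very cycle it arrives (that is, there is no write queue), so a write steals the port from any read in that cycle. The function name and trace format are illustrative only.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Tuple

Update = Tuple[int, int]  # (cache-line address, branch target) to write


def count_false_misses(trace: List[Tuple[Optional[int], Optional[Update]]]) -> int:
    """Count reads that miss only because the single port is busy writing.

    Each trace element is (read_line, update): read_line is the cache-line
    address looked up this cycle (None if the BTAC is not read) and update is
    a write to perform (None if there is none). Updates are written in the
    cycle they arrive, i.e. there is no write queue.
    """
    stored: Dict[int, int] = {}  # idealized contents: line -> target
    false_misses = 0
    for read_line, update in trace:
        if update is not None:
            # The only port performs the write, so a read this cycle must miss.
            if read_line is not None and read_line in stored:
                false_misses += 1  # the target was present but could not be read
            line, target = update
            stored[line] = target
    return false_misses


trace = [
    (None, (0x40, 0x900)),  # BTAC not read: the update is written harmlessly
    (0x40, None),           # read of line 0x40 hits normally
    (0x40, (0x80, 0xA00)),  # read collides with a write: false miss
]
print(count_false_misses(trace))  # 1
```

Only the third cycle counts: the entry for line 0x40 exists, but the port is busy writing line 0x80, so the read cannot see it.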

Therefore, there is a need for a method and apparatus for reducing false misses in a single-ported BTAC.

Another phenomenon that can reduce BTAC effectiveness is the BTAC storing the target address of the same branch instruction more than once. This can occur in a multi-way set-associative BTAC. Because BTAC space is limited, storing a target address redundantly reduces BTAC effectiveness, since the redundant entry could instead hold the target address of another branch instruction. The longer the pipeline, that is, the greater the number of stages, the more likely it is that redundant target addresses will be stored in the BTAC. The most common situation in which the same branch instruction is cached multiple times in the BTAC is a tight loop of code. The first time the branch instruction executes, its target address is to be written to the BTAC, for example into the second way, because the second way is the least recently used. However, before the target address has been written to the BTAC, the branch instruction is fetched again; that is, the instruction cache fetch address misses in the BTAC because the target address has not yet been written. Consequently, the target address is later written to the BTAC a second time. If intervening BTAC reads of a different branch instruction in the same set have caused the second way to no longer be the least recently used, another way, such as the first way, is selected for the second write of the target address. The target address of the same branch instruction is now stored twice in the BTAC. This wastes BTAC space and reduces BTAC effectiveness, because the second write may well overwrite a valid target address of another branch instruction. Therefore, there is a need for a method and apparatus that avoids wasting BTAC space on redundant copies of the same branch instruction's target address.
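The following Python sketch shows how a redundant copy can arise when an update allocates a way by LRU without first checking whether the branch is already present. It is a simplified model of a single 2-way set, and the class and method names are hypothetical. In this model the LRU state is also updated on writes, so the second update lands in the other way even without intervening reads; the exact sequence in real hardware depends on its replacement policy.

```python
from __future__ import annotations

from typing import List, Optional


class TwoWaySet:
    """One set of a hypothetical 2-way set-associative BTAC with LRU replacement.

    allocate() picks the LRU way without first checking whether the tag is
    already present; that missing check is what lets duplicates appear.
    """

    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None, None]
        self.targets: List[Optional[int]] = [None, None]
        self.lru = 0  # index of the least recently used way

    def _touch(self, way: int) -> None:
        self.lru = 1 - way  # with two ways, the other way becomes LRU

    def read(self, tag: int) -> Optional[int]:
        for way in (0, 1):
            if self.tags[way] == tag:
                self._touch(way)
                return self.targets[way]
        return None

    def allocate(self, tag: int, target: int) -> None:
        way = self.lru
        self.tags[way] = tag
        self.targets[way] = target
        self._touch(way)


s = TwoWaySet()
s.allocate(0xA, 0x100)           # another branch A occupies way 0
print(s.read(0xB))               # None: loop branch B misses while its update is pending
s.allocate(0xB, 0x200)           # first update lands in the LRU way (way 1)
s.allocate(0xB, 0x200)           # second update picks way 0, evicting A
print([hex(t) for t in s.tags])  # ['0xb', '0xb']
```

The final state holds two copies of branch B and has lost branch A, which is exactly the waste of BTAC space described above.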

Furthermore, a combination of certain conditions related to the speculative nature of the BTAC can cause a deadlock within the microprocessor. The combination of BTAC branch prediction, branch instructions that span an instruction cache line boundary, and the way speculative instruction fetches are transacted on the processor bus can create an error condition that, in some cases, results in deadlock. Therefore, there is a need for a method and apparatus for avoiding deadlock conditions in a microprocessor that employs a speculative BTAC.

Summary of the Invention

The present invention provides a write queue that delays writes to the BTAC until the BTAC is not being read, thereby reducing the false miss rate. In one aspect, the present invention provides a write queue for improving the efficiency of a branch target address cache (BTAC) in a microprocessor. The write queue includes a request input that receives a request to update the BTAC; the request includes a branch instruction target address. The write queue also includes a plurality of storage elements that store the requests received on the request input. The write queue also includes control logic, coupled to the storage elements, that writes one of the requests stored in the storage elements to the BTAC in response to one or more predetermined conditions.
In another aspect, the present invention provides a microprocessor. The microprocessor includes an instruction cache that provides a cache line of instruction bytes in response to an instruction fetch address. The microprocessor also includes a branch target address cache (BTAC), coupled to the instruction cache, that predicts a branch target address of a branch instruction in the cache line. The microprocessor also includes a write queue, coupled to the BTAC, that stores branch target addresses for updating the BTAC.
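A behavioral Python sketch of such a write queue follows. It assumes a fixed capacity and drains the oldest request in any cycle in which the BTAC is not being read, using the conditions named in this document (instruction cache idle, a misprediction being corrected, a prediction being overridden, instruction buffer full), and it forces the write when the queue is full, as the abstract describes. At most one request is written per cycle because the BTAC has a single port. All class, field, and method names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class BTACWriteRequest:
    """Hypothetical update request: fetch address to tag and its branch target."""
    fetch_addr: int
    target: int


class BTACWriteQueue:
    """Buffers BTAC updates until the single-ported BTAC is free (sketch)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.entries: deque[BTACWriteRequest] = deque()

    @property
    def depth(self) -> int:
        """Number of valid requests held (analogous to a queue-depth register)."""
        return len(self.entries)

    def push(self, req: BTACWriteRequest) -> None:
        if self.depth >= self.capacity:
            raise OverflowError("write queue full; it must drain before accepting more")
        self.entries.append(req)

    def cycle(self, icache_idle: bool, mispredict: bool, override: bool,
              ibuf_full: bool) -> Optional[BTACWriteRequest]:
        """Return the request written to the BTAC this cycle, if any."""
        if not self.entries:
            return None
        btac_not_read = icache_idle or mispredict or override or ibuf_full
        if btac_not_read or self.depth == self.capacity:
            # A full queue forces the write even if it costs a false miss.
            return self.entries.popleft()
        return None


q = BTACWriteQueue(capacity=2)
q.push(BTACWriteRequest(fetch_addr=0x1000, target=0x2000))
print(q.cycle(icache_idle=False, mispredict=False, override=False, ibuf_full=False))  # None
print(q.cycle(icache_idle=True, mispredict=False, override=False, ibuf_full=False))
# BTACWriteRequest(fetch_addr=4096, target=8192)
```

The design choice illustrated is that the write is deferred rather than dropped: the request waits until a cycle in which the port is free, and only a full queue takes the port away from a fetch-time read.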

In another aspect, the present invention provides a method for updating a branch target address cache (BTAC) in a microprocessor. The method includes generating a request to update the BTAC, storing the request in a write queue, and, after the storing step, updating the BTAC according to the request.

In another aspect, the present invention provides a computer data signal embodied in a transmission medium, comprising computer-readable program code for providing a microprocessor. The program code includes first program code for providing an instruction cache that provides a cache line of instruction bytes in response to an instruction fetch address; second program code for providing a branch target address cache (BTAC), coupled to the instruction cache, for predicting a branch target address of a branch instruction in the cache line; and third program code for providing a write queue, coupled to the BTAC, for storing branch target addresses for updating the BTAC.
An advantage of the present invention is that it reduces false misses caused by the need to write the BTAC while it is being read, thereby increasing the efficiency of the BTAC. This in turn enables the use of a single-ported BTAC rather than a larger multi-ported BTAC; compared with a multi-ported BTAC of similar size, a single-ported BTAC can store more target addresses and is therefore more efficient.

To make the above and other objects, features, and advantages of the present invention more readily understood, a preferred embodiment is described in detail below with reference to the accompanying drawings.

Detailed Description

Referring now to FIG. 1, a block diagram of a pipelined microprocessor 100 according to the present invention is shown. The microprocessor 100 includes an instruction fetcher 102. The instruction fetcher 102 fetches instructions 138 from a memory (for example, system memory) coupled to the microprocessor 100. In one embodiment, the instruction fetcher 102 fetches instructions from memory in units of a cache line. In one embodiment, the instructions are variable-length instructions; that is, not all instructions in the instruction set are the same length. In one embodiment, the microprocessor 100 is a microprocessor whose instruction set is substantially compatible with the x86 architecture instruction set, which has variable-length instructions.
The microprocessor 100 also includes an instruction cache 104, coupled to the instruction fetcher 102. The instruction cache 104 receives cache lines of instruction bytes from the instruction fetcher 102 and caches them for subsequent use by the microprocessor 100. In one embodiment, the instruction cache 104 is a 64 KB, 4-way set-associative level-1 (L1) cache. When an instruction misses in the instruction cache 104, the instruction cache 104 notifies the instruction fetcher 102, which responsively fetches the cache line containing the missing instruction from memory. A current fetch address 162 is provided to the instruction cache 104 to select a cache line. In one embodiment, each cache line in the instruction cache 104 is 32 bytes. The instruction cache 104 also generates an instruction cache idle signal 158, which it asserts (true) when the instruction cache 104 is idle; the instruction cache 104 is idle when it is not being read. In one embodiment, if the instruction cache 104 is not being read, the BTAC 142 of the microprocessor 100 (discussed in detail below) is not being read either.
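For the embodiment above (a 64 KB, 4-way set-associative cache with 32-byte lines), the cache geometry can be worked out directly: 65536 / (4 × 32) = 512 sets. The sketch below assumes a conventional split of the fetch address into tag, set index, and byte offset; the text does not spell out this split, so the function is an illustration rather than the patent's addressing scheme.

```python
from __future__ import annotations

CACHE_BYTES = 64 * 1024  # 64 KB instruction cache (one embodiment)
WAYS = 4                 # 4-way set-associative
LINE_BYTES = 32          # 32-byte cache lines

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # 5 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1         # 9 bits select the set


def split_fetch_address(addr: int) -> tuple[int, int, int]:
    """Split a fetch address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

With this split, the low 5 + 9 = 14 address bits locate a byte within a set, and the remaining upper bits form the tag compared against each of the four ways.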

The microprocessor 100 also includes an instruction buffer 106, coupled to the instruction cache 104. The instruction buffer 106 receives cache lines of instruction bytes from the instruction cache 104 and buffers them until they are formatted into distinct instructions to be executed by the microprocessor 100. In one embodiment, the instruction buffer 106 has four entries for storing up to four cache lines. The instruction buffer 106 generates an instruction buffer full signal 156, which it asserts (true) when the instruction buffer 106 is full. In one embodiment, if the instruction buffer 106 is full, the BTAC 142 is not read.

The microprocessor 100 also includes an instruction formatter 108, coupled to the instruction buffer 106. The instruction formatter 108 receives instruction bytes from the instruction buffer 106 and generates formatted instructions from them. That is, the instruction formatter 108 examines a stream of instruction bytes in the instruction buffer 106, determines which bytes make up the next instruction and its length, and outputs the next instruction and its length.
In one embodiment, the formatted instructions are instructions substantially conforming to the x86 architecture instruction set. The instruction formatter 108 also includes logic that generates a branch target address, referred to as the override predicted target address 174. In one embodiment, this logic includes an adder that adds the offset of a relative branch instruction to the branch instruction address to generate the override predicted target address 174. In one embodiment, the logic includes a branch target buffer for generating the target addresses of indirect branch instructions. In one embodiment, the logic includes a call/return stack for generating the target addresses of call and return instructions. The instruction formatter 108 also generates a prediction override signal 154, which it asserts (true) to override the branch prediction made by the BTAC 142 of the microprocessor 100, as described in detail below.
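The target generation and override decision of the instruction formatter 108 can be sketched as follows. Two assumptions go beyond the text: the relative-branch target follows the usual x86 convention of adding the displacement to the address of the instruction after the branch (the text describes the offset as added to the branch instruction address), and addresses wrap at 32 bits. The call/return stack is a plain Python list, and all names are hypothetical.

```python
from __future__ import annotations

from typing import List, Optional

MASK32 = 0xFFFFFFFF  # assumed 32-bit address space


def relative_target(branch_addr: int, length: int, displacement: int) -> int:
    """Relative branch target: next-instruction address plus signed displacement."""
    return (branch_addr + length + displacement) & MASK32


class CallReturnStack:
    """Minimal call/return stack that predicts return targets."""

    def __init__(self) -> None:
        self._stack: List[int] = []

    def on_call(self, call_addr: int, length: int) -> None:
        self._stack.append((call_addr + length) & MASK32)  # push the return address

    def on_return(self) -> Optional[int]:
        return self._stack.pop() if self._stack else None


def should_override(btac_target: int, formatter_target: Optional[int]) -> bool:
    """Assert the override when the formatter computes a different target
    for a branch that the BTAC predicted."""
    return formatter_target is not None and formatter_target != btac_target
```

Because the formatter decodes the actual instruction bytes, its target is available later than the BTAC's but is more trustworthy, which is why a mismatch favors the formatter.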

If the target address generated by the logic in the instruction formatter 108 does not match the target address generated by the BTAC 142, the instruction formatter 108 asserts the prediction override signal 154, causing the instructions fetched as a result of the BTAC 142 prediction to be discarded and the microprocessor 100 to branch to the override predicted target address 174. In one embodiment, the BTAC 142 is not read while the instructions are being discarded and the microprocessor 100 is branching to the override predicted target address 174.

The microprocessor 100 also includes a formatted instruction queue 112, coupled to the instruction formatter 108. The formatted instruction queue 112 receives the formatted instructions output by the instruction formatter 108 and buffers them until they are translated into microinstructions. In one embodiment, the formatted instruction queue 112 has entries for storing up to 12 formatted instructions, although FIG. 1 shows only four.

The microprocessor 100 also includes an instruction translator 114, coupled to the formatted instruction queue 112.
The instruction translator 114 translates the formatted instructions stored in the formatted instruction queue 112 into microinstructions. In one embodiment, the microprocessor 100 includes a reduced instruction set computer (RISC) core that executes microinstructions of its native, reduced instruction set. The microprocessor 100 also includes a translated instruction queue 116, coupled to the instruction translator 114. The translated instruction queue 116 receives the microinstructions from the instruction translator 114 and buffers them until they can be executed by the rest of the microprocessor 100 pipeline.

The microprocessor 100 also includes a register stage 118, coupled to the translated instruction queue 116. The register stage 118 includes a plurality of registers for storing instruction operands and results, including the user-visible register file of the microprocessor 100. The microprocessor 100 also includes an address stage 122, coupled to the register stage 118. The address stage 122 includes address generation logic that generates memory addresses for memory access instructions, such as load instructions, store instructions, and branch instructions. The microprocessor 100 also includes a data stage 124, coupled to the address stage 122. The data stage 124 includes logic for loading data from memory and one or more caches for caching data loaded from memory. The microprocessor 100 also includes an execution stage 126, coupled to the data stage 124. The execution stage 126 includes execution units that execute instructions, such as arithmetic logic units for executing arithmetic and logical instructions. In one embodiment, the execution stage 126 includes an integer execution unit, a floating-point execution unit, an MMX execution unit, and an SSE execution unit.
The execution stage 126 also includes branch resolution logic. In particular, the execution stage 126 determines whether a branch instruction is taken and whether the BTAC 142 correctly predicted whether the branch instruction would be taken. In addition, the execution stage 126 determines whether a branch target address previously predicted by the BTAC 142 was mispredicted, that is, whether it was incorrect. If the execution stage 126 determines that a previous branch prediction was incorrect, it asserts a branch mispredict signal 152, causing the instructions fetched as a result of the BTAC 142 misprediction to be discarded and the microprocessor 100 to branch to the correct address 172. In one embodiment, the BTAC 142 is not read while the instructions are being discarded and the microprocessor 100 is branching to the correct address 172.
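The misprediction check performed by the execution stage 126 amounts to comparing the prediction carried down the pipeline with the resolved outcome. This is a minimal Python sketch, assuming a prediction is represented as a (taken, target) pair; the type and function names are hypothetical, and the redirect address for a wrongly predicted-taken branch is assumed to be the fall-through address.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """A branch direction plus target (the target only matters when taken)."""
    taken: bool
    target: Optional[int] = None


def is_mispredicted(predicted: Outcome, actual: Outcome) -> bool:
    """Wrong direction, or right direction (taken) but wrong target."""
    if predicted.taken != actual.taken:
        return True
    return actual.taken and predicted.target != actual.target


def redirect_address(predicted: Outcome, actual: Outcome,
                     fall_through: int) -> Optional[int]:
    """Address the fetch must restart from after a misprediction, else None."""
    if not is_mispredicted(predicted, actual):
        return None
    return actual.target if actual.taken else fall_through
```

Both kinds of error (direction and target) lead to the same recovery: discard the wrongly fetched instructions and refetch from the resolved address.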

The microprocessor 100 also includes a store stage 128, coupled to the execution stage 126. The store stage 128 includes logic that stores data to memory in response to store microinstructions. The store stage 128 generates the correct address 172, which is the correct branch target address of a branch instruction; that is, the correct address 172 is the non-speculative target address of the branch instruction. When a branch instruction is executed and resolved, the correct address 172 is also written to the BTAC 142, as described in detail below. The store stage 128 also generates a BTAC write request 176 to update the BTAC 142; the BTAC write request 176 is described in detail with respect to FIG. 7. The microprocessor 100 also includes a write-back stage 132, coupled to the store stage 128. The write-back stage 132 includes logic that writes instruction results to the register stage 118.

The microprocessor 100 also includes the BTAC 142. The BTAC 142 includes a cache memory that caches branch target addresses and other branch prediction information.
The BTAC 142 generates a predicted target address 164 in response to an address 182 received from a multiplexer 148. In one embodiment, the BTAC 142 comprises a single-ported cache memory shared by BTAC 142 read and write accesses, which makes the BTAC 142 susceptible to false misses. The BTAC 142 and the multiplexer 148 are described in detail below.

The microprocessor 100 also includes a second multiplexer 136, coupled to the BTAC 142. The multiplexer 136 selects one of six inputs to output as the current fetch address 162. One input is a next sequential fetch address 166 generated by an adder 134, which adds the size of a cache line to the current fetch address 162. After a cache line is fetched normally from the instruction cache 104, the multiplexer 136 selects the next sequential fetch address 166 as the current fetch address 162.

Another input is the current fetch address 162 itself. Another input is the BTAC predicted target address 164; the multiplexer 136 selects it if the BTAC 142 indicates that a branch instruction is present in the cache line selected from the instruction cache 104 by the current fetch address 162 and predicts that the branch will be taken. Another input is the correct address 172 received from the store stage 128, which the multiplexer 136 selects to correct a branch misprediction. Another input is the override predicted target address 174 received from the instruction formatter 108, which the multiplexer 136 selects to override the BTAC predicted target address 164. The final input is a current instruction pointer 168, which points to the address of the instruction currently being formatted by the instruction formatter 108. The multiplexer 136 selects the current instruction pointer 168 to avoid deadlock conditions, as described below.
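The selection made by multiplexer 136 can be sketched as a priority function in Python. The text lists the six inputs but not their priority, so the order below (misprediction correction, then deadlock avoidance, formatter override, BTAC taken prediction, hold, and finally the sequential address) is an assumption chosen for illustration, as are the parameter names. The 32-byte increment matches the cache-line size of one embodiment.

```python
from __future__ import annotations

from typing import Optional

LINE_BYTES = 32  # cache-line size in one embodiment


def next_fetch_address(current: int, *, correct_addr: Optional[int] = None,
                       deadlock_ip: Optional[int] = None,
                       override_addr: Optional[int] = None,
                       btac_taken_target: Optional[int] = None,
                       stall: bool = False) -> int:
    """Choose the next current fetch address (assumed priority order)."""
    if correct_addr is not None:       # misprediction correction (like 172)
        return correct_addr
    if deadlock_ip is not None:        # current instruction pointer (like 168)
        return deadlock_ip
    if override_addr is not None:      # formatter override target (like 174)
        return override_addr
    if btac_taken_target is not None:  # BTAC predicted-taken target (like 164)
        return btac_taken_target
    if stall:                          # hold the current fetch address (like 162)
        return current
    return current + LINE_BYTES        # next sequential fetch address (like 166)
```

Giving later pipeline stages precedence is a common choice, because their information about the instruction stream is more definite than that of earlier stages.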
The multiplexer 136 selects the current instruction index 168 to avoid deadlock situations, as described below. The microprocessor 100 also includes a BTAC write queue (BWQ) 144, which is combined to BTAC 142. The BTAC write queue 144 includes a plurality of storage elements to temporarily store the BTAC write request 176 until it can be written to the BTAC 142. ΒΤΑ (: write to Suining column 1 4 4 receives the branch misdetection signal 1 5 2, the prediction replaces the signal 1 5 4, the instruction buffer full signal 1 5 6, and the instruction cache idle signal 158. Favorable The BTAC write queue 144 can delay the update of BTAC 142 by using the BTAC write request & ~ 176 until the appropriate time indicated by the input signals 52 ~ 158, that is, the time when BTAC 142 is not read, In order to increase the efficiency of BTAC142, it will be detailed below. BTAC write queue 144 generates a BTAC write queue address 178, which is input to multiplexer 148. BTAC write queue 144 also includes storing a current sacrifice ' ^

12828twf.ptd 第19頁 (13) 200414034 五深度146之一暫存器。佇列深度146指出目前存於mqi44内之有效BTAC寫入要求176之數量。佇列深度146之初始值為 0。每次將一BTAC寫入要求176存至BTAC寫入佇列144内，佇列深度1 4 6都會增加。每次將一 BTAC寫入要求丨7 6從 BWQ144移走，佇列深度146都會減少。BTAC寫入佇列144 於底下詳述。現參考第2圖，顯示根據本發明之第1圖之微處理器之部份詳細方塊圖。第2圖顯示BTAC寫入佇列144，BTAC 1°4 2 與第1圖之多工器148 ，另增加一仲裁器2〇2，以及耦合於該BTAC寫入佇列144與該BTAC1 42間之3-輸入多工器2〇6。雖然第1圖之多工器148只接收2個輸入，多工器148是4-輸入多工器，如第2圖所示。如第2圖所示，BTAC142包括一讀/寫輸入，一位址輸入與一資料輸入。如第1圖所示，多工器1 4 8接收該目前擷取位址丨6 2與該BWQ位址1 78。此外，多工器1 48也接收一多餘TA位址2 34 與一死結位址2 3 6，將分別參考第10 —n圖與第12 —13圖做詳細描述。多工器.148根據該仲裁器2 0 2所產生之一控制信號258而選擇其4個輸入之一以輸出成第1圖之一位址資料 182，該位址資料182係輸入至該BTAC 142之該位址輸入。該多工器2 0 6接收一多餘TA資料信號2 44與一死結資料信號2 4 6，將分別參考第10_n圖與第12_13圖做詳細描、述。多工器206也接收從該BTAC寫入仵列144傳來之一 BWQ 資料#號2 4 8，其為該目前b τ A C寫入彳宁列1 4 4需要更新該 BTAC 1 42之資料。多工器2 0 6根據該仲裁器2 0 2所產生之一12828twf.ptd Page 19 (13) 200414034 V One of the registers of depth 146. The queue depth 146 indicates the number of valid BTAC write requirements 176 currently stored in mqi44. The initial value of the queue depth 146 is zero. Each time a BTAC write request 176 is stored in the BTAC write queue 144, the queue depth 1 4 6 increases. Each time a BTAC write request is removed from BWQ144, the queue depth 146 decreases. The BTAC write queue 144 is detailed below. Referring now to Fig. 2, a detailed block diagram of a portion of a microprocessor according to Fig. 1 of the present invention is shown. Figure 2 shows the BTAC write queue 144, BTAC 1 ° 4 2 and the multiplexer 148 of Figure 1, another arbiter 202 is added, and the BTAC write queue 144 and the BTAC1 42 are coupled. Of 3-input multiplexer 206. Although the multiplexer 148 of FIG. 1 receives only two inputs, the multiplexer 148 is a 4-input multiplexer, as shown in FIG. 2. As shown in Figure 2, BTAC142 includes a read / write input, a bit address input, and a data input. As shown in Figure 1, the multiplexer 1 4 8 receives the current fetch address 6 2 and the BWQ address 1 78. 
In addition, the multiplexer 1 48 also receives an extra TA address 2 34 and a dead node address 2 3 6, which will be described in detail with reference to Figs. 10-n and 12-13 respectively. Multiplexer. 148 selects one of its four inputs according to a control signal 258 generated by the arbiter 202 to output one of the address data 182 in the first figure, which is input to the BTAC Enter the address of 142. The multiplexer 2 0 6 receives an excess TA data signal 2 44 and a dead knot data signal 2 4 6, which will be described and described in detail with reference to FIGS. 10_n and 12_13 respectively. The multiplexer 206 also receives one of the BWQ data # 2 2 8 transmitted from the BTAC write queue 144, which is the current b τ AC write to the queue 1 4 4 and the data of the BTAC 1 42 needs to be updated. Multiplexer 2 0 6 generates one according to the arbiter 2 0 2

12828twf.ptd 第20頁 200414034 五、發明說明（14) ---- 控制信號2 6 2而選擇三個輸入之一以輸出成一資料信號 2 5 6，其輸入至該BTAC142之資料輸入。〜12828twf.ptd Page 20 200414034 V. Description of the invention (14) ---- Control signal 2 6 2 and select one of the three inputs to output a data signal 2 5 6 which is input to the data input of the BTAC142. ~
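The six inputs of multiplexer 136 can be modeled as a small selection function. The sketch below shows this. The control that chooses among the inputs, and any priority between them, is not specified here, so the caller names the selected input explicitly. The names `FetchSource`, `next_sequential`, and `select_fetch_address` are hypothetical. The 32-byte line size is taken from the embodiment described for Figure 4.

```python
from enum import Enum, auto
from typing import Optional

LINE_SIZE = 32  # bytes per cache line (32-byte embodiment); an assumption for this sketch

class FetchSource(Enum):
    """Hypothetical labels for the six inputs of multiplexer 136."""
    NEXT_SEQUENTIAL = auto()  # next fetch address 166 from adder 134
    CURRENT = auto()          # current fetch address 162, re-selected
    BTAC_TARGET = auto()      # BTAC predicted target address 164
    CORRECT = auto()          # correct address 172 from store stage 128
    OVERRIDE = auto()         # override predicted target address 174 from formatter 108
    CURRENT_IP = auto()       # current instruction pointer 168 (deadlock avoidance)

def next_sequential(current_fetch: int) -> int:
    """Adder 134: add the cache-line size to the current fetch address (32-bit wrap)."""
    return (current_fetch + LINE_SIZE) & 0xFFFFFFFF

def select_fetch_address(source: FetchSource, current_fetch: int, *,
                         btac_target: Optional[int] = None,
                         correct: Optional[int] = None,
                         override: Optional[int] = None,
                         current_ip: Optional[int] = None) -> int:
    """Return the address multiplexer 136 drives as the new current fetch address 162."""
    candidates = {
        FetchSource.NEXT_SEQUENTIAL: next_sequential(current_fetch),
        FetchSource.CURRENT: current_fetch,
        FetchSource.BTAC_TARGET: btac_target,
        FetchSource.CORRECT: correct,
        FetchSource.OVERRIDE: override,
        FetchSource.CURRENT_IP: current_ip,
    }
    address = candidates[source]
    if address is None:
        raise ValueError(f"no address supplied for {source.name}")
    return address
```

For example, on a taken BTAC prediction, `select_fetch_address(FetchSource.BTAC_TARGET, 0x1000, btac_target=0x2000)` yields `0x2000`. With no redirect, the sequential path yields `0x1020`.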

Arbiter 202 arbitrates among multiple sources requesting access to BTAC 142. When BTAC 142 is read or written, arbiter 202 controls it by generating a signal 252 to the read/write input of BTAC 142. Arbiter 202 receives the following request signals:

- A BTAC read request signal 212. This represents a request to read BTAC 142 using the current fetch address 162, in parallel with the read of instruction cache 104, which also uses the current fetch address 162.
- A redundant target address (TA) request signal 214. This represents a request to invalidate a redundant entry for the same branch instruction in the set of BTAC 142 selected by the redundant TA address 234, as described below.
- A deadlock request signal 216. This represents a request to invalidate an entry of BTAC 142 that mispredicts a branch instruction, in the set selected by the deadlock address 236, as not wrapping across a cache line boundary, as described below.
- A BWQ non-empty signal 218 from BTAC write queue 144. This indicates that at least one request is pending to update an entry of BTAC 142 in the set selected by the BWQ address 178, as described below.
- A BWQ full signal 222 from BTAC write queue 144. This indicates that BTAC write queue 144 is full of pending requests to update entries of BTAC 142 in the set selected by the BWQ address 178, as described below.

In one embodiment, arbiter 202 assigns priorities as shown in Table 1 below, where 1 is the highest priority and 5 is the lowest:

Table 1
1 - Deadlock request 216
2 - BWQ full 222
3 - BTAC read request 212
4 - Redundant TA request 214
5 - BWQ non-empty 218

Referring now to Figure 3, a block diagram showing BTAC 142 of Figure 1 in more detail according to the present invention is shown. As shown in Figure 3, BTAC 142 includes a target address array 302, a tag array 304, and a counter array 306. Each of arrays 302, 304, and 306 receives the address 182 of Figure 1. The embodiment of Figure 3 shows a 4-way set-associative BTAC 142 cache memory. In another embodiment, BTAC 142 comprises a 2-way set-associative cache memory. In one embodiment, the target address array 302 and the tag array 304 are single-ported. The counter array 306 is dual-ported, with one read port and one write port, because the counter array 306 is updated more frequently than the target address array 302 and the tag array 304.

The target address array 302 comprises an array of storage elements for storing target address array entries 312, which cache branch target addresses and related branch prediction information. The contents of a target address array entry 312 are described below with respect to Figure 4. The tag array 304 comprises an array of storage elements for storing tag array entries 314, which store address tags and related branch prediction information. The contents of a tag array entry 314 are described below with respect to Figure 5. The counter array 306 comprises an array of storage elements for storing counter array entries 316, which store branch outcome prediction information. The contents of a counter array entry 316 are described below with respect to Figure 6.

Each of the target address array 302, tag array 304, and counter array 306 is organized as four ways, shown as way 0, way 1, way 2, and way 3. Preferably, each way of the target address array 302 stores two entries, or portions, denoted A and B, for caching branch target addresses and predictive branch information. Thus, if two branch instructions are present in a cache line, BTAC 142 can predict the appropriate one. Each of arrays 302-306 is indexed by the address 182 of Figure 1. The lower bits of address 182 select a set in each of arrays 302-306. In one embodiment, each of arrays 302-306 comprises 128 sets. Hence, BTAC 142 can cache up to 1024 target addresses: two for each way of each set, with four ways per set. Preferably, arrays 302-306 are indexed by bits [11:5] of address 182 to select a 4-way set in BTAC 142.

Referring now to Figure 4, the contents of a target address array entry 312 of Figure 3 according to the present invention are shown. The target address array entry 312 includes a branch target address (TA) 402. In one embodiment, the target address 402 comprises a 32-bit address, cached from a previous execution of the branch instruction. BTAC 142 provides the target address 402 on the predicted TA output 164. The target address array entry 312 also includes a start field 404. The start field 404 indicates the byte offset of the first byte of the branch instruction within the cache line output by instruction cache 104 in response to the current fetch address 162. In one embodiment, a cache line comprises 32 bytes; hence, the start field 404 comprises 5 bits. The target address array entry 312 also includes a wrap bit 406. The wrap bit 406 is true if the predicted branch instruction wraps across two cache lines of instruction cache 104. BTAC 142 provides the wrap bit 406 on a B_wrap signal 1214, discussed below with respect to Figure 12.

Referring now to Figure 5, the contents of a tag array entry 314 of Figure 3 according to the present invention are shown. The tag array entry 314 includes a tag 502. In one embodiment, the tag 502 comprises the upper 20 bits of the address of the branch instruction that caused the associated entry in the target address array 302 to store a predicted target address 402. If the entry is valid, BTAC 142 compares the tag 502 with the upper 20 bits of address 182 of Figure 1. This determines whether the entry matches address 182, i.e., whether address 182 hits in BTAC 142. The tag array entry 314 also includes an A valid bit 504, which is true if the target address 402 in the A portion of the associated entry in the target address array 302 is valid. The tag array entry 314 also includes a B valid bit 506, which is true if the target address 402 in the B portion of the associated entry is valid. The tag array entry 314 also includes a 3-bit LRU field 508, which indicates which of the four ways of the selected set is least recently used. In one embodiment, BTAC 142 updates the LRU field 508 only when a BTAC branch is performed. That is, it updates the field only when BTAC 142 predicts that a branch instruction is taken and microprocessor 100 branches, according to the prediction, to the predicted target address 164 provided by BTAC 142. When a BTAC branch is performed, BTAC 142 updates the LRU field 508 during a period in which BTAC 142 is not being read and the BTAC write queue 144 need not be used.

Referring now to Figure 6, the contents of a counter array entry 316 of Figure 3 according to the present invention are shown. The counter array entry 316 includes a prediction state A counter 602. In one embodiment, the prediction state A counter 602 is a 2-bit saturating counter. It counts up each time microprocessor 100 determines that the associated branch instruction is taken, and counts down each time the associated branch instruction is not taken. When counting up, the prediction state A counter 602 saturates at the binary value b'11; when counting down, it saturates at the binary value b'00. In one embodiment, if the value of the prediction state A counter 602 is b'11 or b'10, BTAC 142 predicts that the branch instruction associated with the A portion of the selected target address array entry 312 will be taken; otherwise, BTAC 142 predicts that the branch instruction will not be taken. The counter array entry 316 also includes a prediction state B counter, which operates like the prediction state A counter 602 but with respect to the B portion of the selected target address array entry 312. The counter array entry 316 also includes an A/B LRU bit 606. A binary value of b'1 in the A/B LRU bit 606 indicates that the A portion of the selected target address array entry 312 is least recently used; otherwise, the B portion is least recently used. In one embodiment, the A/B LRU bit 606 is updated, together with the prediction state A and B counters, when the branch instruction reaches the store stage 128, where the branch outcome (i.e., whether the branch is taken or not) is determined. In one embodiment, updating the counter array entry 316 does not require the BTAC write queue 144, because the counter array 306 includes a read port and a write port.
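The BTAC geometry and the 2-bit prediction counters described for Figures 3 through 6 can be checked with a few lines of arithmetic. The sketch below is a minimal illustration that assumes the 32-bit address, 128-set, 4-way, two-portion embodiment. The constant and function names are hypothetical.

```python
from __future__ import annotations

TAG_SHIFT = 12      # tag 502 = address bits [31:12] (upper 20 bits)
INDEX_SHIFT = 5     # set index = address bits [11:5]
INDEX_MASK = 0x7F   # 7 index bits -> 128 sets
OFFSET_MASK = 0x1F  # bits [4:0]: byte offset in a 32-byte line (width of start field 404)

NUM_SETS = INDEX_MASK + 1  # 128
NUM_WAYS = 4               # way 0 .. way 3
ENTRIES_PER_WAY = 2        # portions A and B

def split_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit address into (tag, set index, byte offset)."""
    addr &= 0xFFFFFFFF
    return addr >> TAG_SHIFT, (addr >> INDEX_SHIFT) & INDEX_MASK, addr & OFFSET_MASK

def update_counter(state: int, taken: bool) -> int:
    """2-bit saturating counter: up on taken (stops at 0b11), down on not taken (stops at 0b00)."""
    return min(state + 1, 0b11) if taken else max(state - 1, 0b00)

def predict_taken(state: int) -> bool:
    """Predict taken when the counter holds b'11 or b'10."""
    return state >= 0b10
```

The capacity works out to 128 sets x 4 ways x 2 portions = 1024 target addresses. Starting from b'01, a single taken outcome moves a counter to b'10, so the next prediction for that portion becomes taken.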

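The fixed priority order that arbiter 202 applies (Table 1, listed earlier) is a simple priority encoder. The following is a minimal sketch under that reading. It assumes the request sources are sampled together and the lowest-numbered pending one is granted. `Request`, `arbitrate`, and `is_write` are hypothetical names, and `is_write` only approximates the role of read/write signal 252.

```python
from enum import IntEnum
from typing import Optional, Set

class Request(IntEnum):
    """Requesters of BTAC 142; a lower value means a higher priority (Table 1 order)."""
    DEADLOCK = 1       # deadlock request 216
    BWQ_FULL = 2       # BWQ full 222
    BTAC_READ = 3      # BTAC read request 212
    REDUNDANT_TA = 4   # redundant TA request 214
    BWQ_NON_EMPTY = 5  # BWQ non-empty 218

def arbitrate(pending: Set[Request]) -> Optional[Request]:
    """Grant the highest-priority pending request, or None when nothing is pending."""
    return min(pending) if pending else None

def is_write(grant: Optional[Request]) -> bool:
    """Every source except the fetch-path read writes (updates or invalidates) BTAC 142."""
    return grant is not None and grant is not Request.BTAC_READ
```

With a fetch-path read pending, a merely non-empty write queue waits. A full queue wins over the read, which is the source of the false miss discussed with Figure 9.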

Referring now to Figure 7, the contents of a BTAC write request 176 of Figure 1 according to the present invention are shown. Figure 7 shows the information, generated by the store stage 128, that is provided on the BTAC write request signal 176 to BTAC write queue 144 for a new entry in BTAC 142. This is also the content stored in each entry of BTAC write queue 144, as shown in Figure 8.

The BTAC write request 176 includes a branch instruction address field 702, which holds the address of the previously executed branch instruction with which BTAC 142 is to be updated. When the write request 176 is subsequently used to update BTAC 142, the upper 20 bits of the branch instruction address field 702 are stored into the tag field 502 of the tag array entry 314 of Figure 5. Bits [11:5] of the branch instruction address field 702 are used as the index into BTAC 142. In one embodiment, the branch instruction address field 702 is a 32-bit field.

The BTAC write request 176 includes the following further fields:

- A start field 708, for storing into the start field 404 of Figure 4.
- A wrap bit 712, for storing into the wrap bit 406 of Figure 4.
- A write enable A field 714, which indicates whether the A portion of the selected target address array entry 312 is to be updated with the information specified by the BTAC write request 176.
- A write enable B field 716, which indicates whether the B portion of the selected target address array entry 312 is to be updated with the information specified by the BTAC write request 176.
- An invalidate A field 718, which indicates whether the A portion of the selected target address array entry 312 is to be invalidated. Invalidating the A portion comprises clearing the A valid bit 504 of Figure 5.
- An invalidate B field 722, which indicates whether the B portion of the selected target address array entry 312 is to be invalidated. Invalidating the B portion comprises clearing the B valid bit 506 of Figure 5.
- A 4-bit way field 724, which specifies which of the four ways of the selected set is to be updated. The way field 724 is fully decoded.

In one embodiment, when microprocessor 100 reads BTAC 142 to obtain a branch prediction, it determines the value to be placed in the way field 724. It passes that value down the pipeline stages to the store stage 128 for inclusion in the BTAC write request 176. If microprocessor 100 is updating an existing entry in BTAC 142, i.e., if the current fetch address 162 hit in BTAC 142, microprocessor 100 places the way of the existing entry into the way field 724. If microprocessor 100 is writing a new entry into BTAC 142, e.g., for a new branch instruction, it places the least recently used way of the selected BTAC 142 set into the way field 724. In one embodiment, when microprocessor 100 reads BTAC 142 to obtain a branch prediction, it determines the least recently used way from the LRU field 508 of Figure 5.

Referring now to Figure 8, a block diagram of the BTAC write queue 144 of Figure 1 according to the present invention is shown. BTAC write queue 144 includes a plurality of storage elements 802 for storing BTAC write requests 176 of Figure 7. In one embodiment, BTAC write queue 144 includes six storage elements 802 for storing six BTAC write requests 176, as shown. BTAC write queue 144 also includes a valid bit 804 associated with each BTAC write request entry 802. The valid bit 804 is true if the associated entry is valid and false if it is invalid.

BTAC write queue 144 also includes control logic 806 coupled to the storage elements 802 and the valid bits 804. Control logic 806 is also coupled to queue depth 146. When a BTAC write request 176 is loaded into BTAC write queue 144, control logic 806 increments queue depth 146. When a BTAC write request 176 is removed from BTAC write queue 144, control logic 806 decrements queue depth 146. Control logic 806 receives the BTAC write request signal 176 from the store stage 128 and stores the received request into an entry 802. Control logic 806 also receives the branch misprediction signal 152, the prediction override signal 154, the instruction buffer full signal 156, and the instruction cache idle signal 158 of Figure 1. When queue depth 146 is greater than zero, control logic 806 asserts the BWQ non-empty signal 218 of Figure 2. When the value of queue depth 146 equals the total number of entries 802 (six in the embodiment of Figure 8), control logic 806 asserts the BWQ full signal 222 of Figure 2. When control logic 806 asserts the BWQ non-empty signal 218, it provides the branch instruction address 702 of the oldest (or bottom) entry 802 of BTAC write queue 144 on the BWQ address signal 178 of Figure 1. It also provides fields 706 through 724 of Figure 7 from that oldest (or bottom) entry 802 on the BWQ data signal 248 of Figure 2.

Referring now to Figure 9, a flowchart showing operation of the BTAC write queue 144 of Figure 1 according to the present invention is shown. Flow begins at decision block 902.

At decision block 902, BTAC write queue 144 determines whether it is full by determining whether queue depth 146 equals the total number of entries in BTAC write queue 144. If so, flow proceeds to block 918 to update BTAC 142; otherwise, flow proceeds to decision block 904.

At decision block 904, BTAC write queue 144 determines whether the instruction cache 104 of Figure 1 is idle by examining the instruction cache idle signal 158. If so, flow proceeds to decision block 922 to update BTAC 142 if necessary, since BTAC 142 is probably not being read; otherwise, flow proceeds to decision block 906.

At decision block 906, BTAC write queue 144 determines whether the instruction buffer 106 of Figure 1 is full by examining the instruction buffer full signal 156. If so, flow proceeds to decision block 922 to update BTAC 142 if necessary, since BTAC 142 is probably not being read; otherwise, flow proceeds to decision block 908.

At decision block 908, BTAC write queue 144 determines whether a BTAC 142 branch prediction has been overridden by examining the prediction override signal 154. If so, flow proceeds to decision block 922 to update BTAC 142 if necessary, since BTAC 142 is probably not being read; otherwise, flow proceeds to decision block 912.

At decision block 912, BTAC write queue 144 determines whether a BTAC 142 branch misprediction is being corrected by examining the branch misprediction signal 152. If so, flow proceeds to decision block 922 to update BTAC 142 if necessary, since BTAC 142 is probably not being read; otherwise, flow proceeds to decision block 914.

At decision block 914, BTAC write queue 144 determines whether a BTAC write request 176 has been generated. If not, flow returns to decision block 902; otherwise, flow proceeds to block 916.


At block 916, BTAC write queue 144 loads the BTAC write request 176 and increments queue depth 146. The BTAC write request 176 is loaded into the topmost invalid entry of BTAC write queue 144, and that entry is then marked valid. Flow returns to decision block 902.

At block 918, BTAC write queue 144 updates BTAC 142 with its oldest, or bottom, entry and decrements queue depth 146. BTAC write queue 144 then shifts its entries down by one. To perform the update, BTAC write queue 144 provides the value of the branch instruction address field 702 of Figure 7 of the oldest entry on the BWQ address signal 178. It provides the remaining portions of the oldest BTAC write request 176 on the BWQ data signal 248. In addition, BTAC write queue 144 asserts the BWQ non-empty signal 218 to the arbiter 202 of Figure 2. If flow reached block 918 from decision block 902, BTAC write queue 144 also asserts the BWQ full signal 222 to the arbiter 202 of Figure 2. Flow proceeds from block 918 to decision block 914.

Suppose BTAC write queue 144 asserts the BWQ full signal 222 while the BTAC read request signal 212 is also pending, and arbiter 202 grants BTAC write queue 144 access to BTAC 142. In that case the BTAC 142 lookup misses. If BTAC 142 holds a valid target address for a branch instruction in the cache line specified by the current fetch address 162, this is a false miss. Advantageously, however, BTAC write queue 144 delays writes to BTAC 142, in most cases, until BTAC 142 is not being read, as shown in Figure 9. This reduces the likelihood of a false miss in BTAC 142.
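The Figure 9 flow can be condensed into one pass per call. The queue drains its oldest entry when it is full (block 902). It also drains when one of the "BTAC probably not being read" conditions of blocks 904-912 holds and the queue is non-empty (block 922, described in the paragraph that follows). It then accepts a new request (blocks 914/916). This is a behavioral sketch only: it assumes arbiter 202 grants the write immediately, models only a few Figure 7 fields, and uses hypothetical names throughout.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

@dataclass
class BtacWriteRequest:
    """A few of the Figure 7 fields; the Python names are hypothetical."""
    branch_address: int          # field 702 (tag bits plus index bits [11:5])
    target_address: int = 0      # field 706
    way: int = 0b0001            # field 724, fully decoded (one-hot)
    write_enable_a: bool = True  # field 714

class BtacWriteQueue:
    """Sketch of BWQ 144 following the Figure 9 flow, one pass per step() call."""

    def __init__(self, capacity: int = 6) -> None:
        self.capacity = capacity
        self.entries: Deque[BtacWriteRequest] = deque()  # left end = oldest (bottom)

    @property
    def depth(self) -> int:
        return len(self.entries)  # queue depth 146

    @property
    def non_empty(self) -> bool:
        return self.depth > 0  # BWQ non-empty 218

    @property
    def full(self) -> bool:
        return self.depth == self.capacity  # BWQ full 222

    def step(self, new_request: Optional[BtacWriteRequest] = None, *,
             icache_idle: bool = False, ibuf_full: bool = False,
             overridden: bool = False, mispredict_fix: bool = False
             ) -> Optional[BtacWriteRequest]:
        """Return the entry written to the BTAC during this pass, if any."""
        written = None
        # Blocks 904-912: any of these means the BTAC is probably not being read.
        btac_probably_idle = icache_idle or ibuf_full or overridden or mispredict_fix
        # Block 902 (full) or block 922 (not empty) leads to block 918.
        if self.full or (btac_probably_idle and self.non_empty):
            written = self.entries.popleft()  # oldest entry; the rest shift down
        # Blocks 914/916: load a new request from store stage 128, if one arrived.
        if new_request is not None:
            self.entries.append(new_request)
        return written
```

A full queue forces a write even while the fetch path is reading the BTAC. A partly filled queue waits for an idle instruction cache, a full instruction buffer, an override, or a misprediction correction.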

12828twf.ptd 第30頁 200414034 五、發明說明（24) 在決定方塊9 2 2，控制邏輯電路8 〇 6藉由決定佇列深度 146是否等於0來決定是否BTAC寫入佇列丨44為空。如果又疋’ SlL程跳至决疋方塊9 1 4，否則，流程跳至決定方塊g 2 2 以更新BTAC142因為BTAC142可能未被讀取。現參考第1 0圖’顯示根據本發明之第1圖之該微處理裔100内之將该BTAC内多餘目標位址無效化之邏輯電路之方塊圖。第10圖顯示第3圖之BTAC142之標籤陣列304接收第1圖之位址1 8 2並回應性產生4個標籤，標示為t a g 〇丨〇〇 2 a， tagl 1002B ，tag2 1002C 與tag3 1002D ，總稱為標籤 1002。標籤1002包括從標籤陣列304之4向之各向傳來之第 5圖之標籤5 0 2。此外，標籤陣列3 〇 4回應性產生8個有效位元[7 :0]’標示為1004，其為從標籤陣列304之4向之各向傳來之A有效位元504與B有效位元506。微處理器1 0 0也包括比較器1 〇 1 2，耦合至標籤陣列 3 0 4，該比較器1 0 1 2接收位址1 8 2。在第1 0圖之實施例中，比較器1 0 1 2包括4個2 0 -位元比較器，各比較器比較位址 1 8 2之局階2 0位元與相關標籤1 〇〇 2以產生四個匹配信號，標示為matchO 1006A ’matchl 1006B ，match2 1006C 與 m a t c h 3 1 0 0 6 D，總稱為匹配信號1 〇〇 6。如果位址1 8 2匹配於相關標籤1 0 0 2，則比較器1 〇 1 2產生為真值之匹配信號 1 0 0 6 ° 微處理器1 0 0也包括控制邏輯電路1 〇 1 4，耦合至比較器1 0 1 2，該電路1 0 1 4接收匹配信號1 〇〇 6與有效信號1 〇〇 4。12828twf.ptd Page 30 200414034 V. Description of the invention (24) In the decision block 9 2 2, the control logic circuit 8 determines whether the BTAC write queue 44 is empty by determining whether the queue depth 146 is equal to 0. If again, the SlL process jumps to decision block 9 1 4; otherwise, the process jumps to decision block g 2 2 to update BTAC142 because BTAC142 may not be read. Reference is now made to Fig. 10 ', which is a block diagram of a logic circuit in the microprocessor 100 according to Fig. 1 of the present invention that invalidates an excess target address in the BTAC. Fig. 10 shows that the tag array 304 of BTAC142 in Fig. 3 receives the address 1 8 2 in Fig. 1 and generates 4 tags responsively, labeled tag 〇丨〇〇2 a, tagl 1002B, tag2 1002C, and tag3 1002D. Collectively referred to as the label 1002. The label 1002 includes the label 5 0 2 of the fifth figure transmitted from the label array 304 in each direction. In addition, the tag array 3 responsively generates 8 significant bits [7: 0] 'labeled 1004, which are A significant bits 504 and B significant bits transmitted from the tag array 304 in each direction 506. 
The microprocessor 100 also includes a comparator 1012, coupled to the tag array 304, which receives the address 182. In the embodiment of Figure 10, the comparator 1012 consists of four 20-bit comparators. Each one compares the upper 20 bits of the address 182 with the corresponding tag 1002, producing four match signals: match0 1006A, match1 1006B, match2 1006C, and match3 1006D, referred to collectively as match signals 1006. If the address 182 matches a tag 1002, the comparator 1012 generates a true value on the corresponding match signal 1006. The microprocessor 100 also includes control logic 1014, coupled to the comparator 1012, which receives the match signals 1006 and the valid bits 1004.
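The four 20-bit comparisons can be sketched as follows. The sketch assumes a 32-bit address whose upper 20 bits form the tag, which is consistent with the 20-bit comparators described above. The function name is hypothetical. Valid bits are deliberately not examined here, because the patent combines them with the match signals later, in the control logic 1014.

```python
def way_matches(address, tags, addr_bits=32, tag_bits=20):
    """Return match[0..3]: 1 where a way's tag equals the upper tag_bits of address.

    Mirrors the four tag comparators; valid bits are checked separately.
    """
    upper = (address >> (addr_bits - tag_bits)) & ((1 << tag_bits) - 1)
    return [int(tag == upper) for tag in tags]


tags = [0x12345, 0x00001, 0x12345, 0xABCDE]  # tag0..tag3 of the selected set
print(way_matches(0x12345678, tags))         # [1, 0, 1, 0]
```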

Suppose the match signal 1006 is true for more than one way of the selected set of the tag array 304, and each of those ways also has at least one true valid bit 1004. In that case, the control logic 1014 stores a true value in a redundant TA flag register 1024, indicating that the BTAC 142 holds more than one valid target address for the same branch instruction. The control logic 1014 also causes the address 182 to be stored in a redundant TA address register 1026. Finally, the control logic 1014 loads redundant TA invalidation data into a redundant TA invalidation data register 1022.

In one embodiment, the data in the redundant TA invalidation data register 1022 resembles the BTAC write request 176 of Figure 7, with these omissions:
- The branch instruction address 702 is not stored, because the address of the branch instruction is held in the redundant TA address register 1026.
- The target address 706, start bits 708, and wrap bit 712 are not stored, because they are irrelevant in an invalidated BTAC 142 entry.

Thus, a redundant TA invalidation does not write the target address array 302. Only the tag array 304 is updated, to invalidate the redundant BTAC 142 entry.
The output of the redundant TA invalidation data register 1022 is the redundant TA invalidation data signal 244 of Figure 2. The output of the redundant TA flag register 1024 is the redundant TA request 214 of Figure 2. The output of the redundant TA address register 1026 is the redundant TA address 234 of Figure 2. In one embodiment, Table 2 below gives the equations for the way value 724 stored in the redundant TA invalidation data register 1022 and for the value stored in the redundant TA flag register 1024. In Table 2, each valid[i] is the logical OR of the two valid bits of way i:
- valid[3] = A valid bit [3] 504 OR B valid bit [3] 506
- valid[2] = A valid bit [2] 504 OR B valid bit [2] 506
- valid[1] = A valid bit [1] 504 OR B valid bit [1] 506
- valid[0] = A valid bit [0] 504 OR B valid bit [0] 506

Table 2

RedundantInvalWay[3] = (valid[3] & match[3]) & ((valid[0] & match[0]) | (valid[1] & match[1]) | (valid[2] & match[2]));
RedundantInvalWay[2] = (valid[2] & match[2]) & ((valid[0] & match[0]) | (valid[1] & match[1]));
RedundantInvalWay[1] = (valid[1] & match[1]) & (valid[0] & match[0]);
RedundantInvalWay[0] = 0; /* way 0 is never invalidated */
RedundantTAFlag = ((valid[3] & match[3]) & (valid[2] & match[2])) |
                  ((valid[3] & match[3]) & (valid[1] & match[1])) |
                  ((valid[3] & match[3]) & (valid[0] & match[0])) |
                  ((valid[2] & match[2]) & (valid[1] & match[1])) |
                  ((valid[2] & match[2]) & (valid[0] & match[0])) |
                  ((valid[1] & match[1]) & (valid[0] & match[0]));

To illustrate the purpose of the redundant target address invalidation logic of Figure 10, whose operation is shown in Figure 11, consider the following sequence of instruction executions. It shows how the BTAC 142 can come to hold redundant target address entries for the same branch instruction. A first current fetch address 162 of Figure 1 is input to the instruction cache 104 and the BTAC 142. The cache line selected by this first current fetch address 162 contains a branch instruction, referred to as branch-A. The first current fetch address 162 also selects a set in the BTAC 142, referred to as set N.

None of the tags 1002 in the ways of set N matches the first current fetch address 162, so the BTAC 142 generates a miss. In this example, the lru value 508 indicates that way 2 is the least recently used way. The information for updating the BTAC 142 for branch-A is therefore sent down the pipeline along with branch-A, indicating that way 2 must be updated.

Next, a second current fetch address 162 is input to the instruction cache 104 and the BTAC 142. The cache line selected by the second current fetch address 162 contains a branch instruction referred to as branch-B. The second current fetch address 162 also selects set N and hits in way 3 of set N, so the BTAC 142 generates a hit. The BTAC 142 also updates the lru value 508 of set N to indicate way 1.

Next, because branch-A is part of a tight loop of code, the first current fetch address 162 is again input to the instruction cache 104 and the BTAC 142, and set N is again selected. The first execution of branch-A has not yet reached the store stage 128 of Figure 1, so the BTAC 142 has not yet been updated with the target address of branch-A, and the BTAC 142 again generates a miss. This time, however, the lru value 508 indicates that way 1 is the least recently used way, because the lru value 508 was updated in response to the branch-B hit. The information for updating the BTAC 142 for the second execution of branch-A is therefore sent down the pipeline along with it, indicating that way 1 must be updated.

Next, the first branch-A reaches the store stage 128 and generates a BTAC write request 176 to update way 2 of set N with the target address of branch-A; this write is eventually performed. Then the second branch-A reaches the store stage 128 and generates a BTAC write request 176 to update way 1 of set N with the target address of branch-A; this write is also eventually performed. As a result, the BTAC 142 holds two valid entries for the same branch instruction, branch-A.
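The branch-A/branch-B sequence can be replayed in a small model. The section says only that the lru value 508 identifies the least recently used way. The 3-bit tree pseudo-LRU encoding below is an assumption, chosen because it reproduces the example (way 2 is the victim, and after a hit in way 3 the victim becomes way 1). The sketch also assumes that a miss does not update the LRU state, and that the victim way is fixed when the miss occurs and written only later, at the store stage. All names are illustrative, and every way is assumed valid.

```python
class TreePLRU:
    """3-bit tree pseudo-LRU for one 4-way set (an assumed encoding).

    root: 0 -> victim in ways {0,1}, 1 -> victim in ways {2,3}
    left: 0 -> way 0, 1 -> way 1;  right: 0 -> way 2, 1 -> way 3
    """

    def __init__(self, root, left, right):
        self.root, self.left, self.right = root, left, right

    def victim(self):
        if self.root == 0:
            return 0 if self.left == 0 else 1
        return 2 if self.right == 0 else 3

    def touch(self, way):
        # Point each bit on the path away from the way just used.
        if way < 2:
            self.root, self.left = 1, (1 if way == 0 else 0)
        else:
            self.root, self.right = 0, (1 if way == 2 else 0)


def replay_example():
    TAG_A, TAG_B = 0xA, 0xB
    tags = [0x1, 0x2, 0x3, TAG_B]            # set N; way 3 already holds branch-B
    lru = TreePLRU(root=1, left=1, right=0)  # least recently used way is 2
    pending = []                             # write requests not yet at the store stage

    def lookup(tag):
        for way, t in enumerate(tags):
            if t == tag:
                lru.touch(way)
                return way
        pending.append((tag, lru.victim()))  # victim chosen now, written later
        return None

    results = [lookup(TAG_A), lookup(TAG_B), lookup(TAG_A)]
    victims = [way for _, way in pending]
    for tag, way in pending:                 # both updates eventually retire
        tags[way] = tag
    duplicates = [way for way, t in enumerate(tags) if t == TAG_A]
    return results, victims, duplicates


print(replay_example())  # ([None, 3, None], [2, 1], [1, 2])
```

Branch-A ends up in both way 1 and way 2. The deferral of the BTAC write is what creates the duplicate, because the second lookup of branch-A cannot see the first, still-pending update.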

One of these entries is redundant and makes inefficient use of the BTAC 142. The redundant entry could otherwise be used by another branch instruction, and it may have displaced the valid target address of another branch instruction.

Reference is now made to Figure 11, a flowchart illustrating operation of the redundant target address invalidation logic of Figure 10 according to the present invention. Flow begins at block 1102.

At block 1102, the arbiter 202 grants the BTAC read request 212 of Figure 2 access to the BTAC 142. This causes the multiplexer 148 to select the current fetch address 162 onto the address signal 182 of Figure 1 and to generate the control signal 252 of Figure 2 indicating a read of the BTAC 142. The lower-order bits of the current fetch address 162, carried on the address 182, serve as an index that selects a set of the BTAC 142. Flow proceeds to block 1104.

At block 1104, the comparator 1012 compares the tags 1002 of Figure 10 from all four ways of the selected BTAC 142 set with the upper-order bits of the current fetch address 162 on the address signal 182, generating the match signals 1006 of Figure 10. The control logic 1014 receives the match signals 1006 and the valid bits 1004 of Figure 10. Flow proceeds to block 1106.

At block 1106, the control logic 1014 determines whether more than one valid tag match occurred. That is, based on the valid bits 1004 and the match signals 1006, it determines whether more than one way in the set selected by the current fetch address 162 has a valid matching tag 1002. If so, flow proceeds to block 1108; otherwise, flow ends.

At block 1108, the control logic 1014 does three things: it stores a true value in the redundant TA flag register 1024, stores the address 182 in the redundant TA address register 1026, and stores invalidation data in the redundant TA invalidation data register 1022. Specifically, in the redundant TA invalidation data register 1022 it sets to true the write enable A field, the write enable B field 716, the invalid A field 718, and the invalid B field 722. It also stores there the value of the way field 724, generated according to Table 2. Flow proceeds to block 1112.

At block 1112, the arbiter 202 grants the redundant TA request 214 of Figure 2 access to the BTAC 142. This causes the multiplexer 148 to select the redundant TA address 234 onto the address signal 182 and to generate the control signal 252 of Figure 2 indicating a write of the BTAC 142. The lower-order bits of the redundant TA address 234, carried on the address 182, serve as an index that selects a set of the BTAC 142. The BTAC 142 receives the redundant TA invalidation data signal 244 from the redundant TA invalidation data register 1022 and invalidates the ways of the selected set indicated by the way field 724. Flow ends at block 1112.
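The Table 2 equations, and the invalidation performed at block 1112, can be evaluated in software as a sketch. Each valid[i] is formed as the OR of the A and B valid bits, as the text specifies. The flag is written as the OR of all pairwise ANDs, exactly as in Table 2, which is equivalent to "at least two ways hit". The helper names are hypothetical. Clearing both the A and B valid bits of a selected way reflects the invalid A field 718 and invalid B field 722 being set true.

```python
def redundant_invalidation(match, valid_a, valid_b):
    """Evaluate the Table 2 equations for one 4-way set.

    match, valid_a, valid_b: lists of 0/1 indexed by way 0..3.
    Returns (inval_way, redundant_flag); inval_way[i] == 1 means way i
    is selected for invalidation through the way field 724.
    """
    valid = [a | b for a, b in zip(valid_a, valid_b)]  # valid[i] = A valid | B valid
    hit = [v & m for v, m in zip(valid, match)]        # valid[i] & match[i]
    inval_way = [
        0,                                   # way 0 is never invalidated
        hit[1] & hit[0],
        hit[2] & (hit[0] | hit[1]),
        hit[3] & (hit[0] | hit[1] | hit[2]),
    ]
    # OR of all pairwise ANDs: true when at least two ways hold a valid matching tag.
    flag = int(any(hit[i] & hit[j] for i in range(4) for j in range(i)))
    return inval_way, flag


def apply_invalidation(valid_a, valid_b, inval_way):
    """Clear both the A and B valid bits of every way selected for invalidation."""
    keep = [1 - x for x in inval_way]
    return ([a & k for a, k in zip(valid_a, keep)],
            [b & k for b, k in zip(valid_b, keep)])


# Branch-A duplicated in ways 1 and 2, as in the example sequence above.
inval, flag = redundant_invalidation([0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0])
print(inval, flag)                                               # [0, 0, 1, 0] 1
print(apply_invalidation([1, 1, 0, 1], [0, 0, 1, 0], inval))     # ([1, 1, 0, 1], [0, 0, 0, 0])
```

The equations always keep the lowest-numbered way with a valid match and mark every higher-numbered matching way for invalidation. This is why RedundantInvalWay[0] can be tied to 0.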
Reference is now made to Figure 12, a block diagram of deadlock avoidance logic in the microprocessor 100 according to the present invention. Figure 12 shows the BTAC 142, instruction cache 104, instruction buffer 106, instruction formatter 108, formatted instruction queue 112, and multiplexer 136 of Figure 1, together with the control logic 1014 of Figure 10. As shown in Figure 12, the microprocessor 100 also includes a deadlock invalidation data register 1222, a deadlock flag register 1224, and a deadlock address register 1226. The instruction formatter 108 decodes the instructions stored in the instruction buffer 106. If it decodes a branch instruction that wraps across two cache lines, it generates a true value on an F_wrap signal 1202.

The F_wrap signal 1202 is raised early. Once the instruction formatter 108 has decoded the first portion of a wrapping branch instruction, held in a first cache line in the instruction buffer 106, it generates a true value on the F_wrap signal 1202. It does so whether or not it has decoded the rest of the instruction, which lies in a second cache line not yet stored in the instruction buffer 106. The F_wrap signal 1202 is provided to the control logic 1014.

When the current fetch address 162 misses, the instruction cache 104 generates a true value on a miss signal 1206. The miss signal 1206 is provided to the control logic 1014.

When the current fetch address 162 input to the instruction cache 104 is speculative, a true value is generated on a predicted signal 1208. This happens, for example, when the multiplexer 136 selects the BTAC predicted target address 164 as the current fetch address 162. The predicted signal 1208 is provided to the instruction cache 104. In one embodiment, the instruction cache 104 forwards the predicted signal 1208 to the instruction fetcher 102 of Figure 1. The instruction fetcher 102 then does not fetch from memory a cache line that misses in the instruction cache 104 at a speculative memory address, for reasons described below with respect to Figure 13.

The BTAC 142 generates a taken/not-taken (T/NT) signal 1212, which is provided to the control logic 1014. A true value on the T/NT signal 1212 indicates three things:
- The address 182 hit in the BTAC 142.
- The BTAC 142 predicts that a branch instruction is present in the cache line supplied by the instruction cache 104 for the current fetch address 162, and that this branch instruction will be taken.
- The BTAC 142 is supplying the target address of the branch instruction on the BTAC predicted target address signal 164.

The BTAC 142 generates the T/NT signal 1212 from the value of prediction state A 602 or prediction state B 604 of Figure 6, depending on whether it uses the A or B portion for the branch prediction. The BTAC 142 also generates a B_wrap signal 1214, which is provided to the control logic 1014. The B_wrap signal 1214 carries the value of the wrap bit 406 of Figure 4 from the selected BTAC target address array entry 312. A false value on B_wrap 1214 therefore indicates that the BTAC 142 predicts the branch instruction does not wrap across two cache lines. In one embodiment, the control logic 1014 registers the B_wrap signal 1214 so that its value from the previous BTAC 142 access is retained.

The control logic 1014 also generates the current instruction pointer 168 of Figure 1, and a control signal 1204 that serves as the input select signal of the multiplexer 136.

A deadlock condition, described in detail below, exists when the registered B_wrap signal 1214 is false and the F_wrap signal 1202, the miss signal 1206, and the predicted signal 1208 are all true. When the control logic 1014 detects this condition, it stores a true value in the deadlock flag register 1224 to indicate the deadlock, so that the BTAC 142 entry that caused it can be invalidated. The control logic 1014 also loads deadlock invalidation data into the deadlock invalidation data register 1222.

In one embodiment, the data in the deadlock invalidation data register 1222 resembles the BTAC write request 176 of Figure 7, with these omissions:
- The branch instruction address 702 is not stored, because the address of the branch instruction is held in the deadlock address register 1226.
- The target address 706, start bits 708, and wrap bit 712 are not stored, because they are irrelevant in an invalidated BTAC 142 entry.

Thus, a deadlock invalidation does not write the target address array 302. Only the tag array 304 is updated, to invalidate the mispredicting BTAC 142 entry.

The output of the deadlock invalidation data register 1222 is the deadlock invalidation data signal 246 of Figure 2. The output of the deadlock flag register 1224 is the deadlock request 216 of Figure 2. The output of the deadlock address register 1226 is the deadlock address 236 of Figure 2. The way value 724 in the deadlock invalidation data register 1222 is set to the way of the BTAC 142 that caused the deadlock condition. After the mispredicting entry has been invalidated, the control logic 1014 also generates a value on the control signal 1204 that makes the multiplexer 136 select the current instruction pointer 168. The microprocessor 100 thus branches to the current instruction pointer 168, and the cache line containing the mispredicted branch instruction is fetched again.

Reference is now made to Figure 13, a flowchart illustrating operation of the deadlock avoidance logic of Figure 12 according to the present invention. Flow begins at block 1302.

At block 1302, the current fetch address 162 is input, via the address signal 182, to the instruction cache 104 and to the BTAC 142. In Figure 13, this current fetch address 162 is called fetch address A. Flow proceeds to block 1304.

At block 1304, the instruction cache 104 provides the cache line specified by fetch address A, called cache line A, to the instruction buffer 106. Cache line A contains the first portion of a branch instruction, but not the whole instruction. Flow proceeds to block 1306.

At block 1306, in response to fetch address A, the BTAC 142 signals on the T/NT signal 1212 that the branch instruction in cache line A will be taken. It also generates a false value on the B_wrap signal 1214 and places a predicted target address on the BTAC predicted target address 164. Flow proceeds to block 1308.

At block 1308, the control logic 1014 controls the multiplexer 136 to select the BTAC predicted target address 164 as the next current fetch address 162, called fetch address B. Because the BTAC predicted target address 164 is speculative, the control logic 1014 also generates a true value on the predicted signal 1208. Flow proceeds to block 1312.

At block 1312, the instruction cache 104 generates a true value on the miss signal 1206, indicating that fetch address B misses in the instruction cache 104. Normally, the instruction fetcher 102 would fetch the missing cache line from memory. Because the predicted signal 1208 is true, however, it does not, for reasons described below. Flow proceeds to block 1314.

At block 1314, the instruction formatter 108 decodes cache line A in the instruction buffer 106 and generates a true value on the F_wrap signal 1202, because the branch instruction wraps across two cache lines. The instruction formatter 108 then waits for the next cache line to arrive in the instruction buffer 106, so that it can finish formatting the branch instruction and output it to the formatted instruction queue 112. Flow proceeds to block 1316.

At block 1316, the control logic 1014 checks four conditions: the registered B_wrap signal 1214 is false, the F_wrap signal 1202 is true, the miss signal 1206 is true, and the predicted signal 1208 is true. Together these constitute the deadlock condition described below. If all four hold, flow proceeds to block 1318; otherwise, flow ends.
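The decision at block 1316, and the memory-fetch rule from block 1312, reduce to simple Boolean predicates. The sketch below mirrors the signal names used in the text. The dataclass and function names are illustrative only, not part of the patent.

```python
from dataclasses import dataclass


@dataclass
class FrontEndState:
    b_wrap_registered: bool  # registered B_wrap 1214 (True: BTAC predicted a wrap)
    f_wrap: bool             # F_wrap 1202 (formatter found a wrapping branch)
    icache_miss: bool        # miss signal 1206
    predicted: bool          # predicted signal 1208 (fetch address is speculative)


def may_fetch_from_memory(s: FrontEndState) -> bool:
    """The instruction fetcher fills a missing line only for non-speculative addresses."""
    return s.icache_miss and not s.predicted


def deadlock_detected(s: FrontEndState) -> bool:
    """Decision block 1316: all four conditions must hold at the same time."""
    return (not s.b_wrap_registered) and s.f_wrap and s.icache_miss and s.predicted


stuck = FrontEndState(b_wrap_registered=False, f_wrap=True, icache_miss=True, predicted=True)
print(deadlock_detected(stuck), may_fetch_from_memory(stuck))  # True False
```

In the stuck state, the fetcher is not allowed to fill the line and the formatter cannot proceed without it. Detecting this combination is what triggers the recovery at blocks 1318 and 1322.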

At block 1318, the control logic 1014 invalidates the BTAC 142 entry that caused the deadlock condition, as described with respect to Figure 12. The next time fetch address A is input to the BTAC 142, the BTAC 142 will therefore generate a miss, because the offending entry is no longer valid. Flow proceeds to block 1322.

At block 1322, the control logic 1014 controls the multiplexer 136 to branch to the current instruction pointer 168, as described with respect to Figure 12. When it selects the current instruction pointer 168, the control logic 1014 generates a false value on the predicted signal 1208, because the current instruction pointer 168 is not a speculative memory address. The current instruction pointer 168 will very likely hit in the instruction cache 104. If it misses, the instruction fetcher 102 fetches the cache line it specifies from memory, because the predicted signal 1208 marks the address as non-speculative. Flow ends at block 1322.

A true result at decision block 1316 means a deadlock exists because all the conditions necessary for one are present. The first condition is a multi-byte branch instruction that wraps across two different cache lines: the first portion of its bytes lies at the end of one cache line, and the second portion lies at the beginning of the next. Because branch instructions can wrap, the BTAC 142 must store a prediction of whether a branch instruction wraps across cache lines. The control logic 1014 uses this prediction to decide whether to fetch the next cache line, to obtain the second half of the branch instruction bytes, before fetching the cache line at the target address 164.
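The wrap condition, and the fetch ordering that the stored wrap prediction controls, can be sketched as follows. The 32-byte line size is an assumption, since this section does not give one. The function names are hypothetical. The sketch shows only the ordering decision described above, not the patent's fetch logic.

```python
LINE_SIZE = 32  # assumed cache line size in bytes; not specified in this section


def wraps(offset_in_line: int, length: int, line_size: int = LINE_SIZE) -> bool:
    """True if an instruction starting at offset_in_line spills into the next line."""
    return offset_in_line + length > line_size


def lines_to_fetch(line_addr: int, target: int, predicted_wrap: bool,
                   line_size: int = LINE_SIZE) -> list:
    """Fetch order after a predicted-taken branch found in the line at line_addr.

    If the BTAC's wrap bit says the branch spills over, the next sequential line
    is fetched first so the formatter receives the rest of the branch bytes.
    """
    target_line = target - (target % line_size)
    if predicted_wrap:
        return [line_addr + line_size, target_line]
    return [target_line]


print(wraps(30, 2), wraps(30, 5))                               # False True
print([hex(a) for a in lines_to_fetch(0x1000, 0x2008, True)])   # ['0x1020', '0x2000']
```

If the stored prediction is wrong, and `predicted_wrap` is False for a branch that actually wraps, only the target line is requested. The second half of the branch is then never fetched, which is the first ingredient of the deadlock.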

The BTAC 142 may mispredict this information, for example by predicting that the branch instruction does not wrap when it actually does. The instruction formatter 108 will then decode the cache line containing the first half of the branch instruction and detect that a branch instruction is present, but not all of its bytes will be available for decoding. The instruction formatter 108 will therefore wait for the next cache line, and the pipeline will wait for more instructions to be formatted before it can execute them.

The second condition is that the target address 164 misses in the instruction cache 104. Because the BTAC 142 predicted that the branch instruction does not wrap, the control logic 1014 fetches the cache line implied by the target address 164 output by the BTAC 142, rather than the next sequential cache line. When that target address misses, the next cache line the instruction formatter 108 is waiting for must come from memory.

The third condition concerns the chipset. The chipset of a microprocessor may not expect instructions to be fetched from certain memory address ranges. If the microprocessor generates an instruction fetch in such an unexpected range, the chipset may hang the system or cause other undesirable system behavior. A speculative address, such as the target address 164 output by the BTAC 142, may cause an instruction fetch from an unexpected memory address range. The microprocessor 100 therefore does not fetch a missing cache line from memory at a speculative BTAC predicted target address 164. As a result, the instruction formatter 108 and the rest of the pipeline wait for another cache line. Meanwhile, the instruction fetcher 102 waits for the pipeline to tell it to perform a non-speculative fetch.

In a non-deadlock situation, the misprediction resolves itself. Suppose, for example, that the target address 164 hits in the instruction cache 104. The instruction formatter 108 then formats the branch instruction, albeit with incorrect bytes, and provides it to the execution stages. The execution stages detect the misprediction and correct the BTAC 142, causing the predicted signal 1208 to become false.

In the deadlock case, however, the execution stages can never detect the misprediction. The instruction formatter 108 never provides the formatted branch instruction to them, because it is still waiting for the next cache line. A deadlock therefore occurs. The deadlock avoidance logic of Figure 12 prevents this, as described with respect to Figures 12 and 13, enabling the microprocessor 100 to operate properly.
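The three conditions, and the recovery at blocks 1318 and 1322, can be illustrated with a toy front-end model. The addresses, the 32-byte line size, and the one-line-per-cycle timing are invented for the example. The deadlock predicate is not re-evaluated in the model. Instead, the model assumes that registered B_wrap = 0, F_wrap = 1, miss = 1, and predicted = 1 all hold whenever a predicted fetch misses, which is true in this scenario by construction.

```python
LINE = 0x20  # assumed 32-byte cache lines


def front_end(avoidance: bool, max_cycles: int = 20):
    """Return the cycle in which the formatter receives the line it is waiting
    for, or None if the front end is still stalled after max_cycles."""
    icache = {0x100}                # line A is cached; the target line is not
    btac = {0x100: 0x800}           # stale entry: taken to 0x800, wrap bit clear
    fetch, predicted = 0x800, True  # fetch already redirected by the BTAC
    waiting_for = 0x120             # formatter needs the line after line A
    current_ip = 0x100              # line holding the wrapping branch
    for cycle in range(max_cycles):
        miss = fetch not in icache
        if miss and predicted:
            # No speculative memory fetch, so nothing arrives this cycle.
            # Here B_wrap=0, F_wrap=1, miss=1 and predicted=1 all hold.
            if avoidance:
                btac.pop(current_ip, None)            # invalidate the BTAC entry
                fetch, predicted = current_ip, False  # refetch non-speculatively
            continue
        if miss:
            icache.add(fetch)       # non-speculative miss: fill from memory
        if fetch == waiting_for:
            return cycle
        if fetch in btac:           # BTAC hit: follow the predicted target
            fetch, predicted = btac[fetch], True
        else:                       # BTAC miss: fetch sequentially
            fetch, predicted = fetch + LINE, False
    return None


print(front_end(avoidance=False))  # None
print(front_end(avoidance=True))   # 2
```

Without the avoidance logic, the model never makes progress. With it, invalidating the stale entry and refetching at the current instruction pointer turns the next fetch into a non-speculative sequential one, which the fetcher is allowed to fill from memory.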
For example, although the write queue is related to the port BTAC, in some microprocessor architectures, spurious failure may also occur in the multi-port BTAC, albeit at a lower frequency. Therefore, the write queue can be applied to reduce the false failure rate of multi-port BTAC. In addition, in some microprocessors that have not read BTAC, there may be other situations besides those described here, where the requirements listed in the write queue can be written to BTAC. In addition, although the present invention and its objects, features, and advantages have been described in detail, the present invention may include other embodiments. In addition to using hardware to implement the present invention, the present invention can also be implemented in computer-readable codes (such as computer-readable codes, data, etc.) in computer-usable (eg, readable) media. The computer code can perform the functions or manufacture of the disclosed invention or both. For example, you can use general programming languages (such as C, C ++, J A V A, etc.); G D S I I database; hard description language (hard description language,

12828twf.ptd 第43頁 200414034 五、發明說明（37) HDL)，包括Verilog HDL, VHDL, Altera HDL(AHDL)等；或現有之其他程式及/或電路（亦即概要式）擷取工具。電腦碼可載入於包括半導體記憶體，磁碟，光碟（比如， CD-ROM，DVD-ROM等）之任意習知電腦可用式（比如，可讀式）媒介内；以及以電腦資料信號之形式實施於電腦可用式（比如，可讀式）傳輸媒介（比如，載波，或包括數位，光學或類比式媒介之其他媒介）。因此，電腦碼可傳輸於包括網際網路與企業網路（指令t r an e t )通訊網路上。要知道，本發明可實施於電腦碼（比如，I P (智財權）核心之一部份，比如為微處理器核心，或為系統級設計，比如系統單晶片（S0C))與轉換成積體電路之部份硬體。另，本發明可實施成硬體與電腦碼之組合。雖然本發明已以一較佳實施例揭露如上，然其並非用以限定本發明，任何熟習此技藝者，在不脫離本發明之精神和範圍内，當可作些許之更動與潤飾，因此本發明之保護範圍當視後附之申請專利範圍所界定者為準。12828twf.ptd Page 43 200414034 V. Description of the Invention (37) HDL), including Verilog HDL, VHDL, Altera HDL (AHDL), etc .; or other existing programs and / or circuits (that is, summary) capture tools. Computer code can be loaded into any conventional computer-usable (eg, readable) medium including semiconductor memory, magnetic disks, and optical disks (eg, CD-ROM, DVD-ROM, etc.); Forms are implemented on computer-usable (eg, readable) transmission media (eg, carrier waves, or other media including digital, optical, or analog media). Therefore, the computer code can be transmitted on a communication network including the Internet and the corporate network (command tran e t). It should be understood that the present invention can be implemented in computer code (for example, part of an IP (Intellectual Property Right) core, such as a microprocessor core, or for a system-level design, such as a system-on-a-chip (S0C)) Part of the hardware of the circuit. In addition, the present invention can be implemented as a combination of hardware and computer code. Although the present invention has been disclosed as above with a preferred embodiment, it is not intended to limit the present invention. Any person skilled in the art can make some changes and retouch without departing from the spirit and scope of the present invention. The scope of protection of the invention shall be determined by the scope of the attached patent application.
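The deadlock described above is a conjunction of conditions that together leave no pipeline stage able to make progress. The Python sketch below is only a behavioral model, not the circuit of Figures 12 and 13. The flag `formatter_needs_next_line` is a hypothetical stand-in for the first two deadlock conditions, which are described earlier in the patent and are not part of this excerpt. The recovery in `avoid_deadlock` (making the refetch non-speculative and reporting that the offending BTAC entry should be invalidated) is an assumption suggested by the deadlock invalidate data register 1222 and deadlock address register 1226 in the reference numerals; it is not the patent's stated mechanism.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FetchState:
    # True if the fetch address came from a BTAC prediction (speculative).
    btac_predicted: bool
    # True if the predicted target address hits in the instruction cache.
    target_hits_icache: bool
    # Hypothetical stand-in for the earlier deadlock conditions: the formatter
    # is still waiting for the next sequential cache line.
    formatter_needs_next_line: bool


def is_deadlocked(s: FetchState) -> bool:
    """True when no stage can make progress: the formatter waits for a line,
    the predicted target misses, and the fetcher will not fetch a speculative
    BTAC target from memory, so the misprediction never reaches execution."""
    return s.btac_predicted and s.formatter_needs_next_line and not s.target_hits_icache


def avoid_deadlock(s: FetchState) -> Tuple[FetchState, bool]:
    """Hypothetical recovery: make the refetch non-speculative so memory may be
    accessed, and report that the offending BTAC entry should be invalidated."""
    if not is_deadlocked(s):
        return s, False
    return FetchState(False, s.target_hits_icache, s.formatter_needs_next_line), True
```

For example, `is_deadlocked(FetchState(True, False, True))` is `True`, whereas the same state with `target_hits_icache=True` is not deadlocked, matching the non-deadlock case in which the formatted branch reaches execution and the misprediction is corrected.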

Brief Description of Drawings
Figure 1 is a block diagram of a microprocessor according to the present invention.
Figure 2 is a detailed block diagram of portions of the microprocessor of Figure 1 according to the present invention.
Figure 3 is a detailed block diagram of portions of the BTAC of Figure 1 according to the present invention.
Figure 4 is a block diagram of the contents of a target address array entry of Figure 3 according to the present invention.
Figure 5 is a block diagram of the contents of a tag array entry of Figure 3 according to the present invention.
Figure 6 is a block diagram of the contents of a counter array entry of Figure 3 according to the present invention.
Figure 7 is a block diagram of the contents of a BTAC write request of Figure 1 according to the present invention.
Figure 8 is a block diagram of the BTAC write queue of Figure 3 according to the present invention.
Figure 9 is a flowchart illustrating operation of the BTAC write queue of Figure 1 according to the present invention.
Figure 10 is a block diagram of the redundant target address invalidation logic of the BTAC in the microprocessor of Figure 1 according to the present invention.
Figure 11 is a flowchart illustrating operation of the redundant target address apparatus of Figure 10 according to the present invention.
Figure 12 is a block diagram of the deadlock avoidance logic in the microprocessor of Figure 1 according to the present invention.
Figure 13 is a flowchart illustrating operation of the deadlock avoidance logic of Figure 12 according to the present invention.

Description of Reference Numerals
100: microprocessor
102: instruction fetcher
104: instruction cache
106: instruction buffer
108: instruction formatter
112: formatted instruction queue
114: instruction translator
116: translated instruction queue
118: register stage
122: address stage
124: data stage
126: execute stage
128: store stage
132: write-back stage
134: adder
136, 148, 206: multiplexers
138: instructions
142: BTAC
144: BTAC write queue (BWQ)
146: queue depth
152: branch misprediction signal
154: prediction override signal
156: instruction buffer full signal
158: instruction cache idle signal
162: current fetch address
164: predicted target address
166: next fetch address
168: current instruction pointer
172: correct address
174: override predicted target address
176: BTAC write request
178: BTAC write queue address
182: address
202: arbiter
212: BTAC read request signal
214: redundant target address (TA) request signal
216: deadlock request signal
218: BWQ not-empty signal
222: BWQ full signal
234: redundant TA address
236: deadlock address
244: redundant TA data signal
246: deadlock data signal
248: BWQ data signal
252, 258, 262, 1204: control signals
256: data signal
302: target address array
304: tag array
306: counter array
312: target address array entry
314: tag array entry
316: counter array entry
402: branch target address
404, 708: start field
406: wrap bit
502: tag
504: A valid bit
506: B valid bit
508: lru field
602: prediction state A counter
604: prediction state B counter
606: A/B lru bit
702: branch instruction address field
706: target address
712: wrap bit
714: write enable A field
716: write enable B field
718: invalidate A field
722: invalidate B field
724: way field
802: storage elements
804, 1004: valid bits
806, 1014: control logic
1002: tag
1006: match signal
1012: comparator
1022: redundant TA invalidate data register
1024: redundant TA flag register
1026: redundant TA address register
1202: F_wrap signal
1206: miss signal
1208: prediction signal
1212: taken/not-taken (T/NT) signal
1214: B_wrap signal
1222: deadlock invalidate data register
1224: deadlock flag register
1226: deadlock address register


Claims

1. A write queue for improving the efficiency of a branch target address cache (BTAC) in a microprocessor, the write queue comprising: a request input, for receiving a request to update the BTAC, the request comprising a target address of a branch instruction; a plurality of storage elements, for storing the requests received on the request input; and control logic, coupled to the storage elements, for writing one of the requests stored in the storage elements to the BTAC in response to one or more predetermined conditions.
2. The write queue of claim 1, further comprising: a cache idle input, coupled to the control logic, for specifying one of the one or more predetermined conditions, in which the BTAC is not being read because an instruction cache accessed in parallel with the BTAC is idle.
3. The write queue of claim 1, further comprising: a buffer full input, coupled to the control logic, for specifying one of the one or more predetermined conditions, in which the BTAC is not being read because an instruction buffer is full, wherein the instruction buffer receives instructions from an instruction cache accessed in parallel with the BTAC.
4. The write queue of claim 1, further comprising: a prediction override input, coupled to the control logic, for specifying one of the one or more predetermined conditions, in which the BTAC is not being read because a first prediction of a branch instruction made by the BTAC is being overridden by a second prediction of the branch instruction made by branch prediction logic in the microprocessor.

5. The write queue of claim 1, further comprising: a branch misprediction input, coupled to the control logic, for specifying one of the one or more predetermined conditions, in which the BTAC is not being read because the BTAC has mispredicted a branch instruction.
6. The write queue of claim 1, further comprising: a queue full input, coupled to the control logic, for specifying one of the one or more predetermined conditions, in which all of the storage elements are storing requests to be written to the BTAC.
7. The write queue of claim 1, further comprising: a plurality of valid bits, coupled to the control logic, each indicating whether the request stored in a corresponding one of the storage elements is valid.
8. The write queue of claim 1, wherein the request further comprises a memory address of the branch instruction.
9. The write queue of claim 1, wherein the BTAC is an N-way set-associative cache, and wherein the request further comprises information specifying into which of the N ways of the BTAC the request is to be written.
10. A microprocessor, comprising: an instruction cache, for providing a cache line of instruction bytes in response to an instruction fetch address; a branch target address cache (BTAC), coupled to the instruction cache, for predicting a branch target address of a branch instruction stored in the cache line; and a write queue, coupled to the BTAC, for storing branch target addresses for updating the BTAC.

11. The microprocessor of claim 10, wherein if the write queue is not empty, the write queue updates the BTAC with one of the branch target addresses when the instruction cache is idle.
12. The microprocessor of claim 10, further comprising: an instruction buffer, coupled to the instruction cache, for storing zero or more cache lines received from the instruction cache.
13. The microprocessor of claim 12, wherein if the write queue is not empty, the write queue updates the BTAC with one of the branch target addresses when the instruction buffer indicates that it is full.
14. The microprocessor of claim 10, further comprising: branch prediction logic, coupled to the write queue, wherein after the BTAC makes a first prediction of the branch instruction, the branch prediction logic makes a second prediction of the branch instruction, and wherein the microprocessor overrides the first prediction with the second prediction.
15. The microprocessor of claim 14, wherein if the write queue is not empty, the write queue updates the BTAC with one of the branch target addresses when the microprocessor overrides the first prediction with the second prediction.
16. The microprocessor of claim 10, further comprising: branch resolution logic, coupled to the write queue, for correcting a misprediction of the branch instruction made by the BTAC.
17. The microprocessor of claim 16, wherein if the write queue is not empty, the write queue updates the BTAC with one of the branch target addresses when the microprocessor is correcting the misprediction of the branch instruction made by the BTAC.
18. The microprocessor of claim 10, wherein if the write queue becomes full, the write queue updates the BTAC with one of the branch target addresses.
19. The microprocessor of claim 10, wherein if the BTAC is read while the write queue is writing to the BTAC, the BTAC misses.
20. The microprocessor of claim 10, wherein the BTAC comprises a single-port memory array for storing a plurality of branch target addresses.
21. The microprocessor of claim 10, wherein the BTAC comprises a single-port memory array for storing address tags of a plurality of branch instructions.
22. A method for updating a branch target address cache (BTAC) in a microprocessor, the method comprising: generating a request to update the BTAC; storing the request in a queue; and after the storing, updating the BTAC according to the request.

23. The method of claim 22, wherein the updating of the BTAC is performed in a clock cycle of the microprocessor subsequent to the storing.
24. The method of claim 22, further comprising: determining whether the BTAC is not being read; wherein the updating is performed if the BTAC is not being read.
25. The method of claim 24, further comprising: determining whether the BTAC is not being read because an instruction cache coupled to the BTAC is idle.
26. The method of claim 24, further comprising: determining whether the BTAC is not being read because an instruction buffer is full, wherein the instruction buffer receives instructions output from an instruction cache coupled to the BTAC.
27. The method of claim 22, further comprising: determining whether a first prediction of a branch instruction made by the BTAC is being overridden by a second prediction of the branch instruction made by other branch prediction logic in the microprocessor; wherein the updating is performed if the first prediction made by the BTAC is overridden by the second prediction.
28. The method of claim 22, further comprising: determining whether the BTAC has mispredicted a branch instruction; wherein the updating is performed if the BTAC has mispredicted a branch instruction.

29. The method of claim 22, further comprising: determining whether the queue is full; wherein the updating is performed if the queue is full.
30. A computer data signal embodied in a transmission medium, comprising: computer-readable program code for providing a microprocessor, the program code comprising: first program code for providing an instruction cache, for providing a cache line of instruction bytes in response to an instruction fetch address; second program code for providing a branch target address cache (BTAC), coupled to the instruction cache, for predicting a branch target address of a branch instruction stored in the cache line; and third program code for providing a write queue, coupled to the BTAC, for storing branch target addresses for updating the BTAC.
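The update policy recited in the claims can be illustrated with a small behavioral model of a write queue in front of a single-port BTAC. In this Python sketch the class and field names, the default depth of four, and the first-in-first-out drain order are assumptions made for illustration. The drain conditions themselves (instruction cache idle, instruction buffer full, prediction override, branch misprediction, queue full) follow claims 2 through 6, and the lost read when a forced write takes the single port follows claim 19.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass(frozen=True)
class BtacWriteRequest:
    branch_address: int  # memory address of the branch instruction (claim 8)
    target_address: int  # resolved branch target address (claim 1)
    way: int             # which of the N ways of the BTAC to write (claim 9)


@dataclass(frozen=True)
class CycleConditions:
    icache_idle: bool = False          # claim 2
    ibuffer_full: bool = False         # claim 3
    prediction_override: bool = False  # claim 4
    branch_mispredict: bool = False    # claim 5

    def btac_port_free(self) -> bool:
        # Any of these conditions means the BTAC is not being read this cycle.
        return (self.icache_idle or self.ibuffer_full
                or self.prediction_override or self.branch_mispredict)


class BtacWriteQueue:
    """Behavioral model of the BTAC write queue (names and depth are assumptions)."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        # Each queued request stands for a storage element whose valid bit is set.
        self.entries: Deque[BtacWriteRequest] = deque()

    def is_full(self) -> bool:
        return len(self.entries) == self.depth

    def enqueue(self, req: BtacWriteRequest) -> None:
        if self.is_full():
            raise RuntimeError("write queue full; a write must drain first")
        self.entries.append(req)

    def cycle(self, cond: CycleConditions) -> Tuple[Optional[BtacWriteRequest], bool]:
        """Return (request written to the BTAC this cycle, whether a BTAC read missed).

        The oldest request is written when the port is free; if the queue is
        full, it is written anyway, and the read that needed the single port
        misses spuriously."""
        if not self.entries:
            return None, False
        if cond.btac_port_free():
            return self.entries.popleft(), False
        if self.is_full():
            return self.entries.popleft(), True
        return None, False
```

For example, with one request queued and no free-port condition, `cycle(CycleConditions())` returns `(None, False)`, while `cycle(CycleConditions(icache_idle=True))` returns the queued request together with `False`. Deferring writes to free-port cycles except when the queue is full mirrors the abstract: the BTAC update waits until the BTAC is not being read, and a full queue updates the BTAC anyway.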
