TWI285841B

TWI285841B - Branch target buffer, branch target buffer memory array, branch prediction unit and processor with a function of branch instruction predictions

Info

Publication number: TWI285841B
Application number: TW94116653A
Authority: TW
Inventors: Gi-Ho Park
Original assignee: Samsung Electronics Co Ltd
Priority date: 2004-07-16
Filing date: 2005-05-23
Publication date: 2007-08-21
Also published as: GB2416412A; GB0514599D0; TW200617777A; JP2006031697A; GB2416412B

Abstract

A branch target buffer (BTB) storing a data entry related to a branch instruction is disclosed. The BTB conditionally enables access to the data entry in response to a word line gating circuit associated with a word line in the BTB. The word line gating circuit stores a word line gating value derived from branch history data related to the instruction. Additionally, a branch prediction unit and a processor incorporating the BTB are disclosed, along with methods for operating the BTB.

Description

I2858417p,doc 功能步驟或階段而處理。每—處理階段通f在一單個的時脈週期内完成其要素操作。非管線(n〇n-Pipeiined)處理器是完成每一指令的處理之後才開始下-指令的處理，和—非管線的處理器不同， -管線todined)處理器在此管線之不同的處理階段㈤喊理祕指令。管線階段可由-設計者任意但通常來說，包括如下這些：即指令獲取階段、 ^解碼階段、指令執行階段以及執行分解(execution resolution)階段。取階段從指令當前存儲的任何地方（比如一主己憶體或-指令件列)查找一指— v 筱執由疋才日令所指定的一或多個操作。 (::=rr通常涉及將此指令執行而產生二用ΐ)’寫叫—❹料存料記_，用以作段轉:Ϊ令更適宜於在—蚊週_顧裏，從一管線階 ^轉到下-個階段。從而，在第— 獲取階段從存儲中獲取—第似曰令的硬體寄存器内，由此而進行解y。在入相關聯好，邀此取一第二指令，並將其排列的郝’在執補段襄，此指令起動-執行操 6 1285841 17007pif.doc 如：一邏輯的、數學的、定址的，或索引的操作)， =時廷指令解碼器階段對此第二指令解碼，而此指= 儲中獲取—第三指令。管線操作已知繼續刀白#又，在14樣的方式下，此處理器整體的比非管線的_有__縣的提高。物作速度’ 超量化的架構，可同時處理和/或執行兩條或多個執行(或 -鮮八= 丁的、同時並且獨立地執行入“”。&篁的(Scalar)系統在每個週期只能執行一條二非St指令是形成於一串管線的指令，還是此指令‘ “的方式執彳了。多條指令的同時執行，進—步地尚了一處理器的性能。扞的ΐΪΐ作已何置疑地提供了性能的好處，只要被執 β、"序職持⑧度的雜，或者是可刪。但不幸的部分触令糊’包含㈣可轉致_序執行路徑的“令。所謂的，，分支指令，，(包括比如：跳轉、返回，以 2件分支指令等），在—管線處理器裏產生了顯著的性能 ^ 除非可以實施一個有效的分支預測之方式。當一 =有預/則（或者是錯誤地預測)分支指令引起從這樣的指令歹J離開時，即從當前在此處理器中管線排列的指令序 = 性能損失就發生了。當這樣的損失發生，當前的管指令序列必須被扔掉或，，沖洗掉，，(fluShed)，而一新的札序列必須載入進到此管線裏。管線的沖洗操作浪費了很夕時脈週期，並通常降低了處理器的執行速度。 7 1285841 17007pif.doc 、：個提升和一分支指令相關的執行性能之方法，是預測这分支指令的結果，並插入一預測的指令到此管線讓其直接跟在這分支指令的後面。如^ ' = 裏實施’那麼這和管線沖洗相關的性能 Ϊ圭在如果此分支指令的結果被錯誤地預測時發傳統的技術和分析已經確認了許多分支指果疋^被正確地預測的，這有—很高的確定性— 對一一應用程式來說，有接近80%的確定性。因此，幾種分支預測機制的傳統類型已被發展 ^預測機制的其中一種類型’是利用—分支目標緩衝器來存儲很多的資料條目，其中每一資料條目和一分支= 相關聯。從而’這分支目標緩衝器存儲有許多的所謂，，分^ 位址標籤”(branch address tag)，每-分支位址標藏，作為 -分支指令分個索引。除此分支健標籤之外，每 -分支目標緩衝器的條目’更包括一目標位址过 address)、^指令操作碼(〇pc〇de)、分支歷史資訊，以友可 • 能其他的資料。在利用-分支目標緩衝器的-處理器中，此刀支預測機制監控每條進入此管線的指令。通當令的位址被監控，而且當此指令位址和這分二的一條目相匹配時，此指令就被識別為一分支指令。從相關的分支歷史資訊，此分支預測機制去確定此分支是否會被執行。分支歷史資訊一般地是由一狀態機(state maehin^ ^確定，此狀態機監控每條索引在這分支目標緩衝器裏的分支指令，並定義作為分支歷史資訊而存儲的資料，此資 8 1285841 17007pif.doc 訊涉及此分支在之前的操作中是否被執行過。當此分支歷史資訊指出這分支很可能會被執行時，一或多條預測的指令就被插入到這管線裏。按照慣例，每一分支目標緩衝器的資料條目，包括有和這分支指令相關聯的指令操作碼，這分支指令是正在被評估關於其分支歷^ ί δίΐ的扎令。在適當的由這分支預測機制指明之後，這此操作碼可被直接地插入到這管線之中。同樣地，每一^ 目標緩衝器的資料條目，包括有一，，目標位址，，，它和1 被評估這分支指令相關聯。此外，在適當機制指明之後，這目標似止被此分支預測單元作為刀一測位址”而輸出，並被用於獲取此指令序列中的下測對此分支指令及其隨後指令的處理，在此管線、曰^ 1幾7，朗在執行階段中這分㈣令t 止/、有在廷一時點上，此分支預測的丁為 γ此分支齡的絲之賴正確地賴，=77。器的執行，間斷的連續進行。】=此3理 ΐ:果::沒有被正確地預測到’那麼此管線被ί、先::令須被沖洗掉。夕兩倍數篁的指令必 9 1285841 17007pif.doc 箱、si =說明—傳統的分支目標緩衝11 1，其連接到分支和相關的硬體。分支目標緩衝器‘二ΐ Γ 解碼②3、—記賴陣列4,以及-讀出放大器I2858417p, doc function step or stage. Each processing phase completes its elemental operations in a single clock cycle. The non-pipelined (n〇n-Pipeiined) processor starts the processing of each instruction after the completion of the processing of each instruction. Unlike the non-pipelined processor, the pipeline is used in different processing stages of the pipeline. (5) Calling the secret instructions. The pipeline stage can be arbitrarily-designed by the designer but generally includes the following: the instruction acquisition phase, the ^decoding phase, the instruction execution phase, and the execution resolution phase. The fetch phase looks for a finger from anywhere where the instruction is currently stored (such as a primary memory or an instruction column)—v 一 one or more operations specified by the 日日. (::=rr usually involves the execution of this instruction and generates a dual-use ΐ) 'Write----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Step ^ goes to the next stage. Thus, in the first acquisition phase, the hardware register is obtained from the storage, and the solution is solved by y. In the relevant association, invite this to take a second instruction, and arrange it in the 'Ha's section, this instruction starts-execute operation 6 1285841 17007pif.doc such as: a logical, mathematical, address, Or the operation of the index), = the timer instruction decoder stage decodes the second instruction, and this refers to the acquisition - the third instruction. The pipeline operation is known to continue. Knife White # Again, in the 14-type mode, the overall processor of this processor is higher than the non-pipeline ___ county. The object's speed's ultra-quantitative architecture can simultaneously process and/or perform two or more executions (or - fresh eight = s, simultaneously and independently into the "". & 篁 (Scalar) system in each Only one non-St instruction can execute an instruction formed in a string of pipelines, or the instruction of this instruction is executed. The simultaneous execution of multiple instructions advances the performance of a processor. The work of the game has undoubtedly provided the performance benefits, as long as the beta, " sequence is held at 8 degrees, or can be deleted. But the unfortunate part of the touch paste contains (four) can be transferred to the _ order execution path "Orders, so-called, branch instructions, (including such as: jump, return, with 2 branch instructions, etc.), produce significant performance in the pipeline processor ^ unless a valid branch prediction can be implemented When a = pre/th (or erroneously predicted) branch instruction causes a departure from such an instruction 歹J, ie, the instruction sequence from the current pipeline in this processor = performance loss occurs. Loss occurred, current The sequence of pipe instructions must be thrown away, flushed, (fluShed), and a new sequence must be loaded into the pipeline. The flushing operation of the pipeline wastes a very large clock cycle and usually reduces processing. The execution speed of the device. 7 1285841 17007pif.doc ,: The method of evaluating the execution performance associated with a branch instruction is to predict the result of the branch instruction and insert a predicted instruction into the pipeline to directly follow the branch instruction. The latter. If ^ ' = implementation in 'this is related to the performance of the pipeline flushing, if the results of this branch instruction are incorrectly predicted, the traditional techniques and analysis have confirmed that many branches are correct. Predicted, this has - very high certainty - nearly 80% certainty for one-to-one applications. Therefore, the traditional types of several branch prediction mechanisms have been developed ^ one of the types of prediction mechanisms 'is The branch-target buffer is used to store a large number of data items, each of which is associated with a branch = so that the branch target buffer stores a lot of , ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, The target address is over address), the ^ instruction opcode (〇pc〇de), branch history information, and other information can be used. In the processor that utilizes the - branch target buffer, this knife prediction mechanism monitors Each instruction that enters the pipeline. The address of the general command is monitored, and when the instruction address matches an entry of the second, the instruction is recognized as a branch instruction. From the relevant branch history information, The branch prediction mechanism determines whether the branch will be executed. The branch history information is generally determined by a state machine (state maehin^^, which monitors the branch instructions of each index in the branch target buffer and defines Information stored as branch history information, this resource 8 1285841 17007pif.doc relates to whether this branch was executed in the previous operation. When this branch history indicates that the branch is likely to be executed, one or more predicted instructions are inserted into the pipeline. By convention, the data entry for each branch target buffer includes the instruction opcode associated with the branch instruction, which is the arbitrage being evaluated for its branch history. This opcode can be inserted directly into this pipeline after it is properly indicated by this branch prediction mechanism. Similarly, the data entry for each ^ object buffer, including one, the destination address, is associated with the branch instruction being evaluated. In addition, after the appropriate mechanism is specified, the target appears to be output by the branch prediction unit as a tool address, and is used to obtain the processing of the branch instruction and its subsequent instructions in the sequence of instructions. This pipeline, 曰 ^ 1 7 , Lang in the execution phase of this point (four) let t stop /, there is a point in the court, the branch predicted that the gamma is the correct age of this branch, = 77. Execution of the device, intermittent continuous.] = This 3 theory: Fruit:: Not correctly predicted 'The pipeline is ί, first:: The order must be washed away. The double-digit command will be 9 1285841 17007pif.doc box, si = description - traditional branch target buffer 11 1, which is connected to the branch and associated hardware. Branch target buffer 'binary 解码 decode 23, - record array 4, and - sense amplifier

沪namp hfier)5。位址解碼器3從一指令獲取單元接收一二址，並選擇—條和這已解碼的指令位址相關聯的字兀線。字讀選擇是按照慣例崎行的，但其通常包括施力Π-字元線電壓到選中的字元線上。按照f慣…批字元線以一逐列之方式，從位址解碼器3伸展到記憶體陣列4。 S己憶體陣列4包括很多記憶胞，每個記憶胞存儲至少了資料位元。而資料條目，其每個包括許多的㈣位元，被&且地以列的方式存儲，由此—具體字元線的選擇，基本上存取-對應的資料條目。資料條目包括至少—資料棚位(data field) ’其定義一分支位址標籤，更包括另一資料欄位，其定義一目標位址。一字元線選擇的資料條目，被按慣例的從記憶體陣列4經由讀出放大器5輸出。從讀出放大器5 ,此分支位址標籤被傳達到一標籤比較寄存器(tag compare register)6,此寄存器也接收這指令的位址。來自a貝出放大^§ 5的目標位址，連同和非分支的指令序列（例如一 32位元的指令字處理器的一程式計數值 (PC)+4)相關聯的一位址一起，被傳達到多工器 (multiplexer)7。多工器的這兩個輸入的其中之一，被選擇出來而傳達到此指令4宁列(在此展示為一程式計數值多工器(PC multiplexer) 8)，此傳達是通過這樣的操作，即一邏輯閘9接收來自標籤比較寄存器6之結果和來自分支預測 I28584107pifdoc 邏輯走/不取走”(恤_财—)標識。 -，在此統的分支目標緩衝11遭受許多的缺點。第二體i t :示餘下的配置中，此分支目標緩衝器的記破每條分支指令存取，而不管這指令可能的結果。對刀支目標緩衝器的存取，一 ^ 取(READ)摔作，m 瓜涉及執仃一傳統之讀 nm/、由址解碼輯選巾的字元線。 €_取電能’以便提供記憶胞能源並 ❿ 彳° 出貝料’其中這些記憶胞與選中字it線相關聯。 ’、、' L應此浪費的情形’其他傳統分支目標緩衝器的設口已巴一致能線(enable iine)整合到此記憶體陣列的設計之中。美國專利號Νο·574〇417就是一個例子。在此文檔中頦示〜此提4的分支目標緩衝器包括一致能線，其啟動或 Τ閉字元線軸H，這些字元線驅_在讀取操作的過程中，和這記憶體陣列相關聯。這些字元線驅動器根據一，，取走/不取走”標識的狀態而被啟動或關閉，此標識預測一具體的指令是否不太可能地被執行。例如，當一具體指令 •的，走/不取走，，標識指出一，，強烈的不取走，，狀^二致能線轉變到一禁動準位，由此使得此記憶體陣線驅動器失去能力。于几不幸的是，這傳統的方法，在分支目標緩衝器存取操作過程中節省電能的同時，也帶來了很大的開銷。也就是= 這分支預測機制對此致能訊號之生成，需要不僅時間^有資源來，，預解碼”(pre-decode)此指令，然後確定其分支辦史資料和取走/不取走的狀態，然後並在需要時，改變此二能 I2858iUc 訊號的準位。隨著指令執行的速度以及指令管線深度的增加，分支的準確性和速度變得越來越重要。基於對此重要性的認識’很多傳統的處理器都整合了擴展的預解碼方案，由此所有的指令都被評估，分支指令被識別，而分支歷史資 t被查找出來’或對於正被評估和預解碼的分支指令，此分，歷5資訊則被動態地計算出來。不用說，這種預測分 ^指令行為的方法，需要很多的時間和很多的額外資源。列，理中，額外的延遲和複雜性，在-分支預測描㈣疋不期望的特性。然而’這正是很多傳統方法所提供的。電能消耗的問題，進一步使設計出勝變匕。不令人驚_，當代的處理器都= 上’即其特徵在於電能消耗上都有嚴格電腦和移練置，比如手機和個人數位助最小雷& 些裝置現成的例子，其整合了寧願消耗最少電能之處理器。 $ 型的述#此分支目標緩衝11是—高賴衝記憶體類目心：幻可能Ϊ儲很多的資料條目。因此，這分支有—記憶體_，並適宜於是一揮發的 5£M(v〇latlieme咖y)_ 大量資早乃其對電ι的祕，有的渴望制對此分支目標緩衝器的每一個存取，都意味著對此^支 12 1285841 17007pif.doc 目標缓衡器之記憶體陣列的一”讀取”操作。所有人都同意分支目標缓衝器的存取正在增加，並且一些估計提出，對此分支目標缓衝器之記憶體陣列的，，讀取，，操作，占了傳統處理器中電能消耗的足足10%。Shanghai namp hfier) 5. The address decoder 3 receives a binary address from an instruction fetch unit and selects the word line associated with the decoded instruction address. Word reading selection is routine, but it usually involves applying the force-character line voltage to the selected word line. The batch word line is stretched from the address decoder 3 to the memory array 4 in a column by column manner. The S memory array 4 includes a plurality of memory cells, each of which stores at least a data bit. The data items, each of which includes a plurality of (four) bits, are stored in a <and column by column, whereby - the selection of a particular word line, substantially access - the corresponding data item. The data entry includes at least a data field ‘which defines a branch address label and a further data field that defines a target address. The data items selected by one word line are conventionally output from the memory array 4 via the sense amplifier 5. From the sense amplifier 5, the branch address tag is passed to a tag compare register 6, which also receives the address of the instruction. The target address from a amplifying ^§ 5, along with an address associated with a non-branched instruction sequence (eg, a program count value (PC) + 4 of a 32-bit instruction word processor), It is communicated to the multiplexer 7. One of the two inputs of the multiplexer is selected and communicated to this instruction 4 column (shown here as a program multiplexer 8), this communication is through such an operation , that is, a logic gate 9 receives the result from the tag comparison register 6 and the logical walk/no take away from the branch prediction I28584107pifdoc. - The branch target buffer 11 in this system suffers from many disadvantages. Two-body it: In the remaining configuration, the branch target buffer is broken by each branch instruction access, regardless of the possible result of the instruction. Access to the knife target buffer, a READ drop For example, m melon involves a traditional reading of nm/, the word line from the address decoding series. €_takes electric energy to provide memory energy and ❿ 出° out of the shell material' where these memory cells and selected words The it line is associated with ',, 'L should be wasted'. Other traditional branch target buffers have been integrated into the design of this memory array. US Patent No. Νο· 574〇417 is an example. Here The branch target buffer of the present invention includes a consistent energy line that activates or closes the character spool H, which is associated with the memory array during the read operation. These word line drivers are activated or deactivated according to the status of the "take/take away" flag, which predicts whether a particular instruction is likely to be executed. For example, when a specific command, go/no take, the mark indicates one, and the strong does not take away, the shape of the second enable line transitions to a forbidden level, thereby making the memory line driver Lost ability. Unfortunately, this traditional approach saves power during branch target buffer access operations and also introduces significant overhead. That is, the branch prediction mechanism generates this signal, which requires not only time ^ resources, but pre-decode this instruction, and then determines its branch history data and take/take away status. Then, and if necessary, change the level of the two I2858iUc signals. As the speed of instruction execution and the depth of the instruction pipeline increase, the accuracy and speed of the branch become more and more important. Based on the understanding of this importance. 'Many traditional processors incorporate an extended pre-decoding scheme whereby all instructions are evaluated, branch instructions are identified, and branch history is looked up' or for branch instructions being evaluated and pre-decoded, This point, the calendar 5 information is dynamically calculated. Needless to say, this method of predicting the behavior of the instruction requires a lot of time and a lot of extra resources. Column, reason, extra delay and complexity, in - Branch prediction (4) 疋 undesired characteristics. However, 'this is exactly what many traditional methods provide. The problem of power consumption further makes the design win and lose. Not surprising _ Contemporary processors are all on the top, that is, they are characterized by strict computer and rehearsal for power consumption, such as mobile phones and personal digital assisted minimum thunder & some of the devices are ready-made examples, which integrate processors that prefer to consume the least amount of power. $型的述# This branch target buffer 11 is a high-reliance memory class: the illusion may store a lot of data items. Therefore, this branch has a memory _ and is suitable for a volatile £5M (v〇latlieme y) _ A lot of money is the secret of the electric y, some desire to make every access to this branch target buffer, means that this 12 branch 46088581 17007pif.doc target balancer a "read" operation of the memory array. Everyone agrees that the branch target buffer access is increasing, and some estimates are made about the memory array of this branch target buffer, read, Operation, accounting for 10% of the power consumption of traditional processors.

明顯的，很需要在新出現的處理器中，以一更好的方法來實施分支預測的機制。傳統的方法對分支指令需要時間很長的、即時的評估，以及/或者對分支歷史資^動能的搜索或計算W慢。另外，在許㈣f二的由不間斷地、但又是必須的存取此分支目標緩衝器之記憶體陣列而導致的電能消耗，完全是浪費。。 " 【發明内容】在本發明-實施例中提供一分支目標緩衝器記憶列，其包含-字元線和-相關的字元線選通電 circuit)。此字讀選通電路包括—職體電路， g 一字元線選通值。 >、仔储者此記憶體陣列更好地適合存儲和此字元線相 =電選Γ路除了存儲著一字元線選通: 的5己憶體電路之外’歧適宜包括—選通邏 = 4】定的例子中，此字元線選通電路回應—字元 ^壓是施加麻字元線以及這字元_通_ ^此使^ 可以對這條目資料進行存敗菸从 ^ 由此使侍操作，其響應在此分支7::二==可;是-寫此存取操作或者是-讀操作，=中接㈣的—寫訊號，憶體陣列。、’、作關涉及這字it線之記 13 1285841 17007pif.doc 之揮爰匕器ί記議車列’適宜於包括-個陣列 -靜態隨機存取記憶二：子列中並 r的記憶體電路包含一個一位元的靜在-相關的特定例子中’此邏輯閘，其接收這字元射匕括第值作為其輸人，由此產生儲起來的字元線選通包括一 m_、s 產生第一璉輯輸出，此選通邏輯電 $作為輯閘，其接收這第—邏輯輸出以及寫訊號作為其輸由此產生—第二邏輯訊號。處一ίΓ!還提供—分支目標緩衝器之記憶體陣列，其回包括厂記紐電路，翻以麵這字元線選通值，此字元線選通電路更包括—選通邏輯電路，其接收—寫訊號以及此字兀線選通值作為輸人，由此在這寫減指出—肯定標識的情況下’使得在此寫操作的過程中允許存取這資料條目，亚^僅僅在此字元線選通值指出一肯定標識的情況下，使知在一讀操作的過程中，允許有條件地存取這資料 ::鉻Γ。儲一資料條目，並回應一讀操作而輸出此一貝"”廷刀支目標緩衝器之記憶體陣列，適宜於包括 -字兀線魏電路，其使得纽寫操狀過程巾允許存取這貢料條目，並回應在此字元線選通電路中存儲著的-字瓜if通值’使得在此讀操作之過財，允許有條件地存取這負料條目。在另一特定的例子中，此字元線選通電路條目0 I2858iW,〇c 憶體陣列的實施例，可容易地和—分支目標緩衝 :勹广上,結合起來。例如’本發明的另-實施例，提供人=二、°己憶體陣列的一分支目標緩衝器，此記憶體陣列通閘的字元線。每條有閘的字元線適宜於包括一選分，更包括一字元線選通電路，其包括一用 -一子兀線選通值的記憶體電路，每條有閘的字元線 ^ ^ 有選通閘的字元線部分。此分支目標緩衝器通常而if解碼β，其接收-指令健，並回應此指令位址些！閑的字元線的其中之-，此分支目標緩衝器收停^資^岐大器’其適合為從選中的㈣字元線接 ί讀，以細應接收到此解碼H巾的這指令位址。路，的特㈣子中’此讀出放大器包括這樣的線、言此·ν=子讀觀值傳轴各自的記憶體電路中，而和這些有閘的，此讀出放大 :門：的線路’即其將一寫訊號傳達到和這此有閘子讀相關聯之各自的字元線選通電路中。、- 此分支目標緩衝器的實施例，八，括分支預測邏輯，其用以:以施: :::::這:心此分支目標緩衝器適宜於包括有選通閘的字元線’每條有閘的字元線;：1=: 15Obviously, it is necessary to implement a branch prediction mechanism in a better way in the emerging processor. Traditional methods require long, immediate evaluation of branch instructions, and/or slow search or calculation of branch history kinetic energy. In addition, the power consumption caused by the uninterrupted, but necessary, access to the memory array of the branch target buffer in Xu (f)f is completely wasted. . < SUMMARY OF THE INVENTION In the present invention - an embodiment provides a branch target buffer memory column comprising - word lines and - associated word line electrification circuits). The word read strobe circuit includes a body circuit, g word line strobe value. >, the memory of this memory array is better suited for storage and the character line phase = electrical selection circuit in addition to the storage of a word line strobe: 5 忆体电路 ' ' ' ' ' Tongluo = 4] In the example, the word line strobe circuit responds - the word ^ pressure is the applied word line and the character _ _ _ ^ this ^ can be used to save the item data ^ Thus the wait operation, its response in this branch 7:: two == can; yes - write this access operation or - read operation, = connect (four) - write signal, memory array. , ', the key involved in this word line 13 1285841 17007pif.doc ί 记车车 ' ' ' ' ' ' ' 适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜适宜The circuit includes a one-bit static-related specific example of the logic gate that receives the character and includes the value as its input, thereby generating a stored word line strobe including an m_, s generates a first burst output, the strobe logic $ is used as a gate, which receives the first logic output and writes the signal as its output - the second logic signal. Also provided is a memory array of the branch target buffer, which includes a factory circuit, which is turned over to the word line strobe value, and the word line strobe circuit further includes a strobe logic circuit. The receive-write signal and the word strobe value are input as input, thereby making it possible to access the data item during the write operation in the case of the write-reduction-positive identification. When the word line strobe value indicates a positive identification, it is known to allow conditional access to the data during the first reading operation: chrome tanning. Storing a data item and outputting a memory array of the target buffer in response to a read operation, which is suitable for including a word line circuit, which allows the write operation process to allow access This tribute entry, and in response to the word-meeting pass value stored in the word line gating circuit, makes the read operation too expensive, allowing conditional access to the negative item. In an example, the embodiment of the word line strobe circuit entry 0 I2858iW, 〇c memory array can be easily combined with the - branch target buffer: for example, 'another embodiment of the present invention, Provider = two, a branch target buffer of the memory array, the word line of the memory array is turned on. Each of the gated word lines is suitable for including a selection, and further includes a word line selection a pass circuit comprising a memory circuit with a strobe line strobe value, each word line having a gate ^^ having a word line portion of the strobe gate. The branch target buffer is usually and if decoded β , its receive-instruction, and respond to this instruction address some of the idle word lines - The branch target buffer is stopped and closed. It is suitable for reading from the selected (four) word line to receive the instruction address of the decoded H towel. (4) In the sub-invention, the sense amplifier includes such a line, in which the ν = sub-reading value transfer axis is in each of the memory circuits, and the gates are read, and the readout is amplified: the gate: the line' Passing a write signal to the respective word line strobe circuit associated with the ram read. - An embodiment of the branch target buffer, eight, including branch prediction logic, for: : ::::: This: The heart of this branch target buffer is suitable for the word line including the gate of the gate - each word line with a gate;: 1 =: 15

I2858iU 並包含一字元線選通電路，而字元線選通電路包括一記憶體電路，其存儲著一字元線選通值，此選通值得自於這分支歷史資料。在一相關的特定例子中，此分支歷史單元包括一狀態機，其依照一指令以往的分支執行歷史，確定此指令的分支歷史資料。按照本發明，分支預測單元的另一實施例包括一分支歷史單元，其用以存儲分支歷史資料，此實施例更包括一分支目標緩衝器，其包含一批有選通閘的字元線，每條有閘的字元線被經由一字元線選通電路而存取，其中此分支目標緩衝器輸出一資料條目，以此回應接收到這分支目標緩衝器中的一指令部分，此分支目標緩衝器還輸出一字元線選通值，此選通值得自於這分支歷史資料。按照本發明，此分支預測單元的實施例，可容易地整合到一處理器中。例如，此處理器包括一指令獲取單元，其用以接收一指令並提供一對應的指令位址，此處理器更包括一分支預測單元，其接收此指令位址並提供一預測的位址給這指令獲取單元，此處理器也更包括一指令解碼器/ T行單元，其用以接收這指令並提供一已解碼的指令，並回應此已解碼的指令而提供一更新的位址。时一在一特定的例子中，此整合到這處理器中的分支預測單t，包括一分支歷史單元，其用以存儲分支歷史資料，此分支預測單元更包括分支預測邏輯，其用以接收此指令位址以及這已更新的位址，由此提供這預測的位址，並由 1285841 17007pif.doc 史資料，此分支預測單元更包括-分支目 I °八用以接收此指令位址並輸出條目資料。這分 ^目標緩衝器適宜於包括如上述實施例說明的-記憶^ 本發明可容易地應用到超量化的處理， =器/執行單元包括-批執行路徑，每條路徑^ 早獨的解碼H和執行單元。超量化處理㈣不限於這些，…處理器和單指令多資 (g e instruction-muitiple-data，SIMD)處理器。對-提供了各種的方法。—較佳的方法使得可以十有閘的字元線進行一讀操作，此字元線是在一分列中的。這方法包括：在和這有閉的 I胃聯的予疋線選通電路中，存儲一字元線選通 ^應此字元_通值，使得有條件地允許此讀操作。，在相關的實施例中，此方法更包括：接收這分支目衝討的—指令部分，回應這指令部分而選擇此有問 =字兀^，並在這選中的有閘字元線上施加一字元線電堅’再回應這字猶電壓和這字元線選通值，而有條件地允許此讀操作。、在另一相關的實施例中，此方法更包括：定義，於此&7部分的分支歷史資料，並從這分支歷史資料中獲得這字元線選通值。發巧，另—相關的實施例中，此方法更包括：從這分支才丁、農衝w中輪出一資料條目，以回應一已被允許的讀操 17 1285841 17007pif.d〇c 作 j明還在—實施例中，提供一運作—分支目衡益的方法，正如上述所描述的方法。此方法適宜於二11 ^一，指令中的每—條指令，在這批有閘字元線的=之一，存儲一對應的資料條目，並且，在此字元=甲之中’存儲-對應的字元線選通值，此字元線選通雷3路The I2858iU also includes a word line strobe circuit, and the word line strobe circuit includes a memory circuit that stores a word line strobe value, which is worth from the branch history data. In a related specific example, the branch history unit includes a state machine that determines the branch history data for the instruction in accordance with a previous branch execution history of an instruction. According to the present invention, another embodiment of the branch prediction unit includes a branch history unit for storing branch history data, and this embodiment further includes a branch target buffer including a plurality of word lines having gates. Each gated word line is accessed via a word line strobe circuit, wherein the branch target buffer outputs a data entry in response to receiving an instruction portion of the branch target buffer, the branch The target buffer also outputs a word line strobe value, which is worth from this branch history data. In accordance with the present invention, an embodiment of the branch prediction unit can be easily integrated into a processor. For example, the processor includes an instruction fetching unit for receiving an instruction and providing a corresponding instruction address, the processor further comprising a branch prediction unit that receives the instruction address and provides a predicted address to The instruction fetch unit further includes an instruction decoder/T line unit for receiving the instruction and providing a decoded instruction and providing an updated address in response to the decoded instruction. In a specific example, the branch prediction unit t integrated into the processor includes a branch history unit for storing branch history data, and the branch prediction unit further includes branch prediction logic for receiving The instruction address and the updated address, thereby providing the predicted address, and the 1285841 17007pif.doc history data, the branch prediction unit further includes - branch I ° eight for receiving the instruction address and Output entry data. The sub-target buffer is adapted to include the memory as explained in the above embodiment. The present invention can be easily applied to the processing of super-quantization, and the =exe/execution unit includes a -batch execution path, each path ^previously decoding H And execution unit. The ultra-quantization processing (4) is not limited to these, ... processor and g instruction-muitiple-data (SIMD) processors. Yes - provides a variety of methods. - A preferred method is to perform a read operation on a ten-character word line, the word line being in a sequence. The method includes storing a word line strobe in the squall line strobe circuit with the closed I gastric splicing, such that the word _ pass value is enabled to conditionally permit the read operation. In a related embodiment, the method further includes: receiving an instruction portion of the branch, and selecting the commanded word in response to the instruction portion, and applying the selected word line on the selected gate. A word line is firmly responsive to the word voltage and the word line strobe value, and conditionally allows this read operation. In another related embodiment, the method further includes: defining a branch history data of the & 7 portion, and obtaining the word line strobe value from the branch history data. Coincidentally, in another related embodiment, the method further includes: rotating a data item from the branch, and responding to an allowed reading operation 17 1285841 17007pif.d〇c In still another embodiment, a method of operating-branching benefits is provided, as described above. This method is suitable for two instructions, each instruction in the instruction, in the batch of one of the gate word lines, stores a corresponding data item, and, in this character = A, 'storage- Corresponding character line strobe value, this word line is strobed by Ray 3

其中之一相關聯。在接收到從這= ^出來的-當前指令之後，這分支目標緩衝器有條件二輸出對應此當前指令的資料條目，此輸出是從這樣的八目標緩衝器進行的，即其涉及對應存儲起來的字元線甬在一相關的實施例中，有條件地輸出此資料條目更包括：從涉及這當前指令的這批有閘的字元線中，選擇一= 閘的字元線，對此選中的有閘字元線施加一字元線電壓，並通過此相關聯的字元線選通電路，閘控這字元線電壓，以此回應這存儲起來的字元線選通值。在此指令執行完之後，至少一字元線選通值會被更新。本發明還在另一實施例中，提供一運作一分支預測單元的方法。此方法適宜於包括在一分支歷史單元中，為— 指令存儲分支歷史資料，從這分支歷史資料中獲得一字元線選通值，此方法更包括存儲一資料條目，而這資料條目涉及一分支目標緩衝器記憶體陣列中的此指令，其中這資料條目被經由一有閘的字元線，在此分支目標緩衝器中存取，此方法也更包括在一字元線選通電路中存儲一字元線 1285841 17007pif.doc 選通值，而這字元線選通電路和這有閘的字元線相關聯，此方法還有條件地允許從這分支目標緩衝器之記憶體陣列中，輸出此資料條目，以此回應在此分支預測單元中接收的這指令，以及涉及這存儲起來的字元線選通值。在一相關的實施例中，此方法更包括··在執行每條指令之後，更新這分支歷史資料以及對應的字元線選通值: 為讓本發明之上述和其他目的、特徵和優點能更明顯易懂，下文特舉較佳實施例，並配合所附圖式，作詳細說 ® 明如下。【實施方式】為讓本發明之上述和其他目的、特徵和優點能更明顯易II，下文特舉較佳貫施例，並配合所附圖式，作詳細說明如下。本發明之保護範圍當視後附之申請專利範圍所界定者為準。在通常的應用中，本發明的實施例提供一分支目樣缓衝器’使得處理器在執行以及運作時具有較低的功率消鲁耗’並提高了分支指令的執行速度，還減少了整體的複雜性。在一方面，此處理器電能的消耗，通過有條件地使得不允許對此分支目標緩衝器之讀操作而減少了。在一相關的方面，此處理器中對分支指令的處理，沒有了由需要搜索和/或計算分支歷史資訊的操作而產生的延遲。但在另一方面，此處理器從一基礎構造之分支預測單元中減少了的複雜性而得到益處。術語”處理器”廣泛地包括任何數位邏輯裝置或系統， 1285841 17007pif.doc 其可以執行或回應一指令序列。此術語包括，比如這樣一些例子’如中央處理器（CPU)、微處理器、數位訊號處理器（digital signal processors，DPSs)、精簡指令集電腦 (reduced instruction set computer)處理器，以及單指令多資料(single-instruction-multiple-data，SIMD)處理器。官線式的處理器，特別適合於整合到依照本發明的敍述而設計的一分支預測單元中。因此，多管線式的處理器將作為說明駐作例子，並作為本發明的構造和應用而描述，而且由此描述本發明所提供的一些優點。圖2以一方塊圖的層面，說明第一個示範性的管線式處理器。處理器10經由匯流排14，採用許多傳統的資料傳輸技術的其巾任何之-，將㈣傳麵記龍12，或從其取得資料。記憶體12假定存儲-或多個軟體程式或程式: -程式或套裝程式含-串的指令。記,_ 12還假定存儲盘这指令序列串相關的資料。這資料可包括由這處理用、，以及/或者由處理器10存儲在記憶體J 果舅料。指令從記憶體12返回到處理琴 #虛搜哭社山从， ° 以此回應從坆處理态扎出的一位址。一位址的指出可有报多的但一程式計數器（program counter，Ρ〇是一妯良劣》技術，通過此程式計數器，處理器1〇指出記哪一個記憶體位置（即記憶體位址），存；;二中的被獲取之指令。 #㈣T-條”要如上所述，指出一條要從記憶體中查指令的簡單方法，當指令中的-或多條分支指2條 20 1285841 17007pif.doc 變得非常的複雜，此分支指八址，而在另-條件下指出另件下指出-個下〜仇理器來說，尤其正確。下一位址。這對管綠式處請參考圖2,管線式處理A、元13，指令獲取單元13從記:通常包括一指令獲取翠提供一預測的位址(如一程式^體。12接收一指令(取），迷獲取單元13將這指令提佴终一 ^值)給記憶體丨2。指令器1S解碼這指令，即通常提H令解 15。指令解螞到一執行單元17。執行單礼令至少的一操作碼部分 (或指令部分），此已解瑪二^矣枚到的這已解石馬的指令個或多個操作。這些操作初始化執行單元中-寫回到記髓12❹、統中料他H㈣結果資料，即其 ===令解：15之外，指令獲取單元括-指令位址Me二:^^單元19°指令部分通常包還接收-由執杆--^括其他資訊。分支預測單元I9 3在之確定的”下-位址'也就指令確實知n執後令序列中要執行的下一條下-位址的ί出：此分支指令的條件被確定了)。因此，分支預測單饋到分支預測單元19。利用此資訊，上的下入焚這捕賴的下—條齡是否實際 -指令位Γ從執行單元17指出的此下-位址，一般是八位二21仃單7017指出的此下—位址’和之前預測的指 " 配時(即一”擊中”條件），此處理器就繼續向前 21 17007pif.doc 1285841 處理這管線的指令序列。但是，如果指出的此下一位址，和之前預測的指令位址不相匹配時（即—，，擊不中”條件），那麼’此處理器就沖洗掉這管線’並加載入由這下一位址指示的指令。下一位址指示的指令和先前預測的指令位址的比較，適宜在分支預測單元19中進行。在如下所描述的另外一些細節中，管線式處理器10内也提供有分支預測單元19了由此提供一預測的位址到指令獲取單元13。在繼續對此較佳實施例作一更詳細的描述之前，應該特別地注意此發明也特別的適合超量化的處理器。圖3說明一作了很大簡化的超量化的處理器。這裏，記憶體12 一樣地經由匯流排14，將指令和/或資料供給到超量^匕處理器11。分支預測單元39和指令獲取單元33，除了指令獲取單元33把指令提供給多執行路徑34、35和％之外，通常如上描述那樣地操作。類似地，每一執行路徑34、35 和36提供一下一位址指示給分支預測單元39。在此說明的超量化處理器之例子中，繪示了三個執行路徑，不過這一數目僅僅是一示範，並且是任意選擇的。此外，每一執行路徑的特徵在於一組合的解碼器/執行單元，其從一公共的指令獲取單元接收指令。和這被圖解的處理器之示範環境相關聯之硬體和功能性的邊界，完全是一設計者之常規設計的課題。例如，解碼和執行功能可容易的在一單獨部分的硬體中執行（比如，一積體電路(1C))或者在多個部分的硬體中執行，即互 22 1285841 17007pif.doc 相合作的積體電路。解碼和/或執行，可以肖硬體、軟體、固件’或此三種通常的平臺類型之任一組合而執行。類似的，此指令獲取單元和指令解碼器單元、和/或分支預元之間的硬體和功能性的邊界，在此實施例中只是說明性的。可以預期的，在本發明的背景内，可有很更動與潤飾。 •二論此整合的處理㈣型，本發明適宜於提供一分支制料’其包括某娜式的分支刪賴、某些存儲與分支指令相關資料的機制，以及某種形式的分支歷史資料的存儲和/或計算。® 4還以方塊圖的形式，說明圖的分支預測單元19。圖4中，分支預測邏輯2〇提供一預測位址，它是至少提供到指令獲取單元Π的典騰^。分支賴邏輯如從此，令獲取單元η接收—指令位址，並通常和分支目標緩衝益22以及分支歷史單元24溝通資訊。這三個功能模組被選作圖解說明的目的。本發明並不限制於在一硬體層面 • ㈣70件，定組合。例如，在本發明的實際實施中，與分支歷史單元24相_ (如下所述)此#料存儲功能，可以和與刀支目標緩衝器22相關的—記憶體陣列結合起來，或者和與分支預測邏輯20相關的一記憶體裝置結合起來。類似的，和分支歷史單元24相關的計算功能，可採用由分支預測邏輯2〇所提供的硬體或軟體資源來實施。更確徹地說，分支預測邏輯2〇從指令獲取單元13接收心·部分，典型的是—指令位址(即一當前程式計數器 23 I28m_ 的值），然後預測此處理器是否應該分支到和此指令相關的一目標位址，或執行指令序列串中的下一條指令。此術語，，預測”通常指由這分支預測邏輯所產生的一邏輯的或計算後的輸出’此輸出的生成是涉及到所接收到的指令位址，還涉及與接收到的指令位址相關的分支歷史資訊。因此，分支預測邏輯20可包含很多邏輯結構專用的組合、計算電One of them is related. After receiving the current instruction from this = ^, the branch target buffer has condition 2 outputting a data entry corresponding to the current instruction, and the output is performed from such an eight-object buffer, that is, it relates to the corresponding storage. Character line 甬 In a related embodiment, conditionally outputting the data entry further includes selecting a word line of the = gate from the batch of word lines associated with the current instruction. The selected gated word line applies a word line voltage, and the associated word line strobe circuit is used to gate the word line voltage to respond to the stored word line strobe value. At least one word line strobe value will be updated after this instruction is executed. In still another embodiment, the present invention provides a method of operating a branch prediction unit. The method is adapted to include a branch history unit, wherein the instruction stores the branch history data, and obtains a word line strobe value from the branch history data, the method further comprising storing a data item, and the data item relates to The instruction in the branch target buffer memory array, wherein the data entry is accessed in the branch target buffer via a gated word line, and the method is further included in a word line gating circuit Storing a word line 1285841 17007 pif.doc strobe value, and the word line strobe circuit is associated with the gated word line, this method conditionally allows memory arrays from the branch target buffer This data entry is output in response to this instruction received in this branch prediction unit, as well as the word line strobe value involved in this storage. In a related embodiment, the method further includes updating the branch history data and corresponding word line strobe values after each instruction is executed: for the above and other objects, features and advantages of the present invention. It is to be understood that the preferred embodiments are described in detail below, and in conjunction with the drawings, the details are as follows. The above and other objects, features, and advantages of the present invention will become more apparent from the description of the appended claims. The scope of the invention is defined by the scope of the appended claims. In a typical application, embodiments of the present invention provide a branching target buffer 'so that the processor has lower power consumption during execution and operation' and improves the execution speed of branch instructions, and also reduces the overall The complexity. In one aspect, the consumption of power to the processor is reduced by conditionally preventing read operations on the branch target buffer. In a related aspect, the processing of branch instructions in this processor is free of delays caused by operations that require searching and/or computing branch history information. On the other hand, however, this processor benefits from the reduced complexity of the branch prediction unit of an infrastructure. The term "processor" broadly includes any digital logic device or system, 1285841 17007pif.doc which can execute or respond to a sequence of instructions. This term includes, for example, such as central processing units (CPUs), microprocessors, digital signal processors (DPSs), reduced instruction set computer processors, and single instructions. Single-instruction-multiple-data (SIMD) processor. The official line processor is particularly suitable for integration into a branch prediction unit designed in accordance with the teachings of the present invention. Accordingly, a multi-pipelined processor will be described as an example and described as a construction and application of the present invention, and thus some of the advantages provided by the present invention are described. Figure 2 illustrates the first exemplary pipelined processor at the level of one block diagram. The processor 10, via the busbar 14, employs any of the many conventional data transmission techniques, and (4) transmits or retrieves data from the face. Memory 12 assumes storage-or multiple software programs or programs: - Programs or package programs with -string instructions. Note that _12 also assumes that the disk is associated with the instruction sequence string. This information may include data stored by the processor, and/or stored by the processor 10 in memory. The instruction returns from the memory 12 to the processing of the piano #虚搜哭社山从, ° in response to the address from the 坆 processing state. The address of a single address can be reported as a program counter (program counter, which is a good and bad) technology. Through this program counter, the processor 1 indicates which memory location (ie memory address) is recorded. , (2) T-strips, as described above, indicate a simple method to look up instructions from memory, when - or more branches in the instruction refer to 2 20 1285841 17007pif .doc becomes very complicated, this branch refers to the eight-site, and in the other-conditions, it is pointed out that the next one is pointed out - the next one is especially correct. The next address. This is the green type. Referring to FIG. 2, the pipeline processing A, the element 13, the instruction acquisition unit 13 from the record: usually includes an instruction to acquire a predicted address (such as a program body. 12 receives an instruction (take), the fan acquisition unit 13 will This instruction raises the final value to the memory 丨 2. The commander 1S decodes the instruction, that is, usually modifies the solution 15. The instruction is decoded to an execution unit 17. The at least one operation code portion of the single ritual is executed ( Or the instruction part), this has been solved by the solution One or more operations of the horse's instruction. These operations initialize the execution unit - write back to the memory, and then the H (four) result data, that is, its === order solution: 15, the instruction acquisition unit includes - instruction bit Address Me 2: ^^ Unit 19° command part usually receives and receives - other information is included by the stick--^. The branch-predicting unit I9 3 determines the "down-address" and the command does know the order. The next lower-address of the sequence to be executed: the condition of this branch instruction is determined). Therefore, the branch prediction is fed to the branch prediction unit 19. Using this information, whether the lower-industry of the above-mentioned insured is actually-instructed to be located from the lower-address indicated by the execution unit 17, generally the lower-position indicated by the eight-digit 21-single 7017 The address 'and the previously predicted " timing (ie, a "hit" condition), the processor continues to forward 21 17007pif.doc 1285841 to process the instruction sequence of this pipeline. However, if the next address indicated does not match the previously predicted instruction address (ie, the -, hit miss condition), then 'this processor flushes the pipeline' and loads it into The instruction of the next address indication. The comparison of the instruction of the next address indication with the previously predicted instruction address is suitably performed in the branch prediction unit 19. In still other details as described below, the pipelined processor 10 A branch prediction unit 19 is also provided therein to thereby provide a predicted address to the instruction fetch unit 13. Before proceeding with a more detailed description of this preferred embodiment, it should be particularly noted that the invention is also particularly suitable for super Quantized processor. Figure 3 illustrates a greatly simplified hyper-quantitative processor. Here, memory 12 supplies instructions and/or data to the over-the-counter processor 11 via bus 14 as well. Unit 39 and instruction fetch unit 33, except that instruction fetch unit 33 provides instructions to multiple execution paths 34, 35, and %, typically operates as described above. Similarly, each execution path 34, 35 And 36 provides a bit address indication to branch prediction unit 39. In the example of the hyperquantization processor described herein, three execution paths are depicted, although this number is merely exemplary and is arbitrarily chosen. Each execution path is characterized by a combined decoder/execution unit that receives instructions from a common instruction fetch unit. The hardware and functional boundaries associated with the exemplary environment of the illustrated processor are completely It is a subject of a designer's conventional design. For example, the decoding and execution functions can be easily performed in a single part of the hardware (for example, an integrated circuit (1C)) or in multiple parts of the hardware, That is, the integrated circuit of the mutual 12 1285841 17007pif.doc. Decoding and / or execution can be performed by any combination of hardware, software, firmware or any of the three common platform types. Similarly, this instruction acquisition unit The hardware and functional boundaries between the instruction decoder unit and/or the branch pre-expo are merely illustrative in this embodiment. It is contemplated that in the context of the present invention There may be a lot of changes and refinements. • In the case of this integrated processing (fourth) type, the present invention is suitable for providing a branching material, which includes a certain type of branch deletion, some storage and branch instruction related information, And storage and/or calculation of some form of branch history data. The branch 4 also illustrates the branch prediction unit 19 of the graph in the form of a block diagram. In Figure 4, the branch prediction logic 2 provides a predicted address, which is at least Provided to the instruction fetch unit 。. The branch Logic thus, as such, causes the fetch unit η to receive the instruction address and typically communicates information with the branch target buffer benefit 22 and the branch history unit 24. These three functional modules are For the purpose of illustration, the invention is not limited to a hardware layer. (4) 70 pieces, a combination. For example, in the actual implementation of the present invention, the branch history unit 24 is _ (described below) The memory function can be combined with a memory array associated with the knife target buffer 22 or with a memory device associated with the branch prediction logic 20. Similarly, the computational functions associated with branch history unit 24 may be implemented using hardware or software resources provided by branch prediction logic. More precisely, the branch prediction logic 2 receives the heart portion from the instruction fetch unit 13, typically an instruction address (ie, the value of a current program counter 23 I28m_), and then predicts whether the processor should branch to and A target address associated with this instruction, or the next instruction in the sequence of instructions. The term "predict" generally refers to a logical or computed output produced by the branch prediction logic. The generation of this output relates to the received instruction address and to the received instruction address. Branch history information. Therefore, branch prediction logic 20 can contain many logical structure-specific combinations, computing power

硌、舅料寄存器、比較電路，和/或類似的硬體資源可能嵌入的控制器軟體，其用以驅動這硬體資源。正如當前所適宜的，分支預測邏輯2〇提供一寫 (WRITE)訊號給分支目冑緩衝$ 22。此寫訊號控制在分支目払緩衝器22中的一寫操作。此術語，，讀(READ)，，和，，寫 (WRITE)”在這裏一般是用於描述相應、的操作，這些操作在普通記憶體裝置的操作中是熟知的，比如—靜態&機存取記憶體⑼atic Random Access Memory，sram)以及動態隨機存取記憶體(Dynamic Random Aceess M DRAM) 〇由分支預簡輯2 0所作的這樣—個確定，即分支到這 η = τ冉為一”取走”條件。由分支預測邏輯2〇所作即不分支到這目標位址，而是執行此指 ^1、羅輯20 條指令，被稱為―”不取走，，條件。分支預歷ΐ資料二Λ走或不取走的條件，是依賴於分支址所指出的此:令二::的狀’癌’是和此接收到的指令位，和/或為至少分支分支歷史單元24負責計算、存儲 24 I285m 預測邏輯20提供分支歷史資料。的資料，即盆有用㈣支歷史貝枓疋任何這樣 ^ 條件之間選擇其—。大量傳統的演曾二if“已經提出’它們建議區分這樣的資料之計算；法1 匕身料指出一分支指令是否會被執行。本發明易於採用這些方法學中的 ^ ^ 算法或:法對分支指令的行為提供測:要此演軍=史資料的存儲和提供，適宜於由一和的記㈣元件而提供。每—在此分支儲有—對應資料條目的分支的指令，應該適宜ί Ϊ =歷史單元中’存儲有某種形式的分支的歷史資 Ϊ料L ΐ面所提到的，分支歷史資料可以和對應二同存儲在此分支目標緩衝器中）。一指令的貝料，可以實驗性地通過運行—衫個包含此指令的 ‘式而較，由此較此指令實際上分支的頻率。一 ^生地確定了，那麼此分支歷史#料就存儲在這分支歷史 ^中’準備作隨後的應用。正如當前所適宜的，最初確 ^的分支歷史讀，如果存在的話，在此指令隨後每一個執行之後，當必需時就被得以更新。在這樣的方式下，暫時的分支行為，被用以更新存在的分支歷史資料。當然，刀支歷史負料不必以某種方式而預先地確定，但可以，，在運作中”(on-the-fly)產生，以此回應實際的指令執行。無論何時確定下來和不管用何種方法得到更新，分支歷史資料可容易地利用一狀態機而確定。一個有競爭性的 25 I2858iUc 狀態機之複雜性和設計，是一常規設計的選擇。然而，如當前所適宜的，本發明整合了— 2位元的、向上/向下的飽和計數器(saturation counter)，其作為分支歷史單元24的計异部分。此2位元的、向上/向下飽和計數器的操作和應用，在圖5的流程圖中得以說明。這裏，一 2位元的分支歷史資料值，在一分支指令的執行之後，被增加了或減少了，這涉及在執行過程中，此指令是否實際地被取走或不被取走。這分支歷史資料指出了此指令一具體之，，取走情況”的程度。例如，一條之前不被取走的分支指令，從一，，強烈地不取走”狀態，轉變到一，，弱的不取走，，狀態。這一狀態的轉變，是通過將對應的分支歷史資料值從”〇〇，，增加到”〇1，，而指出。一條之前有一，，強烈地取走，，狀態，，的指令，通過減少對應的分支歷史資料值，而改變到一，，弱的取走”狀態，這 -改變是跟著—執行週期之後，在這週射，此指^沒被取走。一在當前較佳的實施例中，對大多數的應用程式，2個位元已被S忍為足夠於預測一條指令是否會被取走的可能，。然而，這對所有應用程式來說，並不是一定正確的，它們其中的-些，可能需要更大數量(比如更多的的分支歷史資料，以此作出—精確的取走/不取走的判定。，此’這分支歷史=#料具義定義，還有計算這分貢料所用演算法崎擇，錢/或者實施此選中 ς 一狀態機之一個定義，都是一設計上的選擇。、/ 、 26控制器, data registers, comparison circuits, and/or similar hardware resources may be embedded in the controller software that is used to drive the hardware resources. As is currently appropriate, the branch prediction logic 2 provides a write (WRITE) signal to the branch directory buffer $22. This write signal controls a write operation in the branch directory buffer 22. This term, READ, WRITE, is generally used herein to describe the corresponding operations that are well known in the operation of conventional memory devices, such as - Static & Access memory (9) atic Random Access Memory, sram) and dynamic random access memory (Dynamic Random Aceess M DRAM) 这样 such a determination by the branch pre-simplification 20, that is, branching to this η = τ 冉 is one "take away" condition. The branch prediction logic 2〇 does not branch to the target address, but executes this instruction ^1, Luo 20 instructions, is called "not taken, condition." The condition that the branch pre-study data is taken or not taken is dependent on the branch address: the order of the second:: is the instruction bit received, and/or is at least the branch branch. The history unit 24 is responsible for computing and storing 24 I285m prediction logic 20 to provide branch history data. The information that the basin is useful (four) is the history of Bellow any such ^ between the conditions to choose its -. A large number of traditional performances have "had proposed" that they suggest to distinguish the calculation of such data; the law 1 indicates whether a branch instruction will be executed. The invention is easy to adopt the ^ ^ algorithm or method pair in these methodologies The behavior of the branch instruction provides the test: the storage and provision of the data of the march = history is suitable for the component of the sum (4) of the sum. Each branch in the branch stores the instruction corresponding to the branch of the data entry. Ϊ = in the history unit, the historical information of the branch that stores some form of branch is mentioned. The branch history data can be stored in the branch target buffer in the same way as the corresponding branch. It can be experimentally run by running a shirt containing the 'invention', and thus the actual branching frequency of the instruction is determined. Then the branch history # is stored in the branch history ^ Prepared for subsequent applications. As is currently appropriate, the initial branch history read, if any, is updated when necessary after each subsequent execution of this instruction. In this way, the temporary branching behavior is used to update the existing branch history data. Of course, the knife history negative material does not have to be determined in advance in some way, but can, in operation, "on-the- Fly) is generated in response to the actual instruction execution. Whenever it is determined and updated regardless of the method used, the branch history data can be easily determined using a state machine. The complexity and design of a competitive 25 I2858iUc state machine is a common design choice. However, as is currently appropriate, the present invention incorporates a 2-bit, up/down saturation counter as a discrete portion of branch history unit 24. The operation and application of this 2-bit, up/down saturation counter is illustrated in the flow chart of Figure 5. Here, a 2-bit branch history data value is incremented or decremented after execution of a branch instruction, which involves whether the instruction is actually taken or not taken during execution. This branch of historical data points out the extent to which this instruction is specific, taking away the situation. For example, a branch instruction that was not taken before, from one, strongly does not take away, the state changes to one, weak Do not take away, state. The change of this state is indicated by increasing the corresponding branch historical data value from "〇〇," to "〇1". One before, one, strongly remove, state, and the instruction, by reducing the corresponding branch historical data value, and changing to a, weakly taken away state, this - change is followed - after the execution cycle, in This shot, this finger is not taken. In the presently preferred embodiment, for most applications, 2 bits have been tolerated by S to be sufficient to predict whether an instruction will be taken away. However, this is not necessarily true for all applications, and some of them may require a larger number (such as more branch history data to make - accurate take/not take The decision to go. This 'this branch history=#material definition, as well as the calculation of the algorithm used to calculate the tribute, money / or implement this selection ς a definition of a state machine, is a design Choice., / , 26

1285841 17007pif.doc 圖4戶斤示的分支目標緩衝器，通過炎，，描述。這裏，-解碼器43接收—指^二而更= 指出的字元元峻你紐说哭乂傳先的分支目標緩衝器，一批字 r 到記憶體陣列40。然而，在本發：有二；：：質、操作，和實施都作了更改。術語，，子赠，，應用在以下蝴的雜實施例中，以此描述本發明所預期的字元線。 °己隐體陣列4G適宜於用非揮發性的(non.vGlatile)記憶胞’比如靜悲隨機存取記憶胞，但也可採用其他形式的記憶胞。類似一傳統的分支目標緩衝器之記憶體陣列，本發明的記憶體陣列，適宜於存儲一批資料條目，其中每一資料條目對應於一指令，並適宜於包含炱少一分支位址標籤和一目標位址。每一資料條目中，還<雜關聯有其他類型的資料，但通常來說，需要有某種形式的分支位址標籤和目標位址。一當前較佳的有選通閘的字元線，在圖7得以繪示。一有閘的字元線通常包括一字元線7〇和一字元線選通電路60之結合。如圖6所示，這些字元線選通電路，適宜於以一對一的原則，和一對應的字元線相關聯。這批字元線選通電路，適宜於在記憶體陣列4〇内以列的方式來配置。這種配置對存儲在這些字元線選通電路中各自的字元線選通值，允許經由傳統的寫技術而容易地更新。在一較佳的實施例中，記憶體陣列40包括一靜態隨機存取記憶體陣 27 1285841 17007pif.doc ::===括一記憶體電路’它由-單 J月’心酼機存取記憶胞而形成。可疋此予元線選通電路的實際配置，每一字，路按，-，，字元線選通值，，而L:，l= 線相關的指令之分支歷史資料而得到的，由粗。許對—對應字元線的存取，而此分支歷史資部八，、ί疋#，每—接收在分支目標緩衝器中的分支指令元線存健-資料條目，其包二=:::和1 二它們和這接收到的分支指令部分相關=二 :43:携的字元線是一有間的字元線，即這樣料的字元線選通值而控制:此字元元疋侍自於和這指令相關的分支歷史資料。和、二支的仏’一字70線選通值，是適宜於得自的分支歷史資料。一較佳的得自方法，將以 ^貫施例的背景而描述。這得自方法只是—個例子。許 =同的方法可以應賴從這分支較資料中，得出一字 ^線選通值1這些方法將按照如下這卵素的不同而變 U巧歷史資料的性質、用於㈣此分支歷史資料的 t法、子疋線選通值的大小(即位元的數量），以及/或者此字元線選通電路之結構和作為其構造基礎的記憶體電路0 28 I28m_c 、假設採取關於圖5所描述之2位元的分支歷史資料，並，一步採取一單個位元的記憶體電路，而這記憶體電路和每一字元線選通電路相關聯，那麼，簡單地通過利用此为支歷史資料中最有意義的一個位元，就可以得出一勝任的，字元線選通值。在此例子中，一最有意義的位元之邏輯值”1”，指示此指令一，，強烈地取走”或，，弱取走，，的狀態。而一最有意義的位元之邏輯值’，〇，，，指示此指令一，，弱^不取走或”強烈地不取走”的狀態。通過存儲此位元到和這字元線選通電路相關聯的一單個位元的記憶胞，那麼，此指八之取走情況的-可接受精確度的指示，被用於控制此的字元線。 =到^圖7，—解選中字元線7。，並在傳統丄 > 式下，一子兀線電壓施加到字元線70。通常來講，這 =加的字元線電壓’將會提高橫跨此字 ^ =勢位元。㈣，在本發財，此字元線電壓橫電=〇線所長Λ的，，是有條件地由—關聯之字元線選通 =60所允和字元線選通電路⑼適宜於包括—記電路61和一選通邏輯電路62。 μ 一的4= 路二!會按照要存儲起來的字元線選通值尺寸大小砂料尺寸。在制關子巾，存儲有。然而，字元線控制值任何合理的尺寸大小， j起來朗於控制對此有閘字元線的存取。在圖7 /，四個N-類型的電晶體’它們用於為這記憶二;二 29 ，07Pif.doc 1285841 元線選通值。、、存儲起來的字元線選通值之邏輯值(“1”或”〇”)，被用作為選通邏輯電路62的一輸入。明確的說，此字元線選通值 =為輸^，施加到一第一邏輯閘82，此邏輯閘也從這分支預測邏輯接收一寫訊號。由於此第一邏輯閘是一或類型 =t，ype)的邏輯閘，在其中—個或同時兩個輸人的一邏輯 ^ ，都將導致一第一邏輯輸出，τ，。此第一邏輯輸出，輯，，°1，此^^電壓值(即…由此第二邏輯閘解釋為一邏或υ的―高或低的電壓），被施加到一第二邏輯閘兩侗ί於ί第二邏輯閘是一與類型(AND切e)的邏輯閘，個輸入都必須是，τ，，才導致—第二邏較佳實施例中，這從第-遴親卩卩在此被作輯輸出的第二邏輯輸出， -:字部分，即元線部分71可回應一指令位址而被-X線ΓΓ2,字的字元線科72，其只可_此字^ ^擇’而有間被允許存取。在-實施例中，此選擇的電路的操作而於從這解碼器接收一字元線電壓。、子元線部分被適合此字元線電壓，可作為與字元線7g 通電路的一個輸入。當有條件地被存儲聯的子π線選路60中的字元線選通值允許，此字 ^字★線選通電的有閘字元線72。、、電&被傳送到對應 I2858iUc 有條件地傳送(即”閘控選 Ϊ字元線，-對應的有二::線 ίΓί的!"#作°也就是說，當此字元線選通值指出八去歷史資料齡卜分支指令很可纟 θ刀支元線的-讀操作就被允許= 分支歷史資料褚、目丨丨…、田此子兀線選通值指出的字元令可能不被取走時，對此選擇的子几、，泉的一一操作就被不允許。這樣的有條件之，，存取操作，，授寫操作之過程中是不必要的，在此寫操二在^ 目被更新。因此’這寫訊號對此第一姐山I、=σ ’§—有閘字元線被此解碼器選中時，就直二==通值而進行。在此方式下，不僅有條件之讀 ϋ和無條件的(對應這字元線親縣說)寫操作，都有效地利用一最少的硬體資源。抑圖6所不之讀的分支目標緩衝器，更包括一讀出放大~ 5八在成功的（比如，一允許的)讀操作之後，從 δ己憶，陣列4G接收-資料條目。正如當前所適宜的，讀出放大器45咖於載入(寫或更新)字元線控制值(word line control values ’ WLCV)到和此字元線選通電路相關聯的各個記憶體電路。按照本發明，操作—分支預測單元的一示範方法，將參考圖8在下面作出描述。對應於—批分支指令的資料條目，存儲在此分支目標緩衝器之記憶體陣列中（1〇〇)。每一 31 I28584107p,doc 才"的刀^歷史貝料’通過採用—勝任的演算法而揭露出來(1〇1)。每一指令各自的字元線選通值(word line gating values，WLGV)，從分支歷史資料而得到（1〇2)，並存儲到對應於_字70線選通電路的記憶體電路中⑽）。、在口這些資料條目和字元線選通值存奴來後，此分支預測單it就準備好接收—指令部分，比如一指令位址 (104)。此指令部分被解碼⑽），並且一對應的字元線被選，出來(1G6)。此字兀線被選擇出來後，此存儲起來的字元線選通值，有條件地確定此字元線—有閘的部分是否被存取一”肯定的”字元線選通值的指示（即一此分支指令可能會被取走的指示），導致—允許的字元線存取(108)，並輸出對應的資料條目（109)。一”否定的，，字元線選通值的指示 (即-此分支指令可能不會被取走的指示），導致不作進一的存取，並由此記憶體陣列產生輸出（11Q)。上面提到的 k否疋的和”肯定的”字元線選通值指*，將通常對應到一具體指令的標識為取走/不取走之狀態。、抑上述的例子說明了經由按照本發明所設計的一分支預測單元而得_方便性和有效性，其可接彳卜分支指令，並有，=地允許存取存儲在一分支目標緩衝器中的資料條 1 /這資料條目只從這分支目標緩衝器中讀出，這分支目才不、、爰衝裔中對應的分支歷史資料，删此資料條目可能需被用到有低的取走”此分支可能性的指令，不會導致對此分支目標緩衝允許的讀操作。因此，花費在不必要之續操作的電能被節省下來了。 32 I285841〇7p_ 例如，圖9是一圖表，其說明一 EEMBC基準測試的模擬運行結果，此模擬是採用按照本發明設計的一分支預測單元而進行。沿著水平軸所指出的一系列的基準測試程式的上方，繪示這樣的一比較，即分支指令一預測的機率和分支指令-實際的機率的比較。這具體的類比揭示了分支指令的大約4G%，是和-”不取走，，的狀態相關聯，因此，和分支目標緩衝器記憶體陣列的讀操作相關的電能消耗可以減少40%。但是，本發明並不提供對這樣的電能節省，即對複雜性之增加和運作速度降低的代價。當—指令被接收到這分支預測單元時’此指令直接地由一解碼器處理，許時’接著直接產生-對應的資料條目輸出。對^ 處理，沒有為了如下這些操作而產生的延遲，即為^ 預解碼，查找和/或計#此指令齡支歷史資料，並只後產生一訊號，此訊號允許/不允許對此分二憶體作-對應_操作。 i讀&己本發明中，不需要另外複雜的線路或功能，條件地允許對此分支目標緩衝器記憶體作讀操作。Λ 而代之以的，一對應的字元線選通值，，等^，，中接收的每條指令。在一簡單的字元爰的子兀線選通值之應用，其允許/不允許〜_通電路取，字元線存儲騎應於此接收到指令==線的存每條指令的執行之後，此字元線選通值的可;=。在在此字元線選通電路裏得到更新。 w>確地 33 I2858ilP,doc 雖然本發明已以較佳實施例揭露如上，然其並非用以限^本發明，任何熟習此技藝者，在不脫離本發明之精神和範圍内，當可作些許之更動與潤飾，因此本發明之保護範圍當視後附之申請專利範圍所界定者為準。術S吾較佳的”和”適宜”貫通上面的描述。這些術語只是在這些展示的實施例中，指出當前的優先選擇。這些術語還承認，隨著技術不斷地改進其他電路，各種機制，以及各種方法將會產生出來，通過它們，可有效地實施本發明。【圖式簡單說明】為讓本發明之上述和其他目的、特徵和優點能更明顯易懂，下文特舉較佳實施例，並配合所附圖式，作詳細說明如下。其中：圖1說明一傳統的分支目標緩衝器，以及從此分支目標缓衝器輸出一資料條目所需要的元件。圖2是一方塊圖，其繪示一示範的處理器，按照本發明，此處理器可容易地整合到一分支預測單元和/或按照本發明的方法中。圖3是一方塊圖，其繪示一示範的超量化處理器，按照本發明，此處理器可容易地整合到一分支預測單元和/ 或按照本發明的方法中。圖4是一方塊圖’其按照本發明，進一步說明一分支預測單元一些另外的細節。圖5是一流程圖，其按照本發明，說明一易於包含在 34 1285841 17007pif.doc 一分支歷史單元中的狀態機。圖6是一方塊圖，其按照本發明，說明一分支目標緩衝器的記憶體陣列的一實施例。圖7是一電路圖，其按照本發明的一實施例，進一步說明一有選通閘的字元線的結構。圖8是一流程圖，其按照本發明，說明一較佳的操作一分支預測單元之方法。圖9是一圖表，其按照本發明，說明一分支預測單元之基準測試的類比結果。【主要元件符號說明】 I :分支目標緩衝器 2:分支預測邏輯1285841 17007pif.doc Figure 4 shows the branch target buffer, which is described by inflammation, . Here, the decoder 43 receives the finger and the indicated character, and the group of words r to the memory array 40. However, in this issue: There are two;:: quality, operation, and implementation have been changed. The term, sub-grant, is used in the following embodiments to describe the word lines contemplated by the present invention. The crypto-array array 4G is suitable for use with non-volatile (non.vGlatile) memory cells such as singular random access memory cells, but other forms of memory cells can also be used. Similar to a memory array of a conventional branch target buffer, the memory array of the present invention is adapted to store a batch of data entries, wherein each data entry corresponds to an instruction and is adapted to include a branch address label and A target address. In each data entry, there are also other types of data in the <hetero association, but in general, some form of branch address tag and target address are required. A currently preferred word line with a gate is shown in FIG. A gated word line typically includes a combination of a word line 7 〇 and a word line select circuit 60. As shown in Figure 6, these word line gating circuits are adapted to be associated with a corresponding word line on a one-to-one basis. The word line gating circuits are adapted to be arranged in columns in the memory array 4A. This configuration allows for the respective word line strobe values stored in these word line strobe circuits, allowing for easy updating via conventional write techniques. In a preferred embodiment, the memory array 40 includes a static random access memory array 27 1285841 17007pif.doc ::=== including a memory circuit 'it is accessed by a single J month' heartbeat machine Formed by memory cells. This can be obtained from the actual configuration of the semaphore circuit, each word, the path is pressed, the -, the word line strobe value, and the L:, l = line-related instruction branch history data is obtained from the coarse .对对—corresponding to the access of the word line, and the branch history department 八,, 疋#, each receiving the branch instruction line in the branch target buffer, the health-data entry, the package two =:: : and 1 2 are related to the received branch instruction part = 2: 43: The character line carried is an inter-word line, that is, the word line strobe value is controlled by this: the word element疋 From the branch history data related to this instruction. The 70-line strobe value of the 和、 and 二一 is a branch historical data suitable for obtaining. A preferred method of obtaining will be described in the context of a consistent embodiment. This is derived from the method is just an example. Xu = the same method can be relied on from this branch to obtain a word ^ line strobe value 1 These methods will be changed according to the following ovum, the nature of the historical data, for (4) the history of this branch The t-method of the data, the size of the sub-line strobe value (ie, the number of bits), and/or the structure of the word line gating circuit and the memory circuit 0 28 I28m_c as the basis of its construction, assumed to be taken in relation to FIG. Depicting the 2-bit branch history data, and taking a single bit memory circuit in one step, and this memory circuit is associated with each word line gating circuit, then simply by using this as a branch The most meaningful bit in the historical data, you can get a competent, word line strobe value. In this example, the logical value "1" of a most significant bit indicates that the instruction one, strongly removes the state of "or, weakly, away." The logical value of a most significant bit. ',〇,,, indicates the state of this instruction, the weak ^ does not take away or "strongly does not take away." By storing this bit to a single bit associated with the word line gating circuit Memory cell, then, this refers to the eight-take away condition - the indication of acceptable accuracy, is used to control the word line. = to ^ Figure 7, - uncheck the word line 7. And in the traditional 丄>, a sub-twist voltage is applied to the word line 70. Generally speaking, this = the added word line voltage 'will increase across the word ^ = potential bit. (d), in this fortune, this The word line voltage is equal to the length of the line, and is conditionally used by the associated word line strobe = 60 and the word line strobe circuit (9) is suitable for including the circuit 61 and a strobe. Logic circuit 62. μ 1 of 4 = way 2! The size of the sand material will be strobed according to the word line to be stored. Yes, however, the word line control value is any reasonable size, j is superior to controlling access to this gated word line. In Figure 7 /, four N-type transistors 'they are used for this Memory 2; 2, 29, Pif.doc 1285841 Yuan line strobe value. The logical value ("1" or "〇") of the stored word line strobe value is used as one of the strobe logic circuit 62. Input. Specifically, the word line strobe value = is input, applied to a first logic gate 82, and the logic gate also receives a write signal from the branch prediction logic. Since the first logic gate is one or Type = t, ype) logic gate, in which one or two simultaneous input logic ^ will result in a first logic output, τ, this first logic output, series,, °1, this The ^^ voltage value (ie...the second logic gate is interpreted as a logic or a high or low voltage) is applied to a second logic gate. The second logic gate is a type and The logical gate of AND cut e), the input must be, τ, to cause - the second logic in the preferred embodiment, this is from the first - 遴 relatives here The second logic output of the output, the -: word portion, that is, the line portion 71 can be responded to by an instruction address by the -X line ΓΓ 2, the word line line 72 of the word, which can only be _ this word ^ ^ select ' While in some embodiments, the selected circuit operates to receive a word line voltage from the decoder. The sub-line portion is adapted to the word line voltage and can be used as a word. An input of the line 7g pass circuit. When the word line strobe value in the sub-π line circuit 60 that is conditionally stored is allowed, the word line is selected to be energized by the word line 72. , electric & is transmitted to the corresponding I2858iUc conditionally transmitted (ie, the gate control select word line, - corresponding to two: : line ίΓί! "#作° That is, when this character line is selected The general value indicates that the eight-go historical data age branch instruction is very 纟 θ knife branch line-read operation is allowed = branch history data 丨丨, directory 丨丨, Tian 兀兀 line strobe value pointed word order When it is not possible to take it away, the one-to-one operation of this selection is not allowed. Such a conditional, access operation, and write operation are unnecessary, and the write operation is updated. Therefore, when this write signal is selected by the decoder for the first sister mountain I, = σ '§ - the gate word line is selected, the direct two == pass value. In this way, not only the conditional reading and the unconditional (corresponding to the word line pro-county) write operation, the use of a minimum of hardware resources is effectively utilized. The branch target buffer that is not read by Fig. 6 further includes a read-out </ RTI> </ RTI> </ RTI> </ RTI> after a successful (eg, an allowed) read operation, receiving a data entry from δ recall, array 4G. As is currently preferred, sense amplifier 45 is configured to load (write or update) word line control values (WLCV) to respective memory circuits associated with the word line gating circuit. In accordance with the present invention, an exemplary method of operation-branch prediction unit will be described below with reference to FIG. The data entry corresponding to the -batch instruction is stored in the memory array of the branch target buffer (1〇〇). Each 31 I28584107p, doc's "knife ^ history material" was revealed by the use of a competent algorithm (1〇1). The word line gating values (WLGV) of each instruction are obtained from the branch history data (1〇2) and stored in the memory circuit corresponding to the _word 70 line gating circuit (10) ). After the data entry and the word line strobe are stored, the branch prediction unit is ready to receive the instruction part, such as an instruction address (104). This instruction is partially decoded (10)), and a corresponding word line is selected (1G6). After the word line is selected, the stored word line strobe value conditionally determines whether the word line - the gated portion is accessed - a "positive" word line strobe value indication (i.e., an indication that a branch instruction may be taken away) causes the allowed word line access (108) and outputs a corresponding data item (109). A "negative" indication of the word line strobe value (ie, an indication that the branch instruction may not be taken away) results in no further access and thus an output (11Q) from the memory array. The reference to the k 疋 and "positive" word line strobe value refers to *, which will generally correspond to the status of a specific instruction as the take-away / not take-off state. The invention designs a branch prediction unit to obtain convenience and validity, which can be connected to the branch instruction, and has, for example, allow access to the data strip 1 stored in a branch target buffer/this data item only Read from this branch target buffer, this branch does not, the corresponding branch history data in the smashing, the deletion of this data entry may need to be used to have a low instruction to take away the possibility of this branch, not Will cause read operations allowed for this branch target buffer. Therefore, the power spent on unnecessary operation is saved. 32 I285841〇7p_ For example, Fig. 9 is a diagram illustrating the results of a simulation run of an EEMBC benchmark test performed using a branch prediction unit designed in accordance with the present invention. Above the series of benchmarks indicated along the horizontal axis, a comparison is made between the branch instruction-predicted probability and the branch instruction-actual probability comparison. This specific analogy reveals approximately 4G% of the branch instruction, which is associated with the state of "not taken away", so the power consumption associated with the read operation of the branch target buffer memory array can be reduced by 40%. The present invention does not provide for such power savings, that is, the increase in complexity and the reduction in operating speed. When the instruction is received by the branch prediction unit, the instruction is directly processed by a decoder, when Then directly generate the corresponding data item output. For the ^ processing, there is no delay for the following operations, that is, ^ pre-decoding, searching and/or counting the age data of the instruction, and only generating a signal. This signal allows/disallows the operation of this binary memory - corresponding _ operation. i read & in the present invention, no additional complicated lines or functions are required, conditionally allowing read operation of this branch target buffer memory Instead, a corresponding word line strobe value,, etc., is received in each instruction. The application of a sub-line strobe value in a simple character , allows/ Do not Xu ~ _ circuit take, word line storage ride should receive the command == line after the execution of each instruction, the word line strobe value can be; =. in the word line strobe Updated in the circuit. w> Indeed 33 I2858ilP, doc Although the present invention has been disclosed in the above preferred embodiments, it is not intended to limit the invention, and those skilled in the art, without departing from the spirit and scope of the invention The scope of protection of the present invention is defined by the scope of the appended claims. The preferred embodiments of the invention are the same as the above description. These terms are only used in the embodiments of these presentations to indicate current preferences. These terms also recognize that as the technology continues to improve other circuits, various mechanisms, and various methods will be produced by which the present invention can be effectively implemented. BRIEF DESCRIPTION OF THE DRAWINGS The above and other objects, features, and advantages of the present invention will be apparent from Where: Figure 1 illustrates a conventional branch target buffer and the components needed to output a data entry from the branch target buffer. 2 is a block diagram showing an exemplary processor that can be easily integrated into a branch prediction unit and/or in accordance with the method of the present invention in accordance with the present invention. 3 is a block diagram showing an exemplary hyperquantization processor which, in accordance with the present invention, can be easily integrated into a branch prediction unit and/or in accordance with the method of the present invention. Figure 4 is a block diagram which further illustrates some additional details of a branch prediction unit in accordance with the present invention. Figure 5 is a flow chart illustrating a state machine that is readily included in a branch history unit of 34 1285841 17007 pif.doc in accordance with the present invention. Figure 6 is a block diagram showing an embodiment of a memory array of a branch target buffer in accordance with the present invention. Figure 7 is a circuit diagram further illustrating the structure of a word line having a pass gate in accordance with an embodiment of the present invention. Figure 8 is a flow chart illustrating a preferred method of operating a branch prediction unit in accordance with the present invention. Figure 9 is a graph illustrating the analogy results of a benchmark test of a branch prediction unit in accordance with the present invention. [Main component symbol description] I : Branch target buffer 2: Branch prediction logic

3:指令位址解碼器 4 :記憶體陣列 5、45 :讀出放大器標籤比較寄存器多工器程式計數值多工器邏輯閘 10 :管線式處理器 II :超量化處理器 12 :記憶體 13、33 :指令獲取單元 14 ·匯流排 35 1285841 17007pif.doc 15 :指令解碼器 17 :執行單元 19、39 :分支預測單元 20 :分支預測邏輯 22 :分支目標緩衝器 24 :分支歷史單元 34、35、36 :執行路徑 40 :記憶體陣列 43 :解碼器 60 :字元線選通電路 61 :記憶體電路 62 :選通邏輯電路 70 :字元線 71 :選擇的字元線部分 72 :有閘的字元線部分 80 :第二邏輯閘 82 :第一邏輯閘 363: Instruction Address Decoder 4: Memory Array 5, 45: Sense Amplifier Tag Comparison Register multiplexer Program Count Value Multiplexer Logic Gate 10: Pipelined Processor II: Ultra-Quantization Processor 12: Memory 13 33: instruction fetch unit 14 · bus bar 35 1285841 17007pif.doc 15 : instruction decoder 17 : execution unit 19 , 39 : branch prediction unit 20 : branch prediction logic 22 : branch target buffer 24 : branch history unit 34 , 35 36: Execution path 40: Memory array 43: Decoder 60: Word line gating circuit 61: Memory circuit 62: Gating logic circuit 70: Word line 71: Selected word line portion 72: Gate Word line portion 80: second logic gate 82: first logic gate 36

Claims

December 19曰1285841 The cover disclosed in the Ming or the drawings. Patent application scope: 1. A branch target buffer memory array comprising: a word line and an associated word line gating circuit; The word line gating circuit includes a memory circuit that stores a word line strobe value. 2. The branch target buffer memory array of claim 1, wherein the memory array is adapted to store a data entry associated with the word line. 3. The branch target buffer memory array of claim 2, wherein the word line strobe circuit further comprises a strobe logic circuit. 4. The branch target buffer memory array of claim 3, wherein the strobe logic circuit is responsive to a word line voltage applied to the word line and the word line strobe value, This makes it possible to access this data item. 5. The strobe logic circuit is further responsive to a write signal as in the branch target buffer memory array of claim 4 of the patent application. 6. The branch target buffer memory array of claim 4, wherein the access operation is a read operation on the word line. 7. The branch target buffer memory array of claim 5, wherein the access operation is a write operation to the word line. 8. The branch target buffer memory array of claim 1, wherein the memory array comprises a non-volatile memory cell array. 9. The branch target buffer memory array of claim 8, wherein the memory array is a static random access memory 37 1285841 17007pif.doc

(SRAM) array. 10. The branch target buffer memory array of claim 9, wherein the memory circuit comprises a one-bit static random access memory cell. 11. The branch target buffer memory array of claim 2, wherein the data entry comprises a branch target tag and a target address. 12. The branch target buffer memory array of claim 5, wherein the memory circuit comprises a one-bit static random access memory cell that stores the word line strobe value; The strobe logic circuit includes: a first logic gate receiving a write signal and a word line strobe value as its input, thereby generating a first logic output; and a second logic gate receiving the first The logic output and the word line voltage are used as their inputs / thereby generating a second logic signal. 13. The branch target buffer memory array of claim 5, wherein the word line comprises a selected word line portion that receives the word line voltage, the word line further comprising a corresponding There is a gate word line part. 14. The branch target buffer memory array of claim 13, wherein the selected word line portion and the gated word line portion are electrically connected by a word line gating circuit. . 15. The branch target buffer memory array of claim 12, wherein the word line comprises a selected word line portion connected to 38 1285841 17007 pif.d〇c $ the word line voltage The character line further includes a corresponding branch word line portion, such as the branch target buffer record 2, _1 column, f, the selected word line portion and the The word line line knife with the gate is electrically connected by the word line gating circuit; and the / logic-path 5 hole number includes a word line voltage, and the word line lightning pressure is applied thereto. There is a _word line part. Sub-70 line price 1^. A branch target buffer memory array, which is stored in response-write selection - data entry = destination, the memory array includes: 4 for the input of poor material data; a line gating circuit that allows access to a word pass value stored in the word line gating circuit during the write operation. Conditionally allowing access to the data strip during the read operation (4) ί车8·列列nt nt, 15 of the branch target buffer described in Item 17 of the U-body array 'where the transition strobe circuit comprises: a memory circuit, (4) save money _ pass value; and ―Write signal and word line strobe value and ^ only in the vl write operation of the fortune, allowing access to this data item, sub-only in the sub-line strobe value indicates a positive identification of the case - read operation Seal, scale access conditional access. 19. The branch target buffer of claim 18, wherein the ur-th gate logic circuit receives a word line voltage as the strobe value of the word line and the word line. Electric will refer to 39 1285841 17007pif.doc in the case of a positive identification, the second two 'conditionally allow access to two = make the in-read operation memory array, where the memory array target buffer notes 21. The scope of the patent application is the first wave of 1 ▲ recalled cell array. The memory array, where the record; L is the target buffer list, and the memory = static = access memory cell. Bit static of your random access memory • branch target buffer, including · a memory array, Α 人幻有有有有有有有有有有字字字字字字字字字字字字字字字字'each-selected word line portion; strobe value = path, which includes -_storage-word line one" - word line portion with gates; and partial receive-command portion, and It is suitable to select the "MW word line" in response to the instruction. The branch target buffer described in item 22 of the patent scope, the selected word line portion of the idle word line is applied. Sub 7C line voltage, the _ word line is selected by the decoder. For the branch target buffer described in claim 23, the strobe circuit further includes a strobe logic circuit to receive the The word line character string_pass_ is input for it. 25. The branch target buffer as described in claim 24, 40

The year-and-month repair M is replacing page 1285841 17007pif.doc, the word line strobe circuit, which is adapted to conditionally allow access to the gated word line of the gated word, where the gated word The line is selected by this decoder, thereby responsive to the word line voltage and the word line strobe value. 26. The branch target buffer of claim 25, wherein the access operation is a read operation.

^ +27. The branch target buffer of claim 25, wherein the word line gating circuit is adapted to receive a write signal as an input, and wherein the access operation is a write operation. The branch target buffer of claim 26, wherein the memory array comprises a static random access memory array, the memory circuit comprising a one-bit static random access memory cell The strobe logic circuit includes: a NAND gate that receives the write signal and the word line strobe value, thereby outputting a first logic signal; and, for, the input _ and the gate, receiving the character The line voltage and the first logic signal are used as input inputs, thereby outputting a second logic signal. "," "29. The branching target as described in claim 28, wherein the second logic lobes u are _ word line voltages, which are applied to the word lines of the gates The word line portion, which has strobe = is selected by the decoder. The line 30. The branch directory as described in item 22 of the patent application scope includes: , , and descendants, a sense amplifier, which is adapted to receive a data item from the selected gated gate, which has The word line of the gate is cleared by the decoding of the '41' I28584l 17〇〇7pif.doc. ^ As in the branch target buffer described in claim 30, "Hai Jing Amplification includes such a line, which transmits the word line strobe value to the word line with the strobe gate Related to each, such as the patent application, the == device,

, the memory array includes a static random access memory array; and each of the memory circuits of each of the memory circuits, including a one-bit static, accessing 5 cells, for receiving A word line strobe value. A branch target buffer as described in claim 32 of the patent application, each of the respective memory circuits, comprising: a second or a gate that receives the memory from the cell of the cell The = sign and the word line strobe value as its input, the signal signal; and - and Μ 'the receiving the word line voltage and the first - logic signal as input, thereby outputting a second logic signal . 34. A branch prediction unit, comprising: a branch history unit configured to store branch history data; a branch logic circuit configured to receive an age address, provide a prediction address and update the branch history data; And the knife branch shows the buffer|§, which is used to receive the address of the instruction, and includes: s - a memory array containing the word line of the gate, each - has: the gate: the yuan line storage A data entry includes a word line select power-' and the word 70-line strobe f-channel includes a - note circuit that stores the word line strobe value derived from the historical data of the branch. 42 I2858iUc

35. The branch prediction unit of claim 34, wherein the branch prediction logic circuit provides a write signal to the branch target buffer 36. The branch prediction unit of claim 35, wherein the branch prediction unit The branch history unit includes a state machine for calculating its branch history data in accordance with the branch execution history of an instruction. 37. The branch prediction unit of claim 36, wherein the branch history unit includes a branch history table storing the branch history data, and wherein the state machine of the branch history unit includes a 2-bit Up/down saturation counter. 38. The branch prediction unit of claim 36, wherein the memory array comprises a static random access memory array, the memory circuit comprising a one-bit static random access memory cell, and The word line strobe value includes a single data bit derived from the branch history data. 39. The branch prediction unit of claim 38, wherein the branch target buffer further comprises: a decoder for receiving the instruction address and corresponding to the instruction address and selecting a gate a word line; and a sense amplifier adapted to receive a data entry from the selected gated word line, the sense amplifier further comprising a line for communicating the word line strobe value to each A word line strobe circuit, the word line strobe circuit being associated with the gated word line. 40. - Branch prediction unit, including: 43 降@正. Replacement page I 1285841 17007pif.doc = branch history unit, (4) age history data. -= branch target buffer, including - batch gated words The meta-green, one-bit sub-read is accessed by the operation of the corresponding word line strobe circuit, and the branch target buffer is adapted to output an instruction portion of the branch target buffer that is received by the data, / This is the one-character line strobe value of the historical data. The sub-components hereby include 4:1. The branch prediction military element as described in claim 4, the branch prediction logic, which is used to receive the instruction part and provide a β-return ^ branch target buffer The data item that is output. Eight go to turn the 41st pin division unit, the ^. , ', ' is suitable to provide a write signal to the branch transfer buffer in the special? The branch prediction unit described in Item 41 of 1st, its historical data. ▲ 匕分支分支分支分支分支分支分支分支 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 Providing a corresponding measured bit to = this instruction address and providing - a pre-center 7 solution 3 / implementation; ^ unit for receiving this instruction, providing a 44 I2858417 pif.doc 孽 I2fl 19 """ a _________ d two: '! and provide in response to the execution of this decoded instruction - update its branch prediction unit includes · 2 historical units 'used to store branch history data; address 34 34 34, 'Flip to receive this instruction address and this has been updated = to provide this predicted address, and thus update this branch history data 2 target buffer 11, which is used to receive this instruction address and output A Becco entry 'This branch target buffer includes: The word line includes the word line between them, each of the inter-circuits contains __ = word read strobe Wei, (four) word line select drinker 彳Please specializes in _ 44 Lai Shu's treatment, which is the solution of the aircraft and _7^彳3 Batch execution path 'each execution path comprising a cleavage cry T6. The application of the process in item H patentable scope 45, wherein where a super Cloth quantization processor. 47 The processor 3 is as described in claim 46, wherein the processor is a vector processor or a single instruction multiple data processor. She ran the processor described in the 44th patent range, where the score was 49^^—the write signal to the branch target buffer. The history of the single-t, please refer to the processor described in item 48 of the patent scope, wherein the division of the grievance machine is in accordance with the execution history of the branch of a Japanese calendar 45 I28584〇U

a history and calculate its branch history data. 50. The processor of claim 49, wherein the branch history unit comprises a branch history table that stores the branch history data. 51. The processor of claim 44, wherein the memory array comprises a static random access memory array, the memory circuit comprising a one-bit static random access memory cell, and the word The meta-line strobe value includes a single data bit derived from the branch history data. 52. The processor of claim 51, wherein the branch target buffer further comprises: a decoder for receiving the instruction address and corresponding to the instruction address and selecting a gate a word line; and a sense amplifier for receiving the data item from the selected gated word line, the sense amplifier further comprising a line that conveys the word line strobe value to the respective memory In the bulk circuit, these memory circuits are associated with these gated word lines. 46