TWI274285B

TWI274285B - Branch instruction prediction and skipping using addresses of precedent instructions

Info

Publication number: TWI274285B
Application number: TW94110688A
Authority: TW
Inventors: Hong-Men Su
Original assignee: Faraday Tech Corp
Priority date: 2005-04-04
Filing date: 2005-04-04
Publication date: 2007-02-21
Also published as: TW200636584A

Abstract

A method of predicting branch instructions and a method of skipping branch instructions for pipelines which need more than one cycle to predict branch direction and branch target addresses in microprocessors and digital signal processors are provided. The address of an instruction executed before the predicted branch is used as an index to enable early branch prediction so that the address of the instruction predicted to be executed immediately after the branch is available earlier, thereby reducing the number of idle or wasted clock cycles.

Description

[Technical Field]

The present invention relates to a method of predicting the target address of a branch instruction in microprocessors and digital signal processors, and to a method of skipping certain specific branch instructions so that they are not executed. Both methods apply to pipelines in which the prediction process takes more than one clock cycle.

[Prior Art]

Today's microprocessors and digital signal processors execute instructions in a multi-stage pipeline. The pipeline includes stages such as fetch, decode, and execute. To improve efficiency, the stages operate simultaneously: while the third stage processes the first instruction, the second stage processes the second instruction and the first stage processes the third instruction. The pipeline does not wait for the first instruction to leave before it starts on the second, because that would leave most stages idle and waste resources.

This design runs smoothly when instructions execute sequentially, but branch instructions cause problems. Once a branch is taken and the program counter jumps elsewhere, the results in the earlier pipeline stages must be flushed so that processing can start with the instruction at the branch target address. In other words, the next several clock cycles are wasted.

To reduce this waste, techniques for predicting the target address of a branch instruction were developed, known as branch prediction. The goal is to predict the target address and the direction of a branch instruction at the earliest possible pipeline stage, where the direction is either "taken" or "not taken". If the prediction is "taken", meaning execution is about to jump elsewhere, the pipeline can fetch the instruction at the branch target address early. Later, in the execute stage, if the predicted target address is confirmed to be correct, the results of the earlier stages are kept, so no clock cycles are wasted and the efficiency of processing several stages at once is preserved.

FIG. 1 shows a typical prior-art branch prediction design in which every instruction has a fixed length of 4 bytes. In the fetch stage, the address held in program counter 101 is supplied to instruction cache 102 to fetch the current instruction and, at the same time, to prediction unit 104 as an index for looking up previously recorded information. If such information exists, the current instruction is a branch instruction, and prediction unit 104 predicts the direction 106 and the target address 105 of the branch and supplies both to multiplexer 107. Meanwhile, adder 103 computes the address of the next sequential instruction, that is, the address in program counter 101 plus the length of the current instruction, and also supplies it to multiplexer 107. If the direction 106 predicted by prediction unit 104 is "not taken", multiplexer 107 passes the next sequential address from adder 103 into program counter 101 as the next fetch address, so execution simply continues. Otherwise, multiplexer 107 passes the predicted target address 105 into program counter 101 as the next fetch address, so the instruction that follows the current branch is fetched in advance. As long as the prediction is accurate, the pipeline fetches from the correct address whether or not the branch is taken.

The benefit of branch prediction can be seen by comparing FIG. 2 with FIG. 3. FIG. 2 shows how a pipeline without branch prediction executes instructions, assuming five stages, F1 to W5. In the 4th clock cycle the pipeline fetches branch instruction BC4. Because the address of the following instruction cannot be known in advance, the front end of the pipeline has to stay empty until the 6th clock cycle, when BC4 has passed execute stage E3 and the next instruction is known to be T5. Only then can fetching continue. As the figure shows, the pipeline idles for two clock cycles after fetching BC4.
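The next-address selection of the FIG. 1 design described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the names `next_fetch_address`, `predictor`, and `INSTR_LEN` are hypothetical, and using a dictionary to stand in for prediction unit 104 is an assumption.

```python
INSTR_LEN = 4  # fixed instruction length in bytes, as assumed for FIG. 1


def next_fetch_address(pc, predictor):
    """Model adder 103 and multiplexer 107 choosing the next program counter value.

    predictor maps a branch address to (taken, target); the dict is an
    assumption that stands in for prediction unit 104.
    """
    sequential = pc + INSTR_LEN             # adder 103: next sequential address
    entry = predictor.get(pc)               # prediction unit 104: look up by PC
    if entry is None:                       # no record: not a known branch
        return sequential
    taken, target = entry                   # direction 106, target address 105
    return target if taken else sequential  # multiplexer 107


# Hypothetical addresses: a branch at 0x100C predicted taken to 0x2000.
predictor = {0x100C: (True, 0x2000)}
assert next_fetch_address(0x1000, predictor) == 0x1004
assert next_fetch_address(0x100C, predictor) == 0x2000
```

The sketch returns its answer immediately, which corresponds to a prediction unit that finishes within the same clock cycle as the fetch.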

FIG. 3 shows how a pipeline with branch prediction executes instructions, again assuming five stages, F1 to W5. In the 4th clock cycle, by the time the pipeline has fetched branch instruction BC4, it has already predicted that the next instruction is T5, so it fetches T5 in the 5th clock cycle. In the 6th clock cycle, once BC4 has passed the execute stage and the target is confirmed to be T5, the results at the front of the pipeline are kept and execution continues. No stage of the pipeline is idle, and the pipeline runs at full efficiency.

The example of FIG. 3 assumes that the prediction process needs only one clock cycle. As clock cycles become shorter, the prediction process can come to need more than one clock cycle. FIG. 4 shows such a case, assuming a pipeline with seven stages, F1 to W7, and a prediction that takes two clock cycles. In the 4th clock cycle, the pipeline starts fetching BC4 and the prediction unit starts predicting the target address of BC4 at the same time. When the 5th clock cycle begins, the pipeline still does not know the target address of BC4. The prediction completes only at the end of the 5th clock cycle, so the instruction at target address T9 cannot be fetched until the 6th clock cycle begins. The clock cycle spent fetching instruction i5 in between is simply wasted.

U.S. Patent No. 6,622,240 proposes one solution to this problem. Its main idea is to duplicate the computation that produces the branch target and issue it as a "pre-branch instruction", so that the instruction at the target address can be fetched at the start of the next clock cycle. No stage idles after the branch instruction is fetched, but handling the derived pre-branch instruction itself costs at least one cycle, so some waste remains.

The discussion and examples above show that when the prediction process takes more than one clock cycle, the prior art cannot avoid idle pipeline stages and wasted cycles. A way to reduce this waste further is needed.

[Summary of the Invention]

An object of the present invention is to provide a method of predicting branch instructions and a method of skipping certain specific branch instructions that reduce idle pipeline stages and wasted cycles to a minimum in pipelines where prediction takes more than one clock cycle.

To that end, the invention provides a method of predicting branch instructions. If the current instruction is a branch instruction, the information needed for prediction is recorded in a prediction unit, indexed by the address of a preceding instruction that was executed earlier. The prediction unit is then given an index address and looks up the corresponding information. If that information exists and the predicted direction is "taken", the predicted branch target address is used as the next fetch address. Otherwise the address of the next sequential instruction is used as the next fetch address.

According to a preferred embodiment of the invention, this prediction method uses the address of a preceding instruction, one executed before the branch instruction, as the prediction index, rather than the address of the branch instruction itself as in the prior art. The prediction process therefore starts before the branch instruction itself is fetched. With the timing arranged precisely, the prediction of the branch target address completes at the same moment the pipeline finishes fetching the branch instruction, so in the next clock cycle the pipeline can fetch the instruction at the target address, or the next sequential instruction if the branch is predicted not taken. As long as the prediction is accurate, this method leaves no pipeline stage idle and wastes no cycles.

If the real address of the instruction that follows the branch instruction differs from the predicted address, a misprediction has occurred, and every instruction fetched into the pipeline after the branch instruction must be discarded. The pipeline then resumes by fetching a new instruction from the real address of the instruction that follows the branch. This step is called branch prediction verification.

The invention also provides a method of skipping branch instructions. First, a prediction unit is provided and the current instruction is fetched. If the current instruction is an unconditional fixed-target branch instruction, and the instruction executed before it is not a branch instruction, the information needed for prediction is recorded in the prediction unit, indexed by the address of the n-th instruction executed before the current instruction. Here n is the number of clock cycles the prediction unit needs from receiving an index address to finishing the prediction of the branch target address. Then, with the address of the n-th instruction executed before the current instruction as the index, the prediction unit looks up the corresponding information to predict the outcome of the branch. Finally, if the corresponding information exists in the prediction unit and the predicted direction is "taken", the predicted branch target address is used as the next fetch address. Otherwise the address of the next sequential instruction is used as the next fetch address.

According to a preferred embodiment of the invention, this skipping method likewise uses the address of a preceding instruction as the prediction index, rather than the address of the branch instruction itself as in the prior art. The prediction process therefore starts before the branch instruction is fetched. With the timing arranged precisely, the prediction of the branch target address completes at the same moment the pipeline finishes fetching the instruction immediately before the branch, so in the next clock cycle the pipeline fetches the instruction at the target address directly. The branch instruction itself is skipped and never executed. Because an unconditional fixed-target branch instruction produces the same result every time it executes, the skip is not a prediction but a guaranteed outcome that saves a clock cycle. If self-modifying code is encountered and an unconditional fixed-target branch instruction is changed into a different instruction, the corresponding information in the branch prediction unit must be invalidated to guarantee correct results.
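The timing argument above can be made concrete with a small Python sketch. It is a simplified model rather than anything stated in the patent: the function name `wasted_fetch_cycles` and the parameter `lead` are hypothetical, and the model assumes one instruction is fetched per clock cycle and that the prediction starts in the cycle in which the index instruction is fetched.

```python
def wasted_fetch_cycles(n, lead):
    """Idle fetch cycles after a correctly predicted taken branch.

    n:    clock cycles the prediction unit needs for one prediction
    lead: how many instructions ahead of the branch the index instruction is
          (0 means the branch's own address is the index, as in the prior art)

    Assumption: the prediction starts when the index instruction is fetched
    and one instruction is fetched per cycle.
    """
    if n < 1 or lead < 0:
        raise ValueError("n must be >= 1 and lead must be >= 0")
    # The prediction finishes n cycles after it starts; the branch fetch
    # finishes lead + 1 cycles after it starts. Any gap is idle time.
    return max(0, n - 1 - lead)


assert wasted_fetch_cycles(n=1, lead=0) == 0  # FIG. 3: one-cycle prediction
assert wasted_fetch_cycles(n=2, lead=0) == 1  # FIG. 4: one cycle lost
assert wasted_fetch_cycles(n=2, lead=1) == 0  # index taken one instruction early
```

Under these assumptions, choosing `lead = n - 1` is the smallest head start that removes the idle cycles. The model does not cover the extra cycle saved by the skipping method, which additionally avoids fetching the branch itself.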

To make the above and other objects, features, and advantages of the invention easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings.

[Embodiments]

Assume the pipeline of this embodiment contains a prediction unit whose prediction process takes n clock cycles. Assume also that the pipeline stores the address of the current instruction and the addresses of the previous n executed instructions, that is, the current program counter value and the n values before it. In this embodiment PC[0] denotes the address of the current instruction, PC[-1] the address of the previous instruction, PC[-2] the address of the instruction before that, and so on down to PC[-n].

FIG. 5 shows the prediction unit used in this embodiment. Its main body is a table. FIG. 5 draws only four rows, but the table can contain any number of rows. Each row holds the information for one branch instruction and is also called a record. The table has four columns:

- Column 501 holds a valid flag that indicates whether the content stored in the row is valid.
- Column 502 holds the address of the n-th instruction executed before the corresponding unconditional fixed-target branch instruction, or the address of the (n-1)-th instruction executed before a branch instruction of any other kind.
- Column 503 holds the predicted target address of the branch instruction.
- Column 504 holds prediction information.

The prediction unit accepts an index address 505, looks up the corresponding record in the table, and outputs a predicted target address 506 and a predicted branch direction 507. The predicted target address 506 is output as target address 105 of FIG. 1, and the predicted branch direction 507 is output as predicted direction 106 of FIG. 1. The use of the table is explained further below.

Note that there are many ways to store prediction information and to use it for prediction, and any of them can be used to predict whether a branch is taken or not taken. The prediction unit of this embodiment is very simple and stores very simple information. A real prediction unit may store more information and provide more prediction results. In general, the more complex the prediction algorithm, the more information it keeps.

The method of predicting branch instructions proposed by the invention is described next. Its steps are shown in FIG. 6.

In step 602, the instruction at address PC[x] is fetched. Step 604 then checks whether PC[x] exists in the branch prediction unit of FIG. 5. If it does, step 606 outputs the corresponding predicted branch target address 506 and predicted branch direction 507. Otherwise step 608 sets the predicted branch direction 507 to "not taken".

After step 606 or 608, in step 610, multiplexer 107 selects the address for program counter 101. Because of the prediction latency, this address is for the n-th instruction executed after PC[x], that is, PC[x+n].

Next, in step 612, when the pipeline executes instruction PC[x+n-1], the real address of the next instruction becomes known. If PC[x+n-1] is a branch instruction, executing it determines whether the branch is taken and therefore the program counter value of the next instruction. If it is not a branch instruction, the program counter value of the next instruction is PC[x+n-1]+4.

In step 614, if the real address of the instruction after PC[x+n-1] differs from the predicted PC[x+n], a misprediction has occurred, and every instruction fetched into the pipeline after PC[x+n-1] is discarded. The pipeline then resumes by fetching a new instruction from the real address of the instruction that follows PC[x+n-1]. This step is called branch prediction verification.

Finally, in step 616, independently of the steps above, whenever a branch instruction is executed, its target address PC[y] is written into the branch prediction unit, and other information such as the branch direction can be used to update the prediction information 504 in the same record. The index of the record to be written is determined by a function of the address of the (n-1)-th instruction before the current branch instruction, that is, PC[y-n+1]. This embodiment uses the m lowest bits of the instruction address as the index value, where m is a preset constant.

The effect of this prediction method is illustrated in FIG. 7. Assume the prediction unit of an embodiment needs 2 clock cycles to complete a prediction, and the pipeline of that embodiment has seven stages, F1 to W7. As the figure shows, the preceding instruction for branch instruction BC4 is i3. When the pipeline starts fetching i3, the prediction unit starts predicting the target address of BC4 at the same time. At the end of the 4th clock cycle, when the pipeline finishes fetching BC4, the target address T9 of BC4 has also been predicted, so the pipeline can fetch T9 in the 5th clock cycle. With this method, therefore, no pipeline stage is left idle and no cycle is wasted.

From the prediction method above, a method of skipping branch instructions can be derived. It skips certain specific branch instructions so that they are never executed, which saves further clock cycles. Its steps are shown in FIG. 8.

The skipping method is very similar to the prediction method described above. Steps 802 to 816 are the same as steps 602 to 616 of FIG. 6. In the final step 818, independently of the steps above, when the pipeline executes an unconditional fixed-target branch instruction and the previous instruction is not a branch instruction, the target address PC[y] of the unconditional branch instruction is written into the branch prediction unit, and the corresponding prediction information 504 is updated so that the predicted branch direction is always "taken". The index of the record to be written is determined by a function of the address of the n-th instruction before the unconditional branch instruction, that is, PC[y-n]. This embodiment uses the same function here as in step 816.

As a result, the target addresses of these branch instructions are predicted at the same moment the pipeline finishes fetching the instruction immediately before them, and in the next clock cycle the pipeline fetches the instruction at the target address directly. Because these branch instructions are skipped and not executed, one more clock cycle is saved than for other branch instructions.

FIG. 9 illustrates the effect of the skipping method. Assume the prediction unit of an embodiment needs 2 clock cycles to complete a prediction, and the pipeline of that embodiment has seven stages, F1 to W7. As the figure shows, the preceding instruction for branch instruction B4 is i2. In the 2nd clock cycle, when the pipeline starts fetching i2, the prediction unit starts predicting the target address of B4 at the same time. At the end of the 3rd clock cycle, the pipeline finishes fetching the instruction immediately before B4, and the target address T9 of B4 has also been predicted. The pipeline can then fetch T9 directly and skip B4 itself. The underlined row in FIG. 9 never actually happens: B4 does not enter the pipeline or consume clock cycles, and the cycle saved can be used to execute other instructions.

To keep self-modifying code from causing an unconditional fixed-target branch instruction to be skipped incorrectly, the branch records in the branch prediction unit must stay consistent with the branch instructions actually stored in memory. If the content of memory changes so that one or more unconditional fixed-target branch instructions are removed, the records that correspond to those branch instructions are removed from the prediction unit. Records that are created again afterwards are guaranteed to match the latest branch instructions in memory.

As the description above shows, the main feature and advantage of the invention is that it uses the address of an earlier, preceding instruction as the prediction index. The prediction process starts early, the predicted branch target address is available early, and idle pipeline stages and wasted cycles are avoided entirely.

Although the invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection is defined by the appended claims.

[Brief Description of the Drawings]

FIG. 1 is a typical architecture diagram of a branch prediction technique.
FIG. 2 is an instruction execution example of a conventional pipeline without branch prediction capability.
FIG. 3 is an instruction execution example of a conventional pipeline with branch prediction capability.
FIG. 4 is an instruction execution example of a pipeline whose branch prediction takes two clock cycles.
FIG. 5 shows a branch instruction prediction unit according to an embodiment of the invention.
FIG. 6 is a flowchart of a branch instruction prediction method according to an embodiment of the invention.
FIG. 7 is an execution example of a branch instruction prediction method according to an embodiment of the invention.
FIG. 8 is a flowchart of a branch instruction skipping method according to an embodiment of the invention.
FIG. 9 is an execution example of a branch instruction skipping method according to an embodiment of the invention.
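The table of FIG. 5, the lookup of steps 604 to 608, and the update of step 616 can be sketched in Python as follows. This is a minimal model under stated assumptions, not the patented circuit. The names `Entry`, `PredictionUnit`, and `history` are hypothetical, the prediction information of column 504 is reduced to a single taken bit, and comparing the full address stored in column 502 during lookup is an assumption about how the unit decides that PC[x] "exists" in the table.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool = False   # column 501: valid flag
    index_addr: int = 0   # column 502: address of the earlier instruction used as index
    target: int = 0       # column 503: predicted target address
    taken: bool = False   # column 504: prediction information (one bit in this sketch)


class PredictionUnit:
    def __init__(self, n, m=8):
        self.n = n  # prediction latency in clock cycles
        self.m = m  # number of low address bits used as the row index
        self.rows = [Entry() for _ in range(1 << m)]

    def _row(self, addr):
        # The embodiment uses the m lowest bits of the instruction address.
        return self.rows[addr & ((1 << self.m) - 1)]

    def lookup(self, pc_x):
        """Steps 604 to 608: return (taken, target) for index address PC[x]."""
        e = self._row(pc_x)
        if e.valid and e.index_addr == pc_x:
            return e.taken, e.target  # step 606
        return False, None            # step 608: direction set to not taken

    def update(self, history, taken, target):
        """Step 616. history[-1] is the branch just executed, history[-2] the
        instruction before it, and so on (an assumed representation)."""
        if len(history) < self.n:
            return
        index_addr = history[-self.n]  # the (n-1)-th instruction before the branch
        e = self._row(index_addr)
        e.valid, e.index_addr, e.target, e.taken = True, index_addr, target, taken


# FIG. 7 situation with hypothetical addresses: n = 2, i3 precedes BC4.
I3, BC4, T9 = 0x1008, 0x100C, 0x2020
bp = PredictionUnit(n=2)
bp.update(history=[0x1004, I3, BC4], taken=True, target=T9)
assert bp.lookup(I3) == (True, T9)
```

In this sketch the record is found when i3 is looked up, not when BC4 is, which is what lets a two-cycle prediction finish by the time BC4 has been fetched.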

[Main component symbol description]

101: Program counter
102: Instruction cache
103: Adder
104: Prediction unit
105: Target address
106: Predicted direction
107: Multiplexer
501: Valid flag
502: Index address
503: Branch instruction target address
504: Prediction-related information
505: Index address
506: Predicted target address
507: Predicted branch direction
602: Fetch the current instruction at address PC[x].
604: Is PC[x] present in the branch prediction unit of FIG. 5?
606: Output the corresponding predicted branch target address and predicted branch direction.
608: Set the predicted branch direction to not taken.
610: The multiplexer 107 selects the predicted program counter value. Because of the branch prediction latency, this value applies to the n-th instruction executed after PC[x], that is, PC[x+n]. The value is used for the instruction fetch that begins at step 602.
612: When the pipeline executes instruction PC[x+n-1], the true address of the next instruction becomes known. If instruction PC[x+n-1] is a branch instruction, its execution determines whether the branch is taken or not taken, and therefore the program counter value of the next instruction. If instruction PC[x+n-1] is not a branch instruction, the program counter value of the next instruction is PC[x+n-1]+4.
614: If the true address of the instruction following PC[x+n-1] differs from the previously predicted PC[x+n], a misprediction has occurred, and every instruction in the pipeline that was fetched after PC[x+n-1] must be discarded. The pipeline resumes by fetching a new instruction at the true address of the instruction executed after PC[x+n-1]. This step is called branch prediction confirmation.
616: Independently of the above steps, when a branch instruction is executed, its target address PC[y] is written into the branch prediction unit, and other information such as the branch direction can be used to update the prediction-related information in the same record. The index of the record to be written in the branch prediction unit is determined by a function of the address of the (n-1)th instruction before the current branch instruction (that is, PC[y-n+1]). This embodiment uses the m lowest bits of the instruction address as the index value, where m is a preset constant.
802: Fetch the current instruction at address PC[x].
804: Is PC[x] present in the branch prediction unit of FIG. 5?
806: Output the corresponding predicted branch target address and predicted branch direction.
808: Set the predicted branch direction to not taken.
810: The multiplexer 107 selects the predicted program counter value. Because of the branch prediction latency, this value applies to the n-th instruction executed after PC[x], that is, PC[x+n]. The value is used for the instruction fetch that begins at step 802.
812: When the pipeline executes instruction PC[x+n-1], the true address of the next instruction becomes known. If instruction PC[x+n-1] is a branch instruction, its execution determines whether the branch is taken or not taken, and therefore the program counter value of the next instruction. If instruction PC[x+n-1] is not a branch instruction, the program counter value of the next instruction is PC[x+n-1]+4.
814: If the true address of the instruction following PC[x+n-1] differs from the previously predicted PC[x+n], a misprediction has occurred, and every instruction in the pipeline that was fetched after PC[x+n-1] must be discarded. The pipeline resumes by fetching a new instruction at the true address of the instruction executed after PC[x+n-1]. This step is called branch prediction confirmation.
816: Independently of the above steps, when a branch instruction is executed, its target address PC[y] is written into the branch prediction unit, and other information such as the branch direction can be used to update the prediction-related information in the same record. The index of the record to be written in the branch prediction unit is determined by a function of the address of the (n-1)th instruction before the current branch instruction (that is, PC[y-n+1]). This embodiment uses the m lowest bits of the instruction address as the index value, where m is a preset constant.
818: Independently of the above steps, when the pipeline executes an unconditional fixed-target branch instruction and the preceding instruction is not a branch instruction, the target address PC[y] of the unconditional branch instruction is written into the branch prediction unit, and the corresponding prediction-related information is updated so that the predicted branch direction is always taken. The index of the record to be written in the branch prediction unit is determined by a function of the address of the n-th instruction before the unconditional branch instruction (that is, PC[y-n]). The function used here is the same as the one used in step 816.
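The record update of steps 616 and 818 and the lookup of steps 604 to 610 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the circuit of the embodiment: the names `Record`, `BranchPredictionUnit`, `record_branch` and `next_fetch_address` are hypothetical, m = 6 is an arbitrary choice, the prediction-related information is reduced to a single taken bit, and the executed instructions are modelled as a plain list of addresses. The sketch also assumes that a lookup started with PC[x] completes once PC[x+n-1] has been fetched.

```python
from dataclasses import dataclass
from typing import List, Optional

M_INDEX_BITS = 6        # "m": preset constant (value chosen only for illustration)
INSTRUCTION_BYTES = 4   # fixed instruction length used in the description


@dataclass
class Record:
    valid: bool = False      # valid flag (501)
    index_address: int = 0   # address of the precedent instruction (502)
    target: int = 0          # branch instruction target address (503)
    taken: bool = False      # prediction-related information (504), assumed to be one bit


class BranchPredictionUnit:
    def __init__(self, m: int = M_INDEX_BITS) -> None:
        self.mask = (1 << m) - 1
        self.records: List[Record] = [Record() for _ in range(1 << m)]

    def update(self, precedent_address: int, target: int, taken: bool) -> None:
        # The m lowest bits of the precedent instruction address select the record.
        self.records[precedent_address & self.mask] = Record(
            True, precedent_address, target, taken
        )

    def predict(self, address: int) -> Optional[int]:
        # Steps 604-608: a miss or a not-taken record yields None ("not taken").
        record = self.records[address & self.mask]
        if record.valid and record.index_address == address and record.taken:
            return record.target
        return None


def record_branch(unit: BranchPredictionUnit, history: List[int], target: int,
                  taken: bool, n: int, skip: bool = False) -> None:
    """history lists executed instruction addresses; the last one is the branch.

    Step 616 indexes by the (n-1)th instruction before the branch. Step 818
    (skip=True) indexes by the n-th instruction before it and forces "taken".
    """
    offset = n if skip else n - 1
    unit.update(history[-1 - offset], target, True if skip else taken)


def next_fetch_address(unit: BranchPredictionUnit, lookup_address: int,
                       last_fetched_address: int) -> int:
    # Multiplexer 107: predicted target if taken, otherwise the sequential address.
    target = unit.predict(lookup_address)
    if target is not None:
        return target
    return last_fetched_address + INSTRUCTION_BYTES


unit = BranchPredictionUnit()
# With n = 2, the branch at 0x104 (target 0x200) is recorded under address 0x100.
record_branch(unit, [0x0FC, 0x100, 0x104], target=0x200, taken=True, n=2)
print(hex(next_fetch_address(unit, 0x100, 0x104)))  # prints 0x200
```

In this sketch the lookup that starts when 0x100 is fetched already yields 0x200 by the time the branch at 0x104 has been fetched, so no fetch slot is left empty. With `skip=True` the record is stored one instruction earlier, so the predicted target replaces the fetch of the unconditional branch itself.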


Claims

1. A method of predicting branch instructions, comprising the following steps: fetching and executing a current instruction; if the current instruction is a branch instruction, using the address of a precedent instruction as an index and recording related information required to predict the branch instruction; and using the address of the precedent instruction as the index to look up the related information and predict the branch instruction, wherein if the related information exists and the predicted direction is taken, the predicted target address of the branch instruction is used as the next instruction address, and otherwise the address of the next sequentially executed instruction is used.

2. The method of predicting branch instructions as described in claim 1, further comprising: using the address of the n-th instruction executed before the current instruction as the address of the precedent instruction, wherein n is the number of clock cycles required from receiving the address of the precedent instruction as the index to completing the prediction of the target address and branch direction of the branch instruction.

3. The method of predicting branch instructions as described in claim 2, wherein the address of the precedent instruction is the address of the n-th sequentially executed instruction before the current instruction.

4. The method of predicting branch instructions as described in claim 1, wherein the related information includes the direction of the branch instruction.

5. The method of predicting branch instructions as described in claim 1, wherein the related information includes the target address of the branch instruction.

6. The method of predicting branch instructions as described in claim 1, wherein the related information is stored in a branch prediction unit.

7. A method of skipping branch instructions, comprising the following steps: fetching and executing a current instruction; if the current instruction is an unconditional fixed-target branch instruction and the instruction preceding the current instruction is not a branch instruction, using the address of the n-th instruction executed before the current instruction as an index and recording related information required to predict the branch instruction, wherein n is the number of clock cycles required from receiving the index to completing the prediction; and using the address of the n-th instruction executed before the current instruction as the index to look up the related information and predict the branch direction of the branch instruction, wherein if the related information exists and the predicted direction is taken, the predicted target address of the branch instruction is used as the next instruction address, and otherwise the address of the next sequentially executed instruction is used.

8. The method of skipping branch instructions as described in claim 7, wherein [...] the executed instruction address [...].

9. The method of skipping branch instructions as described in claim 7, wherein the related information includes the direction of the branch instruction.

10. The method of skipping branch instructions as described in claim 7, wherein the related information includes the target address of the branch instruction.

11. The method of skipping branch instructions as described in claim 7, further comprising: providing a memory for storing instructions; fetching the current instruction from the memory; and if the content of the memory is changed and the branch instruction is removed from the memory, removing the related information corresponding to the unconditional fixed-target branch instruction.
