TWI274285B - Branch instruction prediction and skipping using addresses of precedent instructions - Google Patents

Info

Publication number
TWI274285B
Authority
TW
Taiwan
Prior art keywords
instruction
branch
address
branch instruction
prediction
Prior art date
Application number
TW94110688A
Other languages
Chinese (zh)
Other versions
TW200636584A (en
Inventor
Hong-Men Su
Original Assignee
Faraday Tech Corp
Priority date
Filing date
Publication date
Application filed by Faraday Tech Corp
Priority to TW94110688A
Publication of TW200636584A
Application granted
Publication of TWI274285B

Abstract

A method of predicting branch instructions and a method of skipping branch instructions for pipelines which need more than one cycle to predict branch direction and branch target addresses in microprocessors and digital signal processors are provided. The address of an instruction executed before the predicted branch is used as an index to enable early branch prediction so that the address of the instruction predicted to be executed immediately after the branch is available earlier, thereby reducing the number of idle or wasted clock cycles.

Description

1274285 13204twf.doc/m

IX. Description of the Invention:

[Technical Field]

The present invention relates to a method of predicting the target address of a branch instruction in microprocessors and digital signal processors, and to a method of skipping certain specific branch instructions so that they are not executed. Both methods are applicable to pipelines in which the prediction process takes more than one clock cycle.

[Prior Art]

Today's microprocessors and digital signal processors execute instructions in a multi-stage pipeline architecture.
The pipeline includes multiple stages such as fetch, decode, and execute. To improve execution efficiency, the stages of the pipeline operate simultaneously: while the third stage processes the first instruction, the second stage processes the second instruction and the first stage processes the third instruction. The pipeline does not wait for the first instruction to leave before it starts on the second, which would leave most stages idle and waste resources.

Such a pipeline runs smoothly when instructions execute sequentially, but branch instructions cause problems. Once a branch is taken and the program counter jumps elsewhere, the results in the first few stages of the pipeline must be flushed so that processing can start at the instruction located at the branch target address. In other words, the next several clock cycles are wasted.

To reduce this waste, techniques for predicting the target address of branch instructions emerged, known as branch prediction. The goal is to predict, at the earliest possible stage of the pipeline, the target address and the direction of a branch instruction, where the direction is either "taken" or "not taken". If the prediction is "taken", meaning execution is about to jump elsewhere, the pipeline can fetch the instruction at the branch target address as early as possible. Later, in the execute stage, if the predicted target address is confirmed to be correct, the results of the earlier stages can be kept, so no clock cycles are wasted and the efficiency of simultaneous multi-stage processing is preserved.

Figure 1 shows a typical design of a prior branch prediction technique, in which every instruction has a fixed length of 4 bytes.
In the fetch stage of the pipeline, the address held in program counter 101 is supplied simultaneously to instruction cache 102, to fetch the current instruction, and to prediction unit 104 as an index for looking up previously recorded information. If the information exists, the current instruction is a branch instruction, and prediction unit 104 predicts and supplies the direction 106 and target address 105 of the current branch instruction to multiplexer 107. Meanwhile, adder 103 computes the address of the next sequential instruction, that is, the address in program counter 101 plus the length of the current instruction, and also supplies it to multiplexer 107. If the direction 106 predicted by prediction unit 104 is "not taken", multiplexer 107 passes the next sequential address output by adder 103 into program counter 101 as the address of the next fetch, so execution simply continues. Otherwise, multiplexer 107 passes the target address 105 predicted by prediction unit 104 into program counter 101 as the address of the next fetch, so the instruction that follows the current branch is fetched in advance. Provided the prediction is accurate, the pipeline fetches from the correct address whether or not the branch is taken.

The benefit of branch prediction can be seen by comparing Figure 2 with Figure 3. Figure 2 shows how a pipeline without branch prediction executes instructions, assuming a five-stage pipeline from F1 to W5. In the 4th clock cycle, after the pipeline fetches branch instruction BC4, the front end of the pipeline has to stay empty because the address of the following instruction cannot be known in advance. Only in the 6th clock cycle, after BC4 has passed execute stage E3 and the next instruction is known to be T5, can fetching resume. As the figure shows, the pipeline idles for two clock cycles after fetching BC4.

Figure 3 shows how a pipeline with branch prediction executes instructions, again assuming five stages from F1 to W5. In the 4th clock cycle, by the time the pipeline has fetched BC4, the next instruction has already been predicted to be T5, so the pipeline fetches T5 in the 5th clock cycle. In the 6th clock cycle, after BC4 passes execute stage E3 and the target is confirmed to be T5, the results in the front end of the pipeline are kept and execution continues. No stage of the pipeline sits idle, and efficiency is maximized.

The examples in Figures 2 and 3 assume that the prediction process takes only one clock cycle. However, the number of clock cycles required by the prediction process keeps growing. Figure 4 illustrates this case, assuming a seven-stage pipeline from F1 to W7 and a prediction process that takes two clock cycles. In the 4th clock cycle, when the pipeline starts fetching BC4, the prediction unit simultaneously starts predicting the target address of BC4. Because the prediction takes two clock cycles, the pipeline still does not know the target address of BC4 when the 5th clock cycle begins. The target becomes known only when the 5th clock cycle ends, so the instruction at target address T9 can be fetched only at the start of the 6th clock cycle. The 5th clock cycle is simply wasted and does no productive work.

US Patent No. 6622240 proposes a solution to this problem. Its main idea is to copy the computation that yields the branch target and use the copy as a "pre-branch instruction", so that the instruction at the target address can be fetched at the start of the next clock cycle. With this approach there is no idle stage after the branch instruction is fetched, but the precondition is that at least one cycle is spent processing the derived pre-branch instruction, so waste remains.

As the discussion and examples above show, the prior art still leaves pipeline stages idle when the prediction process takes more than one clock cycle, and a way to further reduce this idling and waste is needed.
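The fetch-stage selection of Figure 1 and the stall of Figure 4 can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation: the dictionary `table`, the function names, and the addresses in the usage note are assumptions made for the example, and only the 4-byte instruction length and the cycle numbers come from the text above.

```python
INSTRUCTION_LENGTH = 4  # fixed instruction length in bytes, as in Figure 1


def next_fetch_address(pc, table):
    """Model multiplexer 107: choose the address of the next fetch.

    `table` is a hypothetical stand-in for prediction unit 104. It maps the
    address of a branch instruction to a (taken, target) pair.
    """
    sequential = pc + INSTRUCTION_LENGTH  # output of adder 103
    entry = table.get(pc)                 # lookup indexed by the current PC
    if entry is None:
        return sequential                 # no record: not a known branch
    taken, target = entry
    return target if taken else sequential


def target_fetch_cycle(branch_fetch_cycle, latency):
    """Cycle in which the target can be fetched when the lookup starts in the
    same cycle as the fetch of the branch and takes `latency` cycles."""
    return branch_fetch_cycle + latency


def wasted_cycles(latency):
    """Fetch cycles lost between the branch and its predicted target."""
    return latency - 1
```

With a one-cycle predictor, `target_fetch_cycle(4, 1)` gives cycle 5, as in Figure 3. With a two-cycle predictor, `target_fetch_cycle(4, 2)` gives cycle 6 and `wasted_cycles(2)` gives the single lost cycle of Figure 4.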
[Summary of the Invention]

An object of the present invention is to provide a method of predicting branch instructions and a method of skipping certain specific branch instructions that reduce idle and wasted pipeline stages to a minimum in pipelines where the prediction process takes more than one clock cycle.

To this end, the invention proposes a method of predicting branch instructions. First, a prediction unit is provided, and the current instruction is fetched. If the current instruction is a branch instruction, the information required for prediction is recorded in the prediction unit, indexed by the address of a preceding instruction that was executed earlier. Next, the address of the preceding instruction is used as an index with which the prediction unit looks up the corresponding information in order to predict the outcome of the branch instruction. Finally, if the corresponding information exists in the prediction unit and the predicted direction is "taken", the predicted branch target address is used as the address of the next fetch; otherwise, the address of the next sequential instruction is used as the address of the next fetch.

According to the preferred embodiment of the invention, this prediction method uses the address of a preceding instruction, executed before the branch instruction, as the prediction index, rather than the address of the branch instruction itself as in the prior art. The prediction process therefore starts early, before the branch instruction itself is fetched. With the time offset arranged precisely, the prediction of the branch target address completes at the same time as the pipeline finishes fetching the branch instruction, so in the next clock cycle the pipeline can directly fetch the instruction at the target address, or the next sequential instruction when the branch is not taken. Provided the prediction is accurate, this method leaves no pipeline stage idle or wasted.

If the true address of the instruction that follows the branch instruction differs from the previously predicted address, a misprediction has occurred, and all instructions in the pipeline that were fetched after the branch instruction must be discarded. The pipeline then resumes by fetching a new instruction from the true address of the instruction that executes after the branch.
This step is called branch prediction verification.

The invention also proposes a method of skipping branch instructions. First, a prediction unit is provided, and the current instruction is fetched. If the current instruction is an unconditional fixed-target branch instruction, and the instruction executed before the current instruction is not a branch instruction, the information required for prediction is recorded in the prediction unit, indexed by the address of the n-th instruction executed before the current instruction, where n is the number of clock cycles the prediction unit needs from receiving the index address to completing the prediction of the branch target address. Next, the address of the n-th instruction executed before the current instruction is used as an index with which the prediction unit looks up the corresponding information in order to predict the outcome of the branch instruction. Finally, if the corresponding information exists in the prediction unit and the predicted direction is "taken", the predicted branch target address is used as the address of the next fetch; otherwise, the address of the next sequential instruction is used as the address of the next fetch.

According to the preferred embodiment of the invention, this skipping method likewise uses the address of a preceding instruction, executed before the branch instruction, as the prediction index, rather than the address of the branch instruction itself as in the prior art. The prediction process therefore starts early, before the branch instruction itself is fetched. With the time offset arranged precisely, the prediction of the branch target address completes at the same time as the pipeline finishes fetching the instruction immediately before the branch, so in the next clock cycle the pipeline can directly fetch the instruction at the target address and skip the branch instruction itself, which is never executed. Because an unconditional fixed-target branch instruction produces the same result every time it executes, this skipping is not a prediction but a guaranteed way to save a clock cycle.

When self-modifying code is encountered and an unconditional fixed-target branch instruction is changed into a different instruction, the corresponding information in the branch prediction unit must also be invalidated to guarantee correct results.
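The timing argument behind the skipping method can be checked with a little arithmetic. The sketch below is an illustration only: the function names are invented for the example, and it assumes one instruction is fetched per clock cycle, that a lookup started in cycle c with latency n allows the target to be fetched in cycle c + n, and that the lead never exceeds the latency.

```python
def lookup_start_cycle(branch_fetch_cycle, lead):
    """Cycle in which the lookup starts when it is indexed by the instruction
    fetched `lead` cycles before the slot where the branch would be fetched."""
    return branch_fetch_cycle - lead


def earliest_target_fetch(branch_fetch_cycle, latency, lead):
    """Earliest cycle in which the predicted target can be fetched."""
    return lookup_start_cycle(branch_fetch_cycle, lead) + latency


def branch_is_skipped(latency, lead):
    """True when the target is ready in the very cycle that would otherwise
    fetch the branch, so the target takes the place of the branch."""
    return lead >= latency
```

Indexing by the n-th preceding instruction means `lead == latency`, so `earliest_target_fetch(4, 2, 2)` returns 4: the target is fetched in the slot the branch would have used. Indexing by the branch's own address (`lead == 0`) pushes the same fetch out to cycle 6.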

To make the above and other objects, features, and advantages of the present invention easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings.

[Embodiments]

Assume that the pipeline of this embodiment includes a prediction unit whose prediction process takes n clock cycles. Also assume that the pipeline stores the address of the current instruction and the addresses of the previous n executed instructions, that is, the current content of the program counter and its previous n contents. In this embodiment, PC[0] denotes the address of the current instruction, PC[-1] the address of the previous instruction, PC[-2] the address of the instruction before that, and so on up to PC[-n].

Figure 5 shows the prediction unit used in this embodiment, whose main body is a table. Figure 5 shows only four rows.
In practice the table can contain any number of rows. Each row holds the information related to one branch instruction and is also called a record. The table has four fields: the first field 501 holds a valid flag that indicates whether the content stored in the row is valid; the second field 502 holds the address of the n-th instruction executed before the corresponding unconditional fixed-target branch instruction, or of the (n-1)-th instruction executed before a branch instruction of any other kind; the third field 503 holds the predicted target address of the branch instruction; and the fourth field 504 holds prediction-related information. The prediction unit accepts index address 505, looks up the corresponding record in the table, and supplies predicted target address 506 and predicted branch direction 507. Predicted target address 506 is output as target address 105 of Figure 1, and predicted branch direction 507 is output as predicted direction 106 of Figure 1. The use of the table is described further below. Note that there are many ways to store prediction-related information and to use it for prediction, and any of them can be used to predict whether the branch direction is taken or not taken.

The prediction unit used in this embodiment is very simple, and the information it stores is minimal. A practical prediction unit may store more information and provide more prediction results. In general, the more complex the prediction algorithm, the more related information is stored.

The method of predicting branch instructions proposed by the invention is described below; its steps and flow are shown in Figure 6. In step 602, the instruction at address PC[x] is fetched.
Step 604 then determines whether PC[x] exists in the branch prediction unit of Figure 5. If it does, step 606 outputs the corresponding predicted branch target address 506 and predicted branch direction 507; otherwise step 608 sets predicted branch direction 507 to "not taken".

After step 606 or 608, in step 610, multiplexer 107 selects the predicted program counter address. Because of the latency of branch prediction, this address belongs to the n-th instruction executed after PC[x], that is, PC[x+n].

Next, in step 612, when the pipeline executes instruction PC[x+n-1], the true address of the next instruction becomes known. If PC[x+n-1] is a branch instruction, executing it determines whether the branch is taken and therefore the program counter address of the next instruction. If PC[x+n-1] is not a branch instruction, the program counter address of the next instruction is PC[x+n-1]+4.

Next, in step 614, if the true address of the instruction following PC[x+n-1] differs from the previously predicted PC[x+n], a misprediction has occurred, and all instructions in the pipeline that were fetched after PC[x+n-1] must be discarded. The pipeline then resumes by fetching a new instruction from the true address of the instruction that executes after PC[x+n-1]. This step is called branch prediction verification.

Finally, in step 616, independently of the steps above, when a branch instruction is executed, its target address PC[y] is written into the branch prediction unit, and other information such as the branch direction can be used to update prediction-related information 504 in the same record. The index of the record to be written in the branch prediction unit is determined by a function of the address of the (n-1)-th instruction before the current branch instruction, that is, PC[y-n+1].
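The table of Figure 5 and steps 604 to 616 can be modelled roughly as below. This is a sketch under stated assumptions, not the patented circuit: the class and function names are invented for the example, records are keyed by the full index address instead of a hashed index, field 504 is reduced to a single direction bit, and the caller is assumed to supply the list of executed addresses.

```python
from dataclasses import dataclass


@dataclass
class Record:
    valid: bool          # field 501
    index_address: int   # field 502
    target: int          # field 503
    taken: bool          # field 504, reduced here to one direction bit


class PredictionUnit:
    def __init__(self, latency):
        self.latency = latency   # n, the prediction latency in clock cycles
        self.records = {}        # index address -> Record (assumed layout)

    def lookup(self, index_address):
        """Steps 604 to 608: return (taken, target) for index address PC[x]."""
        record = self.records.get(index_address)
        if record is None or not record.valid:
            return False, None                 # step 608: predict not taken
        return record.taken, record.target     # step 606

    def train(self, history, target, taken):
        """Step 616. `history` lists executed instruction addresses, oldest
        first, and ends with the branch that has just executed."""
        lead = self.latency - 1                # (n-1)-th instruction before the branch
        if len(history) <= lead:
            return                             # not enough history yet
        index_address = history[-1 - lead]
        self.records[index_address] = Record(True, index_address, target, taken)


def verify(predicted_address, true_address):
    """Step 614: return the address to refetch from, or None if the
    prediction was correct and nothing needs to be discarded."""
    return None if predicted_address == true_address else true_address
```

For a two-cycle predictor and a branch at the end of the history `[0x100, 0x104, 0x108, 0x10C]`, `train` files the record under `0x108`, the instruction just before the branch, so a later lookup with that address returns the branch's target in time for the fetch that follows the branch.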
This embodiment uses the m lowest bits of the instruction address as the index value, where m is a preset constant.

The effect of this prediction method is illustrated in Figure 7. Assume that the prediction unit of this embodiment needs 2 clock cycles to complete a prediction and that the pipeline has seven stages, from F1 to W7. As the figure shows, the instruction preceding branch instruction BC4 is i3. In the 3rd clock cycle, when the pipeline starts fetching i3, it also starts predicting the target address of BC4. At the end of the 4th clock cycle, when the pipeline finishes fetching BC4, the target address T9 of BC4 has also been predicted, so T9 can be fetched in the 5th clock cycle. With this prediction method, therefore, the pipeline has no idle cycle after the branch.

A method of skipping branch instructions can be derived from the branch prediction method of the invention. It skips certain specific branch instructions so that they are not executed, which saves further pipeline clock cycles. The flow and steps of this skipping method are shown in Figure 8.

The skipping method is very similar to the prediction method described above: steps 802 to 816 are the same as steps 602 to 616 of Figure 6. In the final step 818, independently of the steps above, when the pipeline executes an unconditional fixed-target branch instruction and the previous instruction is not a branch instruction, the target address PC[y] of the unconditional branch instruction is written into the branch prediction unit, and the corresponding prediction-related information 504 is updated so that the predicted branch direction is always "taken". The index of the record to be written in the branch prediction unit is determined by a function of the address of the n-th instruction before the unconditional branch instruction, that is, PC[y-n]. The function used here is the same as the one used in step 816.
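The index function and the extra training step 818 might look like the following. The sketch is illustrative only: the value of `INDEX_BITS`, the dictionary layout of a record, and the function names are assumptions for the example, since the text says only that m is a preset constant.

```python
INDEX_BITS = 6  # m, an assumed value; the text only says m is a preset constant


def record_index(address, m=INDEX_BITS):
    """Use the m lowest bits of the instruction address as the table index."""
    return address & ((1 << m) - 1)


def train_skip(table, history, previous_is_branch, target, latency):
    """Step 818 for an unconditional fixed-target branch.

    `history` lists executed instruction addresses, oldest first, ending
    with the branch itself. Returns True if a record was written.
    """
    if previous_is_branch or len(history) <= latency:
        return False
    index_address = history[-1 - latency]  # n-th instruction before the branch
    table[record_index(index_address)] = {
        "valid": True,                     # field 501
        "index_address": index_address,    # field 502
        "target": target,                  # field 503
        "taken": True,                     # field 504: direction is always taken
    }
    return True
```

With 4-byte instructions the two lowest address bits are always zero, so a practical design might discard them before taking the m bits. That refinement is not stated in the text and is left out here.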
As a result, when the pipeline finishes fetching the instruction immediately before one of these branch instructions, the prediction of the branch's target address is also complete, and in the next clock cycle the pipeline directly fetches the instruction at the target address. Because these branch instructions are skipped and never executed, one more clock cycle is saved than for other branch instructions.

Figure 9 illustrates the effect of this skipping method. Assume that the prediction unit of this embodiment needs 2 clock cycles to complete a prediction and that the pipeline has seven stages, from F1 to W7. As the figure shows, the preceding instruction of branch instruction B4 is i2. In the 2nd clock cycle, when the pipeline starts fetching i2, it also starts predicting the target address of B4. When the 3rd clock cycle ends, the pipeline has finished fetching instruction i3 and has also predicted target address T9 of B4, so the pipeline can fetch T9 directly and skip B4 itself. The underlined row in Figure 9 never actually happens: B4 does not enter the pipeline or consume clock cycles, and the clock cycle saved can be used to execute other instructions.

To prevent self-modifying code from causing an unconditional fixed-target branch instruction to be skipped incorrectly, the branch instruction records in the branch prediction unit must stay consistent with the branch instructions actually stored in memory. If the content of memory changes so that one or more unconditional fixed-target branch instructions are removed, the records corresponding to those branch instructions are removed from the prediction unit. Records that are rebuilt afterwards are then guaranteed to match the latest branch instructions in memory.
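One way to keep the records consistent with memory is to invalidate on every write to instruction memory. The sketch below is hypothetical: the text does not say how the unit finds the affected records, and the `branch_address` key is an assumption added for the example, since the table of Figure 5 does not store the branch's own address.

```python
def invalidate_on_write(records, written_address):
    """Mark invalid every record whose skipped branch lives at the address
    that was just overwritten.

    `records` is a hypothetical list of dicts with the keys "valid",
    "branch_address" and "target". Returns the number of records invalidated.
    """
    count = 0
    for record in records:
        if record["valid"] and record["branch_address"] == written_address:
            record["valid"] = False   # clear the valid flag (field 501)
            count += 1
    return count
```

A record invalidated this way is simply rebuilt by step 818 the next time an unconditional fixed-target branch executes at that location, so it always reflects the instruction currently in memory.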
As described above, the main feature and advantage of the invention is that it uses the address of an earlier-executed preceding instruction as the prediction index, so that the prediction process starts early and the predicted branch target address is obtained early, which completely avoids idle and wasted pipeline stages.

Although the invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

[Brief Description of the Drawings]

Figure 1 is a typical architecture diagram of a branch prediction technique.
Figure 2 is an instruction execution example of a conventional pipeline without branch prediction capability.
[...] requires two clock cycles [...]
Figure 5 shows a branch instruction prediction unit according to an embodiment of the invention.
Figure 6 is a flowchart of the branch instruction prediction method according to an embodiment of the invention.
Figure 7 is an execution example of the branch instruction prediction method according to an embodiment of the invention.
Figure 8 is a flowchart of the branch instruction skipping method according to an embodiment of the invention.
Figure 9 is an execution example of the branch instruction skipping method according to an embodiment of the invention.

[Description of Main Component Symbols]
101: program counter
102: instruction cache
103: adder
104: prediction unit
105: target address
106: predicted direction
107: multiplexer
501: valid flag
502: index address
503: branch instruction target address
504: prediction-related information
505: index address
506: predicted target address
507: predicted branch direction
602: Fetch the current instruction at address PC[x].
604: Is PC[x] present in the branch prediction unit of FIG. 5?
606: Output the corresponding predicted branch target address and predicted branch direction.
608: Set the predicted branch direction to not taken.
610: The multiplexer 107 selects the predicted program counter value. Because of the branch prediction latency, this value is the address of the n-th instruction executed after PC[x], namely PC[x+n]. It is used for the instruction fetch that begins at step 602.
612: When the pipeline executes the instruction PC[x+n-1], the true address of the next instruction becomes known. If PC[x+n-1] is a branch instruction, execution determines whether the branch is taken or not taken, and therefore determines the program counter value of the next instruction. If PC[x+n-1] is not a branch instruction, the program counter value of the next instruction is PC[x+n-1]+4.
614: If the true address of the instruction following PC[x+n-1] is not equal to the previously predicted PC[x+n], a misprediction has occurred, and every instruction in the pipeline that was fetched after PC[x+n-1] must be discarded. The pipeline then fetches a new instruction at the true address of the instruction executed after PC[x+n-1] in order to resume operation. This step is called branch prediction confirmation.
616: Independently of the steps above, when a branch instruction is executed, its target address PC[y] is written into the branch prediction unit, and other information such as the branch direction may be used to update the prediction-related information in the same entry. The index of the entry to be written in the branch prediction unit is determined by a function of the address of the (n-1)-th instruction before the current branch instruction, namely PC[y-n+1]. This embodiment uses the m lowest bits of the instruction address as the index value, where m is a preset constant.
802: Fetch the current instruction at address PC[x].
804: Is PC[x] present in the branch prediction unit of FIG. 5?
806: Output the corresponding predicted branch target address and predicted branch direction.
808: Set the predicted branch direction to not taken.
810: The multiplexer 107 selects the predicted program counter value. Because of the branch prediction latency, this value is the address of the n-th instruction executed after PC[x], namely PC[x+n]. It is used for the instruction fetch that begins at step 802.
812: When the pipeline executes the instruction PC[x+n-1], the true address of the next instruction becomes known. If PC[x+n-1] is a branch instruction, execution determines whether the branch is taken or not taken, and therefore determines the program counter value of the next instruction. If PC[x+n-1] is not a branch instruction, the program counter value of the next instruction is PC[x+n-1]+4.
814: If the true address of the instruction following PC[x+n-1] is not equal to the previously predicted PC[x+n], a misprediction has occurred, and every instruction in the pipeline that was fetched after PC[x+n-1] must be discarded. The pipeline then fetches a new instruction at the true address of the instruction executed after PC[x+n-1] in order to resume operation. This step is called branch prediction confirmation.
816: Independently of the steps above, when a branch instruction is executed, its target address PC[y] is written into the branch prediction unit, and other information such as the branch direction may be used to update the prediction-related information in the same entry. The index of the entry to be written in the branch prediction unit is determined by a function of the address of the (n-1)-th instruction before the current branch instruction, namely PC[y-n+1]. This embodiment uses the m lowest bits of the instruction address as the index value, where m is a preset constant.
818: Independently of the steps above, when the pipeline executes an unconditional fixed-target branch instruction and the preceding instruction is not a branch instruction, the target address PC[y] of the unconditional branch instruction is written into the branch prediction unit, and the corresponding prediction-related information is updated so that the predicted branch direction is always taken. The index of the entry to be written in the branch prediction unit is determined by a function of the address of the n-th instruction before the unconditional branch instruction, namely PC[y-n]. The function used here in this embodiment is the same as the one used in step 816.
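The lookup and update steps described above (604 to 608 and 616) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented hardware. The names `Entry`, `BranchPredictionUnit` and `resolve_next`, the full-address tag comparison, the one-bit direction field and the defaults `m=4`, `n=2` are hypothetical. What is taken from the text is the entry layout (valid flag, index address, target address, prediction-related information), the use of the m lowest address bits as the index, the choice of the (n-1)-th instruction before the branch as the index address, and the fixed 4-byte instruction length.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    valid: bool = False      # 501: valid flag
    index_address: int = 0   # 502: address of the precedent instruction
    target: int = 0          # 503: branch instruction target address
    taken: bool = False      # 504: prediction-related information (one bit here, an assumption)


class BranchPredictionUnit:
    """Table indexed by the address of an instruction executed before the branch."""

    def __init__(self, m: int = 4, n: int = 2) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.m = m  # number of low address bits used as the table index
        self.n = n  # prediction latency in clock cycles
        self.table: List[Entry] = [Entry() for _ in range(1 << m)]

    def _index(self, address: int) -> int:
        # The embodiment uses the m lowest bits of the instruction address.
        return address & ((1 << self.m) - 1)

    def lookup(self, pc: int) -> Tuple[bool, Optional[int]]:
        """Steps 604 to 608: return (predicted direction, predicted target)."""
        entry = self.table[self._index(pc)]
        if entry.valid and entry.index_address == pc:
            return entry.taken, entry.target  # step 606
        return False, None                    # step 608: predict not taken

    def update(self, executed: List[int], target: int, taken: bool) -> None:
        """Step 616: record a branch that has just executed.

        `executed` holds executed instruction addresses in order and ends with
        the branch itself, so executed[-n] is the (n-1)-th instruction before it.
        """
        if len(executed) < self.n:
            return  # not enough history to form the index
        precedent = executed[-self.n]
        self.table[self._index(precedent)] = Entry(True, precedent, target, taken)


def resolve_next(pc: int, is_branch: bool, taken: bool, target: int) -> int:
    """Step 612: true next address, assuming fixed 4-byte instructions."""
    return target if (is_branch and taken) else pc + 4


bpu = BranchPredictionUnit(m=4, n=2)
# 0x104 is a taken branch to 0x200; with n=2 the entry is indexed by 0x100.
bpu.update([0x100, 0x104], target=0x200, taken=True)
print(bpu.lookup(0x100))  # (True, 512)
```

With `n=1` the index address is the branch's own address, which corresponds to the conventional arrangement of FIG. 1. Comparing a predicted address with the value from `resolve_next` corresponds to the branch prediction confirmation of step 614.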


Claims (1)

X. Claims:

1. A method of predicting a branch instruction, comprising the following steps:
fetching and executing a current instruction;
if the current instruction is a branch instruction, recording the related information needed for prediction, using as an index the address of a precedent instruction executed before the current instruction;
looking up the related information using the address of the precedent instruction as the index, in order to predict the target address of the branch instruction; and
if the related information exists and the predicted direction is taken, using the predicted target address of the branch instruction as the address of a subsequent instruction to be executed in sequence.

2. The method of predicting a branch instruction as described in claim 1, further comprising:
recording the addresses of the n instructions executed before the current instruction, and the address of the current instruction, wherein n is the number of clock cycles required from receiving the address of the precedent instruction as the index until the prediction of the target address and branch direction of the branch instruction is complete.

3. The method of predicting a branch instruction as described in claim 2, wherein the address of the precedent instruction is the address of the (n-1)-th instruction executed before the current instruction.

4. The method of predicting a branch instruction as described in claim 1, wherein the related information includes the direction of the branch instruction.

5. The method of predicting a branch instruction as described in claim 1, wherein the related information includes the target address of the branch instruction.

6. The method of predicting a branch instruction as described in claim 1, wherein the related information is stored in a table of a branch prediction unit.

7. A method of skipping a branch instruction, comprising the following steps:
fetching and executing a current instruction;
if the current instruction is an unconditional fixed-target branch instruction and the instruction preceding the current instruction is not a branch instruction, recording the related information needed for prediction, using as an index the address of the n-th instruction executed before the current instruction, wherein n is the number of clock cycles required from receiving the index until the prediction is complete;
looking up the related information using the address of the n-th instruction executed before the current instruction as the index, in order to predict the branch direction and target address of the branch instruction; and
if the related information exists and the predicted direction is taken, using the predicted target address of the branch instruction as the address of a subsequent instruction to be executed in sequence.

8. The method of skipping a branch instruction as described in claim 7, further comprising:
recording the addresses of the n instructions executed before the current instruction, and the address of the current instruction.

9. The method of skipping a branch instruction as described in claim 7, wherein the related information includes the direction of the branch instruction.

10. The method of skipping a branch instruction as described in claim 7, wherein the related information includes the target address of the branch instruction.

11. The method of skipping a branch instruction as described in claim 7, further comprising:
providing a memory that stores instructions;
fetching the current instruction from the memory; and
if the contents of the memory change such that the branch instructions are removed from the memory, removing the related information corresponding to the unconditional fixed-target branch instruction.
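The skipping method of claims 7 to 11 can be illustrated with a short Python sketch. It is hypothetical throughout: the names `SkipTable`, `record_unconditional`, `invalidate_all` and `fetch_stream`, the dictionary storage, the idealized fetch model without misprediction handling, and the choice to clear every entry when the memory changes are assumptions made for illustration only. What is taken from the text is the index (the address of the n-th instruction executed before the unconditional fixed-target branch), the condition that the preceding instruction is not a branch, the always-taken direction, and the removal of entries when the instruction memory changes.

```python
from typing import Dict, List, Optional


class SkipTable:
    """Records unconditional fixed-target branches so that fetch can skip them."""

    def __init__(self, n: int = 2) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n                        # prediction latency in clock cycles
        self.targets: Dict[int, int] = {}  # index address -> branch target

    def record_unconditional(self, executed: List[int],
                             prev_is_branch: bool, target: int) -> None:
        # `executed` ends with the unconditional branch, so executed[-1 - n]
        # is the n-th instruction executed before it.
        if prev_is_branch or len(executed) <= self.n:
            return
        self.targets[executed[-1 - self.n]] = target  # direction is always taken

    def lookup(self, pc: int) -> Optional[int]:
        return self.targets.get(pc)

    def invalidate_all(self) -> None:
        # Assumed policy for claim 11: drop all entries when the memory changes.
        self.targets.clear()


def fetch_stream(table: SkipTable, start: int, count: int) -> List[int]:
    """Idealized fetch: a hit on the address fetched n slots earlier supplies
    the next fetch address; otherwise fetch continues with the next 4-byte slot."""
    stream = [start]
    while len(stream) < count:
        i = len(stream)
        hit = table.lookup(stream[i - table.n]) if i >= table.n else None
        stream.append(hit if hit is not None else stream[-1] + 4)
    return stream


table = SkipTable(n=2)
# 0x108 holds an unconditional jump to 0x300; 0x104 is not a branch.
table.record_unconditional([0x100, 0x104, 0x108], prev_is_branch=False, target=0x300)
print([hex(a) for a in fetch_stream(table, 0x100, 4)])
```

In this model the jump at 0x108 is never fetched once its entry exists, because the entry indexed by 0x100 supplies 0x300 for the slot the jump would have occupied.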
TW94110688A 2005-04-04 2005-04-04 Branch instruction prediction and skipping using addresses of precedent instructions TWI274285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW94110688A TWI274285B (en) 2005-04-04 2005-04-04 Branch instruction prediction and skipping using addresses of precedent instructions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW94110688A TWI274285B (en) 2005-04-04 2005-04-04 Branch instruction prediction and skipping using addresses of precedent instructions

Publications (2)

Publication Number Publication Date
TW200636584A TW200636584A (en) 2006-10-16
TWI274285B true TWI274285B (en) 2007-02-21

Family

ID=38623119

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94110688A TWI274285B (en) 2005-04-04 2005-04-04 Branch instruction prediction and skipping using addresses of precedent instructions

Country Status (1)

Country Link
TW (1) TWI274285B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI768547B (en) * 2020-11-18 2022-06-21 瑞昱半導體股份有限公司 Pipeline computer system and instruction processing method

Also Published As

Publication number Publication date
TW200636584A (en) 2006-10-16

Similar Documents

Publication Publication Date Title
US10255074B2 (en) Selective flushing of instructions in an instruction pipeline in a processor back to an execution-resolved target address, in response to a precise interrupt
US7278012B2 (en) Method and apparatus for efficiently accessing first and second branch history tables to predict branch instructions
TW200820072A (en) Methods an apparatus for emulating the branch prediction behavior of an explicit subroutine call
CN104471529B (en) To the method and apparatus of extended software branch target prompting
TWI416408B (en) A microprocessor and information storage method thereof
TWI253588B (en) Pipelined architecture with separate pre-fetch and instruction fetch stages
TWI238966B (en) Apparatus and method for invalidation of redundant branch target address cache entries
TW200951811A (en) System and method of selectively committing a result of an executed instruction
EP2084602B1 (en) A system and method for using a working global history register
TW201224919A (en) Execute at commit state update instructions, apparatus, methods, and systems
TW201106261A (en) Microprocessors and storing methods using the same
US6760835B1 (en) Instruction branch mispredict streaming
TW202111524A (en) Apparatus and system for improvingbranch prediction throughput by ski pping over cachelines without branches
US9317293B2 (en) Establishing a branch target instruction cache (BTIC) entry for subroutine returns to reduce execution pipeline bubbles, and related systems, methods, and computer-readable media
TW200841236A (en) Distributed dispatch with concurrent, out-of-order dispatch
TWI274285B (en) Branch instruction prediction and skipping using addresses of precedent instructions
US20060149947A1 (en) Branch instruction prediction and skipping method using addresses of precedent instructions
TWI284282B (en) Processor including branch prediction mechanism for far jump and far call instructions
TW200416603A (en) An apparatus for avoiding a deadlock condition in a microprocessor with a speculative branch target address cache
US20160335089A1 (en) Eliminating redundancy in a branch target instruction cache by establishing entries using the target address of a subroutine
TW200915180A (en) Single hot forward interconnect scheme for delayed execution pipelines
TWI232403B (en) Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts
TW200931443A (en) Apparatus for predicting memory access and method thereof
TWI254247B (en) Usurpation of a waited pipeline bus request
EP2693333A1 (en) Processor and instruction processing method thereof

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees