TW559733B - Method and apparatus for reducing branch latency - Google Patents

Method and apparatus for reducing branch latency

Info

Publication number
TW559733B
Authority
TW
Taiwan
Prior art keywords
branch
instruction
mentioned
address
block
Application number
TW090128029A
Other languages
Chinese (zh)
Inventor
John L Redford
Original Assignee
Chipwrights Design Inc
Application filed by Chipwrights Design Inc
Application granted
Publication of TW559733B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32 Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/322 Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3846 Speculative instruction execution using static prediction, e.g. branch taken strategy
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4482 Procedural
    • G06F9/4484 Executing subprograms
    • G06F9/4486 Formation of subprogram jump address

Abstract

A method and apparatus for reducing latency in the execution of branch instructions are provided. A branch instruction includes an opcode portion and an address portion. The address portion includes a displacement and a code. The code identifies the block of instruction memory that contains the branch target instruction, and therefore the block to which execution will branch in response to the branch instruction. During the fetch cycle in which the branch instruction is fetched, the displacement portion of the branch instruction is inserted into the address register as the address of the next instruction to be fetched. The block code is used to ensure that the address register points to the correct block. As a result, the branch target instruction is fetched for execution in the next instruction fetch cycle. This eliminates the branch-processing latency of prior systems, in which the next fetch cycle is skipped while the branch target address is computed (for example, by adding an offset to the program counter value).

Description

Conventionally, a processing system executes program instructions in multiple stages, or cycles. A program counter holds the memory address of the instruction to be executed. The program counter value is read and written to an address register, which is used to access the instruction's location in memory. The instruction is then fetched from memory and loaded into an instruction register.
An instruction is a word of a fixed number of bits. Conventionally, an instruction includes a multi-bit opcode, which identifies the instruction, and memory address information. For example, the memory address information may identify the memory location of an operand, or the location in which the result of the instruction is to be stored.

After an instruction is fetched, it is loaded into the instruction register and a decode cycle is performed. During the decode cycle, the instruction is identified from its opcode and the required determinations are made. Once decoding is complete, the address information can be used to retrieve operands.

In normal sequential execution, the program counter is incremented after each instruction so that it points to the next instruction in the instruction memory. A branch instruction is a non-sequential instruction: it changes the normal sequential flow of instruction execution. For example, suppose the last instruction of a loop is reached and the loop's exit condition has not been met. Program flow must then return to the top of the loop. In this case a branch instruction loads the program counter with the address of the top of the loop instead of the incremented value. Branch instructions are also used, together with address data, to transfer program flow to other program modules.

Like most conventional instructions, a branch instruction includes an opcode and address information. The address information conventionally takes the form of an offset, which defines the number of addresses the program skips when the branch is taken. The offset is conventionally a signed number and is added to the current instruction address during the decode cycle. After the decode cycle, the result is loaded into the address register. It can then be used to fetch the first instruction of the branch in the next fetch cycle.
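As a rough illustration of the conventional scheme just described, the sketch below sign-extends a 16-bit offset and adds it to the branch's address. It makes three assumptions:

- addresses are 32 bits wide;
- the offset is measured from the branch instruction's own address (processors differ on this);
- the function names are hypothetical.

The point is the addition itself. In the conventional pipeline, this is the work that occupies the decode cycle.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit address space


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def conventional_branch_target(pc: int, offset16: int) -> int:
    """PC-relative target: PC plus the signed 16-bit offset from the instruction."""
    return (pc + sign_extend(offset16, 16)) & MASK32


# A loop-closing branch at 0x1000 that jumps back 0x20 addresses to the loop top.
print(hex(conventional_branch_target(0x1000, 0xFFE0)))  # 0xfe0
```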
Program execution efficiency can be improved by pipelining. In a pipelined system, the decode cycle of a first instruction is performed at the same time as the fetch cycle of the next instruction. In other words, the next instruction is fetched while the first instruction is being decoded. This approach significantly increases the speed and efficiency of program execution.

For branch instructions, however, the efficiency gained by pipelining is offset by the latency of computing the branch target address. During the decode cycle of the branch instruction, the displacement must be added to the current program counter value. This addition occupies a significant part of the decode cycle, so the address of the next instruction has not yet been determined and loaded into the address register. The fetch of the next instruction, namely the branch target instruction, therefore cannot be performed concurrently. It cannot begin until the decode cycle is complete.

This situation is referred to as branch latency: a delay before the first instruction after a branch can be fetched, which it is desirable to reduce. One way to reduce it is to load the address register immediately after the branch instruction is fetched. The branch target instruction can then be fetched in the next fetch cycle.

The present invention provides a method and apparatus for reducing branch latency when processing branch instructions. A branch instruction includes two portions:

- an opcode portion containing an opcode;
- an address portion containing address information related to the address of the branch target instruction.

Program execution branches in response to the branch instruction. The address portion includes a memory block identification portion and a displacement portion. The block identification portion identifies the block of memory in which the branch corresponding to the branch instruction is executed. The displacement portion is used to branch to the branch target instruction at an address within the block so identified.

When the branch is taken, the block identification portion identifies the block of memory in which the branch executes. In an embodiment, this may be the block that contains the branch instruction, in which case the branch target instruction is in the same block as the branch instruction. The other possible blocks are the current preceding block and the current following block.
In an embodiment, the block identification portion has a capacity of at least two bits, which defines at least four codes identifying the block:

- The first code identifies a branch within the same block as the branch instruction.
- The second code identifies the current following block.
- The third code identifies the current preceding block.
- The fourth code identifies a branch within the same block but in the opposite direction to the first code.

The first and fourth codes together therefore distinguish forward and backward branches within the same block. For example:

- 00 and 01 can identify forward branches: 00 within the current block, 01 into the next block.
- 10 and 11 can identify backward branches: 10 into the preceding block, 11 within the current block.

In an embodiment, the block identification portion can also be used for branch prediction, that is, for predicting whether the branch will be taken.

In one such embodiment, a backward branch is predicted to be taken and a forward branch is predicted not to be taken. In the four-code scheme above, therefore:

- if the first bit of the code is 1, the branch is a backward branch and is predicted taken;
- if the first bit of the code is 0, the branch is a forward branch and is predicted not taken.
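The static rule above (backward taken, forward not taken) needs only one bit of the instruction. The minimal Python sketch below assumes the bit layout this document gives for Fig. 7: the block code sits in bits 15:14 of a 32-bit word, so the code's first bit is instruction bit 15. The helper names are illustrative only.

```python
def block_code(word: int) -> int:
    """Return the 2-bit block code held in bits 15:14 of a branch word."""
    return (word >> 14) & 0b11


def predict_taken(word: int) -> bool:
    """Static prediction: first code bit (bit 15) set means backward, so taken."""
    return bool((word >> 15) & 1)


for code in (0b00, 0b01, 0b10, 0b11):
    word = code << 14  # opcode and address fields left at zero for the demo
    print(f"{block_code(word):02b}", "taken" if predict_taken(word) else "not taken")
# 00 not taken / 01 not taken / 10 taken / 11 taken
```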
測模組18並下載至位址暫:器20支2'“立址至分支預 並下載至位址暫存器20。 ° 則’選擇遞增的位址 個執:址;;=址提供指令記憶體來存取下 指令在解碼或其t:::;r2;r指令暫存器“。 任何值的話,將可被令位移值部份如果* 跳到的位址,例如,當提:::二6:= 士流程將要 7馬刀支指令時。如上所 1057-4473-PF.ptd 第12頁 559733 五、發明說明(8) 述,此方法由於包括執行多工器26的時間而導致分支等 待。 第5圖圖形圖解依據本發明而解決第4圖中分支等待問 題之結構下指令的執行之功能概要方塊圖。在第5圖中, 代替程式計數器參數值的分支指令之位移值部份,位移值 ^接由J令記憶體22的指令中提取並直接輸入至多工器16 ===,人端來當作位移值的最低有效位元(lsBs),在此 J別=中,標示為15:0的多工器16最低有效位元。從指 的記憶體配置中提取分支指令在分支指令的提 此且在下個分提取週期開始之前迅速完成。因 :令在:的週期’可以提取為分支目標指令的下個 在位址暫存器2〇固5功提取週期開始之前它的仉址已出現 分支目標指;可被^成功的提取週期中分支指令和 述的分支等待。不錯失週期。可消除上面所描 令記ί ί ί 2 ::缺點由第6圖來圖解’第6圖為傳統指 22定義為用以建",置部份之概要方塊圖。指令記憶體 解例子中所亍數區塊102、104和丨〇6。如特別的圖 ffff16。因此,;—區塊具一組位址配置從oooou至 元與包括2“配署母二記憶體位址定義的配置16最低有效位 在程式置之特別的區塊。 存取間’在任何給予的㈣’程式計數器 為例。當分支指人;J的ΐ中一配置的指令。以區塊104 7開始什算時,依據上面所描述的方法, 1057-4473-PF.ptd 第13頁 559733 五、發明說明(9) $ : : 1 6位元位移值的部 來代替下個位址。執杆* ^ Γ他町1 0取低有效位兀位置 情況的缺點來自於實d、V:區塊104其中-個配置。此 中。因為這樣,所以;;立的配;必須在同-區塊 程式計數器的目前來數值g t小可一些限制。依據 開始計算分支指令時如取f配置。例如’當 則前進分支只能作小距離近區塊的末端, 程式目前執行靠近區塊的起@ ^八禮同樣的,如果 距離…清況限制了程=統支僅限制在可能的 為了解決此問題’本發明的實施例 位移值部份之一部份爽宗荔八士 人成也 刀又?日7旧 丨切來疋義刀支扣令應建立的指令記憶體 中之£塊。第7圖圖解依本發明的分支指令的袼式之概要 ^以第7圖為例’使用具有16位元位移值搁位的32位元 才曰令。可理解的本發明可應用在其他的大小。 、》對應到第7圖,指令120的格式包括具有位元16_31的 運算碼攔位122和具有位元〇 —15的位移值攔位128。128更 區分成具有位元0-13的位址參數值攔位和具有位元丨5 的區塊攔位126。2位元區塊欄位定義是否分支應以目前的 區塊(對應到PC)、目前領先(對應到Pc—〇或目前連結的區 塊(對應到pc+i)來取代相同的區塊。位址參數值攔位124 以分支目標指令應提取的定義區塊來定義位址。 因此,在本實施例的圖解中,區塊攔位丨26包括至少2 位元容量來定義用以確認區塊的至少4個碼。其中一個碼 疋義分支指令在相同的區塊。第二碼定確認目前落後區1057-4473-PF.ptd $ 8 pages 559733 V. Description of the invention (4) ___ = No branch will be selected. In the embodiment, if it is a backward branch, select ::: select branch. If the branch is forward, then predictable will; will be two iii. Therefore, in the above fourth set of illustrations, if the first, ... of the code is defined, it is called a backward branch, and the branch is predicted to be selected. In other words, if the first bit of the definition code is 0, it is called the forward branch 'and it is predicted that the branch will not be selected. 
The invention substantially reduces or eliminates the branch latency of the prior art. The displacement information in the branch instruction is used to produce the branch target address directly. No time-consuming operation, such as adding an address offset to the program counter value, is needed. The branch target address is supplied directly to the address register, as an immediate hardware function within the fetch cycle of the branch instruction. The branch target instruction can therefore be fetched in the next cycle without a missed cycle.

By contrast, in the prior art a cycle is skipped because of the latency of computing the branch target instruction address. The invention therefore processes branch instructions markedly more efficiently than the prior art.

Brief description of the drawings:

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings. Like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis is instead placed on illustrating the principles of the invention.

Fig. 1 is a timing diagram of conventional program instruction execution in a non-pipelined architecture, including fetch and decode cycles;

Fig. 2 is a timing diagram of branch instruction processing with conventional program instruction execution in a pipelined architecture, including fetch and decode cycles;
Fig. 3 is a diagram of a conventional branch instruction;

Fig. 4 is a schematic functional block diagram of instruction execution in a conventional architecture;

Fig. 5 is a schematic functional block diagram of instruction execution in an architecture according to the invention that solves the branch latency problem of Fig. 4;

Fig. 6 is a schematic block diagram of the addresses and arrangement of the instruction memory;

Fig. 7 is a schematic diagram of the format of a branch instruction according to the invention;

Fig. 8 is a schematic functional block diagram of instruction execution in an architecture according to the invention in which the branch instruction identifies the memory block in which the branch executes; and

Fig. 9 is a timing diagram of the fetch and decode cycles of program instruction execution with the apparatus and method for reducing branch latency according to the invention.

Description of reference numerals:
10 processing system; 12 program counter; 14 increment module; 16 multiplexer; 18 branch prediction; 20 address register; 22 instruction memory; 24 instruction register; 26 summing module; 102, 104, 106 blocks; 122 opcode field; 124 address field; 126 block field; 128 displacement field; 216 multiplexer; 220 increment module; 222 decrement module; 223 direct path.

Description of the embodiments:

Fig. 1 is a timing diagram of conventional program instruction execution in a non-pipelined architecture, including fetch and decode cycles.

- During the first cycle, the program counter value PC is used as the instruction memory location from which the first instruction is fetched.
- During the second cycle, instruction PC is decoded and the program counter is incremented.
- The incremented program counter values are then used in the same way to fetch the following instructions.

Fetch and decode are thus controlled sequentially: decoding of one instruction and fetching of the next are performed one after the other.
Fig. 2 is a timing diagram of branch instruction processing with conventional program instruction execution in a pipelined architecture, including fetch and decode cycles. As the timing diagram shows, the first instruction, identified by program counter value PC, is fetched during the first cycle. In this example, that instruction is a branch instruction, whose format is shown in Fig. 3.

As shown in Fig. 3, the branch instruction includes an opcode portion and a displacement portion. The opcode defines the type of instruction, in this case a branch. The displacement portion defines the address at which program flow will continue. Conventionally, the displacement is defined relative to the program counter value PC.

During the next (second) cycle, the instruction is decoded and the address to be loaded into the program counter is computed by adding the current address to the displacement, i.e. PC+Disp. This addition takes a substantial amount of time to complete. As a result, the branch target instruction address is not loaded into the program counter until the end of the second cycle. The branch target instruction at PC+Disp is therefore not fetched until the third cycle, and a cycle is lost while its address is computed. This situation is commonly referred to as branch latency.
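The lost cycle can be seen with a toy two-stage (fetch, decode) timing model. This is a simplification for illustration, not a model taken from the patent. It assumes:

- the branch is taken;
- one instruction is fetched per cycle;
- in the conventional datapath, the target address only becomes available after the branch's decode cycle, so the next useful fetch slips by one cycle.

```python
def fetch_cycles(trace: list[str], target_after_decode: bool) -> list[int]:
    """Cycle in which each instruction of a taken-path trace is fetched.

    target_after_decode=True models the conventional pipeline of Fig. 2,
    where the target is known only once PC + Disp has been computed.
    False models fetching the target in the very next cycle.
    """
    cycles, cycle = [], 1
    for kind in trace:
        cycles.append(cycle)
        # A branch whose target is computed in decode costs one extra cycle.
        cycle += 2 if (kind == "branch" and target_after_decode) else 1
    return cycles


trace = ["branch", "target", "next"]
print(fetch_cycles(trace, target_after_decode=True))   # [1, 3, 4]
print(fetch_cycles(trace, target_after_decode=False))  # [1, 2, 3]
```

In the first result, the gap between cycle 1 and cycle 3 is the branch latency described above.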
Fig. 4 is a schematic functional block diagram of instruction execution in a conventional architecture. As shown in Fig. 4, processing system 10 includes a program counter 12 that holds the address of the instruction being executed in sequence. A 32-bit program counter is assumed in this description; the invention is applicable to other address sizes as well.

The address from program counter 12 is supplied to two modules:

- Summing module 26 adds the program counter value to the displacement from instruction register 24 and sends the resulting address to one input of multiplexer 16.
- Increment module 14 increments the program counter value and supplies the result to another input of multiplexer 16.

Branch prediction module 18 drives the select input of multiplexer 16. If the branch is predicted taken, multiplexer 16 selects the branch address produced by summing module 26, and that address is loaded into address register 20. Otherwise, the incremented address is selected and loaded into address register 20.

The address in address register 20 is supplied to instruction memory 22 to access the next instruction to be executed. The fetched instruction is loaded into instruction register 24, where it is decoded or otherwise processed. The displacement portion of the instruction, if any, defines the address to which program flow jumps, for example when the fetched instruction is a branch instruction. As noted above, this approach causes branch latency because of the time needed to perform the addition in summing module 26.
Fig. 5 is a schematic functional block diagram of instruction execution in an architecture according to the invention that solves the branch latency problem of Fig. 4. In Fig. 5, the displacement portion of the branch instruction is not added to the program counter value. Instead, it is taken directly from the instruction read out of instruction memory 22 and fed straight to an input of multiplexer 16 as the least significant bits (LSBs) of the address. In this particular example, these are the 16 LSBs, labeled 15:0.

This happens as soon as the branch instruction is fetched from its memory location: within the fetch cycle of the branch instruction, before the next fetch cycle begins. The target's address is therefore already in address register 20 when the next fetch cycle begins, so the branch target instruction can be fetched in that cycle. The branch instruction and the branch target instruction are fetched in successive fetch cycles without a missed cycle, which eliminates the branch latency described above.

A drawback of this approach is illustrated by Fig. 6, a schematic block diagram of the address arrangement of a conventional instruction memory. Instruction memory 22 is divided into a number of blocks 102, 104 and 106. In the illustrated example, each block has address locations from 0000 to FFFF (hexadecimal). The 16 LSBs of each memory address therefore define a location within a particular block of 2^16 locations.

During program execution, the program counter is at any given time accessing an instruction at one location of one block, for example block 104. When a branch instruction is processed by the method described above, the 16-bit displacement replaces the 16 least significant bit positions of the next address. Execution therefore branches to one of the locations of block 104.

The drawback is that the branch target location must be in the same block as the branch instruction. Branch distance is therefore limited in ways that depend on the current program counter value:

- If the program is executing near the end of a block, a forward branch can cover only a small distance.
- If the program is executing near the beginning of a block, a backward branch is likewise limited to a small distance.

This limits program flexibility.
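The sketch below shows the Fig. 5 address formation and the same-block limitation just described. It assumes 32-bit addresses and the 2^16-location blocks of the Fig. 6 example. The function names are hypothetical.

```python
BLOCK_BITS = 16  # Fig. 6 example: 2**16 locations per block


def fig5_target(pc: int, field16: int) -> int:
    """Fig. 5 scheme: the 16-bit field replaces the low 16 bits of the address."""
    return ((pc >> BLOCK_BITS) << BLOCK_BITS) | (field16 & 0xFFFF)


def fig5_can_reach(pc: int, target: int) -> bool:
    """Only targets in the branch's own block can be expressed."""
    return (pc >> BLOCK_BITS) == (target >> BLOCK_BITS)


pc = 0x0001FFF0  # a branch near the end of its block
print(hex(fig5_target(pc, 0x0004)))   # 0x10004, near the start of the same block
print(fig5_can_reach(pc, pc + 0x20))  # False: only 15 locations lie ahead
```

No addition is involved, which is why the address can be ready within the branch's fetch cycle. The second call shows the cost: a short forward branch that crosses the block boundary cannot be encoded.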
To solve this problem, embodiments of the invention use part of the displacement portion of the branch instruction to define the block of instruction memory in which the branch is to be taken. Fig. 7 is a schematic diagram of the format of a branch instruction according to the invention. The example of Fig. 7 uses a 32-bit instruction with a 16-bit displacement field; it will be understood that the invention applies to other sizes.

Referring to Fig. 7, instruction 120 is divided into the following fields:

- opcode field 122: bits 16-31;
- displacement field 128: bits 0-15, further divided into
  - address field 124: bits 0-13;
  - block field 126: bits 14-15.

The 2-bit block field defines whether the branch is taken within the current block (corresponding to PC), or instead in the current preceding block (PC-1) or the current following block (PC+1). Address field 124 defines the address, within the block so defined, from which the branch target instruction is fetched.
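The Fig. 7 layout can be expressed as bit-field extraction. This minimal sketch assumes exactly the 32-bit layout stated above. The opcode value in the example is arbitrary, not an encoding from the patent.

```python
from typing import NamedTuple


class BranchFields(NamedTuple):
    opcode: int   # opcode field 122, bits 31:16
    block: int    # block field 126, bits 15:14
    address: int  # address field 124, bits 13:0


def decode_branch(word: int) -> BranchFields:
    return BranchFields(
        opcode=(word >> 16) & 0xFFFF,
        block=(word >> 14) & 0b11,
        address=word & 0x3FFF,
    )


def encode_branch(opcode: int, block: int, address: int) -> int:
    if not (0 <= block < 4 and 0 <= address < 1 << 14):
        raise ValueError("block must fit in 2 bits and address in 14 bits")
    return ((opcode & 0xFFFF) << 16) | (block << 14) | address


word = encode_branch(0x1234, 0b01, 0x0ABC)  # 0x1234 is an arbitrary opcode
print(hex(word), decode_branch(word))
```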

Thus, in the illustrated embodiment, block field 126 has a capacity of at least 2 bits, which defines at least 4 codes identifying the block:

- one code identifies a branch within the same block as the branch instruction;
- the second code identifies the current following block;
- the third code identifies the current preceding block;
- the fourth code identifies a branch within the same block but in the opposite direction to the first code.

As an example of this arrangement, the first and second codes (for example 00 and 01) can identify forward branches: the former within the current block, the latter into the next block. The third and fourth codes (for example 10 and 11) can identify backward branches: the former into the preceding block, the latter within the current block. A 0 in bit 15 of the instruction can therefore indicate a forward branch, and a 1 in bit 15 a backward branch.
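To see how the four example codes partition the possible targets, the sketch below chooses a code from a branch address and a target address. This is a hypothetical helper: it is the kind of computation a compiler, linker or cache-fill logic might perform, not a procedure given in the patent. It assumes 2^14-location blocks (from Fig. 7's 14-bit address field) and the example assignment of 00/01 to forward branches and 10/11 to backward branches.

```python
BLOCK_BITS = 14

# code -> (block offset from the branch's block, branch direction)
BLOCK_CODES = {
    0b00: (0, "forward"),    # forward, within the current block
    0b01: (1, "forward"),    # forward, into the following block
    0b10: (-1, "backward"),  # backward, into the preceding block
    0b11: (0, "backward"),   # backward, within the current block
}


def choose_code(branch_addr: int, target_addr: int) -> int:
    """Return the 2-bit block code that reaches target_addr, if any."""
    delta = (target_addr >> BLOCK_BITS) - (branch_addr >> BLOCK_BITS)
    direction = "forward" if target_addr > branch_addr else "backward"
    for code, (offset, code_direction) in BLOCK_CODES.items():
        if offset == delta and code_direction == direction:
            return code
    raise ValueError("target is outside the preceding/current/following blocks")


branch = 0x8010  # in block 2 (0x8000-0xBFFF)
for target in (0x8020, 0xC005, 0x7FF0, 0x8000):
    print(hex(target), f"{choose_code(branch, target):02b}")
# 0x8020 00, 0xc005 01, 0x7ff0 10, 0x8000 11
```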
Fig. 8 is a schematic functional block diagram of instruction execution in an architecture according to the invention in which the branch instruction identifies the memory block in which the branch executes.

In this architecture, the 14 least significant bits (bits 13-0) from instruction memory 22 are connected to three of the four inputs of multiplexer 216. The remaining 18 bits (bits 31-14) are taken from program counter 12 and routed through three paths:

- increment module 220, which produces the address when the branch is to the next block of memory;
- decrement module 222, which produces the address when the branch is to the current preceding block;
- direct path 223, which produces the address when the branch is within the current block.

Each result is combined with the 14 LSBs from instruction memory 22 at an input of multiplexer 216. The fourth input of multiplexer 216 receives bits 31-0 derived from the program counter and is used for normal sequential program execution.
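The Python sketch below models the Fig. 8 next-address selection at the behavioural level. It rests on assumptions not stated in the patent:

- addresses are 32 bits wide;
- the upper 18 bits come from the address of the branch instruction itself;
- sequential execution advances by one address;
- block arithmetic wraps modulo 2^18;
- `taken` stands in for the select signal from branch prediction module 18.

```python
MASK32 = 0xFFFFFFFF
BLOCK_BITS = 14
BLOCK_INDEX_MASK = (1 << (32 - BLOCK_BITS)) - 1  # 18-bit block number, bits 31:14


def next_fetch_address(pc: int, word: int, taken: bool) -> int:
    """Model of multiplexer 216 choosing the address for address register 20."""
    if not taken:
        return (pc + 1) & MASK32          # sequential input
    code = (word >> 14) & 0b11            # block field 126
    low14 = word & 0x3FFF                 # address field 124, used as-is
    block = pc >> BLOCK_BITS              # program counter bits 31:14
    if code == 0b01:
        block = (block + 1) & BLOCK_INDEX_MASK  # increment module 220
    elif code == 0b10:
        block = (block - 1) & BLOCK_INDEX_MASK  # decrement module 222
    # codes 00 and 11 use direct path 223: block number unchanged
    return (block << BLOCK_BITS) | low14


print(hex(next_fetch_address(0x8010, 0x4005, taken=True)))   # 0xc005
print(hex(next_fetch_address(0x8010, 0xBFF0, taken=True)))   # 0x7ff0
print(hex(next_fetch_address(0x8010, 0x4005, taken=False)))  # 0x8011
```

In this model, the three candidate block numbers depend only on the program counter, so they can be prepared before the branch word arrives. The fetched word then only supplies the low 14 bits and the block code that picks among them.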

Branch prediction module 18 selects the address to be loaded into address register 20:

- if the branch is not taken, the bits 31-0 address from increment module 14;
- if the branch is taken to the next block, the address whose bits 31-14 come from increment module 220;
- if the branch is taken to the preceding block, the address whose bits 31-14 come from decrement module 222;
- if the branch is taken within the current block, the address whose bits 31-14 come from direct path 223.

A branch instruction with a block identification code according to the invention can also assist branch prediction. As noted above, a 0 in bit 15 can indicate a forward branch and a 1 in bit 15 a backward branch. A common branch prediction method predicts backward branches as taken and forward branches as not taken. According to the invention, therefore, a branch is predicted not taken if bit 15 is 0 and predicted taken if bit 15 is 1.

Thus, the embodiment of Fig. 8 allows branches over a wider range than the embodiment of Fig. 5. With the Fig. 8 approach, the address portion of the displacement specifies only 2^14 possible addresses, compared with 2^16 in the Fig. 5 approach. However, the Fig. 8 approach provides 2^14 possible addresses in each of three particular memory blocks. This greatly increases the possible branch distance and program flexibility.
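To make the range comparison concrete, the sketch below computes how far a branch can reach backward and forward under each scheme. It assumes one address per location, the single 2^16-location block of Figs. 5 and 6, and the three 2^14-location blocks of Fig. 8. The function names are illustrative.

```python
def reach_fig5(pc: int) -> tuple[int, int]:
    """(max backward, max forward) distance when the target must share pc's 2**16 block."""
    offset = pc & 0xFFFF
    return offset, 0xFFFF - offset


def reach_fig8(pc: int) -> tuple[int, int]:
    """Same, when the preceding, current and following 2**14 blocks are reachable."""
    offset = pc & 0x3FFF
    return offset + (1 << 14), (0x3FFF - offset) + (1 << 14)


pc = 0x0001FFF0  # near the end of a block in both schemes
print(reach_fig5(pc))  # (65520, 15)
print(reach_fig8(pc))  # (32752, 16399)

# Worst case over every position in a block:
print(min(min(reach_fig5(p)) for p in range(1 << 16)))  # 0
print(min(min(reach_fig8(p)) for p in range(1 << 14)))  # 16384
```

In this model, the Fig. 8 format always reaches at least 2^14 locations in each direction. The Fig. 5 format, at a block edge, cannot reach any location in one direction. This is the sense in which the possible branch distance grows even though the address field shrinks from 16 to 14 bits.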
9 is a timing diagram illustrating the timing of program instruction execution and the decoding cycle of the apparatus and method for reducing branch waiting according to the present invention. As shown in Fig. 9, according to the present invention, a branch instruction branch target instruction can be fetched in a successful cycle. Eliminates branch waits found in traditional methods. It is noted here that the present invention is implemented in different ways as described above. For example, the instruction cache is loaded immediately before the execution of the instruction to be executed. It is not necessary to change the instruction individually by program compilation or linking. In the middle of the table, 2 people are loaded. At 1 o'clock on the day, they are added with $ 丨 |

559733 五、發明說明(12) 置中。 在此實施例中,舉例所述的3 2位元指令,分支指令的 1 6位元位址參數值和2位元方塊確認參數值。可理解的這 些位元的數量在不超過本發明的範圍之内可以改變。 本發明的較佳實施例描述如上,然而,在不脫離本發 明之精神和範圍内,當可做些許的更動與潤飾。此外,其 餘實施例皆在下列申請專利範圍内。 參559733 V. Description of invention (12) Centered. In this embodiment, the 32-bit instruction, the 16-bit address parameter value of the branch instruction, and the 2-bit block confirmation parameter value are described as examples. It is understood that the number of these bits can be changed within the scope of the present invention. The preferred embodiment of the present invention has been described above. However, a few changes and modifications can be made without departing from the spirit and scope of the present invention. In addition, the other embodiments are all within the scope of the following patent applications. Participate
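The multiplexer 216 address path and the bit-15 prediction rule described above can be sketched behaviorally. The sketch is in Python, not hardware, and it rests on several assumptions the text does not fix. First, the 16-bit address field is assumed to hold the 2-bit block code in bits 15-14 and the 14-bit displacement in bits 13-0. Second, the particular code values in `BLOCK_DELTA` are hypothetical. Third, the names `branch_target`, `predict_taken` and `select_fetch_address` are illustrative only.

```python
# Behavioral sketch only. Assumed (not specified in the text): the 16-bit
# address field keeps the 2-bit block code in bits 15-14 and the 14-bit
# displacement in bits 13-0, and the code values map as in BLOCK_DELTA.
DISP_BITS = 14
DISP_MASK = (1 << DISP_BITS) - 1           # low 14 bits, 13-0
UPPER_MASK = (1 << (32 - DISP_BITS)) - 1   # upper 18 bits, 31-14

# Amount added to PC bits 31-14 for each (hypothetical) block code, i.e.
# which multiplexer input supplies the upper bits of the target.
BLOCK_DELTA = {
    0b00: 0,   # forward, same block      -> direct path 223
    0b01: +1,  # forward, next block      -> increment module 220
    0b10: 0,   # backward, same block     -> direct path 223
    0b11: -1,  # backward, previous block -> decrement module 222
}


def branch_target(pc: int, addr_field: int) -> int:
    """Combine PC bits 31-14 (adjusted by the block code) with the displacement."""
    code = (addr_field >> DISP_BITS) & 0b11
    disp = addr_field & DISP_MASK
    # Masking to 18 bits makes the +1/-1 wrap like a fixed-width adder.
    upper = ((pc >> DISP_BITS) + BLOCK_DELTA[code]) & UPPER_MASK
    return (upper << DISP_BITS) | disp


def predict_taken(addr_field: int) -> bool:
    """Backward-taken / forward-not-taken: bit 15 = 1 marks a backward branch."""
    return bool((addr_field >> 15) & 1)


def select_fetch_address(sequential_addr: int, pc: int, addr_field: int,
                         take_branch: bool) -> int:
    """Model of multiplexer 216: the full 32-bit sequential address when no
    branch is taken, otherwise the block-relative branch target."""
    if take_branch:
        return branch_target(pc, addr_field)
    return sequential_addr
```

Consider a branch at `0x00018010`, which lies in block 6 because `0x18010 >> 14 == 6`, with address field `0xFFF0`. That field decodes as code `0b11` and displacement `0x3FF0`. `branch_target` therefore yields `0x00017FF0` in block 5, and `predict_taken` returns `True`.

In this reading, the +1 and -1 act only on the 18 upper bits and depend only on the program counter. All three candidate upper halves can therefore be formed in parallel, with the block code merely steering the multiplexer. This is one plausible way to see how the target can be ready within the branch's fetch cycle.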


Claims (1)

1. A method of processing a branch instruction, the branch instruction being one of a plurality of instructions of a program stored in a block of an instruction memory, the method comprising:
providing the branch instruction with an operation portion including an opcode, and with an address portion including address information relating to the address of a branch target instruction, to which execution of the program branches in response to the branch instruction;
providing the address portion of the branch instruction with a memory block identification portion and a displacement value portion, the memory block identification portion identifying the block of the memory to which the branch corresponding to the branch instruction is executed; and
using the displacement value portion of the address portion of the branch instruction to branch to the branch target instruction at an address in the memory block identified by the memory block identification portion of the address portion of the branch instruction.

2. The method of claim 1, wherein branching to the branch target instruction includes generating a branch target address using the displacement value portion and the memory block identification portion of the address portion of the branch instruction.

3. The method of claim 2, wherein the branch target address is generated during the fetch cycle of the branch instruction.

4. The method of claim 1, wherein the memory block identification portion of the address portion of the branch instruction identifies the block to which the branch is executed as one of: a memory block preceding the memory block of the branch instruction, a memory block following the memory block of the branch instruction, and the memory block of the branch instruction.

5. The method of claim 1, further comprising using the memory block identification portion of the address portion of the branch instruction to predict whether the branch is taken.

6. The method of claim 1, wherein the memory block identification portion of the address portion of the branch instruction identifies at least four codes used to predict whether the branch is taken.

7. The method of claim 6, wherein the four codes include a first pair of codes for forward branches and a second pair of codes for backward branches.

8. The method of claim 7, wherein the first pair of codes includes a code for a forward branch to the next memory block and a code for a forward branch within the block of the branch instruction.

9. The method of claim 7, wherein the second pair of codes includes a code for a backward branch to a preceding memory block and a code for a backward branch within the block of the branch instruction.

10. An apparatus for processing a branch instruction, the branch instruction being one of a plurality of instructions of a program, the apparatus comprising:
an instruction memory for storing instructions, the branch instruction having an address portion including address information relating to a branch target instruction, the address portion having a memory block identification portion and a displacement value portion, the memory block identification portion identifying the block of the memory to which the branch corresponding to execution of the branch instruction is executed …

… includes a first pair of codes for forward branches and a second pair of codes for backward branches.

17. The apparatus of claim 16, wherein the first pair of codes includes a code for a forward branch to the next memory block and a code for a forward branch within the block of the branch instruction.

18. The apparatus of claim 16, wherein the second pair of codes includes a code for a backward branch to a preceding memory block and a code for a backward branch within the block of the branch instruction.
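Claims 6 to 9 and 16 to 18 define four block codes: a forward pair (next block, own block) and a backward pair (preceding block, own block). The description also notes that the 2-bit field can be attached when instructions are loaded into the instruction cache. The sketch below shows how such a loader might choose the code for a known branch and target. It is only a sketch, and several things in it are assumptions rather than details from the text. The bit layout (code in bits 15-14, displacement in bits 13-0) is assumed. The specific code values and the function name `encode_branch_field` are hypothetical.

```python
# Sketch of assigning the block code when a branch is encoded, e.g. at
# instruction-cache load time. Field layout and code values are assumptions,
# not taken from the patent text.
DISP_BITS = 14
DISP_MASK = (1 << DISP_BITS) - 1

CODE_FWD_SAME = 0b00  # forward branch within the branch's own block
CODE_FWD_NEXT = 0b01  # forward branch into the next block
CODE_BWD_SAME = 0b10  # backward branch within the branch's own block
CODE_BWD_PREV = 0b11  # backward branch into the preceding block


def encode_branch_field(branch_pc: int, target: int) -> int:
    """Return the 16-bit address field (block code + 14-bit displacement).

    Raises ValueError when the target is not in the preceding, current or
    next block. Wrap-around at the ends of the address space is not handled.
    """
    cur_block = branch_pc >> DISP_BITS
    tgt_block = target >> DISP_BITS
    forward = target > branch_pc  # a branch to itself counts as backward
    if tgt_block == cur_block:
        code = CODE_FWD_SAME if forward else CODE_BWD_SAME
    elif tgt_block == cur_block + 1:
        code = CODE_FWD_NEXT
    elif tgt_block == cur_block - 1:
        code = CODE_BWD_PREV
    else:
        raise ValueError(
            f"target {target:#x} out of range for branch at {branch_pc:#x}")
    return (code << DISP_BITS) | (target & DISP_MASK)
```

With this assignment, bit 15 of the field is 1 exactly for the backward pair. The backward-taken / forward-not-taken rule from the description can therefore read the prediction straight from that bit. A tight loop that branches back to itself is encoded as a backward branch within the same block, so it is predicted taken.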
TW090128029A 2000-11-10 2001-11-12 Method and apparatus for reducing branch latency TW559733B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US71069900A 2000-11-10 2000-11-10

Publications (1)

Publication Number Publication Date
TW559733B true TW559733B (en) 2003-11-01

Family

ID=24855140

Family Applications (1)

Application Number Title Priority Date Filing Date
TW090128029A TW559733B (en) 2000-11-10 2001-11-12 Method and apparatus for reducing branch latency

Country Status (3)

Country Link
AU (1) AU2002227451A1 (en)
TW (1) TW559733B (en)
WO (1) WO2002039272A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6970985B2 (en) 2002-07-09 2005-11-29 Bluerisc Inc. Statically speculative memory accessing
US20050114850A1 (en) 2003-10-29 2005-05-26 Saurabh Chheda Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US7996671B2 (en) 2003-11-17 2011-08-09 Bluerisc Inc. Security of program executables and microprocessors based on compiler-architecture interaction
US8607209B2 (en) 2004-02-04 2013-12-10 Bluerisc Inc. Energy-focused compiler-assisted branch prediction
US20080126766A1 (en) 2006-11-03 2008-05-29 Saurabh Chheda Securing microprocessors against information leakage and physical tampering

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3940450A1 (en) * 1989-12-07 1991-06-13 Voith Gmbh J M Squeegee device
CA2045790A1 (en) * 1990-06-29 1991-12-30 Richard Lee Sites Branch prediction in high-performance processor
DE69505717T2 (en) * 1994-03-08 1999-06-24 Digital Equipment Corp Method and apparatus for determining and making cross-routine subroutine calls
US5608886A (en) * 1994-08-31 1997-03-04 Exponential Technology, Inc. Block-based branch prediction using a target finder array storing target sub-addresses

Also Published As

Publication number Publication date
WO2002039272A1 (en) 2002-05-16
AU2002227451A1 (en) 2002-05-21
WO2002039272A9 (en) 2003-09-04


Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees