TW559733B

TW559733B - Method and apparatus for reducing branch latency

Info

Publication number: TW559733B
Application number: TW090128029A
Authority: TW
Inventors: John L Redford
Original assignee: Chipwrights Design Inc
Priority date: 2000-11-10
Filing date: 2001-11-12
Publication date: 2003-11-01
Also published as: WO2002039272A1; AU2002227451A1; WO2002039272A9

Abstract

A method and apparatus for reducing latency in execution of branch instructions are provided. A branch instruction includes an opcode portion and an address portion. The address portion includes a displacement and a code which identifies a block in the instruction memory in which the branch target instruction is located and, therefore, the block to which execution will branch in response to the branch instruction. During the fetch cycle in which the branch instruction is fetched, the displacement portion of the branch instruction is reinserted into the address register as the address of the next instruction to be fetched. The code that identifies the block containing the branch target instruction is used to ensure that the address register is pointing to the correct block. As a result, during the next instruction fetch cycle, the branch target instruction is fetched for execution. Hence, the branch processing latency found in prior systems in which the next fetch cycle is skipped while the branch target address is computed, such as by adding an offset to the program counter value, is eliminated.

Description

Conventionally, a processing system executes program instructions in multiple stages, or cycles. The instructions to be executed are stored at addresses in an instruction memory. During execution, a program counter value is read and written to an address register, which is used to access the memory location of the instruction. The instruction is fetched from the memory at that address and loaded into an instruction register.

An instruction is a multi-bit word. It typically includes an opcode of several bits that identifies the instruction, together with memory address information. For example, the address information may identify the memory location of an operand used in carrying out the instruction, or the location where the result of the instruction is to be stored.

After the instruction is fetched and loaded into the instruction register, a decode cycle is performed. During the decode cycle, the opcode bits are examined to identify the instruction and to determine the processing it requires. When decoding is complete, the address information can be used to retrieve operands.

In normal sequential execution, after an instruction is fetched, the program counter is incremented to point to the next instruction in the instruction memory. The branch instruction is a very common and basic instruction that changes this normal sequence. For example, when the last instruction of a loop is reached and the exit condition of the loop has not been met, program flow must return to the top of the loop. In that case the address of the top of the loop, rather than the incremented program counter value, must be obtained and loaded so that execution continues there.

Like most conventional instructions, a branch instruction includes an opcode and address information. The address information conventionally takes the form of an offset, defined as the number of addresses the program skips when the branch is taken. The offset is typically a signed number and is added to the address of the current instruction during the decode cycle. After the decode cycle, the result is loaded into the address register so that the first instruction of the branch can be fetched in the next fetch cycle.
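The conventional target computation described above can be sketched as follows. This is only an illustration under assumptions that the text does not fix: a 16-bit two's-complement offset field, a 32-bit address space, and the hypothetical function names `sign_extend` and `conventional_target`.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def conventional_target(pc: int, offset_field: int,
                        offset_bits: int = 16, addr_bits: int = 32) -> int:
    """Conventional branch target: current address plus the signed offset.

    The addition is the step that, per the description, occupies a large
    part of the decode cycle. Field widths here are assumed, not specified.
    """
    return (pc + sign_extend(offset_field, offset_bits)) & ((1 << addr_bits) - 1)
```

For instance, an offset field of `0xFFF0` is read as -16, so a branch at address `0x1000` would target `0x0FF0` under these assumptions.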
The efficiency of program execution can be improved by pipelining. In a pipelined system, a first instruction is fetched during a first fetch cycle. Then, during the decode cycle of the first instruction, the next instruction is fetched at the same time. That is, the next instruction is fetched while the first instruction is being decoded. This markedly increases the speed and efficiency of program execution.

For branch instructions, however, the efficiency gained by pipelining is offset by the latency of computing the branch target address. During the decode cycle of the branch instruction, the offset must be added to the current program counter value to obtain the target address. This addition takes up a significant part of the decode cycle. Because the address of the next instruction has not yet been determined and loaded into the address register, the fetch cycle of the next instruction cannot run concurrently with the decode. Instead, the fetch cycle of the next instruction, i.e., the branch target instruction, cannot begin until the decode cycle has completed. This situation is referred to as branch latency.

Branch latency can be reduced by loading the address register promptly after the branch instruction is fetched, so that the branch target instruction can be fetched in the next fetch cycle.

The present invention provides a method and an apparatus for reducing branch latency. According to the invention, a method and an apparatus for processing a branch instruction are provided. The branch instruction has an opcode portion containing an opcode and an address portion containing information related to the address of the branch target instruction, to which execution branches when the branch is taken. The address portion of the branch instruction is provided with a memory block identification portion and a displacement portion. The memory block identification portion identifies the memory block in which the branch corresponding to the branch instruction is executed. The branch to the branch target instruction is carried out using the displacement portion as the address within the memory block identified by the memory block identification portion.

In one embodiment, the identified block can be the block that contains the branch instruction, the block preceding it, or the block following it. In one embodiment, the block identification portion has at least two bits, defining at least four codes for identifying the block. One code identifies the same block as the branch instruction, a second code identifies the block following the current block, and a third code identifies the block preceding the current block. The fourth code can also identify the same block as the branch instruction, but with a branch direction different from that of the first code. As one example of this arrangement, codes 00 and 01 identify forward branches, 00 within the current block and 01 into the next block, while codes 10 and 11 identify backward branches, 10 into the preceding block and 11 within the current block.

The block identification portion can also be used for branch prediction, i.e., predicting whether the branch will be taken. In one embodiment, a backward branch is predicted to be taken and a forward branch is predicted not to be taken. Thus, in the four-code example above, if the first bit of the code is 1, the branch is a backward branch and is predicted taken. If the first bit of the code is 0, the branch is a forward branch and is predicted not taken.

The invention substantially reduces or eliminates the branch latency of the prior art. The displacement address information in the branch instruction is used to generate the branch target address directly, without time-consuming operations such as adding an address offset to the program counter. The branch target address is delivered directly to the address register as an immediate hardware function, as part of the fetch cycle of the branch instruction. The next cycle can therefore be used to fetch the branch target instruction, with no lost cycle. By contrast, prior systems skip a cycle during decoding because of the latency of computing the branch target address. The invention thus processes branch instructions markedly more efficiently than the prior art.

Brief description of the drawings:

The foregoing and other objects, features and advantages of the invention will be apparent from the more particular description of embodiments that follows, as illustrated in the accompanying drawings, in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale; emphasis is placed instead on illustrating the principles of the invention.

FIG. 1 is a timing diagram of conventional program instruction execution in a pipelined architecture, including the timing of fetch and decode cycles;
FIG. 2 is a timing diagram of conventional program instruction execution in a pipelined architecture in which a branch instruction is processed, including the timing of fetch and decode cycles;
FIG. 3 illustrates a conventional branch instruction;
FIG. 4 is a schematic functional block diagram of instruction execution in a conventional architecture;
FIG. 5 is a schematic functional block diagram of instruction execution in an architecture according to the invention that solves the branch latency problem of FIG. 4;
FIG. 6 is a schematic block diagram of the addresses and layout of part of an instruction memory;
FIG. 7 is a schematic diagram of the format of a branch instruction according to the invention;
FIG. 8 is a schematic functional block diagram of instruction execution in an architecture according to the invention in which the branch instruction identifies the memory block in which the branch is executed;
FIG. 9 is a timing diagram of the fetch and decode cycles for program instruction execution with the apparatus and method of the invention.

Description of symbols:

10: processing system;
12: program counter;
14: increment module;
16: multiplexer;
18: branch prediction module;
20: address register;
22: instruction memory;
24: instruction register;
26: sum module;
102, 104, 106: blocks;
122: opcode field;
124: address value field;
126: block field;
128: displacement field;
216: multiplexer;
220: increment module;
222: decrement module;
223: direct path.

Description of the embodiments

FIG. 1 illustrates conventional program instruction execution in a pipelined architecture. As shown in FIG. 1, during the first cycle the program counter value PC is used as the instruction memory address to fetch the first instruction, and the program counter is incremented. During the second cycle, instruction PC is decoded while the next instruction is fetched, and the program counter then advances to PC+2 for the following fetch. In general, the program counter is incremented to control sequential instruction fetching, and the decode of one instruction proceeds together with the fetch of the next.

FIG. 2 is a timing diagram for a pipelined architecture in which a branch instruction is processed by conventional program instruction execution. In the first cycle, the instruction identified by the program counter is fetched. In this example, the instruction is a branch instruction, whose conventional format is shown in FIG. 3. As shown in FIG. 3, the instruction includes an opcode portion and a memory displacement portion. The opcode defines the type of instruction, for example the condition under which the branch is taken. The displacement portion defines the address at which program flow continues. Conventionally the displacement is relative to the program counter value PC. In the next cycle, therefore, the branch address that is to be loaded into the program counter is computed by adding the current address to the displacement, i.e., PC+Disp. Because the addition takes a significant amount of time, the starting address of the branch, i.e., the branch target instruction address, is not loaded into the program counter until the end of the second cycle. The branch target instruction at address PC+Disp is consequently not fetched until the third cycle. A cycle is therefore lost while the branch target address is computed. This is commonly referred to as branch latency.

FIG. 4 is a schematic functional block diagram of instruction execution in a conventional architecture. As shown in FIG. 4, processing system 10 includes a program counter 12, which holds the address used for sequential instruction execution. The address size used in this description is an example; other address sizes can be used. The program counter value is supplied to an increment module 14 and a sum module 26. Sum module 26 adds the program counter value to the displacement from instruction register 24 and sends the result to multiplexer 16. The value incremented by increment module 14 is supplied to the other input of multiplexer 16. Branch prediction module 18 drives the select input of multiplexer 16 to choose which address the multiplexer supplies. If the branch is taken, multiplexer 16 selects the branch target address produced by sum module 26, and that address is loaded into address register 20. Otherwise, the incremented address is selected and loaded into address register 20. The address in the address register is supplied to the instruction memory to access the next instruction, which is loaded into instruction register 24 for decoding or other processing. The displacement portion of the instruction, if any, is routed to sum module 26 to form the address to which program flow is to jump, for example when a branch instruction has been fetched. As described above, this approach incurs branch latency because of the time taken by the addition in sum module 26.

FIG. 5 is a schematic functional block diagram of instruction execution in an architecture according to the invention that solves the branch latency problem of FIG. 4. In FIG. 5, the displacement portion of the branch instruction replaces part of the program counter value. The displacement is taken from the instruction as it is read out of instruction memory 22 and applied directly to an input of multiplexer 16 as the least significant bits (LSBs) of the address, labeled 15:0 in this example. Extracting the displacement from the fetched word takes place during the fetch of the branch instruction and completes before the next fetch cycle begins. In the cycle after the branch instruction is fetched, the branch target instruction can therefore be fetched as the next instruction, because its address is already in address register 20 before that fetch cycle begins. The branch instruction and the branch target instruction are fetched in successive cycles, with no lost cycle, and the branch latency described above is eliminated.
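A minimal sketch of the FIG. 5 substitution, assuming a 32-bit address whose bits 15:0 are replaced by the displacement. The function name `direct_target` and the constants are hypothetical and chosen only for illustration.

```python
LOW_BITS = 16                      # displacement supplies address bits 15:0
LOW_MASK = (1 << LOW_BITS) - 1
ADDR_MASK = 0xFFFFFFFF             # assumed 32-bit address width


def direct_target(pc: int, displacement: int) -> int:
    """Form the branch target without an adder.

    The upper bits of the current address are kept unchanged and the
    displacement field is substituted for the least significant bits.
    """
    upper = pc & ADDR_MASK & ~LOW_MASK
    return upper | (displacement & LOW_MASK)
```

Because no addition is involved, the target is available as soon as the instruction word has been read. Note that the upper address bits always pass through unchanged in this sketch.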
A drawback of this approach is illustrated by FIG. 6, a schematic block diagram of part of a conventional instruction memory 22. Instruction memory 22 is organized as a number of blocks 102, 104 and 106. In the particular example illustrated, each block has a set of address locations from 0000 to FFFF (hexadecimal). The 16 least significant bits of each memory address thus select one of the 2^16 locations within a particular block. At any given time during program execution, the program counter points to an instruction at one of these locations, in block 104 for example. When a branch instruction is processed by the method described above, the 16-bit displacement portion of the instruction replaces the 16 least significant bits of the address. The drawback is that the branch target location must then be one of the locations in block 104, i.e., in the same block as the branch instruction. This restricts the branch distance, depending on the current value of the program counter. For example, when execution is near the end of a block, a forward branch can cover only a short distance. Likewise, when execution is near the start of a block, a backward branch can cover only a short distance. These conditions limit the branches available to the program.

To solve this problem, an embodiment of the invention uses part of the displacement portion to define the block of instruction memory in which the branch target is located. FIG. 7 is a schematic diagram of the format of a branch instruction according to the invention. The example of FIG. 7 uses a 32-bit instruction with a 16-bit displacement field. It will be understood that the invention can be applied to other sizes.

Referring to FIG. 7, the format of instruction 120 includes an opcode field 122 in bits 16-31 and a displacement field 128 in bits 0-15. Displacement field 128 is further divided into an address value field 124 in bits 0-13 and a block field 126 in bits 14-15. The 2-bit block field specifies whether the branch is to the current block (corresponding to PC), the preceding block (corresponding to PC-1) or the following block (corresponding to PC+1). Address value field 124 specifies the address, within the block so defined, from which the branch target instruction is to be fetched.
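The field layout of FIG. 7 can be illustrated with a small decoder. This is a sketch under the 32-bit example format given above; the names `BranchFields` and `decode_branch` are hypothetical.

```python
from typing import NamedTuple


class BranchFields(NamedTuple):
    opcode: int    # opcode field 122, bits 31-16
    block: int     # block field 126, bits 15-14
    address: int   # address value field 124, bits 13-0


def decode_branch(word: int) -> BranchFields:
    """Split a 32-bit branch instruction word into its three fields."""
    opcode = (word >> 16) & 0xFFFF
    block = (word >> 14) & 0x3
    address = word & 0x3FFF
    return BranchFields(opcode, block, address)
```

With this layout, the word `0xABCD8123` would split into opcode `0xABCD`, block code `0b10` and address value `0x0123`.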

In the illustrated embodiment, block field 126 therefore has at least two bits, defining at least four codes for identifying the block. One code specifies that the branch is within the same block as the branch instruction. A second code identifies the block following the current block, and a third code identifies the block preceding the current block. The fourth code can also specify a branch within the same block, but in the direction opposite to that of the first code. As one example of this arrangement, codes 00 and 01 identify forward branches, 00 within the current block and 01 into the next block, while codes 10 and 11 identify backward branches, 10 into the preceding block and 11 within the current block. A 0 in bit 15 of the instruction thus indicates a forward branch, and a 1 in bit 15 indicates a backward branch.

FIG. 8 is a schematic functional block diagram of instruction execution in an architecture according to the invention in which the branch instruction identifies the memory block in which the branch is executed. In this architecture, the 14 least significant bits, bits 0-13, are routed from instruction memory 22 to three of the four inputs of multiplexer 216. The other 18 bits, bits 31-14, are taken from program counter 12. They are routed through increment module 220, decrement module 222 and direct path 223, and each result is combined with the 14 least significant bits from instruction memory 22 at an input of multiplexer 216. Increment module 220 produces the address used when the branch is to the next block of memory. Decrement module 222 produces the address used when the branch is to the preceding block of memory. Direct path 223 produces the address used when the branch is within the current block of memory. The fourth input of multiplexer 216 receives bits 31-0 of the sequential address on the program counter path and is used for normal sequential program execution.

Branch prediction module 18 selects the address that is loaded into address register 20. If the branch is not taken, bits 31-0 from increment module 14 are selected. If the branch is taken to the next block, bits 31-14 from increment module 220 are selected. If the branch is taken to the preceding block, bits 31-14 from decrement module 222 are selected. If the branch is taken within the current block, bits 31-14 from direct path 223 are selected.
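The selection performed by multiplexer 216 can be modeled in software as below. This is a behavioral sketch, not the circuit itself. It assumes the example code assignment from the text (00 and 11 current block, 01 next block, 10 preceding block), a word-addressed program counter that advances by 1, and wraparound of the 18-bit block number; the function names are hypothetical.

```python
ADDR_BITS = 14                          # address value field, bits 13-0
ADDR_MASK = (1 << ADDR_BITS) - 1
BLOCK_MASK = (1 << 18) - 1              # program counter bits 31-14


def branch_target(pc: int, disp16: int) -> int:
    """Combine an adjusted block number with the 14-bit address value."""
    code = (disp16 >> ADDR_BITS) & 0x3
    block = (pc >> ADDR_BITS) & BLOCK_MASK
    if code == 0b01:                    # forward into next block (module 220)
        block = (block + 1) & BLOCK_MASK
    elif code == 0b10:                  # backward into preceding block (module 222)
        block = (block - 1) & BLOCK_MASK
    # codes 0b00 and 0b11 use the block number unchanged (direct path 223)
    return (block << ADDR_BITS) | (disp16 & ADDR_MASK)


def next_fetch_address(pc: int, disp16: int, taken: bool) -> int:
    """Pick the sequential address or the branch target, as the multiplexer does."""
    if not taken:
        return (pc + 1) & 0xFFFFFFFF    # sequential path (increment module 14)
    return branch_target(pc, disp16)
```

The only arithmetic on the branch path is an increment or decrement of the upper bits, which does not depend on the fetched displacement. This is a plausible reason the arrangement avoids the full-width addition of the conventional scheme, although the text does not state it in these terms.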

The branch instruction with the block identification code of the invention can also assist branch prediction. As noted above, a 0 in bit 15 can indicate a forward branch and a 1 in bit 15 can indicate a backward branch. A common branch prediction approach predicts that backward branches are taken and forward branches are not. According to the invention, therefore, if bit 15 is 0 the branch is predicted not taken, and if bit 15 is 1 the branch is predicted taken.

In the embodiment of FIG. 8, the range of addresses a branch can reach extends beyond that of the embodiment of FIG. 5. With the FIG. 8 approach, the address value portion of the displacement specifies only 2^14 possible addresses, rather than the 2^16 possible addresses of the FIG. 5 approach. With the FIG. 8 approach, however, those 2^14 addresses are available in each of three memory blocks. The possible branch distance and the flexibility of the program are thereby greatly increased.
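The static prediction rule described above reduces to testing a single bit. The sketch below assumes the example encoding in which bit 15 of the instruction, the upper bit of the block field, is 1 for backward branches; `predict_taken` is a hypothetical name.

```python
def predict_taken(disp16: int) -> bool:
    """Predict a branch taken when bit 15 of the displacement field is 1.

    Under the example encoding, bit 15 = 1 marks a backward branch
    (codes 10 and 11) and bit 15 = 0 marks a forward branch (codes 00 and 01).
    """
    return bool((disp16 >> 15) & 1)
```

A loop-closing backward branch would therefore be predicted taken, and a forward branch predicted not taken, without any computation beyond reading the bit.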
FIG. 9 is a timing diagram of the fetch and decode cycles for program instruction execution with the apparatus and method of the invention for reducing branch latency. As shown in FIG. 9, according to the invention the branch instruction and the branch target instruction can be fetched in successive cycles. The branch latency found in conventional approaches is eliminated.

It should be noted that the invention can be implemented in ways other than those described above. For example, the block identification can be applied when an instruction is loaded into the instruction cache, immediately before execution begins, rather than having the compiler or linker alter the instructions individually. The block identification field, a 2-bit field in the exemplary embodiment, is then added as the instruction is loaded into the appropriate instruction cache location.

The 32-bit instruction, the 16-bit address value of the branch instruction and the 2-bit block identification value described in this embodiment are examples. It will be understood that these bit counts can be changed without going beyond the scope of the invention.

Preferred embodiments of the invention have been described above. Minor changes and refinements can nevertheless be made without departing from the spirit and scope of the invention. Other embodiments are within the scope of the following claims.

Claims

1. A method of processing a branch instruction, the branch instruction being one of a plurality of instructions of a program stored in an instruction memory, the method comprising: providing the branch instruction with an opcode portion containing an opcode and an address portion containing information related to the address of a branch target instruction, to which execution of the program branches in response to the branch instruction; providing the address portion of the branch instruction with a memory block identification portion and a displacement portion, the memory block identification portion identifying the memory block in which the branch corresponding to the branch instruction is executed; and branching to the branch target instruction using the displacement portion of the address portion as the address within the memory block identified by the memory block identification portion of the address portion of the branch instruction.

2. The method of claim 1, wherein branching to the branch target instruction comprises generating a branch target address using the displacement portion of the address portion of the branch instruction and the memory block identification portion of the address portion of the branch instruction.

3. The method of claim 2, wherein the branch target address is generated during the fetch cycle of the branch instruction.

4. The method of claim 1, wherein the memory block identification portion of the address portion of the branch instruction identifies the block in which the branch is executed as one of: a memory block preceding the memory block of the branch instruction, a memory block following the memory block of the branch instruction, and the memory block of the branch instruction.

5. The method of claim 1, further comprising using the memory block identification portion of the address portion of the branch instruction to predict whether the branch will be taken.

6. The method of claim 1, wherein the memory block identification portion of the address portion of the branch instruction defines at least four codes used in predicting whether the branch will be taken.

7. The method of claim 6, wherein the four codes include a first pair of codes for forward branches and a second pair of codes for backward branches.

8. The method of claim 7, wherein the first pair of codes includes a code for branching forward into the next memory block and a code for branching forward within the block of the branch instruction.

9. The method of claim 7, wherein the second pair of codes includes a code for branching backward into a preceding memory block and a code for branching backward within the block of the branch instruction.

10. An apparatus for processing a branch instruction, the branch instruction being one of a plurality of instructions of a program, the apparatus comprising: an instruction memory for storing instructions, the branch instruction having an address portion containing address information related to a branch target instruction, the address portion having a memory block identification portion and a displacement portion, the memory block identification portion identifying the memory block in which the branch corresponding to the branch instruction is executed [...]

[...] include a first pair of codes for forward branches and a second pair of codes for backward branches.

17. The apparatus of claim 16, wherein the first pair of codes includes a code for branching forward into the next memory block and a code for branching forward within the block of the branch instruction.

18. The apparatus of claim 16, wherein the second pair of codes includes a code for branching backward into a preceding memory block and a code for branching backward within the block of the branch instruction.