559733

V. Description of the Invention

Conventionally, a processing system executes instructions in multiple stages, or cycles. The memory address at which the instruction to be executed is stored is generated by a program counter and written to an address register. The address register is used to access the location in instruction memory, and the instruction is fetched from that address and loaded into an instruction register.

An instruction is typically a multi-bit word. It conventionally includes a multi-bit opcode that identifies the instruction, together with memory address information. For example, the memory address information may identify the memory location of an operand, or the location where the result of the instruction is to be stored when the instruction completes.

After the instruction is fetched, it is loaded into the instruction register and a decode cycle is performed. During the decode cycle, the instruction is identified and the processing it requires is determined; the address information can then be used to retrieve operands. When decoding is complete, the instruction can be executed.

In normal sequential instruction execution, after an instruction is fetched the program counter is incremented to point to the next instruction in instruction memory.

The branch instruction is a very common and basic type of instruction. A branch instruction changes the normal sequential flow of instruction execution. For example, when the last instruction of a loop is reached and the condition for leaving the loop has not been satisfied, program flow must return to the top of the loop. In that case a branch instruction is used to load the program counter with the address of the top of the loop instead of the incremented address. The address information in the branch instruction is used to redirect program flow to the branch target.

Like most conventional instructions, a branch instruction includes an opcode and address information. The address information conventionally takes the form of an offset, or displacement, that defines the number of addresses the program skips when the branch is taken. The displacement is conventionally a signed number that is added to the current instruction address during the decode cycle. After the decode cycle, the resulting address is loaded into the address register so that the first instruction of the branch can be fetched in the next fetch cycle.

The efficiency of program execution can be improved by pipelining. With pipelining, a first instruction is fetched during a first fetch cycle. The decode cycle of the first instruction is then performed while the next instruction is fetched; that is, the next instruction is fetched while the first instruction is being decoded. This approach significantly increases the speed and efficiency of program execution.

In the case of a branch instruction, however, the efficiency gained from pipelining must be weighed against the latency involved in computing the branch target address. During the decode cycle of the branch instruction, the displacement must be added to the current program counter value. This addition takes up a significant portion of the decode cycle, so the address of the next instruction has not yet been determined and loaded into the address register, and the fetch cycle for the next instruction cannot be performed at the same time. Instead, the fetch cycle for the next instruction, i.e., the branch target instruction, cannot begin until the decode cycle has completed.

The situation described above is referred to as branch latency, and it is desirable to reduce it. The wait for the branch target instruction can be reduced by loading the address register promptly after the branch instruction is fetched, so that the branch target instruction can be fetched in the next fetch cycle.

The present invention provides an apparatus and method for processing branch instructions that reduce branch latency. According to the invention, a branch instruction includes an opcode portion containing the opcode and an address portion containing address information related to the branch target instruction, which is used to execute the program branch. The address portion of the branch instruction includes a memory block identification portion and a displacement portion. The memory block identification portion identifies the block of memory in which the branch is to be executed. The displacement portion is used as the address, within the memory block identified by the memory block identification portion, to which the branch is made. When the branch is taken, the memory block identification portion of the address portion identifies the block of memory in which the branch is executed.

In one embodiment, the block can be one of a plurality of possible blocks. One of the blocks may be the block containing the branch instruction, in which case the branch target instruction is in the same block as the branch instruction. The other blocks are the block preceding and the block following the current block.

In one embodiment, the block identifier includes at least two bits, defining at least four codes for identifying blocks. One code identifies the same block as the branch instruction. A second code identifies the block immediately following the current block, and a third code identifies the block immediately preceding it. A fourth code can be used to define a branch within the same block in a particular direction, in which case the first code defines a branch within the same block in the other direction. As an example of this arrangement, the first and second codes, e.g., 00 and 01, can identify forward branches, the former within the current block and the latter into the next block. The third and fourth codes, e.g., 10 and 11, can identify backward branches, the former into the preceding block and the latter within the current block.

In one embodiment, the block identification portion can also be used for branch prediction, i.e., predicting whether the branch will be taken. In one embodiment, if the branch is backward, it is predicted that the branch will be taken; if the branch is forward, it is predicted that the branch will not be taken. Thus, in the illustrative set of four codes above, if the first bit of the code is 1, the branch is a backward branch and is predicted taken. Conversely, if the first bit of the code is 0, the branch is a forward branch and is predicted not taken.
The invention substantially reduces or eliminates the branch latency of the prior art. The branch target address is generated directly from the displacement address information in the branch instruction, without time-consuming operations such as adding an address offset value to the program counter. The branch target address is supplied directly to the address register, in hardware, as part of the fetch cycle of the branch instruction. The branch target instruction can therefore be fetched in the next cycle without losing a cycle. This contrasts with the prior art, in which a cycle is skipped during decoding because of the latency involved in computing the branch target address. The invention thus provides a way of processing branch instructions that is significantly more efficient than the prior art.

Brief description of the drawings:

The foregoing and other objects, features and advantages of the invention will be apparent from the more particular description of embodiments of the invention that follows. In the accompanying drawings, like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed on illustrating the principles of the invention.

FIG. 1 is a timing diagram of conventional program instruction execution in a pipelined architecture, showing the fetch and decode cycles;
FIG. 2 is a timing diagram of conventional execution of a branch instruction in a pipelined architecture, showing the fetch and decode cycles;
FIG. 3 illustrates a conventional branch instruction;
FIG. 4 is a schematic functional block diagram of instruction execution in a conventional architecture;
FIG. 5 is a schematic functional block diagram of instruction execution in accordance with the invention, which addresses the branch latency problem of FIG. 4;
FIG. 6 is a schematic block diagram of the addresses and layout of a portion of instruction memory;
FIG. 7 is a schematic diagram of the format of a branch instruction in accordance with the invention;
FIG. 8 is a schematic functional block diagram of instruction execution in an architecture in accordance with the invention in which the branch instruction identifies the memory block in which the branch is executed; and
FIG. 9 is a timing diagram of the fetch and decode cycles of program instruction execution using the apparatus and method of the invention for reducing branch latency.

Description of reference symbols:

10 ~ processing system;
12 ~ program counter;
14 ~ increment module;
16 ~ multiplexer;
18 ~ branch prediction;
20 ~ address register;
22 ~ instruction memory;
24 ~ instruction register;
26 ~ summing module;
102, 104, 106 ~ blocks;
122 ~ opcode field;
124 ~ address value field;
126 ~ block field;
128 ~ displacement field;
216 ~ multiplexer;
220 ~ increment module;
222 ~ decrement module;
223 ~ direct path.

Description of the embodiments

FIG. 1 is a timing diagram of conventional program instruction execution in a pipelined architecture. As shown in FIG. 1, during the first cycle the instruction at the memory location identified by the program counter value PC is fetched. During the second cycle, instruction PC is decoded and the program counter is incremented to fetch the next instruction, PC+1. During the third cycle, instruction PC+1 is decoded and the program counter value PC+2 is used to fetch the instruction it identifies. Instruction fetching and decoding continue in this sequence, each instruction being decoded while the next one is fetched.

FIG. 2 is a timing diagram of conventional execution of a branch instruction in a pipelined architecture, showing the fetch and decode cycles. As shown in the timing diagram, during the first cycle the first instruction, identified by the program counter, is fetched. In this example the instruction is a branch instruction. The format of the branch instruction is shown in FIG. 3. As shown in FIG. 3, the branch instruction includes an opcode portion and a memory displacement portion. The opcode defines the type of instruction, for example the condition under which the branch is taken. The displacement portion defines the address at which program flow will continue. Conventionally, the displacement is relative to the program counter value PC. Therefore, in the next cycle, the address to be loaded into the program counter to execute the branch is computed by adding the displacement to the current address, i.e., PC + Disp. Because the addition takes a substantial amount of time to complete, the starting address of the branch, i.e., the branch target instruction address, is not loaded into the program counter until the end of the second cycle. As a result, the branch target instruction at address PC + Disp is not fetched until the third cycle. A cycle is therefore lost while the branch target instruction address is computed. This is commonly referred to as branch latency.

FIG. 4 is a schematic functional block diagram of instruction execution in a conventional architecture. As shown in FIG. 4, the processing system 10 includes a program counter 12, which generates the addresses of sequentially executed program instructions.
This description assumes a particular address size, but it applies equally to other address sizes. The address from the program counter 12 is supplied to an increment module 14 and to a summing module 26. The summing module 26 adds the program counter value to the displacement from the instruction register 24 and forwards the result to one input of a multiplexer 16. The address is also incremented by the increment module 14, and the incremented result is supplied to the other input of the multiplexer 16. The branch prediction module 18 uses the select input of the multiplexer 16 to choose which address the multiplexer 16 supplies. If the branch is taken, the branch target address produced by the summing module 26, i.e., the program counter value plus the displacement, is selected and loaded into the address register 20. Otherwise, the incremented address is selected and loaded into the address register 20.

The address in the address register 20 is supplied to the instruction memory 22 to access the next instruction to be executed, which is loaded into the instruction register 24, where it is decoded. The displacement portion of the instruction, if any, is forwarded to the summing module 26 to compute the address to which program flow will jump, for example when a branch instruction is being decoded. As noted above, this approach incurs branch latency because of the time needed to perform the addition in the summing module 26.

FIG. 5 is a schematic functional block diagram of instruction execution in an architecture that, in accordance with the invention, addresses the branch latency problem of FIG. 4. In FIG. 5, instead of being added to the program counter value, the displacement portion of the branch instruction is taken directly from the instruction in the instruction memory 22 and applied directly to an input of the multiplexer 16 as the least significant bits (LSBs) of the address; in this particular example these are the bits labeled 15:0 at the multiplexer 16.
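The difference between the FIG. 4 and FIG. 5 data paths can be sketched in a few lines of Python. This is a minimal illustration rather than the patented circuit: the function names are hypothetical, 32-bit addresses are assumed, and in the FIG. 5 case the upper 16 address bits are assumed to pass through unchanged from the program counter.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses


def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def conventional_target(pc, disp16):
    """FIG. 4 style: the summing module adds the signed displacement to the PC."""
    return (pc + sign_extend16(disp16)) & MASK32


def direct_target(pc, disp16):
    """FIG. 5 style: the displacement simply becomes address bits 15:0.

    No addition is needed; the upper bits are assumed to come from the PC.
    """
    return (pc & 0xFFFF0000) | (disp16 & 0xFFFF)
```

In the first function the target depends on an add, which is the step the text identifies as the source of latency; in the second it is a plain bit substitution.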
Because the displacement is read from the memory location of the branch instruction during the fetch cycle of the branch instruction, this step completes quickly, before the next fetch cycle begins. In the next cycle, therefore, the next instruction, which is the branch target instruction, can be fetched, because its address is already present in the address register 20 before that fetch cycle begins. The branch instruction and the branch target instruction can thus be fetched in successive fetch cycles without losing a cycle, and the branch latency described above is eliminated.

A drawback of the approach described above is illustrated in FIG. 6, which is a schematic block diagram of a portion of the address layout of a conventional instruction memory. The instruction memory 22 is organized into a number of blocks 102, 104 and 106. In the particular example illustrated, each block has a set of address locations from 0000 to ffff (hexadecimal). Thus, the 16 LSBs of each memory address identify a location within a particular block of 2^16 locations.

During program execution, at any given time, the program counter accesses the instruction at one of the locations in a block. Take block 104 as an example. When a branch instruction is processed by the method described above, the 16-bit displacement portion of the branch instruction replaces the 16 least significant bit positions of the next address, and the branch is made to one of the locations in block 104. The drawback is that the actual branch target location must be in the same block. Branch distance is therefore limited, depending on the current value of the program counter. For example, if the program counter is near the end of the block when the branch instruction is processed, a forward branch can cover only a short distance. Similarly, if the program is currently executing near the beginning of the block, a backward branch is limited to a short distance. This situation limits program flexibility.
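To make the same-block limitation of FIG. 6 concrete, the following Python sketch computes how far a branch can reach backward and forward without leaving the current block. The helper name `same_block_reach` is hypothetical; the block size of 2^16 locations follows the example in the text.

```python
BLOCK_SIZE = 1 << 16  # 2**16 locations per block, 0x0000 to 0xFFFF


def same_block_reach(pc):
    """Return (max_backward, max_forward) distances that stay inside pc's block."""
    offset = pc & (BLOCK_SIZE - 1)           # position of pc within its block
    return offset, BLOCK_SIZE - 1 - offset   # room below and room above


# Near the end of a block there is almost no room for a forward branch.
backward, forward = same_block_reach(0x0001FFF0)
print(hex(backward), hex(forward))
```

For a program counter at offset fff0 in its block, only 15 locations remain ahead, which is the short forward distance the text describes.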
To solve this problem, an embodiment of the invention uses part of the displacement portion of the branch instruction to define the block of instruction memory in which the branch target is located. FIG. 7 is a schematic diagram of the format of a branch instruction in accordance with the invention. The example of FIG. 7 uses a 32-bit instruction with a 16-bit displacement field. It will be understood that the invention is applicable to other sizes.

Referring to FIG. 7, the format of the instruction 120 includes an opcode field 122 in bits 16-31 and a displacement field 128 in bits 0-15. The displacement field 128 is further divided into an address value field 124 in bits 0-13 and a block field 126 in bits 14-15. The 2-bit block field defines whether the branch is to the current block (corresponding to PC), the preceding block (corresponding to PC-1) or the following block (corresponding to PC+1). The address value field 124 defines the address, within the block so defined, from which the branch target instruction is to be fetched.

Thus, in the illustrated embodiment, the block field 126 includes at least two bits, defining at least four codes for identifying blocks. One code identifies the same block as the branch instruction. A second code identifies the block immediately following the current block, and a third code identifies the block immediately preceding it. A fourth code can be used to identify a branch within the same block but in a different direction, with the first code then identifying a branch within the same block in the opposite direction. As an example of this arrangement, the first and second codes, e.g., 00 and 01, can identify forward branches, the former within the current block and the latter into the next block. The third and fourth codes, e.g., 10 and 11, can identify backward branches, the former into the preceding block and the latter within the current block. Accordingly, a 0 in bit 15 of the address can indicate a forward branch, and a 1 in bit 15 can indicate a backward branch.

FIG. 8 is a schematic functional block diagram of instruction execution in an architecture in accordance with the invention in which the branch instruction identifies the memory block in which the branch is executed. In this architecture, the 14 LSBs, bits 13-0, from the instruction memory 22 are routed to three of the four inputs of a multiplexer 216.
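The field layout of FIG. 7 (opcode in bits 31-16, block field in bits 15-14, address value in bits 13-0) and the example code assignment 00, 01, 10, 11 can be sketched as a small decoder. The names `decode_branch` and `BLOCK_MEANING` are hypothetical and used only for illustration.

```python
# Example code assignment taken from the text (00/01 forward, 10/11 backward).
BLOCK_MEANING = {
    0b00: "forward, current block",
    0b01: "forward, next block",
    0b10: "backward, preceding block",
    0b11: "backward, current block",
}


def decode_branch(instr):
    """Split a 32-bit branch instruction into (opcode, block_code, addr_value)."""
    opcode = (instr >> 16) & 0xFFFF    # opcode field 122, bits 31-16
    block_code = (instr >> 14) & 0x3   # block field 126, bits 15-14
    addr_value = instr & 0x3FFF        # address value field 124, bits 13-0
    return opcode, block_code, addr_value


opcode, block_code, addr_value = decode_branch(0xABCD4123)
print(hex(opcode), BLOCK_MEANING[block_code], hex(addr_value))
```

Note that bit 15 of the instruction is the upper bit of the block code, which is why a single bit distinguishes forward from backward branches in this assignment.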
The other 18 bits, bits 31-14, are taken from the program counter 12 and combined with the 14 LSBs from the instruction memory 22. These 18 bits are applied to an increment module 220, a decrement module 222 and a direct path 223, and the resulting bits are combined with the 14 LSBs from the instruction memory 22 at the inputs of the multiplexer 216. The increment module 220 generates the address used when the branch is to the next block of memory; the decrement module 222 generates the address used when the branch is to the preceding block; and the direct path 223 generates the address used when the branch is within the current block. The fourth input of the multiplexer 216 receives bits 31-0 directly from the program counter and is used for normal sequential program execution.

The branch prediction module 18 selects the address to be loaded into the address register 20. If the branch is not taken, bits 31-0 from the increment module 14 are selected. If the branch is taken to the next block, the address with bits 31-14 from the increment module 220 is selected. If the branch is taken to the preceding block, the address with bits 31-14 from the decrement module 222 is selected. If the branch is taken within the current block, the address with bits 31-14 from the direct path 223 is selected.

A branch instruction with a block identification code in accordance with the invention can also be used to assist branch prediction. As noted above, a 0 in bit 15 of the address can indicate a forward branch and a 1 in bit 15 can indicate a backward branch. In a common approach to branch prediction, backward branches are predicted taken and forward branches are predicted not taken. Thus, in accordance with the invention, if bit 15 is 0 the branch is predicted not taken, and if bit 15 is 1 the branch is predicted taken.

In the embodiment of FIG. 8, the range of addresses that a branch can reach therefore differs from that of the embodiment of FIG. 5. With the FIG. 8 approach, the address value portion of the displacement specifies only 2^14 possible addresses, rather than the 2^16 possible addresses of the FIG. 5 approach. However, with the FIG. 8 approach, those 2^14 possible addresses are available in each of three particular memory blocks. The result is a substantial increase in possible branch distance and in program flexibility.
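The selection logic of FIG. 8 can be sketched as follows. This is a simplified software model under stated assumptions, not the hardware itself: the function names are hypothetical, the code assignment 00/01/10/11 is the example given in the text, and sequential execution is assumed to advance the program counter by one location per instruction.

```python
UPPER_MASK = 0x3FFFF  # address bits 31-14 (18 bits)
LOW_MASK = 0x3FFF     # address bits 13-0 (14 bits)


def branch_target(pc, disp16):
    """Form the target from PC bits 31-14 and displacement bits 13-0."""
    upper = (pc >> 14) & UPPER_MASK
    block_code = (disp16 >> 14) & 0x3
    if block_code == 0b01:                 # next block: increment module 220
        upper = (upper + 1) & UPPER_MASK
    elif block_code == 0b10:               # preceding block: decrement module 222
        upper = (upper - 1) & UPPER_MASK
    # codes 00 and 11 stay in the current block: direct path 223
    return (upper << 14) | (disp16 & LOW_MASK)


def predict_taken(disp16):
    """Bit 15 set means a backward branch, which is predicted taken."""
    return bool(disp16 & 0x8000)


def next_fetch_address(pc, disp16):
    """Model of multiplexer 216: predicted target, else the next sequential address."""
    if predict_taken(disp16):
        return branch_target(pc, disp16)
    return (pc + 1) & 0xFFFFFFFF  # assumption: PC advances by 1 per instruction
```

Only an 18-bit increment or decrement of the upper bits is involved, and the 14 low bits are passed through unchanged, so no full-width addition of the displacement to the program counter is required.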
FIG. 9 is a timing diagram of the fetch and decode cycles of program instruction execution using the apparatus and method of the invention for reducing branch latency. As shown in FIG. 9, in accordance with the invention the branch instruction and the branch target instruction can be fetched in successive cycles. The branch latency found in the conventional approach is eliminated.

It is noted that the invention can be implemented in ways other than those described above. For example, it can be implemented at the time instructions are loaded into an instruction cache, immediately before execution begins, so that the instructions themselves need not be individually altered by the compiler or linker. In that case the block identification field, which is a 2-bit field in the exemplary embodiment, is attached as each instruction is loaded into the appropriate instruction cache location.

The embodiment described here uses, by way of example, a 32-bit instruction, a 16-bit address value in the branch instruction and a 2-bit block identification value. It will be understood that these bit counts can be changed without departing from the scope of the invention.

The preferred embodiments of the invention have been described above; however, minor changes and modifications can be made without departing from the spirit and scope of the invention. Other embodiments are within the scope of the following claims.
1057-4473-PF.ptd