TW200414035A - Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor - Google Patents

Info

Publication number
TW200414035A
Authority
TW
Taiwan
Prior art keywords
instruction
queue
signal
clock cycle
deletion
Prior art date
Application number
TW93100761A
Other languages
Chinese (zh)
Other versions
TWI249131B (en
Inventor
Thomas C Mcdonald
Original Assignee
Ip First Llc
Priority date
Filing date
Publication date
Application filed by Ip First Llc
Publication of TW200414035A
Application granted
Publication of TWI249131B

Abstract

An apparatus for killing an instruction after it has already been loaded into an instruction queue of a microprocessor is disclosed. The apparatus includes control logic that detects a condition in which the instruction must not be executed, such as a branch instruction misprediction; however, the control logic determines the condition too late to prevent the instruction from being loaded into the instruction queue. The control logic generates a kill signal indicating the instruction must not be executed. A kill queue receives the kill signal and stores its value. The kill queue maintains its entries in parallel with the instruction queue entries so that when the instruction queue subsequently outputs the instruction, the kill queue also outputs the value of the kill signal associated with the instruction. If the kill signal value output from the kill queue is true, then the microprocessor invalidates the instruction and does not execute it.

Description

Technical Field

The present invention relates to instruction buffering in a microprocessor, and more particularly to killing an instruction after it has been loaded into an instruction buffer.

This application claims priority to U.S. Patent Application No. 60/440063, filed January 14, 2003, entitled APPARATUS AND METHOD FOR KILLING INSTRUCTIONS DETERMINED INVALID AFTER INSTRUCTION FORMATTING IN A MICROPROCESSOR EMPLOYING A BRANCH TARGET ADDRESS CACHE IN AN EARLY PIPELINE STAGE.

Prior Art

Modern microprocessors are pipelined microprocessors; that is, they operate on multiple instructions simultaneously in different blocks, or pipeline stages, of the microprocessor. Hennessy and Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution" (Computer Architecture: A Quantitative Approach, 2nd edition, John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, CA, 1996).
They also provide the following excellent illustration of pipelining:

A pipeline is like an assembly line. In an automobile assembly line there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step completes a part of an instruction. Like the assembly line, different steps complete different parts of different instructions in parallel. Each of these steps is called a pipeline stage or a pipeline segment. The stages are connected one to the next to form a complete pipeline: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would on an assembly line.

Synchronous microprocessors operate according to clock cycles. Typically, an instruction passes from one pipeline stage of the microprocessor to the next during one clock cycle. On an automobile assembly line, if the workers at one stage sit idle because they have no car to work on, the productivity of the whole line drops. Similarly, if a microprocessor stage sits idle during a clock cycle because it has no instruction to operate on (a situation commonly referred to as a pipeline bubble), the performance of the microprocessor drops.

A common way to avoid pipeline bubbles is to place an instruction buffer, typically organized as a queue, between pipeline stages. An instruction buffer provides elasticity when the stages before and after it process instructions at different rates.
For example, an instruction buffer is useful when the execution stages (the lower stages of the pipeline) need instructions to operate on but the instruction cache at the top of the pipeline does not hold them. In that case, the instruction buffer can supply instructions to the execution stages while the memory fetch is in progress, reducing the impact of the instruction cache miss.

Branch instructions are another potential cause of pipeline bubbles. When the processor encounters a branch instruction, it must determine the branch target address and fetch instructions from the target address rather than from the next sequential address after the branch. Furthermore, if the branch is a conditional branch (that is, one that is taken or not taken depending on whether a specified condition holds), the processor must determine not only the target address but also whether the branch will be taken. Because the pipeline stages that determine the target address and/or the branch outcome lie below the stages that fetch instructions, pipeline bubbles can result.

Instruction buffering can indeed reduce the number of pipeline bubbles, but modern microprocessors also typically employ branch prediction mechanisms to predict the target address and/or the branch outcome early, further reducing the problem. However, if a branch is mispredicted, the instructions fetched as a result of the prediction, whether next sequential instructions or target-address instructions, must not be executed; otherwise the program will produce incorrect results.

Correcting a mispredicted branch is one example of a situation in which instructions already fetched into the microprocessor must be killed, that is, must not be allowed to proceed down the pipeline and execute. In practice, however, the determination that an instruction must be killed may come only after the instruction has already been written into an instruction buffer. Therefore, what is needed is a way to kill an instruction that has already been written into an instruction buffer.
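The buffering argument above can be illustrated with a toy model. The sketch below is not taken from the patent: the function name, the per-cycle fetch pattern, and the simplification that instructions arriving at a full buffer are dropped (a real design would stall the fetch stage instead) are all assumptions made for illustration. It counts the cycles in which the consuming stage has nothing to work on.

```python
def count_bubbles(fetch_per_cycle, buffer_capacity):
    """Count cycles in which the consuming stage has no instruction.

    fetch_per_cycle: instructions delivered by the fetch stage in each cycle
                     (0 models a cycle lost to an instruction cache miss).
    buffer_capacity: number of entries in the buffer between the two stages.
    """
    buffered = 0
    bubbles = 0
    for fetched in fetch_per_cycle:
        # Accept new instructions up to the buffer capacity (assumption:
        # anything that does not fit is simply dropped in this toy model).
        buffered = min(buffer_capacity, buffered + fetched)
        if buffered > 0:
            buffered -= 1   # the consuming stage takes one instruction
        else:
            bubbles += 1    # nothing available: a pipeline bubble
    return bubbles
```

With the fetch pattern `[2, 2, 2, 0, 0, 0]` (three productive cycles followed by a three-cycle miss), a one-entry buffer leaves the consumer idle for three cycles in this model, while a four-entry buffer hides the miss entirely.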
Summary of the Invention

In one aspect, the present invention provides an apparatus for killing an instruction that is loaded into an instruction queue of a microprocessor during a first clock cycle and output from the bottom of the queue during a second, subsequent clock cycle. The apparatus includes: a kill signal, which conveys a value generated during a third clock cycle subsequent to the first clock cycle; a kill queue, coupled to the kill signal, which loads the kill signal value generated during the third clock cycle and outputs that value during the second clock cycle; and a valid signal, coupled to the kill queue and generated during the second clock cycle, which indicates whether the instruction is to be executed by the microprocessor. If the kill signal value output by the kill queue during the second clock cycle is true, the valid signal is false.

In another aspect, the present invention provides a method for killing an instruction in a microprocessor. The method includes: loading the instruction into a first queue during a first clock cycle; generating a kill signal during a subsequent clock cycle and loading the value of the kill signal into a second queue; and, during a third clock cycle, outputting the instruction from the bottom of the first queue, determining whether the value in the second queue is true, and not executing the instruction if the value is true.

In another aspect, the present invention provides a microprocessor. The microprocessor includes: a first queue, which receives an instruction in order to buffer it; logic, coupled to the first queue, which detects a condition in which the instruction must not be executed by the microprocessor and generates a true signal to indicate the condition, the signal being generated after the instruction has been received by the first queue; and a second queue, coupled to the logic, which loads the true signal and outputs it at the same time that the first queue outputs the instruction. The microprocessor invalidates the instruction in response to the true signal.
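As a rough illustration of the aspects summarized above, the following sketch models an instruction queue and a kill queue that are kept in lockstep. It is a software analogy under stated assumptions, not the disclosed circuit: the class and method names are hypothetical, and the kill value is assumed to arrive for the most recently loaded entry before any further instruction is loaded.

```python
from collections import deque


class KillableQueue:
    """Toy model: an instruction queue with a parallel kill queue."""

    def __init__(self):
        self.instrs = deque()  # instruction queue, bottom entry at index 0
        self.kills = deque()   # kill queue, one entry per instruction

    def load(self, instr):
        """Load an instruction; its kill value is not known yet."""
        self.instrs.append(instr)
        self.kills.append(False)

    def set_kill_newest(self, kill):
        """Record the late-arriving kill value for the newest entry."""
        self.kills[-1] = kill

    def output(self):
        """Output the bottom instruction together with its valid signal."""
        instr = self.instrs.popleft()
        kill = self.kills.popleft()
        valid = not kill  # valid is false whenever the kill value is true
        return instr, valid
```

For example, if `"add"` is loaded with a kill value of false and `"sub"` is then loaded and later marked with a kill value of true, `output()` yields `("add", True)` followed by `("sub", False)`: the second instruction still leaves the queue, but it is flagged as invalid so that later stages do not execute it.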
In another aspect, the present invention provides computer data embodied in a transmission medium. The computer data includes computer-readable program code that provides an apparatus for killing an instruction that is loaded into an instruction queue of a microprocessor during a first clock cycle and output from the bottom of the queue during a second, subsequent clock cycle. The program code includes: first program code for providing a kill signal, which conveys a value generated during a third clock cycle; second program code for providing a kill queue, coupled to the kill signal, which loads the kill signal value generated during the third clock cycle and outputs that value during the second clock cycle; and third program code for providing a valid signal, coupled to the kill queue and generated during the second clock cycle, which indicates whether the instruction is to be executed by the microprocessor. If the kill signal value output by the kill queue during the second clock cycle is true, the valid signal is false.

One advantage of the present invention is that it enables instruction killing in a microprocessor pipeline that employs an instruction queue, where the need to kill an instruction is determined late. Another advantage is that it allows the kill signal to be generated late without requiring an additional pipeline stage to store instructions.

To make the above and other objects, features, and advantages of the present invention more readily understood, a preferred embodiment is described in detail below with reference to the accompanying drawings.

Detailed Description

FIG. 1 is a block diagram of a microprocessor 100 according to the present invention. Microprocessor 100 is a pipelined processor with multiple pipeline stages. FIG. 1 shows a portion of the stages, including an instruction fetch stage (I-stage) 151, an instruction format stage (F-stage) 153, a translate stage (X-stage) 155, and a register stage (R-stage) 157. I-stage 151 is the stage that fetches instruction bytes from memory or from an instruction cache.
In one embodiment, I-stage 151 comprises multiple stages. F-stage 153 is the stage that formats a stream of unformatted instruction bytes into formatted instructions. X-stage 155 is the stage that translates macroinstructions into microinstructions. R-stage 157 is the register stage, which loads operands from a register file. The execution stages of microprocessor 100 that follow R-stage 157, such as address generation, execute, store, and result write-back stages, are not shown in FIG. 1.

Microprocessor 100 includes an instruction cache 104 in I-stage 151. Instruction cache 104 caches instructions fetched from a system memory coupled to microprocessor 100. Instruction cache 104 receives a current fetch address 181 and, in response, selects and outputs a cache line of instruction bytes 167. In one embodiment, instruction cache 104 is a multi-stage cache; that is, it requires multiple clock cycles to output a cache line in response to current fetch address 181.

Microprocessor 100 also includes a multiplexer 178 in I-stage 151. Multiplexer 178 provides current fetch address 181. Multiplexer 178 receives a next sequential fetch address 179, which is current fetch address 181 incremented by the size of a cache line output by instruction cache 104. Multiplexer 178 also receives a correction address 177, which specifies the address used by microprocessor 100 to correct a branch misprediction. Multiplexer 178 also receives a predicted branch target address 175.
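The three inputs of multiplexer 178 can be sketched as a small selection function. The selection conditions follow the description of control logic 102 given later in this section; everything else is an assumption made for illustration: the function and parameter names are hypothetical, the text does not state a cache line size (32 bytes is used here), and giving the correction address priority over a BTAC prediction is an assumed ordering that the text does not spell out.

```python
CACHE_LINE_SIZE = 32  # bytes; assumed value, not given in the text


def next_fetch_address(current, btac_predicts_taken, btac_target,
                       mispredicted, correction):
    """Model multiplexer 178: choose the fetch address for the next cycle."""
    if mispredicted:
        # Correction address 177 (priority over the BTAC is an assumption).
        return correction
    if btac_predicts_taken:
        # Predicted branch target address 175 supplied by the BTAC.
        return btac_target
    # Next sequential fetch address 179: current address plus one cache line.
    return current + CACHE_LINE_SIZE
```

With no prediction and no misprediction, a current fetch address of `0x1000` advances to `0x1020` under the assumed line size.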
Microprocessor 100 also includes a branch target address cache (BTAC) 106 in I-stage 151, coupled to multiplexer 178. BTAC 106 generates predicted branch target address 175 in response to current fetch address 181. BTAC 106 caches the branch target addresses and instruction addresses of previously executed branch instructions. In one embodiment, BTAC 106 is a 4-way set-associative cache, and each way of a selected set contains multiple entries for storing the target address and branch prediction information of a predicted branch instruction. In addition to predicted branch target address 175, BTAC 106 also outputs branch prediction related information 194. In one embodiment, BTAC information 194 includes: an offset specifying the first byte of the predicted branch instruction within the cache line selected by current fetch address 181; an indication of whether the predicted branch instruction wraps across a half cache line; a valid bit for each of the selected entries; an indication of which way in the selected set is least recently used; an indication of which entry in the selected way is least recently used; and a prediction of whether the branch instruction will be taken.

Microprocessor 100 also includes control logic 102. If current fetch address 181 matches the valid cached address of a previously executed branch instruction in BTAC 106, and BTAC 106 predicts that the branch instruction will be taken, control logic 102 controls multiplexer 178 to select BTAC target address 175. If a branch misprediction occurs, control logic 102 controls multiplexer 178 to select correction address 177.
Otherwise, control logic 102 controls multiplexer 178 to select next sequential fetch address 179. Control logic 102 also receives BTAC information 194.

Microprocessor 100 also includes predecode logic 108 in I-stage 151, coupled to instruction cache 104. Predecode logic 108 receives the cache line of instruction bytes 167 provided by instruction cache 104 and BTAC information 194, and generates predecode information 169 in response. In one embodiment, predecode information 169 includes: a bit associated with each instruction byte predicting whether the byte is the opcode byte of a branch instruction predicted taken by BTAC 106; bits predicting the length of the next instruction based on the predicted instruction length; a bit associated with each instruction byte predicting whether the byte is a prefix byte of an instruction; and a prediction of the outcome of a branch instruction.

Microprocessor 100 also includes an instruction byte buffer 112 in F-stage 153, coupled to predecode logic 108. Instruction byte buffer 112 receives predecode information 169 from predecode logic 108 and instruction bytes 167 from instruction cache 104. Instruction byte buffer 112 provides the predecode information to control logic 102 via signal 196. In one embodiment, instruction byte buffer 112 can buffer four cache lines of instruction bytes and their associated predecode information.

Microprocessor 100 also includes instruction byte buffer control logic 114, coupled to instruction byte buffer 112. Instruction byte buffer control logic 114 controls the flow of instruction bytes and associated predecode information into and out of instruction byte buffer 112.
Instruction byte buffer control logic 114 also receives BTAC information 194.

Microprocessor 100 also includes an instruction formatter 116 in F-stage 153, coupled to instruction byte buffer 112. Instruction formatter 116 receives instruction bytes and predecode information 165 from instruction byte buffer 112 and generates formatted instruction 197 from them. That is, instruction formatter 116 examines a string of instruction bytes in instruction byte buffer 112, determines which bytes make up the next instruction and how long it is, and outputs the next instruction as formatted instruction 197. In the embodiment of FIG. 1, instruction formatter 116 comprises combinational logic that examines instruction bytes 165 provided by instruction byte buffer 112 and outputs formatted instruction 197 within the same clock cycle. In one embodiment, the formatted instructions provided on formatted instruction 197 are instructions substantially conforming to the x86 architecture instruction set. In one embodiment, the formatted instructions are also referred to as macroinstructions, which are translated into microinstructions that are executed by the execution stages of the microprocessor 100 pipeline. Formatted instruction 197 is generated in F-stage 153. Each time instruction formatter 116 outputs a formatted instruction 197, it generates a true value on F_new_instr signal 152 to indicate that formatted instruction 197 holds a valid formatted instruction. In addition, instruction formatter 116 outputs information related to formatted instruction 197 via F_instr_info signal 198, which is provided to control logic 102.
In one embodiment, F_instr_info signal 198 includes: a prediction, if the instruction is a branch instruction, of whether the branch will be taken; a prefix of the instruction; whether the address of the instruction hit in a branch address buffer of the microprocessor; whether the instruction is a far direct branch instruction; whether the instruction is a far indirect branch instruction; whether the instruction is a call branch instruction; whether the instruction is a return branch instruction; whether the instruction is a far return branch instruction; whether the instruction is an unconditional branch instruction; and whether the instruction is a conditional branch instruction. In addition, instruction formatter 116 outputs the address of the formatted instruction via current instruction pointer (CIP) signal 182; this address equals the address of the previous instruction plus the length of the previous instruction.
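The address rule for CIP 182 amounts to a running sum over instruction lengths. The sketch below illustrates it for a straight-line run of instructions; the function name and the example lengths are hypothetical, and taken branches, which would redirect the pointer, are deliberately left out.

```python
def instruction_pointers(start_address, lengths):
    """Return the address of each instruction in a sequential run.

    Each address is the previous instruction's address plus the previous
    instruction's length, as described for the CIP signal.
    """
    addresses = []
    cip = start_address
    for length in lengths:
        addresses.append(cip)
        cip += length  # address of the next instruction
    return addresses
```

For instance, three instructions of 1, 3, and 2 bytes starting at `0x100` are placed at `0x100`, `0x101`, and `0x104`.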

Microprocessor 100 also includes a formatted instruction queue (FIQ) 187 in X-stage 155. Formatted instruction queue 187 receives formatted instruction 197 from instruction formatter 116. Formatted instruction queue 187 outputs a formatted instruction on early0 signal 193. In addition, formatted instruction queue 187 receives from control logic 102, via X_rel_info signal 186, information related to the formatted instructions received on formatted instruction 197. X_rel_info 186 is generated in X-stage 155. Formatted instruction queue 187 also outputs on late0 signal 191 the information related to the formatted instruction that it outputs on early0 signal 193. Formatted instruction queue 187 and X_rel_info 186 are described in more detail below.
Microprocessor 100 also includes FIQ control logic 118. FIQ control logic 118 receives F_new_instr signal 152 from instruction formatter 116. When formatted instruction queue 187 is full, FIQ control logic 118 generates a true value on FIQ_full signal 199, which is provided to instruction formatter 116. FIQ control logic 118 also generates an eshift signal 164 to control the shifting of instructions within formatted instruction queue 187. FIQ control logic 118 also generates a plurality of eload signals 162 to control the loading of an instruction from formatted instruction 197 into an empty entry of formatted instruction queue 187. In one embodiment, FIQ control logic 118 generates one eload signal 162 for each entry of formatted instruction queue 187. In one embodiment, formatted instruction queue 187 has 12 entries, each storing one formatted macroinstruction. However, for clarity of illustration, formatted instruction queue 187 is shown with only three entries in FIGS. 1 through 3; accordingly, FIG. 1 shows three eload signals 162, denoted eload[2:0].

FIQ control logic 118 also maintains a valid bit 134 for each entry of formatted instruction queue 187. The embodiment of FIG. 1 includes three valid bits 134, denoted FV2, FV1, and FV0. FV0 134 corresponds to the bottom entry of formatted instruction queue 187; FV1 134 corresponds to the middle entry; and FV2 134 corresponds to the top entry. FIQ control logic 118 also outputs an F_valid signal 188, which in one embodiment is FV0 134. Valid bits 134 indicate whether the corresponding entry of formatted instruction queue 187 contains a valid instruction. FIQ control logic 118 also receives an XIQ_full signal 195.
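The bookkeeping described above (per-entry valid bits, a full indication, loading into an empty entry, and shifting toward the bottom) can be sketched as a small software model. This is an assumed simplification rather than the disclosed logic: the class and member names are hypothetical, and the rule that a load always goes into the lowest empty entry is an assumption about how the per-entry eload signals would be chosen.

```python
class ShiftQueue:
    """Toy shift-down queue with one valid bit per entry (FV2..FV0 for depth 3)."""

    def __init__(self, depth=3):
        self.entries = [None] * depth  # index 0 is the bottom entry
        self.valid = [False] * depth   # valid[0] plays the role of FV0

    @property
    def full(self):
        """Models FIQ_full: true when every entry holds a valid instruction."""
        return all(self.valid)

    @property
    def f_valid(self):
        """Models F_valid, which in one embodiment is simply FV0."""
        return self.valid[0]

    def load(self, instr):
        """Load into the lowest empty entry (assumed eload selection rule)."""
        if self.full:
            raise RuntimeError("queue is full")
        i = self.valid.index(False)
        self.entries[i] = instr
        self.valid[i] = True

    def shift(self):
        """Shift all entries down one position; return the old bottom entry."""
        bottom = self.entries[0] if self.valid[0] else None
        self.entries = self.entries[1:] + [None]
        self.valid = self.valid[1:] + [False]
        return bottom
```

Because loads fill the lowest empty entry and shifts move everything down together, the valid entries in this model always stay contiguous from the bottom of the queue.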
Microprocessor 100 also includes an instruction translator 138 in X-stage 155, coupled to formatted instruction queue 187. Instruction translator 138 receives a formatted instruction from formatted instruction queue 187 on early0 signal 193 and translates the formatted macroinstruction into one or more microinstructions 171. In one embodiment, microprocessor 100 includes a reduced instruction set computer (RISC) core that executes microinstructions of a native, or reduced, instruction set. In the embodiment of FIG. 1, instruction translator 138 comprises combinational logic that receives the formatted macroinstruction on early0 193 and outputs translated microinstruction 171 within the same clock cycle. That is, instruction translator 138 translates whatever is present on its input every clock cycle, regardless of whether the input contains a valid macroinstruction.

Microprocessor 100 also includes a translated instruction queue (XIQ) 154 in X-stage 155, coupled to instruction translator 138. XIQ 154 buffers microinstructions 171 received from instruction translator 138. XIQ 154 also buffers the related information received from formatted instruction queue 187 on late0 signal 191. This information relates to the formatted macroinstruction from which microinstruction 171 was translated, and therefore also relates to microinstruction 171. The related information is used by the execution stages of microprocessor 100 to execute the associated microinstruction 171. In one embodiment, XIQ 154 has four entries; in other embodiments, XIQ 154 has six or eight entries. However, for clarity, XIQ 154 is shown with only three entries in FIG. 1.

Microprocessor 100 also includes XIQ control logic 156, coupled to XIQ 154. XIQ control logic 156 receives F_valid signal 188 and generates XIQ_full signal 195.
The XIQ control logic 156 also generates an X_load signal 164 to control loading of the translated microinstruction 171 and its related information into the XIQ 154. The XIQ control logic 156 also generates an X_shift signal 111 to control the downward shifting of microinstructions within the XIQ 154. The XIQ control logic 156 also maintains a valid bit 149 for each entry of the XIQ 154. The embodiment of FIG. 1 includes three valid bits, labeled XV2, XV1, and XV0. XV0 149 is the valid bit for the bottom entry of the XIQ 154; XV1 149 is the valid bit for the middle entry; XV2 149 is the valid bit for the top entry. The XIQ control logic 156 also outputs an X_valid signal 148, which in one embodiment is XV0 149. Each valid bit 149 indicates whether the corresponding entry of the XIQ 154 contains a valid translated microinstruction.

The microprocessor 100 also includes, in its X-stage 155, a two-input multiplexer 172 coupled to the XIQ 154. The multiplexer 172 operates as a bypass multiplexer that selectively bypasses the XIQ 154. The multiplexer 172 receives the output of the XIQ 154 on one input and, on the other input, the signals that feed the XIQ 154, namely the microinstruction 171 and late0 191. Under control of a control signal 161 generated by the XIQ control logic 156, the multiplexer 172 selects one of its inputs and outputs it to an execution stage register 176 in the R-stage 157. If the execution stage register 176 is ready to receive an instruction and the XIQ 154 is empty when the instruction translator 138 outputs the microinstruction 171, then the multiplexer 172, under control of the XIQ control logic 156, bypasses the XIQ 154. The microprocessor 100 also includes a valid bit register RV 189, which receives the X_valid signal 148 from the XIQ control logic 156 and thereby indicates whether the microinstruction and related information stored in the execution stage register 176 are valid.
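The bypass decision just described can be sketched in Python as follows. This is only an illustrative software model under stated assumptions: the function name `select_bypass`, its parameters, and the representation of a microinstruction and its related information as a tuple are hypothetical and do not come from the disclosure, and validity checking and the loading of the XIQ itself are deliberately left out.

```python
from collections import deque


def select_bypass(xiq, translator_out, exec_reg_ready):
    """Hypothetical model of bypass multiplexer 172.

    xiq            -- deque of (microinstruction, related_info) tuples;
                      index 0 stands for the bottom entry of XIQ 154
    translator_out -- the (microinstruction 171, late0 191) pair currently
                      presented at the input of XIQ 154
    exec_reg_ready -- True if execution stage register 176 can accept an
                      instruction in this cycle
    Returns the value forwarded to the execution stage register, or None
    if nothing is forwarded.
    """
    if not exec_reg_ready:
        return None                 # register 176 cannot accept anything
    if len(xiq) == 0:
        return translator_out       # XIQ empty: bypass the queue
    return xiq.popleft()            # otherwise forward the bottom XIQ entry
```

In this sketch the queue is bypassed only when it is empty and the execution stage register is ready, which mirrors the condition stated in the text; everything else about the XIQ control logic 156 is abstracted away.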
The formatted instruction queue 187 includes an early queue 132, which stores the formatted macroinstructions received via the formatted instruction signal 197, and a corresponding late queue 146, which stores the related information received via the X_rel_info signal 186. FIG. 1 shows the early queue 132 with three entries, labeled EE2, EE1, and EE0. EE0 is the bottom entry of the early queue 132, EE1 is the middle entry, and EE2 is the top entry. The contents of EE0 are provided as the output signal early0 193. The eshift signal 164 and the eload[2:0] signal 162 control the shifting and loading of the early queue 132. Similarly, FIG. 1 shows the late queue 146 with three entries, labeled LE2, LE1, and LE0. LE0 is the bottom entry of the late queue 146, LE1 is the middle entry, and LE2 is the top entry. The contents of LE0 are provided as the output signal late0 191.

The formatted instruction queue 187 also includes a register 185. The register 185 receives the eshift signal 164 from the FIQ control logic 118 at the end of a first clock cycle and, on the next clock cycle, outputs on the lshift signal 168 the value of the eshift signal 164 received during the first clock cycle. The formatted instruction queue 187 also includes a register 183. The register 183 receives the eload[2:0] signal 162 from the FIQ control logic 118 at the end of the first clock cycle and, on the next clock cycle, outputs on the lload[2:0] signal 142 the value of the eload[2:0] signal 162 received during the first clock cycle. That is, the registers 185 and 183 output the eshift signal 164 and the eload[2:0] signal 162, respectively, delayed by one clock cycle.
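The one-cycle delay performed by the registers 185 and 183 can be illustrated with a small Python model. It is a sketch under assumptions rather than the disclosed circuit: the class `DelayRegister`, the helper `step`, and the reset values are hypothetical names and choices introduced only for this illustration.

```python
class DelayRegister:
    """One-clock delay: the output in cycle N+1 is the input sampled in cycle N."""

    def __init__(self, reset_value):
        self.q = reset_value        # value visible on the output this cycle

    def clock(self, d):
        """Rising clock edge: capture d so it is visible in the next cycle."""
        self.q = d


# Hypothetical instances standing in for registers 185 and 183.
reg185 = DelayRegister(False)       # eshift 164     -> lshift 168
reg183 = DelayRegister(0b000)       # eload[2:0] 162 -> lload[2:0] 142


def step(eshift, eload):
    """Advance one clock cycle and return (lshift, lload) seen in that cycle."""
    lshift, lload = reg185.q, reg183.q   # outputs reflect the previous cycle
    reg185.clock(eshift)
    reg183.clock(eload)
    return lshift, lload
```

Calling `step` once per clock cycle shows the late-queue controls trailing the early-queue controls by exactly one cycle, which is what lets the late queue 146 load information that only becomes available a cycle after the instruction itself.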
In one embodiment, X_rel_info 186 includes: the length of the formatted macroinstruction from which the corresponding microinstruction was translated; an indication of the macroinstruction's relationship to a cache line; a storage location of the macroinstruction; a current position of the instruction; the instruction pointer of the macroinstruction; and, if the macroinstruction is predicted to be a branch instruction, various information related to branch prediction and correction.

In one embodiment, the information related to branch prediction and correction includes: branch history table information used to predict whether the branch instruction will be taken; a portion of the linear instruction pointer of the branch instruction used to predict whether the branch instruction will be taken; a branch pattern that is combined with that linear instruction pointer in an OR-type logical operation to predict whether the branch instruction will be taken; a second branch pattern used for reverting if the branch prediction turns out to be wrong; various flag bits indicating characteristics of the branch instruction, such as whether the branch instruction is a conditional branch instruction, a call instruction, a return-stack target branch, or an indirect branch, and whether the prediction of the branch instruction's outcome was made by a particular predictor; and various information related to the prediction made by the BTAC 106, such as whether the current fetch address 181 matches an address in the BTAC 106, whether the matching address is valid, whether the branch instruction was predicted taken or not taken, the most recently used entry of the BTAC 106 set selected by the fetch address 181, which entry of the selected set should be replaced if execution of the instruction requires the BTAC 106 to be updated, and the target address output by the BTAC 106. In one embodiment, a portion of X_rel_info 186 is generated during the previous clock cycle and is provided as input together with the related information generated one clock cycle after the macroinstruction is provided from entry EE0 of the early queue 132 on the early0 signal 193.

The microprocessor 100 also includes, in its X-stage 155, a kill queue 145 coupled to the FIQ control logic 118. The kill queue 145 stores a kill signal 141 generated by the control logic 102.
The control logic 102 generates a true value on the kill signal 141 to indicate that the macroinstruction carried on the formatted instruction signal 197 and received by the early queue 132 during the previous clock cycle must not be executed by the microprocessor 100. The kill queue 145 has the same number of entries as the formatted instruction queue 187. FIG. 1 shows the kill queue 145 with three entries, labeled KE2, KE1, and KE0, which correspond to the entries of the formatted instruction queue 187 shown in FIG. 1. KE0 is the bottom entry of the kill queue, KE1 is the middle entry, and KE2 is the top entry. As shown in FIGS. 4, 5, and 6, the contents of KE0 are provided on the output signal kill0 143. The kill queue 145 receives the lload[2:0] signal 142, the lshift signal 168, and the eshift signal 164, which control the loading and shifting of the kill queue 145. The kill queue is explained further below in the discussion of FIGS. 4, 5, and 6.

The control logic 102 generates a true value on the kill signal 141 based on various conditions detected from the BTAC information 194, predecode_info 196, F_instr_info 198, and the current instruction pointer 182. One such condition is the detection that the BTAC 106 mispredicted a branch instruction. In one embodiment, the BTAC 106 mispredicts a branch instruction by mispredicting its length, i.e., the predicted instruction length differs from the length determined by the instruction formatter 116. In one embodiment, the BTAC 106 mispredicts by predicting that an ordinary instruction is a branch instruction, i.e., the BTAC 106 predicts that an instruction is a branch instruction and the instruction formatter 116 determines that it is not. In one embodiment, the BTAC 106 mispredicts by mispredicting the address of the branch instruction, i.e., the sum of the instruction offset bits predicted and output by the BTAC 106 and the fetch address 181 that the BTAC 106 used to make the prediction does not equal the instruction address 182 generated by the instruction formatter 116.

In one embodiment, when the BTAC 106 makes such a misprediction, the mispredicted instruction and the instructions that follow it must be killed; the control logic 102 therefore generates a true value on the kill signal 141 for each instruction that must be killed. The control logic 102 generates the kill signal 141 one clock cycle after the instruction is provided to the instruction formatter 116. In addition, the control logic 102 provides information on an invalidate signal 147 to invalidate the BTAC 106 entry that produced the misprediction. After the control logic 102 invalidates the mispredicting BTAC 106 entry, the control logic 102 controls the multiplexer 178 to select the correction address 177 so that the mispredicted instruction and the instructions following it are fetched again, thereby correcting the earlier misprediction.


Because the mispredicting entry in the BTAC 106 is now invalid, the BTAC 106 will not again predict the previously mispredicted instruction to be a taken branch instruction; consequently, whether or not the instruction is a branch instruction, it will be formatted by the instruction formatter 116, translated by the instruction translator 138, and executed by the execution stages of the microprocessor pipeline 100.

Another condition in which the control logic 102 generates a true value on the kill signal 141 is when the control logic 102 causes the microprocessor 100 to branch to a target address produced by the BTAC 106 in response to its prediction that a branch instruction will be taken. In this case, any instructions following the branch instruction that were fetched from the instruction cache 104 and delivered to the instruction byte buffer 112 must be killed. The control logic 102 therefore generates a true value on the kill signal 141 for each instruction that must be killed. The control logic 102 generates this kill signal 141 one clock cycle after the instruction is provided to the instruction formatter 116. For example, given two instructions of which the BTAC 106 predicts the first to be a taken branch instruction, the control logic 102 kills the second instruction.

FIG. 2 is a block diagram of the early queue 132 of the formatted instruction queue 187 of FIG. 1 according to the present invention. The early queue 132 includes three muxed-registers connected in series to form a queue. The three muxed-registers make up the entries EE2, EE1, and EE0 shown in FIG. 1.

The muxed-register at the top of the early queue 132 includes a two-input multiplexer 212 and a register 222, labeled ER2, that receives the output of the multiplexer 212.

The multiplexer 212 includes a load data input that receives the formatted instruction signal 197 of FIG. 1. The multiplexer 212 also includes a hold data input that receives the output of the register ER2 222. The multiplexer 212 receives the eload[2] signal 162 of FIG. 1 as its control input. If eload[2] 162 is true, the multiplexer 212 selects the formatted instruction signal 197 on the load data input; otherwise, the multiplexer 212 selects the output of the register ER2 222 on the hold data input. The register ER2 222 loads the output of the multiplexer 212 on the rising edge of the clock signal clk 202.

The muxed-register in the middle of the early queue 132 includes a three-input multiplexer 211 and a register 221, labeled ER1, that receives the output of the multiplexer 211. The multiplexer 211 includes a load data input that receives the formatted instruction signal 197. The multiplexer 211 also includes a hold data input that receives the output of the register ER1 221. The multiplexer 211 also includes a shift data input that receives the output of the register ER2 222. The multiplexer 211 receives the eload[1] signal 162 of FIG. 1 as a control input. The multiplexer 211 also receives the eshift signal 164 of FIG. 1 as a control input. If eload[1] 162 is true, the multiplexer 211 selects the formatted instruction signal 197 on the load data input; if the eshift signal 164 is true, the multiplexer 211 selects the output of the register ER2 222 on the shift data input; otherwise, the multiplexer 211 selects the output of the register ER1 221 on the hold data input. The register ER1 221 loads the output of the multiplexer 211 on the rising edge of clk 202.

The muxed-register at the bottom of the early queue 132 includes a three-input multiplexer 210 and a register 220, labeled ER0, that receives the output of the multiplexer 210. The multiplexer 210 includes a load data input that receives the formatted instruction signal 197. The multiplexer 210 also includes a hold data input that receives the output of the register ER0 220. The multiplexer 210 also includes a shift data input that receives the output of the register ER1 221. The multiplexer 210 receives the eload[0] signal 162 of FIG. 1 as a control input. The multiplexer 210 also receives the eshift signal 164 of FIG. 1 as a control input. If eload[0] 162 is true, the multiplexer 210 selects the formatted instruction signal 197 on the load data input; if the eshift signal 164 is true, the multiplexer 210 selects the output of the register ER1 221 on the shift data input; otherwise, the multiplexer 210 selects the output of the register ER0 220 on the hold data input. The register ER0 220 loads the output of the multiplexer 210 on the rising edge of clk 202. The register ER0 220 provides its output as the early0 signal 193.
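The load, shift, and hold behavior of the three muxed-registers of FIG. 2 can be summarized in a short Python model. This is a behavioral sketch only, under the assumption that an entry can be represented by any Python value; the class name `EarlyQueue`, the method `clock`, and the list-based representation are hypothetical and are not taken from the disclosure.

```python
class EarlyQueue:
    """Illustrative three-entry load/shift/hold queue modelled on FIG. 2."""

    def __init__(self):
        # er[0] stands for ER0 (bottom), er[1] for ER1, er[2] for ER2 (top)
        self.er = [None, None, None]

    def clock(self, formatted_instr, eload, eshift):
        """Apply one rising clock edge.

        formatted_instr -- value on the formatted instruction signal 197
        eload           -- three booleans; eload[i] loads entry i
        eshift          -- True moves each entry down by one position
        As in the text, load is checked first, then shift, otherwise hold.
        """
        er0, er1, er2 = self.er
        nxt2 = formatted_instr if eload[2] else er2                        # mux 212
        nxt1 = formatted_instr if eload[1] else (er2 if eshift else er1)  # mux 211
        nxt0 = formatted_instr if eload[0] else (er1 if eshift else er0)  # mux 210
        self.er = [nxt0, nxt1, nxt2]

    @property
    def early0(self):
        """Output of register ER0, i.e. the early0 signal 193."""
        return self.er[0]
```

Note that in this model the top entry simply keeps its old value on a shift, because the multiplexer 212 as described has only load and hold inputs; tracking which entries are valid is outside the scope of the sketch.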
FIG. 3 is a block diagram of the late queue 146 of the formatted instruction queue 187 of FIG. 1 according to the present invention. The late queue 146 includes three muxed-registers connected in series to form a queue. The three muxed-registers make up the entries LE2, LE1, and LE0 shown in FIG. 1.

The muxed-register at the top of the late queue 146 includes a two-input multiplexer 312 and a register 322, labeled LR2, that receives the output of the multiplexer 312. The multiplexer 312 includes a load data input that receives X_rel_info 186 of FIG. 1. The multiplexer 312 also includes a hold data input that receives the output of the register LR2 322.

The multiplexer 312 receives the lload[2] signal 142 as its control input. If lload[2] 142 is true, the multiplexer 312 selects X_rel_info 186 on the load data input; otherwise, the multiplexer 312 selects the output of the register LR2 322 on the hold data input. The register LR2 322 loads the output of the multiplexer 312 on the rising edge of the clock signal clk 202 shown in FIG. 2.

The muxed-register in the middle of the late queue 146 includes a three-input multiplexer 311 and a register 321, labeled LR1, that receives the output of the multiplexer 311. The multiplexer 311 includes a load data input that receives X_rel_info 186 of FIG. 1. The multiplexer 311 also includes a hold data input that receives the output of the register LR1 321. The multiplexer 311 also includes a shift data input that receives the output of the register LR2 322. The multiplexer 311 receives the lload[1] signal 142 as a control input. The multiplexer 311 also receives the lshift signal 168 as a control input. If lload[1] 142 is true, the multiplexer 311 selects X_rel_info 186 on the load data input; if lshift 168 is true, the multiplexer 311 selects the output of the register LR2 322; otherwise, the multiplexer 311 selects the output of the register LR1 321 on the hold data input. The register LR1 321 loads the output of the multiplexer 311 on the rising edge of clk 202 shown in FIG. 2.

The muxed-register at the bottom of the late queue 146 includes a three-input multiplexer 310 and a register 320, labeled LR0, that receives the output of the multiplexer 310. The multiplexer 310 includes a load data input that receives X_rel_info 186. The multiplexer 310 also includes a hold data input that receives the output of the register LR0 320.


The multiplexer 310 also includes a shift data input that receives the output of the register LR1 321. The multiplexer 310 receives the lload[0] signal 142 as a control input. The multiplexer 310 also receives the lshift signal 168 as a control input.
If lload[0] 142 is true, the multiplexer 310 selects X_rel_info 186 on the load data input; if lshift 168 is true, the multiplexer 310 selects the output of the register LR1 321; otherwise, the multiplexer 310 selects the output of the register LR0 320 on the hold data input. The register LR0 320 loads the output of the multiplexer 310 on the rising edge of clk 202 shown in FIG. 2. The multiplexer 310 provides its result as the late0 signal 191 of FIG. 1.

FIG. 4 is a block diagram of a first embodiment of the kill queue 145 of FIG. 1 according to the present invention. The structure of the kill queue embodiment of FIG. 4 is similar to that of the late queue 146 of FIG. 3. The kill queue includes three muxed-registers connected in series to form a queue. The three muxed-registers make up the entries KE2, KE1, and KE0 shown in FIG. 1.

The muxed-register at the top of the kill queue 145 includes a two-input multiplexer 412 and a register 422, labeled KR2, that receives the output of the multiplexer 412. The multiplexer 412 includes a load data input that receives the kill signal 141 of FIG. 1. The multiplexer 412 also includes a hold data input that receives the output of the register KR2 422. The multiplexer 412 receives the lload[2] signal 142 as its control input. If lload[2] 142 is true, the multiplexer 412 selects the kill signal 141 on the load data input; otherwise, the multiplexer 412 selects the output of the register KR2 422 on the hold data input. The register KR2 422 loads the output of the multiplexer 412 on the rising edge of clk 202 shown in FIG. 2.

The muxed-register in the middle of the kill queue 145 includes a three-input multiplexer 411 and a register 421, labeled KR1, that receives the output of the multiplexer 411. The multiplexer 411 includes a load data input that receives the kill signal 141. The multiplexer 411 also includes a hold data input that receives the output of the register KR1 421. The multiplexer 411 also includes a shift data input that receives the output of the register KR2 422. The multiplexer 411 receives the lload[1] signal 142 as a control input. The multiplexer 411 also receives the lshift signal 168 as a control input. If lload[1] 142 is true, the multiplexer 411 selects the kill signal 141 on the load data input; if lshift 168 is true, the multiplexer 411 selects the output of the register KR2 422; otherwise, the multiplexer 411 selects the output of the register KR1 421 on the hold data input. The register KR1 421 loads the output of the multiplexer 411 on the rising edge of clk 202 shown in FIG. 2.

The muxed-register at the bottom of the kill queue 145 includes a three-input multiplexer 410 and a register 420, labeled KR0, that receives the output of the multiplexer 410. The multiplexer 410 includes a load data input that receives the kill signal 141. The multiplexer 410 also includes a hold data input that receives the output of the register KR0 420. The multiplexer 410 also includes a shift data input that receives the output of the register KR1 421. The multiplexer 410 receives the lload[0] signal 142 as a control input. The multiplexer 410 also receives the lshift signal 168 as a control input. If lload[0] 142 is true, the multiplexer 410 selects the kill signal 141 on the load data input; if lshift 168 is true, the multiplexer 410 selects the output of the register KR1 421; otherwise, the multiplexer 410 selects the output of the register KR0 420 on the hold data input. The register KR0 420 loads the output of the multiplexer 410 on the rising edge of clk 202 shown in FIG. 2. The multiplexer 410 provides its result as the kill0 signal 143 of FIG. 1.

FIG. 5 is a block diagram of a second embodiment of the kill queue 145 of FIG. 1 according to the present invention. The kill queue 145 includes three muxed-registers connected to one another to form a queue. The three muxed-registers make up the entries KE2, KE1, and KE0 shown in FIG. 1.

The muxed-register at the top of the kill queue 145 includes a two-input multiplexer 512 and a register 522, labeled KR2, that receives the output of the multiplexer 512. The multiplexer 512 includes a load data input that receives the kill signal 141 of FIG. 1. The multiplexer 512 also includes a hold data input that receives the output of the register KR2 522. The multiplexer 512 receives the lload[2] signal 142 of FIG. 1 as its control input.

Hoacim 142值為真,多工器512選果 的刪除信號141 ;否則多工器512選中保負載料4 1入,上 f 2器KR2 5 2 2的輸出。暫存器KR2 5 22在二個t = = t的 載入多工器512的輸出值,此時脈週期標記為elk 刪除佇列145中段的選擇-暫存器包括_個3 多工器川的輸出。多卫器511包括一個 用來接收 用來接收刪除信號1 4 1。多工5 n、罗4 k 貝枓輸入端, 人端,用來接收暫存器KR1 521的輸出匕一個二持輪 -個轉換資料輸入端,用來接收暫存器KR2二1 輸還出包括The value of Hoacim 142 is true, and the multiplexer 512 selects the delete signal 141; otherwise, the multiplexer 512 selects the load protection material 4 1 input, and the output of the f 2 device KR2 5 2 2 is selected. The register KR2 5 22 is loaded with the output values of the two multiplexers 512 at t = = t. At this time, the pulse period is marked as elk. Delete the middle selection of queue 145. The register includes _ 3 multiplexer channels. Output. Multiplexer 511 includes a receiver 1 4 1 for receiving a delete signal. Multiplex 5 n and 4 k input terminals, human terminal, used to receive the output of the register KR1 521, a two-wheeled wheel-a conversion data input, used to receive the register KR2 21, 1 return and output include

Multiplexer 511 receives lload[1] signal 142 of FIG. 1 as a control input. Multiplexer 511 also receives lshift signal 168 of FIG. 1 as a control input. If lload[1] 142 is true, multiplexer 511 selects kill signal 141 on its load data input; if lshift signal 168 is true, multiplexer 511 selects the output of KR2 522 on its shift data input; otherwise, multiplexer 511 selects the output of register KR1 521 on its hold data input. Register KR1 521 loads the output of multiplexer 511 on the rising edge of clk 202.

The bottom multiplexed register of kill queue 145 comprises a two-input multiplexer 510, a register 520, denoted KR0, that receives the output of multiplexer 510, and a two-input multiplexer 509. Multiplexer 509 includes a load data input that receives kill signal 141 and a hold data input that receives the output of register KR0 520. Multiplexer 509 receives lload[0] signal 142 of FIG. 1 as its control input. If lload[0] 142 is true, multiplexer 509 selects kill signal 141 on its load data input; otherwise, multiplexer 509 selects the output of register KR0 520 on its hold data input. Multiplexer 510 includes a hold data input that receives the output of multiplexer 509, which output is kill0 signal 143 of FIG. 1. Multiplexer 510 also includes a shift data input that receives the output of multiplexer 511. Multiplexer 510 receives eshift signal 164 as its control input. If eshift signal 164 is true, multiplexer 510 selects the output of multiplexer 511 on its shift data input; otherwise, multiplexer 510 selects the output of multiplexer 509 on its hold data input. Register KR0 520 loads the output of multiplexer 510 on the rising edge of clk 202.
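The behavior of the three multiplexed registers of FIG. 5 described above can be sketched in software as follows. This is only an illustrative behavioral model under stated assumptions, not the circuit itself: the class name `KillQueue`, the method name `clock`, and the ordering of the `lload` tuple are hypothetical names introduced here, and the selection priority of each multiplexer simply follows the order in which the text lists the conditions.

```python
from dataclasses import dataclass


@dataclass
class KillQueue:
    """Behavioral sketch of the three-entry kill queue of FIG. 5 (names are assumptions)."""
    kr0: bool = False  # register KR0 520 (bottom entry)
    kr1: bool = False  # register KR1 521 (middle entry)
    kr2: bool = False  # register KR2 522 (top entry)

    def clock(self, kill, lload, eshift, lshift):
        """Apply one rising edge of clk; return kill0 as seen just before the edge."""
        lload0, lload1, lload2 = lload
        # Multiplexer 509: load the kill signal, else hold KR0. Its output is kill0.
        m509 = kill if lload0 else self.kr0
        # Multiplexer 511: load first, then shift from KR2, else hold KR1.
        m511 = kill if lload1 else (self.kr2 if lshift else self.kr1)
        # Multiplexer 512: load the kill signal, else hold KR2.
        m512 = kill if lload2 else self.kr2
        # Multiplexer 510: shift in the output of 511, else keep the output of 509.
        m510 = m511 if eshift else m509
        self.kr0, self.kr1, self.kr2 = m510, m511, m512
        return m509
```

For example, loading a true kill signal into the bottom entry makes kill0 true in the same cycle, and the value is then held in KR0 until a shift replaces it with the entry above.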

FIG. 6 is a block diagram of a third embodiment of kill queue 145 of FIG. 1 according to the present invention. Kill queue 145 of FIG. 6 is similar to kill queue 145 of FIG. 5, and corresponding elements are similarly numbered. Kill queue 145 of FIG. 6 differs from that of FIG. 5 in the following respects. Entry KE0 of kill queue 145 of FIG. 6 also includes four logic gates: an inverter 602, two two-input AND gates 604 and 606, and a two-input OR gate 608. Inverter 602 receives lload[0] signal 142 and provides its output to AND gate 604. AND gate 604 receives the output of register KR0 520 as its second input. AND gate 606 receives lload[0] signal 142 as one input and kill signal 141 as its other input. The outputs of AND gates 604 and 606 are the inputs to OR gate 608. The output of OR gate 608, rather than the output of multiplexer 509 as in kill queue 145 of FIG. 5, is kill0 signal 143 of kill queue 145 of FIG. 1.

FIG. 7 is a block diagram of the logic within FIQ control logic 118 that generates F_valid signal 188 of FIG. 1 according to the present invention. The logic comprises an inverter 712 and a two-input AND gate 714. Inverter 712 receives kill0 signal 143 of FIG. 1 and provides its output to AND gate 714 as one input. The other input to AND gate 714 is valid bit FV0 134 of formatted instruction queue 187 of FIG. 1. Hence, valid bit FV0 134 is qualified by kill0 signal 143, so that XIQ control logic 156 knows when the instruction provided to instruction translator 138 on early0 signal 193 is invalid, that is, has been killed.

FIG. 8 is a flowchart illustrating operation of the instruction kill apparatus of microprocessor 100 of FIG. 1 according to the present invention. Flow begins at block 802.
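The gate-level logic of FIGS. 6 and 7 described above reduces to two small Boolean expressions. The sketch below is illustrative only; the function names `kill0_fig6`, `f_valid` and `fig6_matches_mux` are hypothetical and do not appear in the specification. The last function checks, over all input combinations, that the four gates of FIG. 6 compute the same value as a two-input multiplexer such as multiplexer 509 of FIG. 5.

```python
from itertools import product


def kill0_fig6(lload0: bool, kill: bool, kr0: bool) -> bool:
    """kill0 signal 143 as formed by the gates of FIG. 6."""
    hold_path = (not lload0) and kr0   # inverter 602 feeding AND gate 604
    load_path = lload0 and kill        # AND gate 606
    return hold_path or load_path      # OR gate 608


def f_valid(fv0: bool, kill0: bool) -> bool:
    """F_valid signal 188: FV0 134 qualified by kill0 (inverter 712, AND gate 714)."""
    return fv0 and not kill0


def fig6_matches_mux() -> bool:
    """True if the FIG. 6 gates equal 'select kill when lload[0], else KR0' for all inputs."""
    return all(
        kill0_fig6(lload0, kill, kr0) == (kill if lload0 else kr0)
        for lload0, kill, kr0 in product((False, True), repeat=3)
    )
```

A valid bottom entry whose kill bit is true therefore yields a false F_valid, which is how the killed instruction is reported as invalid to the XIQ control logic.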

At block 802, instruction formatter 116 of FIG. 1 formats an instruction in instruction byte buffer 112, and FIQ control logic 118 loads the formatted instruction into early queue 132. In particular, FIQ control logic 118 loads the formatted instruction into the lowest invalid entry of early queue 132. In one embodiment, block 802 occurs during a first clock cycle, denoted clock 1 in FIG. 8. Flow proceeds to block 804.

At block 804, control logic 102 of FIG. 1 generates a true value on kill signal 141 of FIG. 1 to indicate that the instruction loaded into early queue 132 during the previous clock cycle must be killed. In one embodiment, block 804 occurs during the clock cycle following clock 1, denoted clock 2 in FIG. 8. Flow proceeds to block 806.

At block 806, the kill queue loads the value of kill signal 141 generated during clock 2. The value is loaded into the lowest invalid entry of the kill queue. Flow proceeds to decision block 808.

At decision block 808, it is determined whether the instruction loaded into formatted instruction queue 187 at block 802, that is, the instruction to be killed, is in the bottom entry of formatted instruction queue 187. If so, flow proceeds to decision block 812; otherwise, flow proceeds to block 818.

At decision block 812, it is determined whether kill signal 141 is true. If so, flow proceeds to block 814; otherwise, flow proceeds to block 816.

At block 814, a true value is generated on kill0 signal 143 of FIG. 1, which qualifies FIQ valid bit FV0 134 to generate a false value on F_valid signal 188 of FIG. 1, thereby killing the instruction. Flow ends at block 814.

At block 816, a false value is generated on kill0 signal 143 of FIG. 1; consequently, if FV0 134 is true, F_valid 188 is also true. Flow ends at block 816. In one embodiment, blocks 804 through 816 all occur during the second clock cycle.

At block 818, formatted instruction queue 187 and kill queue 145 each shift down one entry. Flow proceeds to decision block 822.

At decision block 822, it is determined whether the instruction loaded into formatted instruction queue 187 at block 802, that is, the instruction to be killed, is in the bottom entry of formatted instruction queue 187. If so, flow proceeds to decision block 824; otherwise, flow returns to block 818.

At decision block 824, it is determined whether the bottom entry of the kill queue is true. If so, flow proceeds to block 826; otherwise, flow proceeds to block 828.

At block 826, a true value is generated on kill0 signal 143 of FIG. 1, which qualifies FIQ valid bit FV0 134 to generate a false value on F_valid signal 188 of FIG. 1, thereby killing the instruction. Flow ends at block 826.

At block 828, a false value is generated on kill0 signal 143; consequently, if FV0 134 is true, F_valid 188 is also true. Flow ends at block 828. In one embodiment, each iteration of the loop from block 818 to block 828 occurs during the clock cycle immediately following clock 2, denoted clock 3, or during a subsequent clock cycle, until the instruction to be killed has shifted to the bottom entry of formatted instruction queue 187.
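The flowchart of FIG. 8 can be paraphrased as a short routine. The sketch below is a simplification under stated assumptions: the function name `run_flow` and its parameters are hypothetical, the older instructions are assumed to be valid and not killed, and both queues are assumed to shift down once per clock cycle with no stalls, which is the best case rather than the general one.

```python
def run_flow(older_instructions: int, kill_signal: bool):
    """Follow FIG. 8; return (clock cycle of the decision, kill0, F_valid)."""
    # Block 802 (clock 1): the new instruction goes into the lowest invalid entry.
    fiq = ["old"] * older_instructions + ["new"]
    # Blocks 804 and 806 (clock 2): the kill value goes into the matching kill queue entry.
    kill_queue = [False] * older_instructions + [kill_signal]
    clock = 2
    # Blocks 808, 818 and 822: shift both queues together until the
    # new instruction reaches the bottom entry (assumed one shift per cycle).
    while fiq[0] != "new":
        fiq.pop(0)
        kill_queue.pop(0)
        clock += 1
    # Decision block 812 or 824: examine the kill value for the bottom entry.
    kill0 = kill_queue[0]
    fv0 = True  # the bottom entry holds a valid instruction
    # Blocks 814/826 (killed) or 816/828 (not killed).
    return clock, kill0, fv0 and not kill0
```

With no older instructions the decision is made in clock 2; each older instruction ahead of the new one delays it by at least one cycle, during which the kill bit travels down the kill queue alongside the instruction.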

FIG. 9 is a timing diagram illustrating operation of the instruction kill apparatus of FIG. 1 according to the present invention. FIG. 9 shows five clock cycles; true signal values are shown as a logic high level. FIG. 9 illustrates a case in which, when instruction formatter 116 generates a new formatted macroinstruction, XIQ 154 of FIG. 1 is not full, that is, XIQ 154 can receive a microinstruction from instruction translator 138, and formatted instruction queue 187 is empty. In addition, in the example of FIG. 9, XIQ 154 is empty when instruction translator 138 translates the formatted macroinstruction provided on early0 193 and generates a new microinstruction 171. Consequently, XIQ control logic 156 provides the value of F_valid signal 188 on X_valid signal 148, rather than storing F_valid 188 in valid bit XV2 149 as in FIG. 10.

As shown, during clock cycle 1, instruction formatter 116 generates a true value on F_new_instr signal 152 of FIG. 1 to indicate that formatted_instr signal 197 of FIG. 1 contains a valid new formatted macroinstruction. Because formatted instruction queue 187 is empty, FIQ control logic 118 of FIG. 1 generates a true value on eload[0] signal 162 to load the valid new formatted macroinstruction from formatted_instr 197 into the bottom empty entry, EE0, of formatted instruction queue 187. As shown, during the same clock cycle kill signal 141, kill0 signal 143, F_valid 188, X_valid 148 and valid bit RV 189 are all false.

During clock cycle 2, valid bit FV0 134 of entry EE0 of formatted instruction queue 187 of FIG. 1 is set to indicate that EE0 contains a valid instruction. On the rising edge of clock cycle 2, register 183 of FIG. 1 loads eload[0] 162 and outputs a true value on lload[0] 142. As shown, because eload[0] 162 is true, the new instruction is loaded into ER0 220 and output on early0 signal 193 of FIG. 1 as the input to instruction translator 138 of FIG. 1. Instruction translator 138 translates the new macroinstruction and provides the resulting microinstruction 171 to XIQ 154. In addition, as shown, control logic 102 generates new information related to the new instruction on X_rel_info 186. Because lload[0] 142 is true, as shown, multiplexer 310 selects its load data input and outputs the related information on X_rel_info 186 onto late0 191 as the input to XIQ 154 and to multiplexer 172 of FIG. 1. Further, because instruction translator 138 translates the new instruction during the second clock cycle, FIQ control logic 118 generates a true value on eshift signal 164 of FIG. 1 so that the instruction is shifted out of formatted instruction queue 187 during the third clock cycle.

Also during clock cycle 2, control logic 102 detects a condition in which the new instruction generated during the first clock cycle must be killed, and consequently generates a true value on kill signal 141 of FIG. 1 during the latter half of the second clock cycle. Because lload[0] 142 and kill signal 141 are both true during the latter half of clock 2, kill0 signal 143 is also true, according to FIGS. 4 through 6. Further, because kill0 signal 143 is true, F_valid 188 is false, according to FIG. 7. Finally, as shown, because F_valid 188 is false and XIQ 154 is empty, X_valid 148 is false at the end of the second clock cycle.

During clock cycle 3, FV0 134 is false because the new instruction has been shifted out of formatted instruction queue 187. On the rising edge of the third clock cycle, because XIQ 154 is empty, XIQ control logic 156 loads the translated microinstruction 171 and the instruction-related information provided on late0 191 into execute stage register 176. In addition, register 185 of FIG. 1 loads eshift 164 and outputs a true value on lshift 168. Further, the false value of X_valid 148 at the end of the second clock cycle is loaded into RV 189, which is therefore false during the third clock cycle. Consequently, the microinstruction 171 generated during the second clock cycle and loaded into execute stage register 176 is marked invalid and, as intended, is not executed by the execution stages of the microprocessor 100 pipeline.

As may be seen from FIG. 9, although the new macroinstruction was generated and loaded into formatted instruction queue 187 during the first clock cycle, kill signal 141 was not generated until the second clock cycle. The instruction kill apparatus of FIG. 1 advantageously enables the macroinstruction to be killed, that is, marked invalid, so that the execution stages do not execute the killed instruction.
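The FIG. 9 case, in which both queues are empty and F_valid is passed straight through as X_valid, can be traced cycle by cycle. The sketch below is a rough illustration under stated assumptions: the function name `fig9_trace` and its parameter are hypothetical, only the signals named in the tuple are modeled, and the instruction is assumed to leave the formatted instruction queue after exactly one cycle, as in the walk-through above.

```python
def fig9_trace(kill_asserted: bool = True):
    """Return one (cycle, kill0, F_valid, X_valid, RV) tuple per clock cycle."""
    eload0 = {1: True}            # cycle 1: the new instruction is loaded into EE0
    kill = {2: kill_asserted}     # cycle 2: the kill signal arrives one cycle late
    lload0 = fv0 = rv = False     # registered state before cycle 1
    rows = []
    for cycle in (1, 2, 3):
        # The kill queue holds nothing else here, so only the load path can be true.
        kill0 = lload0 and kill.get(cycle, False)
        f_valid = fv0 and not kill0   # FIG. 7: FV0 qualified by kill0
        x_valid = f_valid             # XIQ empty: F_valid bypasses the XIQ as X_valid
        rows.append((cycle, kill0, f_valid, x_valid, rv))
        # Rising edge that starts the next cycle.
        rv = x_valid                  # RV 189 captures X_valid
        lload0 = fv0 = eload0.get(cycle, False)
    return rows
```

Calling `fig9_trace(False)` shows the contrast: without the kill signal, X_valid is true in cycle 2 and RV is true in cycle 3, so the microinstruction would be executed.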

FIG. 10 is a timing diagram illustrating operation of the instruction kill apparatus of FIG. 1 according to the present invention. FIG. 10 is similar to FIG. 9, except that XIQ 154 is full when instruction formatter 116 generates a new formatted macroinstruction. Because XIQ 154 is full in the example of FIG. 10, valid bit XV2 149 of XIQ 154 is shown, whereas the values of RV 189 and X_valid 148 are not.

During clock cycle 1, XIQ_full 195 is true. As in FIG. 9, instruction formatter 116 generates a new instruction on formatted_instr 197, and F_new_instr 152 is true. Because formatted instruction queue 187 is empty, as in FIG. 9, FIQ control logic 118 generates a true value on eload[0] signal 162 to load the valid new formatted macroinstruction from formatted_instr 197 into EE0. Kill signal 141, kill0 signal 143 and F_valid 188 of FIG. 1 are all false, as in FIG. 9. However, because XIQ 154 is full, valid bit XV2 149 is true, that is, entry 2 of XIQ 154 is valid.

During clock cycle 2, as in FIG. 9, FV0 134 is set to indicate that EE0 contains a valid instruction; register 183 outputs a true value on lload[0] 142; the new instruction is loaded into ER0 220 and output on early0 signal 193 as the input to instruction translator 138; new information related to the new instruction is generated on X_rel_info 186; and multiplexer 310 selects its load data input and outputs the new related information on X_rel_info 186 onto late0 191 as the input to XIQ 154 and multiplexer 172. However, unlike FIG. 9, because XIQ 154 is full at the start of the second clock cycle, FIQ control logic 118 generates a false value on eshift signal 164. XIQ control logic 156 subsequently deasserts XIQ_full 195 to indicate that instruction translator 138 will be ready to translate a new macroinstruction during the third clock cycle.

Also during clock cycle 2, control logic 102 detects a condition in which the new instruction generated during the first clock cycle must be killed, and consequently generates a true value on kill signal 141 during the latter half of the second clock cycle. Because lload[0] 142 and kill signal 141 are both true during the latter half of clock 2, kill0 signal 143 is also true, according to FIGS. 4 through 6. Further, because kill0 signal 143 is true, F_valid 188 is false, according to FIG. 7. Because XIQ 154 shifts down, so that XIQ 154 is no longer full during the second clock cycle, XV2 149 goes false, indicating that the top entry of XIQ 154, that is, the entry whose validity is indicated by XV2 149, is no longer valid.

During clock cycle 3, because eshift signal 164 was false at the rising edge of clk 202, the new instruction is held in ER0 220 and provided on early0 193 to instruction translator 138 for translation. Accordingly, FV0 134 remains true. Instruction translator 138 translates the new macroinstruction and provides the translated microinstruction 171 to XIQ 154. Because lload[0] 142 was true at the rising edge of clk 202, the related information provided on X_rel_info 186 during the second clock cycle is loaded into LR0 320. Because lload[0] 142 and lshift 168 are false for the remainder of the clock cycle, the contents of LR0 320, that is, the new information related to the instruction, are provided on late0 191 to XIQ 154. After the start of the third clock cycle, FIQ control logic 118 generates a true value on eshift signal 164 so that the instruction is shifted out of formatted instruction queue 187 during the fourth clock cycle.

Also during clock cycle 3, according to FIGS. 4 through 6, the value of kill signal 141 that was generated during clock cycle 2 and loaded into entry KE0 of kill queue 145 is held during clock cycle 3 and provided on kill0 signal 143. F_valid 188 therefore remains false throughout clock cycle 3, indicating that the instruction provided to instruction translator 138 on early0 193 is invalid. This is necessary because control logic 102 generated kill signal 141 during clock cycle 2 to indicate that the instruction generated on formatted_instr 197 during clock cycle 1 must be killed. XV2 149 remains false. Further, control logic 102 deasserts kill signal 141 during clock cycle 3.

During clock cycle 4, FV0 134 goes false because the new instruction is shifted out of formatted instruction queue 187. On the rising edge of clock cycle 4, register 185 of FIG. 1 loads eshift signal 164 and outputs a true value on lshift 168. In addition, XIQ control logic 156 loads the translated microinstruction 171 and the instruction-related information provided on late0 191 into XIQ 154. However, because F_valid 188 was false at the end of clock cycle 3, a false value is loaded into XV2 149 to indicate that the translated microinstruction 171 loaded into XIQ 154 is invalid. Consequently, the microinstruction 171 generated by instruction translator 138 during clock cycle 3 and loaded into XIQ 154 is marked invalid and, as intended, is not executed by the execution stages of the microprocessor 100 pipeline when it is output from XIQ 154. In one embodiment, because the XIQ 154 entry that receives microinstruction 171 is marked invalid, it may be overwritten by the next microinstruction.

As may be seen from FIG. 10, although the new macroinstruction was generated and loaded into formatted instruction queue 187 during the first clock cycle, kill signal 141 was not generated until the second clock cycle. The instruction kill apparatus of FIG. 1 advantageously enables the macroinstruction to be killed, that is, marked invalid, so that the execution stages do not execute the killed instruction.
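The distinguishing feature of the FIG. 10 case is that the kill bit must be remembered for an extra cycle while the instruction waits for room in the XIQ. The sketch below traces that hold behavior. It is an approximation under stated assumptions: the name `fig10_trace` is hypothetical, the XIQ itself is reduced to the single valid bit stored with the microinstruction, and the entries above the bottom of the kill queue are assumed to be empty.

```python
def fig10_trace(kill_asserted: bool = True):
    """Return (rows, xv_loaded); rows are (cycle, FV0, kill0, F_valid) tuples."""
    eload0 = {1: True}          # cycle 1: the new instruction is loaded into EE0
    kill = {2: kill_asserted}   # cycle 2: the late kill signal
    eshift = {3: True}          # XIQ was full, so the shift-out is delayed to cycle 3
    lload0 = fv0 = kr0 = False  # registered state before cycle 1
    xv_loaded = None            # valid bit stored in the XIQ with the microinstruction
    rows = []
    for cycle in (1, 2, 3, 4):
        # Multiplexer 509: take the kill signal when loading, else the held KR0.
        kill0 = kill.get(cycle, False) if lload0 else kr0
        f_valid = fv0 and not kill0
        rows.append((cycle, fv0, kill0, f_valid))
        # Rising edge that starts the next cycle.
        if eshift.get(cycle, False):
            xv_loaded = f_valid   # XV2 149 receives F_valid with the microinstruction
            kr0 = False           # the (empty) entry above shifts into KR0
            fv0 = False           # the instruction leaves the formatted instruction queue
        else:
            kr0 = kill0           # hold path: KR0 keeps the kill bit
            fv0 = fv0 or eload0.get(cycle, False)
        lload0 = eload0.get(cycle, False)
    return rows, xv_loaded
```

In this trace kill0 stays true through cycle 3 even though the kill signal itself has been deasserted, and the microinstruction enters the XIQ with a false valid bit.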

FIG. 11 is a timing diagram illustrating operation of the instruction kill apparatus of FIG. 1 according to the present invention. FIG. 11 is similar to FIG. 10, except that when instruction formatter 116 generates a new formatted macroinstruction, XIQ 154 is full and formatted instruction queue 187 is not empty. The value of kill signal 141 of FIG. 1 must be loaded into the kill queue entry that corresponds to the formatted instruction queue 187 entry into which the new macroinstruction is loaded, and must shift down along with formatted instruction queue 187, so that when formatted instruction queue 187 provides the new macroinstruction, the kill queue provides the correct corresponding kill value, as described in detail below. Consequently, the value of register KR1 of kill queue 145 (denoted 421 in FIG. 4 and 521 in FIGS. 5 and 6, and hereinafter referred to as KR1 421) is also shown in FIG. 11.

During clock cycle 1, XIQ_full 195 is true. As in FIGS. 9 and 10, instruction formatter 116 generates a new instruction on formatted_instr 197, and F_new_instr 152 is true. Because EE0 contains a valid instruction, FV0 134 is true; however, as shown, because EE1 does not contain a valid instruction, valid bit FV1 134 of entry EE1 of formatted instruction queue 187 of FIG. 1 is false. Consequently, FIQ control logic 118 generates a true value on eload[1] signal 162 to load the valid new formatted macroinstruction from formatted_instr 197 into EE1. Signal early0 193 provides the instruction held in EE0, denoted "old instr" in FIG. 11; signal late0 191 provides the related information of the old instruction held in LE0, denoted "old info", as shown. As in FIG. 10, kill signal 141 and kill0 signal 143 of FIG. 1 are false, and valid bit XV2 149 is true. However, because FV0 134 is true and kill signal 141 is false, F_valid 188 is true. KR1 421 is false.

During clock cycle 2, FV1 134 is set to indicate that EE1 contains a valid instruction, and FV0 likewise remains set. The old instruction is held in ER0 220, and the related information of the old instruction is held in LR0 320. Register 183 outputs a true value on lload[1] 142. The new instruction is loaded into ER1 221, as shown. New information related to the new instruction is generated on X_rel_info 186, and multiplexer 311 of FIG. 3 selects its load data input, whose value is provided to register LR1 321. Because XIQ 154 is full at the start of the second clock cycle, FIQ control logic 118 generates a false value on eshift signal 164. XIQ control logic 156 subsequently deasserts XIQ_full 195 to indicate that instruction translator 138 will be ready to translate a new macroinstruction during the third clock cycle.

Also during clock cycle 2, control logic 102 detects a condition in which the new instruction generated during the first clock cycle must be killed, and consequently generates a true value on kill signal 141 during the latter half of the second clock cycle. KR1 421 remains false. According to FIGS. 4 through 6, kill0 signal 143 is false, because in this example the instruction in EE0 of formatted instruction queue 187 need not be killed. Further, because kill0 signal 143 is false and FV0 134 is true, F_valid 188 is true, according to FIG. 7. Because XIQ 154 shifts down, so that XIQ 154 is no longer full during the second clock cycle, XV2 149 goes false, indicating that the top entry of XIQ 154, that is, the entry whose validity is indicated by XV2 149, is no longer valid.

During clock cycle 3, because eshift signal 164 was false at the rising edge of clk 202, the new instruction is held in ER1 221, and the old instruction is held in ER0 220 and provided on early0 193 to instruction translator 138 for translation. FV0 134 remains true. Instruction translator 138 translates the old macroinstruction and provides the translated microinstruction 171 to XIQ 154. Because lload[0] 142 is false during clock cycle 3, the contents of LR0 320, that is, the old related information of the old instruction, are provided on late0 191 to XIQ 154. Because lload[1] 142 was true at the rising edge of clk 202, the new related information provided on X_rel_info 186 during the second clock cycle is loaded into LR1 321. After the start of the third clock cycle, FIQ control logic 118 generates a true value on eshift signal 164.

Also during clock cycle 3, because lload[1] 142 was true at the rising edge of the clock, the value of kill signal 141 generated during clock cycle 2 is loaded into KR1 421, as shown. However, according to FIGS. 4 through 6, kill0 signal 143 remains false. Further, control logic 102 deasserts kill signal 141 during clock cycle 3.

During clock cycle 4, because the new instruction is shifted from EE1 to EE0, FV1 134 goes false. On the rising edge of clock cycle 4, XIQ control logic 156 loads the microinstruction 171 translated from the old instruction and the instruction information provided on late0 191 into XIQ 154. In addition, register 185 loads eshift signal 164 and outputs a true value on lshift 168. Eshift was true because XIQ 154 was able to accept another microinstruction. Because eshift signal 164 was true at the rising edge of clk 202, the new instruction is shifted from ER1 221 to ER0 220 and provided on early0 193 to instruction translator 138 for translation. FV0 134 remains true. Instruction translator 138 translates the new instruction and provides the resulting microinstruction 171 to XIQ 154. Because lshift 168 is true during clock cycle 4, the new instruction's related information held in LR1 321 is selected on the shift data input of multiplexer 310 and provided on late0 signal 191, as shown.

Also during clock cycle 4, the value of kill signal 141 that was generated during clock cycle 2 and stored in kill queue 145, that is, the kill bit, is shifted from KR1 421 to KR0 420 of FIG. 4 (or KR0 520 of FIGS. 5 and 6). Consequently, according to FIGS. 4 through 6, a true value is generated on kill0 signal 143. According to FIG. 7, F_valid 188 correspondingly goes false.

During clock cycle 5, because the new instruction is shifted out of formatted instruction queue 187, FIQ control logic 118 clears FV0 134. On the rising edge of clock cycle 5, XIQ control logic 156 loads the microinstruction 171 translated from the new instruction and the new instruction's related information provided on late0 191 into XIQ 154. However, because F_valid 188 was false at the end of clock cycle 4, a false value is loaded into XV2 149 to indicate that the translated microinstruction loaded into XIQ 154 is invalid. Consequently, the microinstruction 171 generated by instruction translator 138 during clock cycle 4 and loaded into XIQ 154 is marked invalid and, as intended, is not executed by the execution stages of the microprocessor 100 pipeline when it is output from XIQ 154. In one embodiment, because the XIQ 154 entry that receives microinstruction 171 is marked invalid, it may be overwritten by the next microinstruction.

As may be seen from FIG. 11, although the new macroinstruction was generated and loaded into formatted instruction queue 187 during the first clock cycle, kill signal 141 was not generated until the second clock cycle. The instruction kill apparatus of FIG. 1 advantageously enables the macroinstruction to be killed, that is, marked invalid, so that
However, because F_val id 188 is false at the end of clock cycle 4, a false value is loaded into XV2 149 to indicate that the translated microinstructions loaded into XIQ U4 are invalid. Therefore, the microinstruction 1 7 1 generated by the instruction translator, 1 3 8 and loaded into XI q 1 5 4 at clock cycle 3 is marked as beneficial, ° and as expected, when it is removed from XIQ i When 54 is output, no: executed by the execution stage of the 1001 pipeline. In one example: Because 7 154 items used to receive microinstructions 171 are marked as invalid, it is combined with the next microinstruction coverage. Figure 11 shows that although the new macro instruction has been generated in the first clock cycle and is loaded into the formatted instruction queue i 8 7, the delete letter is up to the second clock cycle. Just produced. The instruction deletion device in FIG. 1 enables the macro instruction to be deleted, that is, marks it as invalid, and so on

12829twf.ptd12829twf.ptd

200414035 五、發明說明(38) 執行階段不會執行已被刪除的指令。 雖然本發明與其目的、特性、及優點已在此文檔中詳 細解釋,它還可以包括其他的實例。例如,儘管文中已提 及多種指令必須被删除的情況,本發明仍可用於其他情況 下的指令刪除。另外,儘管文中只描述一個表示微處理器 將巨集指令轉譯成微指令的實例,一個微處理器以精簡指 令集電腦(R I SC )代替,解碼R I SC指令,而不是將巨集指 令轉譯成微指令的實例仍不脫離本發明的可預期的實施例 範圍。 除用硬體來實施本發明外,它還可以通過電腦可讀代 碼(如電腦可讀程式碼、資料等)在一個電腦可用(如可 讀的)媒介内實現。此類電腦代碼可造成對此發明功能的 實施、模仿或兩者都有。例如,此功能可用通用的編程語 言(如C、C + +、JAVA、及其它類似語言)實現;亦可用 GDSII資料庫,硬體描述語言(HDL),包括Verilog HDL、VHDL、Altera HDL (AHDL)等,或者其他程式和/或 電路(如schematic)捕獲工具等行業記憶體在的工具實 現。電腦代碼可儲存於任何電腦可用(如可讀的)媒體 内,包括半島器記憶體、磁片、光碟、CD-ROM、DVD - ROM 及類似品,或作為電腦資料被放置在電腦可用(如可讀的 )傳播媒介(如載波或其他媒介包括數位的\光學的、或 類比媒介)。因此,電腦代碼可在通訊網絡中傳播,包括 因特網和企業内部網路。作為知識產權的一部分,本發明 可以包含於電腦代碼核心,例如微處理器核心,或者系統200414035 V. Description of the Invention (38) The deleted instruction will not be executed during the execution phase. Although the present invention and its objects, features, and advantages have been explained in detail in this document, it may include other examples. For example, although the case where various instructions must be deleted has been mentioned in the text, the present invention can be applied to instruction deletion in other cases. In addition, although the article only describes an example of a microprocessor that translates a macro instruction into a micro instruction, a microprocessor uses a reduced instruction set computer (RI SC) to decode the RI SC instruction instead of translating the macro instruction into Examples of microinstructions still do not depart from the scope of contemplated embodiments of the invention. In addition to implementing the present invention in hardware, it can also be implemented in a computer-usable (eg, readable) medium through computer-readable code (eg, computer-readable code, information, etc.). Such computer code may cause the function of this invention to be implemented, imitated, or both. 
For example, this function can be implemented with a general-purpose programming language (such as C, C ++, JAVA, and other similar languages); it can also use the GDSII database and hardware description language (HDL), including Verilog HDL, VHDL, Altera HDL (AHDL ), Etc., or other programs and / or circuits (such as schematic) capture tools and other industry memory implementation tools. Computer code can be stored on any computer-usable (eg, readable) media, including peninsula memory, magnetic disks, CD-ROMs, CD-ROMs, DVD-ROMs, and the like, or placed on a computer as computer data (eg (Readable) communication media (such as carrier waves or other media including digital, optical, or analog media). As a result, computer code can be transmitted across communication networks, including the Internet and corporate intranets. As part of the intellectual property, the invention may be embodied in a core of a computer code, such as a microprocessor core, or a system

12829twf.ptd 第46頁 200414035 五、發明說明(39) 級設計,例如單片機系統(S0C )内,並作為積體電路產 品的一部分而被轉移到硬體内。同時,本發明也可以用硬 體及電腦代碼的結合來實現。 最後,熟悉本專業的技術人員在不脫離本發明技術方 案範圍内,當可利用上述揭示的技術内容作出些許更動或 修飾為等同變化的等效實施例,但是凡是未脫離本發明技 術方案的内容,依據本發明的技術實質對以上實施例所作 的任何簡單更正、等同變化與修飾,均仍屬於本發明技術 方案的範圍内。12829twf.ptd Page 46 200414035 V. Description of the invention (39) level design, such as in the single-chip microcomputer system (S0C), and transferred to the hard body as part of the integrated circuit product. At the same time, the present invention can also be implemented by a combination of hardware and computer code. Finally, those skilled in the art can use the disclosed technical content to make minor changes or modifications to equivalent embodiments without departing from the scope of the technical solution of the present invention. However, those who do not depart from the technical solution of the present invention Any simple corrections, equivalent changes, and modifications made to the above embodiments according to the technical essence of the present invention still fall within the scope of the technical solution of the present invention.
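The FIG. 11 sequence can be mimicked with a small software model. The following Python sketch is an illustration only, not the disclosed circuit: the class `ParallelQueues`, its method names, and the three-entry depth are assumptions made here, and `f_valid` assumes that FIG. 7 computes F_valid as FV0 AND NOT kill0. It shows a kill bit that is produced after its instruction was queued and that still reaches the bottom of its own queue in step with that instruction.

```python
def f_valid(fv0: bool, kill0: bool) -> bool:
    # Assumed FIG. 7 behavior: the bottom instruction is valid only if
    # its entry is occupied (FV0) and its kill bit (kill0) is not set.
    return fv0 and not kill0


class ParallelQueues:
    """Hypothetical model: an instruction queue and a kill queue in lockstep."""

    def __init__(self, depth: int = 3) -> None:
        self.instr = [None] * depth   # index 0 is the bottom entry
        self.kill = [False] * depth   # one kill bit per instruction entry

    def load_instruction(self, slot: int, name: str) -> None:
        self.instr[slot] = name
        self.kill[slot] = False       # kill status is not known yet

    def load_kill(self, slot: int, value: bool) -> None:
        # The kill value arrives after the instruction was already loaded.
        self.kill[slot] = value

    def shift(self) -> None:
        # Both queues shift down together, so each bit stays with its instruction.
        self.instr = self.instr[1:] + [None]
        self.kill = self.kill[1:] + [False]

    def bottom(self):
        fv0 = self.instr[0] is not None
        return self.instr[0], f_valid(fv0, self.kill[0])


q = ParallelQueues()
q.load_instruction(0, "old")   # already queued before the example starts
q.load_instruction(1, "new")   # new instruction enters the second entry
q.load_kill(1, True)           # kill decided one step later, stored alongside it
old_out = q.bottom()           # old instruction leaves the bottom as valid
q.shift()                      # new instruction and its kill bit move down
new_out = q.bottom()           # new instruction leaves the bottom as invalid
print(old_out, new_out)        # prints ('old', True) ('new', False)
```

The model keeps no cycle-accurate timing; it only demonstrates why storing the late kill value in a queue that shifts with the instruction queue lets the two be matched again at the bottom entry.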

Brief Description of the Drawings

FIG. 1 is a block diagram of a microprocessor according to the present invention.
FIG. 2 is a block diagram illustrating the first queue of the formatted instruction queue of FIG. 1 according to the present invention.
FIG. 3 is a block diagram illustrating the second queue of the formatted instruction queue of FIG. 1 according to the present invention.
FIG. 4 is a block diagram illustrating a first embodiment of the kill queue of FIG. 1 according to the present invention.
FIG. 5 is a block diagram illustrating a second embodiment of the kill queue of FIG. 1 according to the present invention.
FIG. 6 is a block diagram illustrating a third embodiment of the kill queue of FIG. 1 according to the present invention.
FIG. 7 is a block diagram of the FIQ control logic that generates the F_valid signal of FIG. 1 according to the present invention.
FIG. 8 is a flowchart illustrating the operation of the instruction kill apparatus of the microprocessor of FIG. 1 according to the present invention.
FIG. 9 is a timing diagram illustrating the operation of the instruction kill apparatus of FIG. 1 according to the present invention.
FIG. 10 is a timing diagram illustrating the operation of the instruction kill apparatus of FIG. 1 according to the present invention.
FIG. 11 is a timing diagram illustrating the operation of the instruction kill apparatus of FIG. 1 according to the present invention.

Description of Reference Numerals

100: microprocessor
102: control logic
104: instruction cache
106: branch target address cache
108: predecode logic
111: X_shift signal
112: instruction byte buffer
114: instruction byte buffer control logic
116: instruction formatter
118: FIQ control logic
132: early queue
134: valid bit
138: instruction translator
141: kill signal
142: lload signal
146: late queue
148: X_valid signal
149: valid bit
151: I-stage
152: F_new_instr signal
153: F-stage
154: translated instruction queue
155: X-stage
156: XIQ control logic
157: R-stage
161: control signal
162: eload signal
164: eshift signal
164: load signal
167: instruction bytes
169, 196: predecode information
171: microinstruction
172, 178: multiplexers
175: predicted branch target address
176: execution stage register
177: correction address
179: next target address
181: current fetch address
182: current instruction pointer
183, 185: registers
186: X_rel_info signal
187: formatted instruction queue
188: F_valid signal
189: valid bit register
191: late signal
193: early signal
194: branch prediction related information
195: XIQ_full signal
197: formatted instruction
198: F_instr_info signal
199: FIQ_full signal
202: clock signal
210, 211, 212, 310, 311, 312, 410, 411, 412, 509, 510, 511, 512: multiplexers
220, 221, 222, 320, 321, 322, 420, 421, 422, 520, 521, 522: registers
602, 712: inverters
604, 606, 714: AND gates
608: OR gate


Claims (1)

1. An apparatus for killing an instruction, wherein the instruction is loaded into an instruction queue of a microprocessor during a first clock cycle and is output from a bottom entry of the instruction queue during a subsequent second clock cycle, the apparatus comprising:
a kill signal, for conveying a value generated during a third clock cycle subsequent to the first clock cycle;
a kill queue, coupled to the kill signal, for loading the value of the kill signal generated during the third clock cycle and for outputting the value during the second clock cycle; and
a valid signal, coupled to the kill queue, generated during the second clock cycle, for indicating whether the instruction is to be executed by the microprocessor, wherein the valid signal is false if the value of the kill signal output by the kill queue during the second clock cycle is true.

2. The apparatus of claim 1, wherein the third clock cycle and the second clock cycle are the same clock cycle.

3. The apparatus of claim 1, wherein the third clock cycle is the clock cycle immediately before the second clock cycle.

4. The apparatus of claim 1, further comprising:
a load signal, coupled to the kill queue, for indicating during the second clock cycle whether the instruction was loaded into the bottom entry of the instruction queue during the first clock cycle.

5. The apparatus of claim 4, wherein if the load signal is true, the third clock cycle and the second clock cycle are the same clock cycle.

6. The apparatus of claim 4, wherein if the load signal is false, the second clock cycle is subsequent to the third clock cycle.

7. The apparatus of claim 4, further comprising:
logic, coupled to the kill queue, for generating the valid signal during the second clock cycle based on the load signal and on the value of the kill signal output by the kill queue.

8. The apparatus of claim 1, wherein the kill queue comprises:
a plurality of entries, for storing a plurality of values of the kill signal generated during a corresponding plurality of clock cycles.

9. The apparatus of claim 8, wherein each of the plurality of kill queue entries includes a load data input, coupled to the corresponding entry, for receiving the kill signal.

10. The apparatus of claim 8, wherein each of the plurality of kill queue entries includes a hold data input, coupled to the corresponding entry, for receiving the current value of the corresponding entry.

11. The apparatus of claim 8, wherein each of the plurality of kill queue entries includes a shift data input, coupled to the corresponding entry, for receiving one of the plurality of values of the kill signal from the entry located above the corresponding entry among the plurality of kill queue entries.

12. The apparatus of claim 8, wherein the instruction queue comprises a plurality of entries for storing a plurality of instructions, and the plurality of kill queue entries store the values of the kill signal corresponding to the plurality of instructions in the plurality of instruction queue entries.

13. The apparatus of claim 1, wherein the instruction comprises a variable-length instruction.

14. The apparatus of claim 13, wherein the variable-length instruction comprises an x86 architecture instruction.

15. The apparatus of claim 13, wherein the instruction is provided to the instruction queue during the first clock cycle by an instruction formatter, and the instruction formatter determines the length of the instruction.

16. The apparatus of claim 1, wherein the instruction is output from the bottom entry of the instruction queue during the second clock cycle to an instruction translator, to be translated into one or more microinstructions that are selectively executed by the microprocessor based on the valid signal.

17. A method for killing an instruction in a microprocessor, comprising:
loading an instruction into a first queue during a first clock cycle;
generating a kill signal during a second clock cycle subsequent to the first clock cycle;
loading a value of the kill signal into a second queue during the second clock cycle;
determining, during a third clock cycle in which the instruction is output from a bottom entry of the first queue, whether the value in the second queue is true; and
forgoing execution of the instruction if the value is true.

18. The method of claim 17, wherein the third clock cycle and the second clock cycle are the same clock cycle.

19. The method of claim 17, wherein the third clock cycle is the clock cycle immediately after the second clock cycle.

20. The method of claim 17, further comprising:
formatting the instruction before loading the instruction into the first queue.

21. The method of claim 17, further comprising:
determining, after loading the instruction into the first queue, whether the instruction has been shifted down in the first queue; and
if the instruction has been shifted down in the first queue, shifting the value of the kill signal down in the second queue after loading the value of the kill signal into the second queue.

22. The method of claim 17, further comprising:
predicting, before loading the instruction into the first queue, that the instruction is a taken branch instruction;
detecting a misprediction of the taken branch instruction; and
performing the generating of the kill signal during the second clock cycle in response to detecting the misprediction.

23. The method of claim 22, wherein a branch target address cache of the microprocessor performs the predicting that the instruction is a taken branch instruction.

24. The method of claim 22, wherein the misprediction of the branch instruction comprises a misprediction of the length of the branch instruction.

25. The method of claim 22, wherein the misprediction of the branch instruction comprises a misprediction of the address of the branch instruction.

26. The method of claim 22, wherein the misprediction of the branch instruction comprises mispredicting a non-branch instruction as a branch instruction.

27. The method of claim 17, further comprising:
branching the microprocessor based on a prediction that a branch instruction is taken, wherein the instruction is the next instruction after the branch instruction; and
performing the generating of the kill signal during the second clock cycle after branching the microprocessor.

28. The method of claim 17, wherein the instruction is the next instruction after a branch instruction that is predicted taken, the method further comprising:
performing the generating of the kill signal during the second clock cycle in response to detecting that the branch instruction is predicted taken.

29. A microprocessor, comprising:
a first queue, for receiving and buffering an instruction;
logic, coupled to the first queue, for detecting a condition in which the instruction must not be executed by the microprocessor, wherein the logic generates a true value on a signal to indicate the condition, and the true-valued signal is generated after the instruction has been received by the first queue; and
a second queue, coupled to the logic, for loading the true-valued signal and subsequently outputting the true value at the same time that the first queue outputs the instruction, wherein the microprocessor invalidates the instruction in response to the true-valued signal and does not execute it.

30. The microprocessor of claim 29, wherein the second queue comprises:
a plurality of storage elements, for storing a plurality of values of the signal generated by the logic during a corresponding plurality of clock cycles.

31. The microprocessor of claim 30, wherein the first queue comprises a plurality of storage elements for storing a plurality of instructions, and the plurality of storage elements of the second queue store the plurality of signal values corresponding to the plurality of instructions stored in the plurality of storage elements of the first queue.

32. A computer data signal embodied in a transmission medium, comprising:
computer-readable program code for providing an apparatus for killing an instruction that is loaded into an instruction queue of a microprocessor during a first clock cycle and is output from a bottom entry of the instruction queue during a second clock cycle subsequent to the first clock cycle, the program code comprising:
first program code for providing a kill signal, for conveying a value generated during a third clock cycle subsequent to the first clock cycle;
second program code for providing a kill queue, coupled to the kill signal, for loading the value of the kill signal generated during the third clock cycle and for outputting the value of the kill signal during the second clock cycle; and
third program code for providing a valid signal, coupled to the kill queue, generated during the second clock cycle, for indicating whether the instruction is to be executed by the microprocessor, wherein the valid signal is false if the value of the kill signal output by the kill queue during the second clock cycle is true.
TW93100761A 2003-01-14 2004-01-13 Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor TWI249131B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US44006303P 2003-01-14 2003-01-14

Publications (2)

Publication Number Publication Date
TW200414035A true TW200414035A (en) 2004-08-01
TWI249131B TWI249131B (en) 2006-02-11

Family

ID=34375164

Family Applications (1)

Application Number Title Priority Date Filing Date
TW93100761A TWI249131B (en) 2003-01-14 2004-01-13 Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor

Country Status (2)

Country Link
CN (1) CN1316353C (en)
TW (1) TWI249131B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604235B (en) * 2009-07-10 2012-03-28 杭州电子科技大学 Method for branch prediction of embedded processor
US9317293B2 (en) * 2012-11-28 2016-04-19 Qualcomm Incorporated Establishing a branch target instruction cache (BTIC) entry for subroutine returns to reduce execution pipeline bubbles, and related systems, methods, and computer-readable media
CN109708156B (en) * 2018-10-25 2024-04-12 青岛海尔智能技术研发有限公司 Control method for gas stove and gas stove
CN114090077B (en) * 2021-11-24 2023-01-31 海光信息技术股份有限公司 Method and device for calling instruction, processing device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW253946B (en) * 1994-02-04 1995-08-11 Ibm Data processor with branch prediction and method of operation
US5644779A (en) * 1994-04-15 1997-07-01 International Business Machines Corporation Processing system and method of operation for concurrent processing of branch instructions with cancelling of processing of a branch instruction
US5649137A (en) * 1994-10-20 1997-07-15 Advanced Micro Devices, Inc. Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency
US6289442B1 (en) * 1998-10-05 2001-09-11 Advanced Micro Devices, Inc. Circuit and method for tagging and invalidating speculatively executed instructions

Also Published As

Publication number Publication date
CN1316353C (en) 2007-05-16
TWI249131B (en) 2006-02-11
CN1549113A (en) 2004-11-24

Similar Documents

Publication Publication Date Title
EP1513062B1 (en) Apparatus, method and computer data signal for selectively overriding return stack prediction in response to detection of non-standard return sequence
US9858081B2 (en) Global branch prediction using branch and fetch group history
US7203824B2 (en) Apparatus and method for handling BTAC branches that wrap across instruction cache lines
KR101059335B1 (en) Efficient Use of JHT in Processors with Variable Length Instruction Set Execution Modes
EP2084602B1 (en) A system and method for using a working global history register
US7143269B2 (en) Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor
US7234045B2 (en) Apparatus and method for handling BTAC branches that wrap across instruction cache lines
JP2018063684A (en) Branch predictor
US7159097B2 (en) Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts
US20130007425A1 (en) Processor and data processing method incorporating an instruction pipeline with conditional branch direction prediction for fast access to branch target instructions
JP5815596B2 (en) Method and system for accelerating a procedure return sequence
US7844806B2 (en) Global history branch prediction updating responsive to taken branches
US7689816B2 (en) Branch prediction with partially folded global history vector for reduced XOR operation time
CN111459550B (en) Microprocessor with highly advanced branch predictor
TW200414035A (en) Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor
TWI232403B (en) Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts
JP5696210B2 (en) Processor and instruction processing method thereof
CN112130897A (en) Microprocessor
US11816489B1 (en) Microprocessor with prediction unit pipeline that provides a next fetch address at a rate of one per clock cycle
CN111459551B (en) Microprocessor with highly advanced branch predictor
US20240045610A1 (en) Prediction unit with first predictor that provides a hashed fetch address of a current fetch block to its own input and to a second predictor that uses it to predict the fetch address of a next fetch block
JP3851235B2 (en) Branch prediction apparatus and branch prediction method

Legal Events

Date Code Title Description
MK4A Expiration of patent term of an invention patent