TWI232403B - Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts - Google Patents

Info

Publication number
TWI232403B
Authority
TW
Taiwan
Prior art keywords
instruction
clock cycle
queue
branch
item
Prior art date
Application number
TW92123370A
Other languages
Chinese (zh)
Other versions
TW200422947A (en
Inventor
Thomas C Mcdonald
Original Assignee
Ip First Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/422,057 (US7159097B2)
Application filed by Ip First Llc
Publication of TW200422947A
Application granted
Publication of TWI232403B

Landscapes

  • Advance Control (AREA)

Abstract

An instruction buffering apparatus is disclosed. The apparatus includes an early queue and a late queue. The early queue receives an instruction generated during a first clock cycle. The late queue receives information related to the instruction during a second clock cycle subsequent to the first clock cycle. The early queue receives load/shift control signals for loading/shifting the early queue. Registers receive the early queue load/shift signals and provide delayed versions of the signals to the late queue for controlling loading/shifting the related information in the late queue. The late queue is configured such that when the apparatus is empty, the related information may be provided during the second clock cycle, i.e., in the same clock cycle that its related instruction is provided from the early queue.
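The behavior summarized in the abstract can be sketched as a small cycle-level Python model. This is only an illustrative sketch under stated assumptions, not the patented circuit: the class name `EarlyLateQueue`, the helper `next_state`, the three-entry depth, and the load-over-shift-over-hold mux priority are all hypothetical choices. The late queue is modeled as registers followed by multiplexers driven by one-cycle-delayed copies of the early queue's load/shift signals, so that related information arriving for an instruction loaded into an empty queue appears on the output in the same cycle as the instruction.

```python
from typing import List, Optional, Tuple


def next_state(entries: List[Optional[str]], load: List[bool],
               shift: bool, data_in: Optional[str]) -> List[Optional[str]]:
    """Value each entry takes at the next clock edge.

    Assumed mux priority (not stated in the abstract): load, then shift, then hold.
    """
    nxt: List[Optional[str]] = []
    for i, current in enumerate(entries):
        if load[i]:
            nxt.append(data_in)          # load the newly arriving value
        elif shift:
            above = entries[i + 1] if i + 1 < len(entries) else None
            nxt.append(above)            # shift down from the entry above
        else:
            nxt.append(current)          # hold the current value
    return nxt


class EarlyLateQueue:
    """Hypothetical model: entry 0 is the bottom (output) entry of each queue."""

    def __init__(self, depth: int = 3) -> None:
        self.early: List[Optional[str]] = [None] * depth
        self.late: List[Optional[str]] = [None] * depth
        self.prev_load: List[bool] = [False] * depth   # delayed load signals
        self.prev_shift: bool = False                  # delayed shift signal

    def clock(self, load: List[bool], shift: bool, instr: Optional[str],
              info: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
        """Simulate one clock cycle and return (early0, late0) seen in that cycle.

        `instr` arrives this cycle; `info` relates to the instruction that
        arrived in the previous cycle.
        """
        early0 = self.early[0]
        # The late queue is steered by LAST cycle's load/shift signals. Its
        # output is taken after the mux, so freshly arrived info can bypass
        # the late register when the instruction was loaded into entry 0.
        late_next = next_state(self.late, self.prev_load, self.prev_shift, info)
        late0 = late_next[0]
        # Clock edge: both queues and the delay registers update.
        self.early = next_state(self.early, load, shift, instr)
        self.late = late_next
        self.prev_load, self.prev_shift = list(load), shift
        return early0, late0


q = EarlyLateQueue()
trace = [
    q.clock([True, False, False], False, "A", None),      # A loads into empty queue
    q.clock([False, True, False], False, "B", "info_A"),  # info_A bypasses; B loads
    q.clock([False, False, False], True, None, "info_B"), # A is consumed (shift)
    q.clock([False, False, False], True, None, None),     # B now at the bottom
]
for cycle, pair in enumerate(trace, start=1):
    print(cycle, pair)
```

In this trace, `A` and `info_A` appear together on the outputs in the second cycle even though `info_A` was generated one cycle after `A` was loaded, which is the bypass behavior the abstract describes for an empty apparatus.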

Description

V. Description of the Invention

Priority Information

[0001] This application claims priority to the following U.S. application: Serial No. 10/422,057, filed April 23, 2003.

[Technical Field of the Invention]

[0002] The present invention relates to the field of pipelining in microprocessors, and in particular to instruction buffering in pipelined microprocessors.

[Prior Art]

[0003] Modern microprocessors are pipelined microprocessors. That is, they operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution." See Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, CA, 1996. They go on to provide the following excellent illustration of pipelining:

A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.

[0004] Synchronous microprocessors operate according to clock cycles. Typically, an instruction passes from one stage of the microprocessor pipeline to another each clock cycle. In an automobile assembly line, if the workers in one stage of the line are left standing idle because they have no car to work on, the production, or performance, of the line is diminished. Similarly, if a microprocessor stage is idle during a clock cycle because it has no instruction to operate on (a situation commonly referred to as a pipeline bubble), the performance of the processor is diminished.

[0005] In the present context, the various stages of a microprocessor pipeline may be logically grouped into two portions. The upper portion fetches and decodes instructions and provides them to the lower portion, which executes them. The upper portion typically includes an instruction fetcher for fetching program instructions from memory. Because fetching instructions from system memory takes a relatively long time, the upper portion also includes an instruction cache, which caches instructions fetched from memory to reduce subsequent instruction fetch time. The primary job of the upper pipeline stages is to have instructions ready when the execution stages are ready to execute them.

[0006] One method commonly employed in the upper pipeline stages to avoid bubbles in the execution stages is to read ahead in the program and fetch multiple program instructions into an instruction buffer. The instruction buffer provides instructions to the execution stages when they are ready to execute them. Instruction buffers are often arranged as first-in-first-out memories, or queues.

[0007] Instruction buffering is particularly beneficial when one or more of the instructions needed by the execution stages is not present in the instruction cache. In this situation, the impact of the missing cache line is reduced to the extent that the instruction buffer can supply instructions to the execution stages while the memory fetch is performed.

[0008] Buffering is also useful when a branch instruction is present in the program. Modern microprocessors employ branch prediction logic to predict whether a branch instruction will be taken. If it is predicted taken, the target address of the branch instruction is generated, and instructions are fetched from the target address, rather than from the next sequential fetch address, and provided to the instruction buffer.

[0009] Instruction buffering is also beneficial when some processing must be performed on instructions before they are provided to the execution stages. For example, in some processors the instruction set allows instructions to be a variable number of bytes in length. Consequently, the processor must decode a stream of instruction bytes and determine the type of the next instruction in order to determine its length. The start of each instruction is determined by the length of the previous instruction. This process is commonly referred to as instruction formatting. Because instruction formatting requires some processing time, it is beneficial to format multiple instructions and buffer the formatted instructions in the upper portion of the pipeline, so that formatted instructions are available when the execution stages need them.

[0010] In addition to fetching instructions, the upper pipeline stages also generate information related to the fetched instructions that is not part of the instruction bytes themselves. One example is branch prediction related information, which the execution stages may need in order to update the branch prediction history or to correct a mispredicted branch instruction. Another example is the length of the instruction, which must be determined in processors that execute variable-length instructions. The related information may be generated in a clock cycle after the clock cycle in which the instruction bytes are ready to be provided to the instruction buffer. However, the related information must be provided to the execution stages in sync with the instruction to which it relates.

[0011] One solution to this problem is to add another pipeline stage, giving the related information more time to be buffered and provided to the execution stages. However, this solution has the disadvantage of potentially reducing performance. In particular, when a branch instruction is mispredicted, all pipeline stages ahead of the mispredicted branch must be flushed of their instructions, and instruction fetching must resume at the mispredicted branch. The more stages that must be flushed, the greater the likelihood that bubbles will be introduced into the execution stages of the microprocessor pipeline. Hence, it is desirable to keep the number of pipeline stages as small as possible. Therefore, a better solution to this problem is needed.

[Summary of the Invention]

[0012] The present invention addresses the foregoing problem by receiving the related instruction information into a buffering apparatus one clock cycle later than the apparatus receives the instruction bytes. The apparatus includes a means that, when an instruction is loaded into the buffer while it is empty, allows the related instruction information to effectively bypass the buffer and be provided to the execution stage in the same clock cycle in which the information is generated. Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide an apparatus for buffering instructions and related information in a pipelined microprocessor, wherein the related information is not available to the buffering apparatus until at least one clock cycle after the instructions are available. The apparatus includes a first queue, having a first plurality of entries, each for storing an instruction. The apparatus also includes a second queue, having a second plurality of entries corresponding to the first plurality of entries, each for storing information related to a corresponding instruction in the first queue. The apparatus also includes a plurality of control signals, coupled to the first queue, for loading, shifting, and holding the instructions in the first queue. The apparatus also includes a plurality of registers that receive the control signals and output the control signals delayed by one clock cycle, for loading, shifting, and holding the related information in the second queue.

[0013] In another aspect, it is a feature of the present invention to provide an instruction buffer. The instruction buffer includes a plurality of muxed-registers, each for storing an instruction. The instruction buffer also includes a plurality of registered multiplexers, each for storing information related to the instruction in a corresponding one of the muxed-registers. The instruction buffer also includes control logic, coupled to the muxed-registers, for generating a control signal for selectively loading the instruction into one of the muxed-registers. The instruction buffer also includes a register that receives a value on the control signal during a first clock cycle and outputs the value during a second clock cycle subsequent to the first clock cycle, for selectively loading the related information into the registered multiplexer corresponding to that one of the muxed-registers.

[0014] In another aspect, it is a feature of the present invention to provide a microprocessor. The microprocessor includes an instruction formatter for outputting a branch instruction during a first clock cycle. The microprocessor also includes control logic for generating information related to a prediction of the branch instruction during a second clock cycle subsequent to the first clock cycle. The microprocessor also includes an instruction buffer, coupled to the instruction formatter, for buffering the branch instruction during the first clock cycle and for receiving the information during the second clock cycle. If the instruction buffer is empty during the first clock cycle, it selectively outputs the information during the second clock cycle. If the instruction buffer is not empty during the first clock cycle, it selectively buffers the information during the second clock cycle.

[0015] In another aspect, it is a feature of the present invention to provide a method for buffering instructions and related information in a microprocessor having a pipeline. The method includes loading an instruction into a queue during a first clock cycle, and generating information related to the instruction during a second clock cycle subsequent to the first clock cycle. The method also includes determining whether the instruction is shifted out of the queue during the first clock cycle. If the instruction is not shifted out of the queue, the related information is loaded into the queue during the second clock cycle. If the instruction is shifted out of the queue, the related information bypasses the queue during the second clock cycle and is provided to the pipeline along with the instruction.

[0016] In another aspect, it is a feature of the present invention to provide an instruction buffer. The instruction buffer includes a first multiplexer, having an output, a hold data input, a load data input for receiving an instruction during a first clock cycle, and a control input for receiving a first control signal. The first multiplexer selects the load data input if the control input is true, and otherwise selects the hold data input. The instruction buffer also includes a first register, having an input coupled to the output of the first multiplexer, and an output coupled to the hold data input of the first multiplexer. The instruction buffer also includes a second register, having an input and an output. The instruction buffer also includes a second multiplexer, having an output coupled to the input of the second register, a hold data input coupled to the output of the second register, a load data input for receiving information related to the instruction during a second clock cycle subsequent to the first clock cycle, and a control input for receiving a second control signal. The second multiplexer selects the load data input if the control input is true, and otherwise selects the hold data input. The instruction buffer also includes a third register, having an input for receiving the first control signal during the first clock cycle, and an output for generating the second control signal during the second clock cycle. Consequently, if the first control signal is true during the first clock cycle, the second multiplexer selects the related information during the second clock cycle.

[0017] In another aspect, it is a feature of the present invention to provide a computer data signal embodied in a transmission medium. The computer data signal includes computer-readable program code for providing an apparatus for buffering instructions and related information in a pipelined microprocessor, wherein the related information is not available to the apparatus until at least one clock cycle after the instructions. The program code includes first program code for providing a first queue, having a first plurality of entries, each for storing an instruction. The program code also includes second program code for providing a second queue, having a second plurality of entries corresponding to the first plurality of entries, each for storing information related to a corresponding instruction in the first queue. The program code also includes third program code for providing a plurality of control signals, coupled to the first queue, for loading, shifting, and holding the instructions in the first queue. The program code also includes fourth program code for providing a plurality of registers that receive the control signals and output the control signals delayed by one clock cycle, for loading, shifting, and holding the related information in the second queue.

[0018] An advantage of the present invention is that an instruction buffer, or queue, can be used without adding another pipeline stage, thereby improving processor performance.

[0019] Other features and advantages of the present invention will become apparent upon study of the following description and the accompanying drawings.

[Embodiments]

[0024] Referring now to FIG. 1, a block diagram of a microprocessor 100 according to the present invention is shown. Microprocessor 100 is a pipelined processor comprising multiple pipeline stages. A portion of the stages is shown, namely an I-stage 151, an F-stage 153, an X-stage 155, and an R-stage 157. I-stage 151 comprises a stage for fetching instruction bytes from memory or from an instruction cache. In one embodiment, I-stage 151 comprises a plurality of stages. F-stage 153 comprises a stage for formatting a stream of unformatted instruction bytes into formatted instructions. X-stage 155 comprises a stage for translating formatted macroinstructions into microinstructions. R-stage 157 comprises a register stage for loading operands from register files. Other execution stages of microprocessor 100 are not shown, such as the address generation, data, execute, store, and result write-back stages that follow R-stage 157.

[0025] Microprocessor 100 includes an instruction cache 104 in I-stage 151. Instruction cache 104 caches instructions fetched from a system memory coupled to microprocessor 100. Instruction cache 104 receives a current fetch address 181 for selecting a cache line of instruction bytes 167 to output. In one embodiment, instruction cache 104 is a multi-stage cache, i.e., instruction cache 104 requires multiple clock cycles to output the cache line corresponding to current fetch address 181.

[0026] Microprocessor 100 also includes a multiplexer 178 in I-stage 151. Multiplexer 178 provides current fetch address 181. Multiplexer 178 receives a next sequential fetch address 179, which is current fetch address 181 incremented by the size of a cache line stored in instruction cache 104. Multiplexer 178 also receives a correction address 177, which specifies the address to which microprocessor 100 branches in order to correct a branch misprediction. Multiplexer 178 also receives a predicted branch target address 175.

[0027] Microprocessor 100 also includes a branch target address cache (BTAC) 106 in I-stage 151, coupled to multiplexer 178. BTAC 106 generates predicted branch target address 175 in response to current fetch address 181. BTAC 106 caches the branch target addresses of executed branch instructions, along with the addresses of the branch instructions themselves. In one embodiment, BTAC 106 comprises a 4-way set-associative cache, and each way of a selected set contains multiple entries for storing a target address and branch prediction information for a predicted branch instruction. In addition to predicted target address 175, BTAC 106 outputs branch prediction related information 194. In one embodiment, BTAC information 194 includes: an offset specifying the location of the first byte of the predicted branch instruction within the instruction cache line selected by current fetch address 181; an indication of whether the predicted branch instruction wraps across a half-cache-line boundary; a valid bit for each entry in the selected way; an indication of which way in the selected set is least recently used; an indication of which of the multiple entries in the selected way is least recently used; and a prediction of whether the branch instruction will be taken or not taken.


[0028] Microprocessor 100 also includes control logic 102. If current fetch address 181 matches a valid cached address in BTAC 106 of a previously executed branch instruction, and BTAC 106 predicts that the branch instruction will be taken, control logic 102 controls multiplexer 178 to select BTAC target address 175. If a branch misprediction occurs, control logic 102 controls multiplexer 178 to select correction address 177. Otherwise, control logic 102 controls multiplexer 178 to select next sequential fetch address 179. Control logic 102 also receives BTAC information 194.
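The three-way selection performed by multiplexer 178 under control logic 102 can be sketched as follows. This is a minimal illustration, not the actual control logic: the function name, its parameters, and the 32-byte cache line size are hypothetical, and the text does not say which source wins when a misprediction and a BTAC hit occur together, so giving the correction address priority is an assumption.

```python
CACHE_LINE_BYTES = 32  # assumed cache line size; the text does not give one


def select_fetch_address(current: int,
                         btac_hit: bool, btac_taken: bool, btac_target: int,
                         mispredict: bool, correction: int) -> int:
    """Pick the next value of the current fetch address (signal 181)."""
    if mispredict:
        # Assumed priority: correcting a misprediction overrides a new prediction.
        return correction                   # correction address 177
    if btac_hit and btac_taken:
        return btac_target                  # predicted branch target address 175
    return current + CACHE_LINE_BYTES       # next sequential fetch address 179


print(hex(select_fetch_address(0x1000, False, False, 0x0, False, 0x0)))
print(hex(select_fetch_address(0x1000, True, True, 0x2040, False, 0x0)))
print(hex(select_fetch_address(0x1000, True, True, 0x2040, True, 0x3000)))
```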

[0029] Microprocessor 100 also includes predecode logic 108 in I-stage 151, coupled to instruction cache 104. Predecode logic 108 receives the cache line of instruction bytes 167 provided by instruction cache 104, along with BTAC information 194, and generates predecode information 169 based thereon. In one embodiment, predecode information 169 includes: a bit associated with each instruction byte predicting whether the byte is the opcode byte of a branch instruction predicted taken by BTAC 106; bits for predicting the length of the next instruction, based on the predicted instruction length; a bit associated with each instruction byte predicting whether the byte is a prefix byte of the instruction; and a prediction of the outcome of a branch instruction.

[0030] Microprocessor 100 also includes an instruction byte buffer 112 in F-stage 153, coupled to predecode logic 108. Instruction byte buffer 112 receives predecode information 169 from predecode logic 108 and instruction bytes 167 from instruction cache 104. Instruction byte buffer 112 provides the predecode information to control logic 102 via signal 196. In one embodiment, instruction byte buffer 112 is capable of buffering up to four cache lines of instruction bytes and associated predecode information.

[0031] Microprocessor 100 also includes instruction byte buffer control logic 114, coupled to instruction byte buffer 112. Instruction byte buffer control logic 114 controls the flow of instruction bytes and associated predecode information into and out of instruction byte buffer 112. Instruction byte buffer control logic 114 also receives BTAC information 194.

[0032] Microprocessor 100 also includes an instruction formatter 116 in F-stage 153, coupled to instruction byte buffer 112. Instruction formatter 116 receives instruction bytes and predecode information 165 from instruction byte buffer 112 and generates formatted instructions 197 therefrom. That is, instruction formatter 116 views a string of instruction bytes in instruction byte buffer 112, determines which bytes comprise the next instruction and the length of that instruction, and outputs the next instruction as formatted_instr 197. In one embodiment, the formatted instructions provided on formatted_instr 197 comprise instructions conforming substantially to the x86 architecture instruction set. In one embodiment, the formatted instructions are also referred to as macroinstructions, which are translated into microinstructions that are executed by the execution stages of the microprocessor 100 pipeline. formatted_instr 197 is generated in F-stage 153. Each time instruction formatter 116 outputs a formatted_instr 197, it generates a true value on F_new_instr signal 152 to indicate that a valid formatted instruction is present on formatted_instr 197. Additionally, instruction formatter 116 outputs information related to formatted_instr 197 to control logic 102 via signal F_instr_info 198. In one embodiment, F_instr_info 198 includes: a prediction of whether a branch instruction will be taken or not taken (if the instruction is a branch instruction); the prefix of the instruction; whether the address of the instruction hit in a branch target buffer of the microprocessor; whether the instruction is a far direct branch instruction; whether the instruction is a far indirect branch instruction; whether the instruction is a call branch instruction; whether the instruction is a return branch instruction; whether the instruction is a far return branch instruction; whether the instruction is an unconditional branch instruction; and whether the instruction is a conditional branch instruction. Furthermore, instruction formatter 116 outputs the address of the formatted instruction via a current instruction pointer (CIP) signal 182, which is the address of the previous instruction plus the length of the previous instruction.
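The relationship between instruction lengths and the current instruction pointer can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: real x86 length decoding is far more involved, so a hypothetical toy encoding is used in which the first byte of each instruction gives its total length in bytes. The names `toy_length` and `format_instructions` are not from the source.

```python
from typing import Callable, List, Tuple


def toy_length(stream: bytes, offset: int) -> int:
    """Hypothetical length decoder: the first byte holds the total length."""
    return stream[offset]


def format_instructions(stream: bytes, start_address: int,
                        length_of: Callable[[bytes, int], int] = toy_length
                        ) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (CIP, instruction bytes) pairs.

    Each instruction starts where the previous one ended, so
    CIP(n) = CIP(n-1) + length(n-1).
    """
    formatted: List[Tuple[int, bytes]] = []
    offset = 0
    cip = start_address
    while offset < len(stream):
        length = length_of(stream, offset)
        if length <= 0 or offset + length > len(stream):
            break  # incomplete instruction: stop until more bytes are buffered
        formatted.append((cip, stream[offset:offset + length]))
        offset += length
        cip += length
    return formatted


stream = bytes([2, 0xAA, 1, 3, 0xBB, 0xCC])
for address, instruction in format_instructions(stream, 0x100):
    print(hex(address), instruction.hex())
```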

[0033] Microprocessor 100 also includes a formatted instruction queue (FIQ) 187 in X-stage 155. Formatted instruction queue 187 receives formatted_instr 197 from instruction formatter 116. Formatted instruction queue 187 also outputs a formatted instruction on an early0 signal 193. In addition, formatted instruction queue 187 receives from control logic 102, via signal X_rel_info 186, information related to the formatted instructions received on formatted_instr 197. X_rel_info 186 is generated in X-stage 155. Formatted instruction queue 187 also outputs, on a late0 signal 191, information related to the formatted instruction output on early0 signal 193. Formatted instruction queue 187 and X_rel_info 186 are described in more detail below.

[0034] Microprocessor 100 also includes formatted instruction queue (FIQ) control logic 118. FIQ control logic 118 receives F_new_instr 152 from instruction formatter 116. When formatted instruction queue 187 is full, FIQ control logic 118 generates a true value on FIQ_full signal 199, which is provided to instruction formatter 116. FIQ control logic 118 also generates an eshift signal 164 for controlling the shifting of instructions within formatted instruction queue 187. FIQ control logic 118 also generates a plurality of eload signals 162 for controlling the loading of an instruction from formatted_instr 197 into an empty entry of formatted instruction queue 187. In one embodiment, FIQ control logic 118 generates one eload signal 162 for each entry in formatted instruction queue 187. In one embodiment, formatted instruction queue 187 comprises 12 entries, each for storing a formatted macroinstruction. However, for simplicity and clarity, FIGS. 1 through 3 show formatted instruction queue 187 comprising three entries; hence, FIG. 1 shows three eload signals 162, denoted eload[2:0] 162.

[0035] The FIQ control logic 118 also maintains a valid bit 134 associated with each entry in the formatted instruction queue 187. The embodiment shown in Figure 1 includes three valid bits 134, denoted V2, V1, and V0. V0 134 is the valid bit for the lowest entry in the formatted instruction queue 187; V1 134 is the valid bit for the middle entry; V2 134 is the valid bit for the highest entry. The FIQ control logic 118 also outputs an F_valid signal 188, which in one embodiment is V0 134. Each valid bit 134 indicates whether the corresponding entry in the formatted instruction queue 187 contains a valid instruction. The FIQ control logic 118 also receives an XIQ_full signal 195.
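The relationship between the valid bits and the signals produced by the FIQ control logic 118 can be illustrated with a minimal Python sketch. It is not the patented logic: the function name `fiq_control` is hypothetical, the choice of the lowest empty entry as the load target is taken from the timing description of Figure 4, and the sketch assumes no shift occurs in the same clock cycle.

```python
from typing import List, Tuple


def fiq_control(valid: List[bool], f_new_instr: bool) -> Tuple[List[bool], bool, bool]:
    """Return (eload, f_valid, fiq_full) for one clock cycle.

    valid[0] models V0 (lowest entry), valid[-1] the highest entry.
    Assumption: no shift happens this cycle, so the load target is
    simply the lowest entry whose valid bit is false.
    """
    eload = [False] * len(valid)
    if f_new_instr:
        for i, is_valid in enumerate(valid):
            if not is_valid:
                eload[i] = True  # one eload signal per entry, at most one true
                break
    f_valid = valid[0]      # F_valid is V0 in one embodiment
    fiq_full = all(valid)   # FIQ_full is true when every entry is valid
    return eload, f_valid, fiq_full
```

With an empty three-entry queue and a new instruction, the sketch asserts only eload[0]; with V0 set and V1 clear, it asserts only eload[1].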

[0036] The microprocessor 100 also includes an instruction translator 138 in the X-stage 155, coupled to the formatted instruction queue 187. The instruction translator 138 receives a formatted instruction from the formatted instruction queue 187 on early0 signal 193 and translates the formatted macroinstruction into one or more microinstructions 171. In one embodiment, the microprocessor 100 includes a reduced instruction set computer (RISC) core that executes microinstructions of the native, or reduced, instruction set.

[0037] The microprocessor 100 also includes a translated instruction queue (XIQ) 154 in the X-stage 155, coupled to the instruction translator 138. The XIQ 154 buffers translated microinstructions 171 received from the instruction translator 138. The XIQ 154 also buffers the related information received from the formatted instruction queue 187 on late0 signal 191. The information received on late0 signal 191 is related to the microinstructions 171 because it is related to the formatted macroinstructions from which the microinstructions 171 were translated. The execution stages of the microprocessor 100 use the related information 191 to execute the related microinstructions 171.

[0038] The microprocessor 100 also includes XIQ control logic 156, coupled to the XIQ 154. The XIQ control logic 156 receives the F_valid signal 188 and generates the XIQ_full signal 195. The XIQ control logic 156 also generates an X_load signal 164 to control loading of the translated microinstructions 171 and related information 191 into the XIQ 154.

[0039] The microprocessor 100 also includes a two-input multiplexer 172 in the X-stage 155, coupled to the XIQ 154. The multiplexer 172 serves as a bypass multiplexer for selectively bypassing the XIQ 154. One input of the multiplexer 172 receives the output of the XIQ 154. The other input receives the input to the XIQ 154, namely the microinstruction 171 and late0 191. Based on a control input 161 generated by the XIQ control logic 156, the multiplexer 172 selects one of its inputs for output to an execution stage register 176 in the R-stage 157. If the execution stage register 176 is ready to receive an instruction and the XIQ 154 is empty when the instruction translator 138 outputs the microinstruction 171, the XIQ control logic 156 controls the multiplexer 172 to bypass the XIQ 154. The microprocessor 100 also includes a valid bit register 189, which receives an X_valid signal 148 from the XIQ control logic 156 to indicate whether the microinstruction and related information stored in the execution stage register 176 are valid.
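The bypass decision described for multiplexer 172 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `x_stage_step` and the string placeholders are hypothetical, the XIQ is modeled as an unbounded FIFO (XIQ_full is ignored), and the behavior in the non-bypass cases is an assumption, since the text above only spells out the bypass condition.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# (microinstruction 171, related information from late0 191)
Pair = Tuple[str, str]


def x_stage_step(xiq: Deque[Pair], new_pair: Optional[Pair],
                 exec_ready: bool) -> Optional[Pair]:
    """Return the pair loaded into the execution stage register this cycle, if any."""
    if exec_ready and not xiq and new_pair is not None:
        # Bypass: the multiplexer selects the XIQ input, skipping the queue.
        return new_pair
    # Otherwise the multiplexer selects the XIQ output (assumed behavior).
    to_exec = xiq.popleft() if (exec_ready and xiq) else None
    if new_pair is not None:
        xiq.append(new_pair)  # buffer the new pair behind older entries
    return to_exec
```

Keeping the microinstruction and its related information together as one pair mirrors the point of the design: both must reach the execution stage in the same clock cycle, whether or not the queue is bypassed.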

[0040] The formatted instruction queue 187 comprises an early queue 132 for storing formatted macroinstructions received on the formatted_instr signal 197, and a corresponding late queue 146 for storing the related information received on the X_rel_info signal 186. Figure 1 shows the early queue 132 comprising three entries, denoted EE2, EE1, and EE0. EE0 is the bottom entry of the early queue 132, EE1 is the middle entry, and EE2 is the top entry. The contents of EE0 are provided on output signal early0 193. Signals eshift 164 and eload[2:0] 162 control shifting and loading of the early queue 132. Similarly, Figure 1 shows the late queue 146 comprising three entries, denoted LE2, LE1, and LE0. LE0 is the bottom entry of the late queue 146, LE1 is the middle entry, and LE2 is the top entry. The contents of LE0 are provided on output signal late0 191.

[0041] The formatted instruction queue 187 also includes a register 185. At the end of a first clock cycle, the register 185 receives the eshift signal 164 from the FIQ control logic 118 and, during the next clock cycle, outputs on an lshift signal 168 the value of the eshift signal 164 received during the first clock cycle. The formatted instruction queue 187 also includes three registers 183. At the end of the first clock cycle, the registers 183 receive the eload[2:0] signals 162 from the FIQ control logic 118 and, during the next clock cycle, output on lload[2:0] signals 142 the value of the eload[2:0] signals 162 received during the first clock cycle. That is, registers 185 and 183 output the eshift signal 164 and the eload[2:0] signals 162, respectively, delayed by one clock cycle.

[0042] In one embodiment, X_rel_info 186 includes: the length of the formatted macroinstruction from which the corresponding microinstruction was translated; an indication of whether the macroinstruction crosses a half-cache-line boundary; the displacement field of the macroinstruction; the immediate field of the macroinstruction; the instruction pointer of the macroinstruction; and various information related to branch prediction and correction (if the macroinstruction is predicted to be a branch instruction).

[0043] In one embodiment, the information related to branch prediction and correction includes: branch history table information used to predict whether the branch instruction will be taken or not taken; a portion of the linear instruction pointer of the branch instruction, used to predict whether the branch instruction will be taken or not taken; a branch pattern that is exclusive-ORed with the linear instruction pointer to make the taken/not-taken prediction; a second branch pattern to revert to if the branch prediction is incorrect; various flags indicating characteristics of the branch instruction, such as whether the branch instruction is a conditional branch instruction, a call instruction, the target of a return stack, a relative branch, or an indirect branch, and whether the prediction of the branch instruction outcome was made by a static predictor; and various information related to the prediction made by the BTAC 106, such as whether the current fetch address 181 matched a cached address in the BTAC 106, whether the matching address was valid, whether the branch instruction was predicted taken or not taken, the least recently used way of the set of the BTAC 106 selected by the current fetch address 181, which way of the selected set to replace if execution of the instruction requires the BTAC 106 to be updated, and the target address output by the BTAC 106. In one embodiment, a portion of X_rel_info 186 is generated during prior clock cycles and stored for provision along with the related information that is generated during the clock cycle after the macroinstruction is provided from entry EE0 of the early queue 132 on early0 signal 193.

[0044] Referring now to Figure 2, a block diagram of the early queue 132 of the formatted instruction queue 187 of Figure 1 according to the present invention is shown.

[0045] The early queue 132 comprises three muxed-registers connected in series to form a queue. The three muxed-registers comprise entries EE2, EE1, and EE0 of Figure 1.

[0046] The top muxed-register of the early queue 132 comprises a two-input multiplexer 212 and a register 222, denoted ER2, that receives the output of multiplexer 212. Multiplexer 212 includes a load input that receives the formatted_instr signal 197 of Figure 1. Multiplexer 212 also includes a hold input that receives the output of register ER2 222. Multiplexer 212 receives the eload[2] signal 162 of Figure 1 as a control input. If eload[2] 162 is true, multiplexer 212 selects the formatted_instr signal 197 on the load input; otherwise, multiplexer 212 selects the output of register ER2 222 on the hold input. Register ER2 222 loads the value output by multiplexer 212 on the rising edge of a clock signal, denoted clk 202.

[0047] The middle muxed-register of the early queue 132 comprises a three-input multiplexer 211 and a register 221, denoted ER1, that receives the output of multiplexer 211. Multiplexer 211 includes a load input that receives the formatted_instr signal 197. Multiplexer 211 also includes a hold input that receives the output of register ER1 221. Multiplexer 211 also includes a shift input that receives the output of register ER2 222. Multiplexer 211 receives the eload[1] signal 162 of Figure 1 as a control input. Multiplexer 211 also receives the eshift signal 164 of Figure 1 as another control input. If eload[1] 162 is true, multiplexer 211 selects the formatted_instr signal 197 on the load input; otherwise, if the eshift signal 164 is true, multiplexer 211 selects the output of register ER2 222 on the shift input; otherwise, multiplexer 211 selects the output of register ER1 221 on the hold input.

On the rising edge of clk 202, register ER1 221 loads the value output by multiplexer 211.

[0048] The bottom muxed-register of the early queue 132 comprises a three-input multiplexer 210 and a register 220, denoted ER0, that receives the output of multiplexer 210. Multiplexer 210 includes a load input that receives the formatted_instr signal 197. Multiplexer 210 also includes a hold input that receives the output of register ER0 220. Multiplexer 210 also includes a shift input that receives the output of register ER1 221. Multiplexer 210 receives the eload[0] signal 162 of Figure 1 as a control input. Multiplexer 210 also receives the eshift signal 164 of Figure 1 as another control input. If eload[0] 162 is true, multiplexer 210 selects the formatted_instr signal 197 on the load input; otherwise, if the eshift signal 164 is true, multiplexer 210 selects the output of register ER1 221 on the shift input; otherwise, multiplexer 210 selects the output of register ER0 220 on the hold input. On the rising edge of clk 202, register ER0 220 loads the value output by multiplexer 210. The output of register ER0 220 is provided on early0 signal 193.

[0049] Referring now to Figure 3, a block diagram of the late queue 146 of the formatted instruction queue 187 of Figure 1 according to the present invention is shown.

[0050] The late queue 146 comprises three registered-muxes connected in series to form a queue. The three registered-muxes comprise entries LE2, LE1, and LE0 of Figure 1.

[0051] The top registered-mux of the late queue 146 comprises a two-input multiplexer 312 and a register 322, denoted LR2, that receives the output of multiplexer 312. Multiplexer 312 includes a load input that receives X_rel_info 186 of Figure 1. Multiplexer 312 also includes a hold input that receives the output of register LR2 322. Multiplexer 312 receives the lload[2] signal 142 of Figure 1 as a control input. If lload[2] 142 is true, multiplexer 312 selects X_rel_info 186 on the load input; otherwise, multiplexer 312 selects the output of register LR2 322 on the hold input. On the rising edge of clk 202 of Figure 2, register LR2 322 loads the value output by multiplexer 312.

[0052] The middle registered-mux of the late queue 146 comprises a three-input multiplexer 311 and a register 321, denoted LR1, that receives the output of multiplexer 311. Multiplexer 311 includes a load input that receives X_rel_info 186. Multiplexer 311 also includes a hold input that receives the output of register LR1 321. Multiplexer 311 also includes a shift input that receives the output of register LR2 322. Multiplexer 311 receives the lload[1] signal 142 of Figure 1 as a control input. If the lload[1] signal 142 is true, multiplexer 311 selects X_rel_info 186 on the load input; otherwise, if the lshift signal 168 is true, multiplexer 311 selects the output of register LR2 322 on the shift input; otherwise, multiplexer 311 selects the output of register LR1 321 on the hold input. On the rising edge of clk 202 of Figure 2, register LR1 321 loads the value output by multiplexer 311.

[0053] The bottom registered-mux of the late queue 146 comprises a three-input multiplexer 310 and a register 320, denoted LR0, that receives the output of multiplexer 310. Multiplexer 310 includes a load input that receives X_rel_info 186. Multiplexer 310 also includes a hold input that receives the output of register LR0 320. Multiplexer 310 also includes a shift input that receives the output of register LR1 321. Multiplexer 310 receives the lload[0] signal 142 of Figure 1 as a control input. If the lload[0] signal 142 is true, multiplexer 310 selects X_rel_info 186 on the load input; otherwise, if the lshift signal 168 is true, multiplexer 310 selects the output of register LR1 321 on the shift input; otherwise, multiplexer 310 selects the output of register LR0 320 on the hold input. On the rising edge of clk 202 of Figure 2, register LR0 320 loads the value output by multiplexer 310. The output of multiplexer 310 is provided on late0 signal 191.

[0054] Referring now to Figure 4, a timing diagram illustrating operation of the formatted instruction queue 187 of Figure 1 according to the present invention is shown. Figure 4 shows five clock cycles, each beginning with a rising edge of the clk signal 202 of Figures 2 and 3. By convention, true signal values are shown as high logic levels in Figure 4. Figure 4 illustrates a scenario in which, at the time the instruction formatter 116 generates a new formatted macroinstruction, the XIQ 154 of Figure 1 is not full (i.e., is able to receive a microinstruction from the instruction translator 138) and the formatted instruction queue 187 is empty.

[0055] During clock cycle 1, the instruction formatter 116 generates a true value on the F_new_instr signal 152 of Figure 1 to indicate that a valid new formatted macroinstruction is present on formatted_instr 197 of Figure 1, as shown. Because the formatted instruction queue 187 is empty, the FIQ control logic 118 of Figure 1 generates a true value on the eload[0] signal 162 to load the valid new formatted macroinstruction from formatted_instr 197 into EE0, which is the lowest empty entry in the formatted instruction queue 187.

[0056] During clock cycle 2, V0 134 of Figure 1 (the valid bit for entry EE0 of the formatted instruction queue 187) is set to indicate that EE0 contains a valid instruction. On the rising edge of clock cycle 2, one of the registers 183 of Figure 1 loads eload[0] 162 and outputs a true value on lload[0] 142. Because eload[0] 162 is true, the new instruction is loaded into ER0 220 and output on early0 signal 193 of Figure 1 for provision to the instruction translator 138 of Figure 1, as shown. The instruction translator 138 translates the new macroinstruction and provides the translated microinstruction 171 to the XIQ 154. In addition, the control logic 102 generates new information related to the new instruction on X_rel_info 186, as shown. Because lload[0] 142 is true, multiplexer 310 selects its load input and outputs on late0 191 the new related information provided on X_rel_info 186, for provision to the XIQ 154 and multiplexer 172 of Figure 1, as shown. Furthermore, because the instruction translator 138 translates the new instruction during clock cycle 2, the FIQ control logic 118 generates a true value on the eshift signal 164 of Figure 1 so that the instruction is shifted out of the formatted instruction queue 187 during clock cycle 3.

[0057] During clock cycle 3, V0 134 is false because the new instruction has been shifted out of the formatted instruction queue 187. On the rising edge of clock cycle 3, the XIQ control logic 156 loads the translated microinstruction 171 and the related instruction information provided on late0 191 into the execution stage register 176 or the XIQ 154, depending on whether the XIQ 154 is empty or non-empty, respectively. In addition, register 185 of Figure 1 loads the eshift signal 164 and outputs a true value on lshift 168.

[0058] As may be observed from Figure 4, although the new macroinstruction is generated during clock cycle 1 and the related information is not generated until clock cycle 2, the formatted instruction queue 187 enables the related information and the translated microinstruction to be provided to the execution stage during the same clock cycle.

[0059] Referring now to Figure 5, a timing diagram illustrating operation of the formatted instruction queue 187 of Figure 1 according to the present invention is shown. Figure 5 is similar to Figure 4, except that in the scenario of Figure 5 the XIQ 154 is full when the instruction formatter 116 generates the new formatted macroinstruction.

[0060] During clock cycle 1, XIQ_full 195 is true. The instruction formatter 116 generates a new instruction on formatted_instr 197 and F_new_instr 152 is true, as in Figure 4. Because the formatted instruction queue 187 is empty, the FIQ control logic 118 generates a true value on the eload[0] signal 162 to load the valid new formatted macroinstruction from formatted_instr 197 into EE0, as in Figure 4.

[0061] During clock cycle 2, V0 134 is set; register 183 outputs a true value on lload[0] 142; the new instruction is loaded into ER0 220 and output on early0 signal 193 for provision to the instruction translator 138; new information related to the new instruction is generated on X_rel_info 186; and multiplexer 310 selects its load input and outputs on late0 191 the new related information provided on X_rel_info 186, for provision to the XIQ 154 and multiplexer 172, all as in Figure 4. However, because the XIQ 154 is full at the start of clock cycle 2, the FIQ control logic 118 generates a false value on the eshift signal 164, unlike in Figure 4. Subsequently, the XIQ control logic 156 deasserts XIQ_full 195 to indicate that the instruction translator 138 will be ready to translate the new macroinstruction during clock cycle 3.

[0062] During clock cycle 3, because the eshift signal 164 was false on the rising edge of clk 202, the new instruction is held in ER0 220 and provided on early0 193 to the instruction translator 138 for translation. Correspondingly, V0 134 remains true. The instruction translator 138 translates the new macroinstruction and provides the translated microinstruction 171 to the XIQ 154. Because lload[0] 142 was true on the rising edge of clk 202, the related information provided on X_rel_info 186 during clock cycle 2 is loaded into LR0 320. Because lload[0] 142 and lshift 168 are false during the remainder of clock cycle 3, the contents of LR0 320 (i.e., the new information related to the instruction) are provided on late0 191 to the XIQ 154, as shown. After the start of clock cycle 3, the FIQ control logic 118 generates a true value on the eshift signal 164 so that the new instruction will be shifted out of the formatted instruction queue 187 during clock cycle 4.

[0063] During clock cycle 4, V0 134 is false because the new instruction has been shifted out of the formatted instruction queue 187. On the rising edge of clock cycle 4, the XIQ control logic 156 loads the translated microinstruction 171 and the related instruction information provided on late0 191 into the XIQ 154. In addition, register 185 of Figure 1 loads the eshift signal 164 and outputs a true value on lshift 168.

[0064] As may be observed from Figure 5, although the new macroinstruction is generated during clock cycle 1 and the related information is not generated until clock cycle 2, the formatted instruction queue 187 enables the related information and the translated microinstruction to be provided to the XIQ 154 during the same clock cycle.
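The queue behavior walked through for Figures 4 and 5 can be approximated with a small cycle-level Python model. It is a sketch under stated assumptions, not the patented circuit: the class and function names are hypothetical, values are plain strings, valid bits and the XIQ are not modeled, and the top entry simply holds when a shift occurs because it has no shift input. The sketch does keep the two properties the description relies on: the late queue is controlled by the early queue's load and shift signals delayed by one clock cycle, and late0 is taken from the output of the bottom multiplexer rather than from register LR0.

```python
from typing import List, Optional


def mux(load: bool, shift: bool, load_in, shift_in, hold_in):
    """Entry multiplexer: load has priority over shift; otherwise hold."""
    if load:
        return load_in
    if shift:
        return shift_in
    return hold_in


class FormattedInstructionQueue:
    def __init__(self, entries: int = 3) -> None:
        self.n = entries
        self.er: List[Optional[str]] = [None] * entries  # early registers ER0..ER2
        self.lr: List[Optional[str]] = [None] * entries  # late registers LR0..LR2
        self.lload = [False] * entries  # delayed eload (outputs of registers 183)
        self.lshift = False             # delayed eshift (output of register 185)

    def _next(self, regs, load, shift, load_in):
        out = []
        for i in range(self.n):
            has_above = i + 1 < self.n  # the top entry has no shift input
            shift_in = regs[i + 1] if has_above else None
            out.append(mux(load[i], shift and has_above, load_in, shift_in, regs[i]))
        return out

    def early0(self):
        return self.er[0]  # early0 is the output of register ER0

    def late0(self, x_rel_info):
        # late0 is the output of the bottom multiplexer, so X_rel_info
        # passes straight through while lload[0] is true.
        return self._next(self.lr, self.lload, self.lshift, x_rel_info)[0]

    def clock(self, formatted_instr, x_rel_info, eload, eshift):
        """Rising clock edge, given the signal values of the cycle that just ended."""
        new_er = self._next(self.er, eload, eshift, formatted_instr)
        new_lr = self._next(self.lr, self.lload, self.lshift, x_rel_info)
        self.er, self.lr = new_er, new_lr
        self.lload, self.lshift = list(eload), eshift


def run_figure5_scenario():
    fiq = FormattedInstructionQueue()
    idle = [False, False, False]
    seen = []
    # Cycle 1: new instruction on formatted_instr, eload[0] true.
    fiq.clock("new_instr", None, [True, False, False], False)
    # Cycle 2: related information appears on X_rel_info; eshift stays false.
    seen.append((fiq.early0(), fiq.late0("new_info")))
    fiq.clock(None, "new_info", idle, False)
    # Cycle 3: instruction held in ER0, information now held in LR0; eshift true.
    seen.append((fiq.early0(), fiq.late0(None)))
    fiq.clock(None, None, idle, True)
    # Cycle 4: the instruction has been shifted out of the early queue.
    seen.append((fiq.early0(), fiq.late0(None)))
    return seen
```

Calling `run_figure5_scenario()` yields the (early0, late0) pairs seen in clock cycles 2, 3, and 4. In cycles 2 and 3 the instruction and its related information appear together, even though the information was generated one cycle after the instruction.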

[0065] Referring now to Figure 6, a timing diagram illustrating operation of the formatted instruction queue 187 of Figure 1 according to the present invention is shown. Figure 6 is similar to Figure 5, except that in the scenario of Figure 6, when the instruction formatter 116 generates the new formatted macroinstruction, not only is the XIQ 154 full, but the formatted instruction queue 187 is also not empty.

[0066] During clock cycle 1, XIQ_full 195 is true. The instruction formatter 116 generates a new instruction on formatted_instr 197 and asserts F_new_instr 152, as in FIGS. 4 and 5. Because EE0 contains a valid instruction, V0 134 is true; however, because EE1 does not contain a valid instruction, the valid bit (V1 134) of entry EE1 of the formatted instruction queue 187 of FIG. 1 is false, as shown in FIG. 6. Consequently, the FIQ control logic 118 generates a true value on the eload[1] signal 162 to load the valid new formatted macroinstruction from formatted_instr 197 into EE1. The early0 signal 193 provides the instruction held in EE0 (referred to in FIG. 6 as the old instruction), and the late0 signal 191 provides the information related to the old instruction held in LE0 (referred to as the old information), as shown in FIG. 6.

[0067] During clock cycle 2, V1 134 is set to indicate that EE1 now contains a valid instruction. V0 134 also remains set. The old instruction is held in ER0 220, and the old information is held in LR0 320. Register 183 outputs a true value on lload[1] 142. The new instruction is loaded into ER1 221, as shown in FIG. 6. The new information related to the new instruction is generated on X_rel_info 186, and the multiplexer 311 of FIG. 3 selects its load input, which is provided to register LR1 321. Because the XIQ 154 is full at the start of clock cycle 2, the FIQ control logic 118 generates a false value on the eshift signal 164. Subsequently, the XIQ control logic 156 deasserts XIQ_full 195 to indicate that the instruction translator 138 will be ready to translate a new macroinstruction during clock cycle 3.
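The relationship between the early-queue load control and the late-queue load control in clock cycles 1 and 2 can be sketched as a one-clock delay register. This is a minimal illustration only: the class `DelayRegister` and the helper `delay_by_one_clock` are hypothetical names, and the three-entry width is assumed from the lload[2:0] signal in the reference numeral list. It shows that a true value on eload[1] in cycle 1 appears on lload[1] in cycle 2, which is the role the text assigns to register 183.

```python
from typing import List


class DelayRegister:
    """Captures its input on a clock edge and presents it during the next cycle.

    Hypothetical stand-in for a register such as 183 (eload -> lload).
    """

    def __init__(self, width: int) -> None:
        self.q: List[bool] = [False] * width  # value visible in the current cycle

    def clock(self, d: List[bool]) -> None:
        self.q = list(d)


def delay_by_one_clock(eload_by_cycle: List[List[bool]]) -> List[List[bool]]:
    """Return the lload value seen in each cycle for a per-cycle eload stimulus."""
    reg = DelayRegister(width=len(eload_by_cycle[0]))
    lload_by_cycle = []
    for eload in eload_by_cycle:
        lload_by_cycle.append(list(reg.q))  # what the late queue sees this cycle
        reg.clock(eload)                    # rising edge at the end of the cycle
    return lload_by_cycle


# Cycle 1 asserts eload[1]; cycles 2 and 3 assert nothing (assumed stimulus).
eload = [[False, True, False], [False, False, False], [False, False, False]]
lload = delay_by_one_clock(eload)
```

In this sketch `lload[1]` (the value seen in cycle 2) has its entry 1 set, so the late queue loads entry 1 exactly one clock after the early queue did.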

[0068] During clock cycle 3, because the eshift signal 164 was false on the rising edge of clk 202, the new instruction is held in ER1 221. Likewise, the old instruction is held in ER0 220 and is provided via early0 193 to the instruction translator 138 for translation. V1 and V0 134 remain true. The instruction translator 138 translates the old instruction and provides its translated microinstruction 171 to the XIQ 154. Because lload[0] 142 and lshift 168 are false for the remainder of clock cycle 3, the contents of LR0 320 (i.e., the old information related to the old instruction) are provided to the XIQ 154 via late0 191, as shown in FIG. 6. Because lload[1] 142 was true on the rising edge of clk 202, the new related information provided on X_rel_info 186 during clock cycle 2 is loaded into LR1 321. After the start of clock cycle 3, the FIQ control logic 118 generates a true value on the eshift signal 164 so that the new instruction will be shifted from EE1 to EE0 during clock cycle 4.

[0069] During clock cycle 4, V1 134 is false because the new instruction has been shifted from EE1 to EE0. On the rising edge of clock cycle 4, the XIQ control logic 156 loads the microinstruction 171 translated from the old instruction, together with the related instruction information provided on late0 191, into the XIQ 154. In addition, register 185 loads the eshift signal 164 and outputs a true value on lshift 168. Because the XIQ 154 is ready to receive another microinstruction, eshift 164 remains true. Because the eshift signal 164 was true on the rising edge of clk 202, the new instruction is shifted from ER1 221 to ER0 220 and is provided via early0 193 to the instruction translator 138 for translation. V0 134 remains true. The instruction translator 138 translates the new instruction and provides the microinstruction 171 translated from the new instruction to the XIQ 154. Because lshift 168 is true during clock cycle 4, the information related to the new instruction, held in LR1 321, is selected on the shift input of multiplexer 310 and provided on the late0 signal 191, as shown in FIG. 6.

[0070] During clock cycle 5, the FIQ control logic 118 clears V0 134 because the new instruction has been shifted out of the formatted instruction queue 187. On the rising edge of clock cycle 5, the XIQ control logic 156 loads the microinstruction 171 translated from the new instruction, together with the related instruction information provided on late0 191, into the XIQ 154.

[0071] As may be observed from FIG. 6, although the new macroinstruction is generated during clock cycle 1 and its related information is not generated until clock cycle 2, the formatted instruction queue 187 enables the related information and the translated microinstruction to be provided to the XIQ 154 during the same clock cycle.

[0072] Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention. For example, although an embodiment has been described that buffers macroinstructions for provision to an instruction translator for translation into microinstructions, the invention is not limited to such an embodiment; rather, the invention is broadly applicable to any situation in which instructions must be buffered and the information related to an instruction is generated in the clock cycle after the one in which the instruction itself is generated. Furthermore, although the described embodiment is implemented in a microprocessor that processes variable-length instructions, the invention is not so limited and may also be employed in processors that process fixed-length instructions.
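The general scheme described above (an instruction queue whose related information arrives one clock later and is steered by delayed copies of the same load/shift controls) can be modelled at cycle level. The following is a behavioral sketch under stated assumptions, not the circuit of FIGS. 2 and 3: the class name `FormattedInstructionQueue`, the constant `DEPTH`, the string payloads, and the load-before-shift-before-hold mux priority are all assumptions made for illustration. The late0 output is taken from the mux in front of LR0, as the text describes for multiplexer 310.

```python
from typing import List, Optional, Tuple

DEPTH = 3  # assumed: three entries, EE0..EE2 / LE0..LE2


class FormattedInstructionQueue:
    """Early queue plus late queue; late controls are early controls delayed one clock."""

    def __init__(self) -> None:
        self.er: List[Optional[str]] = [None] * DEPTH  # early registers ER0..ER2
        self.lr: List[Optional[str]] = [None] * DEPTH  # late registers LR0..LR2
        self.lload: List[bool] = [False] * DEPTH       # eload delayed by one clock
        self.lshift: bool = False                      # eshift delayed by one clock

    @staticmethod
    def _mux(hold, load_in, shift_in, load: bool, shift: bool):
        # Assumed priority: load, then shift, then hold.
        if load:
            return load_in
        if shift:
            return shift_in
        return hold

    @staticmethod
    def _above(regs, i: int):
        # Entry above entry i; the top entry shifts in "nothing".
        return regs[i + 1] if i + 1 < DEPTH else None

    def outputs(self, x_rel_info: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
        """Values visible during the current cycle: (early0, late0)."""
        early0 = self.er[0]
        # late0 comes from the mux feeding LR0, so information generated in this
        # cycle can pass through without first being registered.
        late0 = self._mux(self.lr[0], x_rel_info, self.lr[1],
                          self.lload[0], self.lshift)
        return early0, late0

    def clock(self, eload: List[bool], eshift: bool,
              formatted_instr: Optional[str], x_rel_info: Optional[str]) -> None:
        """Rising clock edge: update both queues, then the delay registers."""
        new_er = [self._mux(self.er[i], formatted_instr, self._above(self.er, i),
                            eload[i], eshift) for i in range(DEPTH)]
        new_lr = [self._mux(self.lr[i], x_rel_info, self._above(self.lr, i),
                            self.lload[i], self.lshift) for i in range(DEPTH)]
        self.er, self.lr = new_er, new_lr
        self.lload, self.lshift = list(eload), eshift


def run_nonempty_scenario() -> List[Tuple[Optional[str], Optional[str]]]:
    """Stimulus loosely following the cycle 1-4 walk-through (queue not empty)."""
    q = FormattedInstructionQueue()
    q.er[0], q.lr[0] = "old", "old_info"
    none = [False] * DEPTH
    stimulus = [
        ([False, True, False], False, "new", None),  # cycle 1: load EE1
        (none, False, None, "new_info"),             # cycle 2: related info appears
        (none, True, None, None),                    # cycle 3: eshift asserted
        (none, True, None, None),                    # cycle 4: eshift stays asserted
    ]
    trace = []
    for eload, eshift, instr, info in stimulus:
        trace.append(q.outputs(info))
        q.clock(eload, eshift, instr, info)
    return trace


def run_empty_scenario() -> Tuple[Optional[str], Optional[str]]:
    """Empty queue: the instruction and its info appear together in cycle 2."""
    q = FormattedInstructionQueue()
    q.clock([True, False, False], False, "new", None)  # cycle 1: load EE0
    return q.outputs("new_info")                        # cycle 2
```

Calling `run_nonempty_scenario()` yields the old instruction paired with the old information through cycle 3 and the new instruction paired with the new information in cycle 4; `run_empty_scenario()` shows the pass-through case in which the pair is available one cycle after the instruction was generated.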

[0073] In addition to implementations using hardware, the invention may be embodied in computer readable code (e.g., computer readable program code, data, etc.) contained in a computer usable (e.g., readable) medium. The computer code enables the function or the architecture (or both) of the invention disclosed herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA, and the like); GDSII databases; hardware description languages (HDL) including Verilog HDL, VHDL, Altera HDL (AHDL), and so on; or other programming and/or circuit capture tools available in the art. The computer code can be disposed in any known computer usable (e.g., readable) medium, including semiconductor memory, magnetic disk, and optical disk (e.g., CD-ROM, DVD-ROM, and the like), and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium, including digital, optical, or analog-based media). As such, the computer code can be transmitted over communication networks, including the Internet and intranets. It is understood that the invention can be embodied in computer code (e.g., as part of an intellectual property (IP) core, such as a microprocessor core, or as a system-level design, such as a system on chip (SOC)) and transformed to hardware as part of the production of integrated circuits. Furthermore, the invention may be embodied as a combination of hardware and computer code.

In short, the foregoing is merely a preferred embodiment of the present invention,


Brief Description of the Drawings

[0020] FIG. 1 is a block diagram of a microprocessor according to the present invention.

[0021] FIG. 2 is a block diagram of the early queue of the formatted instruction queue of FIG. 1 according to the present invention.

[0022] FIG. 3 is a block diagram of the late queue of the formatted instruction queue of FIG. 1 according to the present invention.

[0023] FIGS. 4, 5, and 6 are timing diagrams illustrating operation of the formatted instruction queue of FIG. 1 according to the present invention.

Description of reference numerals:
100: microprocessor
102: control logic
104: instruction cache
106: branch target address cache
108: predecode logic
112: instruction byte buffer
114: instruction byte buffer control logic
116: instruction formatter
118: formatted instruction queue control logic
132: early queue
134: valid bits
138: instruction translator
142: lload[2:0] signal
146: late queue
151: I-stage
152: signal F_new_instr
153: F-stage
154: translated instruction queue (XIQ)
155: X-stage
156: XIQ control logic
157: R-stage
161: control input
162: eload signal
164: eshift signal
165: instruction bytes and predecode information
167: instruction bytes
168: lshift signal
169: predecode information
171: microinstruction
172, 178, 210, 211, 212, 310, 311, 312: multiplexers
175: predicted branch target address
176: execution stage registers
177: correction address
179: next sequential fetch address
181: current fetch address
182: current instruction pointer signal
183, 185, 220, 221, 222, 320, 321, 322: registers
186: signal X_rel_info
187: formatted instruction queue
188: F_valid signal
189: valid bit register
191: late0 signal
193: early0 signal
194: branch prediction related information
195: XIQ_full signal
197: formatted instruction
198: signal F_instr_info
199: FIQ full signal
202: clock signal (clk)

Claims (1)

1. An apparatus for buffering instructions and related information in a pipelined microprocessor, wherein the buffering apparatus does not obtain the related information until at least one clock cycle after it obtains the instruction, the apparatus comprising:
a first queue, having a first plurality of entries, each for storing an instruction;
a second queue, having a second plurality of entries corresponding to the first plurality of entries, each for storing information related to a corresponding one of the instructions in the first queue;
a plurality of control signals, coupled to the first queue, for loading, shifting, and holding the instructions in the first queue; and
a plurality of registers, receiving the control signals and outputting the control signals delayed by one clock cycle, for loading, shifting, and holding the related information in the second queue.

2. The apparatus of claim 1, wherein the instructions stored in the first queue comprise macroinstructions, and the first queue provides the macroinstructions to an instruction translator for translating the macroinstructions into microinstructions.

3. The apparatus of claim 2, wherein the macroinstructions comprise x86 macroinstructions.

4. The apparatus of claim 1, wherein the instructions are provided to the first queue by an instruction formatter.

5. The apparatus of claim 4, wherein the instructions comprise variable-length instructions, and the instruction formatter receives a stream of unformatted instruction bytes and formats the stream of unformatted instruction bytes into the variable-length instructions.

6. The apparatus of claim 1, wherein the buffering apparatus is coupled between an instruction cache of the microprocessor and an instruction execution stage of the microprocessor pipeline.

7. The apparatus of claim 6, wherein the related information comprises information received from a portion of the microprocessor other than the instruction cache.

8. The apparatus of claim 6, wherein the second queue is configured to provide the related information to the execution stage in the same clock cycle in which the second queue receives the related information.

9. The apparatus of claim 8, wherein a bottom entry of the second plurality of entries comprises:
a register, for selectively storing, in a clock cycle subsequent to the same clock cycle, the related information corresponding to the instruction stored in a corresponding bottom entry of the first plurality of entries of the first queue; and
a multiplexer, receiving the related information during the same clock cycle and having an output coupled to an input of the register and to the execution stage, for selectively providing the related information to the execution stage during the same clock cycle based on one or more of the control signals delayed by one clock cycle.

10. The apparatus of claim 8, wherein the second queue is configured to provide the related information to the execution stage in the same clock cycle in which a translated instruction associated with the related information is provided to the execution stage.

11. The apparatus of claim 10, wherein the translated instruction is translated from the instruction stored in the first queue that corresponds to the related information.

12. The apparatus of claim 11, wherein the translated instruction is translated by an instruction translator that receives the instruction from the first queue.

13. The apparatus of claim 12, wherein the instruction translator receives the instruction from the first queue during the same clock cycle.

14. The apparatus of claim 1, wherein the related information comprises [illegible] of a corresponding one of the instructions stored in the first queue.

15. The apparatus of claim 1, wherein the related information comprises a length of a corresponding one of the instructions stored in the first queue.

16. The apparatus of claim 1, wherein the related information comprises branch prediction information related to a corresponding one of the instructions stored in the first queue, wherein the corresponding instruction is predicted to be a branch instruction.

17. The apparatus of claim 16, wherein the branch prediction information comprises a predicted target address of the branch instruction.

18. The apparatus of claim 16, wherein the branch prediction information comprises an indication of whether the branch instruction wraps across two groups of instruction bytes output by an instruction cache in different clock cycles.

19. The apparatus of claim 16, wherein at least a portion of the branch prediction information is obtained from information generated by a branch target address cache.

20. The apparatus of claim 19, wherein the branch target address cache comprises an N-way set-associative cache, and the branch prediction information comprises an indication of a least recently used way within a selected set of the branch target address cache.

21. The apparatus of claim 19, wherein the branch target address cache comprises an N-way set-associative cache, and the branch prediction information comprises which of the N ways of the branch target address cache is to be replaced.

22. The apparatus of claim 19, wherein the branch prediction information comprises information used by the branch target address cache to correct a misprediction of the branch instruction.

23. The apparatus of claim 16, wherein the branch prediction information comprises branch history table information.

24. The apparatus of claim 16, wherein the branch prediction information comprises a linear instruction pointer used to predict the branch instruction.

25. The apparatus of claim 16, wherein the branch prediction information comprises a branch pattern used to predict the branch instruction.

26. The apparatus of claim 16, wherein the branch prediction information comprises a branch instruction type specifying the branch instruction.

27. The apparatus of claim 1, wherein the related information comprises a displacement field of a corresponding one of the instructions stored in the first queue.

28. The apparatus of claim 1, wherein the related information comprises an immediate field of a corresponding one of the instructions stored in the first queue.

29. An instruction buffer, comprising:
a plurality of muxed-registers, each for storing an instruction;
a plurality of registered-muxes, each for storing information related to the instruction in a corresponding one of the muxed-registers;
control logic, coupled to the muxed-registers, for generating a control signal for selectively loading the instruction into one of the muxed-registers; and
a register, for receiving a value of the control signal during a first clock cycle and for outputting the value during a second clock cycle subsequent to the first clock cycle, for selectively loading the related information into the one of the registered-muxes that corresponds to the one of the muxed-registers.

30. The instruction buffer of claim 29, wherein the corresponding one of the registered-muxes is also configured to selectively output the related information during the second clock cycle.

31. The instruction buffer of claim 30, wherein the one of the muxed-registers is configured to output the instruction during the second clock cycle.

32. The instruction buffer of claim 31, wherein the one of the muxed-registers outputs the instruction to an instruction translator during the second clock cycle.

33. A microprocessor, comprising:
an instruction formatter, for outputting a branch instruction during a first clock cycle;
control logic, for generating information related to a prediction of the branch instruction during a second clock cycle subsequent to the first clock cycle; and
an instruction buffer, coupled to the instruction formatter, for buffering the branch instruction during the [illegible] clock cycle and [illegible] the information if the instruction buffer [illegible] during the first clock cycle, and for selectively buffering the information during the second clock cycle if the instruction buffer [illegible] during the first clock cycle.

34. The microprocessor of claim 33, further comprising:
an execution stage, coupled to the instruction buffer, wherein, if the instruction buffer [illegible], the execution stage receives the information from the instruction buffer during [illegible] clock cycle.

35. The microprocessor of claim 34, further comprising:
an instruction translator, coupled to the instruction buffer, wherein if the instruction buffer is empty during the first clock cycle, the instruction translator receives the branch instruction from the instruction buffer during the second clock cycle, and if the instruction buffer is not empty during the first clock cycle, the instruction translator receives the branch instruction from the instruction buffer during [illegible] clock cycle.

36. The microprocessor of claim 35, wherein if the instruction buffer is empty during the first clock cycle, the execution stage receives a translated microinstruction from the instruction translator during the second clock cycle, and if the instruction buffer is not empty during the first clock cycle, the execution stage receives the translated microinstruction from the instruction translator during [illegible] clock cycle.

37. A method for buffering instructions and related information in a microprocessor having a pipeline, the method comprising:
loading an instruction into an instruction queue during a first clock cycle;
generating information related to the instruction during a second clock cycle subsequent to the first clock cycle;
determining whether the instruction is shifted out of the queue during the second clock cycle; and
if the instruction is not shifted out of the queue, loading the related information into the queue during the [illegible] clock cycle, and if the instruction is shifted out of the queue, bypassing the queue during the second clock cycle to provide the related information along with the instruction to the pipeline.

38. The method of claim 37, further comprising:
if the instruction is not shifted out of the queue, determining whether the instruction is shifted down within the queue during the second clock cycle; and
if the instruction is shifted down within the queue during the second clock cycle, shifting the related information down within the queue during a third clock cycle subsequent to the second clock cycle.

39. An instruction buffer, comprising:
a first multiplexer, comprising an output, a hold data input, a load data input for receiving an instruction during a first clock cycle, and a control input for receiving a first control signal, wherein the first multiplexer selects the load data input if the value on the control input is true and otherwise selects the hold data input;
a first register, comprising an input coupled to the output of the first multiplexer and an output coupled to the hold data input of the first multiplexer;
a second register, having an input and an output;
a second multiplexer, comprising an output coupled to the input of the second register, a hold data input coupled to the output of the second register, a load data input for receiving information related to the instruction during a second clock cycle subsequent to the first clock cycle, and a control input for receiving a second control signal, wherein the second multiplexer selects the load data input if the value on the control input is true and otherwise selects the hold data input; and
a third register, comprising an input for receiving the first control signal during the first clock cycle and an output for generating the second control signal during the second clock cycle, whereby, if the first control signal is true during the first clock cycle, then during the second clock cycle [illegible] and the related information.

40. The instruction buffer of claim 39, wherein the second multiplexer further comprises a shift data input for receiving an output of a fourth register, the fourth register storing data of an entry of the instruction buffer located above the entry formed by the second register and the second multiplexer, wherein the second multiplexer further comprises a second control
TW92123370A 2003-04-23 2003-08-26 Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts TWI232403B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/422,057 US7159097B2 (en) 2002-04-26 2003-04-23 Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts

Publications (2)

Publication Number Publication Date
TW200422947A TW200422947A (en) 2004-11-01
TWI232403B true TWI232403B (en) 2005-05-11

Family

ID=34272339

Family Applications (1)

Application Number Title Priority Date Filing Date
TW92123370A TWI232403B (en) 2003-04-23 2003-08-26 Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts

Country Status (2)

Country Link
CN (1) CN1310137C (en)
TW (1) TWI232403B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI791960B (en) * 2019-03-27 2023-02-11 聯發科技股份有限公司 Method and apparatus for data forwarding

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140208075A1 (en) * 2011-12-20 2014-07-24 James Earl McCormick, JR. Systems and method for unblocking a pipeline with spontaneous load deferral and conversion to prefetch
US9208066B1 (en) * 2015-03-04 2015-12-08 Centipede Semi Ltd. Run-time code parallelization with approximate monitoring of instruction sequences
WO2016156955A1 (en) * 2015-03-31 2016-10-06 Centipede Semi Ltd. Parallelized execution of instruction sequences based on premonitoring

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5608885A (en) * 1994-03-01 1997-03-04 Intel Corporation Method for handling instructions from a branch prior to instruction decoding in a computer which executes variable-length instructions
US5809272A (en) * 1995-11-29 1998-09-15 Exponential Technology Inc. Early instruction-length pre-decode of variable-length instructions in a superscalar processor
US5805878A (en) * 1997-01-31 1998-09-08 Intel Corporation Method and apparatus for generating branch predictions for multiple branch instructions indexed by a single instruction pointer
US6065110A (en) * 1998-02-09 2000-05-16 International Business Machines Corporation Method and apparatus for loading an instruction buffer of a processor capable of out-of-order instruction issue
US6823444B1 (en) * 2001-07-03 2004-11-23 Ip-First, Llc Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap

Also Published As

Publication number Publication date
TW200422947A (en) 2004-11-01
CN1514357A (en) 2004-07-21
CN1310137C (en) 2007-04-11

Similar Documents

Publication Publication Date Title
JP6796468B2 (en) Branch predictor
US8364902B2 (en) Microprocessor with repeat prefetch indirect instruction
EP1439458B1 (en) Apparatus and method for invalidating instructions in an instruction queue of a pipelined microprocessor
US6647489B1 (en) Compare branch instruction pairing within a single integer pipeline
US7159097B2 (en) Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts
EP1624369B1 (en) Apparatus for predicting multiple branch target addresses
EP1513062B1 (en) Apparatus, method and computer data signal for selectively overriding return stack prediction in response to detection of non-standard return sequence
KR101059335B1 (en) Efficient Use of BHT in Processors with Variable Length Instruction Set Execution Modes
TWI416408B (en) A microprocessor and information storage method thereof
EP2084602B1 (en) A system and method for using a working global history register
WO2000017746A1 (en) Mechanism for store to load forwarding
CN111886581A (en) Accurate early branch prediction in high performance microprocessors
TW201042542A (en) Apparatus and a method in a microprocessor
CN112579175A (en) Branch prediction method, branch prediction device and processor core
TWI232403B (en) Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts
US11294684B2 (en) Indirect branch predictor for dynamic indirect branches
CN111459551A (en) Microprocessor with highly advanced branch predictor
US20050144427A1 (en) Processor including branch prediction mechanism for far jump and far call instructions
TWI231450B (en) Processor including fallback branch prediction mechanism for far jump and far call instructions
CN113377442A (en) Fast predictor override method and microprocessor
TWI249131B (en) Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor
TWI283827B (en) Apparatus and method for efficiently updating branch target address cache
US10318303B2 (en) Method and apparatus for augmentation and disambiguation of branch history in pipelined branch predictors
TWI844775B (en) Quick predictor override method and micro processor
US20040128477A1 (en) Early access to microcode ROM

Legal Events

Date Code Title Description
MK4A Expiration of patent term of an invention patent