CN101916184B - Method for updating branch target address cache in microprocessor and microprocessor

Publication number
CN101916184B
Authority
CN
China
Prior art keywords
branch
instruction
branch instruction
target address
cache
Application number
CN 201010260377
Other languages
Chinese (zh)
Other versions
CN101916184A (en)
Inventor
Thomas C. McDonald (汤玛斯·C·麦当劳)
Original Assignee
VIA Technologies, Inc. (威盛电子股份有限公司)
Priority to US 61/237,920 (US23792009P)
Priority to US 12/575,951 (US8832418B2)
Application filed by VIA Technologies, Inc. (威盛电子股份有限公司)
Publication of CN101916184A
Application granted
Publication of CN101916184B

Abstract

The present invention provides a method for updating a branch target address cache (BTAC) in a microprocessor, and a microprocessor that includes a BTAC, an execution unit, and update logic. The execution unit executes a branch instruction previously fetched from an instruction cache in a fetch quantum. The update logic is coupled to the BTAC and the execution unit. The update logic determines whether the BTAC already stores branch prediction information for N branch instructions located in the fetch quantum, where N is at least two. If the BTAC does not yet store branch prediction information for N branch instructions, the update logic updates the BTAC with the branch information of the branch instruction. If the BTAC already stores branch prediction information for N branch instructions, the update logic determines whether the replacement priority of the branch instruction is higher than the replacement priority of the N branch instructions in the BTAC, and if so, updates the BTAC with the branch information of the branch instruction.

Description

Method for updating a branch target address cache in a microprocessor, and microprocessor

Technical Field

[0001] The present invention relates to microprocessors, and in particular to branch target address caches in microprocessors.

Background

[0002] A conventional branch target address cache (BTAC) can store only about two branch instructions for a given aligned 16-byte segment of instruction data. This design choice is made to improve timing and to reduce power consumption and die size. Allowing three or four branch instructions to be stored is considerably more complex than storing two. Although it is uncommon for three or more branch instructions whose initial bytes all lie in the same 16 bytes to be fetched from the instruction cache, the situation does occur and has a negative effect on performance.

Summary

[0003] The present invention provides a microprocessor comprising a branch target address cache, an execution unit, and update logic. Each entry of the branch target address cache stores branch prediction information for at most N branch instructions. The execution unit executes a branch instruction previously fetched from an instruction cache in a fetch quantum. The update logic is coupled to the branch target address cache and the execution unit. The update logic determines whether the branch target address cache already stores branch prediction information for N branch instructions located in the fetch quantum, where N is at least two. If the branch target address cache does not yet store branch prediction information for N branch instructions located in the fetch quantum, the update logic updates the branch target address cache with the branch information of the branch instruction. If the branch target address cache already stores branch prediction information for N branch instructions located in the fetch quantum, the update logic determines whether the replacement priority of the branch instruction is higher than the replacement priority of the N branch instructions in the branch target address cache, and if so, updates the branch target address cache with the branch information of the branch instruction.

[0004] The present invention also provides a method for updating a branch target address cache (BTAC) in a microprocessor, in which each entry of the branch target address cache stores branch prediction information for at most N branch instructions in a fetch quantum from an instruction cache. The method includes executing a branch instruction previously fetched from the instruction cache in the fetch quantum, and determining whether the branch target address cache already stores branch prediction information for N branch instructions located in the fetch quantum, where N is at least two. If the branch target address cache does not yet store branch prediction information for N branch instructions located in the fetch quantum, the branch target address cache is updated with the branch information of the branch instruction. If the branch target address cache already stores branch prediction information for N branch instructions located in the fetch quantum, the method determines whether the replacement priority of the branch instruction is higher than the replacement priority of the N branch instructions in the branch target address cache. If it is, the branch target address cache is updated with the branch information of the branch instruction.

[0005] To make the above and other objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

[0006] FIG. 1 is a block diagram of a microprocessor according to an embodiment of the present invention;

[0007] FIG. 2 is a block diagram of an instruction cache according to an embodiment of the present invention;

[0008] FIG. 3 is a block diagram of the arrangement of the branch target address cache of FIG. 1;

[0009] FIG. 4 is a diagram of the branch instruction type priorities used by the update logic of FIG. 1;

[0010] FIGS. 5A and 5B are a flowchart of the operation of the microprocessor of FIG. 1.

[0011] [Description of main reference numerals]

[0012] 100 ~ microprocessor; 102 ~ instruction cache;

[0013] 104 ~ fetch unit; 106 ~ instruction decoder;

[0014] 108 ~ instruction queue; 112 ~ adder;

[0015] 116 ~ register alias table; 118 ~ reservation station;

[0016] 122 ~ execution unit; 124 ~ retire unit;

[0017] 126 ~ second branch history table; 128 ~ branch target address cache;

[0018] 132 ~ return stack; 134 ~ control logic;

[0019] 136 ~ update logic; 138 ~ pseudo-random generator;

[0020] 142 ~ fetch address; 144 ~ next sequential fetch address;

[0021] 146 ~ predicted branch target address; 148 ~ predicted return address;

[0022] 152 ~ correct target address; 154 ~ branch target address;

[0023] 162 ~ global branch pattern; 164 ~ first branch history table;

[0024] 166 ~ pseudo-random indicator; 168 ~ general-purpose registers;

[0025] 202 ~ cache line; 302 ~ entry;

[0026] 304 ~ branch target address prediction; 306 ~ direction prediction;

[0027] 308 ~ branch instruction type; 312 ~ valid bit.

Detailed Description

[0028] To reduce the performance impact of the problem described above, the following embodiments provide a replacement policy for the case in which the same portion, or fetch quantum (for example, 16 bytes), of a cache line fetched from the instruction cache contains an additional branch instruction (for example, a third branch instruction). The replacement policy is a priority scheme based on the types of the branch instructions involved, together with a pseudo-random provision that overrides the priority scheme in order to handle various corner cases.

[0029] FIG. 1 is a block diagram of a microprocessor 100 according to an embodiment of the present invention. The microprocessor 100 includes an instruction cache 102 and a fetch unit 104; the fetch unit 104 supplies a fetch address 142 used to access the instruction cache 102. The fetch unit 104 outputs the fetch address 142 by selecting one of several addresses supplied by different sources: the fetch address 142 itself; the next sequential fetch address 144 supplied by an adder 112 that increments the fetch address 142; the predicted branch target address 146 supplied by the branch target address cache (BTAC) 128; the predicted return address 148 supplied by the return stack 132; the correct target address 152 supplied by the execution unit 122; and the branch target address 154 supplied by the instruction decoder 106. Control logic 134 controls the fetch unit 104 to select one of these inputs based on the direction predictions from the first branch history table 164 and the second branch history table 126 and on information from the BTAC 128. For example, the information from the BTAC 128 includes a direction prediction and the predicted type of the branch instruction (such as call/return, indirect branch, conditional relative, or unconditional relative).

[0030] The instruction cache 102 supplies a cache line 202 of instruction bytes to the instruction decoder 106 based on the fetch address 142. The instruction cache 102 supplies a portion of the cache line 202 each clock cycle rather than the entire cache line 202. As shown in FIG. 2, in this embodiment each cache line 202 is 64 bytes, and each clock cycle the instruction cache 102 supplies a 16-byte portion of the cache line 202 to the instruction decoder 106 or to an instruction buffer (not shown). The instruction decoder 106 decodes the instruction bytes. In this embodiment, the instruction decoder 106 translates x86-architecture instructions into microinstructions and supplies them to an instruction queue 108. When the instruction decoder 106 decodes a branch instruction whose target address is computed as an offset relative to the address of the branch instruction, the instruction decoder 106 computes the branch target address 154 and supplies it to the fetch unit 104. In addition, the instruction decoder 106 supplies the address of the branch instruction to the second branch history table 126. The second branch history table 126 stores direction history information about previously executed branch instructions. If the branch instruction address hits in the second branch history table 126, the second branch history table 126 predicts whether the branch instruction will be taken and sends the prediction to the control logic 134. The control logic 134 uses the prediction to control the fetch unit 104.

[0031] The instruction queue 108 supplies instructions in program order to a register alias table (RAT) 116, which maintains and generates dependency information for each instruction. The register alias table 116 dispatches instructions to reservation stations 118, which issue the instructions, possibly out of program order, to the execution unit 122. The execution unit 122 executes branch instructions. The execution unit 122 also indicates whether the various branch predictors (the BTAC 128, the return stack 132, the second branch history table 126, and the first branch history table 164) correctly predicted the branch instruction. Based on the execution of the branch instruction, the execution unit 122 also updates these branch predictors with history information. The execution unit 122 also supplies the correct target address 152 to the fetch unit 104. The execution unit 122 also updates the global branch pattern 162 stored by the microprocessor 100; when the fetch address 142 hits in the first branch history table 164, the first branch history table 164 uses the global branch pattern 162 to make a direction prediction. After the execution unit 122 executes an instruction, a retire unit 124 retires the instructions in program order, as tracked by a reorder buffer (not shown).

[0032] Refer to FIG. 3, a block diagram of the arrangement of the BTAC 128 of FIG. 1. The BTAC 128 stores information about previously executed branch instructions and uses that information during subsequent executions to predict the target address, direction, and type of those branch instructions. As shown in FIG. 3, each entry 302 in the BTAC 128 includes a valid bit 312, a branch target address prediction 304, a direction prediction 306 (whether the branch instruction will be taken or not taken), and a branch instruction type 308. In one embodiment, the branch instruction type 308 specifies whether the branch instruction is a call/return, an indirect branch, a conditional relative branch, or an unconditional relative branch. Advantageously, the update logic 136 of the microprocessor 100 uses the branch instruction type 308 to replace entries 302 in the BTAC 128 intelligently, as described in more detail below. As shown in FIG. 3, the BTAC 128 can store two entries 302 (labeled "A" and "B") for each portion, or fetch quantum (for example, 16 bytes), of a cache line 202 in the instruction cache 102. In other words, the BTAC 128 can store prediction information for at most two branch instructions in a portion of a cache line 202. As noted above, this limit reduces the effectiveness of branch prediction when a portion of a cache line 202 contains more than two branch instructions. However, the update logic 136 uses an intelligent replacement policy to reduce the performance impact, as described in more detail below. In one embodiment, the BTAC 128 also includes a least-recently-used (LRU) bit (not shown) for each A/B entry pair, which indicates whether side A or side B was least recently used and is used to decide whether to replace entry A 302 or entry B 302. Although in this embodiment the BTAC 128 stores prediction information for two branch instructions per 16-byte portion of a cache line 202, the size of the portion and the number of branch instructions per portion may be changed according to design requirements.
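The entry layout described above can be sketched as a small data structure. This is an illustrative model only, not the patented circuit: the names `BtacEntry`, `EntryPair`, `victim`, and `fetch_quantum_base` are hypothetical, and the encoding of the LRU bit as a boolean is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

FETCH_QUANTUM_BYTES = 16  # portion of a cache line covered by one A/B pair


class BranchType(Enum):
    CALL_RETURN = "call/return"
    INDIRECT = "indirect"
    CONDITIONAL_RELATIVE = "conditional relative"
    UNCONDITIONAL_RELATIVE = "unconditional relative"


@dataclass
class BtacEntry:
    valid: bool = False                        # valid bit 312
    target: int = 0                            # branch target address prediction 304
    taken: bool = False                        # direction prediction 306
    branch_type: Optional[BranchType] = None   # branch instruction type 308


@dataclass
class EntryPair:
    """The two entries (A and B) kept for one 16-byte fetch quantum."""
    a: BtacEntry = field(default_factory=BtacEntry)
    b: BtacEntry = field(default_factory=BtacEntry)
    lru_is_a: bool = True  # assumed encoding: True means side A is least recently used

    def victim(self) -> BtacEntry:
        """Prefer an invalid side; otherwise pick the least recently used side."""
        if not self.a.valid:
            return self.a
        if not self.b.valid:
            return self.b
        return self.a if self.lru_is_a else self.b


def fetch_quantum_base(address: int) -> int:
    """Align an instruction address down to the start of its 16-byte fetch quantum."""
    return address & ~(FETCH_QUANTUM_BYTES - 1)
```

Two branches whose first bytes share the same `fetch_quantum_base` value compete for the same A/B pair, which is the situation the replacement policy addresses.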

[0033] Referring back to FIG. 1, when the fetch address 142 hits in the BTAC 128, the BTAC 128 supplies information to the fetch unit 104, the instruction decoder 106, the return stack 132, and the control logic 134. Specifically, the BTAC 128 supplies the branch target address prediction 304 to the fetch unit 104 as the predicted branch target address 146, and supplies the direction prediction 306 and the branch instruction type 308 to the control logic 134. In addition, the branch instruction type 308 is passed down the pipeline along with the branch instruction, and the execution unit 122 subsequently supplies the branch instruction type 308 to the update logic 136 so that the update logic 136 can carry out the replacement policy of the BTAC 128, as described in more detail below.

[0034] The return stack 132 stores return addresses generated by call instructions. When the BTAC 128 indicates that the portion of the cache line 202 specified by the fetch address 142 contains a call instruction, a return address is pushed onto the return stack 132. When the BTAC 128 indicates that the portion of the cache line 202 specified by the fetch address 142 contains a return instruction, the return stack 132 supplies the predicted return address 148 to the fetch unit 104.
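The push-on-call, pop-on-return behavior of a return stack can be sketched as follows. The class name `ReturnStack`, its method names, the depth of 16, and the choice to drop the oldest entry on overflow are all assumptions made for illustration; the text does not specify them.

```python
from typing import List, Optional


class ReturnStack:
    """Minimal model of a return-address predictor (depth is hypothetical)."""

    def __init__(self, depth: int = 16) -> None:
        self._depth = depth
        self._addresses: List[int] = []

    def on_call(self, return_address: int) -> None:
        """A predicted call pushes the address of the instruction after it."""
        if len(self._addresses) == self._depth:
            self._addresses.pop(0)  # assumed overflow policy: drop the oldest
        self._addresses.append(return_address)

    def on_return(self) -> Optional[int]:
        """A predicted return pops the most recent return address, if any."""
        return self._addresses.pop() if self._addresses else None
```

Because pushes and pops must stay paired, losing the BTAC entry for a call or a return can leave a stack like this out of step with the program, which is consistent with the relatively high replacement priority given to call/return branches.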

[0035] The microprocessor 100 also includes a pseudo-random generator 138 that supplies a pseudo-random indicator 166 to the update logic 136. Advantageously, the update logic 136 uses the pseudo-random indicator 166 when carrying out the replacement policy of the BTAC 128, which improves on a strictly priority-based replacement policy, as described in more detail below. In one embodiment, the pseudo-random generator 138 is a 15-bit linear feedback shift register (LFSR) that cycles through all 2^15 states (except the all-zeros state) in pseudo-random order, so the generated pattern repeats after 32767 clock cycles. When needed, 5 of the 15 bits are sampled to produce the pseudo-random indicator 166. As a result, the pseudo-random indicator 166 is true on average about once every 32 clock cycles.
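A 15-bit LFSR with the stated period can be sketched as below. The text does not give the feedback polynomial, which five bits are sampled, or how they are combined, so the taps (bits 14 and 13, a standard maximal-length choice), the use of the low five bits, and the "all five bits set" rule are assumptions.

```python
LFSR_BITS = 15
STATE_MASK = (1 << LFSR_BITS) - 1  # 0x7FFF


def lfsr_step(state: int) -> int:
    """Advance a 15-bit Fibonacci LFSR by one clock (assumed taps: bits 14 and 13)."""
    feedback = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | feedback) & STATE_MASK


def indicator(state: int) -> bool:
    """Sample 5 of the 15 bits (assumed: the low five); true only if all are 1."""
    return (state & 0b11111) == 0b11111


def measure(seed: int = 1):
    """Return (period, cycles in which the indicator is true) for one full cycle."""
    state, period, true_cycles = seed, 0, 0
    while True:
        true_cycles += indicator(state)
        state = lfsr_step(state)
        period += 1
        if state == seed:
            return period, true_cycles


print(measure())  # (32767, 1024)
```

With these assumptions the indicator is true in 1024 of the 32767 states, which is very nearly once every 32 clock cycles.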

[0036] Refer to FIG. 4, a diagram of the branch instruction type priorities used by the update logic 136 of FIG. 1. As shown in FIG. 4, indirect branch instructions have the highest priority (they are the last to be replaced), call/return branch instructions have the second-highest priority, conditional relative branch instructions have the third-highest priority, and unconditional relative branch instructions have the lowest priority (they are the first to be replaced).
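The ordering in FIG. 4 maps naturally onto an ordered enumeration. In this sketch the numeric values are an arbitrary encoding chosen so that a larger value means "replaced later"; the names `BranchTypePriority` and `outranks` are hypothetical.

```python
from enum import IntEnum


class BranchTypePriority(IntEnum):
    UNCONDITIONAL_RELATIVE = 0  # lowest priority: first to be replaced
    CONDITIONAL_RELATIVE = 1    # third-highest priority
    CALL_RETURN = 2             # second-highest priority
    INDIRECT = 3                # highest priority: last to be replaced


def outranks(new_type: BranchTypePriority, stored_type: BranchTypePriority) -> bool:
    """True if the newly executed branch has strictly higher replacement priority."""
    return new_type > stored_type
```

For example, `outranks(BranchTypePriority.INDIRECT, BranchTypePriority.CALL_RETURN)` is true, while two branches of the same type never outrank each other.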

[0037] The target address of a relative branch instruction is computed as an offset relative to the address of the branch instruction, and the offset is a field in the instruction itself. The instruction decoder 106 can therefore correctly compute the branch target address 154 of relative branch instructions (both conditional relative and unconditional relative). Moreover, because the direction of an unconditional relative branch instruction is already known, the instruction decoder 106 can resolve unconditional relative branch instructions accurately. Consequently, the penalty incurred when the BTAC 128 mispredicts an unconditional relative branch instruction is smaller than the penalty for mispredicting other types of branch instruction. In one embodiment, the misprediction penalty is about seven clock cycles in the worst case, but it may be fewer than seven clock cycles depending on the fullness of the instruction queue 108. This is why unconditional relative branch instructions have the lowest priority (they are the first to be replaced). In one embodiment, each entry 302 of the BTAC 128 includes a flag that indicates whether the branch instruction is an unconditional relative branch instruction. Thus, if a portion of a cache line 202 contains more than two branch instructions, the update logic 136 replaces an unconditional relative branch instruction in the BTAC 128, and the update logic 136 generally does not replace a branch instruction of another type with an unconditional relative branch instruction.
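The decoder-side target computation for a relative branch is simple arithmetic on fields the decoder already has. The sketch below follows the x86 convention, in which the signed displacement is added to the address of the next instruction (branch address plus instruction length); that convention and the 32-bit address width are details not stated in the text above, and the function name is hypothetical.

```python
ADDRESS_MASK = 0xFFFFFFFF  # assume 32-bit addresses for illustration


def relative_branch_target(branch_address: int,
                           instruction_length: int,
                           displacement: int) -> int:
    """Target = address of the next instruction + signed displacement, wrapped to 32 bits."""
    return (branch_address + instruction_length + displacement) & ADDRESS_MASK
```

For instance, a 2-byte short jump at `0x1000` with displacement `-2` targets `0x1000` (a jump to itself), and a 5-byte near jump at `0x1000` with displacement `0x10` targets `0x1015`.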

[0038] In contrast to relative branch instructions, the target address of an indirect branch instruction is computed from operands held in the general-purpose registers 168 of the microprocessor 100 or in memory locations. The instruction decoder 106 therefore does not predict indirect branch instructions; the execution unit 122 computes the indirect branch target address. Consequently, the penalty incurred when the BTAC 128 mispredicts an indirect branch instruction is generally greater than the penalty for mispredicting other types of branch instruction. This is why indirect branch instructions have the highest priority (they are the last to be replaced).

[0039] In addition, replacing a call/return instruction in the BTAC 128 for which the return stack 132 holds a valid return address causes the return stack 132 to become misaligned, making it likely that the return stack 132 will mispredict later, with a negative effect on performance. This is why call/return instructions have the second-highest priority.

[0040] Finally, although conditional relative branch instructions are predicted by the instruction decoder 106 (target address), the second branch history table 126 (direction), and the BTAC 128, the direction prediction of the BTAC 128 is more accurate because, in this embodiment, the BTAC 128 is larger than the second branch history table 126. In addition, removing a conditional relative branch instruction from the BTAC 128 introduces errors into the global branch pattern 162. For these reasons, conditional relative branch instructions rank above unconditional relative branch instructions and therefore have the third-highest priority.

[0041] Refer to FIGS. 5A and 5B, a flowchart of the operation of the microprocessor 100 of FIG. 1. Flow begins at step 502.

[0042] In step 502, the execution unit 122 executes a new branch instruction and supplies the related information to the update logic 136. Flow proceeds to step 504.

[0043] In step 504, the update logic 136 uses the address of the branch instruction to index into the BTAC 128. Flow proceeds to decision step 506.

[0044] In decision step 506, the update logic 136 examines the valid bits 312 of entry A 302 and entry B 302 to determine whether the same portion of the cache line 202 contains more than two branch instructions, that is, whether both entries are already valid so that the new branch instruction would be a third. If so, flow proceeds to step 512; if not, flow proceeds to step 508.

[0045] In step 508, the update logic 136 updates the BTAC 128 with the execution information associated with the branch instruction. In other words, the update logic 136 writes the invalid entry A 302 or entry B 302. Flow ends at step 508.

[0046] In step 512, the update logic 136 examines the branch instruction type 308 of the branch instruction supplied by the execution unit 122, and the branch instruction types 308 of the two valid branch instructions in entry A 302 and entry B 302 (depending on the embodiment, the types of the two valid branch instructions come from the BTAC 128 or from the execution unit 122). Flow proceeds to decision step 514.

[0047] In decision step 514, the update logic 136 determines whether the branch instruction type 308 of the branch instruction has a higher priority than the branch instruction types 308 of the two valid branch instructions in entry A 302 and entry B 302. If so, flow proceeds to step 516; if not, flow proceeds to step 518.

[0048] In step 516, the update logic 136 updates the BTAC 128 with the execution information associated with the branch instruction. In other words, the update logic 136 replaces one of the two valid branch instructions in entry A 302 and entry B 302. In one embodiment, the update logic 136 selects entry A 302 or entry B 302 of the indexed set and selected way according to the LRU bit. Flow ends at step 516.

[0049] In step 518, the update logic circuit 136 examines the pseudo-random indicator 166. Flow proceeds to decision step 522.

[0050] In decision step 522, the update logic circuit 136 determines whether the branch instruction is an unconditional relative branch instruction. If so, flow proceeds to decision step 524; otherwise, flow proceeds to decision step 532.

[0051] In decision step 524, the update logic circuit 136 checks whether the pseudo-random indicator 166 is true. If so, flow proceeds to step 526; otherwise, flow proceeds to step 528.

[0052] In step 526, the update logic circuit 136 updates the BTAC 128 with the branch information of the newly executed branch instruction. Flow ends at step 526.

[0053] In step 528, the update logic circuit 136 does not update the BTAC 128 with the branch information of the newly executed branch instruction. Flow ends at step 528.

[0054] In decision step 532, the update logic circuit 136 determines whether all three branch instructions (the newly executed branch instruction and the two branch instructions in entry A 302 and entry B 302) are conditional relative branch instructions. If so, flow proceeds to decision step 534; otherwise, flow proceeds to step 528.

[0055] In decision step 534, the update logic circuit 136 determines whether the instruction decoder 106 or the second branch history table 126 correctly predicted the newly executed branch instruction. If so, flow proceeds to decision step 524; otherwise, flow proceeds to step 526.
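The fallback path of steps 518 through 534 reduces to a small decision function. The sketch below is only an illustration of the flow as described; the function name `fallback_update` and its boolean parameters are hypothetical, and it assumes decision step 514 has already answered "no".

```python
def fallback_update(new_is_uncond_rel, all_three_cond_rel,
                    correctly_predicted, pseudo_random):
    """Return True if the BTAC is updated (step 526), False if not (step 528).

    new_is_uncond_rel:   the new branch is an unconditional relative branch
    all_three_cond_rel:  the new branch and both resident branches are
                         conditional relative branches
    correctly_predicted: the instruction decoder or second branch history
                         table predicted the new branch correctly
    pseudo_random:       current value of the pseudo-random indicator
    """
    if new_is_uncond_rel:          # decision step 522
        return pseudo_random       # decision step 524
    if not all_three_cond_rel:     # decision step 532
        return False               # step 528
    if correctly_predicted:        # decision step 534
        return pseudo_random       # decision step 524
    return True                    # step 526
```

A mispredicted conditional branch among three conditional branches is always written, while a correctly predicted one is written only when the indicator happens to be true.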

[0056] The inventor has observed that when a partial cache line 202 contains three branch instructions, a program is sometimes sequenced so that all three branches are executed repeatedly, and each may therefore replace another in the BTAC 128. Most of the time, however, only two (or one) of the three branch instructions are executed, which hurts the performance of the strict priority-based replacement policy of steps 502 through 516. For example, suppose a program has an outer loop and an inner loop, where the outer loop contains a conditional relative branch instruction (e.g., a first x86 JCC instruction), the inner loop contains a second x86 JCC instruction and an unconditional relative branch instruction (e.g., an x86 JMP instruction), the inner loop follows the first x86 JCC instruction, and the x86 JMP instruction follows the second x86 JCC instruction. In this case, it is generally desirable for entry A 302 and entry B 302 of the BTAC 128 to hold the branch instructions of the inner loop (the second x86 JCC instruction and the x86 JMP instruction) rather than the branch instruction of the outer loop (the first x86 JCC instruction). However, because an x86 JCC instruction has a higher replacement priority than an x86 JMP instruction, under the strict priority-based replacement policy entry A 302 and entry B 302 of the BTAC 128 would hold the two x86 JCC instructions, and the update logic circuit 136 would never replace either of them with the x86 JMP instruction, which is an undesirable result.

[0057] To reduce this performance impact, the pseudo-random generator 138 provides the pseudo-random indicator 166 to the update logic circuit 136, as described in steps 518 through 528 above. Note that the pseudo-random indicator 166 varies regularly with the clock cycles of the microprocessor 100; however, because most programs do not execute a given branch instruction at regular clock-cycle intervals, the pseudo-random indicator 166 is effectively random with respect to the execution of that branch instruction.
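Claim 4 states only that the pseudo-random generator includes a linear feedback shift register (LFSR). The sketch below shows one way such an indicator could be derived. The 16-bit width, the tap positions (16, 14, 13, 11, a commonly used maximal-length choice), the seed, and the rule "true when the low five bits are zero" are all assumptions made for illustration and are not taken from the specification.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one clock."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def indicator(state):
    """Hypothetical indicator: true when the low five bits are all zero,
    i.e. for roughly one state in 32."""
    return (state & 0x1F) == 0


def true_fraction(seed=0xACE1, cycles=65535):
    """Fraction of clock cycles on which the indicator is true."""
    state, hits = seed, 0
    for _ in range(cycles):
        state = lfsr16_step(state)
        hits += indicator(state)
    return hits / cycles
```

The register is stepped once per clock regardless of what the program is doing, so the value sampled when a branch retires is deterministic in hardware terms yet uncorrelated with the branch stream in practice.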

[0058] Thus, assuming the pseudo-random indicator 166 is true on average about once every 32 clock cycles, the replacement policy of steps 518 through 528 causes the update logic circuit 136 to replace the first x86 JCC instruction of the outer loop with the x86 JMP instruction on about the 32nd execution instance of the inner loop, and the x86 JMP instruction of the inner loop then remains in the BTAC 128 until the first x86 JCC instruction of the outer loop is executed again.
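The figure of 32 can be checked with a short calculation. The sketch assumes, purely for illustration, that each execution of the x86 JMP instruction independently sees a true indicator with probability p = 1/32; the specification states only the average rate, and the function names are hypothetical.

```python
def expected_attempts(p=1 / 32):
    """Mean number of executions until the first update (geometric distribution)."""
    return 1.0 / p


def prob_replaced_within(k, p=1 / 32):
    """Probability that at least one of k executions sees a true indicator."""
    return 1.0 - (1.0 - p) ** k
```

Under this assumption the replacement takes 32 attempts on average, although in any particular run it may occur considerably earlier or later than the 32nd execution instance.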

[0059] Furthermore, if a given partial cache line 202 contains three x86 JCC instructions, the update logic circuit 136 checks whether the instruction decoder 106 or the second branch history table 126 correctly predicted the x86 JCC instruction; if it was correctly predicted, then according to steps 532, 534, and 528 the update logic circuit 136 generally does not replace either of the other two x86 JCC instructions. Because in this embodiment the second branch history table 126 is smaller and uses a less complex algorithm than the BTAC 128 and the first branch history table 164, hard-to-predict x86 JCC instructions need to be stored in the BTAC 128, whose direction prediction is the most accurate. However, to avoid a situation like the one described above (two of the three x86 JCC instructions are seen frequently and the third is rarely executed), according to steps 532, 534, and 526 the update logic circuit 136 allows a well-behaved x86 JCC instruction (i.e., an x86 JCC instruction in the inner loop that is correctly predicted by the instruction decoder 106 or the second branch history table 126) to go ahead and replace one of the other x86 JCC instructions (typically on about the 32nd execution instance of the inner loop).

[0060] Although the invention has been disclosed above through various embodiments, they are presented only as examples and are not intended to limit its scope; a person skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention. For example, software may be used to enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished using general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog and VHDL, or other available programs. Such software can be placed on any computer-usable medium, such as semiconductor memory, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM). The apparatus and methods described in the embodiments may be included in a semiconductor intellectual property core, such as a microprocessor core implemented in HDL, and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods described herein may be implemented as a combination of hardware and software. Accordingly, the invention should not be limited by any of the embodiments described herein, but should be defined by the appended claims and their equivalents. In particular, the invention may be implemented in a microprocessor device used in a general-purpose computer. Finally, a person skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is that defined by the appended claims.

Claims (14)

1. A microprocessor, comprising: a branch target address cache (BTAC), wherein each entry of the BTAC is configured to store branch prediction information, and the BTAC is organized into a plurality of groups of N entries, such that each group provides N entries for storing branch prediction information for N branch instructions per fetch quantum, wherein N is a constant integer value of at least two, wherein the branch prediction information includes a predicted target address, direction, and type of a branch instruction, and wherein branch instructions are classified by a replacement priority based on branch instruction type; an execution unit configured to execute a branch instruction previously fetched in a fetch quantum from an instruction cache; and an update logic circuit, coupled to the BTAC and the execution unit, configured to: determine whether the BTAC already stores the branch prediction information for the N branch instructions located in the fetch quantum; if the BTAC does not yet store the branch prediction information for the N branch instructions located in the fetch quantum, update the BTAC with branch information of the branch instruction; if the BTAC already stores the branch prediction information for the N branch instructions located in the fetch quantum, determine whether the replacement priority of the branch instruction is higher than the replacement priority of the N branch instructions in the BTAC; and if the replacement priority of the branch instruction is higher than the replacement priority of the N branch instructions in the BTAC, update the BTAC with the branch information of the branch instruction.
2. The microprocessor of claim 1, wherein the replacement priority of an indirect branch instruction is higher than the replacement priority of a call/return branch instruction, the replacement priority of the call/return branch instruction is higher than the replacement priority of a conditional relative branch instruction, and the replacement priority of the conditional relative branch instruction is higher than the replacement priority of an unconditional relative branch instruction.
3. The microprocessor of claim 1, wherein the replacement priority of an unconditional relative branch instruction is lower than the replacement priority of the other types of branch instructions.
4. The microprocessor of claim 1, further comprising: a pseudo-random generator, coupled to the update logic circuit, configured to generate a pseudo-random indicator; wherein the update logic circuit is further configured to: if the replacement priority of the branch instruction is not higher than the replacement priority of the N branch instructions in the BTAC, determine whether the pseudo-random indicator is a true value; if the pseudo-random indicator is the true value, update the BTAC with the branch information of the branch instruction; and if the pseudo-random indicator is a false value, not update the BTAC with the branch information of the branch instruction; wherein the pseudo-random generator comprises a linear feedback shift register.
5. The microprocessor of claim 1, wherein the update logic circuit is further configured to: if the replacement priority of the branch instruction is not higher than the replacement priority of the N branch instructions in the BTAC, determine whether the branch instruction is an unconditional relative branch instruction; if the branch instruction is the unconditional relative branch instruction, determine whether a pseudo-random indicator is a true value; if the pseudo-random indicator is the true value, update the BTAC with the branch information of the branch instruction; and if the pseudo-random indicator is a false value, not update the BTAC with the branch information of the branch instruction.
6. The microprocessor of claim 5, wherein the update logic circuit is further configured to: if the branch instruction is not the unconditional relative branch instruction, determine whether the branch instruction and the N branch instructions are all conditional relative branch instructions; and if the branch instruction and the N branch instructions are not all conditional relative branch instructions, not update the BTAC with the branch information of the branch instruction.
7. The microprocessor of claim 6, further comprising: an instruction decoder configured to predict the branch instruction by decoding it; wherein the update logic circuit is further configured to: if the branch instruction and the N branch instructions are all conditional relative branch instructions, determine whether the instruction decoder correctly predicted the branch instruction; if the instruction decoder did not correctly predict the branch instruction, or the pseudo-random indicator is the true value, update the BTAC with the branch information of the branch instruction; and if the instruction decoder correctly predicted the branch instruction, or the pseudo-random indicator is the false value, not update the BTAC with the branch information of the branch instruction.
8. A method for updating a branch target address cache (BTAC) in a microprocessor, wherein each entry of the BTAC is configured to store branch prediction information, and the BTAC is organized into a plurality of groups of N entries, such that each group provides N entries for storing branch prediction information for N branch instructions per fetch quantum, wherein N is a constant integer value of at least two, and wherein branch instructions are classified by a replacement priority based on branch instruction type, the method comprising: executing a branch instruction previously fetched in the fetch quantum from an instruction cache; determining whether the BTAC already stores the branch prediction information for the N branch instructions located in the fetch quantum, wherein the branch prediction information includes a predicted target address, direction, and type of a branch instruction; if the BTAC does not yet store the branch prediction information for the N branch instructions located in the fetch quantum, updating the BTAC with branch information of the branch instruction; if the BTAC already stores the branch prediction information for the N branch instructions located in the fetch quantum, determining whether the replacement priority of the branch instruction is higher than the replacement priority of the N branch instructions in the BTAC; and if the replacement priority of the branch instruction is higher than the replacement priority of the N branch instructions in the BTAC, updating the BTAC with the branch information of the branch instruction.
9. The method of claim 8, wherein the replacement priority of an indirect branch instruction is higher than the replacement priority of a call/return branch instruction, the replacement priority of the call/return branch instruction is higher than the replacement priority of a conditional relative branch instruction, and the replacement priority of the conditional relative branch instruction is higher than the replacement priority of an unconditional relative branch instruction.
10. The method of claim 8, wherein the replacement priority of an unconditional relative branch instruction is lower than the replacement priority of the other types of branch instructions.
11. The method of claim 8, further comprising: if the replacement priority of the branch instruction is not higher than the replacement priority of the N branch instructions in the BTAC, determining whether a pseudo-random indicator is a true value; if the pseudo-random indicator is the true value, updating the BTAC with the branch information of the branch instruction; and if the pseudo-random indicator is a false value, not updating the BTAC with the branch information of the branch instruction.
12. The method of claim 8, further comprising: if the replacement priority of the branch instruction is not higher than the replacement priority of the N branch instructions in the BTAC, determining whether the branch instruction is an unconditional relative branch instruction; if the branch instruction is the unconditional relative branch instruction, determining whether a pseudo-random indicator is a true value; if the pseudo-random indicator is the true value, updating the BTAC with the branch information of the branch instruction; and if the pseudo-random indicator is a false value, not updating the BTAC with the branch information of the branch instruction.
13. The method of claim 12, further comprising: if the branch instruction is not the unconditional relative branch instruction, determining whether the branch instruction and the N branch instructions are all conditional relative branch instructions; and if the branch instruction and the N branch instructions are not all conditional relative branch instructions, not updating the BTAC with the branch information of the branch instruction.
14. The method of claim 13, further comprising: if the branch instruction and the N branch instructions are all conditional relative branch instructions, determining whether an instruction decoder correctly predicted the branch instruction; if the instruction decoder did not correctly predict the branch instruction, or the pseudo-random indicator is the true value, updating the BTAC with the branch information of the branch instruction; and if the instruction decoder correctly predicted the branch instruction, or the pseudo-random indicator is the false value, not updating the BTAC with the branch information of the branch instruction.
CN 201010260377 2009-08-28 2010-08-20 Method for updating branch target address cache in microprocessor and microprocessor CN101916184B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US23792009P 2009-08-28 2009-08-28
US61/237,920 2009-08-28
US12/575,951 US8832418B2 (en) 2009-08-28 2009-10-08 Efficient branch target address cache entry replacement
US12/575,951 2009-10-08

Publications (2)

Publication Number Publication Date
CN101916184A CN101916184A (en) 2010-12-15
CN101916184B CN101916184B (en) 2014-02-12

Family

ID=43323702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010260377 CN101916184B (en) 2009-08-28 2010-08-20 Method for updating branch target address cache in microprocessor and microprocessor

Country Status (1)

Country Link
CN (1) CN101916184B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252334B (en) * 2013-06-29 2017-07-07 华为技术有限公司 The branch target address acquisition method and apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1089169A2 (en) 1999-10-01 2001-04-04 Hitachi, Ltd. System and method for reducing latencies associated with branch instructions
CN1397876A (en) 2001-05-04 2003-02-19 智慧第一公司 Appts. and method for replacing target address in imaginary branch target address high speed buffer storage
CN101187863A (en) 2006-11-17 2008-05-28 国际商业机器公司 Data processing system, processor and method of data processing

Also Published As

Publication number Publication date
CN101916184A (en) 2010-12-15

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted