CN103238134A - Bimodal branch predictor encoded in a branch instruction - Google Patents

Bimodal branch predictor encoded in a branch instruction

Info

Publication number
CN103238134A
Authority
CN
China
Prior art keywords
branch
instruction
bimodal
prediction
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011800578444A
Other languages
Chinese (zh)
Other versions
CN103238134B (en)
Inventor
苏雷什·K·文库马汉提
卢奇安·科德雷斯库
史蒂芬·R·香农
王林
菲利普·M·琼斯
黛西·T·帕拉尔
屠嘉晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842Speculative instruction execution
    • G06F9/3844Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

Each branch instruction having branch prediction support has branch prediction bits in architecture specified bit positions in the branch instruction. An instruction cache supports modifying the branch instructions with updated branch prediction bits that are dynamically determined when the branch instruction executes.

Description

Bimodal branch predictor encoded in a branch instruction
Technical field
The present invention relates generally to techniques for reducing power and implementation complexity and improving the performance of a processing system that supports branch prediction and, more particularly, to advantageous techniques for dynamically encoding branch prediction information in a branch instruction stored in a multi-level memory hierarchy.
Background
Many portable products, such as mobile phones, laptop computers, and personal digital assistants (PDAs), incorporate one or more processors that execute programs supporting communication and multimedia applications. The processors for such products conventionally have a hierarchical memory configuration with multiple levels of caches, including an instruction cache, a data cache, and system memory. The processors must also operate with high performance and efficiency to support the many computationally intensive functions of these products. The processors are typically pipelined and support execution of conditional branch instructions.
Execution of a conditional branch instruction on a pipelined processor may stall the pipeline until the condition is determined. To avoid stalling the processor, some form of branch prediction is typically employed early in the pipeline, allowing the processor to speculatively fetch and execute instructions based on the predicted branch behavior. If a conditional branch is mispredicted, the associated speculatively fetched instructions are flushed from the pipeline and new instructions are fetched from the determined branch address. Such a misprediction reduces processor performance and increases power use.
Conventional approaches to branch prediction are limited by the implementation cost and complexity of the branch prediction circuits, all of which consume power.
Summary of the invention
Among its several aspects, the present invention recognizes a need for improved branch prediction capabilities that have a low implementation cost and reduce power use. To such ends, an embodiment of the invention applies a method of storing bimodal branch predictor bits in a branch instruction in an instruction cache. A branch target address is predicted based on bimodal branch predictor bits stored in a branch instruction fetched from the instruction cache. A determination is made whether to change the bimodal branch predictor bits based on an evaluation of branch prediction accuracy in response to execution of the branch instruction. Bimodal branch predictor bits that have changed from the bimodal branch predictor bits in the fetched branch instruction are stored in the instruction cache.
Another embodiment of the invention addresses a branch prediction apparatus. An instruction cache is configured to store and provide a branch instruction at an instruction fetch address, the branch instruction having bimodal branch prediction bits. A pipeline storage is configured to store the instruction fetch address of the branch instruction. A prediction circuit is configured to determine whether to change the bimodal branch prediction bits based on an evaluation of a condition associated with the provided branch instruction. A write control logic circuit is configured to store, in the branch instruction at the stored instruction fetch address in the instruction cache, the bimodal branch prediction bits that have changed from the bimodal branch prediction bits in the provided branch instruction.
Another embodiment of the invention addresses a method for bimodal branch prediction. Branch prediction bits associated with a conditional branch instruction are dynamically generated during execution. The dynamically generated branch prediction bits are stored in the conditional branch instruction in an instruction cache.
A more complete understanding of the present invention, as well as further features and advantages of the invention, will be apparent from the following detailed description and the accompanying drawings.
Brief description of the drawings
Fig. 1 is a block diagram of an exemplary wireless communication system in which an embodiment of the invention may be advantageously employed;
Fig. 2 is a functional block diagram of a processor complex for storing a bimodal branch predictor encoded in branch instructions stored in a memory hierarchy in accordance with the present invention;
Fig. 3 illustrates exemplary 32-bit and 16-bit conditional branch instruction formats that support dynamic encoding of bimodal branch predictor bits in accordance with the present invention;
Fig. 4 illustrates an exemplary level 1 instruction cache subsystem coupled to multiple stages of a processor pipeline in accordance with the present invention; and
Fig. 5 illustrates a process for reading and writing bimodal branch prediction bits in a branch instruction in the Icache in accordance with the present invention.
Detailed description
The present invention will now be described more fully with reference to the accompanying drawings, in which several embodiments of the invention are shown. The invention may, however, be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Computer program code, or "program code", to be operated upon or to carry out operations according to the teachings of the invention may initially be written in a high-level programming language such as C, C++, Smalltalk, TSQL, or Perl, or in various other programming languages. A program written in one of these languages is compiled to a target processor architecture by converting the high-level program code into a native assembler program. Programs for the target processor architecture may also be written directly in the native assembler language. A native assembler program uses instruction mnemonic representations of machine-level binary instructions. Program code or computer-readable medium as used herein refers to machine language code, such as object code, whose format is understandable by a processor.
Fig. 1 illustrates an exemplary wireless communication system 100 in which an embodiment of the invention may be advantageously employed. For purposes of illustration, Fig. 1 shows three remote units 120, 130, and 150 and two base stations 140. It will be recognized that common wireless communication systems may have many more remote units and base stations. Remote units 120, 130, and 150 and base stations 140, which include hardware components, software components, or both as represented by components 125A, 125C, 125B, and 125D, respectively, have been adapted to embody the invention as discussed further below. Fig. 1 shows forward link signals 180 from the base stations 140 to the remote units 120, 130, and 150 and reverse link signals 190 from the remote units 120, 130, and 150 to the base stations 140.
In Fig. 1, remote unit 120 is shown as a mobile telephone, remote unit 130 is shown as a portable computer, and remote unit 150 is shown as a fixed location remote unit in a wireless local loop system. By way of example, the remote units may alternatively be mobile phones, pagers, walkie-talkies, handheld personal communication system (PCS) units, portable data units such as personal digital assistants, or fixed location data units such as meter reading equipment. Although Fig. 1 illustrates remote units according to the teachings of the invention, the invention is not limited to these exemplary illustrated units. Embodiments of the invention may be suitably employed in any processor system that supports branch prediction and a memory hierarchy having caches.
Branch prediction techniques may include static and dynamic prediction. The likely behavior of some branch instructions can be statically predicted by a programmer and/or a compiler. For example, a branch instruction may be statically predicted based on its run-time attributes, such as a loop exit evaluation, which is a branch to a previous address at the beginning of a loop. Such "backward" branches are usually predicted taken so as to remain in the loop. A "backward" branch is mispredicted when the loop exits and the branch is not taken, with execution falling through to the next instruction after the branch. It may also be determined for a particular program that "forward" branches are rarely taken. Therefore, a "backward" branch may be statically predicted "taken" and a "forward" branch statically predicted "not taken".
Dynamic prediction is generally based on an evaluation of a history of the behavior of a particular branch stored in a special branch history memory circuit. Analysis of programs generally indicates that recent past branch evaluation patterns may be a good indicator of the behavior of future branch instructions. As one example of a simple branch history predictor, a plurality of one-bit flags may be maintained, with each flag associated with the address of a conditional branch instruction. Each flag is set when the associated conditional branch evaluates "taken" and reset when it evaluates "not taken". The prediction for the next occurrence of the conditional branch may then simply be the value of the associated flag. For some branch instructions, this predictor may yield accurate predictions.
A design goal closely related to maximizing branch prediction accuracy is minimizing the adverse impact of erroneous branch predictions. Consider the "backward" branch described above with a one-bit flag as the dynamic branch predictor. While the processor is in the loop, the branch is taken, the associated flag remains a "one" on each pass through the loop, and future executions of the "backward" branch instruction are predicted "taken". When the loop is to exit, the "backward" branch is mispredicted as "taken" and the wrong instructions are prefetched into the pipeline. The processor recovers from the erroneous branch prediction according to known branch misprediction recovery methods, at the cost of lost performance and wasted power. As a result of this event, the associated flag is reset to reflect the "not taken" branch history. However, the next execution of the "backward" branch instruction will most likely be in the first pass of the loop, and the prediction based on the "not taken" flag will be incorrect. In this scenario, the single-bit branch evaluation history causes two mispredictions for each loop exit branch evaluation: one at the end of the loop upon exit, and another at the next subsequent execution of the "backward" branch instruction in the first pass of the loop.
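For illustration only, the behavior of the one-bit flag predictor on a loop-closing branch may be sketched in Python as follows. The function name, the boolean outcome list, and the hypothetical loop of four passes are assumptions made for this sketch and do not appear in the disclosure.

```python
def count_flag_mispredictions(outcomes, flag=True):
    """Count mispredictions of a one-bit flag predictor.

    outcomes: sequence of booleans, True meaning the branch evaluated "taken".
    flag: initial flag value (assumed to start as "taken" in this sketch).
    """
    mispredictions = 0
    for taken in outcomes:
        if flag != taken:      # the prediction is simply the flag value
            mispredictions += 1
        flag = taken           # set on "taken", reset on "not taken"
    return mispredictions


# Hypothetical loop: the backward branch is taken three times, then falls through.
one_loop_execution = [True, True, True, False]

first_run = count_flag_mispredictions(one_loop_execution)
three_runs = count_flag_mispredictions(one_loop_execution * 3)
```

With the flag initially set, the first execution of the loop mispredicts only at the exit, while each later execution mispredicts both on its first pass and at its exit.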
One technique for minimizing the effect of a mispredicted branch evaluation is to weight the branch prediction by a confidence factor that indicates a strong or weak prediction. For example, the confidence factor may be generated by a bimodal branch predictor based on a branch history represented by the states of a two-bit saturating counter. A separate counter, or a separate two-bit history storage, is required for each branch predicted using this technique. Each counter assumes one of four states, each state representing a weighted prediction value, such as:
11 - strongly predicted taken
10 - weakly predicted taken
01 - weakly predicted not taken
00 - strongly predicted not taken
The counter is, for example, incremented each time the corresponding conditional branch instruction evaluates "taken" and decremented each time the instruction evaluates "not taken". An increment is a forward transition between two states in the direction of the "strongly predicted taken" state, and a decrement is a reverse transition between two states in the direction of the "strongly predicted not taken" state. For example, an increment from the "01" weakly predicted not taken state is a forward transition to the "10" weakly predicted taken state. The increment/decrement is "saturating": incrementing stops at 0b11 and decrementing stops at 0b00. Thus, the branch prediction includes not only a taken or not taken prediction, which may be determined by examining the most significant bit (MSB) of the two-bit saturating counter output, but also a weighting factor that indicates the strength or confidence of the prediction and uses both bits of the counter value.
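The two-bit saturating counter described above may be sketched as follows. This is a minimal Python illustration under stated assumptions: the constant and function names are hypothetical, and only the state encoding and the saturating increment/decrement come from the description.

```python
# State encoding taken from the list above.
STRONG_NOT_TAKEN = 0b00
WEAK_NOT_TAKEN = 0b01
WEAK_TAKEN = 0b10
STRONG_TAKEN = 0b11


def update_counter(counter, taken):
    """Return the next counter state after one branch evaluation."""
    if taken:
        return min(counter + 1, STRONG_TAKEN)      # incrementing stops at 0b11
    return max(counter - 1, STRONG_NOT_TAKEN)      # decrementing stops at 0b00


def predict_taken(counter):
    """The taken / not taken prediction is the MSB of the counter."""
    return bool((counter >> 1) & 1)
```

For example, `update_counter(WEAK_NOT_TAKEN, True)` yields `WEAK_TAKEN`, which matches the forward transition from "01" to "10" given in the text.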
An alternative technique for implementing a bimodal branch predictor is based on a finite state machine. A separate finite state machine predictor is used for each predicted branch. The finite state machine predictor has four states, each state representing a weighted prediction value, such as:
11 - strongly predicted taken
10 - weakly predicted taken
00 - weakly predicted not taken
01 - strongly predicted not taken
Depending on the current state, when the associated conditional branch instruction evaluates "taken", the finite state machine predictor makes forward transitions between the weighted prediction values "01" → "00" → "10" → "11", saturating at "11". When the associated conditional branch instruction evaluates "not taken", the predictor makes reverse transitions "11" → "10" → "00" → "01", saturating at "01". With the finite state machine predictor, the most significant bit of the weighted prediction value is labeled the P bit, and the least significant bit is labeled the Q bit and represents the strength of the prediction.
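The finite state machine encoding may be sketched in the same manner. In the following Python illustration the tuple `FORWARD_ORDER` and the function names are assumptions introduced for the sketch; the state values and the transition order follow the description above.

```python
# States listed in forward-transition order:
# strongly not taken, weakly not taken, weakly taken, strongly taken.
FORWARD_ORDER = (0b01, 0b00, 0b10, 0b11)


def next_state(state, taken):
    """Move one step forward on "taken", one step back on "not taken"."""
    index = FORWARD_ORDER.index(state)
    if taken:
        index = min(index + 1, len(FORWARD_ORDER) - 1)   # saturate at "11"
    else:
        index = max(index - 1, 0)                        # saturate at "01"
    return FORWARD_ORDER[index]


def p_bit(state):
    """MSB: 1 predicts taken, 0 predicts not taken."""
    return (state >> 1) & 1


def q_bit(state):
    """LSB: 1 marks a strong prediction, 0 a weak one."""
    return state & 1
```

With this encoding the Q bit alone gives the strength of the prediction, so no further logic is needed to separate direction from confidence.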
A branch instruction such as the "backward" branch instruction considered above is mispredicted only once with a bimodal branch predictor, rather than twice as with the single-bit flag predictor. The misprediction at the loop exit moves the predictor from "strongly taken" to "weakly taken". The actual prediction is bimodal and is represented by the MSB of the associated bimodal predictor circuit, which may be implemented as the two-bit counter or the finite state machine predictor described above. Hence, the next occurrence of the "backward" branch instruction is predicted "taken", which is likely correct, and the predictor moves back to the "strongly taken" state. The binary value of the weighted prediction value determines the strength of the branch prediction confidence, with greater confidence at either end of the range and lower confidence toward the middle of the range.
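The single misprediction per loop exit may be checked with a short simulation. The following Python sketch assumes the two-bit saturating counter encoding, a predictor that starts in the "strongly taken" state, and a hypothetical loop whose backward branch is taken three times before falling through; none of these particulars are taken from the disclosure.

```python
def count_bimodal_mispredictions(outcomes, counter=0b11):
    """Count mispredictions of a two-bit saturating counter predictor."""
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = bool(counter >> 1)    # MSB carries the prediction
        if predicted_taken != taken:
            mispredictions += 1
        if taken:
            counter = min(counter + 1, 0b11)    # forward transition
        else:
            counter = max(counter - 1, 0b00)    # reverse transition
    return mispredictions


# Hypothetical loop executed three times in a row.
one_loop_execution = [True, True, True, False]
three_runs = count_bimodal_mispredictions(one_loop_execution * 3)
```

Under these assumptions the three loop executions produce three mispredictions in total, one at each loop exit, because the weakly taken state still predicts "taken" on the first pass of the next execution.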
Such a bimodal prediction system is costly to implement, requiring a branch history table or the like and means to associate the branch counter or finite state machine predictor with the address of a branch instruction. The branch history table may be very large in order to support large programs in which a branch instruction may be encountered once every five to seven instructions.
Fig. 2 is a functional block diagram of a processor complex 200 for storing a bimodal branch predictor encoded in branch instructions stored in a memory hierarchy 202 in accordance with the present invention. The processor complex 200 includes the memory hierarchy 202 and a processor 204 having a processor pipeline 206, a control circuit 208, and a register file (RF) 210. The memory hierarchy 202 includes a level 1 instruction cache (L1 Icache) 230, a level 1 data cache (L1 Dcache) 232, and a memory system 234. The control circuit 208 includes a program counter (PC) 209. Peripheral devices that may connect to the processor complex are not shown, for clarity of discussion. The processor complex 200 may be suitably employed in the hardware components 125A to 125D of Fig. 1 for executing program code that is stored in the L1 Icache 230, utilizing data stored in the L1 Dcache 232 and associated with the memory system 234, which may include higher levels of cache and main memory. The processor 204 may be a general purpose processor, a multi-threaded processor, a digital signal processor (DSP), an application specific processor (ASP), or the like. The various components of the processor complex 200 may be implemented using application specific integrated circuit (ASIC) technology, field programmable gate array (FPGA) technology, other programmable logic, discrete gate or transistor logic, or any other available technology suitable for an intended application.
The processor pipeline 206 includes, for example, six major stages: an instruction fetch stage 214, a decode and predict stage 216 having prediction logic 217 and a bimodal predictor circuit 218, a dispatch stage 219, a read register stage 220, an execute stage 222, and a write back stage 224. Though a single processor pipeline 206 is shown, the processing of instructions using the memory hierarchy 202 and the decode and predict stage 216 of the present invention is applicable to superscalar designs and other architectures implementing parallel pipelines. For example, a superscalar processor designed for high clock rates may have two or more parallel pipelines supporting multiple threads, and each pipeline may divide the instruction fetch stage 214, the decode and predict stage 216, the dispatch stage 219, the read register stage 220, the execute stage 222, and the write back stage 224 into two or more pipelined stages, increasing the overall processor pipeline depth in order to support a high clock rate. Also, for design, implementation, or other reasons, the prediction logic 217 and the bimodal predictor circuit 218 may be located elsewhere in the processor 204, for example in the control circuit 208.
Beginning with the first stage of the processor pipeline 206, the instruction fetch stage 214, associated with the program counter (PC) 209, fetches instructions from the L1 Icache 230 for processing by later stages. If an instruction fetch misses in the L1 Icache 230, meaning that the instruction to be fetched is not in the L1 Icache 230, the instruction is fetched from the memory system 234, which may include multiple levels of cache, such as a level 2 (L2) cache, and main memory. Instructions may be loaded into the memory system 234 from other sources, such as a boot read-only memory (ROM), a hard drive, or an optical disk, or from an external interface, such as a network. A fetched instruction is then decoded in the decode and predict stage 216.
The dispatch stage 219 takes one or more decoded instructions and dispatches them to one or more instruction pipelines. The read register stage 220 fetches data operands from the RF 210. The execute stage 222 executes the dispatched instruction, and the write back stage 224 writes the result to the RF 210. A result operand from the execute stage 222 may take multiple execution cycles to determine a condition used by a conditional branch instruction. During these cycles, the processor pipeline 206 must wait until the result operand is available. Since results may be received in the write back stage 224 out of order compared to the program order, the write back stage 224 uses processor facilities to preserve the program order when writing results to the RF 210.
The processor complex 200 may be configured to execute instructions under control of a program stored on a computer-readable storage medium. For example, a computer-readable storage medium may be directly associated locally with the processor complex 200, such as may be available from the L1 Icache 230, for operation on data obtained from the L1 Dcache 232 and the memory system 234, or associated through, for example, an input/output interface (not shown). A conditional branch instruction (Cbranch) fetched from the L1 Icache 230 is received in the instruction fetch stage 214. Bimodal prediction bits dynamically stored with the Cbranch in the L1 Icache 230 are retrieved and used in the decode and predict stage 216 to predict whether the fetched conditional branch instruction is to be taken or not taken. Further instructions may be speculatively fetched based on the prediction. When the Cbranch is in the execute stage 222, the condition is determined and the bimodal predictor circuit 218 is informed over prediction signal 223 to make a forward transition of the state of the bimodal predictor if the Cbranch is taken and a reverse transition if the Cbranch is not taken. The updated state of the bimodal predictor circuit 218 is then passed over bimodal bit signal 240 to the L1 Icache 230, where the bimodal prediction bits are stored in the associated Cbranch at the next available write cycle. The changed bimodal branch predictor bits in the stored Cbranch instruction affect the prediction of the next branch target address the next time the Cbranch instruction is fetched, without affecting the function of the Cbranch instruction. A more detailed description of the processor pipeline 206 using the L1 Icache 230 and the decode and predict stage 216 is provided below with detailed code examples.
Fig. 3 illustrates exemplary 32-bit and 16-bit conditional branch instruction formats 302 and 304, respectively, that support dynamic encoding of bimodal branch predictor bits in accordance with the present invention. The 32-bit conditional branch instruction format 302 includes a first condition code selection field 306, a first opcode 308, a predict bit 310, a 24-bit signed offset 312, and a Q bit 314. The 16-bit conditional branch instruction format 304 includes a second opcode 320, a second condition code selection field 322, and an 8-bit signed offset 324 that identifies addresses of 16-bit instructions on half-word address boundaries.
The predict bit in a conditional branch instruction, such as predict bit 310, is statically determined before the program is loaded. For example, a backward branch, as determined from the 24-bit signed offset field 312 of the backward branch instruction, may be predicted "taken" by a compiler by asserting the P bit 310 to a value of one. With a finite state machine implementation of the bimodal predictor circuit 218, the Q bit 314 may be set to a value of one to indicate a strong prediction. Alternatively, the Q bit 314 may be set to a value of zero to indicate a weak prediction. For example, an initial or default setting for the Q bit 314 may be zero. In an alternative embodiment, both bimodal predictor bits may be statically determined by analysis of the program and specified in the branch instruction before the program is executed. For example, where a conditional branch (Cbranch) instruction is used in a program context as a loop back function, the P bit 310 may be set to "1" and the Q bit 314 set to "0", indicating the weakly taken state. On the first pass through the loop, the Cbranch instruction is likely to be predicted taken and also likely to be evaluated taken. The taken evaluation causes the bimodal predictor circuit to advance to the "11" strongly taken state.
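The static initialization described above may be sketched as a small helper that a compiler or loader could apply. The Python function below is a hypothetical illustration: its name is an assumption, and it covers only the case in which the P bit follows the sign of the signed offset and the Q bit takes the default value of zero.

```python
def initial_prediction_bits(signed_offset):
    """Return the static (P, Q) bits for a conditional branch.

    A negative offset denotes a backward branch, predicted "taken" (P = 1);
    any other offset denotes a forward branch, predicted "not taken" (P = 0).
    Q defaults to 0, a weak prediction in the finite state machine encoding.
    """
    p = 1 if signed_offset < 0 else 0
    q = 0
    return p, q


loop_branch_bits = initial_prediction_bits(-64)    # backward: weakly taken
skip_branch_bits = initial_prediction_bits(32)     # forward: weakly not taken
```

Starting in a weak state lets a single contrary evaluation flip the prediction, while an agreeing evaluation advances the predictor to the corresponding strong state.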
With a two-bit saturating counter implementation of the bimodal predictor circuit 218, a taken or not taken prediction may be determined by examining the most significant bit (MSB) of the two-bit saturating counter output. The strength or confidence of the prediction may be determined by examining both bits of the counter value. For example, the complemented exclusive OR (XNOR) of the two counter output bits provides a binary indication of the strength of the prediction, where "1" indicates a strong prediction and "0" indicates a weak prediction. Using the weighted prediction values of the two-bit saturating counter described above, a desired state, such as the "10" weakly predicted taken state or the "01" weakly predicted not taken state, may be selected and initially set in the conditional branch instruction before the program is loaded. Both bits of the bimodal predictor circuit 218 are examined to determine a change in the state of the weighted prediction value.
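The strength indication for the saturating counter encoding may be sketched as follows. The Python function name is an assumption, and the sketch takes the strong states to be "11" and "00", so that the indicator is 1 exactly when the two counter bits are equal.

```python
def prediction_strength(counter):
    """Return 1 for a strong prediction and 0 for a weak one.

    The two bits of the saturating counter are equal in the strong states
    (0b11 and 0b00) and differ in the weak states (0b10 and 0b01), so the
    complement of their exclusive OR serves as the strength indicator.
    """
    msb = (counter >> 1) & 1
    lsb = counter & 1
    return 1 - (msb ^ lsb)


strengths = [prediction_strength(state) for state in (0b00, 0b01, 0b10, 0b11)]
```

This extra gate is needed only for the counter encoding; in the finite state machine encoding the strength is carried directly by one bit.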
The predict bit 310 and the Q bit 314 are dynamically determined by the most significant bit (MSB) and the least significant bit (LSB), respectively, of the bimodal predictor circuit associated with the conditional branch instruction. A conditional branch instruction may be identified during decoding by the encoding of the first opcode 308. The Q bit 314 is located at the bit 0 position of the 32-bit conditional branch instruction format 302. The bit 0 position of an address is generally used in processors having both 16-bit and 32-bit instructions to identify 16-bit instructions on half-word address boundaries. However, the bit 0 position is not needed for addressing purposes in the 32-bit conditional branch instruction format 302, since by definition all 32-bit instructions are word aligned and bit 0 represents a 16-bit address bit. Alternatively, the Q bit may be stored in a separate array for each conditional branch instruction, while the predict bit remains stored in the conditional branch instruction.
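Reading and rewriting the prediction bits inside a 32-bit instruction word may be sketched with ordinary bit masking. In the Python illustration below, the Q bit position (bit 0) follows the description, whereas the P bit position, the function names, and the sample instruction word are hypothetical values chosen only to make the sketch concrete.

```python
Q_BIT_POS = 0     # stated in the description: Q occupies bit 0
P_BIT_POS = 25    # hypothetical position, assumed for illustration only
WORD_MASK = 0xFFFFFFFF


def read_prediction_bits(instruction):
    """Extract the (P, Q) bits from a 32-bit conditional branch word."""
    return (instruction >> P_BIT_POS) & 1, (instruction >> Q_BIT_POS) & 1


def write_prediction_bits(instruction, p, q):
    """Return the word with its P and Q bits replaced; other bits unchanged."""
    clear_mask = WORD_MASK & ~((1 << P_BIT_POS) | (1 << Q_BIT_POS))
    return (instruction & clear_mask) | (p << P_BIT_POS) | (q << Q_BIT_POS)


# Hypothetical instruction word with both prediction bits initially zero.
original_word = 0x90000004
updated_word = write_prediction_bits(original_word, 1, 1)
```

Because only the two prediction bit positions are altered, the opcode, condition code selection field, and offset are untouched, which is consistent with the statement that the update does not affect the function of the branch instruction.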
Fig. 4 illustrates an exemplary level 1 instruction cache (L1 Icache) subsystem 400 coupled to multiple stages of the processor pipeline 206 in accordance with the present invention. The L1 Icache subsystem 400 includes the L1 Icache 230 and multiple stages of the processor pipeline 206. The L1 Icache 230 includes an instruction content addressable memory (ICAM) 402, an instruction random access memory (IRAM) 403, and write control logic 404.
When an instruction is to be fetched, the instruction fetch stage 214 of the processor pipeline 206 issues a fetch address 408 that is received in the ICAM 402 of the L1 Icache 230. The fetch address 408 includes, for example, a cache line address and an offset for the branch instruction position within the cache line addressed by the cache line address. The fetch address 408 is compared with entries in the ICAM 402 to determine whether the instruction at the fetch address 408 is to be found in the IRAM 403 of the cache. If a match is found in the ICAM 402, a hit indication 410 is generated to select the line in the IRAM 403 that is associated with the matching entry in the ICAM 402. For example, an instruction line 412 may be selected that includes a first instruction (Instr. 1) 414, a conditional branch instruction (Cbranch) 416 having a P bit 417 and a Q bit 418, and additional instructions 420.
The selected instruction line 412 is directed to the output 424 of the L1 Icache 230 and received in the instruction fetch stage 214. In the next stage of the processor pipeline 206, for the Cbranch 416, the decode and predict stage 216 uses the P bit 417 and the Q bit 418 to predict whether the Cbranch 416 will be taken or not taken. Based on the prediction, the PC 209 is adjusted accordingly and the instruction fetch stage 214 generates the next fetch address at the taken or not-taken address. The address of the Cbranch 416 and the predicted P bit 417 and Q bit 418 are stored in the pipeline buffer 421 for later checking after the condition has been determined.
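A minimal Python sketch of this prediction step is given below. It assumes the conventional reading of a two-bit bimodal value, in which the MSB (the P bit) selects the predicted direction, and it assumes a 4-byte fall-through for a 32-bit instruction. The function name and the list used as a pipeline buffer are hypothetical.

```python
def predict_next_fetch(branch_addr, target_addr, p, q, pipeline_buffer):
    """Predict a Cbranch from its P/Q bits and pick the next fetch address."""
    state = (p << 1) | q   # 3 = strongly taken ... 0 = strongly not taken
    taken = state >= 2     # equivalent to testing the P (MSB) bit
    # Save the branch address and the bits used, for checking after execute.
    pipeline_buffer.append((branch_addr, p, q))
    # Fall-through of +4 assumes a 32-bit instruction.
    return target_addr if taken else branch_addr + 4
```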
The Cbranch 416 continues down the processor pipeline 206, for example through the dispatch stage 219 and the read register stage 220, and arrives at the execute stage 222, where the condition is determined. The prediction signal 223 informs the decode and predict stage 216 to make a forward transition in the bimodal predictor circuit (BP) 218 if the condition indicates "taken" and to make a reverse transition in the BP 218 if the condition indicates "not taken". The decode and predict stage 216 then passes the bimodal branch bits selected from the BP 218 to the write control logic 404 via the bimodal bit signal 240. If the latest bimodal branch bit values differ from the previous P bit 417 and Q bit 418 values, the write control logic 404 causes the latest P bit and Q bit values to be stored by updating the associated P bit and Q bit values in the Cbranch instruction 416 in the L1 Icache 230. The previous P bit 417 and Q bit 418 values may thus be replaced. For example, the latest version of the P bit and Q bit may be passed over internal signal 430 to be loaded into the Cbranch location in the instruction line 412. In an alternative approach, the fetched Cbranch instruction, updated with the latest version of the P bit and Q bit, may be passed over internal signal 430 to be loaded into the Cbranch location in the instruction line 412. Internal signals 428 and 432 are associated with other instruction locations in the instruction cache line to support access to conditional branch instructions stored at those locations. If the Icache line is replaced in the Icache during the time between reading the conditional branch instruction from the Icache and the point at which the branch prediction information is to be written back to the Icache, the branch prediction information is flushed and the cache is not updated.
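The forward and reverse transitions and the write-back decision can be illustrated with a short Python sketch. It models the bimodal predictor as a two-bit saturating counter whose MSB is the P bit and whose LSB is the Q bit; the function names are hypothetical and the sketch is not a description of the actual circuit.

```python
def update_bimodal(state, taken):
    """Two-bit saturating counter: forward on taken, reverse on not taken."""
    if taken:
        return min(state + 1, 3)   # saturate at 0b11, strongly taken
    return max(state - 1, 0)       # saturate at 0b00, strongly not taken


def resolve_branch(saved_p, saved_q, taken):
    """Return (new_p, new_q, needs_write) after the condition is determined."""
    old_state = (saved_p << 1) | saved_q
    new_state = update_bimodal(old_state, taken)
    # A write to the Icache is only needed when the bits actually changed.
    return new_state >> 1, new_state & 1, new_state != old_state
```

When the counter is already saturated in the direction of the outcome, `needs_write` is false, so a strongly biased branch causes no further Icache writes.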
If the Cbranch instruction is not found in the L1 Icache 230, a miss is indicated and the fetch address is forwarded to the next level of memory in the memory hierarchy. For example, a unified level 2 cache (L2 cache) may be used. On a hit in the L2 cache, the Cbranch instruction accessed from the L2 cache is forwarded to the L1 Icache 230 for loading and, in parallel, to the instruction fetch stage 214 of the processor pipeline 206. Upon determining an update to the bimodal prediction bits of the Cbranch, the Cbranch in the L1 Icache 230 is dynamically updated with the latest values of the P bit and Q bit. For example, if the L1 Icache is a single-port device, the update of the Cbranch instruction may be stalled while the L1 Icache is fetching instructions, since fetches generally have priority over updates. If the L1 Icache is a two-port device, the update of the Cbranch instruction may be performed using one port while instructions are fetched from the Icache using the second port. The branch prediction information is also forwarded to the L2 cache, even if the cache line having the Cbranch instruction is present in the L1 Icache. If the L1 line is replaced based on a replacement policy, for example least recently used (LRU), then the next time the line is fetched from the L2 cache, the latest prediction information is available in the Cbranch instruction stored in the L2 cache, because the L2 cache line has already been updated. In another approach, when the L1 Icache is updated with branch information, a dirty bit is set in the tag associated with the instruction line having the Cbranch instruction. When the dirty line in the L1 Icache is replaced, the old dirty line is then updated in the L2 cache.
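The two ways of keeping the L2 cache current, forwarding each update immediately or marking the L1 line dirty and writing it back on replacement, can be contrasted in a small Python sketch. It assumes both caches are dictionaries keyed by cache line address and that the L2 cache holds every line present in L1; the `Line` class and function names are hypothetical.

```python
class Line:
    def __init__(self, instrs):
        self.instrs = list(instrs)
        self.dirty = False


def update_prediction(l1, l2, line_addr, offset, new_instr, write_through=True):
    """Store an updated Cbranch word in L1 and keep L2 consistent."""
    line = l1.get(line_addr)
    if line is None:
        return False               # line was replaced: drop the update
    line.instrs[offset] = new_instr
    if write_through:
        l2[line_addr].instrs[offset] = new_instr   # forward to L2 right away
    else:
        line.dirty = True          # defer: L2 is updated on replacement
    return True


def evict(l1, l2, line_addr):
    """Replace an L1 line, writing it back to L2 first if it is dirty."""
    line = l1.pop(line_addr)
    if line.dirty:
        l2[line_addr].instrs = list(line.instrs)
```

The immediate-forwarding variant costs an L2 write per changed prediction, while the dirty-bit variant defers that cost to replacement time at the price of one extra tag bit per line.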
For the exemplary pipeline 206 shown, four bimodal predictor circuits may be located in the decode and predict stage 216 to account for the possibility of four back-to-back conditional branches in the pipeline. The number of bimodal predictor circuits varies with the depth of the pipeline. A pipeline of greater depth would need more than four bimodal predictor circuits. Depending on requirements, "n" bimodal predictor circuits may be implemented, where "n" is less than the number supported by the pipeline depth. In that case, upon receiving an "n+1" conditional branch instruction, that branch would have no prediction support and would be stalled. For example, the speculative access at the predicted branch target address would be stalled until the branch target address can be generated when the condition for the "n+1" branch is determined.
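The allocation behaviour described above, in which a branch beyond the available "n" predictor circuits is stalled, can be modelled with a simple counter. The following Python sketch is illustrative only; the class and method names are hypothetical.

```python
class PredictorPool:
    """Tracks how many of n bimodal predictor circuits are still free."""

    def __init__(self, n=4):
        self.free = n

    def try_allocate(self):
        """Return True if an in-flight Cbranch gets a predictor, else stall."""
        if self.free == 0:
            return False          # the "n+1" branch: no prediction support
        self.free -= 1
        return True

    def release(self):
        """Called once the condition of an in-flight branch is determined."""
        self.free += 1
```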
Branch prediction using, for example, a bimodal counter associated with each conditional branch instruction typically relies on prediction logic having a separate branch prediction array that holds the bimodal counter bits and corresponding values for the associated conditional branch instructions. Such a branch prediction array, whose capacity is limited by circuit requirements, is not needed by the present invention. Hardware circuit implementations in accordance with the present invention can therefore be reduced while maintaining the effectiveness of branch prediction. Also, branch prediction as described herein can store bimodal prediction information with every branch instruction and is not limited in capacity by a branch prediction array. Power usage is thus minimized as compared with approaches that use a branch prediction array.
Fig. 5 illustrates a process for reading and writing bimodal branch prediction bits in a branch instruction stored in the Icache in accordance with the present invention. References to the previous figures are made to emphasize and clarify implementation details. At a first step 502, a program is started on the processing complex 200. The process 500 follows the path of one conditional branch instruction as it flows through the processor pipeline 206.
At step 504, an instruction is fetched from the L1 Icache 230. At decision step 506, a determination is made whether the fetched instruction is a conditional branch (Cbranch) instruction. If the fetched instruction is not a Cbranch instruction, the process 500 returns to step 504. If the fetched instruction is a Cbranch instruction, the process 500 proceeds to step 508.
At step 508, the fetched Cbranch instruction is decoded in the decode and predict stage 216 and the bimodal prediction bits are selected from the conditional branch instruction. At step 510, the fetch address of the Cbranch instruction and the selected bimodal prediction bits are saved in the pipeline buffer 421 of Fig. 4. At step 512, a branch target address is predicted based on the bimodal prediction bits. At decision step 514, a determination is made whether to update the fetch address. If the fetch address needs to be changed to the predicted branch target address, the process 500 proceeds to step 516. At step 516, the fetch address used by the instruction fetch stage 214 is speculatively updated based on the predicted branch target address, for use in fetching an instruction at step 504, and the process 500 proceeds to step 518. Returning to decision step 514, if the fetch address does not need to be changed, the process 500 proceeds to step 518.
At step 518, the condition for the Cbranch instruction is determined, for example at the execute stage 222, and the process 500 proceeds in parallel to decision steps 520 and 521. The condition determined at step 518 is used to determine the accuracy of the bimodal branch prediction. At decision step 520, a determination is made whether the Cbranch instruction was mispredicted. If the Cbranch instruction was mispredicted, the process 500 proceeds to step 522. At step 522, the processor pipeline 206 is flushed and the fetch address is set to the corrected fetch address. If the Cbranch instruction was not mispredicted, the process 500 proceeds to step 524. At step 524, the processor pipeline 206 continues with normal pipeline operations.
At decision step 521, a determination is made whether the condition indicates that the Cbranch instruction evaluated as taken. If the Cbranch instruction did not evaluate as taken, in other words, it evaluated as not taken, the process 500 proceeds to step 526. At step 526, the bimodal predictor circuit is adjusted in the reverse direction, with the prediction value saturating at a bimodal prediction value of "00", and the process 500 proceeds to decision step 530. Returning to decision step 521, if the Cbranch instruction evaluated as taken, the process 500 proceeds to step 528. At step 528, the bimodal predictor circuit is adjusted in the forward direction, with the prediction value saturating at a bimodal prediction value of "11", and the process 500 proceeds to decision step 530.
At decision step 530, a determination is made whether the bimodal predictor circuit bits differ from the bimodal prediction bits selected from the fetched Cbranch instruction. If the bimodal predictor circuit bits are the same as the bimodal prediction bits selected from the fetched Cbranch instruction, the process 500 proceeds to step 504. If the bimodal predictor circuit bits differ from the bimodal prediction bits of the Cbranch instruction, the process 500 proceeds to step 532. At step 532, the bimodal prediction bits stored with the Cbranch instruction in the L1 Icache are updated at an available Icache write cycle. The process 500 then proceeds to step 504.
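The flow of process 500 for a single conditional branch can be summarized in a short Python simulation. This is a sketch under stated assumptions, not the patented implementation: the prediction value is modelled as an integer from 0 to 3 stored with the cached instruction, the MSB selects the predicted direction, the initial value of 2 (weakly taken) follows the initial setting recited in claim 19, and the function name is hypothetical.

```python
def run_process_500(outcomes, initial_state=2):
    """Follow one Cbranch through repeated fetches.

    outcomes: iterable of booleans, True when the branch evaluates as taken.
    Returns (mispredictions, icache_writes, final_state).
    """
    stored = initial_state            # bits held in the cached instruction
    mispredictions = 0
    icache_writes = 0
    for taken in outcomes:
        fetched = stored              # steps 504-510: fetch, decode, save bits
        predicted_taken = fetched >= 2        # step 512: MSB decides
        if predicted_taken != taken:          # steps 520-522: flush, refetch
            mispredictions += 1
        # Steps 526/528: saturating reverse or forward adjustment.
        adjusted = min(fetched + 1, 3) if taken else max(fetched - 1, 0)
        if adjusted != fetched:               # steps 530-532: write if changed
            stored = adjusted
            icache_writes += 1
    return mispredictions, icache_writes, stored
```

For example, `run_process_500([True, True, True, False, True])` returns `(1, 3, 3)`: one misprediction on the single not-taken outcome, three Icache writes, and a final strongly taken state.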
The methods described in connection with the embodiments disclosed herein may be embodied in a combination of hardware and a software module storing non-transitory signals executed by a processor. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable read-only memory (EPROM), a hard disk, a removable disk, tape, compact disk read-only memory (CD-ROM), or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and in some cases write information to, the storage medium. The storage medium coupled to the processor may be a direct coupling integral to a circuit implementation, or may utilize one or more interfaces supporting direct accesses or data streaming using downloading techniques.
While the invention is disclosed in the context of illustrative embodiments for use in processor systems, it will be recognized that a wide variety of implementations may be employed by persons of ordinary skill in the art consistent with the above discussion and the claims that follow. The present technique is scalable to all levels of a memory hierarchy, including level 3 caches and main memory. Also, the conditional branch instruction may be combined with a compare instruction in a single compare branch instruction. The single compare branch instruction includes bimodal branch prediction bits in the instruction format of the compare branch instruction. For example, unused bits in the instruction format may be used for the bimodal branch prediction bits. In addition, the conditional branch instruction may be combined with a load instruction in a single load and branch instruction that also includes bimodal branch prediction bits.

Claims (21)

1. A method of storing bimodal branch predictor bits in a branch instruction in an instruction cache, the method comprising:
predicting a branch target address based on bimodal branch predictor bits stored in a branch instruction fetched from an instruction cache;
determining whether to change the bimodal branch predictor bits based on an evaluation of branch prediction accuracy in response to execution of the branch instruction; and
storing in the instruction cache bimodal branch predictor bits that have changed from the bimodal branch predictor bits in the fetched branch instruction.
2. The method according to claim 1, wherein the bimodal branch predictor bits are bits from a bimodal predictor circuit, the bits indicating a strongly taken branch prediction indication, a weakly taken branch prediction indication, a weakly not taken branch prediction indication, and a strongly not taken branch prediction indication.
3. The method according to claim 2, wherein the least significant bit of the bimodal predictor circuit is assigned to a bit in a bit field of a 32-bit branch instruction format that is unused, and wherein that bit is used in a corresponding 16-bit branch instruction format.
4. The method according to claim 1, wherein the bimodal predictor bits are statically determined by an analysis of a program and are specified in the branch instruction loaded in memory prior to executing the program.
5. The method according to claim 1, further comprising:
saving an instruction fetch address in a pipeline stage after fetching the branch instruction, wherein the instruction fetch address is a cache line address and an offset for a branch instruction location within the cache line addressed by the cache line address; and
selecting the saved cache line address and the offset within the cache line as the instruction fetch address for storing the changed bimodal branch prediction bits.
6. The method according to claim 1, wherein the instruction cache is a level 1 instruction cache.
7. The method according to claim 1, further comprising:
updating a level 2 instruction cache with the branch instruction having the changed bimodal branch predictor bits.
8. The method according to claim 1, wherein the changed bimodal branch predictor bits in the stored branch instruction affect a prediction of a next branch target address the next time the branch instruction is fetched and do not affect the function of the branch instruction.
9. The method according to claim 1, wherein the changed bimodal branch predictor bits are stored in the instruction cache by storing the branch instruction having the changed bimodal branch predictor bits.
10. A branch prediction apparatus, comprising:
an instruction cache configured for storing and providing a branch instruction at an instruction fetch address, the branch instruction having bimodal branch predictor bits;
a pipeline storage configured for saving the instruction fetch address of the branch instruction;
a prediction circuit configured for determining whether to change the bimodal branch prediction bits based on an evaluation of a condition associated with the provided branch instruction; and
a write control logic circuit configured for storing, in the branch instruction at the saved instruction fetch address in the instruction cache, bimodal branch prediction bits that have changed from the bimodal branch prediction bits in the provided branch instruction.
11. The branch prediction apparatus according to claim 10, wherein the write control logic circuit further stores, at the saved instruction fetch address in the instruction cache, the branch instruction having the bimodal branch prediction bits that have changed from the bimodal branch prediction bits in the fetched branch instruction.
12. The branch prediction apparatus according to claim 10, wherein the branch instruction is a compare and branch instruction.
13. The branch prediction apparatus according to claim 10, wherein the branch instruction is a load and branch instruction.
14. The branch prediction apparatus according to claim 10, wherein the branch prediction circuit further comprises:
a two-bit counter having states of strongly taken, weakly taken, weakly not taken, and strongly not taken, the two-bit counter being configured to increment for each taken branch, saturating at a binary count of three representing strongly taken, and to decrement for each not taken branch, saturating at a binary count of zero representing strongly not taken.
15. A method for bimodal branch prediction, the method comprising:
dynamically generating, during execution, branch prediction bits associated with a conditional branch instruction; and
storing the dynamically generated branch prediction bits in the conditional branch instruction in an instruction cache.
16. The method according to claim 15, further comprising:
making a forward transition between a current state and a next state of a weighted prediction value toward a saturated strongly taken state if the conditional branch instruction evaluates as taken; and
making a reverse transition between the current state and the next state of the weighted prediction value toward a saturated strongly not taken state if the conditional branch instruction evaluates as not taken.
17. The method according to claim 16, wherein the current state and the next state of the weighted prediction value are states of a finite state machine predictor that indicate a strongly taken, a weakly taken, a weakly not taken, and a strongly not taken history of executing the conditional branch instruction.
18. The method according to claim 15, further comprising:
adjusting a bimodal prediction circuit based on a taken or not taken resolution of a condition specified by the conditional branch instruction; and
dynamically determining not to update the branch prediction bits stored with the conditional branch instruction when the branch prediction bits represented by the bimodal prediction circuit are the same as the bimodal prediction bits decoded from the conditional branch instruction.
19. The method according to claim 15, wherein the branch prediction bits are initially set to a most significant bit of 1 and a least significant bit of 0, indicating a weakly taken state of a bimodal prediction circuit.
20. The method according to claim 15, further comprising:
saving the address of the conditional branch instruction and the branch prediction bits in a temporary buffer;
comparing the saved branch prediction bits with a bimodal prediction circuit value adjusted based on a taken or not taken resolution of a condition specified by the conditional branch instruction; and
retrieving the saved address of the conditional branch instruction to identify where to store the dynamically determined branch prediction bits.
21. The method according to claim 15, wherein the conditional branch instruction has a fixed instruction set architecture format that includes the bimodal prediction bits.
CN201180057844.4A 2010-11-08 2011-11-07 Bimodal branch predictor encoded in a branch instruction Active CN103238134B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/941,105 US9122486B2 (en) 2010-11-08 2010-11-08 Bimodal branch predictor encoded in a branch instruction
US12/941,105 2010-11-08
PCT/US2011/059658 WO2012064677A1 (en) 2010-11-08 2011-11-07 Bimodal branch predictor encoded in a branch instruction

Publications (2)

Publication Number Publication Date
CN103238134A true CN103238134A (en) 2013-08-07
CN103238134B CN103238134B (en) 2016-03-30

Family

ID=45217633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180057844.4A Active CN103238134B (en) 2010-11-08 2011-11-07 Bimodal branch predictor encoded in a branch instruction

Country Status (7)

Country Link
US (1) US9122486B2 (en)
EP (1) EP2638463A1 (en)
JP (1) JP5745638B2 (en)
KR (1) KR101536179B1 (en)
CN (1) CN103238134B (en)
TW (1) TW201235940A (en)
WO (1) WO2012064677A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105867884A (en) * 2016-03-24 2016-08-17 Tsinghua University An improved PAp branch prediction method
CN108604184A (en) * 2016-02-29 2018-09-28 Qualcomm Incorporated Dynamic pipeline throttling using confidence-based weighting of in-flight branch instructions

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9519479B2 (en) 2013-11-18 2016-12-13 Globalfoundries Inc. Techniques for increasing vector processing utilization and efficiency through vector lane predication prediction
US9690587B2 (en) 2014-04-08 2017-06-27 International Business Machines Corporation Variable updates of branch prediction states
US10853074B2 (en) * 2014-05-01 2020-12-01 Netronome Systems, Inc. Table fetch processor instruction using table number to base address translation
US9442726B1 (en) 2015-12-15 2016-09-13 International Business Machines Corporation Perceptron branch predictor with virtualized weights
US11086629B2 (en) * 2018-11-09 2021-08-10 Arm Limited Misprediction of predicted taken branches in a data processing apparatus
US11163577B2 (en) 2018-11-26 2021-11-02 International Business Machines Corporation Selectively supporting static branch prediction settings only in association with processor-designated types of instructions

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1115659A (en) * 1997-06-20 1999-01-22 Nec Corp Branch prediction system
US20040059899A1 (en) * 2002-09-20 2004-03-25 International Business Machines Corporation Effectively infinite branch prediction table mechanism
CN101427213A (en) * 2006-05-04 2009-05-06 International Business Machines Corporation Methods and apparatus for implementing polymorphic branch predictors
EP2063355A1 (en) * 2007-11-22 2009-05-27 Sony Computer Entertainment Europe Ltd. Branch prediction method
CN101449238A (en) * 2006-06-08 2009-06-03 International Business Machines Corporation Local and global branch prediction information storage

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100310581B1 (en) 1993-05-14 2001-12-17 피터 엔. 데트킨 Inference recording mechanism of branch target buffer
US5878255A (en) 1995-06-07 1999-03-02 Advanced Micro Devices, Inc. Update unit for providing a delayed update to a branch prediction array
US5887159A (en) 1996-12-11 1999-03-23 Digital Equipment Corporation Dynamically determining instruction hint fields
GB2389211B (en) 1998-12-31 2004-02-04 Intel Corp A method and apparatus for improved predicate prediction
US6351796B1 (en) 2000-02-22 2002-02-26 Hewlett-Packard Company Methods and apparatus for increasing the efficiency of a higher level cache by selectively performing writes to the higher level cache
US20030126414A1 (en) 2002-01-02 2003-07-03 Grochowski Edward T. Processing partial register writes in an out-of order processor
US7752426B2 (en) 2004-08-30 2010-07-06 Texas Instruments Incorporated Processes, circuits, devices, and systems for branch prediction and other processor improvements
US7587580B2 (en) 2005-02-03 2009-09-08 Qualcomm Incorporated Power efficient instruction prefetch mechanism
US7461243B2 (en) 2005-12-22 2008-12-02 Sun Microsystems, Inc. Deferred branch history update scheme
US20070260862A1 (en) 2006-05-03 2007-11-08 Mcfarling Scott Providing storage in a memory hierarchy for prediction information
US20080040576A1 (en) 2006-08-09 2008-02-14 Brian Michael Stempel Associate Cached Branch Information with the Last Granularity of Branch instruction in Variable Length instruction Set
US20080072024A1 (en) 2006-09-14 2008-03-20 Davis Mark C Predicting instruction branches with bimodal, little global, big global, and loop (BgGL) branch predictors
TWI379230B (en) * 2008-11-14 2012-12-11 Realtek Semiconductor Corp Instruction mode identification apparatus and instruction mode identification method
CN105468334A (en) 2008-12-25 2016-04-06 世意法(北京)半导体研发有限责任公司 Branch decreasing inspection of non-control flow instructions
US20130283023A1 (en) 2012-04-18 2013-10-24 Qualcomm Incorporated Bimodal Compare Predictor Encoded In Each Compare Instruction

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108604184A (en) * 2016-02-29 2018-09-28 Qualcomm Incorporated Dynamic pipeline throttling using confidence-based weighting of in-flight branch instructions
CN108604184B (en) * 2016-02-29 2022-05-06 高通股份有限公司 Dynamic pipeline throttling using confidence-based weighting of in-flight branch instructions
CN105867884A (en) * 2016-03-24 2016-08-17 Tsinghua University An improved PAp branch prediction method
CN105867884B (en) * 2016-03-24 2018-06-15 Tsinghua University An improved PAp branch prediction method

Also Published As

Publication number Publication date
EP2638463A1 (en) 2013-09-18
CN103238134B (en) 2016-03-30
TW201235940A (en) 2012-09-01
KR101536179B1 (en) 2015-07-13
US20120117327A1 (en) 2012-05-10
US9122486B2 (en) 2015-09-01
JP5745638B2 (en) 2015-07-08
WO2012064677A1 (en) 2012-05-18
JP2013545194A (en) 2013-12-19
KR20130111583A (en) 2013-10-10

Similar Documents

Publication Publication Date Title
CN103238134B (en) Bimodal branch predictor encoded in a branch instruction
CN101694613B (en) Unaligned memory access prediction
CN102934075B (en) For using the method and apparatus of the sequence flow of prenoticing technology reprogramming
US7631146B2 (en) Processor with cache way prediction and method thereof
EP2864868B1 (en) Methods and apparatus to extend software branch target hints
US7707397B2 (en) Variable group associativity branch target address cache delivering multiple target addresses per cache line
KR101402560B1 (en) Computational processing device
US5774710A (en) Cache line branch prediction scheme that shares among sets of a set associative cache
EP0795828A2 (en) Dynamic set prediction method and apparatus for a multi-level cache system
US20020199151A1 (en) Using type bits to track storage of ECC and predecode bits in a level two cache
EP1628210A2 (en) Processing apparatus
US20080072024A1 (en) Predicting instruction branches with bimodal, little global, big global, and loop (BgGL) branch predictors
CN101438237A (en) Block-based branch target address cache
US9547358B2 (en) Branch prediction power reduction
US9552032B2 (en) Branch prediction power reduction
WO2007019001A1 (en) Call return stack way prediction repair
US20140075166A1 (en) Swapping Branch Direction History(ies) in Response to a Branch Prediction Table Swap Instruction(s), and Related Systems and Methods
CN116302106A (en) Apparatus, method, and system for facilitating improved bandwidth of branch prediction units
US7346737B2 (en) Cache system having branch target address cache
CN114647447A (en) Context-based memory indirect branch target prediction
US20060015706A1 (en) TLB correlated branch predictor and method for use thereof
US11995443B2 (en) Reuse of branch information queue entries for multiple instances of predicted control instructions in captured loops in a processor
WO2005119428A1 (en) Tlb correlated branch predictor and method for use therof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant