WO2020060734A1 - Using loop exit prediction to accelerate or suppress loop mode of a processor - Google Patents

Using loop exit prediction to accelerate or suppress loop mode of a processor

Info

Publication number
WO2020060734A1
Authority
WO
WIPO (PCT)
Prior art keywords
loop
processor
instruction
instructions
iterations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2019/048487
Other languages
English (en)
French (fr)
Inventor
Arunachalam Annamalai
Marius Evers
Aparna Thyagarajan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Micro Devices Inc
Original Assignee
Advanced Micro Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Micro Devices Inc
Priority to CN201980061096.3A (CN112740173A)
Priority to EP19862627.7A (EP3853716A4)
Priority to KR1020217010368A (KR102556897B1)
Priority to JP2021514963A (JP7301955B2)
Publication of WO2020060734A1
Legal status: Ceased

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3005Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30065Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3287Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206Monitoring of events, devices or parameters that trigger a change in power modality
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3243Power saving in microcontroller unit
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3293Power saving characterised by the action undertaken by switching to a less power-consuming processor, e.g. sub-CPU
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/3296Power saving characterised by the action undertaken by lowering the supply or operating voltage
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30083Power or thermal control instructions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181Instruction operation extension or modification
    • G06F9/30189Instruction operation extension or modification according to execution mode, e.g. mode flag
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/322Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
    • G06F9/325Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3808Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • G06F9/381Loop buffering
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842Speculative instruction execution
    • G06F9/3844Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • In a loop mode, the processor retrieves and executes instructions of the loop from a loop instruction buffer, rather than repeatedly retrieving the instructions of the loop via an instruction fetch unit.
  • The loop mode allows the processor to conserve resources by, for example, placing the instruction fetch unit or other portions of the processor in a low-power state while in the loop mode.
  • Conventional loop mode operation is inefficient under some conditions.
  • The loop mode is typically exited as a result of the processor encountering a branch misprediction for the loop exit instruction.
  • The branch misprediction causes the instruction pipeline of the processor to be flushed, thereby consuming additional processor resources and resulting in a power overhead.
  • The resources consumed by the pipeline flush can exceed the resources conserved by operating in the loop mode.
  • FIG. 1 is a block diagram of an instruction pipeline within a processor implementing loop exit prediction in a low power state in a loop mode in accordance with some embodiments.
  • FIG. 2 is a block diagram of the instruction pipeline of FIG. 1 with a loop exit predictor in a powered up state during a loop mode in accordance with some embodiments.
  • FIG. 3 is a block diagram of the instruction pipeline of FIG. 1 illustrating additional aspects of the loop exit predictor and a loop mode in accordance with some embodiments.
  • FIG. 4 is a flow diagram illustrating a method of employing loop exit prediction to identify relatively large loop iterations for a loop mode in accordance with some embodiments.
  • FIG. 5 is a flow diagram illustrating a method of employing loop exit prediction to identify small loop iterations for a loop mode in accordance with some embodiments.
  • FIGs. 1-5 illustrate techniques for employing loop exit prediction (LEP) at a processor to conserve processor resources associated with employing a loop mode.
  • The processor includes an LEP unit that predicts the exit of each executing loop.
  • Based on the prediction by the LEP unit, the processor implements one or more loop management techniques, including declining to enter the loop mode for a relatively short loop, exiting the loop mode before an indication of a branch misprediction or encountering a branch misprediction, and accelerating entry into the loop mode for a relatively large loop.
  • Each of these techniques reduces the amount of resources consumed by the processor to employ the loop mode, thereby enhancing processing efficiency.
  • The processor employs the LEP unit to predict the number of iterations of each executing loop in a program flow. In response to the LEP unit indicating that the number of iterations for the loop is below a specified threshold number of loop iterations, the processor suppresses entry into the loop mode for the loop. The processor thereby avoids entering the loop mode when resource costs of entering the loop mode exceed resource savings from executing the loop in the loop mode.
  • The processor employs the LEP unit while executing a loop in the loop mode to predict when the loop is expected to exit.
  • The processor initiates exiting of the loop mode, such as by fetching and filling an instruction pipeline with one or more next instructions to execute upon exiting the loop.
  • The processor therefore does not wait for a branch misprediction to indicate or trigger the loop exit, and thus avoids a pipeline flush that consumes processor resources and delays further instruction execution.
  • LEP is used to predict loop exit branches even during the loop mode.
  • A dedicated LEP unit within a processor performs LEP. Since LEP is specifically tuned for loop exit branches, LEP accuracy is higher than the accuracy of general branch prediction applied by one or more branch predictors during execution of loops.
  • The processor also uses the predicted number of iterations provided by the LEP unit to identify a relatively large loop, and the processor accelerates entry into the loop mode ahead of executing the large loop.
  • The processor nominally enters the loop mode only after a first threshold number of iterations of the loop has been executed, or is likely to be executed, to confirm that a loop is actually encountered and that the processor is successfully completing passes through the set of loop instructions.
  • In response to the predicted number of iterations exceeding a specified second threshold, the processor initiates the loop mode without waiting for the first threshold number of iterations of the loop to be executed, thereby conserving processor resources by entering the loop mode sooner than in other embodiments of use of the loop mode.
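  • The three loop management techniques above (suppressing entry, nominal entry after a warm-up, and accelerated entry) can be sketched as a single decision function. This is a minimal illustration, not the claimed hardware logic: the function name, the enum, and all threshold values are hypothetical and chosen only to make the example concrete.

```python
from enum import Enum


class LoopAction(Enum):
    SUPPRESS = "suppress loop mode"
    ENTER_NOW = "enter loop mode immediately"
    ENTER_AFTER_WARMUP = "enter loop mode after warm-up iterations"
    KEEP_COUNTING = "keep executing in active mode"


def choose_loop_action(predicted_iters, executed_iters,
                       short_loop_threshold=8,    # assumed value
                       first_threshold=4,         # assumed warm-up iteration count
                       second_threshold=64):      # assumed "large loop" threshold
    """Pick a loop-mode action from the LEP's predicted iteration count."""
    # Short loop: entering loop mode would cost more than it saves.
    if predicted_iters < short_loop_threshold:
        return LoopAction.SUPPRESS
    # Large loop: skip the warm-up and enter loop mode right away.
    if predicted_iters > second_threshold:
        return LoopAction.ENTER_NOW
    # Otherwise use the nominal rule: wait for the first threshold of iterations.
    if executed_iters >= first_threshold:
        return LoopAction.ENTER_AFTER_WARMUP
    return LoopAction.KEEP_COUNTING
```

  • For example, under these assumed thresholds a loop predicted to run 100 iterations enters the loop mode before any iteration completes, while a loop predicted to run 3 iterations never enters it.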
  • FIG. 1 is a block diagram of an instruction pipeline architecture within a processor 100 implementing LEP in accordance with some embodiments. Only a few components of the processor 100 are illustrated for sake of simplicity of illustration. Further, certain components of the processor 100 may be considered part of either a front side or a back side of the processor 100 as conventionally understood, for retrieving and executing instructions, respectively, but are not so designated herein as the techniques described herein are applicable to a plurality of types of processors having various components, architectures, instruction sets, modes of operation, and so forth.
  • The processor 100 generally executes sets of instructions (e.g., computer programs) to carry out tasks on behalf of an electronic device.
  • The processor 100 is incorporated into an electronic device such as a desktop computer, laptop computer, server, smartphone, game console, household appliance, and the like.
  • The processor 100 includes an instruction pipeline 114 including an instruction cache 101, a data cache 102, an instruction fetch unit 103 having one or more predictors 104, a loop exit predictor 105, a decoder 106, a reorder buffer 107, registers 108, a loop instruction buffer 109, reservation stations 110, a load/store unit 111, one or more execution units 112, and a power controller 117.
  • The instruction pipeline 114 operates in at least two modes: an active (non-loop) mode and a loop mode.
  • In the active mode, the components of the processor 100 are provided with power for active execution of instructions.
  • In the loop mode, the processor 100 places one or more components in a low-power state to conserve one or more resources, including energy that would have been consumed in the active mode, such as while loop instructions are repeatedly executed and certain components remain idle.
  • The instruction fetch unit 103 retrieves instructions from the instruction cache 101 based on a value stored at a program counter 113.
  • The instruction fetch unit 103 also fetches instructions based on predictions generated by the predictors 104.
  • The predictors 104 include branch predictors and loop predictors that identify branching instructions and loop instructions, generate branch target addresses, and perform other branch, loop, and prediction functions.
  • the instruction fetch unit 103 provides the fetched instructions to the decoder
  • a dispatch stage (not shown) of the decoder 106 sends each micro-op to a
  • The reorder buffer 107 manages the scheduling of execution of the micro-ops at the load/store unit 111 and the execution units 112.
  • The reservation stations 110 manage access to the registers 108 by the load/store unit 111 and execution units 112. After execution of the corresponding micro-operations, each instruction is retired at a retire stage (not shown) of the instruction pipeline 114.
  • The instruction pipeline 114 executes iterations of a loop using the loop instruction buffer 109.
  • A loop is a set of instructions that is repeatedly executed until a conditional branch terminating the loop is taken.
  • The conditional branch instruction is a relative jump instruction that includes an offset that is added to the program counter 113 pointing to the conditional branch instruction.
  • The instruction pipeline 114 identifies that the conditional branch instruction was taken a threshold number of times (e.g., 2, 3, 4, 5) in the most recent execution instance of the loop.
  • An iteration of a loop refers to a single execution of the instructions of the loop.
  • The instruction pipeline 114, in response to detecting an instruction loop (e.g., based on logic of the predictors 104 indicating an instruction loop), stores one or more of the micro-ops for the instructions of the loop in the loop instruction buffer 109.
  • The loop instruction buffer 109 repeatedly provides the micro-operations to the load/store unit 111 and the execution units 112 for execution until the loop exit is reached.
  • The instruction fetch unit 103 suspends retrieving instructions from the instruction cache 101.
  • Certain components of the processor 100, including one or more of the components of the instruction pipeline 114, are placed in a low-power mode or state by the power controller 117 to conserve power, as illustrated by a dashed line 118.
  • The power controller 117 powers down the instruction fetch unit 103, one or more predictors 104, the loop exit predictor 105, and the decoder 106 while maintaining other components in an active state, such as the loop instruction buffer 109, the load/store unit 111, and the execution units 112. While in the active state, certain components remain powered on and perform their functions until a loop exit condition occurs and power is restored to those components that were placed in the low-power mode (e.g., before, during, or after entering loop mode).
  • The instruction pipeline 114 includes a loop exit predictor (LEP) 105 that predicts the number of iterations of each executed loop.
  • The LEP 105 stores a loop history 116 that indicates patterns in loops executed at the instruction pipeline 114.
  • The LEP 105 generates and stores the loop history 116 during one or more dedicated training periods of the instruction pipeline 114.
  • The instruction pipeline 114 executes specified sets of instructions, counts the number of iterations of each executed loop, and stores the number of iterations at a storage structure designated as the predicted number of loops 115.
  • The instruction pipeline 114 continues to count iterations of each executed loop and, based on the iterations, adjusts the predicted number of loops 115.
  • The LEP 105 supports efficient use of the loop mode in a number of ways.
  • The instruction pipeline 114 employs the predictions of the LEP 105 to identify loops predicted to have relatively few iterations, and to avoid entering loop mode for those loops. Thus, in response to the predicted number of loops 115 for a loop being lower than a threshold, the instruction pipeline 114 prevents entry into the loop mode.
  • The instruction pipeline 114 employs the predictions of the LEP 105 to identify loops predicted to have a relatively high number of iterations, and accelerates entry into the loop mode for those loops. Thus, in response to the predicted number of loops 115 for a loop being higher than a threshold (e.g., a first threshold), the instruction pipeline 114 enters the loop mode for the first iteration of the loop.
  • The instruction pipeline 114 uses the LEP 105 during the loop mode itself. This use can be better understood with reference to FIG. 2.
  • FIG. 2 is a block diagram of an alternative configuration of the processor 100 whereby the instruction pipeline 114 maintains the loop exit predictor 105 in an active state during the loop mode (as illustrated by the placement of the LEP 105 relative to the dashed line 218) in accordance with some embodiments. When in the active state during the loop mode, the loop exit predictor 105 continues to predict the number of loop iterations.
  • The loop exit predictor 105 updates the predicted number of loop iterations likely to be performed by the loop being executed, and the loop exit predictor 105 updates a timing of restoring power to the components of the instruction pipeline 114 that were placed in the low-power mode based on the updated prediction, so that the loop mode is exited prior to a branch misprediction that undesirably results in a pipeline flush, which is both a performance and a power overhead.
  • Conventionally, the end of a loop, and therefore the exiting of the loop mode, is indicated by a branch misprediction for the branch instruction that ends the loop.
  • The branch misprediction that indicates the end of the loop requires the instruction pipeline to be flushed and the pipeline returned to an earlier state.
  • Executing the loop until encountering a misprediction results in a power loss by way of a pipeline bubble whereby one or more downstream components, such as the decoder 106, the reorder buffer 107, the registers 108, the reservation stations 110, the load/store unit 111, and the execution units 112, are starved for instructions.
  • The loop exit predictor 105 is maintained in the active state and predicts the exit to the loop.
  • The instruction pipeline 114 exits the loop mode by returning the instruction fetch unit 103 and other modules to an active state.
  • The instruction pipeline 114 thereby avoids a branch misprediction for the loop exit and thus avoids a mispredict performance penalty.
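  • The difference between a predicted exit and a misprediction-driven exit can be sketched with a small simulation. This is an illustrative model only: the function name and the `wake_lead` parameter (how many iterations before the predicted exit the powered-down components are woken) are hypothetical and not taken from the source.

```python
def simulate_loop_mode_exit(actual_iters, predicted_iters, wake_lead=1):
    """Return (exit_kind, iteration) for one loop executed in loop mode.

    predicted_iters is None when the LEP is powered down during loop mode,
    so no exit prediction is available.
    """
    for iteration in range(1, actual_iters + 1):
        # With an active LEP, restore power shortly before the predicted exit
        # so that instructions after the loop are ready in time.
        if predicted_iters is not None and iteration >= predicted_iters - wake_lead:
            return ("predicted_exit", iteration)
    # No timely prediction: the loop ends on a mispredicted exit branch,
    # which forces a pipeline flush.
    return ("mispredict_flush", actual_iters)
```

  • With an accurate prediction of 10 iterations and the assumed lead of one iteration, the model leaves the loop mode at iteration 9. Without a prediction, or with a prediction that overshoots the real iteration count, it falls back to the flush case.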
  • FIG. 3 is a block diagram of the processor 100 of FIG. 1 illustrating additional aspects of the LEP 105 in accordance with some embodiments.
  • The loop exit predictor 105 further includes: a loop instruction buffer 302, loop prediction logic 303, one or more loop counters 304, loop identifiers 305, a first loop threshold 306, a second loop threshold 307, a loop prediction 308, one or more comparison results 309, and one or more confidence values 310.
  • The loop prediction logic 303 provides loop exit predictions based on a set of instructions that are identified as being executed repeatedly.
  • A loop prediction 308 includes identifying and storing the predicted number of loops for a particular loop or set of one or more loop instructions.
  • The loop counters 304 and the loop identifiers 305 are used by the loop exit predictor 105 and the instructions of the loop instruction buffer 302.
  • The loop counters 304 are used in training to identify when a set of instructions is executed as a loop, and used during loop execution to keep track of how many iterations of loop instructions have been completed.
  • A respective loop counter 304 is compared against a predicted loop exit in preparing to exit the loop at a predicted loop exit count.
  • One or more loops may be encountered when executing processor instructions, and the processor 100 maintains a history of a plurality of executing loops in the loop history 116, such as when a second loop is executing inside of a first loop.
  • The loop counters 304 include at least loop confidence values, and current, past, and predicted loop iteration values.
  • The loop exit predictor 105 detects the loop and the loop exit branch in the set of processor instructions. Training includes the loop exit predictor 105 keeping track of a number of loop iterations repeatedly executed for a particular set of loop instructions, such as in one of the loop counters 304. Whenever a particular loop iterates a same number of times as in a previous run or execution instance of the loop, a confidence value 310 is incremented, and the confidence value 310 is used by the loop exit predictor 105 when providing its estimate of the loop exit.
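  • The training rule described above can be sketched as follows. The entry layout, the saturating maximum, and the reset of the confidence value on a mismatch are assumptions made for illustration; the source only states that the confidence value is incremented when the iteration count repeats.

```python
from dataclasses import dataclass


@dataclass
class LepEntry:
    predicted_iters: int = 0   # iteration count seen in the last execution instance
    confidence: int = 0        # counts how often the same iteration count repeated


def train(entry, observed_iters, max_confidence=3):
    """Update one LEP entry after a loop finishes with observed_iters iterations."""
    if observed_iters == entry.predicted_iters:
        # Same count as the previous run: grow confidence (saturating, assumed).
        entry.confidence = min(entry.confidence + 1, max_confidence)
    else:
        # Different count: remember the new count and start over (assumed policy).
        entry.predicted_iters = observed_iters
        entry.confidence = 0
    return entry
```

  • Under this sketch, a loop that runs 12 iterations three times in a row ends with a confidence of 2, and a later run of 7 iterations replaces the prediction and clears the confidence.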
  • The loop exit predictor 105 searches for a matching loop identifier in a current set of the loop identifiers 305.
  • A hit to an LEP entry in the loop identifiers 305 implies that a predicted branch instruction is an exit branch instruction.
  • Finding the hit in the loop identifiers 305 includes matching a characteristic of a loop instruction to at least one identifier in the loop identifiers. If a current iteration of the particular loop being tracked by the loop exit predictor 105 is equal to a total number of iterations predicted by loop exit predictor 105, then the particular loop is predicted to exit during this iteration. That is, the particular loop iteration of the loop exit branch is predicted as not-taken. Otherwise, the loop exit branch is predicted to be taken.
  • LEP performed by the loop exit predictor 105 is only performed when a confidence value 310 associated with the particular branch is sufficiently high. If the confidence value 310 is too low (i.e., fails to exceed a confidence threshold) or if there is no hit to an LEP entry in the loop identifiers 305, then the branch is predicted or subjected to processing by other branch predictors such as one of the predictors 104 of the instruction fetch unit 103. Since the loop prediction logic 303 is specifically tuned for loop exit branches, its prediction accuracy is usually higher than an accuracy of other branch predictors or general type predictors when the processor 100 executes instructions of exit branches. The loop prediction logic 303 provides the loop prediction 308 for each loop. The loop prediction 308 indicates whether a set of executing instructions is indeed a set of loop instructions. The loop exit predictor 105 provides the predicted number of loops: a number of iterations that the set of loop instructions is likely to complete before exiting.
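  • The lookup, confidence check, and fallback described above can be sketched as one function. The dictionary-based table, the string return values, and the default confidence threshold are hypothetical; the taken/not-taken convention follows the text, where the exit iteration is predicted not-taken.

```python
def predict_exit_branch(loop_table, branch_id, current_iter, fallback,
                        confidence_threshold=2):   # assumed threshold value
    """Predict the direction of a branch using the LEP, or defer to another predictor.

    loop_table maps a loop identifier to {"predicted_iters": int, "confidence": int}.
    fallback is any callable standing in for the general branch predictors.
    """
    entry = loop_table.get(branch_id)
    # No LEP hit, or confidence fails to exceed the threshold: use other predictors.
    if entry is None or entry["confidence"] <= confidence_threshold:
        return fallback(branch_id)
    # LEP hit with enough confidence: exit on the predicted final iteration.
    if current_iter == entry["predicted_iters"]:
        return "not-taken"
    return "taken"
```

  • A caller would pass the current value of the loop counter as `current_iter`, so the same entry yields "taken" for every iteration except the predicted last one.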
  • Entering the loop mode is triggered by saturating a specified number of bits of a direction history of the conditional branches (not shown) to ensure that a loop (e.g., a set of one or more instructions) is actually being executed by the processor 100.
  • A loop is identified by finding a repetitive pattern in a direction history register. For a direction history register that is 100 bits, if a group of five bits out of the 100 bits is repeating, then that implies that there is a loop with five conditional branches.
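  • One way to picture the pattern search is to look for the shortest period at which the direction history repeats. The sketch below is a software illustration under assumptions: the history is modeled as a string of "T" (taken) and "N" (not taken) characters, and the function name is hypothetical. It is not a description of the hardware mechanism.

```python
def loop_branch_count(direction_history, max_period=None):
    """Return the shortest repeating period of the history, or None if none is found.

    The period is interpreted as the number of conditional branches per loop iteration.
    """
    n = len(direction_history)
    limit = max_period or n // 2
    for period in range(1, limit + 1):
        # The history repeats with this period if every bit equals the bit
        # one period earlier.
        if all(direction_history[i] == direction_history[i - period]
               for i in range(period, n)):
            return period
    return None
```

  • For a 100-entry history built from a repeating group of five directions, such as "TTTTN" repeated 20 times, the function reports a period of five, matching the five-branch loop in the example above.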
  • The loop mode is entered only after saturating a certain number of bits of the direction history register or exceeding a direction threshold (value).
  • For a saturation level of 80 bits and a loop that has only two conditional branches, a system would have to wait 40 iterations of the loop, because only at that point would a direction history variable (e.g., dirHist) become saturated (reach 80 counted bits), thereby triggering entrance into the loop mode.
  • On the other hand, if the number of bits to saturate is too low (e.g., 10), then the system would enter the loop mode right after a fifth iteration, reaching the value of 10 by incrementing saturation by two bits for each loop iteration.
  • The processor would enter the loop mode and then immediately come out of loop mode, thereby wasting the benefits provided by the loop mode.
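  • The arithmetic in the two examples above can be checked with a short helper. It assumes, as the examples do, that each conditional branch in the loop contributes one bit to the direction history per iteration; the function name is hypothetical.

```python
import math


def iterations_to_saturate(saturation_bits, branches_per_iteration):
    """Iterations needed before the direction history reaches the saturation level."""
    # Each iteration adds one history bit per conditional branch (assumed).
    return math.ceil(saturation_bits / branches_per_iteration)


print(iterations_to_saturate(80, 2))  # 40 iterations for the 80-bit example
print(iterations_to_saturate(10, 2))  # 5 iterations for the 10-bit example
```

  • The helper makes the trade-off visible: a high saturation level delays entry into the loop mode, while a low one admits loops that may exit almost immediately.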
  • The processor 100 is identified as executing a loop. The larger the direction threshold, the longer it takes the processor 100 to be triggered into entering the loop mode and the lower the chance of identifying an opportunity to save power by entering the loop mode when the instructions are actually loop instructions.
  • Branch prediction includes a branch direction, a direction threshold, and a target address.
  • LEP includes a loop direction, a loop threshold, and a loop exit target address.
  • The processor 100 also uses the loop prediction 308 before and during execution of micro-ops to determine when to enter the loop mode and to exit the loop mode.
  • The loop prediction 308 is compared against the first loop threshold 306 and the second loop threshold 307.
  • The comparisons yield a respective comparison result 309, one result per comparison.
  • The processor 100 enters the loop mode.
  • micro-ops of the instructions (or instructions) pertaining to that loop are cached in the loop instruction buffer 302 before or during the loop mode.
  • the micro-ops are executed out of the loop instruction buffer 302 by one or more cores such as the first processor core 301 , and certain other components of the processor 100 are placed in a low- power mode thereby saving power that would have been consumed by operation of the components operating at full power.
  • the loop exit predictor 105 remains powered up, and the loop instruction buffer 302 is powered down to a low-power or lower- power state, and energy consumption by the processor 100 remains a result of the loop mode.
  • the loop exit predictor 105 remains powered up and continues to predict the exit of the loop and a direction of the loop instructions when the instructions are pulled from the instruction cache 101 and provided to the first processor core 301.
  • the loop mode occurs when one or more components are powered down or placed into a low-power mode and while loop instructions are executed such as from the loop instruction buffer 302.
  • one way to exit the loop mode is to have an instruction execution component send a redirect message, an exit signal, indicating that the exit branch was mispredicted to one or more components of the processor 100.
  • the exit signal causes the instruction pipeline 114 to fetch and execute instructions that occur after the loop. Because branch mispredicts are expensive in terms of wasted power and wasted execution cycles, an improperly selected or designated direction threshold comes with a power performance overhead. Hence, there is a trade-off to be made in terms of obtaining a power savings by entering the loop mode versus a power performance overhead for mispredicting the exit branch instruction.
  • the power performance overhead of the mispredicted exit branch outweighs the power savings in the loop mode for a particular configuration of the processor 100.
  • Another way to exit the loop mode involves the loop exit predictor 105 remaining powered up and the loop exit predictor 105 providing the loop exit signal upon successful loop exit prediction. In this way, a mispredict is avoided by having the instruction pipeline 114 timely deliver for execution instructions occurring after the loop.
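  • The trade-off between loop-mode power savings and the overhead of a mispredicted exit branch can be illustrated with a small calculation. The following is a minimal sketch only: the linear cost model, the function name `net_loop_mode_benefit`, its parameters, and the arbitrary energy units are all assumptions made for illustration and do not come from the description.

```python
def net_loop_mode_benefit(iterations, saving_per_iteration,
                          exit_mispredict_probability, mispredict_cost):
    """Return the estimated net energy benefit of entering the loop mode.

    Hypothetical linear model (an assumption, not the patent's method):
    each iteration executed from the loop buffer saves a fixed amount of
    energy, and a mispredicted exit branch costs a fixed amount once.
    """
    savings = iterations * saving_per_iteration
    expected_overhead = exit_mispredict_probability * mispredict_cost
    return savings - expected_overhead


# A long loop amortizes a guaranteed exit mispredict; a short loop does not.
long_loop = net_loop_mode_benefit(1000, 1.0, 1.0, 200.0)
short_loop = net_loop_mode_benefit(10, 1.0, 1.0, 200.0)
```

  • Under these assumed numbers the long loop yields a positive net benefit while the short loop yields a negative one, which mirrors the case in which the overhead of the mispredicted exit branch outweighs the savings. Setting the mispredict probability to zero models the second exit path, in which the loop exit predictor supplies the exit signal.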
  • FIG. 4 is a flow diagram illustrating a method 400 for implementing loop exit prediction for a relatively large loop iteration prediction in accordance with some embodiments.
  • the method 400 is performed by components of a processor such as the components of the processor 100.
  • the method 400 includes identifying whether a branch instruction is a loop instruction - a loop to potentially be executed in the loop mode. If so, at block 402, the processor determines a loop identifier and a number of loop iterations for the loop. This identification includes looking up the loop identifier in a set of stored loop identifiers such as the loop identifiers 305.
  • the processor determines whether the determined number of loop iterations exceeds a first loop threshold such as the first loop threshold 306.
  • the first loop threshold is a relatively large number (e.g., 500; 1,000; 10,000) for use in identifying a loop as a large loop having a relatively large number of predicted loop iterations to be executed by the processor. If the determined number of loop iterations exceeds the first loop threshold, the loop mode is directly entered. Further, according to some embodiments, if the first loop threshold is exceeded, no check is made whether a certain direction history threshold or direction history variable is exceeded: the loop mode is directly entered without making this check.
  • the processor determines whether the determined number of loop iterations exceeds a second loop threshold such as the second loop threshold 307.
  • the second loop threshold is a relatively small number (e.g., 15,
  • the processor waits for identification of a next loop by maintaining one or more components of an instruction pipeline in an active mode including maintaining the components in a powered up state, and execution returns to block 401.
  • the processor and loop exit predictor have encountered a loop that is likely too small to benefit from power savings of the loop mode and the processor avoids entering the loop mode based on the determination relative to the first loop threshold and the second loop threshold. Alternatively, the processor avoids entering the loop mode based on the determination relative to the second loop threshold.
  • the processor waits for a certain number of actual loop iterations before confirming that the instructions are executing within the loop. If the determined number of loop iterations exceeds the first threshold at block 403 - or after waiting the certain number of successful loop executions at block 406 - the method 400 continues at block 407, at which a set of loop instructions is stored in a loop buffer, such as the loop buffer 109.
  • the loop instructions are repeatedly executed from the loop buffer.
  • one or more components of the processor are placed in a low power mode.
  • the loop instructions are executed until a branch misprediction occurs or the loop instructions are executed for the number of predicted loop iterations and exited by having the loop exit predictor accurately predict the loop exit and provide the loop exit signal. In this situation, no pipeline bubble is encountered by the processor.
  • power is restored to the processor components that were placed in the low-power mode during the loop mode at block 408. Once power is restored, the processor waits for a next loop at block 405.
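  • The two-threshold decision of the method 400 can be sketched in a few lines. This is an illustrative sketch rather than the claimed implementation: the names `Decision` and `decide_loop_mode` are hypothetical, and the mapping of the middle case (above the second threshold but not above the first) to the confirmation wait at block 406 is inferred from the surrounding description.

```python
from enum import Enum


class Decision(Enum):
    ENTER_DIRECTLY = "enter loop mode without a direction history check"
    ENTER_AFTER_CONFIRMATION = "wait for actual iterations, then enter loop mode"
    STAY_ACTIVE = "keep pipeline components powered and wait for the next loop"


def decide_loop_mode(predicted_iterations, first_threshold, second_threshold):
    """Classify a loop by its predicted iteration count.

    first_threshold is the relatively large threshold and second_threshold
    the relatively small one; both comparisons are strict ("exceeds").
    """
    if predicted_iterations > first_threshold:
        return Decision.ENTER_DIRECTLY
    if predicted_iterations > second_threshold:
        return Decision.ENTER_AFTER_CONFIRMATION
    return Decision.STAY_ACTIVE


# Example thresholds: 1,000 and 15 are taken from the ranges mentioned above.
large = decide_loop_mode(5000, 1000, 15)
medium = decide_loop_mode(100, 1000, 15)
small = decide_loop_mode(15, 1000, 15)
```

  • With these example thresholds, a loop predicted to run 5,000 iterations enters the loop mode directly, one predicted to run 100 iterations enters only after confirmation, and one predicted to run 15 iterations is treated as too small to benefit.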
  • FIG. 5 is a flow diagram illustrating a method 500 for implementing loop exit prediction for a relatively small loop iteration prediction in accordance with some embodiments.
  • the method 500 is performed by components of a processor such as the processor 100.
  • the method 500 includes predicting a number of loop iterations associated with a set of loop instructions.
  • the set of loop instructions are executed in a loop mode and, in response to the predicted number of loop iterations failing to exceed the first loop iteration threshold (e.g., the predicted number of loop iterations is less than or equal to the loop iteration threshold), the set of instructions are operated in an active mode.
  • the loop mode includes placing at least one component of an instruction pipeline of the processor into a low-power mode or state. Further, according to some embodiments, at block 503, no check is made whether a certain direction history threshold or direction history variable is exceeded: the loop mode is entered directly upon determining that the predicted number of loop iterations exceeds the first loop iteration threshold. At block 504, the loop mode also includes executing the set of loop instructions from a loop buffer.
  • the loop mode includes certain additional steps in
  • the loop mode updates the predicted number of loop iterations associated with the set of loop instructions.
  • the predicting and the updating of the number of loop iterations is performed by a loop exit predictor such as the loop exit predictor 105.
  • the loop mode determines a time to restore power to the components of the instruction pipeline of the processor that were placed in the low-power mode.
  • the time to restore power to the low-powered components is able to come before an end of execution of the loop instructions since a lead time (e.g., a certain number of clock cycles) is often needed to fill the instruction pipeline with instructions that come sequentially after exiting the loop to avoid a pipeline bubble.
  • the loop mode predicts an exit for the set of loop instructions.
  • the processor determines the time to restore power to the components placed in the low-power mode and determines a next instruction address based on the predicted exit.
  • the active mode of the method 500 includes maintaining the at least one component of the instruction pipeline in a powered up state. For example, a loop exit predictor such as loop exit predictor 105 is maintained with power.
  • the active mode also executes the set of loop instructions from an instruction fetch stage unit of the instruction pipeline.
  • the processor is operating either in the loop mode or in the active mode.
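  • The lead time needed to restore power before the loop ends, as described for the loop mode of the method 500, can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `iterations_before_wake`, the fixed number of cycles per iteration, and the idea of expressing the lead time as a whole number of iterations are hypothetical simplifications, not details given in the description.

```python
import math


def iterations_before_wake(predicted_iterations, lead_cycles, cycles_per_iteration):
    """Return how many iterations may run before power restoration starts.

    Assumes (for illustration only) that each loop iteration takes a fixed
    number of clock cycles and that the powered-down components need
    lead_cycles clock cycles to refill the instruction pipeline.
    """
    if cycles_per_iteration <= 0:
        raise ValueError("cycles_per_iteration must be positive")
    # Round up so that the pipeline is ready no later than the loop exit.
    lead_iterations = math.ceil(lead_cycles / cycles_per_iteration)
    # If the loop is shorter than the lead time, wake-up starts immediately.
    return max(0, predicted_iterations - lead_iterations)


wake_point = iterations_before_wake(100, 12, 5)
too_short = iterations_before_wake(2, 50, 5)
```

  • In the first call the wake-up begins three iterations before the predicted exit; in the second the loop is shorter than the lead time, so the sketch returns zero, which corresponds to a loop that gains little from powering components down.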
  • certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software.
  • the software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium.
  • the software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above.
  • the non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM) or other memory device or devices, and the like.
  • the executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)
  • Microcomputers (AREA)
  • Power Sources (AREA)
PCT/US2019/048487 2018-09-18 2019-08-28 Using loop exit prediction to accelerate or suppress loop mode of a processor Ceased WO2020060734A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201980061096.3A CN112740173A (zh) 2018-09-18 2019-08-28 使用循环退出预测来加速或抑制处理器的循环模式
EP19862627.7A EP3853716A4 (en) 2018-09-18 2019-08-28 USING A LOOP OUTPUT PREDICTION TO SPEED UP OR SUPPRESS THE LOOP MODE OF A PROCESSOR
KR1020217010368A KR102556897B1 (ko) 2018-09-18 2019-08-28 루프 종료 예측을 이용하여 프로세서의 루프 모드를 가속 또는 억제하기
JP2021514963A JP7301955B2 (ja) 2018-09-18 2019-08-28 ループ終了予測を用いたプロセッサのループモードの促進又は抑制

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/134,440 US10915322B2 (en) 2018-09-18 2018-09-18 Using loop exit prediction to accelerate or suppress loop mode of a processor
US16/134,440 2018-09-18

Publications (1)

Publication Number Publication Date
WO2020060734A1 true WO2020060734A1 (en) 2020-03-26

Family

ID=69772505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/048487 Ceased WO2020060734A1 (en) 2018-09-18 2019-08-28 Using loop exit prediction to accelerate or suppress loop mode of a processor

Country Status (6)

Country Link
US (2) US10915322B2
EP (1) EP3853716A4
JP (1) JP7301955B2
KR (1) KR102556897B1
CN (1) CN112740173A
WO (1) WO2020060734A1

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10884751B2 (en) 2018-07-13 2021-01-05 Advanced Micro Devices, Inc. Method and apparatus for virtualizing the micro-op cache
US11294681B2 (en) * 2019-05-31 2022-04-05 Texas Instruments Incorporated Processing device with a microbranch target buffer for branch prediction using loop iteration count
US11256318B2 (en) * 2019-08-09 2022-02-22 Intel Corporation Techniques for memory access in a reduced power state
US20210200550A1 (en) * 2019-12-28 2021-07-01 Intel Corporation Loop exit predictor
US11520590B2 (en) * 2020-09-02 2022-12-06 Microsoft Technology Licensing, Llc Detecting a repetitive pattern in an instruction pipeline of a processor to reduce repeated fetching
US20220283811A1 (en) * 2021-03-03 2022-09-08 Microsoft Technology Licensing, Llc Loop buffering employing loop characteristic prediction in a processor for optimizing loop buffer performance
US12288067B2 (en) * 2022-06-23 2025-04-29 Arm Limited Prediction of number of iterations of a fetching process
US12373215B2 (en) * 2022-07-25 2025-07-29 Apple Inc. Using a next fetch predictor circuit with short branches and return fetch groups
US20240112050A1 (en) * 2022-09-29 2024-04-04 Nvidia Corporation Identifying idle-cores in data centers using machine-learning (ml)
US12541371B2 (en) 2023-08-23 2026-02-03 Arm Limited Predicting behaviour of control flow instructions using prediction entry types
CN117170747B (zh) * 2023-08-28 2025-10-17 海光信息技术股份有限公司 程序与指令处理、训练与预测方法与装置、处理器
US12411692B2 (en) * 2023-09-07 2025-09-09 Arm Limited Storage of prediction-related data
US12517732B2 (en) 2024-03-22 2026-01-06 Tenstorrent USA, Inc. Processor with one or more progressive conservative execution modes
US12450060B1 (en) * 2024-08-28 2025-10-21 Qualcomm Incorporated Sharing loop cache instances among multiple threads in processor devices

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6578138B1 (en) * 1999-12-30 2003-06-10 Intel Corporation System and method for unrolling loops in a trace cache
US20090055635A1 (en) * 2007-08-24 2009-02-26 Matsushita Electric Industrial Co., Ltd. Program execution control device
US20120117362A1 (en) 2010-11-10 2012-05-10 Bhargava Ravindra N Replay of detected patterns in predicted instructions
US20150227374A1 (en) * 2014-02-12 2015-08-13 Apple Inc. Early loop buffer entry
US20150293577A1 (en) * 2014-04-11 2015-10-15 Apple Inc. Instruction loop buffer with tiered power savings
US20160179549A1 (en) 2014-12-23 2016-06-23 Intel Corporation Instruction and Logic for Loop Stream Detection

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9952869B2 (en) * 2009-11-04 2018-04-24 Ceva D.S.P. Ltd. System and method for using a branch mis-prediction buffer
US9116686B2 (en) * 2012-04-02 2015-08-25 Apple Inc. Selective suppression of branch prediction in vector partitioning loops until dependency vector is available for predicate generating instruction
US9753733B2 (en) * 2012-06-15 2017-09-05 Apple Inc. Methods, apparatus, and processors for packing multiple iterations of loop in a loop buffer
US9557999B2 (en) * 2012-06-15 2017-01-31 Apple Inc. Loop buffer learning
US9710276B2 (en) * 2012-11-09 2017-07-18 Advanced Micro Devices, Inc. Execution of instruction loops using an instruction buffer
US9459871B2 (en) * 2012-12-31 2016-10-04 Intel Corporation System of improved loop detection and execution
CN104298488B (zh) * 2014-09-29 2018-02-23 上海兆芯集成电路有限公司 循环预测器指导的循环缓冲器
US9875106B2 (en) 2014-11-12 2018-01-23 Mill Computing, Inc. Computer processor employing instruction block exit prediction
JP2018005488A (ja) * 2016-06-30 2018-01-11 富士通株式会社 演算処理装置及び演算処理装置の制御方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3853716A4

Also Published As

Publication number Publication date
JP7301955B2 (ja) 2023-07-03
CN112740173A (zh) 2021-04-30
US11256505B2 (en) 2022-02-22
US20200089498A1 (en) 2020-03-19
EP3853716A4 (en) 2022-06-15
US20210191722A1 (en) 2021-06-24
JP2022500777A (ja) 2022-01-04
KR20210046806A (ko) 2021-04-28
KR102556897B1 (ko) 2023-07-18
US10915322B2 (en) 2021-02-09
EP3853716A1 (en) 2021-07-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19862627

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021514963

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20217010368

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2019862627

Country of ref document: EP

Effective date: 20210419

WWG Wipo information: grant in national office

Ref document number: 202117011380

Country of ref document: IN