WO2012145992A1 - Method for implementing value-associated indirect jump prediction - Google Patents

Method for implementing value-associated indirect jump prediction

Info

Publication number
WO2012145992A1
WO2012145992A1 PCT/CN2011/080247 CN2011080247W WO2012145992A1 WO 2012145992 A1 WO2012145992 A1 WO 2012145992A1 CN 2011080247 W CN2011080247 W CN 2011080247W WO 2012145992 A1 WO2012145992 A1 WO 2012145992A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
value
indirect jump
information
boot
Prior art date
Application number
PCT/CN2011/080247
Inventor
程旭
谭明星
刘先华
张吉豫
谢子超
佟冬
Original Assignee
北京北大众志微系统科技有限责任公司
济南众志信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京北大众志微系统科技有限责任公司 and 济南众志信息技术有限公司
Publication of WO2012145992A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/54 Link editing before load time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3846 Speculative instruction execution using static prediction, e.g. branch taken strategy

Definitions

  • The invention belongs to the field of microprocessor design and the design of systems that use microprocessors, and in particular relates to a method by which a modern processor implements value-associated indirect jump prediction.
  • Speculative execution is one of the important means by which modern processors exploit instruction-level parallelism.
  • For speculative execution, accurate branch prediction techniques are crucial.
  • The main purpose of branch prediction is to increase CPU speed. Speculative execution is based on branch prediction: if the CPU can predict whether a branch will be taken before the result of the preceding instruction in the pipeline is known, the corresponding instructions can be executed in advance, which avoids idle waiting in the processor pipeline and thus increases CPU speed. On the other hand, if the branch is mispredicted, the instructions and results already loaded into the pipeline must be cleared and the correct instructions loaded into the pipeline for reprocessing, which reduces processor performance.
  • Accurate branch prediction supplies the processor with a continuous instruction stream and avoids wasted processor clock cycles. Once a branch is mispredicted, the tens or even hundreds of instructions that the processor has speculatively executed on the wrong path are discarded, all speculative execution is cancelled, and processor clock cycles are wasted. Improving branch prediction accuracy is therefore a key design goal and is of great significance for exploiting instruction-level parallelism in modern processors.
  • Branch instructions can be classified as conditional or unconditional, or, according to the characteristics of the branch target, as direct or indirect. Among them, conditional direct branches (referred to as "conditional branches") and unconditional indirect branches (referred to as "indirect jumps") are the two most important kinds. Conditional branch instructions usually correlate strongly with the branch history, so history-based conditional branch predictors can achieve high prediction accuracy; indirect jump instructions, by contrast, are difficult to predict accurately because they have multiple target addresses.
  • Indirect jump instructions are widely used in modern object-oriented programs and virtual machine interpreters, which results in a large number of indirect jump mispredictions. According to statistics, about 45% of branch mispredictions are caused by indirect jump instructions. Misprediction of indirect jump instructions is therefore one of the important factors affecting the performance of modern processors, and designing an efficient and accurate indirect jump predictor is a difficult problem in current processor design.
  • Correlation-based predictors are currently the most widely used type of predictor.
  • History-based indirect jump predictors use information such as the jump direction history and the execution path history to guide the prediction of the indirect jump target address; data-value-based indirect jump predictors use certain data values to guide indirect jump prediction.
  • Precomputation-based indirect jump predictors use dedicated hardware to compute the jump target address in advance for one special kind of indirect jump instruction, the virtual function call.
  • The technical problem to be solved by the present invention is to provide a method for implementing value-associated indirect jump prediction that can predict indirect jumps accurately according to the effective associated information of the indirect jump instruction.
  • The present invention provides a method for implementing value-associated indirect jump prediction, involving a compiler and a processor, the method comprising:
  • The compiler, according to profile information obtained when the executable program is executed by the processor, identifies the subroutine structure corresponding to the indirect jump instruction in the source program and the associated information in the associated data value, inserts into the source program a boot instruction that identifies the associated information, and generates the executable program again.
  • During execution of the executable program generated again by the compiler, the processor dynamically collects the associated information according to the boot instruction and generates a value history mode.
  • The profile information obtained by the compiler includes one or more of the number of executions of the indirect jump instruction, the number of dynamic jump targets, and the number of target address prediction failures. The step of identifying, according to the profile information, the subroutine structure corresponding to the indirect jump instruction in the source program and the associated data value includes: selecting as a hard-to-predict instruction an indirect jump instruction whose execution count exceeds the count threshold and/or whose prediction failure rate exceeds the failure rate threshold;
  • for a virtual function call, one or more bits in the middle of the virtual function table address are identified as the associated information;
  • for a Switch-case statement, the low-order bits of the normalized case variable value are identified as the associated information;
  • for a function pointer call, one or more bits of the function pointer value, starting from the non-aligned bit position, are identified as the associated information.
  • The step in which the compiler inserts into the source program a boot instruction that identifies the associated information and generates the executable program again includes:
  • The compiler performs interprocedural control flow analysis on the source program and explicitly inserts a boot instruction on each control flow path.
  • The information carried in the boot instruction includes a distance value between the boot instruction and the hard-to-predict instruction.
  • The method further includes:
  • The compiler schedules instructions according to the data dependencies in the source program so as to increase the distance between the boot instruction and the corresponding hard-to-predict instruction.
  • The processor includes a register file, a value history mode register, and a target address buffer. In the transmitting phase of the boot instruction, the step in which the processor, while executing the executable program generated again by the compiler, dynamically collects the associated information according to the boot instruction and generates a value history mode includes:
  • reading the value of the corresponding register in the register file, according to the register number indicated by the boot instruction, as the associated data value;
  • shifting the previous value history in the value history mode register to the second combined position of the value history, and splicing it with the associated information shifted to the first combined position of the value history to form the value history mode.
  • The method further includes:
  • The target address of the hard-to-predict instruction is predicted based on the program counter (PC) value of the hard-to-predict instruction and the generated value history mode.
  • The processor further includes a filter table.
  • The method further includes: using the sum of the distance value indicated by the boot instruction and the PC value of the boot instruction as a label to query the filter table; if no item in the filter table matches, the label is filled, as the PC value of a hard-to-predict instruction, into a newly allocated item in the filter table.
  • The step of predicting the target address of the hard-to-predict instruction according to its PC value and the generated value history mode includes:
  • If the filter table has an item matching the label, the current indirect jump instruction is marked as a hard-to-predict instruction.
  • The value history mode read from the value history mode register is XORed with the PC value of the hard-to-predict instruction, and the result of the XOR operation is used as an index to read the target address stored in the target address buffer.
  • The target address read in this way is used for instruction fetch and execution in the next cycle.
  • The typical subroutine structures recognized by the compiler at compile time, together with their effective associated data values, are passed to the processor by inserting boot instructions. During program execution the processor forms a value history mode from the multiple associated data values it collects dynamically and uses it as effective associated information to predict indirect jump instructions. This effectively improves the prediction accuracy of indirect jump instructions and thereby the overall performance of the processor and the systems that use it.
  • FIG. 1 is a flowchart of an embodiment of a method for implementing value-associated indirect jump prediction according to the present invention
  • FIG. 2 is a flowchart of an embodiment of the method for identifying a correlation data value in a second compilation by the compiler in the method embodiment shown in FIG.
  • FIG. 3 is a flow chart of an embodiment of a method for forming a value history mode and predicting an indirect jump instruction by an associated data value collected by a processor according to a boot instruction in the method embodiment shown in FIG. 1;
  • FIG. 4 is a structural block diagram of an embodiment of a value-dependent indirect jump prediction apparatus used in the method embodiment shown in FIG. 3;
  • FIG. 5 is a flow chart for further explaining the operation of the flow of the method embodiment shown in FIG. 3.
  • FIG. 6 is a schematic diagram showing the sorting and shifting of the associated data values by the class shifter 2 in the device embodiment shown in FIG.
  • Figure 7 is a schematic diagram showing the update of the value history mode by the value history mode register 3 of the apparatus embodiment shown in Figure 4;
  • Figure 8 is a graphical representation of experimental results data illustrating the prediction results of the method and apparatus of the present invention.
  • The embodiment of the present invention adopts a cooperative software and hardware solution and proposes a compiler-guided method and system for value-associated indirect jump prediction.
  • The core idea is that the compiler analyzes the characteristics of the indirect jump instructions contained in typical subroutine structures, finds the different associated data values with which the indirect jump instructions in different subroutine structures should be associated, and marks those associated data values by inserting boot instructions into the program, so as to guide the processor's indirect jump prediction when it executes the executable program.
  • To this end, the embodiment designs a compilation method that automatically identifies the associated values of hard-to-predict indirect jump instructions according to the subroutine structure, and a boot instruction that passes the associated data values recognized at compile time to the processor. After identifying the associated value based on the subroutine structure, the compiler explicitly inserts the boot instruction to pass the associated information identified at compile time to the processor.
  • FIG. 1 shows the flow of an embodiment of the method for implementing value-associated indirect jump prediction provided by the present invention. It comprises a flow executed by the compiler at compile time and a flow executed by the processor at program run time, and includes the following steps:
  • The compiler compiles the source program into an executable program in a first compilation.
  • The compiler profiles the processor's execution of the executable program to obtain profile information. Specifically, it profiles the indirect jump instructions while the processor executes the executable program and collects profile information for the program's indirect jump instructions under a typical input set. The profile information mainly includes one or more of the number of executions of the indirect jump instruction, the number of dynamic jump targets, and the number of target address prediction failures.
  • The compiler performs a second compilation according to the profile information, identifying the subroutine structure of each indirect jump instruction in the source program and its associated information, and inserting boot instructions during compilation.
  • The subroutine structures that the compiler recognizes for indirect jump instructions in the source program mainly include one or more of a virtual function call, a Switch-case statement, and a function pointer call. According to the corresponding subroutine structure, the compiler identifies the information that is strongly correlated with the jump target of the indirect jump instruction; through interprocedural control flow analysis, it explicitly inserts boot instructions into the program to identify the associated information corresponding to the indirect jump instruction.
  • The "boot instruction" defined by the embodiment of the present invention for implementing value-associated indirect jump prediction is a special instruction added by extending the instruction set (an instruction that is invisible to users of the processor). It carries the following three types of information:
  • The first type of information indicates the distance between the boot instruction and the corresponding indirect jump instruction. The distance value can be positive or negative, as determined by the relative order of the boot instruction and the indirect jump instruction.
  • The second type of information is the number of the register that holds the associated information corresponding to the indirect jump instruction.
  • The third type of information is the category of the indirect jump instruction corresponding to the boot instruction, that is, the category of the subroutine structure corresponding to the indirect jump instruction.
  • All three types of information are encoded directly into the boot instruction by the compiler, and can be obtained when the processor decodes the boot instruction during program execution.
  • The specific format of the boot instruction can be customized according to the characteristics of the processor's instruction set.
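  • As an illustration of how the three fields could be packed into one instruction word, the following is a minimal sketch. The source does not specify a format, so the field widths (9-bit signed distance, 5-bit register number, 2-bit category), the field order, and the names `encode_boot` and `decode_boot` are all assumptions made for this example.

```python
# Hypothetical boot-instruction payload layout (not from the patent):
#   [category: 2 bits][register number: 5 bits][signed distance: 9 bits]
CATEGORIES = ("virtual_call", "switch_case", "func_pointer")
OFFSET_BITS, REG_BITS = 9, 5
OFFSET_MASK = (1 << OFFSET_BITS) - 1
REG_MASK = (1 << REG_BITS) - 1


def encode_boot(offset, reg, category):
    """Pack distance, register number and category into one integer."""
    lo, hi = -(1 << (OFFSET_BITS - 1)), (1 << (OFFSET_BITS - 1)) - 1
    if not lo <= offset <= hi:
        raise ValueError("distance does not fit in the offset field")
    if not 0 <= reg <= REG_MASK:
        raise ValueError("register number out of range")
    cat = CATEGORIES.index(category)
    # Two's-complement encoding lets the distance be positive or negative.
    return (cat << (REG_BITS + OFFSET_BITS)) | (reg << OFFSET_BITS) | (offset & OFFSET_MASK)


def decode_boot(word):
    """Recover (distance, register number, category) at decode time."""
    offset = word & OFFSET_MASK
    if offset >= 1 << (OFFSET_BITS - 1):  # sign-extend the distance
        offset -= 1 << OFFSET_BITS
    reg = (word >> OFFSET_BITS) & REG_MASK
    category = CATEGORIES[word >> (REG_BITS + OFFSET_BITS)]
    return offset, reg, category
```

  • A negative distance in this sketch corresponds to a boot instruction placed after its indirect jump in program order; a real encoding would be dictated by the target instruction set.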
  • During execution of the executable program, the processor dynamically collects the associated information according to the boot instruction and forms a value history mode.
  • The value history mode is composite information formed by combining the associated information from several associated data values. It borrows the idea of branch-history-based indirect jump predictors, but differs from them in that it forms the history pattern from information in the associated data values rather than from the branch history.
  • For example, suppose the associated data values of an indirect jump instruction are 1, 2, 3, 1, 2, 3 in sequence.
  • If the value history mode is composed of the associated information bits of two associated data values, the value histories corresponding to the indirect jump instruction are (1,2), (2,3), (3,1), (1,2), (2,3), and its value history modes are (1,2), (2,3), (3,1).
  • If the value history mode is composed of the associated information bits of three associated data values, the value histories corresponding to the indirect jump instruction are (1,2,3), (2,3,1), (3,1,2), (1,2,3), and its value history modes are (1,2,3), (2,3,1), (3,1,2).
  • The value history mode reflects the regularity with which associated data values occur and correlates strongly with the indirect jump target address, so it can be used to guide indirect jump prediction.
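  • The example above can be reproduced with a short sketch that slides a window over the sequence of associated data values. The function names `value_histories` and `value_history_modes` are chosen here for illustration and do not come from the source.

```python
def value_histories(values, depth):
    """Every run of `depth` consecutive associated data values, in order."""
    return [tuple(values[i:i + depth]) for i in range(len(values) - depth + 1)]


def value_history_modes(values, depth):
    """The distinct value histories, in order of first occurrence."""
    modes = []
    for history in value_histories(values, depth):
        if history not in modes:
            modes.append(history)
    return modes


values = [1, 2, 3, 1, 2, 3]
print(value_histories(values, 2))      # histories built from two values
print(value_history_modes(values, 2))  # the distinct patterns among them
print(value_history_modes(values, 3))  # patterns built from three values
```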
  • The processor predicts the indirect jump instruction using the value history mode formed from the collected associated information as the valid associated information.
  • In step 30 shown in FIG. 1, that is, during the second compilation, the compiler identifies the subroutine structure of each indirect jump instruction in the source program and its associated data values and inserts boot instructions. The specific flow is shown in FIG. 2 and includes the following steps:
  • The compiler collects hard-to-predict instructions from the processor's execution of the program compiled from the source program on its typical input set.
  • Based on the profile information obtained during profiling, the compiler selects as a "hard-to-predict instruction" each indirect jump instruction whose execution count exceeds the count threshold and whose prediction failure rate exceeds the failure rate threshold.
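  • The selection step can be sketched as a filter over per-instruction profile records. The record layout `JumpProfile`, the function name, and the threshold values used in the usage lines are hypothetical; the source names the profiled quantities but gives no concrete thresholds.

```python
from dataclasses import dataclass


@dataclass
class JumpProfile:
    pc: int              # address of the indirect jump instruction
    executions: int      # number of times it executed
    targets: int         # number of distinct dynamic jump targets
    mispredictions: int  # number of target address prediction failures


def select_hard_to_predict(profiles, min_executions, min_failure_rate):
    """Return the PCs of jumps that are both hot and frequently mispredicted."""
    selected = []
    for p in profiles:
        if p.executions <= 0 or p.executions <= min_executions:
            continue  # never executed, or too cold to be worth guiding
        if p.mispredictions / p.executions > min_failure_rate:
            selected.append(p.pc)
    return selected


profiles = [
    JumpProfile(pc=0x400, executions=10_000, targets=4, mispredictions=3_000),
    JumpProfile(pc=0x480, executions=10_000, targets=1, mispredictions=5),
    JumpProfile(pc=0x500, executions=20, targets=3, mispredictions=15),
]
hard = select_hard_to_predict(profiles, min_executions=1_000, min_failure_rate=0.05)
```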
  • The subroutine structures identified for these instructions are local control flow and data dependency structures, namely one or more of the virtual function calls, Switch-case statements, and function pointer calls mentioned above. Because these subroutine structures carry source-level control flow and data flow information, they indicate more clearly which data values are most strongly correlated with an indirect jump instruction and how to use that highly correlated information.
  • For a virtual function call, one or more bits in the middle of the virtual function table address are identified as the associated information that forms the corresponding value history information.
  • A virtual function call is a special function call designed to implement "polymorphism" in object-oriented programs.
  • "Polymorphism" means that the same message may produce completely different behavior when received by objects of different classes, so the target address of a virtual function call is determined dynamically by the specific class of the object.
  • An indirect jump instruction used for a virtual function call usually involves three steps: obtaining the object address, obtaining the virtual function table address, and performing the indirect jump. From the semantics of the virtual function call it follows that the virtual function table correlates strongly with the indirect jump instruction, and its corresponding value history information should contain one or more bits from the middle of the virtual function table address.
  • For a Switch-case statement, the low-order bits of the normalized case variable value are identified as the associated information that forms the corresponding value history information.
  • The Switch-case statement is a control flow structure that dynamically selects a branch path based on the value of the case variable, and is widely used in modern high-level programming languages such as C/C++/C#/Java. Usually, when the number of branch paths is greater than a certain threshold, the compiler implements the Switch-case statement with an indirect jump instruction; otherwise it uses an if-else structure.
  • Specifically, the case variable is first normalized into a contiguous enumeration starting from 0; the normalized case variable value is then used as an index to obtain the corresponding target address, and the indirect jump instruction jumps to the corresponding branch path.
  • The normalized case variable value therefore correlates strongly with the indirect jump instruction, and its corresponding value history pattern should contain the low-order bits of the normalized case variable value.
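  • The jump-table lowering described above can be sketched as follows. This is an illustrative model rather than compiler output: targets are represented as strings, and the helper names and the 6-bit width of the associated information are assumptions.

```python
def build_jump_table(case_targets, default):
    """Build a dense table indexed by the normalized case value.

    case_targets maps original case values to branch targets; gaps in the
    value range are filled with the default target.
    """
    lowest = min(case_targets)
    highest = max(case_targets)
    table = [case_targets.get(v, default) for v in range(lowest, highest + 1)]
    return lowest, table


def dispatch(value, lowest, table, default):
    """Return (target, normalized index) for one execution of the switch."""
    index = value - lowest  # normalization: the smallest case becomes 0
    if 0 <= index < len(table):
        return table[index], index  # the indirect jump through the table
    return default, None            # out of range: no table lookup


def associated_info(index, bits=6):
    """Low-order bits of the normalized case value (width is hypothetical)."""
    return index & ((1 << bits) - 1)


lowest, table = build_jump_table({10: "case_a", 11: "case_b", 13: "case_c"}, "default")
target, index = dispatch(13, lowest, table, "default")
```

  • Because the normalized index selects the table entry directly, its low-order bits determine the jump target whenever the table is smaller than the chosen bit width allows.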
  • For a function pointer call, one or more bits of the function pointer value, starting from the non-aligned bit position, are identified as the associated information that forms the corresponding value history information.
  • A function pointer call jumps to the target address given by the content of the function pointer. The function pointer value therefore correlates strongly with the indirect jump instruction, and its corresponding value history pattern should contain one or more bits of the function pointer value, starting from the non-aligned bit position.
  • The compiler analyzes the control flow of the source program and explicitly inserts a boot instruction on each control flow path to identify the associated information that forms the value history information of the corresponding indirect jump instruction.
  • Because the compiler inserts a boot instruction on each control path in order to track multiple control flow paths, one indirect jump instruction may correspond to several boot instructions.
  • The boot instruction is scheduled according to the data dependencies within the program so as to increase the distance between the boot instruction and the indirect jump instruction.
  • The compiler's scheduling of boot instructions relies mainly on the data dependencies between instructions. Without affecting the correctness of the program, it reorders the boot instructions relative to the instructions around them so as to increase the distance between the boot instruction and the indirect jump instruction, so that the processor can predict the target address of the indirect jump instruction in time from the associated data value passed by the boot instruction.
  • The specific instruction scheduling algorithm is based on the traditional "list scheduling" algorithm (see Section 10.3.2 of Compilers: Principles, Techniques, & Tools, Second Edition), including the following steps:
  • The boot instruction is thus scheduled as early as possible, leaving as many instructions as possible between the boot instruction and the indirect jump instruction and thereby increasing the distance between them.
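  • The effect of giving boot instructions priority during scheduling can be sketched with a simplified greedy list scheduler for a single basic block. This is not the patent's algorithm, whose steps are not reproduced in the text; the instruction names, the dependency map, and the single-issue, latency-free model are assumptions made for illustration.

```python
def list_schedule(order, deps, boot_ids, jump_id):
    """Greedy list scheduling of one basic block.

    order    -- instruction ids in original program order
    deps     -- id -> set of ids that must be scheduled earlier
    boot_ids -- ids of boot instructions (given highest priority)
    jump_id  -- the indirect jump, always kept as the block terminator
    """
    scheduled = []
    pending = [i for i in order if i != jump_id]
    while pending:
        done = set(scheduled)
        ready = [i for i in pending if deps.get(i, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependencies")
        # Prefer a ready boot instruction; otherwise keep original order.
        pick = next((i for i in ready if i in boot_ids), ready[0])
        scheduled.append(pick)
        pending.remove(pick)
    scheduled.append(jump_id)
    return scheduled


def distance(schedule, boot_id, jump_id):
    """Number of instruction slots from the boot instruction to the jump."""
    return schedule.index(jump_id) - schedule.index(boot_id)


order = ["load_vtable", "add", "mul", "boot", "load_fn", "jump"]
deps = {"boot": {"load_vtable"}, "load_fn": {"load_vtable"}, "jump": {"load_fn"}}
new_order = list_schedule(order, deps, boot_ids={"boot"}, jump_id="jump")
```

  • In this toy block the boot instruction moves to directly after the load that produces its register, which raises its distance to the jump from 2 to 4 instruction slots.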
  • In step 50 of the method embodiment shown in FIG. 1, the processor dynamically collects the associated information according to the boot instruction, forms a value history mode, and predicts the indirect jump instruction according to the value history mode formed. The specific flow is shown in FIG. 3 and includes the following steps:
  • The target address stored in the target address buffer is predicted based on the PC value of the hard-to-predict instruction and the value history mode formed.
  • In the example, the Switch-case statement contains an indirect jump instruction whose target address changes repeatedly (wpawn, wknight), which makes it difficult to predict with a traditional indirect jump predictor.
  • At time t2, the "i" values (that is, the normalized case variable) that appeared at the two preceding times t0 and t1 are 1 and 1; their low-order bits are collected as associated information and combined to form the value history (1, 1). At time t3, the "i" values that appeared at the two preceding times t1 and t2 are 1 and 2; their low-order bits are collected as associated information and combined to form the value history (1, 2); ... at time t7, the "i" values at the two preceding times t5 and t6 are 1, …
  • The value history modes are obtained from all the value histories in Table 1 above; they are all the distinct value histories.
  • The four value history modes obtained are shown in Table 2, together with the target address predicted from each value history mode, where T1 represents the predicted target address wpawn and T2 represents the predicted target address wknight.
  • The target addresses predicted from the value history modes are shown in Table 3 and compared with the target addresses that actually occur during program execution. The results indicate that the value history mode of the present invention predicts the target address of an indirect jump very accurately.
  • FIG. 5 gives a more detailed flow of the method embodiment, including a value history update process and an indirect jump prediction process.
  • These correspond to the parts above and below the dashed line shown in Fig. 4, respectively.
  • The value history update includes the following steps:
  • When the boot instruction enters the transmitting phase, the processor reads the value of the corresponding register from the register file 1, according to the register number RA that the boot instruction gives for the associated data value, and collects it as the associated data value.
  • If the processor determines from the category of the indirect jump instruction that the corresponding subroutine structure is a virtual function call, the class shifter 2 shifts the associated information bits in the middle of the virtual function table address to the first combined position of the value history (for example, shifts them right to the lowest bits); the number of bits shifted depends on the number of associated information bits in the middle of the virtual function table address.
  • If the processor determines from the category of the indirect jump instruction that the corresponding subroutine structure is a Switch-case statement or a function pointer call, the corresponding associated information bits are shifted to the first combined position of the value history; the number of bits shifted depends on the number of associated information bits.
  • The previous value history, shifted to the second combined position of the value history, and the associated information at the first combined position of the value history are spliced into a value history mode by an OR operation.
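  • The update steps above amount to a shift-and-OR on a fixed-width register. The sketch below models them under stated assumptions: a 12-bit history register holding two 6-bit fields, and per-category bit positions (`SHIFT_BY_CATEGORY`) that are placeholders, since the source does not fix which bits are taken for each subroutine structure.

```python
FIELD_BITS = 6      # bits of associated information per value (assumed)
HISTORY_BITS = 12   # width of the value history mode register (assumed)
FIELD_MASK = (1 << FIELD_BITS) - 1
HISTORY_MASK = (1 << HISTORY_BITS) - 1

# Hypothetical right-shift amounts that bring the associated bits of each
# category down to the lowest bits (the "first combined position").
SHIFT_BY_CATEGORY = {
    "virtual_call": 4,   # bits from the middle of the vtable address
    "switch_case": 0,    # low-order bits of the normalized case value
    "func_pointer": 2,   # skip the low bits fixed by alignment
}


def class_shift(value, category):
    """Model of the class shifter: extract the associated information."""
    return (value >> SHIFT_BY_CATEGORY[category]) & FIELD_MASK


def update_history(history, value, category):
    """Shift the previous history up one field and OR in the new field."""
    shifted = (history << FIELD_BITS) & HISTORY_MASK  # second combined position
    return shifted | class_shift(value, category)     # splice by OR


history = 0
for case_value in (1, 2, 3):
    history = update_history(history, case_value, "switch_case")
```

  • With two fields in the register, the oldest value falls off the top after each update, so `history` always reflects the two most recent associated values.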
  • The processor uses the sum of the distance value Offset indicated by the boot instruction and the PC value of the boot instruction as a label to query the filter table 4. If no item matches, a new item is allocated in the filter table 4 and the label is filled in as the PC value of a hard-to-predict instruction; otherwise nothing needs to be done.
  • The indirect jump prediction includes the following: the PC value of the indirect jump instruction is used as a label to query the filter table. If the filter table contains an item matching the label, the current jump instruction is marked as a hard-to-predict instruction; otherwise it is marked as a normal jump instruction.
  • That is, in the fetch stage the processor uses the PC value of the indirect jump instruction as a label to query the filter table 4. If an item matches the label, the current jump instruction is considered a hard-to-predict instruction; otherwise it is considered a normal jump instruction.
  • For a hard-to-predict instruction, the value history mode read from the value history mode register is XORed with the PC value of the instruction, and the XOR result is used as an index to read the target address mapped in the BTB.
  • For a normal jump instruction, the BTB is accessed according to the PC value of the instruction to obtain the target address. The process then ends.
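  • The filter table and the two BTB indexing schemes can be modeled in a few lines. This is a behavioral sketch under simplifying assumptions: the filter table is an unbounded set, the BTB is an untagged dictionary with a 10-bit index, and the class and method names are invented for illustration.

```python
class ValueAssociatedPredictor:
    """Toy model of filter-table lookup plus XOR-indexed BTB access."""

    def __init__(self, index_bits=10):
        self.filter_table = set()  # PCs of hard-to-predict indirect jumps
        self.btb = {}              # index -> predicted target address
        self.index_mask = (1 << index_bits) - 1

    def on_boot_instruction(self, boot_pc, offset):
        # Label = boot instruction PC + distance = PC of the guided jump.
        self.filter_table.add(boot_pc + offset)

    def _index(self, jump_pc, history):
        if jump_pc in self.filter_table:
            # Hard-to-predict: combine PC and value history mode by XOR.
            return (jump_pc ^ history) & self.index_mask
        # Normal jump: index the BTB with the PC alone.
        return jump_pc & self.index_mask

    def predict(self, jump_pc, history):
        return self.btb.get(self._index(jump_pc, history))

    def update(self, jump_pc, history, target):
        self.btb[self._index(jump_pc, history)] = target


predictor = ValueAssociatedPredictor()
predictor.on_boot_instruction(boot_pc=0x3F0, offset=0x10)  # guides jump at 0x400
predictor.update(0x400, history=1, target=0xA000)
predictor.update(0x400, history=2, target=0xB000)
predictor.update(0x500, history=1, target=0xC000)          # unguided jump
```

  • The guided jump at 0x400 keeps one BTB entry per value history mode, while the unguided jump at 0x500 has a single entry regardless of history. A hardware table would also need tags, a bounded size, and a replacement policy, none of which are modeled here.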
  • For the above method embodiment, the present invention further provides an apparatus embodiment for predicting indirect jump instructions at program run time; its structural block diagram is shown in FIG. 4.
  • The boot instruction transmitting module 7 is configured to: in the boot instruction transmitting phase, read the associated data value of the indirect jump instruction from the register file 1 according to the boot instruction, and output to the class shifter 2 a class shift command carrying the associated data value.
  • The register file 1 is configured to: store, in multiple registers, the associated data values corresponding to indirect jump instructions.
  • The class shifter 2 is configured to: shift the associated information in the associated data value to the first combined position of the value history according to the class shift command, and output the shifted associated information to the value history mode register 3.
  • The value history mode register 3 is configured to: shift the previous value history mode to the second combined position of the value history and combine it with the shifted associated information into an updated value history mode.
  • The target address buffer 5 is configured to: store the target address corresponding to the indirect jump instruction, indexed by the PC value of the indirect jump instruction and the associated data value.
  • The boot instruction carries the following three types of information: the first type indicates the distance between the boot instruction and the corresponding indirect jump instruction; the second type is the number of the register that holds the associated information of the corresponding indirect jump instruction; the third type is the category of the indirect jump instruction corresponding to the boot instruction, that is, the category of the subroutine structure corresponding to the indirect jump instruction, where:
  • The boot instruction transmitting module 7 reads the value of the corresponding register from the register file 1 according to the register number given by the boot instruction, places it as the associated data value, together with the category of the indirect jump instruction indicated by the boot instruction, in a class shift command, and outputs the command to the class shifter 2.
  • The class shifter 2, according to the category of the indirect jump instruction carried in the class shift command, shifts the associated information in the associated data value carried by the command to the first combined position of the value history and outputs the shifted associated information.
  • The value history mode register 3 performs the value history mode update.
  • The above apparatus embodiment further includes a filter table 4, wherein:
  • In the transmitting phase of the boot instruction, the boot instruction transmitting module 7 also uses the sum of the distance value indicated by the boot instruction and the PC value of the boot instruction as a label to query the filter table 4. If no item matches, the label is filled, as the PC value of a hard-to-predict instruction, into a newly allocated item of the filter table 4.
  • In the fetch stage, the instruction fetch module 6 uses the PC value of the indirect jump instruction as a label to query the filter table 4. If an item matches the label, the current jump instruction is marked as a hard-to-predict instruction; the value history mode read from the value history mode register 3 is XORed with the PC value of the instruction, and the target address stored in the target address buffer 5 is read with the XOR result as the index.
  • In one example, the class shifter 2 shifts the associated information in the associated data value by 6 bits, to bits <5:0> at the first combined position of the value history; the value history mode register 3 shifts the previous value history mode left by 6 bits, to bits <19:10> at the second combined position of the value history.
  • A prediction evaluation experiment was carried out for the above method and apparatus embodiments.
  • The results show that the present invention effectively improves prediction accuracy and thereby processor performance.
  • The experimental environment is based on the SimpleScalar simulator and typical programs from the SPEC benchmark suites.
  • The baseline processor implements indirect jump prediction with a 4K-entry, 4-way set-associative BTB.
  • The basic parameters are shown in Table 4.
  • L1 ICache: 16KB, 4-way set associative, 64 bytes per line, 1-cycle access latency
  • L1 DCache: 16KB, 4-way set associative, 64 bytes per line, 1-cycle access latency
  • L2 Cache: 512KB, 8-way set associative, 64 bytes per line, 10-cycle access latency
  • Main memory: 150-cycle average access latency
  • The benchmark set includes 5 typical SPEC CPU2000 programs (perlbmk, gap, gcc00, crafty, eon on the abscissa of Figure 8), 3 typical SPEC CPU2006 programs (perlbench, gcc06, sjeng on the abscissa of Figure 8) and 2 typical C++ programs (richards, ixx on the abscissa of Figure 8).
  • Figure 8 shows the performance of the indirect jump prediction technique of an embodiment of the present invention.
  • ORIG denotes the baseline processor.
  • VBBI denotes the Value Based BTB Indexing predictor.
  • VHC (Value History Classification) denotes the indirect jump prediction technique proposed by the present invention.
  • Relative to the baseline processor, the proposed prediction technique improves performance by 19% on average; relative to the VBBI predictor, it improves performance by 4.3% on average.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

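The present invention discloses a method for implementing value-correlated indirect jump prediction, involving a compiler and a processor. The method comprises: the compiler, based on profiling information obtained while the processor executes an executable program, identifies the subroutine structure corresponding to each indirect jump instruction in the source program and the correlation information within its correlated data values, inserts into the source program guide instructions that identify this correlation information, and regenerates the executable program; the processor, while executing the regenerated executable program, dynamically collects the correlation information according to the guide instructions and generates a value history pattern. The invention can effectively improve the prediction accuracy of indirect jump instructions and thereby the overall performance of the processor and of systems that use it.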

Description

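A Method for Implementing Value-Correlated Indirect Jump Prediction
Technical Field
The present invention belongs to the field of microprocessor design and the design of systems that use microprocessors, and in particular relates to a method by which a modern processor implements value-correlated indirect jump prediction.
Background
In modern processors, speculative execution is one of the important means of exploiting instruction-level parallelism. To make speculative execution efficient and reduce wrong-path speculation, accurate branch prediction is essential.
In modern wide-issue, deeply pipelined superscalar architectures, the main purpose of branch prediction is to increase CPU speed. Speculative execution builds on branch prediction: if the CPU can predict whether the program will branch before the result of the preceding instruction in the pipeline is available, it can execute the corresponding instructions early, avoiding idle pipeline stalls and thus running faster. On the other hand, if the result of the preceding instruction shows that the prediction was wrong, all instructions and results already loaded into the pipeline must be flushed and the correct instructions loaded and processed again, which lowers processor performance.
Accurate branch prediction therefore supplies the processor with a continuous instruction stream and avoids the clock cycles that would otherwise be wasted waiting before execution can continue. Once a branch is mispredicted, however, the tens or even hundreds of instructions executed speculatively on the wrong path are discarded and all speculative work is cancelled, which again wastes clock cycles. Improving branch prediction accuracy is thus the key goal in the design and application of branch prediction, and it matters greatly for exploiting instruction-level parallelism in modern processors.
By the nature of their targets, branch instructions can be divided into conditional and unconditional branches, or into direct and indirect branches. Conditional direct branches ("conditional branches" for short) and unconditional indirect branches ("indirect jumps" for short) are the two most important kinds. Conditional branches usually correlate strongly with branch history, so history-based conditional branch predictors reach high accuracy; indirect jump instructions have multiple target addresses and are hard to predict accurately.
Indirect jump instructions are widely used in modern object-oriented programs and virtual machine interpreters, and they cause a large number of indirect jump mispredictions. According to statistics, about 45% of branch mispredictions are caused by indirect jump instructions. Indirect jump misprediction is therefore one of the important factors limiting the performance of modern processors, and designing an efficient, accurate indirect jump predictor is an open problem in processor design.
Among existing indirect jump predictors, correlation predictors are the most widely used. To improve their accuracy, history-based, data-value-based and precomputation-based indirect jump predictors have been proposed in turn. History-based predictors use information such as branch direction history and execution path history to guide prediction of the indirect jump target address; data-value-based predictors use certain data values to guide indirect jump prediction; precomputation-based predictors use special hardware to compute in advance the target address of virtual function calls, one particular kind of indirect jump.
Recent research shows that, because indirect jump instructions necessarily correlate strongly with certain data values, data-value-based indirect jump predictors usually achieve comparatively high accuracy. Even so, existing data-value-based predictors face two problems that are hard to solve: first, finding effective correlated data values is very difficult; second, those correlated data values may well be unavailable at the time the indirect jump is predicted. These two problems keep existing data-value-based indirect jump predictors from achieving ideal results.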
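Summary of the Invention
The technical problem to be solved by the present invention is to provide a method for implementing value-correlated indirect jump prediction that can predict indirect jumps accurately from the effective correlation information of indirect jump instructions.
To solve this problem, the present invention provides a method for implementing value-correlated indirect jump prediction, involving a compiler and a processor, the method comprising:
the compiler, based on profiling information obtained while the processor executes an executable program, identifies the subroutine structure corresponding to each indirect jump instruction in the source program and the correlation information within its correlated data values, inserts into the source program guide instructions that identify this correlation information, and regenerates the executable program;
the processor, while executing the executable program regenerated by the compiler, dynamically collects the correlation information according to the guide instructions and generates a value history pattern.
Optionally, the profiling information obtained by the compiler includes one or more of: the execution count of an indirect jump instruction, its number of dynamic jump targets, and its number of target address mispredictions. The step of identifying, from the profiling information, the subroutine structure corresponding to an indirect jump instruction in the source program and its correlated data values comprises:
selecting as hard-to-predict instructions the indirect jump instructions whose execution count exceeds a count threshold and/or whose misprediction rate exceeds a misprediction-rate threshold;
identifying the subroutine structure corresponding to each hard-to-predict instruction, being one or more of a virtual function call, a switch-case statement and a function pointer call, wherein:
for a virtual function call, one or more bits from the middle of the virtual function table address are identified as the correlation information;
for a switch-case statement, the low-order bits of the normalized case variable value are identified as the correlation information;
for a function pointer call, one or more bits of the function pointer value starting at the first non-alignment bit are identified as the correlation information.
Optionally, the step in which the compiler inserts into the source program guide instructions that identify the correlation information and regenerates the executable program comprises:
the compiler analyzes the inter-procedural control flow of the source program and explicitly inserts guide instructions on the control flow paths. Each guide instruction carries: a distance value indicating the distance between the guide instruction and the hard-to-predict instruction; the number of the register that holds the correlated data value of the hard-to-predict instruction; and the category of the hard-to-predict instruction, which denotes the category of the corresponding subroutine structure.
Optionally, the method further comprises:
the compiler schedules the guide instructions according to the inter-procedural data dependences of the source program, so as to increase the distance between each guide instruction and the corresponding hard-to-predict instruction.
Optionally, the method further involves a register file, a value history pattern register and a target address buffer, wherein the step in which the processor, in the issue stage of a guide instruction while executing the regenerated executable program, dynamically collects correlation information according to the guide instruction and generates a value history pattern comprises:
reading, from the register file, the value of the register whose number is indicated by the guide instruction and collecting it as the correlated data value;
shifting the correlation information within the collected correlated data value to the first combination position of the value history, according to the category of the hard-to-predict instruction indicated by the guide instruction;
shifting the previous value history in the value history pattern register to the second combination position of the value history and combining it with the correlation information at the first combination position to form the value history pattern.
Optionally,
for each category of hard-to-predict instruction, both the shift of the correlation information to the first combination position of the value history and the shift of the previous value history in the value history pattern register to the second combination position use a fixed shift amount, and the best fixed shift amount is determined by experiment.
Optionally, the method further comprises:
in the fetch stage of an indirect jump instruction, predicting the target address of the hard-to-predict instruction from its program counter (PC) value and the generated value history pattern.
Optionally, the method further involves a filter table, and the issue stage of a guide instruction further comprises:
using the sum of the distance value indicated by the guide instruction and the PC value of the guide instruction as a tag to query the filter table; if no entry matches, writing the tag, as the PC value of a hard-to-predict instruction, into a newly allocated entry of the filter table.
Optionally, the step of predicting the target address of the hard-to-predict instruction from its PC value and the generated value history pattern comprises:
using the PC value of the indirect jump instruction as a tag to query the filter table; if an entry matches the tag, marking the current indirect jump instruction as a hard-to-predict instruction;
for the hard-to-predict instruction, XORing the value history pattern read from the value history pattern register with the PC value of the instruction, using the XOR result as the index to read the target address stored in the target address buffer, and proceeding with instruction fetch and execution in the next cycle.
With this method, the compiler identifies typical subroutine structures and their effective correlated data values at compile time and passes them to the processor through inserted guide instructions. During execution the processor forms a value history pattern from the several correlated data values it collects dynamically and uses it as effective correlation information to predict indirect jump instructions. This effectively improves the prediction accuracy of indirect jump instructions and thereby the overall performance of the processor and of systems that use it.
Experiments applying the method to a modern superscalar processor show that it effectively improves the overall performance of the processor system.
Brief Description of the Drawings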
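FIG. 1 is a flowchart of an embodiment of the method of the present invention for implementing value-correlated indirect jump prediction;
FIG. 2 is a flowchart of an embodiment in which the compiler identifies correlated data values during the second compilation in the method embodiment of FIG. 1;
FIG. 3 is a flowchart of an embodiment in which, in the method embodiment of FIG. 1, the processor forms a value history pattern from the correlated data values collected according to guide instructions and predicts indirect jump instructions;
FIG. 4 is a structural block diagram of an embodiment of the value-correlated indirect jump prediction apparatus used in the method embodiment of FIG. 3;
FIG. 5 is a flowchart that details the operations of the method embodiment of FIG. 3;
FIG. 6 is a schematic diagram of the classification shifter 2 of the apparatus embodiment of FIG. 4 performing a classification shift on a correlated data value;
FIG. 7 is a schematic diagram of the value history pattern register 3 of the apparatus embodiment of FIG. 4 updating the value history pattern;
FIG. 8 is a data diagram of the experimental results evaluating the predictions of the method and apparatus of the present invention.
Preferred Embodiments of the Invention
The embodiments described here serve only to illustrate and explain the present invention and do not limit its technical solution.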
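The embodiments adopt a hardware/software co-design approach and propose a compiler-guided method and system for value-correlated indirect jump prediction. The core idea is that the compiler analyzes the characteristics of the indirect jump instructions contained in typical subroutine structures, finds the different correlated data values to which indirect jumps in different subroutine structures should be correlated, and marks those values by inserting guide instructions into the program, so as to guide the indirect jump prediction the processor performs while executing the program.
Based on this idea, the embodiments provide a compilation method that automatically identifies different correlated values for hard-to-predict indirect jump instructions according to subroutine structure, together with a guide instruction that can pass the correlated data values identified at compile time to the processor. After identifying the correlated values from the subroutine structure, the compiler explicitly inserts guide instructions so that the correlation information identified at compile time reaches the processor.
FIG. 1 shows the flow of an embodiment of the method provided by the present invention. It comprises a flow performed by the compiler at compile time and a flow performed by the processor at run time, with the following steps:
10: The compiler compiles the source program into an executable program in a first compilation;
20: The compiler profiles the processor's execution of the executable program and obtains profiling information;
The compiler profiles the indirect jump instructions the processor executes and gathers profiling information for the program's indirect jump instructions under a typical input set, mainly one or more of: the execution count of each indirect jump instruction, its number of dynamic jump targets, and its number of target address mispredictions.
30: The compiler performs a second compilation based on the profiling information, identifies for each indirect jump instruction the subroutine structure in the source program and its correlation information, and inserts guide instructions during compilation;
Here the compiler identifies the subroutine structure to which each indirect jump instruction in the source program belongs, mainly one or more of a virtual function call, a switch-case statement and a function pointer call; from that subroutine structure it identifies the information that correlates strongly with the jump target of the indirect jump instruction; and through inter-procedural control flow analysis it explicitly inserts guide instructions into the program to identify the correlation information of that indirect jump instruction.
The "guide instruction" that the embodiments introduce for value-correlated indirect jump prediction is a special instruction added by extending the instruction set (an instruction invisible to users of the processor). It carries the following three kinds of information:
The first kind indicates the distance between this guide instruction and the corresponding indirect jump instruction. The value may be positive or negative, depending on the relative order of the guide instruction and the indirect jump instruction;
The second kind identifies the number of the register that holds the correlation information of the indirect jump instruction;
The third kind denotes the category of the indirect jump instruction to which the guide instruction corresponds, that is, the category of the subroutine structure to which that indirect jump instruction belongs.
All three kinds of information are placed in the guide instruction by direct encoding; the processor obtains them when it decodes the guide instruction during program execution.
The concrete format of the guide instruction can be tailored to the characteristics of the processor's instruction set.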
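Since the source leaves the concrete format open, the following is only a minimal sketch of how the three kinds of information might be packed into one instruction word. It assumes a hypothetical 32-bit encoding; the opcode value, the field widths and the names `guide_insn`, `guide_encode` and `guide_decode` are illustrative and do not come from the patent.

```c
#include <stdint.h>

/* Hypothetical category codes for the three subroutine structures. */
enum jump_category { CAT_VIRTUAL_CALL = 0, CAT_SWITCH_CASE = 1, CAT_FUNC_POINTER = 2 };

/* Decoded view of a guide instruction (field names are illustrative). */
struct guide_insn {
    int32_t distance;   /* signed distance to the indirect jump */
    uint8_t reg;        /* register holding the correlated data value */
    uint8_t category;   /* one of enum jump_category */
};

#define GUIDE_OPCODE 0x3Fu  /* assumed 6-bit opcode, not a real ISA value */

/* Assumed layout:
 * [31:26] opcode | [25:24] category | [23:19] register | [18:0] signed distance */
uint32_t guide_encode(struct guide_insn g)
{
    return (GUIDE_OPCODE << 26)
         | ((uint32_t)(g.category & 0x3u) << 24)
         | ((uint32_t)(g.reg & 0x1Fu) << 19)
         | ((uint32_t)g.distance & 0x7FFFFu);
}

struct guide_insn guide_decode(uint32_t word)
{
    struct guide_insn g;
    uint32_t d = word & 0x7FFFFu;
    /* Sign-extend the 19-bit distance field. */
    g.distance = (d & 0x40000u) ? (int32_t)d - 0x80000 : (int32_t)d;
    g.reg      = (uint8_t)((word >> 19) & 0x1Fu);
    g.category = (uint8_t)((word >> 24) & 0x3u);
    return g;
}
```

A signed distance field is used here because the text allows the distance to be positive or negative; a real design would pick widths that suit its own instruction set.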
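40: The executable program is regenerated;
50: While executing the executable program, the processor dynamically collects correlation information according to the guide instructions and forms a value history pattern;
A value history pattern is composite information formed by combining the correlation information from several correlated data values. It borrows the idea of branch-history-based indirect jump predictors, but unlike them it forms the history pattern from information in the correlated data values rather than from branch history.
Suppose the correlated data values of an indirect jump instruction are, in order, 1, 2, 3, 1, 2, 3. If the value history pattern is formed from the correlation bits of 2 correlated data values, the value histories of this indirect jump instruction are (1,2), (2,3), (3,1), (1,2), (2,3), and its value history patterns are (1,2), (2,3), (3,1). If the value history pattern is formed from the correlation bits of 3 correlated data values, the value histories are (1,2,3), (2,3,1), (3,1,2), (1,2,3), and the value history patterns are (1,2,3), (2,3,1), (3,1,2). The value history pattern reflects the regularity with which correlated data values occur and correlates strongly with the indirect jump target address, so it can be used to guide indirect jump prediction.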
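The 1, 2, 3, 1, 2, 3 example can be reproduced with a short sketch. The functions below pack each window of consecutive values into one integer and collect the distinct windows, which are the value history patterns. The function names and the packing (a fixed number of bits per value, oldest value in the highest position) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the `depth` values ending at index `end` (inclusive) into one integer,
 * `bits` bits per value, oldest value in the highest position.
 * Assumes depth >= 1, end + 1 >= depth and depth * bits <= 32. */
uint32_t value_history_at(const uint32_t *values, size_t end,
                          unsigned depth, unsigned bits)
{
    uint32_t mask = (1u << bits) - 1u;
    uint32_t hist = 0;
    for (unsigned k = 0; k < depth; k++)
        hist = (hist << bits) | (values[end + 1 - depth + k] & mask);
    return hist;
}

/* Collect the distinct value histories (the value history patterns) of a
 * sequence in order of first appearance. Returns how many were stored in
 * `out`, which has room for `cap` entries. */
size_t value_history_patterns(const uint32_t *values, size_t n,
                              unsigned depth, unsigned bits,
                              uint32_t *out, size_t cap)
{
    size_t count = 0;
    for (size_t end = depth - 1; end < n; end++) {
        uint32_t h = value_history_at(values, end, depth, bits);
        size_t j = 0;
        while (j < count && out[j] != h)
            j++;
        if (j == count && count < cap)
            out[count++] = h;   /* first time this history is seen */
    }
    return count;
}
```

With 4 bits per value, the sequence 1, 2, 3, 1, 2, 3 at depth 2 yields the three patterns 0x12, 0x23 and 0x31, matching (1,2), (2,3), (3,1) in the text.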
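60: Indirect jump instructions are predicted from the value history pattern.
The processor uses the value history pattern formed from the collected correlation information as effective correlation information for predicting indirect jump instructions.
Step 30 of FIG. 1, in which the compiler during the second compilation identifies for each indirect jump instruction the subroutine structure in the source program and its correlated data values and inserts guide instructions for them, is detailed in FIG. 2 and comprises the following steps:
301: The compiler profiles the program executed by the processor, using the source program and its typical input set, and gathers the hard-to-predict instructions;
From the profiling information obtained during profiling, the compiler selects as "hard-to-predict instructions" the indirect jump instructions whose execution count exceeds a count threshold and whose misprediction rate exceeds a misprediction-rate threshold.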
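The selection rule can be sketched as a filter over per-instruction profile records. The record layout, the function names and the threshold values used in any call are hypothetical; the source does not state concrete thresholds. The sketch requires both conditions to hold, as in this step, whereas the summary also allows an "and/or" combination.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instruction profile record. */
struct jump_profile {
    uint64_t pc;                /* address of the indirect jump */
    uint64_t exec_count;        /* times executed under the typical input set */
    uint64_t mispredict_count;  /* target address mispredictions */
    unsigned distinct_targets;  /* number of dynamic jump targets */
};

/* A jump is hard to predict when it is both frequently executed and
 * frequently mispredicted. */
int is_hard_to_predict(const struct jump_profile *p,
                       uint64_t min_exec, double min_miss_rate)
{
    if (p->exec_count == 0)
        return 0;
    double miss_rate = (double)p->mispredict_count / (double)p->exec_count;
    return p->exec_count > min_exec && miss_rate > min_miss_rate;
}

/* Write the PCs of all hard-to-predict jumps to `out_pcs` (room for n
 * entries) and return how many were selected. */
size_t select_hard_jumps(const struct jump_profile *profiles, size_t n,
                         uint64_t min_exec, double min_miss_rate,
                         uint64_t *out_pcs)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (is_hard_to_predict(&profiles[i], min_exec, min_miss_rate))
            out_pcs[count++] = profiles[i].pc;
    return count;
}
```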
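302: Correlation information is identified for each hard-to-predict instruction from the subroutine structure in the source program;
During the second compilation the compiler gives the "hard-to-predict instructions" special treatment, comprising:
(1) Identifying the subroutine structure to which each "hard-to-predict instruction" belongs;
These subroutine structures are local control flow and data dependence structures, namely one or more of the virtual function call, switch-case statement and function pointer call mentioned above. Because they carry source-level control flow and data flow information, they show more clearly which data values correlate strongly with the indirect jump instruction and how that information should be used.
(2) Identifying, from the subroutine structure, the information that correlates strongly with the jump target of the indirect jump instruction, wherein:
a. For a virtual function call, one or more bits from the middle of the virtual function table address are identified as the correlation information that forms the corresponding value history.
A virtual function call is a special function call designed to implement polymorphism in object-oriented programs. Polymorphism means that the same message, received by objects of different classes, may lead to entirely different behavior, so the target address of a virtual function call is determined dynamically by the concrete class of the object. An indirect jump used for a virtual function call usually involves three steps: obtaining the object address, obtaining the virtual function table address, and the indirect jump itself. From the semantics of virtual function calls it follows that the virtual function table correlates strongly with the indirect jump instruction, and the corresponding value history should contain one or more bits from the middle of the virtual function table address.
b. For a switch-case statement, the low-order bits of the normalized case variable value are identified as the correlation information that forms the corresponding value history.
A switch-case statement is a control flow structure that dynamically selects a branch path according to the value of the case variable; it is widely used in modern high-level languages such as C, C++, C# and Java. Usually, when the number of branch paths exceeds a certain threshold the compiler implements the switch-case statement with an indirect jump instruction, and otherwise with an if-else structure. With an indirect jump, the case variable is first normalized into a contiguous enumeration starting from 0; the normalized case variable value is then used as an index to obtain the target address, and an indirect jump instruction jumps to the corresponding branch path. The normalized case variable value correlates strongly with the indirect jump instruction, and the corresponding value history pattern should contain its low-order bits.
c. For a function pointer call, one or more bits of the function pointer value starting at the first non-alignment bit are identified as the correlation information that forms the corresponding value history.
A function pointer call jumps to the target address given by the content of the function pointer. The function pointer value therefore correlates strongly with the indirect jump instruction, and the corresponding value history pattern should contain one or more bits of the function pointer value starting at the first non-alignment bit.
303: Guide instructions are explicitly inserted on the control flow paths of the source program to identify the correlation information;
The compiler analyzes the control flow of the source program and explicitly inserts guide instructions on its paths to identify the correlation information that forms the value history of the corresponding indirect jump instruction.
The compiler inserts a guide instruction on every control path and can track multiple control flow paths, so one indirect jump instruction may correspond to several guide instructions.
304: The guide instructions are scheduled according to the inter-procedural data dependences of the program, so as to increase the distance between each guide instruction and its indirect jump instruction.
Instructions in an executable program have data dependences among them: the value in one register may depend on the value of another register, on the result of an operation over several registers, or on the value at a labelled address. The compiler schedules guide instructions mainly on the basis of these data dependences. Without affecting program correctness, it reorders each guide instruction together with its predecessor and successor instructions so as to increase the distance between the guide instruction and the indirect jump instruction, so that the processor receives the correlated data value through the guide instruction in time to predict the target address of the indirect jump instruction.
The scheduling algorithm is based on the traditional list scheduling algorithm (see Section 10.3.2 of Compilers: Principles, Techniques, & Tools, Second Edition) and comprises the following steps:
• Build a data dependence graph from the data dependences among the instructions of the program;
• Mark in the data dependence graph each guide instruction and all other instructions on which it depends;
• Raise the scheduling priority of the marked instructions to the highest level, so that they are scheduled as early as possible.
By changing the scheduling priority in this way, a guide instruction is scheduled as early as possible, leaving as many instructions as possible between it and the indirect jump instruction and thereby increasing the distance between the two.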
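The marking and priority steps above can be sketched on a small data dependence graph. This is not a full list scheduler: it only marks each guide instruction and everything it transitively depends on, then lifts those nodes above every other priority. The node layout and the names `insn_node`, `mark_with_deps` and `boost_guide_priorities` are assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_DEPS 4

/* One node of a data dependence graph (layout is illustrative). */
struct insn_node {
    int    is_guide;        /* 1 if this node is a guide instruction */
    int    priority;        /* larger values are scheduled earlier */
    int    marked;          /* set for guide instructions and their inputs */
    size_t ndeps;           /* number of valid entries in deps[] */
    size_t deps[MAX_DEPS];  /* indices of instructions this one depends on */
};

/* Mark node i and, recursively, every instruction it depends on. */
void mark_with_deps(struct insn_node *g, size_t i)
{
    if (g[i].marked)
        return;
    g[i].marked = 1;
    for (size_t k = 0; k < g[i].ndeps; k++)
        mark_with_deps(g, g[i].deps[k]);
}

/* Give all marked instructions a priority above every unmarked one.
 * Returns the number of marked instructions. */
size_t boost_guide_priorities(struct insn_node *g, size_t n)
{
    int max_prio = 0;
    for (size_t i = 0; i < n; i++) {
        g[i].marked = 0;
        if (g[i].priority > max_prio)
            max_prio = g[i].priority;
    }
    for (size_t i = 0; i < n; i++)
        if (g[i].is_guide)
            mark_with_deps(g, i);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (g[i].marked) {
            g[i].priority = max_prio + 1;
            count++;
        }
    }
    return count;
}
```

A list scheduler that picks the ready instruction with the highest priority would then emit the guide instruction and its inputs before unrelated work, which is what widens the gap to the indirect jump.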
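In step 50 of the method embodiment of FIG. 1, the processor dynamically collects correlation information according to the guide instructions, forms a value history pattern, and predicts indirect jump instructions from that pattern. The detailed flow is shown in FIG. 3 and comprises the following steps:
510: In the issue stage of a guide instruction, the correlated data value is collected dynamically from the register file according to the guide instruction and shifted according to the category of the indirect jump instruction to obtain the corresponding correlation information;
520: The obtained correlation information is combined with the previous value history to update the current value history and form the value history pattern; the PC value of the hard-to-predict jump instruction to which the guide instruction corresponds is written into the filter table;
530: In the fetch stage of an indirect jump instruction, the target address stored in the target address buffer is predicted from the PC value of the hard-to-predict instruction and the value history pattern formed.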
For example, the following fragment is taken from the 458.sjeng program of the SPEC CPU 2006 suite:
    void std_eval(void)
    {
        for (j = 1, a = 1; a <= piece_count; j++) {
            i = pieces[j];
            switch (board[i]) {
            case (wpawn):   . . .
            case (wknight): . . .
            . . .
            }
        }
    }
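The switch-case statement here contains an indirect jump instruction whose target address changes repeatedly (wpawn, wknight); a conventional indirect jump predictor has difficulty predicting it.
While the processor runs the executable built from 458.sjeng, at time t2 the "i" values (that is, the normalized case variable) seen at the two preceding times t0 and t1 are 1 and 1; the low-order bits of each are taken as correlation information and combined to form the value history 1,1. At time t3 the "i" values seen at the two preceding times t1 and t2 are 1 and 2; the low-order bits of each are taken as correlation information and combined to form the value history 1,2; and so on, until at time t7 the "i" values seen at the two preceding times t5 and t6 are 1 and 2, and the low-order bits of each are combined to form the value history 1,2.
Table 1 lists the value histories the processor forms at times t0 to t7 from the low-order correlation information of the "i" values at the two preceding times.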
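Table 1
Figure imgf000013_0001
The value history patterns, that is, all distinct value histories, can be obtained from the value histories in Table 1. Table 2 lists the four value history patterns obtained and the target address predicted from each, where T1 denotes the predicted target address wpawn and T2 denotes the predicted target address wknight.
Table 2
Figure imgf000013_0002
For each value history in Table 1, Table 3 lists the target address predicted from the value history pattern alongside the target address actually taken during execution. The comparison shows that using the value history pattern of the present invention as effective correlation information predicts indirect jump target addresses very accurately.
Table 3
Figure imgf000014_0001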
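To explain the steps of the processor's indirect jump prediction method of FIG. 3 more clearly, FIG. 5 gives a more detailed flow. It consists of two processes, value history update and indirect jump prediction, shown respectively above and below the dashed line in FIG. 4.
The value history update comprises the following steps:
511: In the issue stage of a guide instruction, the value of the register whose number is indicated by the guide instruction is read from the register file and collected as the correlated data value;
Referring to FIG. 4, when a guide instruction enters the issue stage, the processor reads from the register file 1 the value of the register whose number RA the guide instruction identifies as holding the correlated data value, and collects it as the correlated data value.
512: The collected correlated data value is shifted according to the category of the indirect jump instruction indicated by the guide instruction, moving its correlation information to the first combination position of the value history;
Referring to FIG. 6, if the processor determines from the category of the indirect jump instruction that the corresponding subroutine structure is a virtual function call, the classification shifter 2 shifts the correlation bits in the middle of the virtual function table address to the first combination position of the value history (for example, right into the lowest few bits). The shift amount depends on the number of correlation bits in the middle of the virtual function table address.
If the processor determines from the category of the indirect jump instruction that the corresponding subroutine structure is a switch-case statement or a function pointer call, the correlation bits are shifted to the first combination position of the value history in a similar way, with the shift amount depending on the number of correlation bits.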
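A minimal sketch of the classification shift follows. It assumes a 6-bit first combination position and a fixed right-shift amount per category: 6 for virtual function table addresses, 2 for function pointers (assuming 4-byte alignment) and 0 for normalized case values. Only the 6-bit right shift into <5:0> appears in the source; the other amounts and all names are hypothetical.

```c
#include <stdint.h>

enum jump_category { CAT_VIRTUAL_CALL, CAT_SWITCH_CASE, CAT_FUNC_POINTER };

#define INFO_BITS 6u  /* width of the first combination position, <5:0> */

/* Move the correlation bits of `value` into the low INFO_BITS bits.
 * The per-category shift amounts are illustrative assumptions. */
uint32_t classify_shift(enum jump_category cat, uint32_t value)
{
    unsigned shift;
    switch (cat) {
    case CAT_VIRTUAL_CALL:  /* bits from the middle of the vtable address */
        shift = 6;
        break;
    case CAT_FUNC_POINTER:  /* skip the low bits fixed by alignment */
        shift = 2;
        break;
    case CAT_SWITCH_CASE:   /* low-order bits of the normalized case value */
    default:
        shift = 0;
        break;
    }
    return (value >> shift) & ((1u << INFO_BITS) - 1u);
}
```

Using one fixed amount per category keeps the shifter simple, which matches the embodiment's stated reason for avoiding value-dependent shift amounts.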
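521: The previous value history in the value history pattern register is shifted to the second combination position of the value history and combined with the correlation information at the first combination position to form the value history pattern;
FIG. 7 shows how the correlation information shifted to the first combination position is joined with the previous value history (initially, the initial value history) to form the value history pattern. Referring to FIG. 4:
First, in the value history pattern register 3, the previous value history is shifted to the second combination position of the value history (for example, into the highest few bits). The shift amount depends on the number of bits of correlation information at the first combination position;
The previous value history at the second combination position and the correlation information at the first combination position are then joined by an OR operation to form the value history pattern.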
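The shift-and-OR update can be sketched in a few lines. The sketch assumes a 20-bit pattern register and 6 bits of new correlation information per update; the register width and the name `vhr_update` are assumptions, and the embodiment's exact bit range for the second combination position is not modelled here.

```c
#include <stdint.h>

#define INFO_BITS 6u   /* bits of new correlation information per update */
#define HIST_BITS 20u  /* assumed width of the value history pattern register */

/* Shift the previous history left to make room, drop bits that fall off the
 * top, and OR the new correlation information into the low bits. */
uint32_t vhr_update(uint32_t prev, uint32_t info)
{
    uint32_t hist_mask = (1u << HIST_BITS) - 1u;
    uint32_t info_mask = (1u << INFO_BITS) - 1u;
    return ((prev << INFO_BITS) & hist_mask) | (info & info_mask);
}
```

For instance, starting from an empty history, recording the values 1 and then 2 gives 0x42, that is, 1 in bits <11:6> and 2 in bits <5:0>.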
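522: The sum of the distance value indicated by the guide instruction and the PC value of that instruction is used as a tag to query the filter table; if no entry matches, the tag is written, as the PC value of a hard-to-predict instruction, into a newly allocated entry of the filter table;
Referring to FIG. 4, in the issue stage of a guide instruction the processor uses the sum of the distance value Offset indicated by the guide instruction and the PC value of that instruction as a tag to query the filter table 4. If no entry matches, a new entry is allocated in the filter table 4 and the tag is written into it as the PC value of a hard-to-predict instruction; otherwise nothing needs to be done.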
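The filter table behaviour in this step can be sketched as a small tag array. The capacity of 16 entries, the round-robin allocation and the function names are assumptions; the source only specifies the tag (guide instruction PC plus distance) and the allocate-on-miss rule.

```c
#include <stddef.h>
#include <stdint.h>

#define FILTER_ENTRIES 16u  /* assumed capacity */

struct filter_table {
    uint64_t tag[FILTER_ENTRIES];    /* PCs of hard-to-predict jumps */
    uint8_t  valid[FILTER_ENTRIES];
    size_t   next;                   /* round-robin allocation (assumed) */
};

/* Return 1 if `pc` is recorded as a hard-to-predict indirect jump. */
int filter_lookup(const struct filter_table *t, uint64_t pc)
{
    for (size_t i = 0; i < FILTER_ENTRIES; i++)
        if (t->valid[i] && t->tag[i] == pc)
            return 1;
    return 0;
}

/* Issue-stage action for a guide instruction: the tag is the guide
 * instruction's PC plus its signed distance. Returns 1 if a new entry
 * was allocated, 0 if the tag was already present. */
int filter_on_guide_issue(struct filter_table *t, uint64_t guide_pc,
                          int64_t distance)
{
    uint64_t tag = guide_pc + (uint64_t)distance;
    if (filter_lookup(t, tag))
        return 0;
    t->tag[t->next] = tag;
    t->valid[t->next] = 1;
    t->next = (t->next + 1) % FILTER_ENTRIES;
    return 1;
}
```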
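531: In the fetch stage, the PC value of the indirect jump instruction is used as a tag to query the filter table; if an entry matches the tag, the current jump instruction is marked as a hard-to-predict instruction, otherwise it is marked as an ordinary jump instruction;
Referring to FIG. 4, in the fetch stage the processor uses the PC value of the indirect jump instruction as a tag. If the filter table 4 contains an entry matching the tag, the current jump instruction is treated as a hard-to-predict instruction; otherwise it is treated as an ordinary jump instruction.
532, 533: For a hard-to-predict instruction, the value history pattern read from the value history pattern register is XORed with the PC value of the instruction, and the XOR result is used as the index to read the target address mapped in the BTB; for an ordinary jump instruction, the BTB is accessed with the PC value of the instruction to obtain the target address. The flow then ends.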
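The two lookup paths in steps 532 and 533 differ only in how the index is formed, which the following sketch makes explicit. It models the target buffer as a direct-mapped array of 4096 entries for brevity (the evaluated baseline is 4-way set associative), and all names are hypothetical.

```c
#include <stdint.h>

#define BTB_ENTRIES 4096u  /* 4K entries; direct-mapped here for brevity */

struct btb {
    uint64_t target[BTB_ENTRIES];
};

/* Hard-to-predict jumps are indexed by PC XOR value history pattern;
 * ordinary jumps are indexed by PC alone. */
uint32_t btb_index(uint64_t pc, uint32_t history, int hard)
{
    uint64_t key = hard ? (pc ^ (uint64_t)history) : pc;
    return (uint32_t)(key % BTB_ENTRIES);
}

/* Fetch-stage lookup: returns the stored target (0 if never written). */
uint64_t btb_predict(const struct btb *b, uint64_t pc,
                     uint32_t history, int hard)
{
    return b->target[btb_index(pc, history, hard)];
}

/* Record the resolved target once the jump has executed. */
void btb_update(struct btb *b, uint64_t pc, uint32_t history,
                int hard, uint64_t target)
{
    b->target[btb_index(pc, history, hard)] = target;
}
```

Because the history is folded into the index, one indirect jump can occupy several entries, one per value history pattern, which is what lets a single jump with many targets be predicted correctly.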
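Corresponding to the above method embodiment, the present invention also provides an apparatus embodiment with which the processor predicts indirect jump instructions at program run time. Its structural block diagram is shown in FIG. 4. It comprises a register file 1, a classification shifter 2, a value history pattern register 3, a target address buffer 5, an instruction fetch module 6 and a guide instruction issue module 7, wherein:
the instruction fetch module 6 is configured to: output a received guide instruction to the guide instruction issue module 7 at instruction fetch time; and predict the corresponding target address stored in the target address buffer 5 from the PC value of the indirect jump instruction and the value history pattern updated in the value history pattern register 3;
the guide instruction issue module 7 is configured to: in the issue stage of a guide instruction, read and collect the correlated data value of the indirect jump instruction from the register file 1 according to the guide instruction, and output to the classification shifter 2 a classification shift command carrying that correlated data value;
the register file 1 is configured to: hold, in multiple registers, the correlated data values corresponding to indirect jump instructions;
the classification shifter 2 is configured to: shift the correlation information within the correlated data value to the first combination position of the value history according to the classification shift command, and output the shifted correlation information to the value history pattern register 3;
the value history pattern register 3 is configured to: shift the previous value history pattern to the second combination position of the value history and then combine it with the shifted correlation information into an updated value history pattern;
the target address buffer 5 is configured to: store the target address corresponding to the indirect jump instruction, indexed by the PC value of the indirect jump instruction together with the correlated data value.
In the above apparatus embodiment, the guide instruction carries three kinds of information: the first indicates the distance between this guide instruction and the corresponding indirect jump instruction; the second identifies the number of the register that holds the correlation information of the indirect jump instruction; the third identifies the category of the indirect jump instruction to which the guide instruction corresponds, that is, the category of the subroutine structure to which that indirect jump instruction belongs, wherein:
the guide instruction issue module 7 reads from the register file 1 the value of the register whose number the guide instruction identifies, and places it, as the correlated data value, together with the category of the indirect jump instruction indicated by the guide instruction, in the classification shift command output to the classification shifter 2;
the classification shifter 2 shifts the correlation information within the correlated data value carried by the command to the first combination position of the value history, according to the category of the indirect jump instruction carried in the classification shift command, and outputs the shifted correlation information for the value history pattern register 3 to update the value history pattern.
The above apparatus embodiment further includes a filter table 4, wherein:
in the issue stage of a guide instruction, the guide instruction issue module 7 also uses the sum of the distance value indicated by the guide instruction and the PC value of that instruction as a tag to query the filter table 4; if no entry matches, the tag is written, as the PC value of a hard-to-predict instruction, into a newly allocated entry of the filter table 4;
in the fetch stage, the instruction fetch module 6 uses the PC value of the indirect jump instruction as a tag to query the filter table 4; if an entry matches the tag, the current jump instruction is marked as a hard-to-predict instruction. For that instruction, the value history pattern read from the value history pattern register 3 is XORed with the PC value of the instruction, and the XOR result is used as the index to read the target address stored in the target address buffer 5.
In the apparatus embodiment of FIG. 4, the classification shifter 2 shifts the correlation information within the correlated data value right by 6 bits into <5:0>, the first combination position of the value history; the value history pattern register 3 shifts the previous value history pattern left by 6 bits into <19:10>, the second combination position of the value history.
In principle, both the number of bits by which the classification shifter 2 shifts the correlated data value right and the number of bits by which the value history pattern register 3 shifts the previous value history pattern left depend on the number of significant correlation bits in the correlated data value of the indirect jump instruction. Implementing that, however, would make the processor structure very complex. For this reason the embodiment uses a fixed shift amount for each of the three categories of subroutine structure and determines the best shift amounts by experiment.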
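A prediction evaluation experiment was carried out for the above method and apparatus embodiments. The results show that the present invention effectively improves prediction accuracy and thereby processor performance. The experimental environment is based on the SimpleScalar simulator and typical programs from the SPEC benchmark suites. The baseline processor implements indirect jump prediction with a 4K-entry, 4-way set-associative BTB; its basic parameters are shown in Table 4.
Table 4. Baseline processor configuration parameters
Figure imgf000017_0001
L1 ICache: 16KB, 4-way set associative, 64 bytes per line, 1-cycle access latency
L1 DCache: 16KB, 4-way set associative, 64 bytes per line, 1-cycle access latency
L2 Cache: 512KB, 8-way set associative, 64 bytes per line, 10-cycle access latency
Main memory: 150-cycle average access latency
The benchmark set includes 5 typical SPEC CPU2000 programs (perlbmk, gap, gcc00, crafty, eon on the abscissa of FIG. 8), 3 typical SPEC CPU2006 programs (perlbench, gcc06, sjeng on the abscissa of FIG. 8) and 2 typical C++ programs (richards, ixx on the abscissa of FIG. 8).
From SPEC CPU2000 and SPEC CPU2006, only the programs whose performance loss due to indirect jump misprediction exceeds 5% were selected. The other two typical C++ programs are Richards and ixx: Richards is a simulated operating system kernel task scheduler, and ixx is a translator that converts IDL source programs into C++ programs. They reflect the indirect jump behavior of object-oriented programs and are widely used in indirect jump evaluation. For each benchmark program, the SimPoint tool was used to select a representative segment of 100M instructions to run. The compilation environment is based on the open-source compiler GCC-4.1, to which compilation passes such as subroutine structure analysis and correlated data value identification and marking were added. Profiling was done with a conventional BTB predictor, that is, indirect jump instructions whose execution count exceeds a count threshold and whose misprediction rate exceeds a misprediction-rate threshold were selected as "hard-to-predict instructions".
FIG. 8 shows the performance of the indirect jump prediction technique of the embodiment. ORIG denotes the baseline processor. The VBBI (Value Based BTB Indexing) predictor is the most recent and best-performing indirect jump prediction technique, proposed in 2010 by Farooq et al. in the paper "Value Based BTB Indexing for Indirect Jump Prediction". VHC (Value History Classification) denotes the indirect jump prediction technique proposed by the present invention. As FIG. 8 shows, relative to the baseline processor the proposed technique improves performance by 19% on average; relative to the VBBI predictor, the technique of the embodiment improves performance by 4.3% on average.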
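A person of ordinary skill in the art will understand that all or some of the steps of the above method can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments can also be implemented with one or more integrated circuits; accordingly, each module or unit of the above embodiments can be implemented in hardware or as a software functional module. The present invention is not limited to any particular combination of hardware and software.
The above are merely preferred embodiments of the present invention and do not limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Industrial Applicability
Experiments applying the above embodiments to a modern superscalar processor show that they effectively improve the overall performance of the processor system.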

Claims

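Claims
1. A method for implementing value-correlated indirect jump prediction, involving a compiler and a processor, the method comprising:
the compiler, based on profiling information obtained while the processor executes an executable program, identifying the subroutine structure corresponding to an indirect jump instruction in the source program and the correlation information within its correlated data values, inserting into the source program a guide instruction that identifies the correlation information, and regenerating the executable program;
the processor, while executing the executable program regenerated by the compiler, dynamically collecting correlation information according to the guide instruction and generating a value history pattern.
2. The method according to claim 1, wherein the profiling information obtained by the compiler includes one or more of the execution count of an indirect jump instruction, its number of dynamic jump targets and its number of target address mispredictions; and the step of identifying, from the profiling information, the subroutine structure corresponding to an indirect jump instruction in the source program and its correlated data values comprises:
selecting as hard-to-predict instructions the indirect jump instructions whose execution count exceeds a count threshold and/or whose misprediction rate exceeds a misprediction-rate threshold;
identifying the subroutine structure corresponding to the hard-to-predict instruction, being one or more of a virtual function call, a switch-case statement and a function pointer call, wherein:
for the virtual function call, one or more bits from the middle of the virtual function table address are identified as the correlation information;
for the switch-case statement, the low-order bits of the normalized case variable value are identified as the correlation information;
for the function pointer call, one or more bits of the function pointer value starting at the first non-alignment bit are identified as the correlation information.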
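3. The method according to claim 2, wherein the step in which the compiler inserts into the source program a guide instruction that identifies the correlation information and regenerates the executable program comprises:
the compiler analyzing the inter-procedural control flow of the source program and explicitly inserting the guide instruction on a path of the control flow, the guide instruction carrying: a distance value indicating the distance between the guide instruction and the hard-to-predict instruction; the number of the register that holds the correlated data value of the hard-to-predict instruction; and the category of the hard-to-predict instruction, which denotes the category of the corresponding subroutine structure.
4. The method according to claim 3, further comprising:
the compiler scheduling the guide instruction according to the inter-procedural data dependences of the source program, so as to increase the distance between the guide instruction and the corresponding hard-to-predict instruction.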
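5. The method according to claim 3, further involving a register file, a value history pattern register and a target address buffer, wherein the step in which the processor, in the issue stage of the guide instruction while executing the executable program regenerated by the compiler, dynamically collects correlation information according to the guide instruction and generates a value history pattern comprises:
reading, from the register file, the value of the register whose number is indicated by the guide instruction and collecting it as the correlated data value;
shifting the correlation information within the collected correlated data value to the first combination position of the value history, according to the category of the hard-to-predict instruction indicated by the guide instruction;
shifting the previous value history in the value history pattern register to the second combination position of the value history and combining it with the correlation information at the first combination position to form the value history pattern.
6. The method according to claim 5, wherein,
for each category of the hard-to-predict instruction, both the shift of the correlation information within the collected correlated data value to the first combination position of the value history and the shift of the previous value history in the value history pattern register to the second combination position use a fixed shift amount, and the best fixed shift amount is determined by experiment.
7. The method according to claim 5 or 6, further comprising:
in the fetch stage of an indirect jump instruction, predicting the target address of the hard-to-predict instruction from its program counter (PC) value and the generated value history pattern.
8. The method according to claim 7, further involving a filter table, the method further comprising, in the issue stage of the guide instruction:
using the sum of the distance value indicated by the guide instruction and the PC value of the guide instruction as a tag to query the filter table and, if no entry matches, writing the tag, as the PC value of a hard-to-predict instruction, into a newly allocated entry of the filter table.
9. The method according to claim 8, wherein the step of predicting the target address of the hard-to-predict instruction from its PC value and the generated value history pattern comprises:
using the PC value of the indirect jump instruction as a tag to query the filter table and, if an entry matches the tag, marking the current indirect jump instruction as a hard-to-predict instruction;
for the hard-to-predict instruction, XORing the value history pattern read from the value history pattern register with the PC value of the instruction, using the result of the XOR operation as the index to read the target address stored in the target address buffer, and proceeding with instruction fetch and execution in the next cycle.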
PCT/CN2011/080247 2011-04-28 2011-09-27 A method for implementing value-correlated indirect jump prediction WO2012145992A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN 201110108052 CN102163143B (zh) 2011-04-28 2011-04-28 A method for implementing value-correlated indirect jump prediction
CN201110108052.7 2011-04-28

Publications (1)

Publication Number Publication Date
WO2012145992A1 true WO2012145992A1 (zh) 2012-11-01

Family

ID=44464386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/080247 WO2012145992A1 (zh) 2011-04-28 2011-09-27 A method for implementing value-correlated indirect jump prediction

Country Status (2)

Country Link
CN (1) CN102163143B (zh)
WO (1) WO2012145992A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808251A (zh) * 2016-03-03 2016-07-27 武汉斗鱼网络科技有限公司 一种基于虚函数表劫持绕过安全检测的方法与系统
US9880842B2 (en) 2013-03-15 2018-01-30 Intel Corporation Using control flow data structures to direct and track instruction execution

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163143B (zh) * 2011-04-28 2013-05-01 北京北大众志微系统科技有限责任公司 一种实现值关联间接跳转预测的方法
CN102156636B (zh) * 2011-04-28 2013-05-01 北京北大众志微系统科技有限责任公司 一种实现值关联间接跳转预测的装置
CN102799523B (zh) * 2012-07-03 2015-06-17 华为技术有限公司 动态探测程序执行路径的方法、装置和计算机系统
CN103679041B (zh) * 2012-09-06 2016-11-23 中天安泰(北京)信息技术有限公司 数据安全读取方法及装置
CN103679040B (zh) * 2012-09-06 2016-09-14 中天安泰(北京)信息技术有限公司 数据安全读取方法及装置
EP2959378A1 (en) * 2013-02-22 2015-12-30 Marvell World Trade Ltd. Patching boot code of read-only memory
CN104731718A (zh) * 2013-12-24 2015-06-24 上海芯豪微电子有限公司 一种缓存系统和方法
CN109522050B (zh) * 2018-09-10 2020-11-17 上海交通大学 基于处理器控制流记录特性的内存数据实时记录方法和系统
CN111176729A (zh) * 2018-11-13 2020-05-19 深圳市中兴微电子技术有限公司 一种信息处理方法、装置及计算机可读存储介质
CN112445522A (zh) * 2019-09-02 2021-03-05 中科寒武纪科技股份有限公司 指令跳转方法、相关设备及计算机可读介质
CN117389629B (zh) * 2023-11-02 2024-06-04 北京市合芯数字科技有限公司 分支预测方法、装置、电子设备及介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1520547A (zh) * 2001-06-29 2004-08-11 皇家菲利浦电子有限公司 预测间接分支目标地址的方法、装置和编译器
US7493600B2 (en) * 2004-08-23 2009-02-17 Faraday Technology Corp. Method for verifying branch prediction mechanism and accessible recording medium for storing program thereof
CN101763291A (zh) * 2009-12-30 2010-06-30 中国人民解放军国防科学技术大学 一种程序控制流错误检测方法
CN102156636A (zh) * 2011-04-28 2011-08-17 北京北大众志微系统科技有限责任公司 一种实现值关联间接跳转预测的装置
CN102156634A (zh) * 2011-04-20 2011-08-17 北京北大众志微系统科技有限责任公司 一种实现值关联间接跳转预测的方法
CN102156635A (zh) * 2011-04-21 2011-08-17 北京北大众志微系统科技有限责任公司 实现值关联间接跳转预测的装置
CN102163143A (zh) * 2011-04-28 2011-08-24 北京北大众志微系统科技有限责任公司 一种实现值关联间接跳转预测的方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9880842B2 (en) 2013-03-15 2018-01-30 Intel Corporation Using control flow data structures to direct and track instruction execution
CN105808251A (zh) * 2016-03-03 2016-07-27 武汉斗鱼网络科技有限公司 一种基于虚函数表劫持绕过安全检测的方法与系统
CN105808251B (zh) * 2016-03-03 2021-02-02 武汉斗鱼网络科技有限公司 一种基于虚函数表劫持绕过安全检测的方法与系统

Also Published As

Publication number Publication date
CN102163143A (zh) 2011-08-24
CN102163143B (zh) 2013-05-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11864490

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11864490

Country of ref document: EP

Kind code of ref document: A1