WO2019183877A1 - Branch prediction method and device - Google Patents

Branch prediction method and device

Info

Publication number
WO2019183877A1
Authority
WO
WIPO (PCT)
Prior art keywords
branch instruction
branch
value
prediction
prediction information
Prior art date
Application number
PCT/CN2018/081057
Other languages
French (fr)
Chinese (zh)
Inventor
麻军平
韩彬
吴迪
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN201880011003.1A (CN110462587A)
Priority to PCT/CN2018/081057 (WO2019183877A1)
Publication of WO2019183877A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead

Definitions

  • The present invention relates to the field of processors and, more particularly, to a method and apparatus for branch prediction.
  • Processors play an important role in the modern electronics industry. Processor design is also a high-end technology in the microelectronics industry.
  • Processor designs generally adopt a multi-stage pipeline structure to increase the operating frequency of the processor, thereby improving its performance.
  • High-performance processors often have pipelines more than ten stages deep, and some processors have more than twenty pipeline stages.
  • Branch instructions are instructions that change the flow of a program and are very common in programs. If the branch condition of a branch instruction is true, execution jumps to the branch target instead of continuing with the next sequential instruction.
  • The pipelined design increases the operating frequency of the processor, but it also causes the pipeline to be flushed when a branch instruction jumps. After the pipeline is flushed, instructions are fetched again from the target address, which amounts to re-initializing the pipeline and leaves it idle for a long time. Since branch instructions are very common in programs, the pipeline flushes they cause severely affect processor performance. Branch prediction techniques have been proposed in response to this problem and play an important role in improving processor performance.
  • The basic principle of branch prediction is that, when an instruction is read, a judgment is made about the branch instruction: whether it jumps and what the jump target is. The next instruction fetch is then determined according to the judgment result.
  • Branch prediction judges the execution status of the current branch instruction (i.e., whether it jumps) based on the historical execution of branch instructions in the program.
  • The Gshare predictor is a widely used branch prediction technology. Gshare prediction achieves high accuracy, usually above 90%, but it is difficult to improve further. In fact, increasing the accuracy of branch prediction by even one percentage point greatly improves the performance of the processor.
  • The present invention provides a method and apparatus for branch prediction that can further improve the accuracy of branch prediction relative to the prior art.
  • In a first aspect, a method for branch prediction is provided, comprising: acquiring prediction information of a branch instruction, the prediction information including at least a value of a program counter (PC) of the branch instruction and a value of a branch history register (BHR) of the branch instruction; obtaining a hash value of the prediction information by a hash function; and retrieving a pattern history table (PHT) according to the hash value to obtain a prediction result of the branch instruction.
  • In the prior art, the lower bits of the value of the PC of the branch instruction (i.e., the address of the branch instruction) are used to retrieve the PHT.
  • If different branch instructions have equal lower PC bits, their entries in the PHT may conflict.
  • In the present invention, the hash value of the prediction information of the branch instruction, obtained by the HASH function, is used to retrieve the PHT. Because the HASH function can compress a message of any length into a short message of fixed length, there is no need to worry that a long prediction-information bit length will make the PHT too large.
  • The complete value of the PC of the branch instruction can therefore be included in the prediction information. Using the complete PC value avoids, to some extent, different branch instructions corresponding to the same PHT entry, which reduces PHT entry conflicts and thereby improves the accuracy of branch prediction.
  • The present invention uses the hash value of the prediction information of the branch instruction, obtained by the HASH function, to retrieve the PHT, so there is no need to worry that a long prediction-information bit length will make the PHT too large. Compared with existing branch prediction technology, richer prediction information can therefore be provided, further improving the accuracy of branch prediction.
  • An apparatus for branch prediction is provided, comprising the following units:
  • an obtaining unit configured to acquire prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction;
  • a calculating unit configured to obtain a hash value of the prediction information by using a hash function; and
  • a retrieving unit configured to retrieve a pattern history table according to the hash value to obtain a prediction result of the branch instruction.
  • A branch predictor is provided, comprising a memory and a processor. The memory stores instructions, the processor executes the instructions stored in the memory, and execution of those instructions causes the processor to perform the method provided by the first aspect.
  • A computer storage medium is provided, having stored thereon a computer program that, when executed by a computer, causes the computer to perform the method provided by the first aspect.
  • A computer program product comprising instructions is provided; when executed by a computer, the instructions cause the computer to perform the method provided by the first aspect.
  • FIG. 1 is a schematic diagram of a multi-stage pipeline of a processor.
  • FIG. 2 is a schematic diagram of the principle of an existing branch prediction technique.
  • FIG. 3 is a schematic flowchart of a method for branch prediction according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the principle of branch prediction according to an embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of an apparatus for branch prediction according to an embodiment of the present invention.
  • FIG. 6 is a schematic block diagram of a branch predictor according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a common pipeline division.
  • The processor is divided into a six-stage pipeline consisting, in order, of: address calculation, instruction reading, instruction distribution, instruction decoding, instruction execution, and register file.
  • The instructions in the program memory enter the processor in a pipelined manner, as shown in FIG. 1.
  • A branch instruction is an instruction that changes the flow of a program. If the branch condition of the branch instruction is true, execution jumps to the branch target instead of continuing with the next sequential instruction.
  • Branch instructions include, but are not limited to, absolute jump instructions, conditional jump instructions, function call instructions, and function return instructions.
  • Branch prediction means discovering a branch instruction, and predicting whether it will jump, before the first stage of the pipeline is executed.
  • Program counter (PC)
  • The program counter is a register in the processor that stores the address (location) of the instruction currently being executed. In other words, the content of the program counter is the address of the instruction currently being executed. Each time an instruction is fetched, the address stored in the program counter is incremented by one.
  • The branch history register (BHR) is a multi-bit shift register. When a branch instruction jumps, the BHR is shifted by one bit and a "1" is shifted into its lowest bit; when a branch instruction does not jump, the BHR is shifted by one bit and a "0" is shifted into its lowest bit.
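  • The shift behaviour of the BHR can be pictured with a minimal Python sketch. The 10-bit width, the left-shift direction, and the name `update_bhr` are assumptions made for illustration; the text only states that the register is multi-bit and that the outcome enters at the lowest bit.

```python
BHR_BITS = 10  # assumed width; the text only says "multi-bit"


def update_bhr(bhr: int, taken: bool, bits: int = BHR_BITS) -> int:
    """Shift the latest branch outcome into the lowest bit, dropping the oldest bit."""
    mask = (1 << bits) - 1
    return ((bhr << 1) | int(taken)) & mask


bhr = 0
for outcome in (True, True, False):  # jump, jump, no jump
    bhr = update_bhr(bhr, outcome)
print(format(bhr, "010b"))  # 0000000110
```

  • Masking to the register width is what makes the oldest outcome fall off the top as new outcomes arrive.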
  • The pattern history table (PHT) records the history of whether branch instructions jumped.
  • The PHT is a table of 2-bit counters. Each entry of the PHT is 2 bits wide and indicates the historical execution of a branch instruction.
  • The number of entries in the PHT is determined by the number of bits of the PHT address. Assuming the address of the PHT is N bits (N is a positive integer), the PHT includes 2^N entries.
  • The number of entries in the PHT may also be referred to as the area of the PHT; the area of the PHT mentioned hereinafter refers to this number of entries.
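  • As a hedged illustration of the relation between address width and table size, the following Python sketch builds a PHT as a plain list. The helper name `make_pht` and the initial counter value "01" are assumptions; the text does not specify how entries are initialized.

```python
def make_pht(address_bits: int, initial: int = 0b01) -> list:
    """Build a table of 2**address_bits two-bit counters (initial value is assumed)."""
    if not 0 <= initial <= 0b11:
        raise ValueError("each PHT entry holds a 2-bit value")
    return [initial] * (1 << address_bits)


pht = make_pht(10)
print(len(pht))  # 1024
```

  • Each extra address bit doubles the number of entries, which is why the address width is the natural knob for trading PHT area against conflicts.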
  • Branch prediction technique refers to detecting branch instructions as instructions are read, judging whether each one jumps and what the jump target is, and then determining the next instruction fetch according to the judgment result.
  • Gshare (Global history with Index Sharing) is a branch prediction technology, also known as the Gshare predictor.
  • The Gshare predictor XORs the lower bits of the PC value of the branch instruction with the binary bits recorded in the BHR and uses the result to retrieve the PHT, in order to predict whether the current branch instruction jumps. Specifically, the XOR result is treated as an address, an entry of the PHT is located according to that address, and the jump result of the branch instruction is predicted according to the 2-bit value in the entry.
  • The basic principle of the Gshare predictor is shown in FIG. 2.
  • The PHT cannot be made very large because of its area; for example, the PHT may be a table with a 10-bit address.
  • In that case, only the lower 10 bits of the value of the PC of the branch instruction can be used to retrieve the PHT. That is, the lower 10 bits of the PC value are XORed with the binary bits recorded in the BHR, and the XOR result is used to look up the PHT.
  • This is why the Gshare predictor uses only the lower bits of the PC value of the branch instruction.
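  • The Gshare indexing described above can be sketched in a few lines of Python. The function name `gshare_index` and the two sample PC values are illustrative assumptions; the sketch only shows the XOR of the lower PC bits with the BHR and why two branches with equal lower PC bits land on the same entry.

```python
PHT_ADDRESS_BITS = 10  # matches the 10-bit example in the text


def gshare_index(pc: int, bhr: int, bits: int = PHT_ADDRESS_BITS) -> int:
    """XOR the lower `bits` bits of the PC with the BHR to form a PHT address."""
    mask = (1 << bits) - 1
    return (pc & mask) ^ (bhr & mask)


# Two hypothetical branches whose PCs differ only above bit 9.
a = gshare_index(0x0000_002C, 0b00_0000_0110)
b = gshare_index(0x0000_042C, 0b00_0000_0110)
print(hex(a), a == b)  # 0x2a True
```

  • Because the upper PC bits never reach the index, the two branches share one 2-bit counter, which is the conflict the hash-based scheme aims to reduce.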
  • The accuracy of Gshare branch prediction often exceeds 90%.
  • Gshare branch prediction technology is widely used in processor design.
  • However, the accuracy of the Gshare predictor is difficult to improve further.
  • When the pipeline is deep and branch instructions are numerous, increasing the branch prediction accuracy by even 1 percentage point greatly improves the performance of the processor.
  • Embodiments of the present invention provide a method and apparatus for branch prediction that can further improve the accuracy of branch prediction relative to existing branch prediction techniques.
  • FIG. 3 is a schematic flowchart of a method for branch prediction according to an embodiment of the present invention.
  • The method can be performed by a branch predictor.
  • The method includes the following steps: acquiring prediction information of a branch instruction (S310), the prediction information including at least the value of the PC and the value of the BHR of the branch instruction; obtaining a hash value of the prediction information by a HASH function; and retrieving the PHT according to the hash value to obtain a prediction result of the branch instruction (S330).
  • The value of the PC of the branch instruction refers to the address of the branch instruction.
  • The BHR is a multi-bit shift register. When a branch instruction jumps, the BHR is shifted by one bit and a "1" is shifted into its lowest bit; when a branch instruction does not jump, the BHR is shifted by one bit and a "0" is shifted into its lowest bit.
  • The value of the BHR of the branch instruction refers to the value recorded in the BHR at the moment the branch instruction is to be predicted.
  • The prediction information in this embodiment includes the value of the PC and the value of the BHR of the branch instruction directly, rather than a value obtained by XORing the two. For example, if the value of the PC of the branch instruction is "0x2c" and the value of the BHR of the branch instruction is "0b00_0000_0110", the prediction information of the branch instruction contains both values unchanged.
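  • One way to picture prediction information that keeps both values intact is plain bit concatenation. The Python sketch below is an assumption about layout, not the layout used by the invention: it assumes a 32-bit PC field followed by a 10-bit BHR field, and the helper name `pack_fields` is hypothetical.

```python
def pack_fields(fields):
    """Concatenate (value, width) pairs; the first field ends up in the highest bits."""
    packed, total = 0, 0
    for value, width in fields:
        packed = (packed << width) | (value & ((1 << width) - 1))
        total += width
    return packed, total


# Assumed layout: 32-bit PC, then 10-bit BHR.
info, width = pack_fields([(0x2C, 32), (0b00_0000_0110, 10)])
print(width, hex(info))  # 42 0xb006
```

  • Unlike an XOR of the two values, the concatenation is lossless: both the PC and the BHR can be read back from the packed word.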
  • The HASH function is also simply called a hash function.
  • The HASH function can convert a message of any length into a short message of fixed length.
  • The HASH function can be thought of as a mapping that compresses long messages into short messages.
  • The short message obtained by HASH function compression may be referred to as a message digest.
  • In this embodiment, the hash value of the prediction information obtained by the HASH function refers to the message digest obtained by applying the HASH function to the prediction information as its variable.
  • The length of the message digest of the HASH function in this embodiment, that is, the number of bits of the hash value of the prediction information of the branch instruction, is equal to the number of bits of the address of the PHT.
  • For example, if the address of the PHT is 10 bits, the message digest of the HASH function is also 10 bits in length.
  • In that case, the prediction information of the branch instruction acquired in S310 is used as the variable of the HASH function, and a hash operation yields a 10-bit message digest, that is, a hash value of the prediction information with a length of 10 bits.
  • The hash value is regarded as an address and used to look up the PHT; an entry in the PHT is located, and the jump result of the branch instruction is then predicted according to the 2-bit value stored in the entry.
  • Depending on the value in the located entry, the jump result of the branch instruction may be predicted to be a jump; for example, when the value in the entry located according to the hash value is "01", the jump result of the branch instruction can be predicted to be no jump.
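  • The lookup can be sketched as follows in Python. Two points are assumptions rather than statements of the text: the stand-in hash `xor_fold` (the invention does not fix a particular HASH function) and the convention that "10" and "11" predict a jump while "00" and "01" predict no jump, which is the usual reading of a 2-bit counter and agrees with the "01" example.

```python
PHT_ADDRESS_BITS = 10


def xor_fold(value: int, bits: int = PHT_ADDRESS_BITS) -> int:
    """Stand-in hash: XOR together successive `bits`-wide chunks of the input."""
    mask = (1 << bits) - 1
    digest = 0
    while value:
        digest ^= value & mask
        value >>= bits
    return digest


def predict(pht, prediction_info: int, hash_fn=xor_fold) -> bool:
    """Locate one PHT entry by hash value and read its 2-bit counter."""
    counter = pht[hash_fn(prediction_info)]
    return counter >= 0b10  # assumed: "10"/"11" mean jump, "00"/"01" mean no jump


pht = [0b01] * (1 << PHT_ADDRESS_BITS)
info = (0x2C << 10) | 0b00_0000_0110  # PC and BHR concatenated (assumed layout)
print(predict(pht, info))  # False
pht[xor_fold(info)] = 0b11
print(predict(pht, info))  # True
```

  • The digest width equals the PHT address width, so every hash value is a valid table index regardless of how long the prediction information is.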
  • The processor can process the branch instruction in accordance with the pipeline shown in FIG. 1.
  • The method further includes updating the PHT based on the actual jump result of the branch instruction.
  • After the branch instruction has been executed, its actual jump result is known. If the actual jump result of the branch instruction is consistent with the prediction result in S330, the PHT is not updated. Otherwise, the PHT is updated according to the actual jump result of the branch instruction; that is, the 2-bit value in the entry located according to the hash value is updated.
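  • A minimal Python sketch of this update rule follows. The text states only that the 2-bit value is updated on a misprediction; moving the counter by one step toward the actual outcome, with saturation at "00" and "11", is an assumption, as is the function name `update_pht`.

```python
def update_pht(pht, index: int, predicted: bool, actual: bool) -> None:
    """Update the located 2-bit entry only when the prediction was wrong."""
    if predicted == actual:
        return  # per the text, a correct prediction leaves the PHT unchanged
    if actual:
        pht[index] = min(pht[index] + 1, 0b11)  # assumed: step toward "jump"
    else:
        pht[index] = max(pht[index] - 1, 0b00)  # assumed: step toward "no jump"


pht = [0b01] * 1024
update_pht(pht, 42, predicted=False, actual=True)
print(format(pht[42], "02b"))  # 10
```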
  • In the prior art, the lower bits of the value of the PC of the branch instruction (i.e., the address of the branch instruction) are used to retrieve the PHT.
  • If different branch instructions have equal lower PC bits, their entries in the PHT may conflict.
  • In this embodiment, the hash value of the prediction information of the branch instruction, obtained by the HASH function, is used to retrieve the PHT. Because the HASH function can compress a message of any length into a short message of fixed length, there is no need to worry that a long prediction-information bit length will make the PHT too large.
  • The complete value of the PC of the branch instruction can therefore be included in the prediction information. Using the complete PC value avoids, to some extent, different branch instructions corresponding to the same PHT entry, which reduces PHT entry conflicts and can improve the accuracy of branch prediction.
  • Because the present invention retrieves the PHT using the hash value of the prediction information obtained by the HASH function, a long prediction-information bit length does not lead to an excessively large PHT. Relative to existing branch prediction technology, richer prediction information can therefore be provided, which can further improve the accuracy of branch prediction.
  • In addition, because the present invention retrieves the PHT using the hash value of the prediction information obtained by the HASH function, the size of the PHT can be designed flexibly by setting the length of the message digest of the HASH function.
  • The HASH function in the present invention may be designed using common logical operations such as "AND", "OR", and "XOR", and may also use table-lookup substitution, shifting, and the like.
  • The HASH function can be designed so that the message digests it outputs are as scattered as possible.
  • When the message digests of the HASH function are well scattered, the hash values of different prediction information are, to some extent, guaranteed to differ. This effectively avoids blank entries in the PHT, which improves the utilization of the PHT, and at the same time reduces conflicts between entries in the PHT.
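  • As a hedged example of a HASH function built only from shifts and XOR, the Python sketch below rotates a running digest before folding in each chunk of the input. The function `mix_hash` and its `rotate` parameter are hypothetical; the text leaves the concrete design open, and this is only one of many possible choices.

```python
def mix_hash(value: int, out_bits: int = 10, rotate: int = 3) -> int:
    """Fold `value` into `out_bits` bits using rotate-left and XOR."""
    mask = (1 << out_bits) - 1
    digest = 0
    while value:
        # Rotate the running digest so that successive chunks land on different bits.
        digest = ((digest << rotate) | (digest >> (out_bits - rotate))) & mask
        digest ^= value & mask
        value >>= out_bits
    return digest


# PCs that share their lower 10 bits still receive different digests.
print(mix_hash(0x02C), mix_hash(0x42C))  # 44 353
```

  • Changing `out_bits` changes the digest length and therefore the number of PHT entries addressed, which is how the digest length sets the PHT size.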
  • The prediction information of the branch instruction acquired in S310 includes the complete value of the PC of the branch instruction.
  • For example, the value of the PC of the branch instruction is 32 bits or 64 bits.
  • Because the prediction information used in this embodiment includes the complete value of the PC of the branch instruction, rather than only the lower bits of the address of the branch instruction as in the prior art, the accuracy of the prediction result of the branch instruction can be further improved.
  • The HASH function may be parameterized and configurable, so that when a given program is executed, the parameters of the HASH function are adjusted to make the uniformity of the message digest as good as possible.
  • The method further includes: selecting the hash function from a plurality of candidate hash functions, wherein the selected hash function is the candidate whose message digest has the best uniformity.
  • For example, multiple HASH functions can be designed, and when a given program is executed, the function with the best message digest uniformity is selected from among them.
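  • The selection among candidates can be sketched in Python as below. The uniformity measure (sum of squared bucket counts over a sample of prediction information) is an assumption, since the text does not define how uniformity is measured; the candidate functions `low_bits` and `xor_fold` and the sample set are likewise illustrative.

```python
from collections import Counter


def uniformity_score(hash_fn, samples) -> int:
    """Lower is better: the sum of squared bucket counts is smallest for an even spread."""
    counts = Counter(hash_fn(s) for s in samples)
    return sum(c * c for c in counts.values())


def select_hash(candidates, samples):
    """Pick the candidate whose digests are most evenly spread over the samples."""
    return min(candidates, key=lambda fn: uniformity_score(fn, samples))


def low_bits(value: int, bits: int = 10) -> int:
    return value & ((1 << bits) - 1)


def xor_fold(value: int, bits: int = 10) -> int:
    mask, digest = (1 << bits) - 1, 0
    while value:
        digest ^= value & mask
        value >>= bits
    return digest


# Hypothetical samples: many PCs concatenated with one fixed 10-bit BHR value.
samples = [(pc << 10) | 0b00_0000_0110 for pc in range(0, 4096, 4)]
print(select_hash([low_bits, xor_fold], samples).__name__)  # xor_fold
```

  • On these samples `low_bits` sends every input to the same entry, while `xor_fold` spreads them over all 1024 entries, so the latter is selected.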
  • The length of the message digest of the HASH function is determined according to at least one of the following: system requirements, processor size, and program size.
  • The length of the message digest is determined according to system requirements as follows: when the system can tolerate the decrease in prediction accuracy caused by a small PHT area, the message digest can be designed to be shorter, and the PHT becomes correspondingly smaller; conversely, when the system requires higher prediction accuracy, the message digest can be designed to be longer, and the PHT becomes correspondingly larger.
  • There are many types of processors, ranging from small microcontrollers to general embedded processors, as well as large processors for high-performance computing. The scale of these different types of processors varies widely.
  • For a small-scale processor, the message digest needs to be designed to be shorter.
  • A processor that pursues high performance is not sensitive to area and power consumption and can use a large PHT, so the message digest can be designed to be longer.
  • The length of the message digest of the HASH function can thus be set according to system requirements, processor size, or program size, so that the size of the PHT is designed accordingly. This embodiment can therefore provide appropriate branch prediction accuracy based on specific application needs.
  • The HASH function is a uniform HASH function.
  • When a HASH function maps a set of long messages to a set of short messages, if the message digests are evenly distributed, conflicts are rare; that is, different long messages mostly correspond to different message digests. Such a HASH function is called a uniform HASH function.
  • A uniform HASH function distributes the message digests evenly, which is equivalent to evenly distributing the hash values used to retrieve the PHT. A large number of idle entries in the PHT is thereby largely avoided, improving the utilization of the PHT, and conflicts between individual entries in the PHT are also effectively reduced.
  • The prediction information of the branch instruction may include other factors affecting the execution of the branch instruction in addition to the value of the PC of the branch instruction and the value of the BHR.
  • The prediction information of the branch instruction acquired in S310 includes type information of the branch instruction in addition to the value of the PC of the branch instruction and the value of the BHR.
  • The type information of the branch instruction refers to information indicating the jump type of the branch instruction.
  • For example, if the branch instruction is an absolute jump, the type of the branch instruction is defined as "0b000"; if the branch instruction is a function return, the type is defined as "0b010"; if the branch instruction is a conditional jump, the type is defined as "0b001".
  • In this case, the prediction information to be input into the HASH function consists of the value of the PC of the branch instruction, the value of the BHR, and the type information of the branch instruction.
  • Because the prediction information of this embodiment includes the type of the branch instruction, its content is richer than in the prior art, which further improves the accuracy of branch prediction.
  • The prediction information of the branch instruction acquired in S310 includes, in addition to the value of the PC of the branch instruction and the value of the BHR, the values of the PC of n branch instructions executed before the branch instruction, where n is a positive integer.
  • The prediction information may further include type information of the n branch instructions.
  • For example, n is equal to 3.
  • Because the prediction information of this embodiment includes the values of the PC of one or more branch instructions executed before the branch instruction, and may further include the type information of those branch instructions, the content used to predict the branch instruction is richer than in the prior art, so the accuracy of branch prediction can be further improved.
  • The prediction information of the branch instruction acquired in S310 includes, in addition to the value of the PC of the branch instruction and the value of the BHR, the stack pointer of the processor and/or the value of the condition register of the branch instruction, where the value of the condition register indicates the result of the branch instruction.
  • The value of the condition register refers to the value stored in the condition register.
  • For example, the condition register of the branch instruction stores "1" or "0".
  • The value of the condition register is used to indicate the actual jump result of the branch instruction.
  • The stack pointer is a ubiquitous concept in processors and refers to a memory address.
  • When a sub-function is called, some variables in the main program, such as registers, need to be written to the stack area to protect the register context inside the processor, and then the sub-function is executed.
  • Because the prediction information of this embodiment includes, in addition to the value of the PC of the branch instruction and the value of the BHR, the stack pointer of the processor and/or the value of the condition register of the branch instruction, the content used to predict the branch instruction is richer than in the prior art, so the accuracy of branch prediction can be further improved.
  • The foregoing embodiments concerning the content that the prediction information of the branch instruction may include can be combined in any manner, which is not limited by the present invention.
  • For example, the prediction information of the branch instruction includes the value of the PC of the branch instruction, the value of the BHR of the branch instruction, the type information of the branch instruction, and the values of the PC and the type information of the n branch instructions executed before the branch instruction.
  • FIG. 4 is a schematic diagram of the principle of performing branch prediction according to an embodiment of the present invention.
  • As shown in FIG. 4, the prediction information of the branch instruction includes a plurality of pieces of information: the value of the PC of the branch instruction, the value of the PC of the previous branch instruction, the value of the PC of the branch instruction before the previous one, the type information of the branch instruction, the type information of the previous branch instruction, the type information of the branch instruction before the previous one, ..., and the value of the BHR of the branch instruction, where "..." indicates other factors affecting the execution of the branch instruction.
  • The previous branch instruction refers to the last branch instruction executed before the branch instruction; the branch instruction before the previous one refers to the second-to-last branch instruction executed before the branch instruction.
  • The above information is used as the variable of the HASH function, and a hash operation is performed to obtain a hash value whose length is equal to the number of bits of the address of the PHT.
  • The PHT is retrieved using the hash value to obtain a prediction result of the branch instruction.
  • In this example, the first column is the value of the PC corresponding to the instruction, and the second column is the instruction.
  • call_ins2, br_ins4, br_ins11, and ret49 are four branch instructions, and the rest of the instructions are non-branch instructions.
  • call_ins2 is a function call instruction, which is an absolute jump instruction.
  • The instruction type of the absolute jump instruction is defined as "0b000".
  • br_ins4 and br_ins11 are conditional jump instructions.
  • The instruction type of the conditional jump instruction is defined as "0b001".
  • ret49 is a function return instruction.
  • The instruction type of the function return instruction is defined as "0b010".
  • Suppose call_ins2 calls the sub-function (i.e., sub_function), ret49 returns from the sub-function, and br_ins4 does not jump, so that the execution path is pc(08)->pc(c0)->pc(c4)->pc(0c)->pc(18)->pc(2c). Branch prediction is now required for the branch instruction br_ins11.
  • First, the prediction information of the branch instruction br_ins11 is obtained. It includes the value of the PC of br_ins11, the value of the PC of br_ins4 (the previous branch instruction of br_ins11), the value of the PC of ret49 (the branch instruction before br_ins4), the type information of these three branch instructions, and the value of the BHR.
  • The value of the PC of br_ins11 is "0x2c", the value of the PC of the previous branch instruction br_ins4 is "0x18", and the value of the PC of the branch instruction before that, ret49, is "0xc4".
  • The type information of br_ins11 is "0b001", the type information of br_ins4 is "0b001", and the type information of ret49 is "0b010".
  • The BHR is a 10-bit shift register with an initial value of 0. When a branch instruction that jumps is encountered, the BHR is shifted by one bit and a 1 is shifted into its lowest bit; when a branch instruction that does not jump is encountered, the BHR is shifted by one bit and a 0 is shifted into its lowest bit. After call_ins2 (jump), ret49 (jump), and br_ins4 (no jump), the value of the BHR is therefore "0b00_0000_0110".
  • The prediction information of br_ins11 formed from these values is used as the variable of the HASH function, and a hash operation yields a 10-bit message digest (i.e., the hash value of the prediction information).
  • The message digest is used to retrieve the PHT, and whether the branch instruction br_ins11 jumps is predicted according to the 2-bit value in the located entry.
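  • The br_ins11 example can be traced end to end with the Python sketch below. The PC values, type codes, and branch outcomes come from the example; the 32-bit PC field width, the field order, and the stand-in hash `xor_fold` are assumptions, so the resulting index is illustrative rather than the value the invention would produce.

```python
def shift_in(bhr: int, taken: bool, bits: int = 10) -> int:
    """Shift one branch outcome into the lowest bit of a `bits`-wide BHR."""
    return ((bhr << 1) | int(taken)) & ((1 << bits) - 1)


def concat(fields) -> int:
    """Concatenate (value, width) pairs, first field in the highest bits."""
    packed = 0
    for value, width in fields:
        packed = (packed << width) | (value & ((1 << width) - 1))
    return packed


def xor_fold(value: int, bits: int = 10) -> int:
    """Stand-in hash (assumed): XOR successive `bits`-wide chunks."""
    mask, digest = (1 << bits) - 1, 0
    while value:
        digest ^= value & mask
        value >>= bits
    return digest


# Outcomes along the path: call_ins2 jumps, ret49 jumps, br_ins4 does not jump.
bhr = 0
for taken in (True, True, False):
    bhr = shift_in(bhr, taken)

PC_BITS, TYPE_BITS, BHR_BITS = 32, 3, 10  # PC width is an assumption
info = concat([
    (0x2C, PC_BITS), (0x18, PC_BITS), (0xC4, PC_BITS),           # br_ins11, br_ins4, ret49
    (0b001, TYPE_BITS), (0b001, TYPE_BITS), (0b010, TYPE_BITS),  # their type codes
    (bhr, BHR_BITS),
])
index = xor_fold(info)  # 10-bit PHT address for br_ins11
print(format(bhr, "010b"), 0 <= index < 1024)  # 0000000110 True
```

  • The packed prediction information is 115 bits wide under these assumptions, yet the PHT still needs only 1024 entries because the digest is 10 bits.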
  • The present invention can take more comprehensive factors affecting branch instructions into account during branch prediction, so that the different execution conditions of the same branch instruction can be fully distinguished. The cases in which the same branch instruction jumps and does not jump can thus be separated, which effectively improves the accuracy of branch prediction.
  • In addition, entry conflicts in the PHT can be avoided to some extent.
  • the prediction information of the branch instruction described above may include at least one of the following factors affecting the execution of the branch prediction in addition to the value of the PC including the branch instruction and the value of the BHR: the branch instruction
  • the type information, the value of the program counter of the n branch instructions executed before the branch instruction, the type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register may be added to the prediction information of the branch instruction.
  • the S310 specifically includes: generating, according to the value of the PC of the branch instruction and the value of the BHR, and the selected influencing factor, the prediction information of the branch instruction, where the prediction information includes the PC of the branch instruction.
  • the selected influencing factor is at least one factor selected from the following factors according to the control information: type information of the branch instruction, a value of a program counter of the n branch instructions executed before the branch instruction, and the n pieces The type information of the branch instruction, the stack pointer of the processor, and the value of the condition register.
  • the control information includes an influence factor of the above-mentioned respective influence factors on the execution of the branch instruction.
  • a factor that selects a larger (or largest) influence factor from among the above various influencing factors is added to the prediction information of the branch instruction.
  • control information can be manually configured and is an empirical value.
  • control information can be generated by a processor that runs the instructions.
  • the processor stores various influencing factors, and then calculates the influence factors of each influencing factor through correlation analysis, such as statistical analysis.
  • according to specific requirements, the factors that greatly affect the execution of the branch instruction may be selected from the plurality of influencing factors, and the factors that have a smaller effect on its execution may be discarded, which can further improve the accuracy of branch prediction.
  • the technical solution provided by the embodiment of the present invention retrieves the PHT using the hash value of the prediction information of the branch instruction obtained by the HASH function. Because the HASH function can compress a message of any length into a short message of fixed length, there is no need to worry that the PHT becomes too large because of a long prediction information bit length, so a more comprehensive set of factors affecting the branch instruction can be considered and the accuracy of branch prediction can be improved.
  • FIG. 5 is a schematic block diagram of an apparatus for branch prediction according to an embodiment of the present invention.
  • the device comprises the following units.
  • the obtaining unit 510 is configured to acquire prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction.
  • the calculating unit 520 is configured to obtain a hash value of the prediction information by using a hash function; and
  • the searching unit 530 is configured to retrieve a pattern history table according to the hash value to obtain a prediction result of the branch instruction.
  • the prediction information further includes type information of the branch instruction.
  • the prediction information further includes a value of a program counter of n branch instructions executed before the branch instruction, and n is a positive integer.
  • the prediction information further includes type information of the n branch instructions.
  • the prediction information further includes: a stack pointer of the processor, and/or a value of a condition register of the branch instruction, where the value of the condition register is used to indicate a jump result of the branch instruction.
  • the prediction information further includes at least one of: type information of the branch instruction, values of the program counters of the n branch instructions executed before the branch instruction, type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register.
  • the value of the program counter of the branch instruction is 32 bits or 64 bits.
  • the obtaining unit is further configured to: select the hash function from a plurality of candidate hash functions, where the selected hash function is the candidate whose message digest has the best uniformity.
  • the length of the message digest of the hash function is determined based on at least one of the following: system requirements, processor size, and program size.
  • the hash function is a uniform hash function.
  • the device further includes:
  • an update unit configured to update the pattern history table according to an actual jump result of the branch instruction.
  • an embodiment of the present invention further provides a branch predictor, including: a memory 620 for storing instructions, and a processor 610 configured to execute the instructions stored in the memory 620, where execution of the instructions stored in the memory 620 causes the processor 610 to perform the method of the above method embodiments.
  • the branch predictor further includes at least one register 630 for storing prediction information of the branch instruction.
  • the branch predictor includes a BHR 640.
  • the embodiment of the invention further provides a computer storage medium on which is stored a computer program, which when executed by a computer, causes the computer to execute the method of the above method embodiment.
  • Embodiments of the present invention also provide a computer program product comprising instructions that, when executed by a computer, cause a computer to perform the method of the above method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions can be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) means.
  • the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (such as a digital video disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)).
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

Abstract

Provided are a branch prediction method and device. The method comprises: acquiring prediction information of a branch instruction, the prediction information including at least a program counter value of the branch instruction and a value of a branch history register; acquiring a HASH value of the prediction information by means of a HASH function; and searching a pattern history table according to the HASH value to acquire a prediction result of the branch instruction. Since a HASH function can compress a message of any length into a short message of a fixed length, there is no need to worry that the pattern history table becomes too large because the bit length of the prediction information is too long. Therefore, the prediction information can include information with a long bit length so as to further improve the accuracy of branch prediction.

Description

Branch prediction method and device
Copyright notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to the reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical field
The present invention relates to the field of processors and, more particularly, to a method and apparatus for branch prediction.
Background
Processors play an important role in the modern electronics industry, and processor design is a high-end technology in the microelectronics industry. Processor designs generally adopt a multi-stage pipeline structure to increase the operating frequency of the processor and thereby improve its performance. High-performance processors often have pipelines more than ten stages deep, and some processors have pipelines of more than twenty stages.
Branch instructions are instructions that change the flow of a program and are very common in programs. If the branch of a branch instruction is taken, execution jumps, and the next instruction to be executed is the one at the jump target.
The pipelined design of a processor increases its operating frequency, but it also causes the pipeline to be flushed when a branch instruction jumps. After the pipeline is flushed, instructions are re-read from the target address, that is, the pipeline is re-initialized. Re-initializing the pipeline wastes many pipeline cycles. Because branch instructions are very common in programs, pipeline flushes caused by branch instructions seriously affect processor performance. Branch prediction techniques were proposed to address this problem, and they play a very important role in improving processor performance.
The basic principle of branch prediction is that, when instructions are read, a judgment is made on each branch instruction among them: whether it jumps and what the jump target is. The location from which the next instruction is read is then decided according to that judgment. In actual execution, branch prediction judges the execution of the current branch instruction (that is, whether it jumps) based on the historical execution of branch instructions in the program.
At present, the Gshare predictor is a widely used branch prediction technique. Gshare prediction already achieves high accuracy, usually above 90%, but it is difficult to improve further. In practice, raising branch prediction accuracy by even one percentage point greatly improves processor performance.
Therefore, it is necessary to propose a branch prediction technique that can further improve accuracy.
Summary of the invention
The present invention provides a method and apparatus for branch prediction which, compared with the prior art, can further improve the accuracy of branch prediction.
In a first aspect, a method for branch prediction is provided, the method comprising: acquiring prediction information of a branch instruction, the prediction information including at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction; obtaining a hash value of the prediction information by means of a hash function; and retrieving a pattern history table according to the hash value to obtain a prediction result of the branch instruction.
In the existing Gshare branch prediction technique, because of the limit on the area of the pattern history table (PHT), only the low-order bits of the program counter (PC) value of the branch instruction (that is, the address of the branch instruction) can be used for prediction. If different branch instructions have equal low-order PC bits, entries in the PHT may conflict.
In the present invention, by contrast, the PHT is retrieved using the hash value of the prediction information of the branch instruction obtained by a HASH function. Because a HASH function can compress a message of any length into a short message of fixed length, there is no need to worry that a long bit length of the prediction information will make the PHT too large. The prediction information can therefore include the complete PC value of the branch instruction. Using the complete PC value avoids, to some extent, the problem of different branch instructions corresponding to the same PHT entry, that is, PHT entry conflicts can be effectively avoided, which in turn can improve the accuracy of branch prediction.
Therefore, the present invention retrieves the PHT using the hash value of the prediction information of the branch instruction obtained by a HASH function, so there is no need to worry that a long bit length of the prediction information will make the PHT too large. Compared with existing branch prediction techniques, richer prediction information can be provided, which can further improve the accuracy of branch prediction.
In a second aspect, an apparatus for branch prediction is provided, the apparatus comprising the following units:
an obtaining unit, configured to acquire prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction;
a calculating unit, configured to obtain a hash value of the prediction information by means of a hash function; and
a retrieving unit, configured to retrieve a pattern history table according to the hash value to obtain a prediction result of the branch instruction.
Therefore, the present invention retrieves the PHT using the hash value of the prediction information of the branch instruction obtained by a HASH function, so there is no need to worry that a long bit length of the prediction information will make the PHT too large. Compared with existing branch prediction techniques, richer prediction information can be provided, which can further improve the accuracy of branch prediction.
In a third aspect, a branch predictor is provided, the branch predictor comprising a memory and a processor, where the memory is configured to store instructions, the processor is configured to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to perform the method provided by the first aspect.
In a fourth aspect, a computer storage medium is provided, on which a computer program is stored, where the computer program, when executed by a computer, causes the computer to perform the method provided by the first aspect.
In a fifth aspect, a computer program product comprising instructions is provided, where the instructions, when executed by a computer, cause the computer to perform the method provided by the first aspect.
Brief description of the drawings
FIG. 1 is a schematic diagram of a multi-stage pipeline of a processor.
FIG. 2 is a schematic diagram of the principle of an existing branch prediction technique.
FIG. 3 is a schematic flowchart of a method for branch prediction according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the principle of branch prediction according to an embodiment of the present invention.
FIG. 5 is a schematic block diagram of an apparatus for branch prediction according to an embodiment of the present invention.
FIG. 6 is a schematic block diagram of a branch predictor according to an embodiment of the present invention.
Detailed description
To facilitate understanding of the technical solutions provided by the embodiments of the present invention, some concepts involved in the embodiments are described first.
1) Pipeline design of the processor.
In modern processor design, pipeline technology is generally adopted to increase the operating frequency of the processor, that is, the processor is divided into a multi-stage pipeline. FIG. 1 is a schematic diagram of a common pipeline division. In FIG. 1, the processor is divided into a six-stage pipeline comprising, in order: address calculation, instruction reading, instruction distribution, instruction decoding, instruction execution, and register file. Instructions in the program memory enter the processor through the pipeline shown in FIG. 1.
2) Branch instructions.
A branch instruction is an instruction that changes the flow of a program. If the branch of a branch instruction is taken, execution jumps, and the next instruction to be executed is the one at the jump target.
For example, branch instructions include, but are not limited to, absolute jump instructions, conditional jump instructions, function call instructions, and function return instructions.
3) The effect of branch instructions on the processor pipeline.
Assume that the processor pipeline is as shown in FIG. 1 and that instructions in the program memory enter the processor through that pipeline. If the instructions stored in the program memory include a jump instruction, the jump occurs only when the jump instruction reaches the "instruction execution" stage. At that point the processor needs to flush the preceding four pipeline stages, recalculate the fetch address, and read instructions again. Flushing the pipeline has a large impact on processor performance.
Branch prediction techniques were proposed to address this problem.
4) Branch prediction technique.
To address the problem described above, if the processor can detect a branch instruction early, at the "address calculation" stage, and predict whether that branch instruction jumps, the loss of processor performance caused by pipeline flushes can be avoided. As shown in FIG. 1, the purpose of branch prediction is to detect a branch instruction and predict whether it jumps before the first stage of the pipeline is executed.
Before the branch prediction technique is described in detail, several concepts are introduced.
① Program counter (PC).
The program counter is a register in a computer processor that stores the address (location) of the instruction currently being executed. In other words, the content of the program counter is the address of the instruction currently being executed. Each time an instruction is fetched, the address stored in the program counter is incremented by 1.
The program counter (PC) value referred to in this document means the address of the branch instruction currently being executed.
② Branch history register (BHR).
The BHR is a multi-bit shift register. For a branch instruction that jumps, a 1-bit "1" is shifted into the lowest bit of the BHR; for a branch instruction that does not jump, a 1-bit "0" is shifted into the lowest bit of the BHR.
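The shift behavior of the BHR can be sketched as follows. This is a minimal illustration only: the function name `shift_bhr` and the 10-bit register width are assumptions made for the example, since the text only says the register is multi-bit.

```python
BHR_BITS = 10  # assumed register width; the text only says "multi-bit"


def shift_bhr(bhr: int, jumped: bool, bits: int = BHR_BITS) -> int:
    """Shift the newest branch outcome into the lowest bit of the BHR.

    A jump shifts in 1, no jump shifts in 0; the oldest bit falls off the top.
    """
    mask = (1 << bits) - 1
    return ((bhr << 1) | int(jumped)) & mask


bhr = 0
for outcome in (True, True, False):  # jump, jump, no jump
    bhr = shift_bhr(bhr, outcome)

print(format(bhr, "010b"))  # prints 0000000110
```

Three outcomes (jump, jump, no jump) applied to an empty register leave the value 0b00_0000_0110 in it.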
③ Pattern history table (PHT).
The PHT records the historical execution of branch instructions, that is, whether they jumped or not. Typically, the PHT is a table of 2-bit counters. Each entry of the PHT is 2 bits and represents the historical execution of one branch instruction.
The number of entries in the PHT is determined by the number of bits in the PHT address. Assuming the PHT address is N bits (N being a positive integer), the PHT includes 2^N entries.
The number of entries in the PHT may also be called the area of the PHT. The area of the PHT mentioned below refers to the number of entries in the PHT.
Branch prediction means that, when instructions are read, the branch instructions among them are detected, a judgment is made as to whether each one jumps and what its jump target is, and the location from which the next instruction is read is decided according to that judgment.
At present, the Gshare (Global history with Index Sharing) branch prediction technique (also called the Gshare predictor) is widely used. The Gshare predictor XORs the low-order bits of the PC value of the branch instruction with the binary bits recorded in the BHR and uses the result to retrieve the PHT in order to predict whether the current branch instruction jumps. Specifically, the XOR result is treated as an address, an entry of the PHT is located according to that address, and the jump result of the branch instruction is predicted from the 2-bit value in that entry. The basic principle of the Gshare predictor is shown in FIG. 2.
In practice, because of PHT area considerations, the PHT cannot be made very large; for example, the PHT may be a table with 10-bit addresses. For a PHT with 10-bit addresses, only the low 10 bits of the PC value of the branch instruction can be used to retrieve the PHT: the low 10 bits of the PC value are XORed with the binary bits recorded in the BHR, and the XOR result is used to retrieve the PHT. This is why the Gshare predictor uses only the low-order bits of the PC value of the branch instruction.
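The Gshare indexing just described can be sketched as follows, assuming a PHT with 10-bit addresses and a 10-bit BHR. The function name `gshare_index` and the sample PC and BHR values are illustrative assumptions.

```python
PHT_ADDR_BITS = 10  # PHT with 10-bit addresses, as in the example above


def gshare_index(pc: int, bhr: int, bits: int = PHT_ADDR_BITS) -> int:
    """XOR the low-order PC bits with the BHR bits to form a PHT address."""
    mask = (1 << bits) - 1
    return (pc & mask) ^ (bhr & mask)


# Illustrative values: PC = 0x2c, BHR = 0b00_0000_0110
index = gshare_index(0x2C, 0b00_0000_0110)
print(hex(index))  # prints 0x2a
```

Because only the low 10 bits of the PC take part, any two branch instructions whose addresses differ only above bit 9 produce the same index for the same BHR value.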
The prediction accuracy of the Gshare branch prediction technique often exceeds 90%, and the technique is widely adopted in processor design. However, the accuracy of the Gshare predictor is difficult to improve further. In modern processors, pipelines are deep and branch instructions are numerous, so a 1 percentage point improvement in branch prediction accuracy improves processor performance considerably.
To meet this need, embodiments of the present invention provide a method and apparatus for branch prediction which, compared with existing branch prediction techniques, can further improve the accuracy of branch prediction.
FIG. 3 is a schematic flowchart of a method for branch prediction according to an embodiment of the present invention. For example, the method may be performed by a branch predictor. As shown in FIG. 3, the method includes the following steps.
S310: Acquire prediction information of a branch instruction, where the prediction information includes at least the value of the program counter of the branch instruction (hereinafter, the PC value) and the value of the branch history register of the branch instruction (hereinafter, the BHR value).
The PC value of the branch instruction is the address of the branch instruction.
It should be understood that the BHR is a multi-bit shift register: for a branch instruction that jumps, a 1-bit "1" is shifted into the lowest bit of the BHR; for a branch instruction that does not jump, a 1-bit "0" is shifted into the lowest bit of the BHR. The BHR value of the branch instruction is the current value recorded in the BHR when it is that branch instruction's turn to be predicted.
It should be noted that the prediction information in this embodiment directly includes the PC value of the branch instruction and the BHR value, not the result of XORing the PC value with the BHR value. For example, assuming that the PC value of the branch instruction is "0x2c" and the BHR value of the branch instruction is "0b00_0000_0110", the prediction information of the branch instruction is:
"0x2c;
0b00_0000_0110".
S320: Obtain a hash value of the prediction information by means of a hash function (hereinafter, the HASH function).
The HASH function is also known as a hash function. A HASH function can transform a message of any length into a short message of fixed length, and can be regarded as a mapping that compresses a long message into a short one. The short message obtained by HASH function compression may be called a message digest.
The hash value of the prediction information obtained by the HASH function in this embodiment is the message digest obtained by taking the prediction information as the input variable of the HASH function and performing the hash operation.
It should be noted that the length of the message digest of the HASH function in this embodiment is equal to the number of bits of the PHT address. For example, if the PHT is a table with 10-bit addresses, the message digest of the HASH function is also 10 bits long. In other words, the number of bits in the hash value of the prediction information of the branch instruction is equal to the number of bits of the PHT address.
As an example, assume that the PHT is a table with 10-bit addresses, so the message digest of the HASH function is also 10 bits long. Suppose the prediction information of the branch instruction acquired in S310 is:
"0x2c;
0b00_0000_0110",
In S320, the prediction information is taken as the input variable of the HASH function, and the hash operation produces a 10-bit message digest, that is, the hash value of the prediction information, whose length is 10 bits.
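Steps S310 and S320 for this example can be sketched as follows. The text does not fix a particular HASH function, so the XOR-folding function `fold_hash` below is only one assumed choice; the 32-bit PC width, the 10-bit BHR width, and the name `pack_prediction_info` are likewise assumptions for illustration.

```python
PC_BITS = 32      # assumed PC width
BHR_BITS = 10     # assumed BHR width
DIGEST_BITS = 10  # equals the number of PHT address bits


def pack_prediction_info(pc: int, bhr: int) -> int:
    """S310: place the full PC and the BHR side by side (no XOR between them)."""
    pc &= (1 << PC_BITS) - 1
    bhr &= (1 << BHR_BITS) - 1
    return (pc << BHR_BITS) | bhr


def fold_hash(message: int, digest_bits: int = DIGEST_BITS) -> int:
    """S320 (assumed hash): XOR together successive digest_bits-wide chunks."""
    mask = (1 << digest_bits) - 1
    digest = 0
    while message:
        digest ^= message & mask
        message >>= digest_bits
    return digest


info = pack_prediction_info(0x2C, 0b00_0000_0110)
digest = fold_hash(info)
print(digest.bit_length() <= DIGEST_BITS)  # prints True
```

Whatever the length of the packed prediction information, the digest always fits in 10 bits, so it can be used directly as an address into a PHT with 10-bit addresses.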
S330: Retrieve the pattern history table (hereinafter, the PHT) according to the hash value of the prediction information to obtain a prediction result of the branch instruction.
Specifically, the hash value is treated as an address and used to look up the PHT, which locates one entry in the PHT; the jump result of the branch instruction is then predicted according to the 2-bit value stored in that entry.
Suppose that "00" and "01" are defined to mean no jump and that "10" and "11" are defined to mean jump. When the value in the entry located by the hash value is "10", the branch instruction can be predicted to jump; when the value in the entry located by the hash value is "01", the branch instruction can be predicted not to jump.
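The lookup in S330 can be sketched as follows, using the encoding defined above ("00" and "01" mean no jump, "10" and "11" mean jump). The list-based table, the initial entry value, and the function name `predict_jump` are assumptions for illustration.

```python
PHT_ADDR_BITS = 10

# A PHT with 2^10 two-bit entries; the initial value 0b01 is an assumption.
pht = [0b01] * (1 << PHT_ADDR_BITS)


def predict_jump(table: list, hash_value: int) -> bool:
    """Treat the hash value as a PHT address and read the 2-bit entry.

    Entries 0b10 and 0b11 predict a jump; 0b00 and 0b01 predict no jump.
    """
    return table[hash_value] >= 0b10


pht[0x2A] = 0b10  # pretend this entry has been trained toward "jump"
print(predict_jump(pht, 0x2A))  # prints True
print(predict_jump(pht, 0x00))  # prints False
```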
It should be understood that, after the prediction result of the branch instruction is obtained, the processor can process the branch instruction according to the pipeline shown in FIG. 1.
It should also be understood that the method further includes: updating the PHT according to the actual jump result of the branch instruction.
Taking the pipeline shown in FIG. 1 as an example, the actual jump behavior of the branch instruction is known when it reaches the "instruction execution" stage. If the actual jump behavior of the branch instruction is consistent with the prediction result in S330, the PHT is not updated; otherwise, the PHT is updated according to the actual jump behavior, that is, the 2-bit value in the entry located by the hash value is updated.
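The update rule can be sketched as follows. The text states only that the entry is updated when the prediction was wrong; moving the 2-bit value one step toward the actual outcome, as a saturating counter would, is an assumption of this sketch, as is the function name `update_pht`.

```python
def update_pht(table: list, hash_value: int, predicted: bool, actual: bool) -> None:
    """Update the entry located by the hash value only on a misprediction."""
    if predicted == actual:
        return  # prediction matched the actual result: leave the PHT unchanged
    if actual:
        # Assumed policy: step toward "jump", saturating at 0b11.
        table[hash_value] = min(table[hash_value] + 1, 0b11)
    else:
        # Assumed policy: step toward "no jump", saturating at 0b00.
        table[hash_value] = max(table[hash_value] - 1, 0b00)


pht = [0b01] * 1024
update_pht(pht, 0x2A, predicted=False, actual=True)  # mispredicted: entry changes
print(format(pht[0x2A], "02b"))  # prints 10
```

With this policy a single misprediction at entry 0b01 or 0b10 flips the prediction for that entry, while entries at 0b00 or 0b11 need two mispredictions to flip.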
In the existing Gshare branch prediction technique, because of the limit on the area of the PHT, only the low-order bits of the PC value of the branch instruction (that is, the address of the branch instruction) can be used for prediction. If different branch instructions have equal low-order PC bits, entries in the PHT may conflict.
In the present invention, by contrast, the PHT is retrieved using the hash value of the prediction information of the branch instruction obtained by the HASH function. Because the HASH function can compress a message of any length into a short message of fixed length, there is no need to worry that a long bit length of the prediction information will make the PHT too large. The prediction information can therefore include the complete PC value of the branch instruction. Using the complete PC value avoids, to some extent, the problem of different branch instructions corresponding to the same PHT entry, that is, PHT entry conflicts can be effectively avoided, which in turn can improve the accuracy of branch prediction.
还因此,本发明采用通过HASH函数获得的该分支指令的预测信息的哈希值来检索PHT,从而无需担心预测信息的比特位长度太长而导致PHT过大,因此,相对于现有的分支预测技术,可以提供较为丰富的预测信息,则可以进一步提高分支预测的准确度。Therefore, the present invention uses the hash value of the prediction information of the branch instruction obtained by the HASH function to retrieve the PHT, so that there is no need to worry that the bit length of the prediction information is too long, resulting in an excessive PHT, and therefore, relative to the existing branch. Predictive technology can provide rich prediction information, which can further improve the accuracy of branch prediction.
此外,本发明采用通过HASH函数获得的该分支指令的预测信息的哈希值来检索PHT,因此可以通过设置HASH函数的消息摘要的长度来灵活地设计PHT的大小。Furthermore, the present invention uses the hash value of the prediction information of the branch instruction obtained by the HASH function to retrieve the PHT, so the size of the PHT can be flexibly designed by setting the length of the message digest of the HASH function.
可选地,本发明中的HASH函数的设计可以使用常用的逻辑操作,例如“与”、“或”、“异或”等,还可以使用查表替换,移位等操作。Optionally, the design of the HASH function in the present invention may use common logical operations, such as "AND", "OR", "XOR", etc., and may also use table lookup replacement, shifting, and the like.
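As one illustration of a HASH function built only from shifts and XOR, the sketch below folds an arbitrarily long non-negative integer into a fixed number of bits. XOR folding is a stand-in chosen for this example; the patent does not specify a particular function, and the name `xor_fold_hash` is hypothetical.

```python
def xor_fold_hash(value: int, digest_bits: int) -> int:
    """Compress a non-negative integer of any length into digest_bits bits.

    The input is cut into digest_bits-wide chunks (by shifting) and the
    chunks are combined with XOR.
    """
    if value < 0 or digest_bits <= 0:
        raise ValueError("value must be >= 0 and digest_bits must be > 0")
    mask = (1 << digest_bits) - 1
    digest = 0
    while value:
        digest ^= value & mask   # fold in the lowest chunk
        value >>= digest_bits    # move on to the next chunk
    return digest
```

Whatever the input length, the result always fits in `digest_bits` bits, which is what keeps the PHT size independent of the amount of prediction information.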
Optionally, the HASH function may be designed so that the message digests it outputs are as dispersed as possible.
It should be understood that when the message digests of the HASH function are well dispersed, different prediction information is, to some extent, guaranteed to have different hash values. This can effectively avoid blank entries in the PHT, that is, improve PHT utilization, and can also reduce conflicts between entries in the PHT.
Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes the complete PC value of the branch instruction.
Specifically, the PC value of the branch instruction is 32 bits or 64 bits.
The prediction information used in this embodiment includes the complete PC value of the branch instruction. Compared with the prior art, in which only the low-order bits of the branch instruction address can be used to predict the jump result, this embodiment can further improve prediction accuracy.
Different design approaches may be adopted for the HASH function used in the present invention.
Optionally, in some embodiments, the HASH function may be parameterized and configurable; when a given program segment is executed, the parameters of the HASH function are adjusted so that the uniformity of the message digests is best.
Optionally, in some embodiments, the method further includes selecting the hash function from a plurality of candidate hash functions, the selected hash function being the candidate whose message digests are most uniform.
For example, multiple HASH functions may be designed, and when a given program segment is executed, the function with the most uniform message digests is selected from among them.
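The selection among candidate functions can be sketched as below. The text does not define how uniformity is measured; this example assumes the load of the fullest bucket over a sample of prediction information as the metric (lower is more uniform). The two candidate functions and the sample values are made up for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

def max_bucket_load(hash_fn: Callable[[int], int], samples: Sequence[int]) -> int:
    """Number of samples that land in the most crowded digest value."""
    counts = Counter(hash_fn(s) for s in samples)
    return max(counts.values())

def pick_most_uniform(candidates: Sequence[Callable[[int], int]],
                      samples: Sequence[int]) -> Callable[[int], int]:
    """Return the candidate whose fullest bucket is smallest (assumed metric)."""
    return min(candidates, key=lambda fn: max_bucket_load(fn, samples))

def low_bits(value: int) -> int:
    """Hypothetical candidate: keep only the low 4 bits."""
    return value & 0xF

def fold(value: int) -> int:
    """Hypothetical candidate: XOR the value with itself shifted right by 4."""
    return (value ^ (value >> 4)) & 0xF

# Sample inputs that all share the same low 4 bits (illustrative only).
samples = [0x10, 0x20, 0x30, 0x40]
best = pick_most_uniform([low_bits, fold], samples)
```

On these samples `low_bits` sends every input to the same digest, while `fold` spreads them over four digests, so `fold` is selected.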
Optionally, in some embodiments, the length of the message digest of the HASH function is determined according to at least one of the following: system requirements, processor scale, and program size.
As an example, the length of the message digest is determined from system requirements as follows: when the system can tolerate the lower prediction accuracy that comes with a smaller PHT area, the message digest can be designed shorter, and the PHT becomes correspondingly smaller; conversely, when the system requires higher prediction accuracy, the message digest can be designed longer, and the PHT becomes correspondingly larger.
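The trade-off follows from simple arithmetic: a digest of k bits addresses 2^k entries, and each entry holds 2 bits according to the description of the PHT update. The sketch below makes this explicit; the function name `pht_size_bits` is hypothetical.

```python
def pht_size_bits(digest_bits: int, entry_bits: int = 2) -> int:
    """Total PHT storage in bits for a digest of the given length.

    A digest of digest_bits bits can address 2**digest_bits entries,
    and each entry stores entry_bits bits (2 in the text).
    """
    if digest_bits < 0 or entry_bits <= 0:
        raise ValueError("digest_bits must be >= 0 and entry_bits > 0")
    return (1 << digest_bits) * entry_bits
```

Each additional digest bit doubles the table, so a 10-bit digest needs 2048 bits of storage and a 12-bit digest needs 8192 bits.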
It should be understood that there are many types of processors, from small microcontrollers to general embedded processors to large processors for high-performance computing, and these types differ greatly in scale.
An embedded processor aims for small area and low power consumption and needs a small PHT; therefore, the message digest needs to be designed shorter.
A processor for high-performance computing aims for high performance and is not sensitive to area or power consumption; it can use a large PHT, so the message digest can be designed longer.
In this embodiment, the length of the message digest of the HASH function can be set according to system requirements, processor scale, program size, and the like, so that the size of the PHT can be designed accordingly. This embodiment can therefore provide a branch prediction accuracy suited to the specific application.
Optionally, in some embodiments, the HASH function is a uniform HASH function.
When a HASH function maps a set of long messages to a set of short messages, if the message digests are evenly distributed and conflicts are rare, that is, different long messages generally correspond to different message digests, the HASH function is called a uniform HASH function.
It should be understood that a HASH function that distributes message digests evenly also distributes the hash values used to retrieve the PHT evenly. This largely avoids a large number of idle entries in the PHT, that is, improves PHT utilization, and can also effectively avoid conflicts between entries in the PHT.
With the solution provided by the present invention, more information closely related to branch prediction can be collected as prediction information without worrying that an overly long prediction information bit string will make the PHT too large. It should be understood that, in branch prediction, the more abundant the information used for prediction, the higher the prediction accuracy.
To further improve the accuracy of branch prediction, the prediction information of the branch instruction may include, in addition to the PC value and the BHR value of the branch instruction, other factors that affect the execution of the branch instruction.
Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes, in addition to the PC value and the BHR value of the branch instruction, type information of the branch instruction.
The type information of a branch instruction is information indicating the jump type of the branch instruction.
For example, when the branch instruction is an absolute jump instruction, its type is defined as "0b000"; when the branch instruction is a function return instruction, its type is defined as "0b010"; when the branch instruction is a conditional jump instruction, its type is defined as "0b001".
As an example, if the PC value of the branch instruction is "0x2c", the BHR value of the branch instruction is "0b00_0000_0110", and the type information of the branch instruction is "0b001", the prediction information to be input into the HASH function is:
"0x2c
0b001
0b00_0000_0110".
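One way to hand these three fields to a HASH function is to concatenate them into a single bit string. The sketch below does this under stated assumptions: the patent lists the fields but does not say how they are combined, so the concatenation order, the 32-bit PC width (one of the two widths the text allows), the 3-bit type width, and the name `pack_fields` are all illustrative.

```python
from typing import List, Tuple

def pack_fields(fields: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Concatenate (value, width_bits) pairs, first field most significant.

    Returns the packed integer and its total width in bits.
    """
    packed = 0
    total_bits = 0
    for value, width in fields:
        if value < 0 or value >> width:
            raise ValueError("value does not fit in the declared width")
        packed = (packed << width) | value
        total_bits += width
    return packed, total_bits

# Fields from the example above; the widths are assumptions.
prediction_info, width = pack_fields([
    (0x2c, 32),            # PC of the branch instruction (assumed 32-bit)
    (0b001, 3),            # type information: conditional jump
    (0b00_0000_0110, 10),  # BHR value (10-bit register in the later example)
])
```

The packed value here is 45 bits wide, far longer than a practical PHT index, which is why it is passed through a HASH function before the table is accessed.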
In addition to the PC value and the BHR value of the branch instruction, the prediction information of this embodiment includes the type of the branch instruction. Compared with the prior art, the content of the prediction information is increased, which can further improve the accuracy of branch prediction.
Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes, in addition to the PC value and the BHR value of the branch instruction, the PC values of the n branch instructions executed before the branch instruction, where n is a positive integer.
Optionally, in this embodiment, the prediction information further includes type information of the n branch instructions.
For example, n is equal to 3.
In addition to the PC value and the BHR value of the branch instruction, the prediction information of this embodiment includes the PC values of one or more branch instructions executed before the branch instruction, and may further include the type information of those branch instructions. Compared with the prior art, the content used to predict the branch instruction is richer, which can further improve the accuracy of branch prediction.
Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes, in addition to the PC value and the BHR value of the branch instruction, the stack pointer of the processor and/or the value of the condition register of the branch instruction, where the value of the condition register indicates the jump result of the branch instruction.
The value of the condition register is the value stored in the condition register. The condition register of a branch instruction stores "1" or "0". The value of the condition register indicates the actual jump result of the branch instruction.
It should be understood that the stack pointer is a concept common to processors and refers to a memory address. During program execution, if a subfunction is called, some variables of the main program, such as registers, must first be written to the stack area to preserve the register context inside the processor, and only then is the subfunction executed.
In addition to the PC value and the BHR value of the branch instruction, the prediction information of this embodiment includes the stack pointer of the processor and/or the value of the condition register of the branch instruction. Compared with the prior art, the content used to predict the branch instruction is richer, which can further improve the accuracy of branch prediction.
It should be noted that the embodiments described above regarding what the prediction information of a branch instruction may include can be combined in any manner, which is not limited by the present invention.
For example, the prediction information of the branch instruction includes the PC value of the branch instruction, the BHR value of the branch instruction, the type information of the branch instruction, and the PC values and type information of the n branch instructions executed before the branch instruction.
As another example, the prediction information of the branch instruction includes the PC value of the branch instruction, the BHR value of the branch instruction, the type information of the branch instruction, the PC values and type information of the n branch instructions executed before the branch instruction, the stack pointer of the processor, and the value of the condition register of the branch instruction.
FIG. 4 is a schematic diagram of the principle of branch prediction according to an embodiment of the present invention. The prediction information of the branch instruction includes multiple items: the PC value of the branch instruction, the PC value of the previous branch instruction, the PC value of the branch instruction before that, the type information of the branch instruction, the type information of the previous branch instruction, the type information of the branch instruction before that, ..., and the BHR value of the branch instruction, where "..." denotes other factors that affect the execution of the branch instruction.
The previous branch instruction is the most recent branch instruction executed before the branch instruction; the branch instruction before that is the second most recent branch instruction executed before the branch instruction.
These items are used as the variables of the HASH function, and a hash operation is performed to obtain a hash value whose length equals the number of address bits of the PHT. The PHT is retrieved using the hash value to obtain the prediction result of the branch instruction.
For a better understanding of the technical solution provided by the present invention, the following program is described as an example.
Figure PCTCN2018081057-appb-000001
Figure PCTCN2018081057-appb-000002
In the above program, the second column is the instruction, and the first column is the PC value corresponding to the instruction.
call_ins2, br_ins4, br_ins11, and ret49 are four branch instructions; all other instructions are non-branch instructions.
Among them, call_ins2 is a function call instruction, which is an absolute jump instruction. Assume that the instruction type of an absolute jump instruction is defined as "0b000". br_ins4 and br_ins11 are conditional jump instructions. Assume that the instruction type of a conditional jump instruction is defined as "0b001". ret49 is a function return instruction. Assume that the instruction type of a function return instruction is defined as "0b010".
Suppose that, during one execution, call_ins2 calls the subfunction (that is, sub_function), ret49 returns from the subfunction, and br_ins4 does not jump, that is: pc(08)->pc(c0)->pc(c4)->pc(0c)->pc(18)->pc(2c). Branch prediction now needs to be performed for the branch instruction br_ins11.
The steps for performing branch prediction on the branch instruction br_ins11 using the method provided by the embodiment shown in FIG. 3 are as follows.
In S310, the prediction information of the branch instruction br_ins11 is acquired. The prediction information includes the PC value of br_ins11; the PC value of br_ins4, the previous branch instruction; the PC value of ret49, the branch instruction before that; the type information of br_ins11; the type information of br_ins4; the type information of ret49; and the BHR value of br_ins11.
Specifically, the PC value of br_ins11 is "0x2c", the PC value of the previous branch instruction br_ins4 is "0x18", and the PC value of ret49, the branch instruction before that, is "0xc4". The type information of br_ins11 is "0b001", the type information of br_ins4 is "0b001", and the type information of ret49 is "0b010".
Assume that the BHR is a 10-bit shift register: when a branch instruction that jumps is encountered, a 1 is shifted into the lowest bit; when a branch instruction that does not jump is encountered, a 0 is shifted into the lowest bit; and the initial value of the BHR is 0.
After pc(08) is executed, BHR=0b00_0000_0001; after pc(c4) is executed, BHR=0b00_0000_0011; after pc(18) is executed, BHR=0b00_0000_0110. When the instruction br_ins11 needs to be predicted, BHR=0b00_0000_0110. That is, the BHR value of the branch instruction br_ins11 is "0b00_0000_0110".
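The BHR values above can be reproduced with a short simulation of the 10-bit shift register. The register width, the shift-in rule, and the branch outcomes are taken from the text; the names `BHR_BITS` and `shift_bhr` are hypothetical.

```python
BHR_BITS = 10  # register width stated in the text

def shift_bhr(bhr: int, jumped: bool) -> int:
    """Shift the outcome into the lowest bit and drop the bit that falls off."""
    return ((bhr << 1) | int(jumped)) & ((1 << BHR_BITS) - 1)

# Branch outcomes along the path described in the text:
# call_ins2 at pc(08) jumps, ret49 at pc(c4) jumps, br_ins4 at pc(18) does not.
outcomes = [(0x08, True), (0xc4, True), (0x18, False)]

bhr = 0  # initial value stated in the text
history = []
for pc, jumped in outcomes:
    bhr = shift_bhr(bhr, jumped)
    history.append((pc, bhr))
```

The recorded values are 0b00_0000_0001, 0b00_0000_0011, and 0b00_0000_0110, so the BHR seen when br_ins11 is predicted is 0b00_0000_0110, in agreement with the text.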
The prediction information of the branch instruction br_ins11 to be predicted can then be expressed as:
0x2c
0x18
0xc4
0b001
0b001
0b010
......
0b00_0000_0110
The "......" above indicates that other information may be added to the prediction information as actually required, for example the value of the condition register or the stack pointer of the processor.
In S320, the prediction information of the branch instruction br_ins11 is used as the variable of the HASH function, and a hash operation is performed to obtain a 10-bit message digest (that is, the hash value of the prediction information).
In S330, the message digest is used to retrieve the PHT, and whether the branch instruction br_ins11 jumps is predicted according to the 2-bit value in the located entry.
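Steps S310 to S330 for br_ins11 can be strung together as in the sketch below. Several parts are assumptions made only so that the example runs: the fields are concatenated in the listed order, each PC is treated as 32 bits wide, XOR folding stands in for the unspecified HASH function, and every PHT entry is initialized to "01". The helper names are hypothetical.

```python
DIGEST_BITS = 10  # digest length stated in the text

def pack(fields):
    """Concatenate (value, width_bits) pairs, first field most significant."""
    packed = 0
    for value, width in fields:
        packed = (packed << width) | (value & ((1 << width) - 1))
    return packed

def xor_fold(value, digest_bits):
    """Stand-in HASH function: XOR together digest_bits-wide chunks."""
    mask = (1 << digest_bits) - 1
    digest = 0
    while value:
        digest ^= value & mask
        value >>= digest_bits
    return digest

# S310: collect the prediction information for br_ins11 (values from the text).
fields = [
    (0x2c, 32), (0x18, 32), (0xc4, 32),   # PCs: br_ins11, br_ins4, ret49
    (0b001, 3), (0b001, 3), (0b010, 3),   # types of the same three branches
    (0b00_0000_0110, 10),                 # BHR value
]
prediction_info = pack(fields)

# S320: hash the prediction information down to a 10-bit digest.
index = xor_fold(prediction_info, DIGEST_BITS)

# S330: look up the PHT; entries start at 0b01 here (assumed initial state).
pht = [0b01] * (1 << DIGEST_BITS)
predicted_jump = (pht[index] >> 1) == 1
```

The 115 bits of prediction information are reduced to an index into a 1024-entry table, so adding further fields changes the hash input but not the size of the PHT.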
As can be seen from the above, compared with the existing Gshare branch prediction technique, the present invention can take a more comprehensive set of factors affecting the branch instruction into account during branch prediction. This makes it possible to distinguish fully between the different execution conditions of the same branch instruction, and hence between the cases in which the same branch instruction jumps and does not jump, which can effectively improve the accuracy of branch prediction. It can also avoid entry conflicts in the PHT to some extent.
It should be noted that, as described above, the prediction information of the branch instruction may include, in addition to the PC value and the BHR value of the branch instruction, at least one of the following factors affecting the execution of the branch instruction: the type information of the branch instruction, the program counter values of the n branch instructions executed before the branch instruction, the type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register. However, the present invention is not limited thereto; in practice, any factor that affects the execution of the branch instruction can be added to the prediction information of the branch instruction.
Optionally, in some of the above embodiments, S310 specifically includes: generating the prediction information of the branch instruction according to the PC value and the BHR value of the branch instruction and the selected influencing factors, where the prediction information includes the PC value and the BHR value of the branch instruction and the selected influencing factors. The selected influencing factors are at least one factor selected, according to control information, from the following: the type information of the branch instruction, the program counter values of the n branch instructions executed before the branch instruction, the type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register.
The control information includes the influence factor of each of the above influencing factors on the execution of the branch instruction.
As an example, the factors with larger (or the largest) influence factors are selected from the above influencing factors and added to the prediction information of the branch instruction.
Optionally, the control information may be manually configured as an empirical value.
Optionally, the control information may be generated by the processor that runs the instructions.
For example, after each branch instruction is executed, the processor stores the influencing factors and then calculates the influence factor of each influencing factor through correlation analysis, such as statistical analysis.
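The text does not specify which statistical analysis is used. As one possible reading, the sketch below scores each influencing factor by how well the branch outcome can be guessed from that factor alone: for each observed value of the factor, the majority outcome is taken as the guess, and the score is the fraction of recorded branches guessed correctly. The metric, the name `influence_factor`, the record layout, and the sample records are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def influence_factor(records: List[Dict], factor: str) -> float:
    """Fraction of outcomes explained by the majority outcome per factor value."""
    if not records:
        raise ValueError("at least one record is required")
    by_value = defaultdict(Counter)
    for rec in records:
        by_value[rec[factor]][rec["jumped"]] += 1
    explained = sum(max(counts.values()) for counts in by_value.values())
    return explained / len(records)

# Made-up records stored after each executed branch (illustrative only).
records = [
    {"type": 0b001, "stack_pointer": 0x100, "jumped": True},
    {"type": 0b001, "stack_pointer": 0x200, "jumped": False},
    {"type": 0b001, "stack_pointer": 0x100, "jumped": True},
    {"type": 0b001, "stack_pointer": 0x200, "jumped": False},
]
scores = {name: influence_factor(records, name) for name in ("type", "stack_pointer")}
```

In this made-up sample the stack pointer separates the jumping and non-jumping executions while the type does not, so the stack pointer would receive the larger influence factor and be the better candidate for inclusion in the prediction information.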
In this embodiment, the prediction information of the branch instruction is generated according to the control information, so that in practice the factors with a larger influence on the execution of the branch instruction can be selected from the multiple influencing factors according to specific requirements, while the factors with a smaller influence can be discarded. This can further improve the accuracy of branch prediction.
In summary, the technical solution provided by the embodiments of the present invention retrieves the PHT using the hash value of the prediction information of the branch instruction, obtained through a HASH function. Because a HASH function can compress a message of arbitrary length into a short message of fixed length, there is no need to worry that an overly long prediction information bit string will make the PHT too large. A more comprehensive set of factors affecting the branch instruction can therefore be considered, which can improve the accuracy of branch prediction.
The method embodiments of the present invention are described above, and the apparatus embodiments of the present invention are described below. It should be understood that the description of the apparatus embodiments corresponds to the description of the method embodiments; for content not described in detail, refer to the method embodiments above. For brevity, details are not repeated here.
FIG. 5 is a schematic block diagram of an apparatus for branch prediction according to an embodiment of the present invention. The apparatus includes the following units.
An obtaining unit 510, configured to acquire prediction information of a branch instruction, where the prediction information includes at least the value of the program counter of the branch instruction and the value of the branch history register of the branch instruction.
A calculating unit 520, configured to obtain a hash value of the prediction information through a hash function; and
A retrieving unit 530, configured to retrieve a pattern history table according to the hash value to obtain a prediction result of the branch instruction.
Optionally, in some embodiments, the prediction information further includes type information of the branch instruction.
Optionally, in some embodiments, the prediction information further includes the program counter values of the n branch instructions executed before the branch instruction, where n is a positive integer.
Optionally, in some embodiments, the prediction information further includes type information of the n branch instructions.
Optionally, in some embodiments, the prediction information further includes the stack pointer of the processor and/or the value of the condition register of the branch instruction, where the value of the condition register indicates the jump result of the branch instruction.
Optionally, in some embodiments, the prediction information further includes at least one of the following: the type information of the branch instruction, the program counter values of the n branch instructions executed before the branch instruction, the type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register.
Optionally, in some embodiments, the value of the program counter of the branch instruction is 32 bits or 64 bits.
Optionally, in some embodiments, the obtaining unit is further configured to select the hash function from a plurality of candidate hash functions, the selected hash function being the candidate whose message digests are most uniform.
Optionally, in some embodiments, the length of the message digest of the hash function is determined according to at least one of the following: system requirements, processor scale, and program size.
Optionally, in some embodiments, the hash function is a uniform hash function.
Optionally, in some embodiments, the apparatus further includes:
An updating unit, configured to update the pattern history table according to the actual jump result of the branch instruction.
As shown in FIG. 6, an embodiment of the present invention further provides a branch predictor, including a memory 620 and a processor 610. The memory 620 is configured to store instructions, and the processor 610 is configured to execute the instructions stored in the memory 620; execution of the instructions stored in the memory 620 causes the processor 610 to perform the method of the above method embodiments.
Optionally, as shown in FIG. 6, in some embodiments, the branch predictor further includes at least one register 630, configured to store the prediction information of the branch instruction.
It should be understood that the branch predictor includes a BHR 640.
An embodiment of the present invention further provides a computer storage medium storing a computer program that, when executed by a computer, causes the computer to perform the method of the above method embodiments.
An embodiment of the present invention further provides a computer program product containing instructions that, when executed by a computer, cause the computer to perform the method of the above method embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation other divisions are possible, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims (25)

  1. A branch prediction method, characterized by comprising:
    obtaining prediction information of a branch instruction, the prediction information including at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction;
    obtaining a hash value of the prediction information by means of a hash function; and
    searching a pattern history table according to the hash value to obtain a prediction result for the branch instruction.
  2. The method according to claim 1, characterized in that the prediction information further includes type information of the branch instruction.
  3. The method according to claim 1 or 2, characterized in that the prediction information further includes values of the program counters of n branch instructions executed before the branch instruction, n being a positive integer.
  4. The method according to claim 3, characterized in that the prediction information further includes type information of the n branch instructions.
  5. The method according to any one of claims 1 to 4, characterized in that the prediction information further includes a stack pointer of a processor and/or a value of a condition register of the branch instruction, the value of the condition register being used to indicate a jump result of the branch instruction.
  6. The method according to claim 1, characterized in that the prediction information further includes at least one of the following: type information of the branch instruction, values of the program counters of n branch instructions executed before the branch instruction, type information of the n branch instructions, a stack pointer of a processor, and a value of a condition register.
  7. The method according to any one of claims 1 to 6, characterized in that the value of the program counter of the branch instruction is a 32-bit or 64-bit value.
  8. The method according to any one of claims 1 to 7, characterized in that the hash function is a uniform hash function.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    selecting the hash function from a plurality of candidate hash functions, the hash function being the one with the best uniformity among the plurality of candidate hash functions.
  10. The method according to any one of claims 1 to 8, characterized in that the length of the message digest of the hash function is determined according to at least one of the following: system requirements, processor scale, and program size.
  11. The method according to any one of claims 1 to 10, characterized in that the method further comprises:
    updating the pattern history table according to an actual jump result of the branch instruction.
  12. A branch prediction apparatus, characterized by comprising:
    an obtaining unit, configured to obtain prediction information of a branch instruction, the prediction information including at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction;
    a calculating unit, configured to obtain a hash value of the prediction information by means of a hash function; and
    a retrieval unit, configured to search a pattern history table according to the hash value to obtain a prediction result for the branch instruction.
  13. The apparatus according to claim 12, characterized in that the prediction information further includes type information of the branch instruction.
  14. The apparatus according to claim 12 or 13, characterized in that the prediction information further includes values of the program counters of n branch instructions executed before the branch instruction, n being a positive integer.
  15. The apparatus according to claim 14, characterized in that the prediction information further includes type information of the n branch instructions.
  16. The apparatus according to any one of claims 12 to 15, characterized in that the prediction information further includes a stack pointer of a processor and/or a value of a condition register of the branch instruction, the value of the condition register being used to indicate a jump result of the branch instruction.
  17. The apparatus according to claim 12, characterized in that the prediction information further includes at least one of the following: type information of the branch instruction, values of the program counters of n branch instructions executed before the branch instruction, type information of the n branch instructions, a stack pointer of a processor, and a value of a condition register.
  18. The apparatus according to any one of claims 12 to 17, characterized in that the value of the program counter of the branch instruction is a 32-bit or 64-bit value.
  19. The apparatus according to any one of claims 12 to 18, characterized in that the obtaining unit is further configured to select the hash function from a plurality of candidate hash functions, the hash function being the one whose message digests have the best uniformity among the plurality of candidate hash functions.
  20. The apparatus according to any one of claims 12 to 18, characterized in that the length of the message digest of the hash function is determined according to at least one of the following: system requirements, processor scale, and program size.
  21. The apparatus according to any one of claims 12 to 20, characterized in that the hash function is a uniform hash function.
  22. The apparatus according to any one of claims 12 to 21, characterized in that the apparatus further comprises:
    an updating unit, configured to update the pattern history table according to an actual jump result of the branch instruction.
  23. A branch predictor, characterized by comprising a memory and a processor, the memory being configured to store instructions and the processor being configured to execute the instructions stored in the memory, wherein execution of the instructions stored in the memory causes the processor to perform the method according to any one of claims 1 to 11.
  24. A computer storage medium, characterized in that a computer program is stored thereon, and the computer program, when executed by a computer, causes the computer to perform the method according to any one of claims 1 to 11.
  25. A computer program product comprising instructions, characterized in that the instructions, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 11.
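
As an informal illustration only (not part of the claims), the method of claims 1 and 11 can be sketched in Python. The sketch assumes details the claims leave open: a 1024-entry pattern history table of two-bit saturating counters, an 8-bit branch history register, and BLAKE2b from Python's hashlib as a software stand-in for the hash function; a hardware predictor would likely use a far cheaper hash. The names pht_index and HashedPredictor are hypothetical.

```python
import hashlib

PHT_BITS = 10  # assumption: 2**10 pattern history table entries


def pht_index(pc: int, bhr: int, bits: int = PHT_BITS) -> int:
    """Hash the prediction information (PC and BHR values) to a table index."""
    data = pc.to_bytes(8, "little") + bhr.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    # Keep only the low `bits` bits of the digest as the index.
    return int.from_bytes(digest, "little") % (1 << bits)


class HashedPredictor:
    """Pattern history table indexed by a hash of PC and branch history."""

    def __init__(self, bits: int = PHT_BITS, history_bits: int = 8):
        self.bits = bits
        self.history_mask = (1 << history_bits) - 1
        self.bhr = 0  # branch history register, newest outcome in bit 0
        # Assumption: two-bit saturating counters, initialised to 1
        # ("weakly not taken"); values 2 and 3 predict taken.
        self.pht = [1] * (1 << bits)

    def predict(self, pc: int) -> bool:
        """Look up the table entry selected by the hash; True means taken."""
        return self.pht[pht_index(pc, self.bhr, self.bits)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train on the actual outcome, then shift it into the history."""
        i = pht_index(pc, self.bhr, self.bits)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.history_mask
```

For a branch that is always taken, the history register settles to all ones after a few updates, the counter selected by that steady-state history saturates, and predict then returns True for that program counter.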
PCT/CN2018/081057 2018-03-29 2018-03-29 Branch prediction method and device WO2019183877A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880011003.1A CN110462587A (en) 2018-03-29 2018-03-29 The method and apparatus of branch prediction
PCT/CN2018/081057 WO2019183877A1 (en) 2018-03-29 2018-03-29 Branch prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/081057 WO2019183877A1 (en) 2018-03-29 2018-03-29 Branch prediction method and device

Publications (1)

Publication Number Publication Date
WO2019183877A1 (en) 2019-10-03

Family

ID=68062062

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/081057 WO2019183877A1 (en) 2018-03-29 2018-03-29 Branch prediction method and device

Country Status (2)

Country Link
CN (1) CN110462587A (en)
WO (1) WO2019183877A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113504943B (en) * 2021-09-03 2021-12-14 广东省新一代通信与网络创新研究院 Method and system for implementing hybrid branch prediction device for reducing resource usage

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101763248A (en) * 2008-12-25 2010-06-30 世意法(北京)半导体研发有限责任公司 System and method for multi-mode branch predictor
CN102160033A (en) * 2008-09-05 2011-08-17 超威半导体公司 Hybrid branch prediction device with sparse and dense prediction caches
CN102566974A (en) * 2012-01-14 2012-07-11 哈尔滨工程大学 Instruction acquisition control method based on simultaneous multithreading
CN105320519A (en) * 2014-07-25 2016-02-10 想象技术有限公司 Conditional branch prediction using a long history

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102053818B (en) * 2009-11-05 2014-07-02 无锡江南计算技术研究所 Branch prediction method and device as well as processor
CN105718241B (en) * 2016-01-18 2018-03-13 北京时代民芯科技有限公司 A kind of sort-type mixed branch forecasting system based on SPARC V8 architectures

Also Published As

Publication number Publication date
CN110462587A (en) 2019-11-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18913192

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18913192

Country of ref document: EP

Kind code of ref document: A1