WO2019183877A1

WO2019183877A1 - Branch prediction method and device

Info

Publication number: WO2019183877A1
Application number: PCT/CN2018/081057
Authority: WO
Inventors: 麻军平; 韩彬; 吴迪
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2018-03-29
Filing date: 2018-03-29
Publication date: 2019-10-03
Also published as: CN110462587A

Abstract

Provided are a branch prediction method and device. The method comprises: acquiring prediction information of a branch instruction, the prediction information including at least a program counter value of the branch instruction and a value of a branch history register; acquiring a HASH value of the prediction information by means of a HASH function; and searching a pattern history table according to the HASH value to acquire a prediction result of the branch instruction. Since a HASH function can compress a message of any length into a short message of a fixed length, there is no need to worry that the pattern history table becomes too large because the bit length of the prediction information is too long. Therefore, the prediction information can include information with a long bit length so as to further improve the accuracy of branch prediction.

Description

Branch prediction method and device

Copyright statement

The disclosure of this patent document contains material that is subject to copyright protection. This copyright is the property of the copyright holder. The copyright owner has no objection to the reproduction of the patent document or the patent disclosure in the official records and files of the Patent and Trademark Office.

Technical field

The present invention relates to the field of processors and, more particularly, to a method and apparatus for branch prediction.

Background technique

Processors play an important role in the modern electronics industry, and processor design is a high-end technology in the microelectronics industry. Processor designs generally adopt a multi-stage pipeline structure to increase the operating frequency of the processor, thereby improving its performance. High-performance processors often have pipelines more than ten stages deep, and some exceed twenty stages.

Branch instructions are instructions that change the flow of a program and are very common in programs. If the branch condition of a branch instruction is true, execution jumps instead of continuing with the next sequential instruction.

The pipelined design of the processor increases its operating frequency, but it also causes the pipeline to be emptied when a branch instruction jumps. After the pipeline is emptied, instructions are re-read from the target address, which re-initializes the pipeline. Re-initializing the pipeline can stall it for a long time. Since branch instructions are very common in programs, the pipeline emptying they cause severely affects processor performance. In response to this problem, branch prediction techniques have been proposed. Branch prediction plays an important role in improving processor performance.

The basic principle of branch prediction is that, when an instruction is read, a judgment is made on the branch instruction to determine whether it jumps and what the jump target is, and the instruction to read next is then determined according to that judgment. In actual execution, branch prediction judges the execution status of the current branch instruction (i.e., whether it jumps) based on the historical execution of branch instructions in the program.

Currently, the Gshare predictor is a widely used branch prediction technology. Gshare prediction achieves high accuracy, usually above 90%, but this accuracy is difficult to improve further. In fact, even a one-percentage-point increase in branch prediction accuracy greatly improves the performance of the processor.

Therefore, it is necessary to propose a branch prediction technique that can further improve the accuracy.

Summary of the invention

The present invention provides a method and apparatus for branch prediction, which can further improve the accuracy of branch prediction with respect to the prior art.

In a first aspect, a method for branch prediction is provided, the method comprising: acquiring prediction information of a branch instruction, the prediction information including at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction; obtaining a hash value of the prediction information by a hash function; and retrieving a pattern history table according to the hash value to obtain a prediction result of the branch instruction.

In the existing Gshare branch prediction technique, because the area of the PHT is limited, only the lower bits of the PC value of the branch instruction (i.e., the address of the branch instruction) can be used for prediction. If different branch instructions have PC values whose lower bits are equal, their entries in the PHT may conflict.

In the present invention, the PHT is retrieved using the hash value of the prediction information of the branch instruction, obtained by the HASH function. Because the HASH function can compress a message of any length into a short message of a fixed length, there is no need to worry that a long prediction information bit length will make the PHT too large. The complete PC value of the branch instruction can therefore be included in the prediction information. Using the complete PC value avoids, to some extent, the problem of different branch instructions corresponding to the same PHT entry, which reduces conflicts among PHT entries and thereby improves the accuracy of the branch prediction.

Therefore, because the present invention retrieves the PHT using the hash value of the prediction information obtained by the HASH function, there is no need to worry that a long prediction information bit length will make the PHT too large. Compared with existing branch prediction techniques, richer prediction information can be provided, further improving the accuracy of branch prediction.

In a second aspect, an apparatus for branch prediction is provided, the apparatus comprising the following units:

an obtaining unit, configured to acquire prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction;

a calculating unit, configured to obtain a hash value of the prediction information by using a hash function; and

a retrieving unit, configured to retrieve a pattern history table according to the hash value to obtain a prediction result of the branch instruction.

In a third aspect, a branch predictor is provided, the branch predictor comprising a memory and a processor, the memory being configured to store instructions and the processor being configured to execute the instructions stored in the memory, where execution of the instructions stored in the memory causes the processor to perform the method provided by the first aspect.

In a fourth aspect, a computer storage medium is provided having stored thereon a computer program which, when executed by a computer, causes the computer to perform the method provided by the first aspect.

In a fifth aspect, a computer program product comprising instructions is provided; when executed by a computer, the instructions cause the computer to perform the method provided by the first aspect.

Drawings

FIG. 1 is a schematic diagram of a multi-stage pipeline of a processor.

FIG. 2 is a schematic diagram of the principle of an existing branch prediction technique.

FIG. 3 is a schematic flowchart of a method for branch prediction according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of the principle of branch prediction according to an embodiment of the present invention.

FIG. 5 is a schematic block diagram of an apparatus for branch prediction according to an embodiment of the present invention.

FIG. 6 is a schematic block diagram of a branch predictor according to an embodiment of the present invention.

Detailed description

In order to facilitate understanding of the technical solutions provided by the embodiments of the present invention, some concepts related to the embodiments of the present invention are first described below.

1) Pipeline design of the processor.

In modern processor design, pipeline technology is generally adopted to raise the operating frequency of the processor; that is, the processor is divided into a multi-stage pipeline. FIG. 1 is a schematic diagram of a common pipeline division. In FIG. 1, the processor is divided into a six-stage pipeline consisting, in order, of: address calculation, instruction reading, instruction distribution, instruction decoding, instruction execution, and register file. The instructions in the program memory also enter the processor in the pipelined manner shown in FIG. 1.

2) Branch instructions.

A branch instruction is an instruction that changes the flow of a program. If the branch condition of the branch instruction is true, execution jumps instead of continuing with the next sequential instruction.

For example, branch instructions include, but are not limited to, absolute jump instructions, conditional jump instructions, function call instructions, and function return instructions.

3) The effect of branch instructions on the pipeline of the processor.

Assume that the pipeline of the processor is as shown in Figure 1. The instructions in the program memory also enter the processor in the same manner as the pipeline shown in Figure 1. If there is a jump instruction in the instruction stored in the program memory, the jump instruction will only jump when it reaches the level of "instruction execution". At this point, the processor needs to clear the first four pipeline stages, and recalculate the fetch address and read the instruction. Emptying the pipeline can have a significant impact on processor performance.

For this problem, branch prediction techniques have been proposed.

4) Branch prediction technology.

For the problem described above, if the processor can discover the branch instruction in advance at the "address calculation" stage and predict whether it jumps, the processor performance loss caused by clearing the pipeline can be avoided. As shown in FIG. 1, the purpose of branch prediction is to discover the branch instruction and predict whether it will jump before the first stage of the pipeline is executed.

Before the branch prediction technique is described in detail, several concepts are introduced.

(1) Program counter (PC).

The program counter is a register in the computer processor that is used to store the address (location) of the instruction currently being executed. In other words, the contents of the program counter are the address of the instruction currently being executed. When each instruction is fetched, the address stored in the program counter is incremented by one.

The program counter (PC) value referred to in this document is the address of the branch instruction currently being executed.

(2) Branch History Register (BHR).

The BHR is a multi-bit shift register. If a branch instruction jumps, a 1-bit "1" is shifted into the lowest bit of the BHR; if a branch instruction does not jump, a 1-bit "0" is shifted into the lowest bit of the BHR.
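The shift behaviour described above can be illustrated with a minimal sketch. The 10-bit register width and the function name `update_bhr` are assumptions made for illustration only; the text says only that the BHR is "multi-bit".

```python
BHR_BITS = 10  # assumed width; the text only states that the BHR is multi-bit


def update_bhr(bhr: int, jumped: bool, bits: int = BHR_BITS) -> int:
    """Shift the newest branch outcome into the lowest bit of the BHR."""
    mask = (1 << bits) - 1
    # The oldest outcome falls off the top; the newest lands in bit 0.
    return ((bhr << 1) | int(jumped)) & mask


bhr = 0
for outcome in (True, True, False):  # jump, jump, no jump
    bhr = update_bhr(bhr, outcome)
print(format(bhr, "010b"))  # prints 0000000110
```

Starting from an all-zero register, the sequence jump, jump, no jump leaves the value 0b00_0000_0110, the same BHR value used in the examples later in this description.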

(3) Pattern History Table (PHT).

The PHT is used to record the history of whether branch instructions jumped. Typically, the PHT is a table of 2-bit counters. Each entry of the PHT is 2 bits, indicating the historical execution of a branch instruction.

The number of entries included in the PHT is determined by the number of bits of the PHT address. Assuming that the address of the PHT is N bits (N is a positive integer), the PHT includes 2^N entries.

The number of entries included in the PHT may also be referred to as the area of the PHT. The area of the PHT mentioned hereinafter refers to the number of entries included in the PHT.
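As a small sketch of the relationship between address width and table size, the PHT can be modelled as a list of 2-bit counters. The helper name `make_pht` and the initial counter value "01" are assumptions for illustration; the text does not state how entries are initialised.

```python
def make_pht(address_bits: int, initial: int = 0b01) -> list[int]:
    """Build a PHT with 2^N two-bit entries for an N-bit address."""
    if not 0 <= initial <= 0b11:
        raise ValueError("each PHT entry holds a 2-bit value")
    return [initial] * (1 << address_bits)


pht = make_pht(10)
print(len(pht))  # prints 1024
```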

The branch prediction technique refers to detecting a branch instruction when instructions are read, determining whether it jumps and what the jump target is, and then determining which instruction to read next according to the judgment result.

Currently, the Gshare (Global history with Index Sharing) branch prediction technology (also known as the Gshare predictor) is a widely used branch prediction technology. The Gshare predictor XORs the lower bits of the PC value of the branch instruction with the bits recorded in the BHR and uses the result to retrieve the PHT, in order to predict whether the current branch instruction jumps. Specifically, the XOR result is regarded as an address, an entry of the PHT is located according to that address, and the jump result of the branch instruction is predicted according to the 2-bit value in the entry. The basic principle of the Gshare predictor is shown in FIG. 2.

In practice, the PHT cannot be made very large because of its area; for example, the PHT may be a table with a 10-bit address. For a PHT with a 10-bit address, only the lower 10 bits of the PC value of the branch instruction can be used to retrieve the PHT; that is, the lower 10 bits of the PC value are XORed with the bits recorded in the BHR, and the XOR result is used to retrieve the PHT. This is why the Gshare predictor uses only the lower bits of the PC value of the branch instruction.
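The Gshare indexing just described, and the conflict it can produce, can be sketched as follows. The function name `gshare_index` and the second PC value 0x42c are hypothetical, chosen only so that the two PC values differ above bit 9.

```python
PHT_ADDR_BITS = 10  # the 10-bit PHT address used as the example in the text


def gshare_index(pc: int, bhr: int, bits: int = PHT_ADDR_BITS) -> int:
    """XOR the lower bits of the PC value with the BHR bits to form a PHT address."""
    mask = (1 << bits) - 1
    return (pc & mask) ^ (bhr & mask)


bhr = 0b00_0000_0110
index_a = gshare_index(0x2C, bhr)
index_b = gshare_index(0x42C, bhr)  # hypothetical PC, same lower 10 bits as 0x2c
print(index_a, index_b)  # prints 42 42
```

Because only the lower 10 bits of the PC value take part, the two different branch addresses select the same PHT entry.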

The prediction accuracy of the Gshare branch prediction technology can often exceed 90%, and the technology is widely used in processor design. However, the accuracy of the Gshare predictor is difficult to improve further. In modern processors, pipelines are deep and branch instructions are numerous. If the branch prediction accuracy is increased by 1 percentage point, the performance of the processor is greatly improved.

In response to the above requirements, embodiments of the present invention provide a method and apparatus for branch prediction, which can further improve the accuracy of branch prediction with respect to existing branch prediction techniques.

FIG. 3 is a schematic flowchart of a method for branch prediction according to an embodiment of the present invention. For example, the method can be performed by a branch predictor. As shown in FIG. 3, the method includes the following steps.

S310. Acquire prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction (hereinafter referred to as a value of a PC) and a value of a branch history register of the branch instruction (hereinafter referred to as a value of BHR).

The value of the PC of the branch instruction refers to the address of the branch instruction.

It should be understood that the BHR is a multi-bit shift register. If a branch instruction jumps, a 1-bit "1" is shifted into the lowest bit of the BHR; if a branch instruction does not jump, a 1-bit "0" is shifted into the lowest bit of the BHR. The value of the BHR of the branch instruction refers to the current value recorded in the BHR when it is that branch instruction's turn to be predicted.

It should be noted that the prediction information in this embodiment includes the value of the PC directly corresponding to the branch instruction and the value of the BHR, and is not a value obtained by XORing the value of the PC with the value of the BHR. For example, if the value of the PC of the branch instruction is "0x2c" and the value of the BHR of the branch instruction is "0b00_0000_0110", the prediction information of the branch instruction is:

"0x2c;

0b00_0000_0110".

S320. Obtain a hash value of the prediction information by a hash function (hereinafter referred to as a HASH function).

The HASH function is also called a hash function. The HASH function can convert a message of any length into a short message of a fixed length, and can be thought of as a mapping that compresses long messages into short messages. The short message obtained by the HASH function compression may be referred to as a message digest.

The hash value of the prediction information obtained by the HASH function in this embodiment refers to the message digest obtained by hashing the prediction information as a variable in the HASH function.

It should be noted that the length of the message digest of the HASH function in this embodiment is equal to the number of bits of the address of the PHT. For example, if the PHT is a 10-bit address table, the message digest of the HASH function is also 10 bits in length. In other words, the bit number of the hash value of the prediction information of the branch instruction is equal to the number of bits of the address of the PHT.

As an example, assuming that the PHT is a table of 10-bit addresses, the message digest of the HASH function is also 10 bits in length. For example, the prediction information of the branch instruction acquired in S310 is:

"0x2c;

0b00_0000_0110",

In S320, the prediction information is used as a variable of the HASH function, and a 10-bit message digest is obtained through a hash operation, that is, a hash value of the prediction information is obtained, and the hash value has a length of 10 bits.
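Steps S310 and S320 can be sketched as below. The text does not specify a particular HASH function, so XOR-folding is swapped in here purely as an assumed stand-in; the function name `fold_hash`, the 32-bit PC field width, and the second PC value 0x42c are likewise hypothetical.

```python
def fold_hash(fields, digest_bits: int = 10) -> int:
    """Concatenate (value, width) fields into one message, then XOR-fold it.

    XOR-folding is only an assumed example of a fixed-length digest;
    the description leaves the concrete HASH function open.
    """
    message = 0
    for value, width in fields:
        message = (message << width) | (value & ((1 << width) - 1))
    mask = (1 << digest_bits) - 1
    digest = 0
    while message:
        digest ^= message & mask   # fold in the next digest_bits-wide chunk
        message >>= digest_bits
    return digest


bhr = 0b00_0000_0110
h1 = fold_hash([(0x2C, 32), (bhr, 10)])   # complete PC value, then the BHR value
h2 = fold_hash([(0x42C, 32), (bhr, 10)])  # hypothetical PC with the same lower 10 bits
print(h1, h2)  # prints 42 43
```

In this sketch the complete PC value contributes to the digest, so two PC values that share their lower 10 bits can still map to different PHT addresses. Any hash can still produce collisions for other inputs.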

S330. Search a pattern history table (hereinafter referred to as PHT) according to the hash value of the prediction information to obtain a prediction result of the branch instruction.

Specifically, the hash value is regarded as an address, the PHT is looked up to locate an entry, and the jump result of the branch instruction is then predicted according to the 2-bit value stored in the entry.

Assume that "00" and "01" are defined to indicate no jump, and that "10" and "11" are defined to indicate a jump. When the value in the entry located according to the hash value is "10", the jump result of the branch instruction may be predicted to be a jump; when the value in the entry located according to the hash value is "01", the jump result of the branch instruction may be predicted to be no jump.
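Under the encoding assumed above ("00" and "01" mean no jump, "10" and "11" mean jump), the lookup in S330 can be sketched as follows. The function name `predict_jump` and the sample table contents are hypothetical.

```python
def predict_jump(pht: list[int], hash_value: int) -> bool:
    """Use the hash value as the PHT address and read the 2-bit entry."""
    entry = pht[hash_value]
    return entry >= 0b10  # "10" and "11" predict a jump


pht = [0b01] * 1024   # sample table for a 10-bit address
pht[42] = 0b10        # hypothetical state of one entry
print(predict_jump(pht, 42), predict_jump(pht, 43))  # prints True False
```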

It should be understood that after obtaining the prediction result of the branch instruction, the processor can process the branch instruction in accordance with the pipeline shown in FIG. 1.

It should also be understood that the method further includes updating the PHT based on the actual jump result of the branch instruction.

Taking the pipeline shown in FIG. 1 as an example, the actual jump result of the branch instruction is known when the "instruction execution" stage is reached. If the actual jump result of the branch instruction is consistent with the prediction result in S330, the PHT is not updated. Otherwise, the PHT is updated according to the actual jump result of the branch instruction; that is, the 2-bit value in the entry located according to the hash value is updated.
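The update rule can be sketched as follows. The text states only that the 2-bit value is updated when the prediction was wrong; moving the counter one saturating step toward the actual outcome is an assumption borrowed from common 2-bit counter practice, and the function name `update_pht` is hypothetical.

```python
def update_pht(pht: list[int], hash_value: int, actually_jumped: bool) -> None:
    """Update the located entry only when the prediction was wrong."""
    entry = pht[hash_value]
    predicted_jump = entry >= 0b10
    if predicted_jump == actually_jumped:
        return  # prediction matched the actual result: PHT is not updated
    # Assumed policy: move one step toward the actual outcome, staying within 2 bits.
    if actually_jumped:
        pht[hash_value] = min(entry + 1, 0b11)
    else:
        pht[hash_value] = max(entry - 1, 0b00)


pht = [0b01] * 1024
update_pht(pht, 42, actually_jumped=True)   # mispredicted: "01" becomes "10"
update_pht(pht, 42, actually_jumped=True)   # now predicted correctly: unchanged
print(format(pht[42], "02b"))  # prints 10
```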

In the present invention, the PHT is retrieved using the hash value of the prediction information of the branch instruction, obtained by the HASH function. Because the HASH function can compress a message of any length into a short message of a fixed length, there is no need to worry that a long prediction information bit length will make the PHT too large. The complete PC value of the branch instruction can therefore be included in the prediction information. Using the complete PC value avoids, to some extent, the problem of different branch instructions corresponding to the same PHT entry, which reduces conflicts among PHT entries and can improve the accuracy of the branch prediction.

Therefore, because the present invention retrieves the PHT using the hash value of the prediction information obtained by the HASH function, there is no need to worry that a long prediction information bit length will make the PHT too large. Compared with existing branch prediction techniques, richer prediction information can be provided, which can further improve the accuracy of branch prediction.

Furthermore, the present invention uses the hash value of the prediction information of the branch instruction obtained by the HASH function to retrieve the PHT, so the size of the PHT can be flexibly designed by setting the length of the message digest of the HASH function.

Optionally, the design of the HASH function in the present invention may use common logical operations, such as "AND", "OR", "XOR", etc., and may also use table lookup replacement, shifting, and the like.
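As one possible illustration of combining these operations, the sketch below mixes a table lookup replacement, a shift (here a rotate), and XOR. The substitution table `SBOX`, the rotate amount, and the function name `mix_hash` are all arbitrary assumptions and are not taken from the description.

```python
# Hypothetical 4-bit substitution table (an arbitrary permutation of 0..15).
SBOX = [0x6, 0xB, 0x0, 0x4, 0xD, 0x9, 0xF, 0x2,
        0x7, 0xE, 0x1, 0xA, 0x5, 0xC, 0x3, 0x8]


def mix_hash(pc: int, bhr: int, digest_bits: int = 10) -> int:
    """Fold the PC value into the BHR value using lookup, rotate, and XOR.

    Assumes digest_bits > 3 so that the rotate amount is valid.
    """
    mask = (1 << digest_bits) - 1
    digest = bhr & mask
    while pc:
        nibble = SBOX[pc & 0xF]  # table lookup replacement
        # Rotate the digest left by 3 bits within digest_bits (shifting).
        digest = ((digest << 3) | (digest >> (digest_bits - 3))) & mask
        digest ^= nibble         # XOR
        pc >>= 4
    return digest


print(mix_hash(0x2C, 0b00_0000_0110))  # prints 424
```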

Optionally, the HASH function can be designed so that the message digests it outputs are as scattered as possible.

It should be understood that when the message digests of the HASH function are well scattered, the hash values of different prediction information are, to some extent, guaranteed to differ. This effectively avoids blank entries in the PHT, that is, it improves the utilization of the PHT, and at the same time reduces conflicts between entries in the PHT.

Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes the value of the complete PC of the branch instruction.

Specifically, the value of the PC of the branch instruction is 32 bits or 64 bits.

The prediction information used in this embodiment includes the complete PC value of the branch instruction. Compared with using only the lower bits of the address of the branch instruction, as in the prior art, this can further improve the accuracy of the prediction result of the branch instruction.

Different design methods can be employed for the HASH function used in the present invention.

Optionally, in some embodiments, the HASH function may be parameterized and configurable, so that when a given program is executed, the parameters of the HASH function are adjusted to make the message digests as uniform as possible.

Optionally, in some embodiments, the method further includes: selecting the hash function from a plurality of candidate hash functions, where the selected hash function is the one whose message digests are the most uniform among the plurality of candidate hash functions.

For example, multiple HASH functions can be designed, and when a given program is executed, the function with the best message digest uniformity is selected from among them.
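A minimal sketch of such a selection follows. The description does not define how uniformity is measured, so the load of the fullest bucket over a sample of (PC, BHR) pairs is used here as an assumed metric; the two candidate functions, the 4-bit digest, and the sample PC values are hypothetical.

```python
from collections import Counter


def low_bits(pc: int, bhr: int, bits: int = 4) -> int:
    """Candidate 1: XOR of the lower bits only."""
    return (pc ^ bhr) & ((1 << bits) - 1)


def fold(pc: int, bhr: int, bits: int = 4) -> int:
    """Candidate 2: XOR-fold the complete PC value together with the BHR value."""
    mask = (1 << bits) - 1
    message, digest = (pc << bits) | (bhr & mask), 0
    while message:
        digest ^= message & mask
        message >>= bits
    return digest


def fullest_bucket(hash_fn, samples) -> int:
    """Assumed uniformity metric: smaller means digests are spread more evenly."""
    counts = Counter(hash_fn(pc, bhr) for pc, bhr in samples)
    return max(counts.values())


def select_hash(candidates, samples):
    return min(candidates, key=lambda fn: fullest_bucket(fn, samples))


# Hypothetical sample: eight PC values whose lower 4 bits are all zero.
samples = [(0x10 * k, 0) for k in range(1, 9)]
best = select_hash([low_bits, fold], samples)
print(best.__name__)  # prints fold
```

On this sample the first candidate sends every branch to digest 0, while the second spreads them over eight different digests, so the second is selected.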

Optionally, in some embodiments, the length of the message digest of the HASH function is determined according to at least one of the following: system requirements, processor size, and program size.

As an example, the process of determining the length of the message digest of the HASH function according to the system requirement is: when the system can tolerate a decrease in the prediction accuracy caused by a small area of the PHT, the message digest can be designed to be shorter, correspondingly, PHT will also become smaller; conversely, when the system requires higher prediction accuracy, the message digest can be designed to be longer, and accordingly, the size of the PHT will also become larger.

It should be understood that there are many types of processors, ranging from small microcontrollers to general embedded processors, as well as large processors for high performance computing. The scales between these different types of processors vary widely.

Embedded processors pursue small area and low power consumption and need a small PHT, so the message digest needs to be designed to be shorter.

High performance computing processors pursue high performance and are not sensitive to area or power consumption, so they can use a large PHT and the message digest can be designed to be longer.

In this embodiment, the length of the message digest of the HASH function can be set according to system requirements, processor size, or program size, so that the size of the PHT can be designed accordingly. This embodiment can therefore provide appropriate branch prediction accuracy based on specific application needs.
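The trade-off between digest length and PHT size is simple arithmetic, sketched below. The function name `pht_storage_bits` and the three digest lengths are illustrative assumptions; only the 2-bit entry width and the 2^N entry count come from the description.

```python
def pht_storage_bits(digest_bits: int, entry_bits: int = 2) -> int:
    """A digest of N bits addresses 2^N entries, each entry_bits wide."""
    return (1 << digest_bits) * entry_bits


for digest_bits in (8, 10, 14):  # hypothetical small, medium, and large designs
    print(digest_bits, pht_storage_bits(digest_bits))
# prints:
# 8 512
# 10 2048
# 14 32768
```

Each additional digest bit doubles the number of PHT entries, which is why a shorter digest suits an area-constrained design.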

Optionally, in some embodiments, the HASH function is a uniform HASH function.

When a HASH function maps a long message set to a short message set, if the message digest can be evenly distributed, there is little conflict, that is, different long messages correspond to different message digests. Such a HASH function is called a uniform HASH function.

It should be understood that the HASH function can evenly distribute the message digest, which is equivalent to uniformly distributing the hash value for retrieving the PHT, so that a large number of idle entries in the PHT can be largely avoided, that is, the utilization of the PHT is improved. In addition, conflicts between individual entries in the PHT can be effectively avoided.

With the solution provided by the present invention, when branch prediction is performed, more information closely related to the branch prediction can be collected as prediction information without worrying that the PHT is too large due to the bit length of the prediction information being too long. It should be understood that in branch prediction, the more information is used for prediction, the more accurate the prediction is.

In order to further improve the accuracy of the branch prediction, the prediction information of the branch instruction may include other factors affecting the execution of the branch instruction in addition to the value of the PC of the branch instruction and the value of the BHR.

Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes type information of the branch instruction in addition to the value of the PC of the branch instruction and the value of the BHR.

The type information of the branch instruction refers to information indicating the jump type of the branch instruction.

For example, when the branch instruction is an absolute jump instruction, the type of the branch instruction is defined as "0b000"; when the branch instruction is a function return instruction, the type of the branch instruction is defined as "0b010"; when the branch instruction is a conditional jump instruction, the type of the branch instruction is defined as "0b001".

As an example, if the value of the PC of the branch instruction is "0x2c", the value of the BHR of the branch instruction is "0b00_0000_0110", and the type information of the branch instruction is "0b001", the prediction information to be input into the HASH function is:

"0x2c

0b001

0b00_0000_0110".
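One way to present these three fields to a HASH function is to concatenate them into a single message, as sketched below. The field order follows the example above; the field widths (32-bit PC, 3-bit type, 10-bit BHR) and the function name `pack_prediction_info` are assumptions.

```python
def pack_prediction_info(pc: int, branch_type: int, bhr: int,
                         pc_bits: int = 32, type_bits: int = 3,
                         bhr_bits: int = 10) -> int:
    """Concatenate PC value, type information, and BHR value into one message."""
    message = pc & ((1 << pc_bits) - 1)
    message = (message << type_bits) | (branch_type & ((1 << type_bits) - 1))
    message = (message << bhr_bits) | (bhr & ((1 << bhr_bits) - 1))
    return message


msg = pack_prediction_info(0x2C, 0b001, 0b00_0000_0110)
print(hex(msg))  # prints 0x58406
```

The packed message is 45 bits wide under these assumed widths; the HASH function then compresses it to the bit width of the PHT address.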

In addition to the value of the PC of the branch instruction and the value of the BHR, the prediction information of this embodiment includes the type of the branch instruction. Compared with the prior art, the content of the prediction information is increased, which can further improve the accuracy of the branch prediction.

Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes, in addition to the value of the PC of the branch instruction and the value of the BHR, the values of the PCs of n branch instructions executed before the branch instruction, where n is a positive integer.

Optionally, in this embodiment, the prediction information further includes type information of the n branch instructions.

For example, n is equal to 3.

In addition to the value of the PC of the branch instruction and the value of the BHR, the prediction information of this embodiment includes the values of the PCs of one or more branch instructions executed before the branch instruction, and may further include the type information of those branch instructions. Compared with the prior art, the content used for predicting the branch instruction is richer, so the accuracy of the branch prediction can be further improved.
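Keeping track of the n most recent branch instructions can be sketched with a bounded queue. The class name `BranchHistory`, the tuple layout of the prediction information, and the PC values in the usage example are hypothetical.

```python
from collections import deque


class BranchHistory:
    """Remember the PC value and type of the n most recently executed branches."""

    def __init__(self, n: int = 3):
        self.recent = deque(maxlen=n)  # newest first; the oldest is dropped automatically

    def record(self, pc: int, branch_type: int) -> None:
        self.recent.appendleft((pc, branch_type))

    def prediction_info(self, pc: int, branch_type: int, bhr: int) -> tuple:
        """Collect the fields that would be fed to the HASH function."""
        return (pc, branch_type, *self.recent, bhr)


history = BranchHistory(n=3)
for pc, branch_type in [(0x100, 0b000), (0x104, 0b001), (0x108, 0b001), (0x10C, 0b010)]:
    history.record(pc, branch_type)  # hypothetical earlier branches
info = history.prediction_info(0x2C, 0b001, 0b00_0000_0110)
print(len(history.recent))  # prints 3
```

With n equal to 3, only the three most recent earlier branches are kept, so the oldest recorded branch no longer contributes to the prediction information.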

Optionally, in some embodiments, the prediction information of the branch instruction acquired in S310 includes, in addition to the value of the PC of the branch instruction and the value of the BHR, a stack pointer of the processor and/or the value of a condition register of the branch instruction, where the value of the condition register is used to indicate the result of the branch instruction.

The value of the condition register refers to the value stored by the condition register. The condition register of the branch instruction stores "1" or "0". The value of the condition register is used to indicate the actual jump result of the branch instruction.

It should be understood that the stack pointer is a ubiquitous concept in the processor and refers to a memory address. During the execution of the program, if the sub-function is called, some variables such as registers in the main program need to be written to the stack area to protect the register field inside the processor, and then the sub-function is executed.

In addition to the value of the PC of the branch instruction and the value of the BHR, the prediction information of this embodiment includes the stack pointer of the processor and/or the value of the condition register of the branch instruction. Compared with the prior art, the content used for predicting the branch instruction is richer, so the accuracy of the branch prediction can be further improved.

It should be noted that the foregoing embodiments concerning what the prediction information of the branch instruction may include can be combined in any manner, which is not limited by the present invention.

For example, the prediction information of the branch instruction includes the value of the PC of the branch instruction, the value of the BHR of the branch instruction, the type information of the branch instruction, and the values of the PCs and the type information of the n branch instructions executed before the branch instruction.

For another example, the prediction information of the branch instruction includes the value of the PC of the branch instruction, the value of the BHR of the branch instruction, the type information of the branch instruction, the values of the PCs and the type information of the n branch instructions executed before the branch instruction, the stack pointer of the processor, and the value of the condition register of the branch instruction.

FIG. 4 is a schematic diagram of the principle of performing branch prediction according to an embodiment of the present invention. The prediction information of the branch instruction includes a plurality of pieces of information: the value of the PC of the branch instruction, the value of the PC of the previous branch instruction, the value of the PC of the branch instruction before the previous one, the type information of the branch instruction, the type information of the previous branch instruction, the type information of the branch instruction before the previous one, ..., and the value of the BHR of the branch instruction, where "..." indicates other factors that affect the execution of the branch instruction.

The previous branch instruction refers to the last branch instruction executed before the branch instruction; the branch instruction before the previous one refers to the second-to-last branch instruction executed before the branch instruction.

The above information is used as a variable of the HASH function, and a hash operation is performed to obtain a hash value whose length is equal to the number of bits of the address of the PHT. The PHT is retrieved using the hash value to obtain a prediction result of the branch instruction.

In order to better understand the technical solution provided by the present invention, the following program is taken as an example for description.

In the above program, the second column is an instruction, and the first column is the value of the PC corresponding to the instruction.

call_ins2, br_ins4, br_ins11, and ret49 are four branch instructions, and the rest of the instructions are non-branch instructions.

Among them, call_ins2 is a function call instruction, which is an absolute jump instruction. Assume that the instruction type of an absolute jump instruction is defined as "0b000". br_ins4 and br_ins11 are conditional jump instructions. Assume that the instruction type of a conditional jump instruction is defined as "0b001". ret49 is a function return instruction. Assume that the instruction type of a function return instruction is defined as "0b010".

Assume that during a certain execution of the program, call_ins2 calls the subfunction (i.e., sub_function), ret49 returns from the subfunction, and br_ins4 does not jump, i.e., pc(08)->pc(c0)->pc(c4)->pc(0c)->pc(18)->pc(2c). Branch prediction is now required for the branch instruction br_ins11.

The steps of branch prediction of the branch instruction br_ins11 by using the method provided in the embodiment shown in FIG. 3 are as follows.

In S310, the prediction information of the branch instruction br_ins11 is obtained. The prediction information includes the value of the PC of br_ins11, the value of the PC of br_ins4 (the previous branch instruction of br_ins11), the value of the PC of ret49 (the branch instruction before the previous one), the type information of br_ins11, the type information of br_ins4, the type information of ret49, and the BHR value of br_ins11.

Specifically, the value of the PC of the branch instruction br_ins11 is "0x2c", the value of the PC of its previous branch instruction br_ins4 is "0x18", and the value of the PC of the branch instruction before the previous one, ret49, is "0xc4". The type information of br_ins11 is "0b001", the type information of br_ins4 is "0b001", and the type information of ret49 is "0b010".

Assume that the BHR is a 10-bit shift register. Each time a branch instruction jumps, the BHR is shifted left by one bit and a 1 is shifted into the lowest bit; each time a branch instruction does not jump, the BHR is shifted left by one bit and a 0 is shifted into the lowest bit. The initial value of the BHR is 0.

BHR=0b00_0000_0001 after pc(08) is executed; BHR=0b00_0000_0011 after pc(c4) is executed; BHR=0b00_0000_0110 after pc(18) is executed. Therefore, when br_ins11 needs to be predicted, BHR=0b00_0000_0110. That is, the value of the BHR of the branch instruction br_ins11 is "0b00_0000_0110".
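The BHR values above can be reproduced with a short simulation. The following is only a minimal sketch: the names BHR_BITS, update_bhr, and outcomes are illustrative and do not appear in the embodiment, and the shift direction (left shift, new outcome in the lowest bit) is inferred from the values listed above.

```python
BHR_BITS = 10                      # width of the shift register in this example
BHR_MASK = (1 << BHR_BITS) - 1     # keeps the register at 10 bits


def update_bhr(bhr: int, taken: bool) -> int:
    """Shift the history left by one bit and place the new outcome in bit 0."""
    return ((bhr << 1) | int(taken)) & BHR_MASK


# Branch outcomes in execution order for the path
# pc(08)->pc(c0)->pc(c4)->pc(0c)->pc(18)->pc(2c):
# call_ins2 jumps, ret49 jumps, br_ins4 does not jump.
outcomes = [("call_ins2", True), ("ret49", True), ("br_ins4", False)]

bhr = 0                            # initial value of the BHR
history = []
for name, taken in outcomes:
    bhr = update_bhr(bhr, taken)
    history.append((name, bhr))

for name, value in history:
    print(f"{name}: BHR=0b{value:010b}")
```

The final value of bhr is the BHR value used when br_ins11 is predicted.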

Then, the prediction information of the branch instruction br_ins11 that needs to be predicted can be expressed as:

0x2c

0x18

0xc4

0b001

0b001

0b010

......

0b00_0000_0110

The above "..." indicates that other information may be added to the prediction information according to actual needs, for example, the value of the condition register or the stack pointer of the processor may be added.

In S320, the prediction information of the branch instruction br_ins11 is used as the input of the HASH function, and a hash operation is performed to obtain a 10-bit message digest (i.e., the hash value of the prediction information).

In S330, the message digest is used to retrieve the PHT, and whether the branch instruction br_ins11 jumps is predicted according to the 2-bit value in the located entry.
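Steps S310 to S330 for br_ins11 can be sketched as follows. This is an illustration under several assumptions that the embodiment does not fix: the hash is a simple XOR-fold (the embodiment only requires some HASH function with a 10-bit digest), each PC field is treated as 32 bits wide, the PHT entries start at the value 1, and an entry value of 2 or 3 is read as "jump". The names pack_fields, fold_hash, and predict are hypothetical.

```python
PHT_INDEX_BITS = 10                 # digest length = number of PHT address bits
PHT_SIZE = 1 << PHT_INDEX_BITS


def pack_fields(fields):
    """Concatenate (value, bit_width) pairs into one integer, first field highest."""
    packed = 0
    for value, width in fields:
        packed = (packed << width) | (value & ((1 << width) - 1))
    return packed


def fold_hash(packed, out_bits=PHT_INDEX_BITS):
    """XOR-fold an arbitrarily long integer into out_bits bits (illustrative hash only)."""
    mask = (1 << out_bits) - 1
    digest = 0
    while packed:
        digest ^= packed & mask
        packed >>= out_bits
    return digest


def predict(pht, index):
    """Assumed convention: entry values 2 and 3 mean 'jump', 0 and 1 mean 'no jump'."""
    return pht[index] >= 2


# S310: prediction information of br_ins11 from the example (field widths are assumptions).
prediction_info = [
    (0x2C, 32), (0x18, 32), (0xC4, 32),       # PC of br_ins11, br_ins4, ret49
    (0b001, 3), (0b001, 3), (0b010, 3),       # type information of the same three
    (0b00_0000_0110, 10),                     # BHR value
]

pht = [1] * PHT_SIZE                          # assumed initial state of every entry

index = fold_hash(pack_fields(prediction_info))   # S320: 10-bit digest
predicted_jump = predict(pht, index)              # S330: look up the PHT entry
print(f"PHT index = {index:#05x}, predicted jump = {predicted_jump}")
```

Because the digest length is fixed by out_bits, adding further fields to prediction_info (for example the stack pointer) lengthens the input but leaves the PHT size unchanged.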

As can be seen from the above, compared with the existing Gshare branch prediction technology, the present invention can take more comprehensive factors affecting branch instructions into account in the process of branch prediction, so that different execution situations of the same branch instruction can be fully distinguished, and thus the cases in which the same branch instruction jumps and does not jump can be told apart, which effectively improves the accuracy of branch prediction. In addition, entry conflicts in the PHT can be avoided to some extent.

It should be noted that, in addition to the value of the PC of the branch instruction and the value of the BHR, the prediction information of the branch instruction described above may include at least one of the following factors affecting the execution of the branch instruction: the type information of the branch instruction, the values of the program counters of the n branch instructions executed before the branch instruction, the type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register. However, the present invention is not limited thereto. In actual operation, any factor that affects the execution of the branch instruction can be added to the prediction information of the branch instruction.

Optionally, in some embodiments, S310 specifically includes: generating the prediction information of the branch instruction according to the value of the PC of the branch instruction, the value of the BHR, and the selected influencing factors, where the prediction information includes the value of the PC of the branch instruction, the value of the BHR, and the selected influencing factors. The selected influencing factors are at least one factor selected from the following factors according to control information: the type information of the branch instruction, the values of the program counters of the n branch instructions executed before the branch instruction, the type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register.

The control information includes, for each of the above influencing factors, an influence factor that quantifies its effect on the execution of the branch instruction.

As an example, the influencing factors with a larger (or the largest) influence factor are selected from the above influencing factors and added to the prediction information of the branch instruction.

Optionally, the control information can be manually configured and is an empirical value.

Alternatively, the control information can be generated by a processor that runs the instructions.

For example, after each branch instruction is executed, the processor stores various influencing factors, and then calculates the influence factors of each influencing factor through correlation analysis, such as statistical analysis.
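One possible form of such a statistical analysis is sketched below. The embodiment does not specify the statistic, so the measure used here is an assumption chosen for illustration: for each influencing factor, the fraction of recorded executions whose outcome matches the majority outcome observed for the same value of that factor. The function names and the recorded samples are hypothetical.

```python
from collections import Counter, defaultdict


def influence_factor(samples, factor):
    """Fraction of samples whose outcome equals the majority outcome
    recorded for the same value of the given influencing factor."""
    outcomes_by_value = defaultdict(Counter)
    for factors, taken in samples:
        outcomes_by_value[factors[factor]][taken] += 1
    agreeing = sum(max(c.values()) for c in outcomes_by_value.values())
    return agreeing / len(samples)


def select_factors(samples, candidates, k):
    """Keep the k candidates with the largest influence factor."""
    ranked = sorted(candidates,
                    key=lambda f: influence_factor(samples, f),
                    reverse=True)
    return ranked[:k]


# Hypothetical records stored after each execution of one branch instruction:
# (values of the influencing factors, whether the branch jumped).
samples = [
    ({"type": 0b001, "prev_pc": 0x18, "stack_pointer": 0x100}, True),
    ({"type": 0b001, "prev_pc": 0x18, "stack_pointer": 0x200}, True),
    ({"type": 0b001, "prev_pc": 0xC4, "stack_pointer": 0x100}, False),
    ({"type": 0b001, "prev_pc": 0xC4, "stack_pointer": 0x200}, False),
]
candidates = ["type", "prev_pc", "stack_pointer"]

factors = {f: influence_factor(samples, f) for f in candidates}
selected = select_factors(samples, candidates, 1)
print(factors, selected)
```

In these made-up records the PC of the previous branch instruction fully determines the outcome, so it receives the largest value and is the factor that would be added to the prediction information.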

In this embodiment, the prediction information of the branch instruction is generated according to the control information. In actual operation, factors that have a greater influence on the execution of the branch instruction may be selected from the plurality of influencing factors according to specific requirements, and factors that have a smaller influence may be discarded, which can further improve the accuracy of branch prediction.

In summary, the technical solution provided by the embodiment of the present invention retrieves the PHT by using the hash value of the prediction information of the branch instruction obtained through the HASH function. Because the HASH function can compress a message of any length into a short message of a fixed length, there is no need to worry that the PHT becomes too large due to the long bit length of the prediction information. More comprehensive factors affecting the branch instruction can therefore be considered, so that the accuracy of branch prediction can be improved.

The method embodiments of the present invention are described above, and the device embodiments of the present invention are described below. It should be understood that the description of the device embodiments corresponds to the description of the method embodiments. Therefore, for content that is not described in detail, reference may be made to the method embodiments above; for the sake of brevity, it is not repeated here.

FIG. 5 is a schematic block diagram of an apparatus for branch prediction according to an embodiment of the present invention. The device comprises the following units.

The obtaining unit 510 is configured to acquire prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction.

The calculating unit 520 is configured to obtain a hash value of the prediction information by using a hash function.

The searching unit 530 is configured to retrieve a pattern history table according to the hash value to obtain a prediction result of the branch instruction.

Optionally, in some embodiments, the prediction information further includes type information of the branch instruction.

Optionally, in some embodiments, the prediction information further includes a value of a program counter of n branch instructions executed before the branch instruction, and n is a positive integer.

Optionally, in some embodiments, the prediction information further includes type information of the n branch instructions.

Optionally, in some embodiments, the prediction information further includes: a stack pointer of the processor, and/or a value of a condition register of the branch instruction, the value of the condition register is used to indicate a jump result of the branch instruction.

Optionally, in some embodiments, the prediction information further includes at least one of: type information of the branch instruction, values of the program counters of the n branch instructions executed before the branch instruction, type information of the n branch instructions, the stack pointer of the processor, and the value of the condition register.

Optionally, in some embodiments, the value of the program counter of the branch instruction is 32 bits or 64 bits.

Optionally, in some embodiments, the obtaining unit is further configured to select the hash function from a plurality of candidate hash functions, where the hash function is the one whose message digest has the best uniformity among the plurality of candidate hash functions.

Optionally, in some embodiments, the length of the message digest of the hash function is determined based on at least one of the following: system requirements, processor size, and program size.

Optionally, in some embodiments, the hash function is a uniform hash function.
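Selecting the candidate hash function with the most uniform message digest can be sketched as follows. The embodiment does not define how uniformity is measured, so the measure here is an assumption: the sum of squared deviations of the per-digest counts from the ideal even count over a set of sample inputs. The two candidates (low_bits and xor_fold) and the sample inputs are hypothetical.

```python
def low_bits(x, bits):
    """Candidate 1: keep only the lowest `bits` bits of the input."""
    return x & ((1 << bits) - 1)


def xor_fold(x, bits):
    """Candidate 2: XOR all `bits`-wide chunks of the input together."""
    mask = (1 << bits) - 1
    digest = 0
    while x:
        digest ^= x & mask
        x >>= bits
    return digest


def non_uniformity(hash_fn, inputs, bits):
    """Sum of squared deviations of digest counts from a perfectly even spread
    (0 means every digest value is hit equally often)."""
    counts = [0] * (1 << bits)
    for x in inputs:
        counts[hash_fn(x, bits)] += 1
    expected = len(inputs) / len(counts)
    return sum((c - expected) ** 2 for c in counts)


def pick_most_uniform(candidates, inputs, bits):
    """Return the name of the candidate with the smallest non-uniformity."""
    return min(candidates,
               key=lambda name: non_uniformity(candidates[name], inputs, bits))


# Sample inputs whose low 4 bits are always zero, as with aligned addresses.
inputs = [i << 4 for i in range(64)]
candidates = {"low_bits": low_bits, "xor_fold": xor_fold}

best = pick_most_uniform(candidates, inputs, 4)
print(best)
```

On these inputs low_bits maps everything to a single digest, whereas xor_fold spreads the inputs evenly, so xor_fold is selected.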

Optionally, in some embodiments, the device further includes:

an update unit, configured to update the pattern history table according to an actual jump result of the branch instruction.
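The update performed by the update unit can be illustrated with a 2-bit saturating counter per PHT entry. This is a sketch under assumptions: the saturating-counter rule is a commonly used scheme and is not stated in this passage, and the table size, initial entry value, and index used below are hypothetical.

```python
def update_counter(counter: int, taken: bool) -> int:
    """2-bit saturating counter: count up on a jump, down otherwise, within 0..3."""
    if taken:
        return min(counter + 1, 3)
    return max(counter - 1, 0)


pht = [1] * 1024        # assumed 10-bit PHT with every entry starting at 1
index = 0x155           # hypothetical hash value of some branch's prediction information

# Actual jump results of four successive executions of that branch.
for actual_jump in (True, True, True, False):
    pht[index] = update_counter(pht[index], actual_jump)

print(pht[index])
```

After three jumps and one non-jump the entry remains in the upper half of its range, so a single deviating outcome does not immediately flip the prediction.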

As shown in FIG. 6, an embodiment of the present invention further provides a branch predictor, including a memory 620 and a processor 610. The memory 620 is configured to store instructions, the processor 610 is configured to execute the instructions stored by the memory 620, and execution of the instructions stored in the memory 620 causes the processor 610 to perform the method of the above method embodiments.

Optionally, as shown in FIG. 6, in some embodiments, the branch predictor further includes at least one register 630 for storing prediction information of the branch instruction.

It should be understood that the branch predictor includes a BHR 640.

The embodiment of the invention further provides a computer storage medium on which is stored a computer program, which when executed by a computer, causes the computer to execute the method of the above method embodiment.

Embodiments of the present invention also provide a computer program product comprising instructions that, when executed by a computer, cause a computer to perform the method of the above method embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present invention are generated in whole or in part. The computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another computer readable storage medium. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via a wired connection (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless connection (e.g., infrared, radio, or microwave). The computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital video disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)).

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods for implementing the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.

In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation, there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be in an electrical, mechanical, or other form.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

The above is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can easily think of within the technical scope disclosed by the present invention should be covered by the scope of the present invention. Therefore, the scope of the invention should be determined by the scope of the appended claims.

Claims

1. A method for branch prediction, characterized in that it comprises:

obtaining prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction;

obtaining a hash value of the prediction information by a hash function; and

retrieving a pattern history table according to the hash value to obtain a prediction result of the branch instruction.

2. The method of claim 1, wherein the prediction information further comprises type information of the branch instruction.

3. The method according to claim 1 or 2, wherein the prediction information further comprises values of program counters of n branch instructions executed before the branch instruction, and n is a positive integer.

4. The method according to claim 3, wherein the prediction information further includes type information of the n branch instructions.

5. The method according to any one of claims 1 to 4, wherein the prediction information further comprises: a stack pointer of the processor, and/or a value of a condition register of the branch instruction, the value of the condition register being used to indicate a jump result of the branch instruction.

6. The method according to claim 1, wherein the prediction information further comprises at least one of: type information of the branch instruction, values of program counters of n branch instructions executed before the branch instruction, type information of the n branch instructions, a stack pointer of the processor, and a value of a condition register.

7. The method according to any one of claims 1 to 6, wherein the value of the program counter of the branch instruction is 32 bits or 64 bits.

8. The method according to any one of claims 1 to 7, wherein the hash function is a uniform hash function.

9. The method according to any one of claims 1 to 8, wherein the method further comprises:

selecting the hash function from a plurality of candidate hash functions, the hash function being the hash function whose message digest has the best uniformity among the plurality of candidate hash functions.

10. The method according to any one of claims 1 to 8, wherein the length of the message digest of the hash function is determined according to at least one of the following: system requirements, processor size, and program size.

11. The method according to any one of claims 1 to 10, wherein the method further comprises:

updating the pattern history table according to an actual jump result of the branch instruction.

12. A device for branch prediction, comprising:

an obtaining unit, configured to acquire prediction information of a branch instruction, where the prediction information includes at least a value of a program counter of the branch instruction and a value of a branch history register of the branch instruction;

a calculating unit, configured to obtain a hash value of the prediction information by using a hash function; and

a retrieval unit, configured to retrieve a pattern history table according to the hash value to obtain a prediction result of the branch instruction.

13. The apparatus according to claim 12, wherein said prediction information further comprises type information of said branch instruction.

14. The apparatus according to claim 12 or 13, wherein said prediction information further comprises values of program counters of n branch instructions executed before said branch instruction, n being a positive integer.

15. The apparatus according to claim 14, wherein said prediction information further comprises type information of said n branch instructions.

16. The apparatus according to any one of claims 12 to 15, wherein the prediction information further comprises: a stack pointer of the processor, and/or a value of a condition register of the branch instruction, the value of the condition register being used to indicate a jump result of the branch instruction.

17. The apparatus according to claim 12, wherein said prediction information further comprises at least one of: type information of said branch instruction, values of program counters of n branch instructions executed before said branch instruction, type information of the n branch instructions, a stack pointer of the processor, and a value of a condition register.

18. The apparatus according to any one of claims 12 to 17, wherein the value of the program counter of the branch instruction is 32 bits or 64 bits.

19. The apparatus according to any one of claims 12 to 18, wherein the obtaining unit is further configured to select the hash function from a plurality of candidate hash functions, the hash function being the hash function whose message digest has the best uniformity among the plurality of candidate hash functions.

20. The apparatus according to any one of claims 12 to 18, wherein the length of the message digest of the hash function is determined according to at least one of the following: system requirements, processor size, and program size.

21. The apparatus according to any one of claims 12 to 20, wherein the hash function is a uniform hash function.

22. The device according to any one of claims 12 to 21, wherein the device further comprises:

an updating unit, configured to update the pattern history table according to an actual jump result of the branch instruction.

23. A branch predictor, comprising a memory and a processor, the memory being configured to store instructions, the processor being configured to execute the instructions stored by the memory, wherein execution of the instructions stored in the memory causes the processor to perform the method of any one of claims 1 to 11.

24. A computer storage medium, characterized in that a computer program is stored thereon, the computer program, when executed by a computer, causing the computer to perform the method of any one of claims 1 to 11.

25. A computer program product comprising instructions, wherein the instructions, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 11.