CN116661872A - Prediction method and device supporting simultaneous prediction of two consecutively jumping unconditional branch instructions - Google Patents

Info

Publication number
CN116661872A
Authority
CN
China
Prior art keywords
branch
instruction
address
unconditional
unconditional branch
Legal status
Pending
Application number
CN202310601060.8A
Other languages
Chinese (zh)
Inventor
黄立波
杨凌
郑重
郭辉
邓全
雷国庆
王永文
王俊辉
郭维
隋兵才
孙彩霞
沈俊忠
倪晓强
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN202310601060.8A
Publication of CN116661872A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G06F9/3806 Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3844 Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a prediction method and device supporting simultaneous prediction of two consecutively jumping unconditional branch instructions. When the current instruction hits in the branch target cache (BTB), the method determines whether the hit entry contains two unconditional branch instructions located within the same instruction fetch width. If so, the branch address of the first unconditional branch instruction and the branch address of the second unconditional branch instruction in the hit entry are obtained simultaneously, combined, and output together; otherwise, only the branch address of the first unconditional branch instruction in the hit entry is obtained and output. The invention addresses the high instruction fetch cost that arises when the first branch instruction encountered is an unconditional branch instruction predicted to jump and its jump target address lies within the same instruction fetch width.

Description

Prediction method and device supporting simultaneous prediction of two consecutively jumping unconditional branch instructions
Technical Field
The present invention relates to the field of microprocessor design, and in particular to a prediction method and device supporting simultaneous prediction of two consecutively jumping unconditional branch instructions.
Background
Branch instructions in a processor can generally be divided into conditional branch instructions and unconditional branch instructions. As one of the program flow control instructions of a microprocessor, the unconditional branch instruction plays an important role; for example, function call and return operations (call/return) are compiled into unconditional branch instructions, and once such an instruction is valid it necessarily changes the fetch address. However, there is a delay between fetching an unconditional branch instruction and redirecting the fetch address accordingly, and reducing this time overhead has become an important research topic for improving microprocessor performance.
Unconditional branch prediction techniques have been proposed to solve this problem: they record the historical branch addresses of unconditional branch instructions and use them to predict unconditional branch instructions under the same history conditions.
Unconditional branch instructions are typically predicted with a branch target cache (Branch Target Buffer, BTB), which records their branch addresses and is used to predict unconditional branch instructions with fixed jump addresses, such as common function call instructions. Function-return unconditional branch instructions are usually predicted with a return address stack (Return Address Stack, RAS). The RAS is a first-in-last-out (First In Last Out, FILO) structure that maps the call and return actions of a program onto its push and pop operations, so these unconditional branch instructions can be predicted well. In addition, indirect branch predictors, such as the ITTAGE (Indirect Target GEometric History Length) predictor, are used to predict indirect-jump unconditional branch instructions; compared with the BTB, they target branches whose destinations depend on the specific execution state, such as function pointers and switch-case statements.
Conventional unconditional branch predictors each handle a single unconditional branch instruction and output prediction information for only one unconditional branch instruction at a time. When a jump is predicted, the unconditional branch predictor outputs the branch target address as the fetch address for the next cycle. When the prediction is correct, valid instructions can be fetched from memory and fed into the processor every cycle, which largely resolves the control hazards caused by unconditional branch instructions.
In practical processors, the instruction fetch unit typically fetches several instructions at a time from the cache (for example, a whole cache line) to increase instruction throughput, so a longer fetch width may contain several unconditional branch instructions. Conventional branch predictors usually predict the branch instructions in the fetched block in order: if a branch is predicted not to jump, the next branch is predicted, until no further branch instruction remains or a branch predicted to jump is encountered. If no further branch instruction remains, the fetch unit reads the next cache line sequentially; when a branch predicted to jump is encountered, the fetch unit reads the cache line containing the branch target address. Even when several instructions are predicted in parallel, usually only the branch information of the first branch predicted to jump is used; branch prediction for that cache line then ends and the next prediction cycle begins.
In summary, in the conventional unconditional branch prediction scheme, once an unconditional branch instruction is predicted to jump, branch prediction for the cache line containing it ends. Therefore, when the address of the unconditional branch instruction and its jump target address lie in the same cache line, the fetch unit reads the same cache line several times, which reduces fetch efficiency. All instructions between the two unconditional branch instructions are already determined and are already held by the fetch unit, yet in the conventional scheme they are discarded and the same instructions are fetched again from the cache, which makes fetching inefficient.
Disclosure of Invention
Technical problem to be solved: the invention addresses the high fetch cost that arises when the first branch instruction encountered is an unconditional branch instruction predicted to jump and its jump target address lies within the same instruction fetch width.
In order to solve the technical problems, the invention adopts the following technical scheme:
A prediction method supporting simultaneous prediction of two consecutively jumping unconditional branch instructions, comprising:
S101, determining whether the current instruction hits in the branch target cache BTB, and proceeding to the next step if it hits;
S102, determining whether the entry hit in the BTB contains two unconditional branch instructions located within the same instruction fetch width; if so, obtaining the branch address of the first unconditional branch instruction in the hit entry and, simultaneously, the branch address of the second unconditional branch instruction in the hit entry, then combining and outputting the branch addresses of the two unconditional branch instructions; otherwise, obtaining and outputting only the branch address of the first unconditional branch instruction in the hit entry.
Optionally, the fields of each BTB entry include: the high-order address HA, offset FBO, branch type FBY and branch target address FBT of the first branch instruction; the instruction address SBA, branch type SBY and branch target address SBT of the second, consecutively jumping branch instruction; and an identifier UV indicating whether the first and second branch instructions in the entry are unconditional branch instructions.
Optionally, determining in step S101 whether the current instruction hits in the BTB comprises:
S201, clipping the address of the current instruction;
S202, matching the high-order bits of the clipped address against the high-order address HA of the first branch instruction in each BTB entry; if a matching entry is found, the current instruction is determined to hit in the BTB; otherwise, it is determined to miss in the BTB.
Optionally, clipping the address of the current instruction in step S201 means right-shifting the address by $\log_2 FW$ bits, where FW is the instruction fetch width.
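As a worked illustration of this clipping, the sketch below assumes that FW is measured in bytes and is a power of two (neither is stated explicitly here). With a hypothetical FW of 32 bytes, $\log_2 FW = 5$, so the address 0x1234 clips to 0x91, and its low 5 bits (0x14) give the position of the instruction inside its fetch block. The function names are illustrative only.

```python
def log2_fetch_width(fetch_width: int) -> int:
    """Return log2(FW); FW is assumed to be a positive power of two, in bytes."""
    if fetch_width <= 0 or fetch_width & (fetch_width - 1):
        raise ValueError("fetch width must be a positive power of two")
    return fetch_width.bit_length() - 1


def clip_address(pc: int, fetch_width: int) -> int:
    """S201: drop the low log2(FW) bits, leaving the fetch-block address."""
    return pc >> log2_fetch_width(fetch_width)


def block_offset(pc: int, fetch_width: int) -> int:
    """Low log2(FW) bits of the address: position inside the fetch block."""
    return pc & ((1 << log2_fetch_width(fetch_width)) - 1)


if __name__ == "__main__":
    FW = 32  # hypothetical fetch width in bytes
    pc = 0x1234
    clipped = clip_address(pc, FW)
    offset = block_offset(pc, FW)
    print(hex(clipped), hex(offset))  # 0x91 0x14
    # Shifting back and appending the offset restores the original address.
    print(hex((clipped << log2_fetch_width(FW)) | offset))  # 0x1234
```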
Optionally, the identifier UV has two bits: one bit indicates, with 0 or 1, whether the first branch instruction in the entry is an unconditional branch instruction, and the other bit indicates, with 0 or 1, whether the second branch instruction in the entry is an unconditional branch instruction; the value of UV is 11 or 10.
Optionally, in step S102, determining whether the entry hit in the BTB contains two unconditional branch instructions located within the same instruction fetch width means checking whether the UV identifier of the hit entry equals 11: if so, the hit entry is determined to contain two unconditional branch instructions located within the same instruction fetch width; otherwise, it is determined not to contain them.
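A minimal sketch of the UV check follows. It assumes the high bit of UV refers to the first branch and the low bit to the second; this ordering is inferred from the stated values 11 and 10 (10 meaning only the first branch is valid) and is not spelled out in the text. The constant and function names are hypothetical.

```python
from typing import Tuple

UV_BOTH = 0b11        # both slots hold valid unconditional branches
UV_FIRST_ONLY = 0b10  # only the first slot is valid


def decode_uv(uv: int) -> Tuple[bool, bool]:
    """Split the 2-bit UV field into (first_valid, second_valid)."""
    if uv not in (UV_BOTH, UV_FIRST_ONLY):
        raise ValueError(f"unexpected UV value {uv:#04b}")
    return bool(uv & 0b10), bool(uv & 0b01)


def has_two_unconditional(uv: int) -> bool:
    """S102 test: the hit entry carries two unconditional branches iff UV == 11."""
    return uv == UV_BOTH
```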
Optionally, obtaining the branch address of the first unconditional branch instruction in the hit entry in step S102 includes checking its branch type FBY: if FBY indicates a function-call (jal-class) instruction, the branch address of the first unconditional branch instruction is the branch target address FBT stored in the hit entry, and the address of the instruction following this unconditional branch instruction is pushed onto the return address stack RAS; if FBY indicates a function-return (ret-class) instruction, the branch address of the first unconditional branch instruction is the first address popped from the RAS.
Optionally, obtaining the branch address of the second unconditional branch instruction in the hit entry in step S102 includes checking its branch type SBY: if SBY indicates a function-call (jal-class) instruction, the branch address of the second unconditional branch instruction is the branch target address SBT stored in the hit entry, and the address of the instruction following this unconditional branch instruction is pushed onto the RAS; if SBY indicates a function-return (ret-class) instruction, the branch address of the second unconditional branch instruction is the second address popped from the RAS.
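The type-dependent choice described for both branch slots amounts to a selector per slot. The sketch below models it with illustrative type labels (`CALL` for jal-class, `RET` for ret-class, `JUMP` for other unconditional jumps) and assumes fixed 4-byte instructions when computing the return address to push; the RAS itself is not modeled, so the pop value is passed in (the first pop for the first slot, the second pop for the second slot).

```python
from enum import Enum
from typing import Optional


class BranchType(Enum):
    JUMP = "jump"  # other unconditional jump; target comes from the BTB (assumed label)
    CALL = "call"  # function-call class, e.g. jal
    RET = "ret"    # function-return class, e.g. ret


INST_SIZE = 4  # assumed fixed instruction length in bytes


def select_target(kind: BranchType, btb_target: int, ras_pop: Optional[int]) -> int:
    """Branch address for one slot: RAS value for ret, stored BTB target otherwise."""
    if kind is BranchType.RET:
        if ras_pop is None:
            raise ValueError("a ret-class branch needs a value popped from the RAS")
        return ras_pop
    return btb_target


def return_address_to_push(kind: BranchType, branch_pc: int) -> Optional[int]:
    """Address of the next instruction, pushed onto the RAS for call-class branches."""
    return branch_pc + INST_SIZE if kind is BranchType.CALL else None
```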
In addition, the invention provides a prediction device supporting simultaneous prediction of two consecutively jumping unconditional branch instructions, comprising a microprocessor and a memory connected to each other, the microprocessor being programmed or configured to perform the above prediction method.
Furthermore, the invention provides a computer-readable storage medium storing a computer program, the computer program being used to program or configure a microprocessor to perform the above prediction method.
Compared with the prior art, the invention has the following advantages:
1. Fast prediction: the branch addresses of two consecutively jumping unconditional branch instructions can be output within a single clock cycle.
2. Simple design logic: the branch prediction principle is essentially the same as that of the original branch prediction method, so the function can be supported with simple logic modifications.
3. The fetch efficiency of the instruction fetch unit at the processor front end is improved and unnecessary fetch operations are reduced.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of an entry structure of a BTB table according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a predictor in an embodiment of the present invention.
FIG. 4 is a schematic diagram of an example of prediction of two unconditional branch instructions in an embodiment of the invention.
Detailed Description
As shown in FIG. 1, the prediction method of this embodiment, which supports simultaneous prediction of two consecutively jumping unconditional branch instructions, comprises:
S101, determining whether the current instruction hits in the branch target cache BTB, and proceeding to the next step if it hits;
S102, determining whether the entry hit in the BTB contains two unconditional branch instructions located within the same instruction fetch width; if so, obtaining the branch address of the first unconditional branch instruction in the hit entry and, simultaneously, the branch address of the second unconditional branch instruction in the hit entry, then combining and outputting the branch addresses of the two unconditional branch instructions; otherwise, obtaining and outputting only the branch address of the first unconditional branch instruction in the hit entry.
In the conventional prediction scheme, each BTB entry holds branch information for only one branch instruction, so the structure can output information for only one branch instruction at a time; in particular, an unconditional branch prediction structure built from a BTB and a return address stack RAS selects between the branch information in the BTB and that in the RAS according to the type of the current unconditional branch instruction. In this embodiment, the fields of each BTB entry include: the high-order address HA (High Address), offset FBO (First Branch Offset), branch type FBY (First Branch tYpe) and branch target address FBT (First Branch Target) of the first branch instruction; the instruction address SBA (Second Branch Address), branch type SBY (Second Branch tYpe) and branch target address SBT (Second Branch Target) of the second, consecutively jumping branch instruction; and the identifier UV (Unconditional Valid), which indicates whether the first and second branch instructions in the entry are unconditional branch instructions. With this structural change, the BTB can output the branch information of two consecutively occurring unconditional branch instructions at once.
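To make the entry layout concrete, the following sketch models one BTB entry as a Python dataclass. Field names follow the text; the field widths are not given here, so plain integers are used, and the branch-type strings are hypothetical labels.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    ha: int   # HA: high-order part of the clipped address of the first branch (tag)
    fbo: int  # FBO: offset of the first branch inside its fetch block
    fby: str  # FBY: type of the first branch, e.g. "call", "ret", "jump"
    fbt: int  # FBT: stored target of the first branch
    sba: int  # SBA: instruction address of the second branch
    sby: str  # SBY: type of the second branch
    sbt: int  # SBT: stored target of the second branch
    uv: int   # UV: 0b11 = both branches valid, 0b10 = first branch only

    def __post_init__(self) -> None:
        if self.uv not in (0b11, 0b10):
            raise ValueError("UV must be 0b11 or 0b10")

    @property
    def holds_pair(self) -> bool:
        """True when the entry can supply two unconditional branches at once."""
        return self.uv == 0b11


# Hypothetical entry: a jump at offset 0x4 to 0x1230, followed by a ret at 0x1238
# (for the ret, the target comes from the RAS, so SBT is left at 0).
example = BTBEntry(ha=0x2, fbo=0x4, fby="jump", fbt=0x1230,
                   sba=0x1238, sby="ret", sbt=0x0, uv=0b11)
```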
In this embodiment, determining in step S101 whether the current instruction hits in the BTB comprises:
S201, clipping the address of the current instruction;
S202, matching the high-order bits of the clipped address against the high-order address HA of the first branch instruction in each BTB entry; if a matching entry is found, the current instruction is determined to hit in the BTB; otherwise, it is determined to miss in the BTB.
In this embodiment, clipping the address of the current instruction in step S201 means right-shifting the address by $\log_2 FW$ bits, where FW is the instruction fetch width.
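Steps S201 and S202 can be sketched as a table lookup in which, as the embodiment later describes, the low-order bits of the clipped address index the BTB and the high-order bits are compared with HA. The direct-mapped organization, the table size, and the class and method names are assumptions for illustration; a real BTB may well be set-associative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def _log2(n: int) -> int:
    if n <= 0 or n & (n - 1):
        raise ValueError("size must be a positive power of two")
    return n.bit_length() - 1


@dataclass
class Entry:
    ha: int       # high-order part of the clipped address (tag)
    payload: str  # stands in for FBO, FBY, FBT, SBA, SBY, SBT and UV


class DirectMappedBTB:
    def __init__(self, fetch_width: int, num_sets: int) -> None:
        self.shift = _log2(fetch_width)    # log2(FW)
        self.index_bits = _log2(num_sets)
        self.sets: Dict[int, Entry] = {}

    def _split(self, pc: int) -> Tuple[int, int]:
        clipped = pc >> self.shift                      # S201: clip the address
        index = clipped & ((1 << self.index_bits) - 1)  # low-order bits pick the entry
        tag = clipped >> self.index_bits                # high-order bits compared with HA
        return index, tag

    def insert(self, pc: int, payload: str) -> None:
        index, tag = self._split(pc)
        self.sets[index] = Entry(ha=tag, payload=payload)

    def lookup(self, pc: int) -> Optional[Entry]:
        """S202: return the entry on a hit, None on a miss."""
        index, tag = self._split(pc)
        entry = self.sets.get(index)
        if entry is not None and entry.ha == tag:
            return entry
        return None
```

For example, with a 32-byte fetch width and 64 entries, 0x1234 and 0x1220 fall in the same fetch block and hit the same entry, while an address 64 blocks further on maps to the same index but carries a different HA and misses.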
In this embodiment, the identifier UV has two bits: one bit indicates, with 0 or 1, whether the first branch instruction in the entry is an unconditional branch instruction, and the other bit indicates, with 0 or 1, whether the second branch instruction in the entry is an unconditional branch instruction; the value of UV is 11 or 10.
In this embodiment, in step S102, determining whether the entry hit in the BTB contains two unconditional branch instructions located within the same instruction fetch width means checking whether the UV identifier of the hit entry equals 11: if so, the hit entry is determined to contain two unconditional branch instructions located within the same instruction fetch width; otherwise, it is determined not to contain them.
In this embodiment, obtaining the branch address of the first unconditional branch instruction in the hit entry in step S102 includes checking its branch type FBY: if FBY indicates a function-call (jal-class) instruction, the branch address of the first unconditional branch instruction is the branch target address FBT stored in the hit entry, and the address of the instruction following this unconditional branch instruction is pushed onto the return address stack RAS; if FBY indicates a function-return (ret-class) instruction, the branch address of the first unconditional branch instruction is the first address popped from the RAS.
In this embodiment, obtaining the branch address of the second unconditional branch instruction in the hit entry in step S102 includes checking its branch type SBY: if SBY indicates a function-call (jal-class) instruction, the branch address of the second unconditional branch instruction is the branch target address SBT stored in the hit entry, and the address of the instruction following this unconditional branch instruction is pushed onto the RAS; if SBY indicates a function-return (ret-class) instruction, the branch address of the second unconditional branch instruction is the second address popped from the RAS. As the above shows, step S102 obtains the branch addresses of the two unconditional branch instructions in the hit entry by combining the BTB with the RAS, and this function places no particular requirement on how the unconditional branch predictor is implemented.
Referring to FIG. 3, the predictor further uses a selector MUX, placed after the BTB and the RAS, to select between these two sources of branch information. The concatenation module INLINE in FIG. 3 uses the branch information of both unconditional branch instructions to assemble the valid instructions within the fetch width and sends them to the subsequent units of the processor for execution. The RAS predicts function-return unconditional branch instructions, unlike a conventional RAS with a single pop port, which can supply the branch information of only one unconditional branch instruction at a time. The BTB predicts the other types of unconditional branch instructions, whose information is stored in the BTB. In this embodiment, the BTB structure and the concatenation module INLINE are modified so that two consecutively jumping unconditional branch instructions within the same fetch width can be stored, their jump information can be supplied simultaneously at prediction time, and INLINE can use the two pieces of branch information to assemble the instructions efficiently and deliver them to the subsequent execution units of the processor.
The operation of a branch predictor using the method of this embodiment consists of two parts: the prediction operation and the update operation.
During prediction, if the BTB hits and the UV flag is 11, the entry holds information for two unconditional branch instructions located within the same instruction fetch width. This indicates that, within the fetch width, the first unconditional branch leads to a second unconditional branch that is certain to jump. Therefore, when the jump of the first unconditional branch instruction is predicted, the BTB and the RAS also supply the branch information of the second unconditional branch instruction at the same time. The concatenation module INLINE uses these two pieces of information to assemble the instructions, and fetching continues directly from the jump target address of the second unconditional branch instruction, which improves the fetch efficiency of the fetch unit and thus the performance of the processor. The input of INLINE is the instructions obtained by the fetch unit (for example, a cache line), and its output is the instruction stream sent to the subsequent execution units of the processor; when the UV flag is 11, it also issues the instructions between the target address of the first unconditional branch instruction and the instruction address of the second unconditional branch instruction. When the UV flag is 10, prediction proceeds as in the normal case of predicting one branch instruction at a time: only the branch information of the first unconditional branch instruction is output, and INLINE fetches instructions only up to the address of the first branch instruction according to the original fetch policy, so the performance of the original processor is not affected.
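A behavioral sketch of what INLINE issues is given below. It assumes that the fetched block is a list of (address, mnemonic) pairs, that the block starts at the current fetch address, and that both branch instructions themselves are issued along with the instructions between them; the function name `inline_issue` and the data layout are hypothetical.

```python
from typing import List, Optional, Tuple

Instr = Tuple[int, str]  # (address, mnemonic); illustrative representation


def inline_issue(block: List[Instr], fetch_pc: int, fba: int,
                 fbt: Optional[int] = None, sba: Optional[int] = None) -> List[Instr]:
    """Instructions sent to later pipeline stages from one fetched block.

    UV == 10: leave fbt and sba as None; only [fetch_pc, fba] is issued.
    UV == 11: additionally issue [fbt, sba] taken from the same block.
    """
    stream = [ins for ins in block if fetch_pc <= ins[0] <= fba]
    if fbt is not None and sba is not None:
        if fbt > sba:
            raise ValueError("the second branch must not precede the first target")
        stream += [ins for ins in block if fbt <= ins[0] <= sba]
    return stream


# Hypothetical 32-byte block: a jump at 0x104 to 0x110, then a ret at 0x118.
block = [(0x100 + 4 * i, m) for i, m in
         enumerate(["add", "j", "nop", "nop", "sub", "mul", "ret", "nop"])]
pair_stream = inline_issue(block, 0x100, fba=0x104, fbt=0x110, sba=0x118)
```

With UV = 11 the stream is add, j, sub, mul, ret, and the fetch unit would then resume at the ret target instead of reading this block a second time.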
In this embodiment, when the branch predictor predicts an unconditional branch instruction, it first clips the instruction address by shifting it right by $\log_2 FW$ bits, where FW is the fetch width (Fetch Width, FW), and accesses the BTB using the low-order bits of the clipped address. After the corresponding BTB entry is read out, its contents are checked: if the high-order bits of the clipped address equal the HA of the entry, the BTB hits and the instruction is a branch instruction; otherwise, the instruction is not a branch instruction and no prediction is output. For a branch instruction, the corresponding UV flag is checked. If UV is 11, the entry holds valid branch information for two unconditional branch instructions, and the following branch information is output: 1) The branch address FBA of the first unconditional branch instruction is the clipped address shifted left by $\log_2 FW$ bits with FBO placed in the low-order bits. 2) The branch type of the first unconditional branch instruction is FBY. 3) The branch target of the first unconditional branch instruction defaults to FBT; if the branch type is a function-call (jal-class) instruction, the branch target of the first unconditional branch instruction is FBT, and the address of the instruction following this unconditional branch instruction is pushed onto the RAS; if the branch type is a function-return (ret-class) instruction, the branch target of the first unconditional branch instruction is the first address popped from the RAS. 4) The branch address of the second unconditional branch instruction is SBA. 5) The branch type of the second unconditional branch instruction is SBY.
6) The branch target of the second unconditional branch instruction defaults to SBT; if the branch type is a function-call (jal-class) instruction, the branch target of the second unconditional branch instruction is SBT, and the address of the instruction following the second unconditional branch instruction is pushed onto the RAS; if the branch type is a function-return (ret-class) instruction, the branch target of the second unconditional branch instruction is the second address popped from the RAS. Using this information, the concatenation module INLINE, in addition to conventionally fetching up to the FBA, sends the instructions between FBT and SBA to the subsequent stages of the processor, while the fetch unit starts fetching from the branch target of the second unconditional branch instruction (SBT, or the RAS address for a ret-class instruction). This saves the time that a conventional design spends predicting the second unconditional branch instruction separately.
If UV is 10, the entry holds valid branch information for only one unconditional branch instruction, and only the branch information of the first unconditional branch instruction is output: 1) The branch address FBA of the first unconditional branch instruction is the clipped address shifted left by $\log_2 FW$ bits with FBO placed in the low-order bits. 2) The branch type of the first unconditional branch instruction is FBY. 3) The branch target of the first unconditional branch instruction defaults to FBT; if the branch type is a function-call (jal-class) instruction, the branch target of the first unconditional branch instruction is FBT, and the address of the instruction following this unconditional branch instruction is pushed onto the RAS; if the branch type is a function-return (ret-class) instruction, the branch target of the first unconditional branch instruction is the first address popped from the RAS. In this case the concatenation module INLINE fetches instructions up to the FBA and sends them out according to the conventional policy.
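The outputs listed for UV = 11 and UV = 10 can be tied together in one sketch. It assumes 4-byte instructions, models the RAS as a Python list whose pushes and pops are applied in program order (first branch, then second), and uses hypothetical type labels and function names; RAS underflow is not handled, and the HA comparison is assumed to have already succeeded.

```python
from dataclasses import dataclass
from typing import List, Optional

INST_SIZE = 4  # assumed fixed 4-byte instructions (no compressed encodings)


@dataclass
class BTBEntry:
    ha: int
    fbo: int
    fby: str  # "call", "ret" or "jump" (hypothetical labels)
    fbt: int
    sba: int
    sby: str
    sbt: int
    uv: int   # 0b11 or 0b10


@dataclass
class Prediction:
    fba: int
    fby: str
    first_target: int
    sba: Optional[int] = None
    sby: Optional[str] = None
    second_target: Optional[int] = None

    @property
    def next_fetch(self) -> int:
        """With a valid pair, fetching resumes at the second branch's target."""
        if self.second_target is not None:
            return self.second_target
        return self.first_target


def resolve(kind: str, pc: int, stored_target: int, ras: List[int]) -> int:
    """Branch target for one slot; RAS pushes and pops happen in program order."""
    if kind == "ret":
        return ras.pop()            # ret: next address popped from the RAS
    if kind == "call":
        ras.append(pc + INST_SIZE)  # call: push the return address
    return stored_target            # call and other jumps use the stored BTB target


def predict(clipped: int, entry: BTBEntry, fetch_width: int, ras: List[int]) -> Prediction:
    """Prediction outputs for a BTB hit."""
    shift = fetch_width.bit_length() - 1   # log2(FW), FW a power of two
    fba = (clipped << shift) | entry.fbo   # 1) FBA: clipped address shifted back, plus FBO
    pred = Prediction(fba=fba, fby=entry.fby,
                      first_target=resolve(entry.fby, fba, entry.fbt, ras))
    if entry.uv == 0b11:                   # 4) to 6) only when both slots are valid
        pred.sba = entry.sba
        pred.sby = entry.sby
        pred.second_target = resolve(entry.sby, entry.sba, entry.sbt, ras)
    return pred
```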
Thus, the branch target address can come from two sources: the BTB and the RAS. The RAS supports popping one or two addresses simultaneously. If only one of the two unconditional branch instructions is a function-return (ret-class) instruction, the RAS is popped once to provide the target address of that ret instruction; if both are ret-class instructions, the RAS is popped twice to provide the branch target addresses of the first and second unconditional branch instructions respectively; if neither is a ret-class instruction, the RAS is not popped. The order of pushes and pops must follow the order of the two branch instructions in the BTB entry. Note that when the first unconditional branch instruction is a function-call (jal-class) instruction and the second is a function-return (ret-class) instruction, the RAS must perform a push followed immediately by a pop. When the RAS detects this push-then-pop sequence, it forwards the data directly to the pop port without writing it into the stack, which shortens the RAS access time.
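The pop-count rules and the push-then-pop forwarding can be sketched as a small behavioral model. The class name, the string type labels, and the fixed 4-byte instruction size are assumptions; the two pops are performed sequentially here, whereas the hardware described above reads both pop addresses in the same prediction.

```python
from typing import List, Optional, Tuple

INST_SIZE = 4  # assumed fixed 4-byte instructions


class DualPopRAS:
    """Return address stack handling the RAS traffic of two consecutive
    unconditional branches in one prediction (behavioral sketch)."""

    def __init__(self, entries: Optional[List[int]] = None) -> None:
        self.stack: List[int] = list(entries or [])  # top of stack is the last element

    def resolve(self, first_kind: str, first_pc: int,
                second_kind: str, second_pc: int) -> Tuple[Optional[int], Optional[int]]:
        """Return (first_target, second_target) supplied by the RAS; None means
        that slot's target comes from the BTB instead."""
        if first_kind == "call" and second_kind == "ret":
            # Push immediately followed by pop: forward the return address to the
            # pop port without writing it into the stack.
            return None, first_pc + INST_SIZE
        first_target = second_target = None
        if first_kind == "ret":
            first_target = self.stack.pop()      # first pop address
        elif first_kind == "call":
            self.stack.append(first_pc + INST_SIZE)
        if second_kind == "ret":
            second_target = self.stack.pop()     # second pop address
        elif second_kind == "call":
            self.stack.append(second_pc + INST_SIZE)
        return first_target, second_target
```

Two ret-class branches consume the top two addresses in order, while a call followed by a ret leaves the stored stack untouched.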
For updates, the branch predictor retains the update information of two branch instructions. When an unconditional branch instruction commits, the branch target cache BTB and the return address stack RAS are updated according to its instruction type, and the first-branch fields of the corresponding BTB entry are modified. If the next branch instruction to commit is also an unconditional branch instruction and the two lie within the same fetch width, the later instruction's branch information is written into the second-branch fields of the BTB entry belonging to the earlier unconditional branch instruction, and the predictor no longer retains the earlier instruction's update information; otherwise, the BTB is updated in the normal way, one unconditional branch instruction at a time. This ensures that the branch information of two consecutively occurring unconditional branch instructions is preserved. When the prediction of two consecutively jumping unconditional branch instructions proves wrong, the corresponding UV flag bit is reset to 10, so that the branch information of the second unconditional branch instruction is not output the next time the same instruction is predicted.
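The commit-time decision can be sketched as a small function. As a hypothetical simplification, it assumes that "within the same fetch width" means both branch addresses share the same clipped address (address shifted right by log₂FW, with FW = 16 bytes) and that only the most recently committed branch is consulted; the names `Committed`, `update_action` and `demote` are illustrative, not from the source.

```python
from typing import NamedTuple, Optional

LOG2_FW = 4  # assumed: 128-bit fetch width = 16 bytes

class Committed(NamedTuple):
    addr: int            # instruction address
    unconditional: bool  # True for jal / ret / plain jumps

def update_action(prev: Optional[Committed], cur: Committed) -> str:
    """Decide how `cur` updates the BTB, given the branch that committed just before it."""
    if not cur.unconditional:
        return "conditional"  # handled by the normal conditional-branch path
    if (prev is not None and prev.unconditional
            and (prev.addr >> LOG2_FW) == (cur.addr >> LOG2_FW)):
        return "pair"    # write cur into the SBA/SBY/SBT fields of prev's entry, UV = 11
    return "single"      # ordinary one-branch update, UV = 10

def demote(entry: dict) -> None:
    """A mispredicted pair keeps only its first branch."""
    entry["UV"] = 0b10
```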
The branch predictor retains the update information of two consecutive branch instructions. If two consecutively occurring unconditional branch instructions are detected within the same fetch width, the branch information of both is written into the same entry of the branch target cache BTB, indexed by the clipped address of the first unconditional branch instruction. The write mainly consists of the following: 1) The high-order address of the first unconditional branch instruction is written to the HA field. 2) The low-order address offset of the first unconditional branch instruction is written to the FBO field; the offset is the instruction address minus the clipped address shifted left by log₂FW bits, i.e., the low log₂FW bits of the instruction address. 3) The branch type of the first unconditional branch instruction is written to the FBY field. 4) The branch target address of the first unconditional branch instruction is written to the FBT field. 5) The instruction address of the second unconditional branch instruction is written to the SBA field. 6) The branch type of the second unconditional branch instruction is written to the SBY field. 7) The branch target address of the second unconditional branch instruction is written to the SBT field. 8) The UV flag is set to 11, indicating that both unconditional branch instructions are valid.
When the condition of two consecutively occurring unconditional branch instructions is not met, only the first unconditional branch instruction is written into the branch target cache BTB, as follows: 1) The high-order address of the first unconditional branch instruction is written to the HA field. 2) The low-order address offset of the first unconditional branch instruction is written to the FBO field; the offset is the instruction address minus the clipped address shifted left by log₂FW bits, i.e., the low log₂FW bits of the instruction address. 3) The branch type of the first unconditional branch instruction is written to the FBY field. 4) The branch target address of the first unconditional branch instruction is written to the FBT field. 5) The UV flag is set to 10, indicating that only the first unconditional branch instruction is valid.
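The field encoding described above can be illustrated by a Python sketch that builds one BTB entry from one or two committed branches. The dictionary layout, the `(address, type, target)` tuples, the function name `make_btb_entry` and FW = 16 bytes are assumptions for illustration; only the field names follow the text.

```python
from typing import Optional, Tuple

LOG2_FW = 4  # assumed: FW = 16 bytes (128-bit fetch width)

Branch = Tuple[int, str, int]  # (instruction address, branch type, target address)

def make_btb_entry(first: Branch, second: Optional[Branch] = None) -> dict:
    """Encode one or two unconditional branches into a single BTB entry."""
    addr, kind, target = first
    entry = {
        "HA": addr >> LOG2_FW,               # clipped (high-order) address
        "FBO": addr & ((1 << LOG2_FW) - 1),  # low log2(FW) bits = addr - (HA << log2(FW))
        "FBY": kind,
        "FBT": target,
        "UV": 0b10,                          # only the first branch is valid
    }
    if second is not None:
        s_addr, s_kind, s_target = second
        # consecutive pair in one fetch width: fill the second-branch fields
        entry.update(SBA=s_addr, SBY=s_kind, SBT=s_target, UV=0b11)
    return entry
```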
With these two write operations, the corresponding information is written into the branch target cache whenever two consecutively jumping unconditional branch instructions are encountered, and the branch information of both is supplied together the next time the first unconditional branch instruction is encountered. When the prediction for such a pair proves wrong, the UV flag of the corresponding BTB entry is simply reset to 10.
As an example, consider predicting two consecutively jumping unconditional branch instructions within the same fetch width. Fig. 4 shows the conventional fetch mode and the optimized fetch mode, where Br0 and Br1 are unconditional branch instructions, PC is the program counter, and the fetch width is 128 bits. The jump target of Br0 lies within the same fetch width, and the first branch instruction encountered after the jump is Br1. With conventional branch prediction, the fetch unit needs two predictions and two fetch operations; with the branch prediction information provided by this embodiment, the optimized fetch unit can splice the instructions together and complete the fetch with one prediction and one fetch operation, improving fetch efficiency.
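To make the Fig. 4 comparison concrete, the following toy model counts prediction/fetch cycles. It is a hypothetical sketch: it assumes byte addresses, a 128-bit (16-byte) fetch block, 4-byte instructions, and an example in which Br0 at 0x1000 jumps to 0x1008 and Br1 at 0x100C jumps out of the block to 0x4000; these addresses are illustrative and not taken from the figure.

```python
from typing import Dict, List

FW = 16    # assumed: 128-bit fetch width, in bytes
INSN = 4   # assumed fixed instruction length

def fetch_groups(pc: int, taken: Dict[int, int], stop: int,
                 max_taken: int) -> List[List[int]]:
    """Instruction addresses delivered in each prediction/fetch cycle.

    max_taken=1 models a conventional predictor that ends the cycle at the first
    taken branch; max_taken=2 models a paired BTB entry that may follow one more
    taken branch as long as the jump target stays in the same fetch block.
    """
    groups: List[List[int]] = []
    while pc != stop and len(groups) < 16:  # cycle cap keeps the toy model finite
        block = pc // FW
        group: List[int] = []
        jumps = 0
        while pc // FW == block:
            group.append(pc)
            if pc in taken:
                pc = taken[pc]
                jumps += 1
                if jumps == max_taken:
                    break
            else:
                pc += INSN
        groups.append(group)
    return groups

taken = {0x1000: 0x1008, 0x100C: 0x4000}  # Br0 and Br1 (hypothetical addresses)
print(len(fetch_groups(0x1000, taken, 0x4000, 1)))  # conventional: prints 2
print(len(fetch_groups(0x1000, taken, 0x4000, 2)))  # paired prediction: prints 1
```

With `max_taken=1` the block at 0x1000 is predicted and fetched twice; with `max_taken=2`, Br0 and Br1 are covered by a single prediction and fetch.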
In summary, building on existing branch prediction techniques, this embodiment designs a prediction method that can predict two consecutively jumping unconditional branch instructions simultaneously, where the second unconditional branch instruction is the first branch instruction encountered on the program path after the first one jumps, and both lie within the same fetch width. The method targets the case in which the first branch instruction encountered after an unconditional branch jumps is itself an unconditional branch instruction within the same fetch width: when the first unconditional branch instruction is predicted to jump, the predictor also supplies the instruction address of the second unconditional branch instruction and its jump target address, which helps avoid redundant fetch operations and improves fetch efficiency. A conventional branch predictor stops predicting the current cache line once it predicts that the first unconditional branch instruction jumps, so the first branch instruction after the jump must be predicted in the next prediction cycle, lowering the efficiency of the fetch unit. The method is simple in principle: by adjusting existing branch prediction techniques, it raises the efficiency of the fetch unit, and with it the performance gain the branch predictor brings to the processor, while leaving the prediction accuracy of the branch predictor essentially unchanged.
In addition, this embodiment provides a prediction apparatus supporting simultaneous prediction of two consecutively jumping unconditional branch instructions, comprising a microprocessor and a memory connected to each other, the microprocessor being programmed or configured to perform the above prediction method. This embodiment also provides a computer readable storage medium storing a computer program that is programmed or configured to be executed by a microprocessor to perform the above prediction method.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims (10)

1. A prediction method supporting simultaneous prediction of two consecutively jumping unconditional branch instructions, comprising:
S101, judging whether the current instruction hits in a branch target cache BTB, and proceeding to the next step if it hits;
S102, judging whether the entry hit in the branch target cache BTB contains two unconditional branch instructions located within the same fetch width; if so, acquiring the branch address of the first unconditional branch instruction in the hit entry and the branch address of the second unconditional branch instruction in the hit entry, and combining and outputting the branch addresses of the two unconditional branch instructions; otherwise, acquiring only the branch address of the first unconditional branch instruction in the hit entry and outputting it.
2. The method according to claim 1, wherein the fields of each entry of the branch target cache BTB comprise the high-order address HA, offset FBO, branch type FBY, and branch target address FBT of the first branch instruction; the instruction address SBA, branch type SBY, and branch target address SBT of the second branch instruction of the consecutive jump; and an identifier UV indicating whether the first and second branch instructions in the corresponding entry are unconditional branch instructions.
3. The method according to claim 2, wherein determining in step S101 whether the current instruction hits in the branch target cache BTB comprises:
S201, clipping the address of the current instruction;
S202, matching the high-order address of the clipped address against the high-order address HA of the first branch instruction in each entry of the branch target cache BTB; if a matching entry is found, determining that the current instruction hits in the branch target cache BTB; otherwise, determining that the current instruction misses in the branch target cache BTB.
4. The method according to claim 3, wherein clipping the address of the current instruction in step S201 means shifting the address of the current instruction right by log₂FW bits, where FW is the fetch width.
5. The method according to claim 2, wherein the identifier UV has two bits: one bit indicates, by 0 or 1, whether the first branch instruction in the corresponding entry is an unconditional branch instruction, and the other bit indicates, by 0 or 1, whether the second branch instruction in the corresponding entry is an unconditional branch instruction; the value of the identifier UV is 11 or 10.
6. The method according to claim 5, wherein the step S102 of determining whether two unconditional branch instructions within the same fetch width exist in the entry hit in the branch target cache BTB is: judging whether the value of the identifier UV of the item hit in the branch target cache BTB is 11 or not, if yes, judging that two unconditional branch instructions within the same fetch width exist in the item hit in the branch target cache BTB; otherwise, it is determined that there are not two unconditional branch instructions located within the same instruction fetch width in the entry hit in the branch target cache BTB.
7. The method according to claim 2, wherein obtaining the branch address of the first unconditional branch instruction in the hit entry in step S102 comprises examining the branch type FBY of the first unconditional branch instruction in the hit entry: if FBY indicates a function-call jal instruction, the branch address of the first unconditional branch instruction is the branch target address FBT of that instruction in the hit entry, and the address of the instruction following the unconditional branch instruction is pushed onto the return address stack RAS; if FBY indicates a function-return ret instruction, the branch address of the first unconditional branch instruction is the first address popped from the return address stack RAS.
8. The method according to claim 2, wherein obtaining the branch address of the second unconditional branch instruction in the hit entry in step S102 comprises examining the branch type SBY of the second unconditional branch instruction in the hit entry: if SBY indicates a function-call jal instruction, the branch address of the second unconditional branch instruction is the branch target address SBT of that instruction in the hit entry, and the address of the instruction following the unconditional branch instruction is pushed onto the return address stack RAS; if SBY indicates a function-return ret instruction, the branch address of the second unconditional branch instruction is the second address popped from the return address stack RAS.
9. A prediction apparatus for supporting simultaneous prediction of two unconditional branch instructions for a continuous jump, comprising a microprocessor and a memory interconnected, wherein the microprocessor is programmed or configured to perform the method of predicting two unconditional branch instructions for supporting simultaneous prediction of a continuous jump as claimed in any one of claims 1 to 8.
10. A computer readable storage medium having a computer program stored therein, the computer program being programmed or configured by a microprocessor to perform the method of predicting two unconditional branch instructions supporting simultaneous prediction of consecutive jumps as claimed in any one of claims 1 to 8.
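Steps S101 and S102 of claims 1 to 6 can be sketched as a lookup over a list of BTB entries. This sketch assumes a fully associative BTB whose HA field stores the clipped address, FW = 16 bytes, and dictionary-based entries; the function name `predict` is hypothetical. For ret-type branches the stored FBT/SBT values are returned as-is, whereas claims 7 and 8 take those targets from the RAS instead.

```python
from typing import List, Optional, Tuple

LOG2_FW = 4  # assumed: FW = 16 bytes

Prediction = Tuple[int, str, int]  # (branch instruction address, type, target)

def predict(pc: int, btb: List[dict]) -> Optional[List[Prediction]]:
    """Return the unconditional branches predicted for this fetch, or None on a miss."""
    clipped = pc >> LOG2_FW                                     # S201: clip the address
    entry = next((e for e in btb if e["HA"] == clipped), None)  # S202: match HA
    if entry is None:
        return None                                             # S101: BTB miss
    first = ((clipped << LOG2_FW) | entry["FBO"], entry["FBY"], entry["FBT"])
    if entry["UV"] == 0b11:                                     # S102: two valid branches
        return [first, (entry["SBA"], entry["SBY"], entry["SBT"])]
    return [first]                                              # UV == 10: first only
```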

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310601060.8A CN116661872A (en) 2023-05-25 2023-05-25 Prediction method and device for supporting simultaneous prediction of two unconditional branch instructions of continuous jump

Publications (1)

Publication Number Publication Date
CN116661872A 2023-08-29

Family

ID=87712974

Country Status (1)

Country Link
CN (1) CN116661872A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination