CN115562730A - Branch predictor, related device and branch prediction method - Google Patents


Info

Publication number
CN115562730A
Authority
CN
China
Prior art keywords
branch
prediction
branch predictor
storage unit
predictor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211200710.XA
Other languages
Chinese (zh)
Inventor
刘东启
魏定彦
徐文健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou C Sky Microsystems Co Ltd
Original Assignee
Pingtouge Shanghai Semiconductor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingtouge Shanghai Semiconductor Co Ltd
Priority to CN202211200710.XA
Publication of CN115562730A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3848 Speculative instruction execution using hybrid branch prediction, e.g. selection between prediction techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the invention provide a branch predictor, related equipment and a branch prediction method. The branch predictor includes a base predictor and a plurality of tagged branch predictors, each of which is provided with two storage units. The first storage unit is a single-port storage unit that stores the high bits of the saturation counters and the tag bits of the tagged prediction table corresponding to the current tagged branch predictor. The second storage unit is a dual-port storage unit that stores the low bits of the saturation counters and the valid bits of that tagged prediction table. The embodiments are suitable for various chips based on a CISC instruction set, a RISC instruction set (in particular the RISC-V instruction set) or a VLIW instruction set architecture, such as Internet of Things chips and audio/video chips.

Description

Branch predictor, related equipment and branch prediction method
Technical Field
The embodiment of the invention relates to the technical field of chips, in particular to a branch predictor, related equipment containing the branch predictor and a branch prediction method.
Background
As modern processors have developed, branch predictors have been included in them to improve performance. A branch predictor guesses which way a conditional branch instruction will go before that instruction finishes executing, which keeps the processor's instruction pipeline flowing and improves its performance. (A conditional branch instruction can change the program flow: if its branch condition is satisfied, the next instruction to be executed changes.) However, some branch predictors, such as hybrid predictors like the TAGE predictor, must read historical execution results from their memory to make a prediction and must write results back to the same memory to update it. This leads to read-write conflicts on the memory.
The prior art offers several branch prediction schemes that resolve this conflict. One approach decouples the front-end pipeline from the predictor, but it adds circuit logic and increases the area and power consumption of the processor. Another approach stalls the pipeline when a read-write conflict occurs and performs the read after the write completes, but this costs processor performance.
The urgent problem, therefore, is how to avoid read-write conflicts in the branch predictor effectively while still balancing the power consumption, area and performance of the processor.
Disclosure of Invention
Embodiments of the present invention provide a branch predictor, a related apparatus and a branch prediction method to at least partially solve the above problems.
According to a first aspect of the embodiments of the present invention, a branch predictor is provided. The branch predictor includes a base predictor and a plurality of tagged branch predictors. Each tagged branch predictor is provided with two storage units, a first storage unit and a second storage unit, wherein:
- The first storage unit is a single-port storage unit. For the tagged prediction table corresponding to the current tagged branch predictor, it stores the saturation counter high bits (a preset number of bits, used for branch jump prediction) and the tag bits (used for branch hit determination).
- The second storage unit is a dual-port storage unit. For the same tagged prediction table, it stores the saturation counter low bits (a preset number of bits, used for saturating updates) and the valid bits, which indicate whether the entries of the current tagged branch predictor's prediction table are valid.
According to a second aspect of the embodiments of the present invention, a processor core is provided, including the branch predictor according to the first aspect.
According to a third aspect of the embodiments of the present invention, a pipelined processor is provided, comprising the branch predictor according to the first aspect or the processor core according to the second aspect.
According to a fourth aspect of the embodiments of the present invention, a chip is provided, comprising the branch predictor according to the first aspect, the processor core according to the second aspect, or the processor according to the third aspect.
According to a fifth aspect of the embodiments of the present invention, a control apparatus is provided, comprising the branch predictor according to the first aspect, the processor core according to the second aspect, the processor according to the third aspect, or the chip according to the fourth aspect.
According to a sixth aspect of the embodiments of the present invention, a branch prediction method is provided. The method comprises the following steps:
- Determine whether the prediction result of the branch predictor is correct.
- If the prediction is correct: for each tagged branch predictor that predicted correctly, among the plurality of tagged branch predictors of the branch predictor, write the updated saturation counter low bits into the second storage unit through the data write port of that unit's dual port. These low bits are the preset number of bits in the tagged prediction table that are used for saturating updates.
- If the prediction is wrong: for each tagged branch predictor that mispredicted, use the single port of its first storage unit as a data write port to write the updated saturation counter high bits stored in the first storage unit. These high bits are the preset number of bits in the tagged prediction table that are used for branch jump prediction. In addition, write the updated saturation counter low bits into the second storage unit through the data write port of its dual port.
In the scheme of the embodiments, the storage of each tagged branch predictor in the branch predictor is split into two storage units:
- The first storage unit stores the saturation counter high bits (a preset number of bits, used for branch jump prediction) and the tag bits (used for branch hit determination) of the tagged prediction table corresponding to the current tagged branch predictor.
- The second storage unit stores the saturation counter low bits (a preset number of bits, used for saturating updates) and the valid bits, which indicate whether the entries of that prediction table are valid.
The tag bits and the saturation counter high bits need no update when a branch is predicted correctly, so only the saturation counter low bits and the valid bits need to be updated regularly. The storage is therefore split according to these update conditions. The frequently updated portion of each entry is placed in the second storage unit, which is implemented as a dual-port memory. The first storage unit is implemented as a single-port memory. This arrangement balances processor performance against area.
Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art may derive other drawings from them.
FIG. 1 is a schematic block diagram of a processor including a branch predictor according to one example.
FIG. 2 is a schematic block diagram of a branch predictor according to one embodiment of the present invention.
FIG. 3 is a diagram illustrating an exemplary storage unit configuration for a tagged branch predictor shown in FIG. 2.
FIG. 4 is a block diagram of a pipelined processor according to another embodiment of the present disclosure.
FIG. 5 is a flowchart of the steps of a branch prediction method according to another embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, these solutions are described in detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments shall fall within the protection scope of the embodiments of the present invention.
The following terms are used herein:
Conditional branch instruction: an instruction that may change the program flow. If the branch condition is true, the next instruction to be executed changes.
Branch predictor: a unit that guesses which way a branch will go before the branch instruction finishes executing. This keeps the instruction pipeline flowing and improves the performance of the processor's instruction pipeline.
TAGE predictor: TAGE stands for TAgged GEometric history length branch predictor. It is a hybrid predictor. Its advantage is that it can predict a branch instruction using branch history sequences of different lengths, evaluate how accurately each history length has predicted that branch, and use the most accurate one as the basis for the final prediction.
Saturation counter: also known as a bimodal predictor. It is typically a four-state state machine with the states strongly not taken, weakly not taken, weakly taken and strongly taken. Each time a branch instruction is resolved, the corresponding state machine is updated. If the branch is not taken, the state moves toward "strongly not taken"; if the branch is taken, it moves toward "strongly taken".
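The four-state behavior of the saturation counter can be sketched as an n-bit saturating counter. This is a minimal Python illustration, not the patent's circuit. The class name `SaturatingCounter` and its methods are hypothetical, and the default 2-bit width matches the four states listed above.

```python
class SaturatingCounter:
    """n-bit saturating counter used as a taken/not-taken predictor.

    With bits=2 the states are 0 (strongly not taken), 1 (weakly not taken),
    2 (weakly taken) and 3 (strongly taken).
    """

    def __init__(self, bits: int = 2, value: int = 0) -> None:
        self.bits = bits
        self.max_value = (1 << bits) - 1
        self.value = value

    def predict_taken(self) -> bool:
        # The most significant bit gives the prediction: 1 means "taken".
        return (self.value >> (self.bits - 1)) == 1

    def update(self, taken: bool) -> None:
        # Step toward "strongly taken" on a taken branch and toward
        # "strongly not taken" otherwise, saturating at both ends.
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


STATE_NAMES = ["strongly not taken", "weakly not taken",
               "weakly taken", "strongly taken"]

if __name__ == "__main__":
    ctr = SaturatingCounter(bits=2, value=1)  # start in "weakly not taken"
    for outcome in [True, True, True, False]:
        ctr.update(outcome)
        print(STATE_NAMES[ctr.value], "-> predict taken:", ctr.predict_taken())
```

In this run, three taken outcomes drive the counter from weakly not taken to strongly taken, where it saturates. The single not-taken outcome that follows only moves it back to weakly taken, so the prediction stays "taken". This hysteresis is why a one-off anomaly does not immediately flip the prediction.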
Processor core: the core of a processor. A processor may have multiple (two or more) cores, but each core belongs to only one processor. The processor core serves as the processor's compute engine: all computation, command acceptance and storage, and data processing are performed in it.
Pipelined processor: a processor with a pipeline of stages, each of which performs a different task on program instructions. A standard pipelined processor typically has five stages: instruction fetch, instruction decode, operand fetch, execute, and write back.
Dual-port memory cell: a memory cell that supports a read and a write at the same time.
Single-port memory cell: a memory cell that can perform either a read or a write at any one time, but not both simultaneously.
Hereinafter, aspects of embodiments of the present invention will be described based on the above terms.
Generally, a hardware processor having one or more computational cores may execute instructions (e.g., threads of instructions) to operate on data, such as to perform arithmetic, logical, or other functions. In some examples, the executed instruction operations (e.g., threads) include one or more branch operations (e.g., branch instructions).
In some examples, a branch operation is either unconditional (e.g., the branch is taken every time the instruction is executed) or conditional (e.g., whether the branch is taken depends on a condition). For example, the instruction to be executed after a conditional branch (e.g., a conditional jump) is not known for certain until the condition on which the branch depends is resolved. Rather than waiting for the condition to be resolved, the processor's branch predictor may perform branch prediction. It may predict whether the branch will be taken and/or predict the branch's target instruction (e.g., target address). In some examples, if a branch is predicted taken, the processor fetches and speculatively executes instructions along the taken direction (e.g., path), such as instructions found at the predicted branch target address. In some examples, instructions executed after a branch prediction are speculative, because the processor has not yet determined whether the prediction is correct. In some examples, the processor resolves the branch instruction at the back end of the pipeline circuit (e.g., in an execution, retirement and/or write-back unit/circuit). In some examples, if the processor (e.g., the back end) determines that the branch is not taken, all instructions after the branch instruction that are currently in the pipeline circuit are flushed (e.g., discarded). In some examples, the branch predictor learns from the past behavior of branches to predict the next (e.g., upcoming) branch.
The processor architecture diagram of FIG. 1 illustrates a hardware processor 100 that includes at least one branch predictor 104(1)-104(N) and at least one (e.g., data-load-dependent) branch redirect circuit 102(1)-102(N). The hardware processor 100 may be a pipelined processor. Although multiple branch predictors are depicted in FIG. 1, a single branch predictor may be used for branch prediction for the compute cores 106(1)-106(N). In some examples, the branch predictors are distributed, with each compute core including its own local branch predictor 104(1)-104(N). The local branch predictors 104(1)-104(N) may share data, such as the history of branch instructions executed by the processor 100.
In some examples, N is any integer greater than two. Hardware processor 100 may be coupled to system memory 114 to form a computing system. The computational core of hardware processor 100 may include, for example, any of instruction fetch circuitry, decoders, execution circuitry, or retirement circuitry (or other units or circuitry discussed herein) as pipeline circuitry for the computational core.
Hardware processor 100 may also include registers 108. In addition to, or in lieu of, accessing data in system memory 114, for example, registers 108 may include one or more general purpose registers 110 to perform (e.g., logical or arithmetic) operations. Registers 108 may include one or more architectural register files 112. In some examples, the processor 100 (e.g., a branch predictor thereof) will populate branch history data (e.g., context data) into one or more registers 108 based on the instruction (e.g., a branch instruction). In another embodiment, the branch history may be saved to system memory 114. The branch history may include a global history of the branch instruction (e.g., including a history of paths taken through a series of branches of currently executing program code to reach the branch instruction), as well as an address identifier of the branch instruction (e.g., an instruction pointer value or program counter value associated with the branch instruction).
The system memory 114 may include (e.g., store) one or more of the following: operating system (OS) code 116 or application code 118.
The branch redirect circuit 102 for the core 106 is used to redirect incorrect predictions.
The branch predictor 104 in the processor 100 may use techniques such as a Gshare predictor, a TAGE predictor or a tournament branch predictor. Because the TAGE predictor performs well in processors, the embodiments of the present invention describe the solution using only the TAGE predictor as an example.
An exemplary structure of a TAGE predictor is shown in FIG. 2. It divides branches into history-independent and history-dependent branches, which are predicted with a base prediction table and with tagged prediction tables, respectively. Specifically, as shown in FIG. 2, it includes a base predictor and n (typically four) tagged branch predictors. The base predictor uses a base prediction table, denoted T0, to predict history-independent branches. The tagged prediction tables of the four tagged branch predictors, denoted T1, T2, T3 and T4, are used to predict history-dependent branches.
Each entry (row) of the base prediction table T0 contains a 2-bit saturation counter ctr. The base predictor indexes this 2-bit saturation counter directly by XORing the program counter PC with the number of T0 entries. Each tagged prediction table has a certain number of entries (rows), and different tagged prediction tables may have different numbers of entries. Every entry, however, has three parts:
- a saturation counter ctr that predicts whether the branch instruction jumps;
- a tag used to match the PC;
- a valid bit u indicating whether the current entry is valid.
In addition, the branch predictor also includes a history register h to record history prediction information.
When a branch instruction enters the predictor, prediction results are obtained from all five tables. The result from the table with the highest priority is then selected as the final prediction.
An exemplary branch prediction process based on the branch predictor shown in FIG. 2 is as follows:
(1) Each branch instruction has a corresponding program counter value PC. First, the T0 table is indexed with some of the PC bits to obtain the value of a 2-bit saturation counter.
(2) The history register h is divided into 4 equal parts. Two different hash functions are applied to some PC bits combined with the first 1/4, 2/4 and 3/4 of the history register's bit width and with its full (4/4) width. This yields 8 results, which serve as the index values and tag values of the four tables T1-T4.
(3) The index values from step (2) select the corresponding entries (rows) in tables T1, T2, T3 and T4. The tag of each selected entry is read and compared with the tag value from step (2). If they are equal, the ctr of that entry is read out; otherwise, that table's prediction is ignored.
(4) Steps (1) to (3) may yield more than one prediction. The priority order is T4 > T3 > T2 > T1 > T0, and the final prediction is selected according to this priority.
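Steps (1) to (4) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the patent's hardware:
- The table sizes, tag width, history length and 3-bit tagged counter width are hypothetical.
- The XOR-folding hash functions (`fold`, `hashes`) are also hypothetical, because the source does not specify the two hash calculations.

T0 is indexed with low PC bits as in step (1). The tagged tables are probed in the priority order T4 > T3 > T2 > T1 before the sketch falls back to T0.

```python
from dataclasses import dataclass, field

# Hypothetical parameters; the source does not fix these values.
BASE_INDEX_BITS = 8   # T0 has 256 entries
INDEX_BITS = 6        # each of T1..T4 has 64 entries
TAG_BITS = 8          # width of the tag field
CTR_BITS = 3          # width of the tagged saturation counter
HISTORY_LEN = 32      # length of the global history register h


def fold(value: int, length: int, out_bits: int) -> int:
    """XOR-fold the low `length` bits of `value` down to `out_bits` bits."""
    value &= (1 << length) - 1
    result = 0
    while value:
        result ^= value & ((1 << out_bits) - 1)
        value >>= out_bits
    return result


@dataclass
class TaggedEntry:
    ctr: int = 0  # saturation counter; its MSB is the jump prediction
    tag: int = 0  # tag compared against the hashed PC/history
    u: int = 0    # useful (valid) counter


@dataclass
class TageSketch:
    base: list = field(default_factory=lambda: [1] * (1 << BASE_INDEX_BITS))
    tables: list = field(default_factory=lambda: [
        [TaggedEntry() for _ in range(1 << INDEX_BITS)] for _ in range(4)])
    history: int = 0  # newest branch outcome in bit 0

    def push_history(self, taken: bool) -> None:
        self.history = ((self.history << 1) | int(taken)) & ((1 << HISTORY_LEN) - 1)

    def hashes(self, pc: int, table: int) -> tuple:
        """Two different hashes (index, tag) for tagged table T(table + 1)."""
        hist_len = HISTORY_LEN * (table + 1) // 4  # 1/4, 2/4, 3/4, 4/4 of h
        index = (pc ^ fold(self.history, hist_len, INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
        tag = ((pc >> 2) ^ fold(self.history, hist_len, TAG_BITS - 1)) & ((1 << TAG_BITS) - 1)
        return index, tag

    def predict(self, pc: int) -> tuple:
        """Return (taken, provider): provider 0 is T0, 1..4 are T1..T4."""
        for table in (3, 2, 1, 0):  # priority T4 > T3 > T2 > T1
            index, tag = self.hashes(pc, table)
            entry = self.tables[table][index]
            if entry.tag == tag:  # tag hit: use this table's counter MSB
                return (entry.ctr >> (CTR_BITS - 1)) == 1, table + 1
        base_ctr = self.base[pc & ((1 << BASE_INDEX_BITS) - 1)]  # T0 from low PC bits
        return (base_ctr >> 1) == 1, 0
```

Probing from T4 downward and returning on the first tag hit implements the priority rule of step (4) without collecting every candidate first. A hardware implementation would instead read all tables in parallel and use a priority mux.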
During branch prediction, the memory (e.g., SRAM) of each predictor must be accessed repeatedly. Historical execution results are read from it to make predictions, and updated results are written back to it, which causes read-write conflicts on the memory.
The conventional remedies are to decouple the front-end pipeline from the branch predictor or to stall the pipeline when a read-write conflict occurs. The first increases the complexity and area of the processor; the second costs processor performance.
To address this, an embodiment of the present invention provides a branch predictor that includes a base predictor and a plurality of tagged branch predictors, as shown in FIG. 2. Unlike conventional branch predictors, each tagged branch predictor in the embodiments of the present invention uses a two-storage-unit structure: it has a first storage unit and a second storage unit.
The first storage unit is a single-port storage unit. For the tagged prediction table of the current tagged branch predictor, it stores the saturation counter high bits (a preset number of bits, used for branch jump prediction) and the tag bits (used for branch hit determination). The second storage unit is a dual-port storage unit. For the same table, it stores the saturation counter low bits (a preset number of bits, used for saturating updates) and the valid bits, which indicate whether the entries of the current tagged branch predictor's prediction table are valid. In this way, the portion of each tagged branch predictor that needs frequent updates is separated out and implemented with a dual-port memory. This resolves the read-write conflict while balancing processor performance and area.
Preferably, the saturation counter high bits consist of the most significant bit of the saturation counter, and the low bits are all of its remaining bits. In branch predictors, and in particular in the TAGE predictor of FIG. 2, the most significant bit of the saturation counter is used for branch jump prediction. (Here, "high bits", "low bits" and similar terms refer to all bits of the respective field.) This bit does not need to be updated when the prediction is correct.
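The claim that the most significant bit never changes on a correct prediction can be checked exhaustively for small counter widths. The sketch below assumes a plain saturating increment/decrement. Its helper names are hypothetical. It splits an M-bit counter into `ctr_hi` (the MSB) and `ctr_lo` (the remaining M-1 bits), and then tests whether a correct-prediction update can ever flip `ctr_hi`.

```python
def split_counter(ctr: int, m: int) -> tuple:
    """Split an M-bit counter into (ctr_hi, ctr_lo): the MSB and the other M-1 bits."""
    return ctr >> (m - 1), ctr & ((1 << (m - 1)) - 1)


def saturating_update(ctr: int, taken: bool, m: int) -> int:
    """Plain saturating increment (taken) or decrement (not taken)."""
    return min(ctr + 1, (1 << m) - 1) if taken else max(ctr - 1, 0)


def msb_can_change_on_correct_prediction(m: int) -> bool:
    """Exhaustively check every M-bit counter value."""
    for ctr in range(1 << m):
        ctr_hi, _ = split_counter(ctr, m)
        outcome = ctr_hi == 1  # correct prediction: actual outcome equals the MSB's guess
        new_hi, _ = split_counter(saturating_update(ctr, outcome, m), m)
        if new_hi != ctr_hi:
            return True
    return False


if __name__ == "__main__":
    for m in range(2, 6):
        print(m, msb_can_change_on_correct_prediction(m))
    # A misprediction can flip the MSB: weakly taken (2) resolved not taken becomes 1.
    print(split_counter(saturating_update(2, False, 2), 2))
```

A correct prediction always moves the counter further into the half it already occupies (or leaves it saturated), so the MSB is stable. Only the low bits change. This is the property that lets `ctr_hi` live in a storage unit that is written only on mispredictions.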
In general, the saturation counters are updated as follows. When a tagged branch predictor predicts correctly (i.e., its entry in the corresponding tagged prediction table T1-T4 hits), the saturation counter in that predictor's tagged prediction table is updated. The saturation counters in the tagged prediction tables of the other three tagged branch predictors are updated when their predictions are wrong. If all tagged branch predictors mispredict, the base predictor's 2-bit saturation counter is updated instead; this counter is indexed directly by XORing the PC with the number of entries in the base prediction table T0. The valid bits are updated as follows. When one tagged branch predictor predicts correctly and the other tagged branch predictors mispredict, the valid bit of the correctly predicting tagged branch predictor is incremented by 1. In addition, the valid bits of all tagged branch predictors have a maximum value. In practical applications they are updated periodically, depending on whether the valid bits of the tagged branch predictors have reached that maximum value.
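These update rules can be sketched in Python as follows. The sketch rests on several assumptions that are not in the source:
- The counter widths are hypothetical.
- Every hitting tagged counter is trained toward the actual outcome.
- The case where no tagged table hits is treated the same as "all tagged predictors wrong".
- `u` simply saturates at its maximum.

The periodic maintenance that depends on `u` reaching its maximum is not modeled, because the source does not detail it. All names are hypothetical.

```python
from dataclasses import dataclass

CTR_BITS = 3  # hypothetical M (tagged counter width)
U_BITS = 2    # hypothetical J (useful/valid counter width)


def saturate(value: int, delta: int, bits: int) -> int:
    """Add delta and clamp to the range of an unsigned `bits`-wide counter."""
    return max(0, min(value + delta, (1 << bits) - 1))


@dataclass
class Entry:
    ctr: int
    u: int = 0

    def predicts_taken(self) -> bool:
        return (self.ctr >> (CTR_BITS - 1)) == 1


def apply_update(hits: dict, base_ctr: int, taken: bool) -> int:
    """Apply the update rules after a branch resolves.

    `hits` maps a tagged table number (1..4) to the entry that hit in it.
    Returns the new value of the 2-bit base (T0) counter.
    """
    delta = 1 if taken else -1
    correct = [t for t, entry in hits.items() if entry.predicts_taken() == taken]
    # Every hitting tagged counter is trained toward the actual outcome.
    for entry in hits.values():
        entry.ctr = saturate(entry.ctr, delta, CTR_BITS)
    # Exactly one tagged predictor right while the other hitting ones were wrong:
    # increment its useful/valid counter, saturating at the maximum.
    if len(correct) == 1 and len(hits) > 1:
        winner = hits[correct[0]]
        winner.u = saturate(winner.u, 1, U_BITS)
    # No tagged predictor right (or none hit): train the base counter instead.
    if not correct:
        base_ctr = saturate(base_ctr, delta, 2)
    return base_ctr
```

For example, suppose T1 holds a weakly-taken entry (`ctr=4`) and T3 holds a weakly-not-taken entry (`ctr=2`), and the branch is taken. Then both counters step up, T1's `u` is incremented because it was the only correct hitting predictor, and the base counter is left alone.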
In actual entry updates, however, the tagged branch predictors predict correctly far more often than not, and in that case only some of the saturation counter bits need to be updated. Based on this, in one possible approach, after a branch prediction completes, the branch predictor updates either only the second storage unit or both the first and second storage units, depending on the branch prediction result.
When the branch is predicted correctly, the branch predictor writes the updated saturation counter low bits into the second storage unit of each correctly predicting tagged branch predictor, through that unit's data write port. The second storage unit has both a data write port and a data read port, so reads are not affected while a write is in progress. The branch predictor is correct most of the time, so storing the saturation counter low bits in the dual-port second storage unit keeps the storage units accessible and avoids read-write conflicts. This achieves an effective balance between processor area and performance.
When the branch is mispredicted, the branch predictor updates each mispredicting tagged branch predictor in two ways. It uses the single port of the first storage unit as a data write port to write the updated saturation counter high bits stored there. It also writes the updated saturation counter low bits into the second storage unit, through the data write port of that unit's dual port. In this case, the misprediction may leave an idle slot (bubble) in accesses to the tagged prediction table, and both the high and low bits of the saturation counter need to be updated. Mispredictions are rare, however. Storing the saturation counter high bits in the single-port first storage unit, and using that port for writes during updates, therefore does not noticeably affect processor performance, while it effectively reduces processor area.
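The two write paths can be summarized as a small planning function. This is a hypothetical Python sketch, not the patent's control logic. It assumes an M-bit counter whose MSB is `ctr_hi`, and it uses the storage unit names TAGE_HI and TAGE_LO from the FIG. 3 example. For one resolved branch, it returns the new counter halves and lists which storage units are written.

```python
def plan_update(ctr: int, taken: bool, m: int = 3) -> dict:
    """Return the new counter halves and the storage units written after a branch resolves."""
    ctr_hi = ctr >> (m - 1)
    prediction_correct = (ctr_hi == 1) == taken
    new_ctr = min(ctr + 1, (1 << m) - 1) if taken else max(ctr - 1, 0)
    # TAGE_LO is always written, through the dedicated write port of its dual port,
    # so its read port stays available for the next lookup.
    writes = ["TAGE_LO"]
    if not prediction_correct:
        # Misprediction: the single port of TAGE_HI is used as a write port.
        writes.insert(0, "TAGE_HI")
    return {
        "ctr_hi": new_ctr >> (m - 1),
        "ctr_lo": new_ctr & ((1 << (m - 1)) - 1),
        "writes": writes,
    }
```

On a misprediction, the sketch writes TAGE_HI even when the MSB happens not to change (e.g., a saturated counter `7` resolved not taken becomes `6`). This mirrors the rule above, which writes both units on every misprediction, rather than comparing old and new values.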
In addition, when the branch predictor receives a data read request from the processor to which it belongs, each tagged branch predictor provides read access to both counter halves. It uses the single port of its first storage unit as a data read port to provide read access to the saturation counter high bits. It provides read access to the saturation counter low bits through the data read port of the dual-port second storage unit.
In this embodiment, the storage of the tagged branch predictor portion of the branch predictor is split into two units:
- a first storage unit, which holds the saturation counter high bits (a preset number of bits, used for branch jump prediction) and the tag bits (used for branch hit determination) of the tagged prediction table corresponding to the current tagged branch predictor;
- a second storage unit, which holds the saturation counter low bits (a preset number of bits, used for saturating updates) and the valid bits indicating whether the table entries are valid.
The tag bits and the saturation counter high bits need no update when a branch is predicted correctly, so only the saturation counter low bits and the valid bits need to be updated regularly. The storage is split according to these update conditions: the second storage unit uses a dual-port memory and the first storage unit uses a single-port memory. This balances processor performance against area.
The above process is exemplified below with reference to fig. 2 and 3, taking the TAGE predictor as an example.
As mentioned above, the TAGE predictor mainly consists of the following parts. The T0-level base predictor provides the default prediction. The Tn-level tagged branch predictors (four in the example of FIG. 2) each contain three logical components:
(1) an N-bit tag, used for the hit determination of entries in the tagged prediction table (i.e., branch hit determination);
(2) an M-bit counter (saturation counter), indicating the prediction value of the entry;
(3) a J-bit useful counter, indicating the usefulness (validity) of the entry.
In this example, these three logical components are divided between two storage units (as shown in FIG. 3):
(1) The first storage unit (TAGE_HI) contains the 1-bit ctr_hi, i.e., the most significant bit of the saturation counter, which is used as the jump prediction value. It also contains the N-bit tag.
(2) The second storage unit (TAGE_LO) contains the (M-1)-bit ctr_lo, i.e., the remaining lower bits of the saturation counter, which are used for the counter's saturating update. It also contains the J-bit useful counter u (valid bit).
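The split can be illustrated by packing one entry into the two storage words and unpacking it again, as a read of both units would. This Python sketch assumes a hypothetical bit order within each word (ctr_hi above the tag, ctr_lo above u), because FIG. 3 is not reproduced here. The function names are also hypothetical.

```python
def pack_entry(ctr: int, tag: int, u: int, m: int, n: int, j: int) -> tuple:
    """Split one tagged-table entry into a (TAGE_HI, TAGE_LO) word pair.

    Hypothetical bit layout:
      TAGE_HI = ctr_hi (1 bit)      followed by tag (N bits)  -> 1 + N bits
      TAGE_LO = ctr_lo (M - 1 bits) followed by u (J bits)    -> M - 1 + J bits
    """
    if not (0 <= ctr < 1 << m and 0 <= tag < 1 << n and 0 <= u < 1 << j):
        raise ValueError("field value does not fit its width")
    ctr_hi = ctr >> (m - 1)
    ctr_lo = ctr & ((1 << (m - 1)) - 1)
    return (ctr_hi << n) | tag, (ctr_lo << j) | u


def unpack_entry(hi_word: int, lo_word: int, m: int, n: int, j: int) -> tuple:
    """Rebuild (ctr, tag, u) from the two words, as a read of both units would."""
    ctr_hi, tag = hi_word >> n, hi_word & ((1 << n) - 1)
    ctr_lo, u = lo_word >> j, lo_word & ((1 << j) - 1)
    return (ctr_hi << (m - 1)) | ctr_lo, tag, u
```

For example, with M = 3, N = 8 and J = 2, an entry with ctr = 6, tag = 0xA5 and u = 1 becomes a 9-bit TAGE_HI word 0x1A5 and a 4-bit TAGE_LO word 0b1001. Only the narrower TAGE_LO word needs the more expensive dual-port array.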
The second storage unit has two port ports, and can support simultaneous read-write operation.
Based on the above arrangement, access to the memory can be implemented as:
(1) And (3) reading: for a conventional read operation, both the base predictor and the high and low memory portions of the tagged branch predictor will be accessed, i.e., both the first memory location and the second memory location will be accessed.
(2) Write: when the branch prediction is correct, only the second storage unit needs to be updated and the first storage unit is left unchanged. Simultaneous reading and writing is therefore required only of the second storage unit. When the branch prediction is wrong and the misprediction produces a pipeline bubble, both the first storage unit and the second storage unit are updated.
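A small per-cycle port-budget model can check why one single-port unit suffices. The model rests on three assumptions. First, one lookup may be issued per cycle. Second, the update for a resolved branch is written in the same cycle. Third, a misprediction bubble is modelled as a cycle with no lookup. The names `PORTS`, `accesses`, and `fits` are hypothetical.

```python
from typing import Dict, Optional

# Ports per storage unit in the described arrangement.
PORTS: Dict[str, int] = {"TAGE_HI": 1, "TAGE_LO": 2}


def accesses(lookup: bool, update: Optional[str]) -> Dict[str, int]:
    """Count the accesses each storage unit needs in one cycle.

    lookup: True if a new prediction is read this cycle.
    update: None, "correct" or "mispredict" for the branch resolved this cycle.
    """
    count = {"TAGE_HI": 0, "TAGE_LO": 0}
    if lookup:                        # a lookup reads both units
        count["TAGE_HI"] += 1
        count["TAGE_LO"] += 1
    if update == "correct":           # only ctr_lo and u are rewritten
        count["TAGE_LO"] += 1
    elif update == "mispredict":      # tag / ctr_hi may change as well
        count["TAGE_HI"] += 1
        count["TAGE_LO"] += 1
    return count


def fits(count: Dict[str, int]) -> bool:
    """True if no unit needs more accesses than it has ports."""
    return all(count[unit] <= PORTS[unit] for unit in PORTS)
```

Under these assumptions, `fits(accesses(True, "correct"))` is `True`. `fits(accesses(True, "mispredict"))` is `False`, because TAGE_HI would need both a read and a write on its single port. `fits(accesses(False, "mispredict"))` is `True`, and that is the role the bubble plays.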
In this example, the tag and the most significant bit of the counter never need to be updated when the branch prediction is correct. Only the low bits of the counter and the useful counter need to be updated every cycle. The storage is therefore split according to these different update conditions. The storage unit that must support simultaneous reads and writes is dual-ported, and the other storage unit is single-ported. This balances processor performance against area.
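The key property can be checked exhaustively. It has two parts: a correct prediction never changes the counter's most significant bit, and the low bits then behave as a saturating counter of their own. The sketch assumes a conventional M-bit saturating counter that increments on taken, decrements on not taken, and is clamped to [0, 2^M - 1]. The function names are hypothetical.

```python
def update_counter(ctr: int, taken: bool, bits: int) -> int:
    """Conventional saturating update of a bits-wide counter."""
    if taken:
        return min(ctr + 1, (1 << bits) - 1)
    return max(ctr - 1, 0)


def low_only_update_is_exact(bits: int) -> bool:
    """Check every counter value whose MSB correctly predicted the outcome."""
    lo_bits = bits - 1
    lo_mask = (1 << lo_bits) - 1
    for ctr in range(1 << bits):
        msb = ctr >> lo_bits
        taken = bool(msb)                 # correct prediction: outcome == MSB
        new = update_counter(ctr, taken, bits)
        if new >> lo_bits != msb:         # MSB changed: TAGE_HI would need a write
            return False
        # Updating ctr_lo alone as an (M-1)-bit saturating counter must match.
        if new & lo_mask != update_counter(ctr & lo_mask, taken, lo_bits):
            return False
    return True
```

A misprediction, by contrast, can flip the MSB. For example, `update_counter(0b100, False, 3)` returns `0b011`. This is why a misprediction also writes the first storage unit.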
FIG. 4 is a block diagram of a pipelined processor according to another embodiment of the present disclosure. The pipelined processor 500 of this embodiment includes the branch predictor described in the foregoing embodiments. The pipelined processor 500 may be a single-core processor or a multi-core processor.
In some examples, each core of the pipelined processor 500 includes a branch prediction stage, an instruction fetch stage, a decode stage, an allocation stage, an execution stage, and a write-back (e.g., retirement) stage. Each of these stages may be implemented with different levels of circuitry. Alternatively, the pipeline stages above may be subdivided into a larger number of stages. Additional pipeline stages may also be included, such as a prefetch stage, an instruction pointer generation (IP Gen) stage, and so forth.
In some examples, the pipelined processor 500 receives an Instruction Pointer (IP) that identifies the next instruction to be input into the processor. For example, the IP generation stage may select an instruction pointer (e.g., a memory address) that identifies the next instruction in a program sequence to be fetched and executed by a core (e.g., a logic core). In some examples, the pipelined processor 500 (e.g., the IP generation stage) increments the memory address of the most recently fetched instruction by a predetermined amount X (e.g., 1) each clock cycle.
However, when an exception occurs or a branch instruction is taken, the pipelined processor 500 (e.g., the IP generation stage) may select an instruction pointer that identifies an instruction other than the next sequential instruction in program order. In some examples, the pipelined processor 500 (e.g., the branch prediction stage) predicts whether a conditional branch instruction will be taken, for example to reduce branch penalties.
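The instruction-pointer selection described above can be sketched as a priority choice. The function name `next_ip`, its parameters, and the priority order are assumptions for illustration. The assumed order is an exception first, then a predicted-taken branch, then the sequential increment X.

```python
from typing import Optional


def next_ip(ip: int, step: int, predict_taken: bool,
            predicted_target: Optional[int],
            exception_target: Optional[int] = None) -> int:
    """Pick the next instruction pointer for the IP generation stage.

    step is the predetermined increment X (for example 1).
    """
    if exception_target is not None:            # an exception redirects fetch
        return exception_target
    if predict_taken and predicted_target is not None:
        return predicted_target                 # predicted-taken branch
    return ip + step                            # next sequential instruction
```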
Building on the processor above, an embodiment of the invention further provides a chip that includes at least the branch predictor, processor core, or processor described above. In practice, the chip may also include hardware, controllers, and so on to implement various functions as actual requirements dictate. Any chip that includes the branch predictor, processor core, or processor falls within the scope of the invention.
Further, an embodiment of the present invention provides a control device that includes at least the branch predictor, processor core, processor, or chip described above. In practice, the control device may be any suitable device, such as a mobile control device, an industrial control device, or a desktop control device.
An embodiment of the invention further provides a branch prediction method. FIG. 5 is a schematic diagram of the steps of a branch prediction method according to another embodiment of the invention. The branch prediction method of this embodiment comprises the following steps:
S510: Determine whether the prediction result of the branch predictor is correct or wrong. If the prediction is correct, go to step S520; if it is wrong, go to step S530.
In one possible implementation, the method may further include the following before this step. First, obtain the program counter value PC corresponding to the branch instruction. Next, using PC as an index, read the saturation counter value from the base prediction table of the base predictor in the branch predictor. Then, using PC as an index, read the saturation counter value from the tagged prediction table of each tagged branch predictor in the branch predictor. Finally, determine the prediction result from the saturation counter values read from the base prediction table and the tagged prediction tables.
Reading the saturation counter value from the tagged prediction table of each tagged branch predictor may be implemented as follows. For each tagged branch predictor, the high bits of the saturation counter are read using the single port of the first storage unit as the data read port. The low bits of the saturation counter are read through the data read port of the second storage unit's dual ports.
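A simplified read path is sketched below. It is not the patent's implementation, because the text only says the tables are indexed by PC. Several details are assumptions: the index/tag hash (which also folds in a slice of global history, as TAGE designs typically do), the table sizes, the 2-bit base counters, and the rule that the longest-history tag hit provides the prediction. `TaggedTable` and `predict` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TaggedTable:
    """One tagged component with entries split across TAGE_HI and TAGE_LO."""
    hist_len: int              # global-history bits mixed into index and tag
    size: int = 64             # entries (hypothetical)
    tag_bits: int = 8          # N (hypothetical)
    hi: List[Tuple[Optional[int], int]] = field(default_factory=list, init=False)
    lo: List[Tuple[int, int]] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        self.hi = [(None, 0)] * self.size   # (tag, ctr_hi); None = never allocated
        self.lo = [(0, 0)] * self.size      # (ctr_lo, u)

    def index_and_tag(self, pc: int, ghist: int) -> Tuple[int, int]:
        # Hypothetical hash; real designs fold long histories more carefully.
        h = ghist & ((1 << self.hist_len) - 1)
        idx = (pc ^ h) % self.size
        tag = ((pc >> 6) ^ (3 * h)) & ((1 << self.tag_bits) - 1)
        return idx, tag


Prediction = Tuple[bool, Optional[int], Optional[int], Optional[int]]


def predict(pc: int, ghist: int, base: List[int],
            tables: List[TaggedTable]) -> Prediction:
    """Return (taken, provider, index, ctr_lo); provider None means base predictor."""
    # Longest history first; the first tag hit provides the prediction.
    order = sorted(range(len(tables)), key=lambda i: tables[i].hist_len, reverse=True)
    for t in order:
        idx, tag = tables[t].index_and_tag(pc, ghist)
        stored_tag, ctr_hi = tables[t].hi[idx]   # read through TAGE_HI's single port
        ctr_lo, _u = tables[t].lo[idx]           # read through TAGE_LO's read port
        if stored_tag == tag:
            return bool(ctr_hi), t, idx, ctr_lo
    # No tag hit: use the PC-indexed 2-bit base counter (assumed width).
    return base[pc % len(base)] >= 2, None, None, None
```

Returning `ctr_lo` and the index alongside the prediction keeps the values needed for the later update step at hand.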
S520: If the prediction is correct, update each tagged branch predictor among the plurality of tagged branch predictors that predicted correctly. For each such predictor, write-update the low bits of the saturation counter (a preset number of bits, used for saturating updates) in the tagged prediction table stored in its second storage unit. The write goes through the data write port of the second storage unit's dual ports.
S530: If the branch prediction is wrong, update each tagged branch predictor among the plurality of tagged branch predictors of the branch predictor that mispredicted. For each such predictor, use the single port of its first storage unit as the data write port to write-update the high bits of the saturation counter (a preset number of bits, used for branch jump prediction) in the tagged prediction table stored in the first storage unit. Also write-update the low bits of the saturation counter (a preset number of bits, used for saturating updates) in the tagged prediction table stored in the second storage unit, through the data write port of the second storage unit's dual ports.
When a misprediction occurs, all of the tagged branch predictors are regarded as having mispredicted, and the operation of step S530 is performed for each tagged branch predictor.
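Steps S520 and S530 can be sketched for a single tagged entry as follows. The sketch makes three assumptions: M = 3, a conventional saturating update, and correctness judged per entry from that entry's own `ctr_hi`. That per-entry condition is the one under which updating only `ctr_lo` is exact. Useful-counter maintenance and entry allocation on a misprediction are not described in this text and are omitted. `update_entry` is a hypothetical name.

```python
from typing import Optional, Tuple

CTR_BITS = 3                      # M (hypothetical)
LO_BITS = CTR_BITS - 1
LO_MAX = (1 << LO_BITS) - 1

HiEntry = Tuple[Optional[int], int]   # (tag, ctr_hi) held in TAGE_HI
LoEntry = Tuple[int, int]             # (ctr_lo, u) held in TAGE_LO


def update_entry(hi: HiEntry, lo: LoEntry, taken: bool) -> Tuple[HiEntry, LoEntry, bool]:
    """Return (new_hi, new_lo, wrote_hi) once the branch outcome is known."""
    tag, ctr_hi = hi
    ctr_lo, u = lo
    if bool(ctr_hi) == taken:
        # S520: correct prediction. Saturate ctr_lo via TAGE_LO's write port;
        # TAGE_HI is not written.
        ctr_lo = min(ctr_lo + 1, LO_MAX) if taken else max(ctr_lo - 1, 0)
        return hi, (ctr_lo, u), False
    # S530: misprediction. Update the full counter and write both units,
    # TAGE_HI through its single port.
    ctr = (ctr_hi << LO_BITS) | ctr_lo
    ctr = min(ctr + 1, (1 << CTR_BITS) - 1) if taken else max(ctr - 1, 0)
    return (tag, ctr >> LO_BITS), (ctr & LO_MAX, u), True
```

Only the misprediction path returns `wrote_hi = True`. This matches the point that the single-port first storage unit is written only when a branch is mispredicted.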
The branch prediction method of this embodiment is described only briefly; for details, refer to the foregoing description of the branch predictor. The corresponding beneficial effects are not repeated here.
In addition, for specific implementation of each step in the program, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The method according to the embodiments of the present invention described above may be implemented in several forms:

- in hardware or firmware;
- as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk;
- as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium.

In this way, the method described herein can be carried out by such software stored on a recording medium, using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory) that can store or receive software or computer code. When that code is accessed and executed by the computer, processor, or hardware, it implements the methods described herein. Furthermore, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing those methods.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are only used for illustrating the embodiments of the present invention, and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims (14)

1. A branch predictor, comprising a base predictor and a plurality of tagged branch predictors, wherein each tagged branch predictor is provided with two storage units, namely a first storage unit and a second storage unit;
wherein:
the first storage unit is a single-port storage unit and is configured to store, for the tagged prediction table corresponding to the current tagged branch predictor, the high bits (a preset number of bits) of a saturation counter used for branch jump prediction, and tag bits used for the branch hit check;
the second storage unit is a dual-port storage unit and is configured to store, for the tagged prediction table corresponding to the current tagged branch predictor, the low bits (a preset number of bits) of the saturation counter used for saturating updates, and valid bits indicating the validity of the entries of the prediction table of the current tagged branch predictor.
2. The branch predictor of claim 1, wherein the high bits of the saturation counter are the most significant bit of the saturation counter, and the low bits of the saturation counter are the bits of the saturation counter other than the most significant bit.
3. The branch predictor according to claim 1 or 2, wherein, after a branch prediction is completed, the branch predictor performs a data update operation on the second storage unit, or on both the first storage unit and the second storage unit, according to the branch prediction result.
4. The branch predictor of claim 3,
wherein, when the branch prediction result is a correct prediction, the branch predictor write-updates the low bits of the saturation counter stored in the second storage unit of each correctly predicting tagged branch predictor, through the data write port of that second storage unit.
5. The branch predictor of claim 3,
wherein, when the branch prediction result is a misprediction, the branch predictor performs two updates for each mispredicting tagged branch predictor. First, it uses the single port of that predictor's first storage unit as the data write port to write-update the high bits of the saturation counter stored in the first storage unit. Second, it write-updates the low bits of the saturation counter stored in the second storage unit, through the data write port of the dual ports of that predictor's second storage unit.
6. The branch predictor according to claim 1 or 2, wherein, when the branch predictor receives a data read instruction sent by its processor, each tagged branch predictor provides read access to the high bits of the saturation counter using the single port of the first storage unit as the data read port, and provides read access to the low bits of the saturation counter through a data read port of the dual ports of the second storage unit.
7. The branch predictor of claim 1 or 2, wherein the branch predictor is a hybrid branch predictor.
8. The branch predictor of claim 7, wherein the hybrid branch predictor is a TAGE hybrid branch predictor.
9. A processor core, comprising:
the branch predictor of any one of claims 1-8.
10. A pipelined processor, comprising:
the branch predictor of any one of claims 1-8; or, the processor core of claim 9.
11. A chip, comprising:
the branch predictor of any one of claims 1-8; or, the processor core as recited in claim 9; or, a processor according to claim 10.
12. A control device, comprising:
the branch predictor of any one of claims 1-8; or, the processor core as recited in claim 9; or, a processor according to claim 10; alternatively, a chip as claimed in claim 11.
13. A branch prediction method, comprising:
judging whether the prediction result of the branch predictor is correct or wrong;
if the prediction is correct, for each tagged branch predictor among a plurality of tagged branch predictors that predicted correctly, write-updating the low bits (a preset number of bits, used for saturating updates) of a saturation counter in a tagged prediction table stored in a second storage unit, through a data write port of the dual ports of the second storage unit of that tagged branch predictor;
if the branch prediction is wrong, for each tagged branch predictor among the plurality of tagged branch predictors of the branch predictor that mispredicted: write-updating the high bits (a preset number of bits, used for branch jump prediction) of the saturation counter in the tagged prediction table stored in a first storage unit, using a single port of the first storage unit of that tagged branch predictor as a data write port; and write-updating the low bits (a preset number of bits, used for saturating updates) of the saturation counter in the tagged prediction table stored in the second storage unit, through the data write port of the dual ports of the second storage unit.
14. The branch prediction method of claim 13, wherein, before determining whether the prediction result of the branch predictor is correct or wrong, the method further comprises:
acquiring a program counter value corresponding to the branch instruction;
using the program counter value as an index, reading a saturation counter value from a base prediction table corresponding to a base predictor in the branch predictor; and, using the program counter value as an index, reading a saturation counter value from a tagged prediction table corresponding to each tagged branch predictor in the branch predictor;
and determining the prediction result according to the saturation counter value read from the base prediction table and the saturation counter values read from the tagged prediction tables.
CN202211200710.XA 2022-09-29 2022-09-29 Branch predictor, related device and branch prediction method Pending CN115562730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211200710.XA CN115562730A (en) 2022-09-29 2022-09-29 Branch predictor, related device and branch prediction method

Publications (1)

Publication Number Publication Date
CN115562730A 2023-01-03

Family

ID=84743378

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240301

Address after: 310052 Room 201, floor 2, building 5, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: C-SKY MICROSYSTEMS Co.,Ltd.

Country or region after: China

Address before: 201208 floor 5, No. 2, Lane 55, Chuanhe Road, No. 366, Shangke Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Pingtouge (Shanghai) semiconductor technology Co.,Ltd.

Country or region before: China