CN111258654B

CN111258654B - Instruction branch prediction method

Info

Publication number: CN111258654B
Application number: CN201911324044.9A
Authority: CN
Inventors: 张俊逍; 王前; 葛悦飞
Original assignee: Ningbo Fugu Technology Co ltd
Current assignee: Ningbo Fugu Technology Co ltd
Priority date: 2019-12-20
Filing date: 2019-12-20
Publication date: 2022-04-29
Anticipated expiration: 2039-12-20
Also published as: CN111258654A

Abstract

The invention provides an instruction branch prediction method comprising: obtaining a prediction table that includes a mark field, the mark field being used to predict whether the corresponding instruction is a compressed instruction for which a non-aligned jump occurs. By adding the mark field to the prediction table, the method records the instruction type associated with an index and how that instruction jumps. It can therefore distinguish whether the current instruction is a compressed instruction making a non-aligned jump, predict the jump of the instruction accurately, and improve the working efficiency of the processor.

Description

Instruction branch prediction method

Technical Field

The invention relates to the technical field of predictors, in particular to an instruction branch prediction method.

Background

Processors typically employ a pipelined architecture and support conditional branch instructions. In a pipelined processor, a conditional branch may stall the pipeline until its condition is resolved; the longer the pipeline, the longer the stall. To avoid this performance loss, branch prediction allows the processor to speculatively fetch and execute instructions based on predicted branch behavior. If a branch is mispredicted, the instructions that were fetched and executed on the basis of the prediction are flushed from the pipeline, and new instructions are fetched from the resolved branch address. The higher the prediction accuracy of the branch predictor, the smaller the performance penalty.

Traditional predictors rely on the repetitive behavior of branches. A branch predictor learns branch behavior by recording the addresses of the branches it encounters together with the branch history, and predicts the outcome of a particular branch instruction from the last few records of that same instruction. In the RISC-V instruction set, 32-bit instructions and 16-bit instructions can be mixed, which challenges the prediction accuracy of conventional predictors that read a fixed instruction length. When 32-bit instructions are intermixed with 16-bit instructions, a 32-bit instruction may generate the same index number as a 16-bit compressed instruction at which a non-aligned jump occurs, so the two may share an entry, causing aliasing.

Disclosure of Invention

The instruction branch prediction method provided by the invention can identify non-aligned jumps of compressed instructions and thereby improve prediction accuracy.

The invention relates to an instruction branch prediction method, which comprises the following steps:

a predictor acquires a prediction table, wherein the prediction table comprises a mark field;

the predictor predicts, according to the mark field in the prediction table, whether the instruction is a compressed instruction for which a non-aligned jump occurs.

Optionally, the data recorded in the mark field has a first state and a second state;

the first state predicts that the current jump instruction is a compressed instruction and a non-aligned jump occurs;

the second state predicts that the current jump instruction is a non-compressed instruction or that no non-aligned jump occurs.

Optionally, the state of the data recorded in the mark field is updated according to the outcome of the most recent instruction jump;

when the most recent jump was made by a compressed instruction and was a non-aligned jump, the data recorded in the mark field is updated to the first state;

when the most recent jump was made by a non-compressed instruction, or was not a non-aligned jump, the data recorded in the mark field is updated to the second state.

Optionally, the first state is represented by a binary digit of 1 and the second state is represented by a binary digit of 0.

Optionally, the prediction table further comprises an index number field, a confidence field, and a prediction result field.

Optionally, whether the data state of the mark field is reset is determined by the data state of the confidence field; when the confidence field of a prediction table entry is 0, the index number of the currently acquired instruction is filled into the index number field, and the data in the mark field is reset to the first state or the second state.

Optionally, whether the jump of the current instruction is predicted from the current entry of the prediction table is determined by the confidence field of that entry;

when the confidence value in the confidence field of the current entry is higher than a preset value, the jump of the current instruction is predicted according to the data in the prediction result field and the mark field.

Optionally, the prediction table is a Base Predictor table or a tag table.

Optionally, the acquired instruction is matched in a Base Predictor table and a plurality of tag tables at the same time;

when the tag tables contain hit entries, the instruction jump is predicted from the hit table that uses the longest history information;

when there is no hit entry in the tag tables, the instruction jump is predicted from the Base Predictor table.

Optionally, the prediction table is used to predict jumps of 32-bit instructions and 16-bit compressed instructions in the RISC-V instruction set.

The invention provides an instruction branch prediction method in which a mark field is added to the prediction table to record the instruction type associated with an index and how that instruction jumps. It is therefore possible to distinguish whether the current instruction is a compressed instruction making a non-aligned jump, to predict the jump of the instruction accurately, and to improve the working efficiency of the processor.

Drawings

FIG. 1 is a flow chart of an embodiment of a method for instruction branch prediction according to the present invention;

FIG. 2 is a diagram of a predictor for an embodiment of a method for instruction branch prediction according to the present invention;

FIG. 3 is a block diagram illustrating instruction fetching and jumping according to one embodiment of the present invention;

FIG. 4 is a diagram illustrating instruction fetching and jumping according to one embodiment of the present invention;

FIG. 5 is a block diagram illustrating instruction fetching and jumping according to one embodiment of the present invention;

FIG. 6 is a block diagram illustrating instruction fetching and jumping according to one embodiment of the present invention;

FIG. 7 is a block diagram illustrating instruction fetching and jumping according to one embodiment of the present invention;

FIG. 8 is a diagram illustrating one of the entries in the Base Predictor table according to one embodiment of the instruction branch prediction method of the present invention;

FIG. 9 is a diagram illustrating an entry in a tag table according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

As shown in fig. 1, an embodiment of the present invention provides an instruction branch prediction method, including:

The present embodiment specifically describes the prediction of jumps of 32-bit instructions and 16-bit compressed instructions in the RISC-V instruction set. However, the application scope of the present invention is not limited to the RISC-V instruction set, nor to the execution of 16-bit compressed instructions and 32-bit instructions; those skilled in the art should understand that the application scope of the present invention includes all instruction execution processes that are prone to cause aliasing.

The prediction process of the instruction branch prediction method of this embodiment is shown in fig. 2: an operation is performed on the instruction and the global history information, and the result is matched in the Base Predictor table or a tag table.

When 32-bit instructions are mixed with 16-bit compressed instructions, a 32-bit instruction and a 16-bit compressed instruction with a non-aligned jump may generate the same index number; that is, the 32-bit instruction and the non-aligned 16-bit compressed instruction may share one entry. This causes aliasing, and the predictor mispredicts. Here, non-aligned means that the fixed instruction length read by the predictor crosses a 32-bit address boundary, or contains an incomplete half of a 32-bit instruction together with a 16-bit compressed instruction.

When a 64-bit instruction bundle is read, the instructions may be combined as follows:

1. four 16-bit compressed instructions;

2. one 32-bit instruction and two 16-bit compressed instructions;

3. two 32-bit instructions.

Because 32-bit instructions are mixed with 16-bit compressed instructions, reading a fixed instruction length gives rise to the non-aligned situations shown in FIGS. 3-6. In the figures, the gray region is the fixed 32-bit length read by the predictor, the arrow is the position the instruction jumps to, and the dotted line is a 32-bit address boundary.

Taking the case of fig. 6 as an example, the following situation may occur during the operation:

For a normal 32-bit instruction pc[0:31], with 32-bit global history information h[0:31] and an 8-bit index number, the operation is as follows:

Tag1 = pc[0:7] xor pc[8:15] xor pc[16:23] xor pc[24:31] xor h[0:7] xor h[8:15] xor h[16:23] xor h[24:31];

In fig. 6, the read consists of the 16-bit compressed instruction pc[0:15] and the first half of a 32-bit instruction pc[16:31]; the global history information is again the 32-bit h[0:31] and the index number is 8 bits, so the operation is as follows:

Tag2 = pc[0:7] xor pc[8:15] xor pc[16:23] xor pc[24:31] xor h[0:7] xor h[8:15] xor h[16:23] xor h[24:31];

Because both values are computed in the same way, the resulting Tag1 and Tag2 may be identical.
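The folding operation above can be sketched in a few lines to show how two different reads collapse onto one index number. This is only an illustrative sketch: the function names `fold_xor` and `index_number` and the sample values are assumptions chosen for the example, not part of the described method.

```python
def fold_xor(value: int, width: int = 32, chunk: int = 8) -> int:
    """XOR together the chunk-bit slices of a width-bit value."""
    mask = (1 << chunk) - 1
    folded = 0
    for shift in range(0, width, chunk):
        folded ^= (value >> shift) & mask
    return folded


def index_number(pc: int, history: int) -> int:
    """8-bit index number from a 32-bit pc and 32-bit global history."""
    return fold_xor(pc) ^ fold_xor(history)


# Hypothetical sample values: two different 32-bit reads whose bytes
# XOR to the same result, combined with the same global history.
normal_32bit_read = 0x12345678   # stands in for an ordinary 32-bit instruction
straddling_read = 0x56781234     # stands in for 16-bit instr + half a 32-bit instr
history = 0xDEADBEEF

tag1 = index_number(normal_32bit_read, history)
tag2 = index_number(straddling_read, history)
print(hex(tag1), hex(tag2), tag1 == tag2)
```

With these sample values both reads produce the same 8-bit index number, which is the aliasing that the mark field is meant to resolve.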

At this point, because the provided prediction table includes the mark field, the entry can be marked, so that an ordinary 32-bit instruction can be distinguished from the combination of a 16-bit instruction and the first half of a 32-bit instruction shown in fig. 6. The prediction table may be a Base Predictor table, whose structure is shown in fig. 8, or a tag table, whose structure is shown in fig. 9.

In particular, a 1-bit counter may be used to record a first state and a second state, i.e. the first state is represented by the binary digit 1 and the second state is represented by the binary digit 0. Of course, the first state and the second state are not limited to being represented in the above-described manner.

When an actual jump occurs, the comp bit in the predictor is updated according to whether the actual instruction is a 16-bit compressed instruction and whether a non-aligned jump occurs. If the jump instruction is a 16-bit compressed instruction and the jump is non-aligned (i.e., the read instruction crosses a 32-bit address boundary), comp is updated to 1. For example, when a non-aligned jump occurs in FIGS. 3, 4, 6, and 7 and a 16-bit compressed instruction is involved, comp is updated to 1.

If the instruction is a 32-bit instruction, or a 16-bit compressed instruction with an aligned jump, comp is updated to 0. For example, in FIG. 5, although a non-aligned jump has occurred, the instruction is not a 16-bit compressed instruction, so comp is still updated to 0.
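The comp update rule amounts to a logical AND of the two conditions. A minimal sketch, in which the function name `next_comp` and its parameter names are assumptions made for illustration:

```python
def next_comp(is_compressed_16bit: bool, non_aligned_jump: bool) -> int:
    """Return the new comp bit after an actual jump.

    comp becomes 1 only when the jump instruction is a 16-bit compressed
    instruction AND the jump is non-aligned; otherwise it becomes 0.
    """
    return 1 if (is_compressed_16bit and non_aligned_jump) else 0


# The FIG. 5 situation described above: non-aligned jump, but not a
# 16-bit compressed instruction, so comp is 0.
fig5_comp = next_comp(is_compressed_16bit=False, non_aligned_jump=True)
print(fig5_comp)
```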

Optionally, the prediction table further includes an index number field (i.e., tag), a confidence field (i.e., useful), and a prediction result field (i.e., ctr).

Optionally, whether the data state of the mark field (i.e., comp) is reset is determined by the data state of the confidence field (i.e., useful): when the confidence field (i.e., useful) of a prediction table entry is 0, the index number of the currently acquired instruction is filled into the index number field (i.e., tag), and the data in the mark field (i.e., comp) is reset to the first state or the second state.
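The reset condition can be illustrated as follows. This is a sketch under stated assumptions: an entry is modeled as a plain dictionary, and the helper name `try_allocate` and the `initial_comp` parameter are hypothetical. The text allows the reset value to be either state, so it is left as a parameter here.

```python
def try_allocate(entry: dict, new_tag: int, initial_comp: int = 0) -> bool:
    """Overwrite an entry only if its confidence (useful) is 0.

    On success the index number field (tag) takes the new index number and
    the mark field (comp) is reset to the chosen initial state.
    """
    if entry["useful"] != 0:
        return False          # entry is still trusted; leave it untouched
    entry["tag"] = new_tag
    entry["comp"] = initial_comp
    return True


# Hypothetical entry whose confidence has dropped to 0.
slot = {"tag": 0x2A, "useful": 0, "comp": 1, "ctr": 0b100}
allocated = try_allocate(slot, new_tag=0x5C)
print(allocated, slot)
```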

Optionally, whether the jump of the current instruction is predicted from the current entry of the prediction table is determined by the confidence field (i.e., useful) of that entry;

when the confidence value in the confidence field (i.e., useful) of the current entry is higher than a predetermined value, the jump of the current instruction is predicted according to the data in the prediction result field (i.e., ctr) and the mark field (i.e., comp).

The specific description is as follows:

The ctr field in the tag table is recorded with a 3-bit saturating counter, and provides the prediction result as follows:

When ctr is 1xx, a jump is predicted; when ctr is 0xx, no jump is predicted. When the predicted result matches the actual jump and ctr is not equal to 111, ctr = ctr + 1; when the predicted result does not match the actual jump and ctr is not equal to 000, ctr = ctr - 1.

The useful field is recorded with a 2-bit saturating counter and gives the confidence of the prediction result. When useful is 0, the record can be overwritten by a new record. When useful is weak (01 or 00), the predictor takes its prediction from a tag table indexed with shorter history information, or from the Base Predictor; when useful is strong (10 or 11), the prediction of the current tag table is used.

The comp field is recorded with a 1-bit counter and marks how the instruction jumps. When comp is 1, the instruction is a 16-bit compressed instruction and a non-aligned jump occurs. When comp is 0, the instruction is not a 16-bit compressed instruction, or no non-aligned jump occurs.

The Base Predictor table differs from the tag table in that its ctr uses a 2-bit saturating counter: when ctr is 1x, a jump is predicted, and when ctr is 0x, no jump is predicted. When the predicted result matches the actual jump and ctr is not equal to 11, ctr = ctr + 1; when the predicted result does not match the actual jump and ctr is not equal to 00, ctr = ctr - 1.
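The ctr behavior of both table types can be sketched with one pair of helpers parameterized by counter width. The names `predicts_jump` and `update_ctr` are assumptions for illustration. The update follows the rule as stated in this description (increment on a correct prediction, decrement on a wrong one); note that many published TAGE descriptions instead move the counter toward the actual branch direction.

```python
def predicts_jump(ctr: int, bits: int) -> bool:
    """A jump is predicted when the most significant bit of ctr is 1."""
    return bool((ctr >> (bits - 1)) & 1)


def update_ctr(ctr: int, bits: int, prediction_correct: bool) -> int:
    """Saturating update: +1 when the prediction was correct, -1 otherwise."""
    top = (1 << bits) - 1            # 111 for 3 bits, 11 for 2 bits
    if prediction_correct:
        return min(ctr + 1, top)
    return max(ctr - 1, 0)


# 3-bit ctr of a tag table entry and 2-bit ctr of a Base Predictor entry.
print(predicts_jump(0b100, bits=3), predicts_jump(0b01, bits=2))
print(update_ctr(0b111, bits=3, prediction_correct=True))   # stays saturated
```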

Prediction mode of TAGE:

The tag (index number) is matched against the tags stored in the TAG Tables, starting from the table whose index number uses the longest global history information.

The matching cases include the following:

If none of the three TAG Tables hits, the prediction of the Base Predictor is used.

If one TAG Table hits and the confidence (useful) of the entry is not weak (that is, not 01 or 00), the prediction of that entry is used. If the confidence of the entry is weak (01 or 00), the prediction of the Base Predictor is used.

If multiple TAG Tables hit at the same time, the prediction is taken from the TAG Table that uses the larger number of bits of history information.

Using the prediction scheme shown in fig. 2, the following is exemplified:

Suppose Tag table 3 and Tag table 1 hit at the same time, and Tag table 2 misses. T3 is then the provider component, which supplies the final prediction result, and T1 is altpred (the alternate provider).

If the value of the useful field of the provider component is strong (10 or 11), the provider component is used: whether the current instruction jumps is determined from the content of its ctr field, and which kind of instruction the current instruction is and which kind of jump occurs are determined from its comp field. When the value of the useful field of the provider component is weak (00 or 01), altpred is used: whether the current instruction jumps is determined from the content of the ctr field of altpred, and the kind of instruction and the kind of jump are determined from the comp field of altpred.
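The selection between provider component, altpred, and Base Predictor can be sketched as below. This is a simplified illustration, not the implementation of fig. 2: the `Entry` class, the `select_prediction` function, and the convention that hits are passed as a list ordered from shortest to longest history (with `None` for a miss) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

STRONG = 0b10  # useful values 10 and 11 count as strong


@dataclass
class Entry:
    ctr: int         # prediction result counter
    ctr_bits: int    # 3 for a tag table entry, 2 for a Base Predictor entry
    useful: int = 0  # 2-bit confidence
    comp: int = 0    # 1 = 16-bit compressed instruction with non-aligned jump

    def predicts_jump(self) -> bool:
        # The most significant bit of ctr gives the direction (1xx or 1x).
        return bool((self.ctr >> (self.ctr_bits - 1)) & 1)


def select_prediction(base: Entry,
                      tag_hits: Sequence[Optional[Entry]]) -> Tuple[bool, int]:
    """Return (jump predicted, comp) from the chosen entry."""
    hits = [entry for entry in tag_hits if entry is not None]
    if not hits:
        chosen = base                      # no tag table hit
    elif hits[-1].useful >= STRONG:
        chosen = hits[-1]                  # provider component is trusted
    else:
        # Weak provider: fall back to altpred, or to the Base Predictor.
        chosen = hits[-2] if len(hits) > 1 else base
    return chosen.predicts_jump(), chosen.comp


# The example above: T1 and T3 hit, T2 misses.
base = Entry(ctr=0b01, ctr_bits=2)
t1 = Entry(ctr=0b100, ctr_bits=3, useful=0b01, comp=1)
t3 = Entry(ctr=0b011, ctr_bits=3, useful=0b11, comp=0)
strong_provider = select_prediction(base, [t1, None, t3])
t3.useful = 0b00
weak_provider = select_prediction(base, [t1, None, t3])
print(strong_provider, weak_provider)
```

In the first call T3 is strong, so its ctr and comp are used; after its useful value drops to 00, the result comes from T1 instead.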

The updating process of the prediction table is as follows:

After the instruction has jumped, the ctr field, the useful field and the comp field are updated according to the actual jump of the instruction.

When the jump prediction of the instruction is correct: if the useful counter of the provider component is high (11 or 10), the ctr field of the provider component is updated. If the useful counter of the provider component is low (00 or 01), the ctr field of the alternate provider is updated at the same time. The specific update is: when ctr in an entry is less than 111, ctr = ctr + 1; if ctr is 111, ctr remains 111.

Because the jump prediction was correct, the confidence value in the confidence field of the entry is increased: when the useful counter is less than 11, useful = useful + 1; when the useful counter is equal to 11, useful remains 11.

At the same time, the comp field is updated according to the actual jump: if the instruction is a 16-bit compressed instruction and a non-aligned jump occurs, comp is updated to 1; if the instruction is not a 16-bit compressed instruction, or no non-aligned jump occurs, comp is updated to 0.

When the jump prediction of the instruction is wrong:

The ctr of the TAG Table entry that provided the prediction result is updated first.

When ctr in the entry is greater than 000, ctr = ctr - 1; if ctr is 000, ctr remains 000.

When the useful counter is greater than 00, useful = useful - 1. When the useful counter is 0, the entry can be swapped out.

The comp field must also be updated. Its update is independent of whether the jump prediction was correct; it depends only on whether the instruction is a 16-bit compressed instruction and whether a non-aligned jump occurs, and is performed in the same way as when the prediction is correct.

If the TAG Table that provided the prediction result is not the one using the longest global history information, a new entry is allocated in a table that uses longer global history information. When a new entry is allocated, an old entry whose useful is 0 is replaced preferentially.
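The per-entry update for both outcomes can be summarized in one sketch. The `TagEntry` class and the `update_after_jump` function are hypothetical names, and the sketch covers only the update of a single tag table entry, not the allocation of new entries in other tables.

```python
from dataclasses import dataclass

CTR_MAX = 0b111     # 3-bit ctr of a tag table entry
USEFUL_MAX = 0b11   # 2-bit useful counter


@dataclass
class TagEntry:
    ctr: int = 0
    useful: int = 0
    comp: int = 0


def update_after_jump(entry: TagEntry, prediction_correct: bool,
                      is_compressed_16bit: bool, non_aligned_jump: bool) -> bool:
    """Update one entry after the actual jump is known.

    Returns True when the entry's useful counter is 0 afterwards, meaning
    the entry may be swapped out.
    """
    step = 1 if prediction_correct else -1
    # ctr and useful saturate at both ends of their ranges.
    entry.ctr = max(0, min(CTR_MAX, entry.ctr + step))
    entry.useful = max(0, min(USEFUL_MAX, entry.useful + step))
    # comp ignores prediction correctness; it depends only on the jump itself.
    entry.comp = 1 if (is_compressed_16bit and non_aligned_jump) else 0
    return entry.useful == 0


entry = TagEntry(ctr=0b100, useful=0b01, comp=0)
replaceable = update_after_jump(entry, prediction_correct=False,
                                is_compressed_16bit=True, non_aligned_jump=True)
print(entry, replaceable)
```

In this run a misprediction lowers ctr and useful, comp is still set to 1 because the jump was a non-aligned jump of a compressed instruction, and the entry becomes replaceable because useful reached 0.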

This embodiment provides an instruction branch prediction method in which a mark field (comp) is added to the prediction table to record the instruction type associated with an index and how that instruction jumps. It is therefore possible to distinguish whether the current instruction is a compressed instruction making a non-aligned jump, to predict the jump of the instruction accurately, and to improve the working efficiency of the processor.

Example 2

The embodiment provides an instruction branch prediction method, which comprises the following steps: a predictor acquires a prediction table, wherein the prediction table comprises a mark field;

The specific process is as follows:

the Predictor acquires an instruction, performs operation on the currently acquired instruction and the global history information, then performs matching in a plurality of tag tables, and simultaneously performs matching on the currently acquired instruction in a Base Predictor table. In this embodiment, the tag table and the Base Predictor table are the prediction tables.

There are three hit cases:

First case: several tag tables have hit entries, the tag table with the longest global history information is selected for the query, and the useful confidence of its hit entry is high (11 or 10).

The hit entry with the longest global history information is used for prediction. During prediction, whether the instruction jumps is determined from the ctr field, and the type of the instruction and the manner of the jump are determined from the comp field. For example, when the ctr field of the hit entry is 100 and the comp field is 1, the current instruction is judged to be a 16-bit compressed instruction making a non-aligned jump, and the prediction result is that the jump is taken. Execution proceeds according to the prediction result.

After instruction decoding is finished, it can be determined whether the prediction result is correct. If the prediction is correct, execution continues along the predicted jump and the information stored in the entry is updated as follows: ctr = ctr + 1; useful = useful + 1; when the instruction is a 16-bit compressed instruction and a non-aligned jump occurs, the comp field is kept at 1, and when the instruction is not a 16-bit compressed instruction or no non-aligned jump occurs, the comp field is updated to 0. If the prediction is wrong, the pipeline is flushed, the instruction is executed according to the decoding result, and the information stored in the entry is updated as follows: ctr = ctr - 1; useful = useful - 1; the comp field is kept at 1 when the instruction is a 16-bit compressed instruction and a non-aligned jump occurs, and is updated to 0 when the instruction is not a 16-bit compressed instruction or no non-aligned jump occurs. In short, the update of the comp field does not depend on whether the prediction was correct, but only on whether the instruction is a 16-bit compressed instruction and a non-aligned jump occurs.

Second case: several tag tables have hit entries, the tag table with the longest global history information is selected for the query, and the useful confidence of its hit entry is low (01 or 00).

Among the other tag tables with hit entries, the one with the longest global history information is used for prediction. During prediction, whether the instruction jumps is determined from the ctr field, and the type of the instruction and the manner of the jump are determined from the comp field. For example, when the ctr field of the hit entry is 100 and the comp field is 1, the current instruction is judged to be a 16-bit compressed instruction making a non-aligned jump, and the prediction result is that the jump is taken. Execution proceeds according to the prediction result.

After instruction decoding is finished, it can be determined whether the prediction result is correct. If the prediction is correct, execution continues along the predicted jump, and the information stored in the hit entries of the tag tables that have hit entries is updated at the same time, as follows: ctr = ctr + 1; useful = useful + 1; when the instruction is a 16-bit compressed instruction and a non-aligned jump occurs, the comp field is kept at 1, and when the instruction is not a 16-bit compressed instruction or no non-aligned jump occurs, the comp field is updated to 0. If the prediction is wrong, the pipeline is flushed, the instruction is executed according to the decoding result, and the information stored in the entry is updated as follows: ctr = ctr - 1; useful = useful - 1; the comp field is kept at 1 when the instruction is a 16-bit compressed instruction and a non-aligned jump occurs, and is updated to 0 when the instruction is not a 16-bit compressed instruction or no non-aligned jump occurs. In short, the update of the comp field does not depend on whether the prediction was correct, but only on whether the instruction is a 16-bit compressed instruction and a non-aligned jump occurs. In addition, a new entry is allocated in the tag table with the longest global history information, and an old entry whose useful is 0 is replaced preferentially when the new entry is allocated.

Third case: none of the tag tables has a hit entry, so the hit entry in the Base Predictor table is used for prediction. The prediction process and the update of the entry are similar to the first case and are not repeated here. When the prediction is wrong, in addition to updating the hit entry of the Base Predictor table, a new entry is allocated in the tag table with the longest global history information, and an old entry whose useful is 0 is replaced preferentially when the new entry is allocated.

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An instruction branch prediction method, comprising:

the predictor predicts, according to the mark field in the prediction table, whether the instruction is a compressed instruction for which a non-aligned jump occurs; wherein non-aligned means that the fixed instruction length read by the predictor crosses a 32-bit address boundary, or comprises an incomplete half of a 32-bit instruction and a 16-bit compressed instruction;

the data recorded in the mark field has a first state and a second state;

2. The instruction branch prediction method of claim 1 wherein: updating the state of the data recorded in the mark field according to the last instruction jump condition;

3. The instruction branch prediction method of claim 1 wherein: the first state is represented by the binary digit 1 and the second state is represented by the binary digit 0.

4. The instruction branch prediction method of claim 1 wherein: the prediction table also includes an index number field, a confidence field, and a prediction result field.

5. An instruction branch prediction method as claimed in claim 4, wherein: judging whether the data state of the marking field is reset or not according to the data state of the confidence coefficient field; and when the confidence coefficient field of the prediction table is 0, filling the index number of the currently acquired instruction into the index number field, and resetting the data in the mark field to be in a first state or a second state.

6. An instruction branch prediction method as claimed in claim 4, wherein: judging whether the current instruction jumps according to the current entry according to the confidence field of the current entry of the prediction table;

7. The instruction branch prediction method of claim 1 wherein: the prediction table is a Base Predictor table or a tag table.

8. An instruction branch prediction method as defined in claim 7, wherein: matching the acquired instructions in a Base Predictor table and a plurality of tag tables at the same time;

9. The instruction branch prediction method of claim 1 wherein: the prediction table is used for predicting the jump of 32-bit instructions and 16-bit compressed instructions in the RISC-V instruction set.