CN111078295B - Mixed branch prediction device and method for out-of-order high-performance core - Google Patents


Publication number
CN111078295B
Authority
CN
China
Prior art keywords
predictor
branch
unit
tage
branch prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911194732.8A
Other languages
Chinese (zh)
Other versions
CN111078295A (en)
Inventor
陈伟杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hexin Interconnect Technology Qingdao Co ltd
Original Assignee
Hexin Interconnect Technology Qingdao Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hexin Interconnect Technology Qingdao Co ltd filed Critical Hexin Interconnect Technology Qingdao Co ltd
Priority to CN201911194732.8A priority Critical patent/CN111078295B/en
Publication of CN111078295A publication Critical patent/CN111078295A/en
Application granted granted Critical
Publication of CN111078295B publication Critical patent/CN111078295B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3848 Speculative instruction execution using hybrid branch prediction, e.g. selection between prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a hybrid branch prediction device and method for out-of-order high-performance cores, and relates to the field of computer branch prediction. The device supports performance evaluation at the processor microarchitecture level and reduces the rename stalls that branch mispredictions and instruction misses cause in out-of-order high-performance processors. The device provides a high-precision, flexibly parameterized, configurable hybrid branch predictor consisting of a global-history TAGE predictor, a statistical correction predictor, and a loop predictor. The TAGE predictor uses parameterized Tagged components and a split-read improvement strategy to achieve high-precision branch prediction and reduce access conflicts. The statistical correction predictor confirms or reverts the prediction of the TAGE predictor according to that prediction and its confidence. The loop predictor predicts regular loops with long loop bodies using a replacement strategy and a loop branch folding technique. The invention makes full use of the limited hardware storage budget, greatly reduces access conflicts, and improves both branch prediction accuracy and overall processor performance.

Description

Mixed branch prediction device and method for out-of-order high-performance core
Technical Field
Embodiments of the invention relate to the field of branch prediction, and in particular to a hybrid branch prediction device and method for an out-of-order high-performance core.
Background
As the performance of processor cores continues to increase, processor microarchitectures are becoming increasingly complex. Given this complexity and limited development time, evaluating processor performance effectively has become an important problem in processor design. The growing number of transistors on a single chip allows processors to adopt more complex microarchitectures, and techniques such as superscalar issue, branch prediction, out-of-order execution, and look-ahead execution are widely used. The resulting problem is how to carry out efficient performance analysis during processor development.
A common approach to processor performance analysis is to use a software simulator, and mainstream processor manufacturers also develop and optimize performance analysis models during processor development. Although a software simulator offers a high level of abstraction and high simulation speed, its accuracy is limited, particularly because simulation usually relies on acceleration techniques that cost further accuracy. RTL code is a relatively accurate processor performance model, and processor implementations often include a performance monitoring unit that counts performance events and reports the results through counters. However, because of hardware implementation cost, processors tend to implement only a small number of performance counters, and the events they count are used primarily to guide software optimization. Furthermore, simulating RTL code in a software simulation environment is slow.
Therefore, the performance analysis available in the prior art is insufficient to guide performance analysis and optimization at the microarchitecture level, and the accuracy of prior-art branch prediction is low.
Disclosure of Invention
Embodiments of the present invention provide a hybrid branch prediction apparatus and method for an out-of-order high-performance core, so as to provide a high-precision hybrid branch predictor and to solve the problem of low accuracy in the performance analysis of conventional processor cores.
In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a hybrid branch prediction apparatus for an out-of-order high-performance core. The apparatus includes a global-history TAGE predictor, a statistical correction predictor, a loop predictor, and a processor overall performance evaluation module. The TAGE predictor is configured to predict the main direction of a branch instruction using a split-read improvement strategy. The statistical correction predictor is configured to confirm or revert the prediction of the TAGE predictor according to that prediction and its confidence. The loop predictor is configured to predict regular loops with long loop bodies using a replacement strategy and a loop branch folding technique. The processor overall performance evaluation module is configured to perform performance analysis by combining benchmark programs with the critical path, and comprises a branch predictor ESL parameterized system modeling unit, a hybrid branch predictor quantitative analysis component, a processor architecture performance analysis component, a processor dedicated performance detection component, and a key data summarization component.
Further, the TAGE branch prediction structure comprises a base prediction component T0, Tagged components T[i], and a multiplexer (MUX) unit. Each Tagged component consists of a tag unit, a pred unit, and a u unit, and T0, the Tagged components T[i], and the MUX unit are combined through decision logic to form the TAGE branch prediction structure. The split-read improvement strategy comprises: splitting the T0 unit and the two-bit pred unit into a direction unit and a strength unit; reading only the direction unit when a branch prediction is made; and, when an entry is updated after a correct prediction, directly setting the strength unit to strong.
Further, the split-read improvement strategy further comprises: combining the direction unit and the strength unit split from the two-bit pred unit with the tag unit and the u unit respectively, giving a first combination of tag and direction and a second combination of strength and u; and storing the direction unit and the strength unit split from each T0 entry at the same address in different banks.
Further, the split-read improvement strategy further comprises: the TAGE predictor accesses only the first combination when making a prediction, and accesses only the second combination when updating after a correct prediction.
Further, the split-read improvement strategy further comprises: while predicting a branch instruction, the TAGE predictor checks the prediction against any misprediction update operation in flight in the pipeline; if the entry accessed by the misprediction update and the entry read for the prediction are the same position of the same entry of the same Tagged component, the direction unit of the entry read for the prediction is inverted directly, yielding the correct branch prediction direction.
Further, the TAGE predictor is further configured to: allocate a new entry as part of the misprediction update when the TAGE predictor mispredicts and the component that provided the prediction is not the Tagged component with the longest history length; if no Tagged component currently has an allocatable entry, intervene by searching all Tagged components whose history length is greater than that of the provider component and decrementing by 1 the u bits of the indexed entries in those components, so that a new entry can be allocated; and, if allocatable entries appear in several different Tagged components, select the entry to allocate according to an allocation probability.
Further, the statistical correction predictor is also configured to select, by comparing the confidence of the statistical correction predictor with that of the TAGE predictor, whether the output prediction is that of the TAGE predictor or that of the statistical correction predictor: when the confidence of the statistical correction predictor is higher than that of the TAGE predictor, the prediction of the statistical correction predictor is selected, and when the confidence of the TAGE predictor is higher than that of the statistical correction predictor, the prediction of the TAGE predictor is selected.
Further, the loop predictor has 64 entries of 39 bits each, comprising a 10-bit past iteration counter, a 10-bit retired iteration counter, a 10-bit tag, a 4-bit confidence counter, a 4-bit age counter, and 1 direction bit.
Further, the replacement strategy comprises: replacing a loop predictor entry only when its age counter is zero; setting the age to an initial value of 7 on allocation; decrementing the age of any entry that is considered as a possible replacement target; incrementing the age of any entry that is used and provides a valid prediction; and clearing the age of an entry if the loop predictor determines that its branch is not a regular loop.
In a second aspect, embodiments of the present invention provide a hybrid branch prediction method for an out-of-order high-performance core. The method is applied to the hybrid branch prediction apparatus for an out-of-order high-performance core and comprises: predicting the main direction of a branch instruction by executing the split-read improvement strategy in the TAGE predictor; confirming or reverting the prediction of the TAGE predictor with the statistical correction predictor according to that prediction and its confidence; predicting regular loops with long loop bodies by executing the replacement strategy and the loop branch folding technique in the loop predictor; and analyzing the performance of the TAGE predictor, the statistical correction predictor, and the loop predictor with the processor overall performance evaluation module.
The technical scheme provided by the embodiment of the invention at least has the following advantages:
Embodiments of the invention provide a hybrid branch prediction device and method for an out-of-order high-performance core. The device supports performance evaluation at the processor microarchitecture level and alleviates the rename stalls that branch mispredictions and instruction misses cause in out-of-order high-performance processors. The device can be regarded as a high-precision, flexibly parameterized, configurable hybrid branch predictor consisting of a global-history TAGE predictor, a statistical correction predictor, and a loop predictor; it makes full use of the limited storage budget, greatly reduces access conflicts, and improves overall processor performance while improving branch prediction accuracy.
Drawings
FIG. 1 is a schematic structural diagram of a hybrid branch prediction apparatus for an out-of-order high-performance core according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the processor overall performance evaluation of the hybrid branch prediction apparatus for an out-of-order high-performance core according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
A problem encountered when pipelined processors handle branch instructions is that, depending on whether the branch condition is true or false, a jump may occur that interrupts the flow of instructions in the pipeline, because the processor cannot determine which instruction follows the branch until the branch is resolved. The longer the pipeline, the longer the processor waits, because the branch instruction must finish executing before the next instruction to enter the pipeline is known. Branch prediction techniques were developed to address this problem; however, the accuracy of existing branch prediction techniques is no longer adequate for increasingly complex processor cores.
Therefore, an embodiment of the present invention provides a hybrid branch prediction apparatus for an out-of-order high-performance core. Referring to FIG. 1, the apparatus mainly includes a global-history TAGE predictor 01, a statistical correction predictor 02, and a loop predictor 03. The TAGE predictor 01 predicts the main direction of a branch instruction using a split-read improvement strategy. The statistical correction predictor 02 confirms the prediction of the TAGE predictor 01 according to that prediction and its confidence; when the TAGE predictor 01 is statistically likely to have mispredicted, the statistical correction predictor combines the branch history, the branch confidence, and similar information to revert the mispredicted result. The loop predictor 03 predicts regular loops with long loop bodies using a replacement strategy and a loop branch folding technique. The outputs of the TAGE predictor 01, the statistical correction predictor 02, and the loop predictor 03 are connected to a multiplexer.
Specifically, the branch prediction structure of a conventional TAGE predictor 01 includes a base prediction component T0, Tagged components T[i], and a multiplexer (MUX) unit. Each Tagged component consists of a tag unit, a pred unit, and a u unit; T0, the Tagged components T[i], and the MUX unit are combined through decision logic to form the TAGE branch prediction structure. Here tag is used to match the lookup address information, pred provides the branch prediction direction, and u indicates whether the corresponding prediction entry is valid. T0 is indexed by the program counter (PC), and the access address of T[i] is obtained by hashing the PC with the history information of that Tagged component. The existing TAGE structure can look up several different history lengths at the same time, and the more complete the branch history, the more reliable the corresponding prediction. The direction given by the matching component with the longest history is therefore selected preferentially, and the prediction falls back to T0 when no Tagged component has a matching entry. Compared with branch prediction strategies that use a single history length, the TAGE strategy achieves higher prediction accuracy, but its area overhead and implementation complexity are also higher.
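The lookup described above can be illustrated with a minimal Python sketch. It is a software model under stated assumptions, not the patented hardware: the history lengths, table sizes, the XOR-folding hash, and the names `Tage`, `TaggedEntry`, and `fold` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TaggedEntry:
    tag: Optional[int] = None  # None marks a never-allocated entry (assumption)
    pred: int = 2              # 2-bit counter 0..3; values >= 2 mean "taken"
    u: int = 0                 # the u unit of the entry


def fold(history: int, length: int, bits: int) -> int:
    """XOR-fold the youngest `length` history bits down to `bits` bits."""
    h = history & ((1 << length) - 1)
    out = 0
    while h:
        out ^= h & ((1 << bits) - 1)
        h >>= bits
    return out


class Tage:
    def __init__(self, hist_lengths=(4, 8, 16, 32), index_bits=6, tag_bits=8):
        self.hist_lengths = hist_lengths
        self.index_bits = index_bits
        self.tag_bits = tag_bits
        size = 1 << index_bits
        self.t0 = [2] * size  # base component T0, indexed by the PC only
        self.tables = [[TaggedEntry() for _ in range(size)] for _ in hist_lengths]

    def index(self, pc: int, ghist: int, i: int) -> int:
        mask = (1 << self.index_bits) - 1
        return (pc ^ fold(ghist, self.hist_lengths[i], self.index_bits)) & mask

    def tag(self, pc: int, ghist: int, i: int) -> int:
        mask = (1 << self.tag_bits) - 1
        return ((pc >> self.index_bits) ^ fold(ghist, self.hist_lengths[i], self.tag_bits)) & mask

    def predict(self, pc: int, ghist: int) -> Tuple[bool, Optional[int]]:
        """Return (taken, provider); provider is None when T0 supplies the prediction."""
        # Search from the longest history to the shortest: the first hit wins.
        for i in reversed(range(len(self.tables))):
            entry = self.tables[i][self.index(pc, ghist, i)]
            if entry.tag is not None and entry.tag == self.tag(pc, ghist, i):
                return entry.pred >= 2, i
        return self.t0[pc & ((1 << self.index_bits) - 1)] >= 2, None
```

With no Tagged entry allocated, `predict` falls back to T0; once entries exist in several components, the one with the longest history provides the direction.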
Therefore, the embodiment of the present invention applies a split-read improvement strategy to the structure of the TAGE branch predictor. Specifically, the T0 unit and the two-bit pred unit are each split into a direction unit and a strength unit; when a branch prediction is made, only the direction unit is read; and when an entry is updated after a correct prediction, the strength unit is set directly to strong. The direction unit and the strength unit split from the two-bit pred unit are combined with the tag unit and the u unit respectively, giving a first combination of tag and direction and a second combination of strength and u; the direction unit and the strength unit split from each T0 entry are stored at the same address in different banks. With this strategy, a branch prediction and the update that follows a correct prediction can proceed at the same time, which exploits the split read fully and greatly reduces access conflicts.
In terms of execution, the TAGE predictor 01 accesses only the first combination when making a prediction, and accesses only the second combination when updating after a correct prediction. This avoids the processor performance losses, such as pipeline stalls or failed updates of prediction information, that arise when every prediction accesses both units and therefore conflicts with the prediction update logic.
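A minimal Python sketch of the split storage follows, assuming each Tagged component is modeled as two separately accessed banks. The class name `SplitBank`, the access counters, and the encoding of "strong" as 1 are hypothetical; the point is only that a prediction and a correct-prediction update touch different banks.

```python
from typing import Optional


class SplitBank:
    """One Tagged component stored as two independently accessed banks."""

    def __init__(self, size: int):
        # First combination: [tag, direction]; read on every prediction.
        self.tag_dir = [[None, 0] for _ in range(size)]
        # Second combination: [strength, u]; written on a correct-prediction update.
        self.str_u = [[0, 0] for _ in range(size)]
        # Per-bank access counters, kept only to make the separation visible.
        self.accesses = {"tag_dir": 0, "str_u": 0}

    def predict(self, idx: int, tag: int) -> Optional[bool]:
        """Read only the tag+direction bank; None means no tag match."""
        self.accesses["tag_dir"] += 1
        stored_tag, direction = self.tag_dir[idx]
        if stored_tag != tag:
            return None
        return direction == 1

    def update_correct(self, idx: int) -> None:
        """After a correct prediction, set strength to strong without reading direction."""
        self.accesses["str_u"] += 1
        self.str_u[idx][0] = 1  # 1 encodes "strong" in this sketch
```

Because `predict` and `update_correct` never touch the same bank, the two operations could be issued in the same cycle in a hardware realization of this model.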
In addition, the split-read improvement strategy of the embodiment of the invention further comprises:
While predicting a branch instruction, the TAGE predictor 01 checks the prediction against any misprediction update operation in flight in the pipeline. If the entry accessed by the misprediction update and the entry read for the prediction are the same position of the same entry of the same Tagged component, the direction unit of the entry read for the prediction is inverted directly, which yields the correct branch prediction direction.
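The forwarding check described above can be sketched in a few lines of Python. The function name `forward_direction` and the representation of an entry location as a `(component, index)` pair are assumptions made for illustration.

```python
from typing import Optional, Tuple

# (Tagged component number, entry index); a hypothetical way to name an entry.
Location = Tuple[int, int]


def forward_direction(read_loc: Location,
                      stored_direction: bool,
                      inflight_fix: Optional[Location]) -> bool:
    """Return the direction to use for the current prediction.

    If a misprediction update that is still in the pipeline targets the very
    entry being read, the stored direction is known to be wrong, so it is
    inverted instead of waiting for the update to be written back.
    """
    if inflight_fix is not None and inflight_fix == read_loc:
        return not stored_direction
    return stored_direction
```

For example, `forward_direction((2, 17), True, (2, 17))` inverts the stored direction, while an in-flight update to a different component or index leaves it unchanged.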
The TAGE predictor 01 is also configured as follows. When the TAGE predictor mispredicts and the component that provided the prediction is not the Tagged component with the longest history length, a new entry is allocated as part of the misprediction update. If no Tagged component has an allocatable entry, the predictor intervenes: it searches all Tagged components whose history length is greater than that of the provider component and decrements by 1 the u bits of the indexed entries in those components, so that a new entry can be allocated. If allocatable entries appear in several different Tagged components, the entry to allocate is selected according to an allocation probability.
Allocating by probability avoids the ping-pong phenomenon, and thus the unnecessary overhead of a branch that keeps cycling through the search for a new matching entry.
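The allocation rule can be sketched as follows. The source does not give the allocation probabilities, so the weights below (each shorter-history candidate twice as likely as the next longer one) are an assumption, as are the function name `allocate` and the choice to retry once after decrementing the u bits.

```python
import random
from typing import List, Optional


def allocate(u_bits: List[int], provider: int, rng: random.Random) -> Optional[int]:
    """Pick the Tagged component in which to allocate after a misprediction.

    u_bits[i] is the u value of the entry indexed in component i, with
    components ordered by increasing history length. `provider` is the index
    of the providing component (-1 if T0 provided the prediction).
    Returns the chosen component, or None if nothing could be allocated.
    """
    longer = range(provider + 1, len(u_bits))
    free = [i for i in longer if u_bits[i] == 0]
    if not free:
        # No allocatable entry: decrement u in every longer-history component.
        for i in longer:
            u_bits[i] = max(0, u_bits[i] - 1)
        free = [i for i in longer if u_bits[i] == 0]
        if not free:
            return None
    # Assumed weighting: favor shorter histories, 2:1 per step.
    weights = [2 ** (len(free) - 1 - k) for k in range(len(free))]
    return rng.choices(free, weights=weights, k=1)[0]
```

Choosing randomly among the candidates, rather than always taking the same one, is what keeps two competing branches from repeatedly evicting each other's entries.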
In particular, the prior-art TAGE fails to predict statistically biased branches well, for example branches that have only a slight bias toward one direction but no strong correlation with the history path. On some of these branches, the TAGE predictor 01 sometimes performs worse than a simple PC-indexed table of wide counters.
To this end, embodiments of the present invention provide a statistical correction predictor 02 that predicts such statistically biased branches better. The correction is intended to detect and revert unlikely predictions: the prediction from the TAGE predictor 01, together with the branch address, the global history, the global path, and the local history, is presented to the statistical correction predictor, which decides whether to reverse the prediction. Since the prediction provided by the TAGE predictor 01 is correct in most cases, the statistical correction predictor improves prediction accuracy without requiring much storage.
More specifically, the statistical correction predictor 02 includes several different components: one bias component and a plurality of GEHL components. The bias component consists of two tables indexed by the PC and the direction predicted by the TAGE predictor 01; the two tables use different hash functions to limit the impact of entry conflicts.
The GEHL components are indexed using, respectively, the global conditional branch history, a branch history associated with the return stack, a multi-entry local history, and 16-entry local histories. Each of these histories has several tables, and every table consists of multi-bit counters. The branch prediction is computed as the sign of the sum of the values read from all statistical corrector tables. Note that the prediction of the TAGE predictor 01 is integrated into this sum as the direction, which is +1 or -1, multiplied by twice the number of statistical corrector tables.
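The sum described above can be written out as a short Python sketch. It assumes signed counter values have already been read from the tables, and it treats a sum of exactly zero as "taken"; the function name `sc_predict` and that tie rule are illustrative assumptions, not details from the source.

```python
from typing import List, Tuple


def sc_predict(counters: List[int], tage_taken: bool) -> Tuple[bool, int]:
    """Combine the corrector table counters with the TAGE direction.

    counters:   signed values read from the statistical corrector tables.
    tage_taken: direction predicted by TAGE.
    Returns (taken, total), where taken is the sign of the total.
    """
    direction = 1 if tage_taken else -1
    # TAGE enters the sum with weight 2 * (number of tables) * direction.
    total = sum(counters) + 2 * len(counters) * direction
    return total >= 0, total
```

With three tables all reading -4 and a TAGE prediction of taken, the total is -12 + 6 = -6, so the corrector reverses TAGE; with counters near zero, the TAGE term dominates and its prediction is confirmed.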
The entries of the statistical correction predictor 02 are updated using the dynamic threshold strategy proposed for the GEHL predictor. A PC-indexed table of dynamic thresholds is used, which yields a marginal benefit and reduces mispredictions by about 0.19%.
In addition, the output of the statistical correction predictor 02 is generally more accurate than that of the TAGE predictor 01; however, when the output of the TAGE predictor 01 has high confidence and the output of the statistical correction predictor 02 has low confidence, the result of the TAGE predictor 01 is generally the more accurate one.
Therefore, the statistical correction predictor 02 of the embodiment of the present invention is further configured to select, by comparing the confidence of the statistical correction predictor 02 with that of the TAGE predictor 01, whether the output prediction is that of the TAGE predictor 01 or that of the statistical correction predictor 02.
When the confidence of the statistical correction predictor 02 is higher than that of the TAGE predictor 01, the prediction of the statistical correction predictor 02 is selected; when the confidence of the TAGE predictor 01 is higher than that of the statistical correction predictor 02, the prediction of the TAGE predictor is selected. It is estimated that this method reduces mispredictions by about 0.72%.
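The selection rule can be sketched as a small Python function. How confidence is measured is not specified in the source, so the integer confidence values here are hypothetical, and resolving a tie in favor of the statistical correction predictor is an assumption based on the statement that its output is generally the more accurate one.

```python
def select_prediction(tage_taken: bool, tage_conf: int,
                      sc_taken: bool, sc_conf: int) -> bool:
    """Choose between the TAGE and statistical corrector predictions.

    The predictor with the higher confidence wins; on a tie this sketch
    keeps the statistical corrector output (an assumption).
    """
    if tage_conf > sc_conf:
        return tage_taken
    return sc_taken
```

For instance, `select_prediction(True, 3, False, 1)` keeps the TAGE direction, while `select_prediction(True, 1, False, 3)` takes the corrector's.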
Specifically, the loop predictor 03 provided by the embodiment of the present invention has 64 entries of 39 bits each. Each entry comprises a 10-bit past iteration count, a 10-bit retired iteration count, a 10-bit tag, a 4-bit confidence counter, a 4-bit age counter, and 1 direction bit.
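As a quick check of the entry layout, the following Python sketch packs the six fields into one word and confirms that the widths add up to 39 bits. The field names and their order within the word are assumptions; the source gives only the widths.

```python
from typing import Dict

# (field name, width in bits); the order within the entry is an assumption.
FIELDS = (
    ("past_iter", 10),
    ("retired_iter", 10),
    ("tag", 10),
    ("confidence", 4),
    ("age", 4),
    ("direction", 1),
)
ENTRY_BITS = sum(width for _, width in FIELDS)
NUM_ENTRIES = 64


def pack(values: Dict[str, int]) -> int:
    """Pack the field values into a single ENTRY_BITS-wide integer."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word: int) -> Dict[str, int]:
    """Inverse of pack: split a packed entry back into its fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

Under this layout the whole loop predictor needs `NUM_ENTRIES * ENTRY_BITS` = 64 × 39 = 2496 bits of storage.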
The loop predictor 03 uses the following replacement strategy. An entry of the loop predictor 03 is replaced only when its age counter is zero. On allocation, the age is set to an initial value of 7. Whenever an entry is considered as a possible replacement target, its age is decremented. Whenever an entry is used and provides a valid prediction, its age is incremented. If the loop predictor 03 determines that a branch is not a regular loop, the age of its entry is cleared.
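The age-based replacement rules can be modeled with a small Python class. The class and method names are hypothetical, and saturating the age at 15 follows from the 4-bit counter width stated above rather than from an explicit statement in the source.

```python
class LoopEntryAge:
    """Age counter of one loop predictor entry (4-bit, saturating)."""

    AGE_INIT = 7
    AGE_MAX = 15  # largest value of a 4-bit counter

    def __init__(self) -> None:
        self.age = 0  # an unallocated entry is immediately replaceable

    def try_replace(self) -> bool:
        """Called when this entry is a candidate replacement target."""
        if self.age == 0:
            self.age = self.AGE_INIT  # entry is reallocated to the new branch
            return True
        self.age -= 1                 # not replaceable yet, but it ages
        return False

    def useful_prediction(self) -> None:
        """The entry was used and provided a valid prediction."""
        self.age = min(self.AGE_MAX, self.age + 1)

    def not_regular_loop(self) -> None:
        """The branch turned out not to be a regular loop: free the entry."""
        self.age = 0
```

An entry that keeps providing valid predictions therefore stays protected, while an entry that stops being useful is evicted after a bounded number of failed replacement attempts.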
Referring to FIG. 2, the hybrid branch prediction apparatus further includes a processor overall performance evaluation module, which performs performance analysis by combining a benchmark with the critical path, where the benchmark is a reference test program. The processor overall performance evaluation module comprises a hybrid branch predictor ESL parameterized system modeling unit, a hybrid branch predictor quantitative analysis component, a processor architecture performance analysis component, a processor dedicated performance detection component, and a key data summarization component.
First, based on the prediction results of the TAGE predictor 01, the statistical correction predictor 02, and the loop predictor 03, the high-precision hybrid branch predictor is modeled with the ESL (electronic system level) parameterized system modeling unit, analyzed quantitatively with the hybrid branch predictor quantitative analysis component, and analyzed with the processor architecture performance analysis component. The processor dedicated performance detection component monitors real-time data, and the key data summarization component summarizes the data. The overall performance of the processor is then analyzed by combining the benchmark with the critical data path (DataPath), and the analysis result is fed back to the ESL parameterized modeling unit of the hybrid branch predictor for closed-loop iteration, which finally completes the processor overall performance evaluation model. The performance evaluation model is used to evaluate the performance of the hybrid branch prediction apparatus, so that the rename stalls that branch mispredictions and instruction misses cause in out-of-order high-performance processors can be alleviated.
The embodiment of the invention provides a hybrid branch prediction apparatus for an out-of-order high-performance core that supports performance evaluation at the microarchitecture level and reduces the rename stalls that branch mispredictions and instruction misses cause in out-of-order high-performance processors. The apparatus provides a high-precision, flexibly parameterized, configurable hybrid branch predictor comprising a global-history TAGE predictor, a statistical correction predictor, and a loop predictor; it makes full use of the limited storage budget, greatly reduces access conflicts, and improves overall processor performance while improving branch prediction accuracy.
Corresponding to the foregoing embodiments, an embodiment of the present invention provides a hybrid branch prediction method for an out-of-order high-performance core. The method is applied to the hybrid branch prediction apparatus for an out-of-order high-performance core and specifically includes:
Predicting the main direction of a branch instruction by executing the split-read improvement strategy in the TAGE predictor. The split-read improvement strategy includes: splitting the T0 unit and the two-bit pred unit into a direction unit and a strength unit; reading only the direction unit when a branch prediction is made; and, when an entry is updated after a correct prediction, directly setting the strength unit to strong. The direction unit and the strength unit split from the two-bit pred unit are combined with the tag unit and the u unit respectively, giving a first combination of tag and direction and a second combination of strength and u; the direction unit and the strength unit split from each T0 entry are stored at the same address in different banks. The TAGE predictor accesses only the first combination when making a prediction, and accesses only the second combination when updating after a correct prediction.
Confirming or reverting the prediction of the TAGE predictor with the statistical correction predictor according to that prediction and its confidence. When the confidence of the statistical correction predictor is higher than that of the TAGE predictor, the prediction of the statistical correction predictor is selected; when the confidence of the TAGE predictor is higher than that of the statistical correction predictor, the prediction of the TAGE predictor is selected.
Predicting regular loops with long loop bodies by executing the replacement strategy and the loop branch folding technique in the loop predictor. A loop predictor entry is replaced only when its age counter is zero; on allocation, the age is set to an initial value of 7; whenever an entry is considered as a possible replacement target, its age is decremented; whenever an entry is used and provides a valid prediction, its age is incremented; and if the loop predictor determines that a branch is not a regular loop, the age of its entry is cleared.
Analyzing the performance of the TAGE predictor, the statistical correction predictor and the loop predictor with the processor overall performance evaluation module.
The embodiment of the invention can relieve the rename-stage stalls of an out-of-order high-performance processor that are caused by branch mispredictions and instruction misses. High-precision branch prediction is achieved with parameterized Tagged components and the split read improvement strategy, which also reduces access conflicts. The statistical correction predictor confirms or reverts the prediction result of the TAGE predictor according to the prediction result and the confidence of the TAGE predictor. The loop predictor predicts regular loops with long loop bodies using the replacement strategy and the loop branch folding technique. The device makes full use of a limited hardware storage budget, greatly reduces access conflicts, and improves both branch prediction accuracy and overall processor performance.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functionality described in the present invention may be implemented in a combination of hardware and software. When implemented in software, the corresponding functionality may be stored on, or transmitted as one or more instructions or code on, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, improvement and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (9)

1. The mixed branch prediction device of the out-of-order high-performance core is characterized by comprising a global historical information branch TAGE predictor, a statistical correction predictor, a loop predictor and a processor overall performance evaluation module,
the TAGE predictor is used for predicting the main direction of the branch instruction by utilizing a split reading improvement strategy;
the statistical correction predictor is used for confirming or reverting the prediction result of the TAGE predictor according to the prediction result and the confidence of the TAGE predictor;
the loop predictor is used for predicting a regular loop with a long loop body by utilizing a replacement strategy and a loop branch folding technology;
the performance evaluation module is used for performing performance analysis by combining benchmark and a critical path, and the overall performance evaluation module of the processor comprises a branch predictor ESL parametric modeling unit, a mixed branch predictor quantitative analysis component, a processor architecture performance analysis component, a processor special performance detection component and a critical data summarization component;
the TAGE predictor includes: a T0 component of the branch predictor, Tagged components T[i] and a multiplexer (MUX) unit, wherein each Tagged component consists of a tag unit, a pred unit and a u unit, and the T0 component, the Tagged components T[i] and the MUX unit form a TAGE branch prediction structure through decision logic; and the split read improvement strategy comprises: splitting T0 and the two-bit pred unit each into a direction unit and a strength unit; reading only the direction unit when a branch prediction is made; and, when an update follows a correct branch prediction, setting the strength unit directly to strong.
2. The apparatus of claim 1, wherein the split read improvement strategy further comprises:
combining a direction unit and a strength unit which are split by a two-bit pred unit with a tag unit and a u unit respectively, wherein the direction unit and the strength unit comprise a first combination formed by combining tag and direction and a second combination formed by combining strength and u;
and respectively storing each item of the direction unit and the strength unit which are split from the T0 in the same address space of different banks.
3. The apparatus of claim 2, wherein the split read improvement strategy further comprises:
the TAGE predictor only accesses the first combination when performing prediction, and only accesses the second combination when the prediction result is correctly updated.
4. The apparatus of claim 1, wherein the split read improvement strategy further comprises:
the TAGE predictor is used for, while predicting a branch instruction, matching the prediction against a branch misprediction update operation that is in flight in the pipeline; and, if the entry accessed by the misprediction update operation and the entry read for the branch prediction are the same position of the same entry of the same Tagged component, directly inverting the direction unit of the entry read for the branch prediction, thereby obtaining the correct branch prediction direction.
5. The hybrid branch prediction device of an out-of-order high performance core of claim 1, wherein the TAGE predictor is further used for:
when a branch prediction of the TAGE predictor is wrong and the component that provided the prediction is not the Tagged component with the longest history length, allocating a new entry as part of the misprediction update;
if no Tagged component has an allocatable entry, intervening by retrieving all Tagged components whose history length is greater than that of the providing component, decrementing by 1 the u bits of the indexed entries in those Tagged components, and then allocating the new entry again;
and if allocatable entries are available in several different Tagged components, selecting the entry to allocate according to an allocation probability.
6. The apparatus of claim 1, wherein the statistical correction predictor is further configured to select whether the output prediction result is the prediction result of the TAGE predictor or the prediction result of the statistical correction predictor by comparing confidence magnitudes of the statistical correction predictor and the TAGE predictor;
and when the confidence coefficient of the statistical correction predictor is higher than that of the TAGE predictor, selecting the prediction result of the statistical correction predictor, and when the confidence coefficient of the TAGE predictor is higher than that of the statistical correction predictor, selecting the prediction result of the TAGE predictor.
7. The apparatus of claim 1, wherein the loop predictor has 64 entries, each 39-bit entry comprising a 10-bit past iteration counter, a 10-bit retired iteration counter, a 10-bit tag counter, a 4-bit confidence counter, a 4-bit age counter, and 1 direction bit.
8. The apparatus of claim 1, wherein the replacement policy comprises:
replacing a loop predictor entry only if its age counter is zero; on allocation, first initializing the age to 7; whenever an entry is a candidate replacement target, decrementing its age; whenever an entry is used and provides a valid prediction, incrementing its age; and, if the loop predictor determines that a branch is not a regular loop, clearing the age of its entry.
9. A mixed branch prediction method of an out-of-order high-performance core is applied to a mixed branch prediction device of the out-of-order high-performance core, and comprises the following steps:
predicting a main direction of the branch instruction by executing a split read improvement strategy through a TAGE predictor;
confirming or reverting the prediction result of the TAGE predictor through the statistical correction predictor according to the prediction result and the confidence of the TAGE predictor;
a loop predictor is used for executing a replacement strategy and a loop branch folding technology to predict a regular loop with a long loop body;
analyzing the performance of the TAGE predictor, the statistical correction predictor and the loop predictor by the overall performance evaluation module of the processor;
the TAGE predictor includes: a T0 component of the branch predictor, Tagged components T[i] and a multiplexer (MUX) unit, wherein each Tagged component consists of a tag unit, a pred unit and a u unit, and the T0 component, the Tagged components T[i] and the MUX unit form a TAGE branch prediction structure through decision logic; and the split read improvement strategy comprises: splitting T0 and the two-bit pred unit each into a direction unit and a strength unit; reading only the direction unit when a branch prediction is made; and, when an update follows a correct branch prediction, setting the strength unit directly to strong.
CN201911194732.8A 2019-11-28 2019-11-28 Mixed branch prediction device and method for out-of-order high-performance core Active CN111078295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911194732.8A CN111078295B (en) 2019-11-28 2019-11-28 Mixed branch prediction device and method for out-of-order high-performance core

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911194732.8A CN111078295B (en) 2019-11-28 2019-11-28 Mixed branch prediction device and method for out-of-order high-performance core

Publications (2)

Publication Number Publication Date
CN111078295A CN111078295A (en) 2020-04-28
CN111078295B true CN111078295B (en) 2021-11-12

Family

ID=70311962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911194732.8A Active CN111078295B (en) 2019-11-28 2019-11-28 Mixed branch prediction device and method for out-of-order high-performance core

Country Status (1)

Country Link
CN (1) CN111078295B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111857831B (en) * 2020-06-11 2021-07-20 成都海光微电子技术有限公司 Memory bank conflict optimization method, parallel processor and electronic equipment
CN112988233B (en) * 2021-02-06 2024-03-26 江南大学 Deviation corrector and method for branch instruction prediction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170322810A1 (en) * 2016-05-06 2017-11-09 Qualcomm Incorporated Hypervector-based branch prediction
CN106406823B (en) * 2016-10-10 2019-07-05 上海兆芯集成电路有限公司 Branch predictor and method for operating branch predictor
CN110109705A (en) * 2019-05-14 2019-08-09 核芯互联科技(青岛)有限公司 A kind of superscalar processor branch prediction method for supporting embedded edge calculations

Also Published As

Publication number Publication date
CN111078295A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
US11379234B2 (en) Store-to-load forwarding
JP5965041B2 (en) Load store dependency predictor content management
US9715389B2 (en) Dependent instruction suppression
CN1188778C (en) Zoning transmit quene and distribution strategy
US7822951B2 (en) System and method of load-store forwarding
JP5799465B2 (en) Loop buffer learning
US6059835A (en) Performance evaluation of processor operation using trace pre-processing
KR101496009B1 (en) Loop buffer packing
US5860151A (en) Data cache fast address calculation system and method
US9135005B2 (en) History and alignment based cracking for store multiple instructions for optimizing operand store compare penalties
CN101611380A (en) Speculative throughput calculates
CN101246447B (en) Method and apparatus for measuring pipeline stalls in a microprocessor
CN111078295B (en) Mixed branch prediction device and method for out-of-order high-performance core
US11928467B2 (en) Atomic operation predictor to predict whether an atomic operation will complete successfully
US20130262821A1 (en) Performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching
CN111221575A (en) Register renaming method and system for out-of-order high-performance processor
US8151096B2 (en) Method to improve branch prediction latency
CN114008587A (en) Limiting replay of load-based Control Independent (CI) instructions in speculative misprediction recovery in a processor
KR20230093442A (en) Prediction of load-based control independent (CI) register data independent (DI) (CIRDI) instructions as control independent (CI) memory data dependent (DD) (CIMDD) instructions for replay upon recovery from speculative prediction failures in the processor
CN1322415C (en) Method and apparatus to replay transformed instructions
Douma et al. Fast and precise cache performance estimation for out-of-order execution
CN102163139A (en) Microprocessor fusing loading arithmetic/logic operation and skip macroinstructions
US11379240B2 (en) Indirect branch predictor based on register operands
CN101916184B (en) Method for updating branch target address cache in microprocessor and microprocessor
US11983533B2 (en) Control flow prediction using pointers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant