CN117667221A - Hybrid algorithm two-stage branch prediction system, method and related equipment - Google Patents


Info

Publication number
CN117667221A
Authority
CN
China
Prior art keywords
result
branch prediction
unit
prediction
instruction
Prior art date
Legal status
Granted
Application number
CN202410133325.0A
Other languages
Chinese (zh)
Other versions
CN117667221B (en)
Inventor
刘宇翔
周庆华
Current Assignee
Ruisixinke Shenzhen Technology Co ltd
Original Assignee
Ruisixinke Shenzhen Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Ruisixinke Shenzhen Technology Co ltd
Priority to CN202410133325.0A
Publication of CN117667221A
Application granted
Publication of CN117667221B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Advance Control (AREA)

Abstract

The invention relates to the technical field of processors, and in particular to a hybrid algorithm two-stage branch prediction system, method and related equipment. The two-stage branch prediction system comprises: a first-stage pipeline, which acquires the program counter value passed in by the instruction fetch unit and sends it to the NLP unit, the BTB unit and the TNMT unit for entry lookup to obtain the corresponding prediction results; a second-stage pipeline, which returns the NLP prediction result to the instruction fetch unit as a first branch prediction result and, at the same time, performs fusion prediction on the BTB and TNMT prediction results according to a first branch prediction logic to obtain a first fusion result; and a third-stage pipeline, which performs fusion prediction on the first fusion result through the RAS unit according to a second branch prediction logic to obtain a second fusion result, compares the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and returns it to the instruction fetch unit. The invention improves the branch prediction accuracy of the processor system.

Description

Hybrid algorithm two-stage branch prediction system, method and related equipment
Technical Field
The invention belongs to the technical field of processors, and in particular relates to a hybrid algorithm two-stage branch prediction system and method and related equipment.
Background
With the continued evolution of computer architecture, the performance and complexity of processors have increased. Multicore processors, hyper-threading technology and higher clock frequencies are the primary means of increasing computational power. However, further performance gains are limited by bottlenecks such as high energy consumption, difficult heat dissipation and memory access efficiency. To overcome these challenges, researchers have sought new solutions, among which improving branch prediction is an important means of raising instruction-level parallelism and reducing pipeline stalls.
Branch prediction is a vital part of processor design: by predicting the branch instructions on the program execution path, the processor can keep the instruction stream flowing smoothly. Both static and dynamic branch prediction methods are already widely adopted.
Existing branch prediction strategies each have their own applicable scenarios, advantages and disadvantages. The BTB (Branch Target Buffer) is a key hardware structure in computer architecture for improving the prediction accuracy and execution efficiency of branch instructions; it stores the target addresses of branch instructions so that the branch target can be retrieved and predicted quickly during program execution. NLP (Next Line Prediction) performs well on continuous, linear instruction streams, but its accuracy drops for programs with frequent branch instructions and complex control flow: it adapts poorly to dynamic program behavior and has difficulty handling complex branching, especially in the presence of loops or conditional branches. TNMT (Tournament Prediction, a tournament predictor) improves prediction accuracy by combining multiple branch predictors, but it also increases hardware cost, its tournament and selection logic introduces additional delay, and it requires more storage to maintain the state of multiple predictors; TNMT may still perform poorly on complex program behavior because it ultimately relies on a single decision. The RAS (Return Address Stack) is mainly used to predict branches arising from function calls and returns, but its performance is limited by the depth of the call stack; for deep recursion or complex function nesting, the RAS capacity may be insufficient to predict the return address effectively, leading to branch prediction errors.
Thus, existing branch prediction strategies face challenges as program complexity increases and execution environments change. The complexity of branch instructions, frequent calls and returns, and other factors continually limit the accuracy of branch prediction techniques.
Disclosure of Invention
The invention provides a hybrid algorithm two-stage branch prediction system, method and related equipment, aiming to solve the problem in the prior art that branch prediction accuracy is low when program complexity is high.
To solve the above technical problem, in a first aspect, the present invention provides a two-stage branch prediction system of a hybrid algorithm, including:
the first-stage pipeline, comprising an NLP unit, a BTB unit and a TNMT unit, and used for acquiring the program counter value of the current instruction passed in by the instruction fetch unit of the processor and sending the program counter value to the NLP unit, the BTB unit and the TNMT unit respectively for entry lookup, obtaining an NLP prediction result, a BTB prediction result and a TNMT prediction result respectively;
the second-stage pipeline, used for storing the NLP prediction result and returning it to the instruction fetch unit as a first branch prediction result, and meanwhile storing the BTB prediction result and the TNMT prediction result and performing fusion prediction according to a first branch prediction logic to obtain a first fusion result; and
the third-stage pipeline, comprising an RAS unit and used for performing fusion prediction on the first fusion result through the RAS unit according to a second branch prediction logic to obtain a second fusion result, comparing the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and returning the second branch prediction result to the instruction fetch unit.
Still further, the first branch prediction logic is embodied as:
determining that the jump judgment type of the BTB prediction result to the current instruction is an unconditional jump instruction or a conditional jump instruction: if yes, taking the BTB predicted result as the first fusion result; and if the condition jump instruction is the condition jump instruction, the BTB predicted result and the TNMT predicted result are used as the first fusion result together.
Still further, the second branch prediction logic is specifically configured to:
the RAS unit determines, from the jump judgment type of the current instruction carried in the first fusion result, whether the current instruction is a function return instruction or a function call instruction:
if it is a function return instruction, a branch instruction function return address is read from the first-in first-out queue maintained by the RAS unit, and the return address and the first fusion result together are taken as the second fusion result.
Still further, the third branch prediction logic is specifically configured to:
judging whether the NLP prediction result is the same as the second fusion result; if not, the second fusion result is taken as the second branch prediction result.
Still further, the third stage pipeline is further configured to:
when the RAS unit determines, from the jump judgment type of the current instruction carried in the first fusion result, whether the current instruction is a function return instruction or a function call instruction:
if it is a function call instruction, the return address of the current instruction is written into the first-in first-out queue maintained by the RAS unit.
Still further, the third stage pipeline is further configured to:
after comparing the second fusion result with the NLP prediction result according to the third branch prediction logic, judging whether a second branch prediction result is obtained:
if so, returning the second branch prediction result to the instruction fetch unit so as to flush the processing flow that the instruction fetch unit started on the first branch prediction result;
if not, returning no data to the instruction fetch unit.
In a second aspect, the present invention also provides a hybrid algorithm two-stage branch prediction method implemented by the hybrid algorithm two-stage branch prediction system described above, comprising the steps of:
acquiring, in the first-stage pipeline, the program counter value of the current instruction passed in by the instruction fetch unit of the processor, and sending the program counter value to the NLP unit, the BTB unit and the TNMT unit respectively for entry lookup, obtaining an NLP prediction result, a BTB prediction result and a TNMT prediction result respectively;
storing the NLP prediction result in the second-stage pipeline and returning it to the instruction fetch unit as a first branch prediction result; meanwhile, storing the BTB prediction result and the TNMT prediction result and performing fusion prediction according to a first branch prediction logic to obtain a first fusion result;
performing fusion prediction on the first fusion result in the third-stage pipeline through the RAS unit according to a second branch prediction logic to obtain a second fusion result; comparing the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and returning the second branch prediction result to the instruction fetch unit.
In a third aspect, the present invention also provides a computer device comprising: a memory, a processor, and a hybrid algorithm two-stage branch prediction program stored on the memory and executable on the processor, the processor implementing the steps of the hybrid algorithm two-stage branch prediction method described in any one of the above when executing the hybrid algorithm two-stage branch prediction program.
In a fourth aspect, the present invention also provides a computer readable storage medium having stored thereon a two-stage branch prediction program of a hybrid algorithm, which when executed by a processor implements the steps of the two-stage branch prediction method of the hybrid algorithm as described in any one of the above.
The beneficial effect of the invention is that it provides a hybrid algorithm two-stage branch prediction system that combines multiple branch prediction strategies. The system brings together several branch prediction algorithm structures, fuses their outputs according to the characteristics of each prediction mode and result, and processes them in stages, so that the advantages of each branch prediction algorithm are effectively retained, the coverage of different types of branch instructions by the branch prediction unit is improved, and the branch prediction accuracy and overall performance of the system are improved.
Drawings
FIG. 1 is a schematic diagram of a hybrid algorithm two-stage branch prediction system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an NLP unit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a BTB unit according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a TNMT unit according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an RAS unit according to an embodiment of the present invention;
FIG. 6 is a block flow diagram of the steps of a two-stage branch prediction method of a hybrid algorithm provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a hybrid algorithm two-stage branch prediction system according to an embodiment of the present invention, where the hybrid algorithm two-stage branch prediction system 100 includes:
a first-stage pipeline 101, comprising an NLP unit 1011, a BTB unit 1012 and a TNMT unit 1013, where the first-stage pipeline 101 is configured to obtain the program counter value of the current instruction passed in by the instruction fetch unit of the processor and to send it to the NLP unit 1011, the BTB unit 1012 and the TNMT unit 1013 respectively for entry lookup, obtaining an NLP prediction result, a BTB prediction result and a TNMT prediction result respectively;
a second-stage pipeline 102, configured to store the NLP prediction result and return it to the instruction fetch unit as a first branch prediction result, and meanwhile to store the BTB prediction result and the TNMT prediction result and perform fusion prediction according to a first branch prediction logic to obtain a first fusion result; and
a third-stage pipeline 103, comprising an RAS unit 1031, where the third-stage pipeline 103 is configured to perform fusion prediction on the first fusion result through the RAS unit 1031 according to a second branch prediction logic to obtain a second fusion result, to compare the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and to return the second branch prediction result to the instruction fetch unit.
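For illustration only, the following C++ sketch models the hand-off between the three pipeline stages described above; the type and field names (Prediction, Stage1Out and so on) are assumptions of the sketch and are not defined in the patent.

```cpp
// Hypothetical C++ model of the hand-off between the three pipeline stages.
// All names are illustrative; the patent describes behaviour, not an API.
#include <cstdint>
#include <optional>

struct Prediction {
    bool     taken  = false;   // predicted direction
    uint64_t target = 0;       // predicted target address
};

struct Stage1Out {             // first-stage pipeline 101: three parallel lookups
    Prediction nlp, btb, tnmt;
};

struct Stage2Out {             // second-stage pipeline 102
    Prediction first_branch_result;  // NLP result, already returned to the fetch unit
    Prediction first_fusion_result;  // BTB/TNMT fusion (first branch prediction logic)
};

struct Stage3Out {             // third-stage pipeline 103
    Prediction second_fusion_result;                 // after RAS fusion (second logic)
    std::optional<Prediction> second_branch_result;  // present only when it differs
                                                     // from the NLP result (third logic)
};
```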
To further illustrate the technical effects of the hybrid algorithm two-stage branch prediction system 100 in the embodiment of the present invention, the specific structures of the NLP unit 1011, the BTB unit 1012, the TNMT unit 1013, and the RAS unit 1031 used in the implementation are shown in FIG. 2 to FIG. 5:
FIG. 2 is a schematic structural diagram of an NLP unit 1011 according to an embodiment of the present invention. NLP is a dynamic branch prediction technique whose main body in this implementation is a micro BTB built as a content-addressable memory. The micro BTB stores a small amount of instruction content; because of the timing requirement on the NLP unit 1011, the micro BTB cannot be too large, otherwise a result cannot be produced in time for the next cycle. During branch prediction, the program counter information from the instruction fetch unit is sent to the NLP unit 1011 and compared against every entry stored in the micro BTB to obtain the final hit as the branch prediction result.
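As a rough illustration of the micro BTB lookup just described, the sketch below models a small fully associative (content-addressable) table searched with the fetch PC; the 16-entry size, field names and update policy are assumptions, not details taken from the patent.

```cpp
// Sketch of a small fully associative (CAM-style) micro BTB lookup.
// The 16-entry size, field names and update policy are assumptions.
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

struct MicroBtbEntry {
    bool     valid  = false;
    uint64_t pc     = 0;       // branch address stored in the entry
    uint64_t target = 0;       // predicted target address
};

class MicroBtb {
    std::array<MicroBtbEntry, 16> entries_{};   // kept small so a result is ready next cycle
public:
    // Compare the fetch PC against every entry (hardware does this in parallel).
    std::optional<uint64_t> lookup(uint64_t pc) const {
        for (const auto& e : entries_)
            if (e.valid && e.pc == pc) return e.target;   // hit
        return std::nullopt;                              // miss: continue sequentially
    }
    void update(std::size_t way, uint64_t pc, uint64_t target) {
        entries_[way] = {true, pc, target};
    }
};
```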
In the embodiment of the invention, the prediction result of the NLP unit 1011 is returned to the instruction fetch unit first, so that for simple branch prediction scenarios the whole branch prediction system can jump in time in the next clock cycle, avoiding bubbles being introduced into the pipeline.
FIG. 3 is a schematic diagram of a BTB unit 1012 according to an embodiment of the present invention. The main body in this implementation is a BTB table with an SRAM structure, which stores a large amount of instruction content executed by the processor. Compared with the micro BTB in the NLP unit 1011, the BTB unit 1012 has more entries and a longer lookup time, so it can hold more branch instruction content and give more accurate branch predictions. During branch prediction, the program counter information from the instruction fetch unit is sent to the BTB unit 1012, and the BTB unit 1012 indexes the BTB table with the program counter contents and reads out the corresponding instruction content.
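The sketch below shows one common way such an SRAM BTB read can be modelled, indexing the table with low PC bits and confirming the hit with a tag; the 1024-entry size, the index/tag split and the branch-type encoding are assumptions rather than details from the patent.

```cpp
// Sketch of an SRAM-style BTB read: index with low PC bits, confirm with a tag.
// Table size, index/tag split and branch-type encoding are assumptions.
#include <cstdint>
#include <optional>
#include <vector>

enum class BranchType { Conditional, Unconditional, Call, Return };

struct BtbEntry {
    bool       valid  = false;
    uint64_t   tag    = 0;                        // upper PC bits
    BranchType type   = BranchType::Conditional;  // jump judgment type used later
    uint64_t   target = 0;                        // predicted target address
};

class Btb {
    static constexpr unsigned kIndexBits = 10;    // 1024 entries (assumed)
    std::vector<BtbEntry> table_ = std::vector<BtbEntry>(1u << kIndexBits);
public:
    std::optional<BtbEntry> lookup(uint64_t pc) const {
        uint64_t index = (pc >> 2) & ((1u << kIndexBits) - 1);
        uint64_t tag   = pc >> (2 + kIndexBits);
        const BtbEntry& e = table_[index];
        if (e.valid && e.tag == tag) return e;    // hit: branch type and target available
        return std::nullopt;                      // miss: not a known branch
    }
};
```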
FIG. 4 is a schematic structural diagram of a TNMT unit 1013 according to an embodiment of the present invention. The TNMT unit 1013 is a branch prediction structure that combines multiple prediction strategies; its main body in this implementation includes a register-built global branch history table, a register-built local branch history table, a register-built selection table, a global history register, and a multiplexer unit. During branch prediction, the program counter information from the instruction fetch unit is sent to the TNMT unit 1013. The unit directly addresses the local branch history table with it, XORs it with the global history register, indexes the global branch history table and the selection table with the result, and then selects whether the result of the local branch history table or the result of the global branch history table is used for the prediction. After this process repeats many times, the TNMT unit 1013 can dynamically select the result source according to the execution flow of different instructions, improving the prediction accuracy for conditional (non-forced) jump instructions.
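The following sketch models a tournament predictor in the spirit of this description: a local table addressed by the PC, and a global table plus selection table indexed by the PC XORed with the global history register. The two-bit saturating counters, the table sizes and the exact indexing are assumptions of the sketch.

```cpp
// Sketch of a tournament predictor: local table addressed by PC; global table and
// selection table indexed by PC xor the global history register. Counter width and
// table sizes are assumptions, not details from the patent.
#include <algorithm>
#include <array>
#include <cstdint>

class Tournament {
    static constexpr unsigned kBits = 12;                 // 4096-entry tables (assumed)
    static constexpr uint32_t kMask = (1u << kBits) - 1;
    std::array<uint8_t, 1u << kBits> local_{}, global_{}, choice_{};  // 2-bit counters
    uint32_t ghr_ = 0;                                    // global history register

    static bool taken(uint8_t c)          { return c >= 2; }
    static void train(uint8_t& c, bool t) { c = t ? std::min<uint8_t>(c + 1, 3)
                                                  : (c ? c - 1 : 0); }
public:
    bool predict(uint64_t pc) const {
        uint32_t li = (pc >> 2) & kMask;                  // local table: addressed by PC
        uint32_t gi = ((pc >> 2) ^ ghr_) & kMask;         // global + selection: PC xor GHR
        return taken(choice_[gi]) ? taken(global_[gi]) : taken(local_[li]);
    }
    void update(uint64_t pc, bool outcome) {
        uint32_t li = (pc >> 2) & kMask;
        uint32_t gi = ((pc >> 2) ^ ghr_) & kMask;
        bool lp = taken(local_[li]), gp = taken(global_[gi]);
        if (lp != gp) train(choice_[gi], gp == outcome);  // reward the correct source
        train(local_[li], outcome);
        train(global_[gi], outcome);
        ghr_ = ((ghr_ << 1) | (outcome ? 1u : 0u)) & kMask;
    }
};
```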
FIG. 5 is a schematic structural diagram of an RAS unit 1031 according to an embodiment of the present invention, which mainly consists of a first-in first-out queue. During branch prediction, the queue is operated on according to the instruction type passed down from the previous pipeline stage (in the embodiment of the invention, the instruction type used for the RAS unit 1031 prediction is determined by the prediction result obtained by the BTB unit 1012 described above). If the instruction is of the function call type, the function return address corresponding to the instruction is written into the queue for storage; if it is of the function return type, the most recent function return address is read back from the queue as the predicted return address of the branch instruction.
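The sketch below follows the behavior just described: a function-call instruction writes its return address into the queue, and a function-return instruction reads the most recent return address back. The bounded depth and the deque-based modelling are assumptions of this sketch, not details from the patent.

```cpp
// Sketch of the return-address storage: a call writes the return address in,
// a return reads the most recent one back. Depth and container are assumed.
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

class ReturnAddressQueue {
    std::deque<uint64_t> q_;
    static constexpr std::size_t kDepth = 16;    // assumed depth
public:
    void on_call(uint64_t return_addr) {         // function-call type instruction
        if (q_.size() == kDepth) q_.pop_front(); // drop the oldest entry when full
        q_.push_back(return_addr);
    }
    std::optional<uint64_t> on_return() {        // function-return type instruction
        if (q_.empty()) return std::nullopt;
        uint64_t addr = q_.back();               // most recent function return address
        q_.pop_back();
        return addr;
    }
};
```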
Combining the specific structure of each unit in the above embodiment, the hybrid algorithm two-stage branch prediction system 100 in the embodiment of the present invention combines the prediction results of the units through different branch prediction logics to obtain a more comprehensive prediction result. Specifically:
the first branch prediction logic is specifically:
determining that the jump judgment type of the BTB prediction result to the current instruction is an unconditional jump instruction or a conditional jump instruction: if yes, taking the BTB predicted result as the first fusion result; and if the condition jump instruction is the condition jump instruction, the BTB predicted result and the TNMT predicted result are used as the first fusion result together.
In the embodiment of the present invention, the BTB unit 1012 can predict the type of the instruction at that location, determine whether it is a branch instruction, and further determine the specific branch type and the target address. When the BTB unit 1012 identifies an unconditional jump instruction, the BTB prediction result alone is used as the first fusion result. For a conditional jump instruction, by contrast, the BTB unit 1012 cannot resolve the condition well, so the TNMT unit 1013 predicts from its own structure whether the branch will be taken, and the two results (the BTB unit 1012 identifying the conditional jump, and the TNMT deciding whether it is taken) are taken together as the first fusion result.
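As a minimal sketch of the first branch prediction logic just described, the function below keeps the BTB result alone for an unconditional jump and attaches the TNMT direction for a conditional jump; the JumpType enum and FirstFusion structure are illustrative names, not taken from the patent.

```cpp
// Sketch of the first branch prediction logic: BTB result alone for an unconditional
// jump, BTB plus the TNMT direction for a conditional jump. Names are illustrative.
#include <cstdint>
#include <optional>

enum class JumpType { Unconditional, Conditional };

struct FirstFusion {
    JumpType type;                       // jump judgment type from the BTB result
    uint64_t target;                     // target address from the BTB result
    std::optional<bool> tnmt_taken;      // attached only for conditional jumps
};

FirstFusion first_prediction_logic(JumpType btb_type, uint64_t btb_target,
                                   bool tnmt_taken) {
    FirstFusion f{btb_type, btb_target, std::nullopt};
    if (btb_type == JumpType::Conditional)
        f.tnmt_taken = tnmt_taken;       // BTB result and TNMT result together
    return f;                            // unconditional jump: BTB result alone
}
```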
The second branch prediction logic specifically comprises:
the RAS unit 1031 determines that the current instruction is a function return instruction or a function call instruction according to the skip judgment type of the current instruction by the first fusion result:
if the function returns an instruction, a branch instruction function return address is read from a fifo maintained by the RAS unit 1031, and the branch instruction function return address and the first fusion result are used together as the second fusion result.
The RAS unit 1031 operates on the first-in first-out queue according to the jump type of the instruction determined by the prediction result obtained from the BTB unit 1012. Specifically, when a function return is involved (i.e. the branch instruction needs to jump), the RAS unit 1031 reads the latest branch instruction function return address from the queue; this return address is then combined with the results obtained above (the BTB unit 1012 determining the jump type and the TNMT deciding whether a conditional branch is taken) and used together as the second fusion result. The second fusion result is thus a more specific, comprehensive branch prediction that combines the respective strengths of the BTB unit 1012, the TNMT unit 1013 and the RAS unit 1031.
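A minimal sketch of the second branch prediction logic follows: when the jump type marks a function return, the return address read from the RAS queue replaces the target carried in the first fusion result. The SecondFusion structure and parameter names are illustrative assumptions.

```cpp
// Sketch of the second branch prediction logic: for a function return, the address
// read from the RAS queue replaces the target in the first fusion result.
#include <cstdint>
#include <optional>

struct SecondFusion {
    uint64_t target;                     // predicted target after RAS fusion
    std::optional<bool> tnmt_taken;      // direction carried over from the first fusion
};

SecondFusion second_prediction_logic(uint64_t first_fusion_target,
                                     std::optional<bool> first_fusion_tnmt_taken,
                                     bool is_function_return,
                                     std::optional<uint64_t> ras_return_addr) {
    SecondFusion s{first_fusion_target, first_fusion_tnmt_taken};
    if (is_function_return && ras_return_addr)
        s.target = *ras_return_addr;     // the RAS return address joins the fusion result
    return s;
}
```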
Correspondingly, the second branch prediction logic is executed by the third-stage pipeline 103, and the third-stage pipeline 103 is further configured to:
when the RAS unit 1031 determines, from the jump judgment type of the current instruction carried in the first fusion result, whether the current instruction is a function return instruction or a function call instruction:
if it is a function call instruction, write the return address of the current instruction into the first-in first-out queue maintained by the RAS unit 1031.
The third branch prediction logic is specifically configured to:
judging whether the NLP prediction result is the same as the second fusion result; if not, the second fusion result is taken as the second branch prediction result.
The third branch prediction logic is also executed by the third-stage pipeline 103, and the third-stage pipeline 103 is further configured to:
after comparing the second fusion result with the NLP prediction result according to the third branch prediction logic, judge whether a second branch prediction result is obtained:
if so, return the second branch prediction result to the instruction fetch unit so as to flush the processing flow that the instruction fetch unit started on the first branch prediction result;
if not, return no data to the instruction fetch unit.
Specifically, when the third branch prediction logic is executed, the judgment combines the first branch prediction result with the second fusion result, because the prediction result of the NLP unit 1011 was already returned to the instruction fetch unit while the second-stage pipeline 102 was executing. In the third-stage pipeline 103, if the NLP prediction result (i.e. the first branch prediction result) is the same as the comprehensive prediction of the BTB unit 1012, the TNMT unit 1013 and the RAS unit 1031 (i.e. the second fusion result), it means the whole branch prediction system already made the correct jump prediction for this branch instruction when the first branch prediction result was returned to the instruction fetch unit; in that case the third-stage pipeline 103 returns no further data to the instruction fetch unit, avoiding flushing a processing flow that is already correct.
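The sketch below captures this third branch prediction logic: the third-stage pipeline only sends a redirect (the second branch prediction result) back to the fetch unit when the fused prediction disagrees with the NLP result already in flight; otherwise nothing is returned and no flush occurs. The Redirect structure and the direction/target comparison are illustrative assumptions.

```cpp
// Sketch of the third branch prediction logic: redirect only on disagreement with
// the NLP result already handed to the fetch unit. Names are illustrative.
#include <cstdint>
#include <optional>

struct Redirect {
    bool     taken;
    uint64_t target;
};

// Returns std::nullopt when the NLP (first) result already matches, so the fetch
// unit keeps running on the first branch prediction result and nothing is flushed.
std::optional<Redirect> third_prediction_logic(bool nlp_taken, uint64_t nlp_target,
                                               bool fused_taken, uint64_t fused_target) {
    if (nlp_taken == fused_taken && (!fused_taken || nlp_target == fused_target))
        return std::nullopt;
    return Redirect{fused_taken, fused_target};   // second branch prediction result
}
```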
It should be noted that the specific structures of the NLP unit 1011, the BTB unit 1012, the TNMT unit 1013 and the RAS unit 1031 described in the embodiments and shown in FIG. 2 to FIG. 5 are only intended to explain the processing between the pipeline stages of the hybrid algorithm two-stage branch prediction system 100, and do not limit how the branch prediction system is implemented. Arranging individual branch prediction units built with other optimized structures into the multi-stage pipeline and multi-level branch prediction strategy of the hybrid algorithm two-stage branch prediction system 100 constructed in the embodiments achieves foreseeable technical effects and also falls within the scope of the present invention.
The beneficial effect of the invention is that it provides a hybrid algorithm two-stage branch prediction system that combines multiple branch prediction strategies. The system brings together several branch prediction algorithm structures, fuses their outputs according to the characteristics of each prediction mode and result, and processes them in stages, so that the advantages of each branch prediction algorithm are effectively retained, the coverage of different types of branch instructions by the branch prediction unit is improved, and the branch prediction accuracy and overall performance of the system are improved.
The embodiment of the invention also provides a hybrid algorithm two-stage branch prediction method based on the above hybrid algorithm two-stage branch prediction system. Referring to FIG. 6, FIG. 6 is a block flow diagram of the steps of the hybrid algorithm two-stage branch prediction method provided by the embodiment of the present invention; the method comprises the following steps:
S201, acquiring, in the first-stage pipeline, the program counter value of the current instruction passed in by the instruction fetch unit of the processor, and sending the program counter value to the NLP unit, the BTB unit and the TNMT unit respectively for entry lookup, obtaining an NLP prediction result, a BTB prediction result and a TNMT prediction result respectively;
S202, storing the NLP prediction result in the second-stage pipeline and returning it to the instruction fetch unit as a first branch prediction result; meanwhile, storing the BTB prediction result and the TNMT prediction result and performing fusion prediction according to a first branch prediction logic to obtain a first fusion result;
S203, performing fusion prediction on the first fusion result in the third-stage pipeline through the RAS unit according to a second branch prediction logic to obtain a second fusion result; comparing the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and returning the second branch prediction result to the instruction fetch unit.
Through the above steps, the hybrid algorithm two-stage branch prediction method achieves the same technical effects as the hybrid algorithm two-stage branch prediction system in the above embodiment; refer to the description of the above embodiment, which is not repeated here.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention, where the computer device 300 includes: a memory 302, a processor 301, and a hybrid algorithm two-stage branch prediction program stored on the memory 302 and executable on the processor 301.
The processor 301 invokes the hybrid algorithm two-stage branch prediction program stored in the memory 302 and executes the steps of the hybrid algorithm two-stage branch prediction method provided in the embodiment of the present invention (see FIG. 6), specifically including the following steps:
S201, acquiring, in the first-stage pipeline, the program counter value of the current instruction passed in by the instruction fetch unit of the processor, and sending the program counter value to the NLP unit, the BTB unit and the TNMT unit respectively for entry lookup, obtaining an NLP prediction result, a BTB prediction result and a TNMT prediction result respectively;
S202, storing the NLP prediction result in the second-stage pipeline and returning it to the instruction fetch unit as a first branch prediction result; meanwhile, storing the BTB prediction result and the TNMT prediction result and performing fusion prediction according to a first branch prediction logic to obtain a first fusion result;
S203, performing fusion prediction on the first fusion result in the third-stage pipeline through the RAS unit according to a second branch prediction logic to obtain a second fusion result; comparing the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and returning the second branch prediction result to the instruction fetch unit.
The computer device 300 provided in the embodiment of the present invention can implement the steps of the hybrid algorithm two-stage branch prediction method in the above embodiment and achieve the same technical effects; refer to the description in the above embodiment, which is not repeated here.
The embodiment of the invention also provides a computer-readable storage medium on which a hybrid algorithm two-stage branch prediction program is stored. When executed by a processor, the program implements each process and step of the hybrid algorithm two-stage branch prediction method provided by the embodiment of the invention and achieves the same technical effects; to avoid repetition, the details are not described here.
Those skilled in the art will appreciate that all or part of the above-described method may be implemented by a hybrid algorithm two-stage branch prediction program stored on a computer-readable storage medium; when executed, the program may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the method according to the embodiments of the present invention.
While the embodiments of the present invention have been illustrated and described above in connection with the drawings as what are presently considered the most practical and preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but on the contrary is intended to cover various equivalent modifications and arrangements included within the spirit and scope of the appended claims.

Claims (9)

1. A hybrid algorithm two-stage branch prediction system, comprising:
the first-stage pipeline, comprising an NLP unit, a BTB unit and a TNMT unit, and configured to acquire the program counter value of the current instruction passed in by the instruction fetch unit of the processor and to send the program counter value to the NLP unit, the BTB unit and the TNMT unit respectively for entry lookup, obtaining an NLP prediction result, a BTB prediction result and a TNMT prediction result respectively;
the second-stage pipeline, configured to store the NLP prediction result and return it to the instruction fetch unit as a first branch prediction result, and meanwhile to store the BTB prediction result and the TNMT prediction result and perform fusion prediction according to a first branch prediction logic to obtain a first fusion result; and
the third-stage pipeline, comprising an RAS unit and configured to perform fusion prediction on the first fusion result through the RAS unit according to a second branch prediction logic to obtain a second fusion result, to compare the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and to return the second branch prediction result to the instruction fetch unit.
2. The hybrid-algorithmic two-stage branch prediction system of claim 1, wherein the first branch prediction logic is specifically:
determining whether the jump judgment type given by the BTB prediction result for the current instruction is an unconditional jump instruction or a conditional jump instruction: if it is an unconditional jump instruction, the BTB prediction result alone is taken as the first fusion result; if it is a conditional jump instruction, the BTB prediction result and the TNMT prediction result together are taken as the first fusion result.
3. The hybrid-algorithmic two-stage branch prediction system of claim 2, wherein the second branch prediction logic is specifically:
the RAS unit determines, from the jump judgment type of the current instruction carried in the first fusion result, whether the current instruction is a function return instruction or a function call instruction:
if it is a function return instruction, a branch instruction function return address is read from the first-in first-out queue maintained by the RAS unit, and the return address and the first fusion result together are taken as the second fusion result.
4. The hybrid-algorithmic two-stage branch prediction system of claim 3, wherein the third branch prediction logic is specifically:
judging whether the NLP prediction result is the same as the second fusion result; if not, the second fusion result is taken as the second branch prediction result.
5. The hybrid-algorithmic two-stage branch prediction system of claim 3, wherein the third stage pipeline is further configured to:
when the RAS unit determines, from the jump judgment type of the current instruction carried in the first fusion result, whether the current instruction is a function return instruction or a function call instruction:
if it is a function call instruction, the return address of the current instruction is written into the first-in first-out queue maintained by the RAS unit.
6. The hybrid-algorithmic two-stage branch prediction system of claim 4, wherein the third stage pipeline is further configured to:
after comparing the second fusion result with the NLP prediction result according to the third branch prediction logic, judging whether a second branch prediction result is obtained:
if so, returning the second branch prediction result to the instruction fetch unit so as to flush the processing flow that the instruction fetch unit started on the first branch prediction result;
if not, returning no data to the instruction fetch unit.
7. A two-stage branch prediction method of a hybrid algorithm, implemented by the two-stage branch prediction system of a hybrid algorithm according to any one of claims 1-6, comprising the steps of:
acquiring, in the first-stage pipeline, the program counter value of the current instruction passed in by the instruction fetch unit of the processor, and sending the program counter value to the NLP unit, the BTB unit and the TNMT unit respectively for entry lookup, obtaining an NLP prediction result, a BTB prediction result and a TNMT prediction result respectively;
storing the NLP prediction result in the second-stage pipeline and returning it to the instruction fetch unit as a first branch prediction result; meanwhile, storing the BTB prediction result and the TNMT prediction result and performing fusion prediction according to a first branch prediction logic to obtain a first fusion result;
performing fusion prediction on the first fusion result in the third-stage pipeline through the RAS unit according to a second branch prediction logic to obtain a second fusion result; comparing the second fusion result with the NLP prediction result according to a third branch prediction logic to obtain a second branch prediction result, and returning the second branch prediction result to the instruction fetch unit.
8. A computer device, comprising: a memory, a processor, and a hybrid algorithm two-stage branch prediction program stored on the memory and executable on the processor, the processor implementing the steps in the hybrid algorithm two-stage branch prediction method of claim 7 when executing the hybrid algorithm two-stage branch prediction program.
9. A computer readable storage medium having stored thereon a hybrid algorithm two-stage branch prediction program which when executed by a processor implements the steps in the hybrid algorithm two-stage branch prediction method of claim 7.
CN202410133325.0A 2024-01-31 2024-01-31 Hybrid algorithm two-stage branch prediction system, method and related equipment Active CN117667221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410133325.0A CN117667221B (en) 2024-01-31 2024-01-31 Hybrid algorithm two-stage branch prediction system, method and related equipment


Publications (2)

Publication Number Publication Date
CN117667221A 2024-03-08
CN117667221B CN117667221B (en) 2024-04-30

Family

ID=90082844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410133325.0A Active CN117667221B (en) 2024-01-31 2024-01-31 Hybrid algorithm two-stage branch prediction system, method and related equipment

Country Status (1)

Country Link
CN (1) CN117667221B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5283873A (en) * 1990-06-29 1994-02-01 Digital Equipment Corporation Next line prediction apparatus for a pipelined computed system
US20030182543A1 (en) * 1999-10-14 2003-09-25 James B. Keller Training line predictor for branch targets
US20080307209A1 (en) * 2006-05-04 2008-12-11 Michael Gschwind Methods and apparatus for implementing polymorphic branch predictors
US20100332800A1 (en) * 2009-06-30 2010-12-30 Fujitsu Limited Instruction control device, instruction control method, and processor
JP2012173967A (en) * 2011-02-21 2012-09-10 Nec Corp Branch prediction device and branch prediction system
CN107592924A (en) * 2015-06-11 2018-01-16 英特尔公司 For the method and apparatus to being optimized by the instruction of computing device
CN105718241A (en) * 2016-01-18 2016-06-29 北京时代民芯科技有限公司 SPARC V8 system structure based classified type mixed branch prediction system
US20180165095A1 (en) * 2016-12-13 2018-06-14 International Business Machines Corporation Pointer associated branch line jumps for accelerated line jumps

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jin Wenbing; Shi Feng; Zuo Qi; Zhang Yang: "Research on ahead branch prediction structure and algorithm" (提前分支预测结构及算法研究), Journal of Computer Research and Development (计算机研究与发展), no. 10, 15 October 2013 *

Also Published As

Publication number Publication date
CN117667221B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN100557570C (en) Multicomputer system
KR101710116B1 (en) Processor, Apparatus and Method for memory management
US9443095B2 (en) Method in a processor, an apparatus and a computer program product
US20090037709A1 (en) Branch prediction device, hybrid branch prediction device, processor, branch prediction method, and branch prediction control program
CN110717574B (en) Neural network operation method and device and heterogeneous intelligent chip
US20150205614A1 (en) Method in a processor, an apparatus and a computer program product
WO2020199058A1 (en) Branch instruction processing method, branch predictor, and processor
CN102112964B (en) Branch target buffer allocation
WO2023029912A1 (en) Ahead prediction method and branch trace cache for direct jumping
KR101048178B1 (en) Method and apparatus for correcting link stack circuit
US20070162895A1 (en) Mechanism and method for two level adaptive trace prediction
JP5494832B2 (en) Arithmetic processing device and branch prediction method
CN117667221B (en) Hybrid algorithm two-stage branch prediction system, method and related equipment
CN112925632B (en) Processing method and device, processor, electronic device and storage medium
MX2007012582A (en) Instruction memory unit and method of operation.
CN116048627B (en) Instruction buffering method, apparatus, processor, electronic device and readable storage medium
JP4213181B2 (en) Branch prediction apparatus, method thereof, and processor
CN117667222B (en) Two-stage branch prediction system, method and related equipment with optimized time sequence
CN114518900A (en) Instruction processing method applied to multi-core processor and multi-core processor
US20070294518A1 (en) System and method for predicting target address of branch instruction utilizing branch target buffer having entry indexed according to program counter value of previous instruction
US11947941B2 (en) Dynamic computation offloading to graphics processing unit
CN113722629A (en) Intelligent page caching method and system and readable storage medium
US8799378B2 (en) Non-greedy consumption by execution blocks in dataflow networks
US9201688B2 (en) Configuration of asynchronous message processing in dataflow networks
JP2007193433A (en) Information processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant