CN117632262A - Branch prediction method and system, branch predictor, processor and storage medium - Google Patents

Branch prediction method and system, branch predictor, processor and storage medium

Info

Publication number
CN117632262A
CN117632262A
Authority
CN
China
Prior art keywords
prediction
branch
pipeline
condition
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311862430.XA
Other languages
Chinese (zh)
Inventor
王小岛
高军
赵天磊
苑佳红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Phytium Technology Co Ltd
Original Assignee
Phytium Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Phytium Technology Co Ltd filed Critical Phytium Technology Co Ltd
Priority to CN202311862430.XA priority Critical patent/CN117632262A/en
Publication of CN117632262A publication Critical patent/CN117632262A/en
Pending legal-status Critical Current

Landscapes

  • Advance Control (AREA)

Abstract

The application provides a branch prediction method and system, a branch predictor, a processor and a storage medium. The method is applied to the branch predictor in a branch prediction system that includes the branch predictor and a precision counter, and comprises the following steps: performing prediction using a first prediction pipeline to obtain first branch prediction information; incrementing the precision counter when a preset flush condition is not satisfied but a preset confidence condition is satisfied; and when the count of the precision counter reaches a preset value, activating a second prediction pipeline so that the first prediction pipeline and the second prediction pipeline predict simultaneously, obtaining first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline. The method and the device use the precision counter to track prediction accuracy and can adaptively adjust the prediction bandwidth.

Description

Branch prediction method and system, branch predictor, processor and storage medium
Technical Field
The present application relates to the field of branch prediction and instruction fetching techniques, and more particularly, to a branch prediction method, a branch predictor, a branch prediction system, a processor, and a computer-readable storage medium.
Background
Out-of-order superscalar processors are high-performance microprocessors that can execute multiple instructions per clock cycle and do not have to update the processor state in program order. Speculative, look-ahead execution in an out-of-order superscalar processor can improve overall CPU performance, but when speculation fails, power is wasted and the predictor is polluted.
Based on this, the present application provides a branch prediction method, a branch predictor, a branch prediction system, a processor, and a computer-readable storage medium to improve the related art.
Disclosure of Invention
It is an object of the present application to provide a branch prediction method, a branch predictor, a branch prediction system, a processor, and a computer-readable storage medium, which use a precision counter to track prediction accuracy and adaptively adjust the prediction bandwidth.
The object of the application is achieved by the following technical solutions:
In a first aspect, the present application provides a branch prediction method applied to a branch predictor in a branch prediction system, the system including the branch predictor and a precision counter, the method comprising:
predicting by using a first prediction pipeline to obtain first branch prediction information;
incrementing the precision counter when a preset flush condition is not satisfied but a preset confidence condition is satisfied;
and when the count of the precision counter reaches a preset value, activating a second prediction pipeline so that the first prediction pipeline and the second prediction pipeline predict simultaneously, to obtain first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline.
In a second aspect, the present application provides a branch predictor configured to perform any one of the branch prediction methods described above.
In a third aspect, the present application provides a branch prediction system comprising a branch predictor, a precision counter, and a fetch target queue;
the branch predictor is configured to perform prediction using a first prediction pipeline to obtain first branch prediction information; increment the precision counter when a preset flush condition is not satisfied but a preset confidence condition is satisfied; activate a second prediction pipeline when the count of the precision counter reaches a preset value, so that the first prediction pipeline and the second prediction pipeline predict simultaneously to obtain first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline; and write the branch target address in the branch prediction information into the fetch target queue; the branch prediction information includes the first branch prediction information and the second branch prediction information;
the fetch target queue is configured to decouple the prediction pipelines from an instruction cache pipeline, the prediction pipelines including the first prediction pipeline and the second prediction pipeline.
In a fourth aspect, the present application provides a computing device comprising any one of the branch prediction systems described above.
In a fifth aspect, the present application provides a processor comprising any one of the branch prediction systems described above.
In a sixth aspect, the present application provides a computer-readable storage medium storing instructions for performing any one of the branch prediction methods described above.
The present application provides a branch prediction method, a branch predictor, a branch prediction system, a processor, and a computer-readable storage medium, which use a precision counter to track prediction accuracy and adaptively adjust the prediction bandwidth. When the flush condition is not satisfied but the prediction confidence satisfies the preset confidence condition, the precision counter is incremented, so that its count gradually increases. Once the precision counter reaches the preset value, the second prediction pipeline is activated, and the two prediction pipelines, namely the first prediction pipeline and the second prediction pipeline, simultaneously produce two pieces of branch prediction information when predicting branch instructions, so that the prediction bandwidth is increased to improve prediction performance.
Drawings
The application is further described below with reference to the drawings and detailed description.
FIG. 1 is a flow chart of a branch prediction method according to an embodiment of the present application.
FIG. 2 is a flow chart of a branch prediction method provided in an embodiment of the present application.
Fig. 3 is a schematic flow chart of collecting prediction accuracy statistics according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of selecting a prediction bandwidth according to an embodiment of the present application.
FIG. 5 is a block diagram of a branch prediction system, an instruction cache module, and a downstream processing module according to an embodiment of the present application.
Fig. 6 is a block diagram of a computing device provided in an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of protection of the present application.
In the description of the embodiments of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the embodiments of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Out-of-order superscalar processors are high-performance microprocessors that can execute multiple instructions per clock cycle and do not have to update the processor state in program order. This technique can increase instruction throughput and program execution speed, but it also requires more hardware resources and greater power consumption. The basic principle of an out-of-order superscalar processor is to resolve data dependences between instructions and pipeline stalls using methods such as branch prediction, register renaming, and dynamic scheduling. Out-of-order superscalar processors typically have one or more pipelines, each containing operations of different stages such as fetch, decode, execute, and memory access. Data coherency and synchronization are maintained between pipelines through certain mechanisms. The greatest advantage of an out-of-order superscalar processor over an ordinary scalar processor is that it fetches and commits multiple instructions per cycle, exploiting instruction-level parallelism (ILP). ILP techniques allow the processor to complete more computation in one clock cycle, improving program running efficiency. ILP techniques may also be used in application-specific fields or scenarios such as embedded systems, graphics processing, and signal processing.
Speculative, look-ahead execution in an out-of-order superscalar processor greatly improves overall CPU performance, but when speculation fails, power is wasted and the predictor is polluted. Specifically, when speculation fails, the CPU performs operations such as fetch, decode, execute, and memory access along the wrong look-ahead path, but these operations are eventually cleared, and the energy consumed contributes nothing to performance; this is the "waste" of power. If branch instructions on the wrong path are executed, entries in the predictor cache are replaced and updated and useful information is evicted from the cache, which reduces the coverage of the branch predictor and lowers subsequent branch prediction accuracy; this is the "pollution" of the predictor.
The present application provides a branch prediction method, a branch predictor, a branch prediction system, a processor, and a computer-readable storage medium to improve the related art. In the present application, prediction accuracy feedback is monitored by a precision counter, and the prediction bandwidth can be adjusted adaptively.
Although the present application takes the branch prediction scenario as an example, the present application may be applied to other prediction scenarios, and the present application is not limited thereto.
Referring to fig. 1, fig. 1 is a schematic flow chart of a branch prediction method according to an embodiment of the present application.
The embodiment of the application provides a branch prediction method which is applied to a branch predictor and comprises steps S101-S103.
Step S101: perform prediction using a first prediction pipeline to obtain first branch prediction information.
Step S102: increment the precision counter when a preset flush condition is not satisfied but a preset confidence condition is satisfied.
Step S103: when the count of the precision counter reaches a preset value, activate a second prediction pipeline so that the first prediction pipeline and the second prediction pipeline predict simultaneously, obtaining first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline.
The branch predictor may include, for example, a prediction control module and a prediction cache module. The prediction cache module stores branch information, and the prediction control module controls read-write access to the prediction cache module, reads branch information from the prediction cache module, infers branch prediction information from the branch information, writes the branch target address in the branch prediction information into a fetch target queue, and receives training data, context switch information, and the prediction accuracy of the corresponding branch target address fed back by a downstream processing module. The prediction control module accesses the prediction cache module to obtain the branch information, and by further analyzing the branch information it can predict the execution direction (i.e., branch direction), type (i.e., branch type), and target address (i.e., branch target address) of a branch instruction as the branch prediction information. The branch predictor may not only produce the branch prediction information but also derive the confidence of the prediction.
The branch prediction information includes first branch prediction information and second branch prediction information, each of which may include, for example, the branch direction, branch type, and branch target address of the corresponding branch. The branch target address in the branch prediction information may be written to the fetch target queue, and the fetch target queue then accesses the instruction cache module so that the instruction cache module outputs cached instructions to the downstream processing module.
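As an illustrative aid only (not part of the original disclosure), the following C++ sketch shows one possible way to represent a piece of branch prediction information as described above; the type and field names (BranchPredInfo, BranchType, and so on) are assumptions made for illustration.
```cpp
#include <cstdint>

// Hypothetical branch type encoding; names are illustrative only.
enum class BranchType : uint8_t { Conditional, Direct, Indirect, Call, Return };

// One piece of branch prediction information, as produced by a prediction
// pipeline: the predicted direction, type, and target address of a branch,
// plus the confidence derived by the predictor.
struct BranchPredInfo {
    uint64_t   branch_pc;       // address of the predicted branch instruction
    bool       taken;           // predicted branch direction (jump / not jump)
    BranchType type;            // predicted branch type
    uint64_t   target;          // predicted branch target address
    bool       high_confidence; // whether the prediction is high confidence
};
```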
A flush refers to clearing the pipeline. Because the hardware mechanisms that typically initiate a flush are located at the back end of the pipeline, by the time a flush is initiated an erroneous instruction stream has already speculatively entered the front-end pipeline and must be purged from the pipeline; this is therefore referred to as a flush (which includes emptying various queues, buffers, and the like).
In step S102, the flush condition and the confidence condition are both preset conditions; whether each condition is satisfied determines, for example, whether to flush the fetch target queue and whether to increment the precision counter. The increment operation adds one to the count of the precision counter.
The precision counter counts the number of predictions whose confidence satisfies the preset confidence condition. After the count of the precision counter reaches the preset value and the second prediction pipeline is activated, the precision counter may hold its count unchanged, i.e., remain at the preset value, and be cleared only when the flush condition is satisfied. Alternatively, the precision counter may continue counting, counting only those predictions made by the first prediction pipeline whose confidence satisfies the confidence condition. Alternatively, the precision counter may continue counting, counting the predictions of both the first prediction pipeline and the second prediction pipeline whose confidence satisfies the confidence condition.
The embodiments of the present application do not limit how the prediction confidence is classified; it may, for example, be classified into a first type of confidence and a second type of confidence. The confidence condition is likewise not limited and may include, for example: the prediction confidence is the first type of confidence. The first and second types of confidence are not limited either; the first type may be, for example, high confidence and the second type non-high confidence, that is, the prediction confidence is classified into high confidence and non-high confidence.
Alternatively, the predicted confidence level may be classified into a high confidence level, a medium confidence level, a low confidence level, and the like. The confidence condition may include, for example: the predicted confidence level is high confidence level.
Alternatively, the prediction confidence may be scored to obtain a confidence score on, for example, a 100-point or 10-point scale. The confidence condition may include, for example: the prediction confidence score is greater than a preset score. The preset score may be, for example, 60, 70, 80, 90, and so on.
The preset value is not limited in this embodiment and may be, for example, 100, 500, 1000, 5000, 8000, 10000, 20000, etc. The count of the precision counter reaching the preset value means that the count has gradually increased from 0 to the preset value. As an example, if the preset value is 8000, the second prediction pipeline is activated when the count of the precision counter reaches 8000. Here, the count reaching the preset value N refers, for example, to the case where a precision counter whose count is N-1 is incremented so that its count reaches N.
In step S103, activating the second prediction pipeline may include, for example, gating on (enabling) the hardware resources on which the second prediction pipeline runs so that it can participate in prediction normally, similar to opening another gate. On the basis of the first prediction pipeline and the second prediction pipeline working simultaneously, more prediction pipelines may be activated as needed in practical applications. For example, if the precision counter continues counting, counting the predictions of both prediction pipelines whose confidence satisfies the confidence condition, and the count reaches a new threshold (which must be greater than the preset value), a third prediction pipeline may be activated; the present application is not limited thereto.
In this embodiment, the prediction accuracy is tracked by the precision counter, and the prediction bandwidth is adjusted adaptively. The prediction bandwidth describes the amount of data that the branch predictor processes per cycle; for example, a prediction bandwidth of 32 B (bytes) means 32 bytes of data are processed per cycle. When the flush condition is not satisfied but the prediction confidence satisfies the preset confidence condition (for example, the prediction confidence is high confidence), the precision counter is incremented, so its count gradually increases. Once the precision counter reaches the preset value, the second prediction pipeline is activated, and the two prediction pipelines, namely the first prediction pipeline and the second prediction pipeline, can obtain two pieces of branch prediction information at the same time when predicting branch instructions, so that the prediction bandwidth is increased to improve prediction performance. That is, whether the second prediction pipeline is activated is decided adaptively according to the actual prediction quality, optimizing system performance. In addition, the flush condition and the confidence condition can be adjusted flexibly; the preset flush condition and confidence condition are allowed to be customized according to performance requirements and prediction quality, so that the system can be configured flexibly for the requirements of different applications and workloads to achieve the best balance between performance and energy efficiency.
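The counter update and activation logic described above can be summarized in a short sketch. The following C++ fragment is a minimal model, not the patented hardware: it assumes a preset value of 8000 (one of the example values) and invented names such as BandwidthController.
```cpp
#include <cstdint>

// Minimal sketch of the adaptive-bandwidth control described above: the
// precision counter is cleared on a flush condition, incremented on a
// high-confidence prediction, and held otherwise; the second prediction
// pipeline is activated once the counter reaches a preset value.
class BandwidthController {
public:
    explicit BandwidthController(uint32_t preset = 8000) : preset_(preset) {}

    // Called once per completed prediction (statistics flow of Fig. 3).
    void onPrediction(bool flushCondition, bool highConfidence) {
        if (flushCondition) {
            count_ = 0;                          // misprediction or context switch: clear
        } else if (highConfidence && count_ < preset_) {
            ++count_;                            // confidence condition met: increment
        }                                        // otherwise: hold the count unchanged
    }

    // Queried before each prediction starts (bandwidth selection of Fig. 4).
    bool secondPipelineActive() const { return count_ >= preset_; }

private:
    uint32_t preset_;
    uint32_t count_ = 0;
};
```
A caller would invoke onPrediction once per completed prediction and consult secondPipelineActive before the next prediction starts.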
The present embodiments do not limit the flush condition. In some embodiments, the branch prediction system may further include a fetch target queue, where the fetch target queue is configured to decouple the prediction pipelines from an instruction cache pipeline, and the prediction pipelines include the first prediction pipeline and the second prediction pipeline. The flush condition may include: the downstream processing module feeds back a misprediction and/or a context switch occurs. The method may further comprise: writing the branch target address in the branch prediction information into the fetch target queue; the branch prediction information includes the first branch prediction information and the second branch prediction information.
In some embodiments, the method may further comprise: clearing the precision counter when the flush condition is satisfied.
Since the branch prediction information includes the first branch prediction information and the second branch prediction information, the branch target address in the branch prediction information includes the branch target address in the first branch prediction information and the branch target address in the second branch prediction information. Because the first prediction pipeline and the second prediction pipeline predict at the same time, the prediction rate is doubled compared with a single working prediction pipeline, and twice as many branch target addresses can be predicted and written into the fetch target queue in the same time. The type of branch target address is not limited in the embodiments of the present application and may be, for example, a virtual address or a physical address. When the branch target address is a virtual address, the fetch target queue accesses one or more levels of instruction cache modules so that the lowest-level instruction cache module outputs the cached instructions at the physical address corresponding to the virtual address to the downstream processing module. When the branch target address is a physical address, the fetch target queue accesses the instruction cache module so that the instruction cache module outputs the cached instructions corresponding to the physical address to the downstream processing module. The embodiments of the present application do not limit the cached instructions output by the instruction cache module, which may be, for example, a cache line. A cache line is a block of instructions and is the unit of cache operations.
In the embodiment of the application, the fetch target queue decouples the prediction pipelines from the instruction cache pipeline; that is, the fetch target queue decouples the first prediction pipeline from the instruction cache pipeline and decouples the second prediction pipeline from the instruction cache pipeline. A prediction pipeline may include, for example, the following operations: the prediction control module accesses the prediction cache module to obtain branch information, performs prediction according to the branch information to obtain branch prediction information, and writes the branch target address into the fetch target queue. The instruction cache pipeline may include, for example, the following operations: the read pointer reads a branch target address from the fetch target queue, the instruction cache module is accessed, and the corresponding cached instructions are output to the downstream processing module.
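To make the decoupling concrete, here is a minimal, hypothetical C++ sketch of a fetch target queue as a bounded queue written by the prediction side and drained by the instruction cache side; the depth of 16 entries and all identifiers are illustrative assumptions, not details from the patent.
```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Illustrative fetch target queue used to decouple the prediction pipelines
// from the instruction cache pipeline: the prediction side pushes branch
// target addresses, the instruction cache side pops them at its own pace.
class FetchTargetQueue {
public:
    bool push(uint64_t branchTarget) {           // called by a prediction pipeline
        if (size_ == slots_.size()) return false; // queue full: predictor stalls
        slots_[wr_] = branchTarget;
        wr_ = (wr_ + 1) % slots_.size();
        ++size_;
        return true;
    }

    std::optional<uint64_t> pop() {              // called by the instruction cache pipeline
        if (size_ == 0) return std::nullopt;     // queue empty: fetch waits
        uint64_t target = slots_[rd_];
        rd_ = (rd_ + 1) % slots_.size();
        --size_;
        return target;
    }

    void flush() { rd_ = wr_ = size_ = 0; }      // on misprediction or context switch

private:
    std::array<uint64_t, 16> slots_{};
    std::size_t rd_ = 0, wr_ = 0, size_ = 0;
};
```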
The flush condition may include, for example, the downstream processing module feeding back a misprediction, and a context switch. The downstream processing module is not limited in this embodiment and may include, for example but not limited to, an instruction decode unit, an instruction dispatch unit, an instruction issue unit, an instruction execution unit, an instruction commit unit, and the like.
When a branch instruction is executed by the branch execution unit in the downstream processing module, the actual result is obtained, including the actual branch direction, branch type, and branch target address. The branch execution unit compares the predicted branch direction, branch type, and branch target address with the actual branch direction, branch type, and branch target address; if any one or more of these do not match, this is a misprediction. The downstream processing module may feed back the misprediction (or prediction error) to the branch predictor to train the branch predictor; if every item matches, a correct prediction may be fed back to the branch predictor. The branch direction in the actual result refers to whether the branch instruction actually jumps when executed. The branch direction in the predicted result is the jump behavior of the branch as speculated by the branch predictor before the branch instruction is executed.
In some embodiments, in the event of a misprediction or context switch, the downstream processing module may notify the fetch target queue to flush the fetch target queue.
In some embodiments, the method may further comprise: performing a training operation for the branch predictor when the downstream processing module feeds back a misprediction.
In some embodiments, performing training operations for the branch predictor may include: the branch execution unit in the downstream processing module compares the actual result of the branch instruction with the branch prediction information and then performs correction and update operations on the branch predictor. For example, if the branch execution unit finds that the direction of a particular branch instruction was mispredicted, the direction prediction of the branch predictor may be corrected.
As an example, the downstream processing module is a collection of functional modules in which the branch execution unit is responsible for initiating training of the branch predictor. By performing training operations when the downstream processing module feeds back a misprediction, for example correcting and updating the branch predictor, mispredictions can be corrected in time. This improves branch prediction accuracy, reduces the probability of misprediction, and reduces the performance loss caused by mispredictions, so the system can recover quickly when a misprediction occurs, improving robustness and stability. Fewer mispredictions also avoid unnecessary branch redirects and fetch target queue flushes, allow the execution path of branch instructions to be predicted more effectively, reduce unnecessary computation and storage overhead, and avoid wasting resources, thereby improving resource utilization and overall performance.
The context information of a program includes, but is not limited to, the process number, exception level, and the like. Branch prediction information cannot be shared across processes or exception levels; for example, information cached while running program A cannot be shared with program B. Switching from one program to another is a context switch. After a context switch, the branch predictor's cache cannot immediately provide high-accuracy predictions and needs some training, so a larger prediction bandwidth is not appropriate. Therefore, first, the stale entries that entered the fetch target queue before the switch are cleared; second, the precision counter is reset and the prediction bandwidth is reduced, which reduces the power overhead and cache pollution caused by prediction failures while the new process is still warming up.
Therefore, the system can respond to misprediction and context switch events in time; by emptying the fetch target queue and clearing the precision counter, unnecessary resource occupation and energy consumption can be effectively reduced, improving the energy efficiency of the system and allowing resources to be managed more effectively. Writing the branch target address in the branch prediction information into the fetch target queue helps improve the efficiency of instruction prefetching. The fetch target queue decouples the prediction pipelines from the instruction cache pipeline, i.e., it decouples the branch predictor from the instruction cache module, so that fetch target addresses can be supplied to the instruction cache pipeline effectively when predictions are correct, reducing the chance that an instruction cache miss blocks the branch predictor. In addition, the flexible configuration of the flush condition improves the stability and reliability of the system, ensuring good performance under various workloads and application scenarios.
In some embodiments, the method may further comprise: when the flush condition is satisfied and the first prediction pipeline and the second prediction pipeline are predicting simultaneously, shutting down the second prediction pipeline so as to predict using the first prediction pipeline; or, shutting down the first prediction pipeline so as to predict using the second prediction pipeline.
That is, one of the first prediction pipeline and the second prediction pipeline is shut down to make predictions using the other of the first prediction pipeline and the second prediction pipeline, which can be divided into two cases.
In the first case, the second prediction pipeline is shut down and the first prediction pipeline is kept for prediction. In this case, the first prediction pipeline is always active, and the evolution of the prediction pipelines in the active state may be expressed as "1 → 1+2 → 1 → ……". For example, when the prediction resources and prediction performance of the first prediction pipeline are better than those of the second prediction pipeline, keeping the best-performing prediction pipeline running ensures that the lower bound of the system's prediction performance always remains at a relatively high level.
In the second case, the first prediction pipeline is shut down and the second prediction pipeline is kept for prediction. When the count of the precision counter reaches the preset value again, the first prediction pipeline can be activated so that the two prediction pipelines predict simultaneously; when the flush condition is satisfied again, the second prediction pipeline is shut down, and so on. The evolution of the prediction pipelines in the active state may be expressed as "1 → 1+2 → 2 → 1+2 → 1 → ……". In this case, the first prediction pipeline and the second prediction pipeline alternate between the active and inactive states, using the prediction resources of the branch predictor more evenly.
In this embodiment, the prediction resources refer to hardware and software resources used by the prediction pipeline to execute the prediction task, and the prediction performance refers to efficiency and accuracy of the prediction pipeline in predicting the branching behavior. Specifically, the prediction resources relate to, for example, hardware resources (such as cache size, type, access speed, etc.) used by each prediction pipeline, and a software-level prediction algorithm. The prediction resources and corresponding prediction performance used by the two prediction pipelines may be the same or different. For example, the first prediction pipeline may use a larger or faster branch history table (Branch History Table, BHT) or pattern history table (Pattern History Table, PHT) to provide more abundant history information to make more accurate predictions.
The evolution pattern "1 → 1+2 → 1 → ……" indicates that in the first case the first prediction pipeline remains active all the time. When certain conditions are met, the second prediction pipeline is also activated (1+2), and when these conditions are no longer met, the system returns to the state using only the first prediction pipeline (1). This approach can dynamically optimize the prediction accuracy according to the runtime situation. In the case where the first prediction pipeline has better prediction resources and performance than the second prediction pipeline, keeping the better-performing first pipeline ensures that the prediction performance of the system is not below a certain level, even when both pipelines are available. By always keeping the better-performing first prediction pipeline running, the prediction performance of the system is always kept at a relatively high level, even without activating the second prediction pipeline.
The evolution pattern "1 → 1+2 → 2 → 1+2 → 1 → ……" indicates that in the second case the two prediction pipelines are alternately active and inactive. The first prediction pipeline and the second prediction pipeline may be switched dynamically according to the actual prediction accuracy and other conditions. The precision counter is used to determine when to activate the prediction pipeline that is not currently active, and the flush condition is used to determine when to shut one of the prediction pipelines down. In this way, the resources of the branch predictor are used more evenly. Different prediction pipelines may perform better under different conditions, and dynamic switching can achieve good prediction in different execution phases, helping the processor adapt to continuously changing program behavior and improving overall prediction accuracy.
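The second case can be illustrated with a small state machine. The following C++ sketch is one possible reading of the alternation described above (all names are assumptions); in the first case, onFlush would instead always shut down the second pipeline.
```cpp
// Illustrative selector for which prediction pipelines are active. On
// reaching the preset count, the inactive pipeline is activated; on a flush
// while both are active, the pipeline that has been active longer is shut
// down, giving the pattern 1 -> 1+2 -> 2 -> 1+2 -> 1 -> ...
class PipelineSelector {
public:
    bool pipe1Active() const { return pipe1_; }
    bool pipe2Active() const { return pipe2_; }

    void onCounterReachedPreset() {   // activate the currently inactive pipeline
        pipe1_ = pipe2_ = true;
    }

    void onFlush() {                  // shut down the pipeline that was activated earlier
        if (pipe1_ && pipe2_) {
            if (olderIsPipe1_) pipe1_ = false; else pipe2_ = false;
            olderIsPipe1_ = !olderIsPipe1_;   // the surviving pipeline is now the older one
        }
    }

private:
    bool pipe1_ = true;        // only the first prediction pipeline is active initially
    bool pipe2_ = false;
    bool olderIsPipe1_ = true; // pipeline 1 was activated first
};
```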
In some embodiments, the method may further comprise: keeping the count of the precision counter unchanged when the flush condition is not satisfied and the confidence condition is also not satisfied.
In some embodiments, the method may further comprise: continuing to predict using the first prediction pipeline when the count of the precision counter is below the preset value.
In this embodiment, each time a prediction occurs and its confidence satisfies the preset confidence condition, the count of the precision counter is incremented by one, that is, the increment operation of the precision counter is executed; the precision counter keeps its count unchanged when a prediction occurs but its confidence does not satisfy the confidence condition; and if a misprediction or a program context switch occurs, the precision counter is cleared and statistics restart. When the precision counter has been incremented continuously up to the threshold, the recent predictions have been both high-confidence and correct (any misprediction would have triggered the clearing operation), so the prediction accuracy is considered high; otherwise it is considered low. On the basis of the original prediction pipeline, high prediction accuracy gates the other prediction pipeline on to increase the prediction bandwidth, while low accuracy blocks it to reduce the prediction bandwidth. Thus the prediction bandwidth is doubled when accuracy is high to improve performance, and halved when accuracy is low to reduce prediction power consumption and cache pollution.
In some embodiments, the process of obtaining the prediction confidence corresponding to the first branch prediction information includes: recording, with an N-bit saturation counter, the branch directions of the last N occurrences of the branch corresponding to the first branch prediction information, so as to determine the confidence of the current prediction, where N is an integer greater than 1. Alternatively, the process of obtaining the prediction confidence corresponding to the first branch prediction information and the second branch prediction information includes: recording, with an M-bit saturation counter, the branch directions of the last M occurrences of the branches corresponding to the first branch prediction information and the second branch prediction information, so as to determine the confidence of the current prediction, where M is an integer greater than 1.
That is, there are two cases in which a saturation counter is used to determine the prediction confidence. In the first case, an N-bit saturation counter separately records the branch directions of a branch corresponding to the first prediction pipeline, to determine the confidence of that pipeline's current prediction of the branch. In the second case, an M-bit saturation counter records the branch directions of a branch shared by the two prediction pipelines (i.e., the first prediction pipeline and the second prediction pipeline), to determine the confidence of the current prediction of the branch by either pipeline. In this case, the branch directions recorded in the M-bit saturation counter can be used to determine the confidence of the current prediction of the branch regardless of which prediction pipeline produced the recorded directions or which prediction pipeline produced the current prediction.
The mode and trend of the branch behavior can be captured through the saturation counter, and the prediction mode based on the historical behavior can improve the accuracy of branch prediction, especially for branches showing a certain regularity. The saturation counter enables a mechanism to dynamically evaluate the confidence of the prediction, e.g., predicting correctly multiple times in succession may increase confidence, while predicting incorrectly may decrease confidence, such dynamic adjustment enabling the prediction system to self-correct based on real-time data. A saturation counter is a relatively simple implementation, easy to implement in hardware, and more efficient in terms of resources and energy consumption than more complex predictive algorithms. In the second case, the same saturation counter is used to evaluate the confidence of the predictions for two prediction pipelines for a branch, which provides a uniform way to evaluate the accuracy of the overall prediction process, and reduces the complexity of the design, which is beneficial to improving the performance of the overall branch prediction system. Sharing saturation counter data between different prediction pipelines may help the system adapt better to different prediction modes. For example, if the prediction modes of the two pipelines are similar, sharing data may improve prediction accuracy; if the patterns are different, this approach still provides valuable historical information for assessing the current prediction confidence.
The value of M, N in the embodiment of the present application is not limited, and may be, for example, 2, 3, 4, etc.
The branch prediction information is the prediction result that the prediction control module further speculates and outputs after accessing the prediction cache module and obtaining the branch information; it includes the pattern history record of the branch, abbreviated as PHR (Pattern History Record), which may be recorded, for example, with a 2-bit or 3-bit saturation counter.
In some embodiments, a 2-bit saturation counter may be designed to record the past two directions of a branch for predicting the current branch direction, considering that the current direction of a branch is related to its previous two directions. Taking a 2-bit saturation counter as an example, when PHR = 2'b11 or 2'b00, the prediction confidence is high; when PHR = 2'b10 or 2'b01, the prediction confidence is not high. During PHR use and training, the PHR obtained by accessing the prediction cache can be any one of the following four values.
(1) 2'b00: indicates that the predicted direction is not to jump, with high confidence.
If the prediction is wrong, the PHR is updated from 2'b00 to 2'b01.
If the prediction is correct, the PHR remains unchanged.
(2) 2'b01: indicates that the predicted direction is not to jump, without high confidence.
If the prediction is wrong, the PHR is updated from 2'b01 to 2'b10.
If the prediction is correct, the PHR is updated from 2'b01 to 2'b00.
(3) 2'b10: indicates that the predicted direction is to jump, without high confidence.
If the prediction is wrong, the PHR is updated from 2'b10 to 2'b01.
If the prediction is correct, the PHR is updated from 2'b10 to 2'b11.
(4) 2'b11: indicates that the predicted direction is to jump, with high confidence.
If the prediction is wrong, the PHR is updated from 2'b11 to 2'b10.
If the prediction is correct, the PHR remains unchanged.
The predicted direction refers to the predicted branch direction, that is, the branch direction in the branch prediction information. A misprediction, i.e., a prediction error, corresponds to the case where at least one of the branch direction, branch type, and branch target address in the prediction result (i.e., the branch prediction information) is inconsistent with the corresponding item in the actual result. A correct prediction corresponds to the case where the branch direction, branch type, and branch target address in the prediction result are each consistent with the branch direction, branch type, and branch target address in the actual result.
In the present embodiment, high confidence and high prediction accuracy are not the same thing. High prediction accuracy presupposes that the predictions are also correct, because the counter is cleared as soon as a misprediction occurs; correctness has the highest priority.
By using an N-bit saturation counter to record the directions of the last N occurrences of a branch, the current branch direction can be predicted more accurately. The saturation counter is updated according to whether the prediction result was correct, so that the branch prediction system can learn from previous errors and adjust itself. For example, if a prediction is incorrect, the value of the saturation counter is adjusted accordingly to reflect the most recent change in branch behavior. Compared with complex prediction algorithms, the saturation-counter approach is simpler and more direct to implement, more efficient in hardware, and occupies fewer resources. Since the number of bits N of the N-bit saturation counter can be selected as desired (e.g., 2, 3, 4, etc.), the accuracy of prediction can be balanced against the required resources.
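For illustration, the four PHR cases listed above can be captured in a small C++ sketch; the struct name and the choice of initial state are assumptions, while the transitions follow rules (1) to (4) given earlier.
```cpp
#include <cstdint>

// Minimal sketch of the 2-bit PHR saturation counter described above.
// States: 2'b00 = not taken, high confidence; 2'b01 = not taken, not high
// confidence; 2'b10 = taken, not high confidence; 2'b11 = taken, high
// confidence. Prediction correctness is determined externally by comparing
// the prediction with the actual branch outcome.
struct Phr2Bit {
    uint8_t state = 0b01;   // initial state is an assumption (weakly not taken)

    bool predictTaken()   const { return state >= 0b10; }               // 2'b10 or 2'b11
    bool highConfidence() const { return state == 0b00 || state == 0b11; }

    void update(bool predictionCorrect) {
        if (predictionCorrect) {
            // Correct prediction: move toward, or stay in, the saturated state.
            if (state == 0b01) state = 0b00;
            else if (state == 0b10) state = 0b11;
            // 2'b00 and 2'b11 remain unchanged.
        } else {
            // Misprediction: move one step toward the opposite direction.
            if (state == 0b00) state = 0b01;
            else if (state == 0b01) state = 0b10;
            else if (state == 0b10) state = 0b01;
            else /* 0b11 */ state = 0b10;
        }
    }
};
```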
In some embodiments, the method may further comprise: when the count of the precision counter reaches the preset value, increasing the number of instruction channels and/or the capacity of cache structures of the computing device so as to promote instruction prefetching.
In some embodiments, the method may further comprise: when the flush condition is satisfied, reducing the number of instruction channels and/or the capacity of cache structures of the computing device so as to suppress instruction prefetching.
The computing device may be, for example, a processor, which may include, for example, an out-of-order superscalar processor.
In the branch prediction method, when the precision counter reaches the preset value, the number of instruction channels and/or the capacity of cache structures of the computing device is increased. This dynamic adjustment lets the system add prediction resources flexibly according to the current prediction accuracy and performance requirements so as to promote instruction prefetching, which helps handle more complex instruction streams and makes more effective use of processor resources, especially when prediction accuracy is high. When the flush condition is satisfied, for example when the downstream processing module feeds back a misprediction or a context switch occurs, the number of instruction channels and/or the capacity of cache structures of the computing device is reduced, which lowers the unnecessary resource consumption caused by inaccurate prediction and the performance loss caused by mispredictions. By reducing prediction resources, the system can suppress unnecessary instruction prefetching and thereby improve overall efficiency. Dynamically adjusting the number of instruction channels and the cache capacity optimizes resource usage according to real-time prediction performance and system state: when prediction performance is good, prediction resources are increased to speed up instruction prefetching; when prediction performance is poor, prediction resources are reduced to avoid wasting power. This helps maximize resource utilization while maintaining efficient prediction. Adjusting resources dynamically as prediction accuracy changes also improves the system's adaptability to different workloads, allowing the computing device to handle different types of programs and tasks more flexibly and enhancing the scalability and effectiveness of the system in changing application environments.
In some embodiments, the instruction channels may include decode bandwidth and/or issue bandwidth.
In some embodiments, the cache structure may include one or more of the instruction fetch target queue, instruction queue, issue queue, and out-of-order resources.
In the branch prediction method, the instruction channels include decode bandwidth and issue bandwidth; adaptively adjusting the decode bandwidth helps handle dense instruction sequences, and adjusting the issue bandwidth helps optimize the parallel execution of instructions. That is, in addition to the prediction bandwidth, which is the bandwidth of the prediction channel, the method can adaptively adjust the bandwidths of the other two instruction channels, namely the decode channel and the issue channel. The flexibility of such adaptive tuning enables the computing device to handle different workloads more effectively and improves overall performance. The various cache structures, including the fetch target queue, instruction queue, issue queue, out-of-order resources, and the like, offer further opportunities to optimize instruction prefetching and the execution flow, so that the computing device can target specific performance bottlenecks, whether in the fetch, decode, issue, or execute stage. Dynamically adjusting the decode and issue bandwidths and reasonably configuring the different cache structures can effectively improve the performance of the computing device in different scenarios. For example, the fetch target queue and instruction queue may be enlarged preferentially when branch-intensive code is executed, while the issue queue and out-of-order resources may be enlarged when code with abundant instruction-level parallelism is executed. When a prediction error occurs, reducing the decode and issue bandwidths and shrinking the cache structures helps reduce the resource waste and performance degradation caused by mispredictions. By dynamically adjusting the instruction channels and cache structures, resources can be used more efficiently in different execution stages; such flexible resource management helps improve the overall efficiency of the computing device, especially under varying workloads and execution conditions.
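As a purely illustrative sketch (the patent does not give concrete widths or depths), the following C++ fragment shows how a design might switch between a wider and a narrower front-end configuration depending on whether the prediction accuracy is currently considered high; all numbers and names are assumptions.
```cpp
#include <cstdint>

// Hypothetical front-end configuration covering the instruction channels
// (decode and issue bandwidth) and cache structures discussed above.
struct FrontEndConfig {
    uint32_t decodeWidth;      // instructions decoded per cycle
    uint32_t issueWidth;       // instructions issued per cycle
    uint32_t ftqDepth;         // fetch target queue entries in use
    uint32_t issueQueueDepth;  // issue queue entries in use
};

// High prediction accuracy: widen instruction channels and enlarge cache
// structures to promote prefetching; otherwise shrink them to limit the
// cost of wrong-path work. The concrete values are illustrative only.
inline FrontEndConfig selectConfig(bool highAccuracy) {
    return highAccuracy ? FrontEndConfig{8, 8, 64, 96}
                        : FrontEndConfig{4, 4, 32, 48};
}
```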
Referring to fig. 2 to fig. 4, fig. 2 is a schematic flow chart of a branch prediction method provided in an embodiment of the present application, fig. 3 is a schematic flow chart of collecting prediction accuracy statistics provided in an embodiment of the present application, and fig. 4 is a schematic flow chart of selecting a prediction bandwidth provided in an embodiment of the present application.
In a specific application scenario, the embodiment of the present application further provides a branch prediction method, which is applied to a branch predictor in a branch prediction system, where the system includes the branch predictor, a precision counter, and a fetch target queue; as shown in fig. 2, the method includes:
performing prediction using a first prediction pipeline to obtain first branch prediction information, and writing the branch target address in the first branch prediction information into the fetch target queue; the fetch target queue is used to decouple the prediction pipelines from an instruction cache pipeline, and the prediction pipelines include the first prediction pipeline and a second prediction pipeline;
clearing the precision counter when the preset flush condition is satisfied (i.e., the precision counter is cleared); the flush condition includes: the downstream processing module feeds back a misprediction and/or a context switch occurs;
keeping the count of the precision counter unchanged when the flush condition is not satisfied and a preset confidence condition is also not satisfied (i.e., the precision counter holds); the confidence condition includes: the prediction confidence is high confidence;
incrementing the precision counter (i.e., the precision counter increments) when the flush condition is not satisfied but the confidence condition is satisfied;
continuing to predict with the first prediction pipeline only (i.e., 1 prediction pipeline works) when the count of the precision counter is below a preset value;
activating the second prediction pipeline when the count of the precision counter reaches the preset value, so that the first prediction pipeline and the second prediction pipeline predict simultaneously (i.e., 2 prediction pipelines work) to obtain first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline, and writing the branch target address in the first branch prediction information and the branch target address in the second branch prediction information into the fetch target queue; meanwhile, increasing the number of instruction channels and/or the capacity of cache structures of the computing device so as to promote instruction prefetching;
shutting down the second prediction pipeline so as to predict with the first prediction pipeline when the flush condition is satisfied and the first prediction pipeline and the second prediction pipeline are predicting simultaneously; and reducing the number of instruction channels and/or the capacity of cache structures of the computing device to suppress instruction prefetching when the flush condition is satisfied.
Wherein the instruction channels include decode bandwidth and/or issue bandwidth; the cache structures include one or more of the fetch target queue, an instruction queue, an issue queue, and out-of-order resources.
The process of obtaining the prediction confidence corresponding to the first branch prediction information includes: recording, with an N-bit saturation counter, the branch directions of the last N occurrences of the branch corresponding to the first branch prediction information, so as to determine the confidence of the current prediction, where N is an integer greater than 1. Alternatively, the process of obtaining the prediction confidence corresponding to the first branch prediction information and the second branch prediction information includes: recording, with an M-bit saturation counter, the branch directions of the last M occurrences of the branches corresponding to the first branch prediction information and the second branch prediction information, so as to determine the confidence of the current prediction, where M is an integer greater than 1.
In this embodiment, as shown in fig. 3, the precision counter is updated each time a prediction completes, and the specific update operation may be as follows.
Firstly, checking whether a misprediction or context switching brush line (hereinafter referred to as a condition 1) occurs, if the condition 1 is satisfied, resetting the precision counter, and ending the flow; if condition 1 is not satisfied, the second step is entered.
Secondly, checking the confidence level of the predicted result (hereinafter referred to as condition 2), if the condition 2 is high confidence level, automatically increasing the precision counter, and ending the flow; if the condition 2 is not high confidence, the precision counter is maintained and the process ends.
Referring to fig. 2 and 4, before each prediction starts, a decision is made as to whether to "activate" the second prediction pipeline based on whether the precision counter reaches a threshold. The first prediction pipeline is unaffected and may be considered a constant active state.
Firstly, checking whether the precision counter reaches a preset value (hereinafter referred to as a condition 3 for short), if the condition 3 is satisfied, activating a second prediction pipeline, and finishing the flow by 2 prediction pipelines; if the condition 3 is not satisfied, the second prediction pipeline is not activated, and the 1 prediction pipeline is kept working, and the flow is ended.
It can be seen that in this embodiment, each time a prediction occurs and the confidence of the prediction is high, the count of the precision counter is +1; the precision counter keeps if a prediction occurs but the confidence is not high or the count of the precision counter has reached a maximum value (i.e., a preset value, which can be a threshold value); if one misprediction or program context switch occurs, the precision counter is cleared, and statistics is restarted. When the precision counter is continuously increased to a threshold value, the prediction confidence is high and the prediction is correct in the past period of time, and if the prediction precision is high, the prediction precision is low. The elapsed time may be a time period in which the precision counter is continuously increased up to the threshold value, and the duration of the time period is not limited in the present application. As an example, the value of the period of time is measured based on the number of predicted occurrences, and is related to the capacity of the prediction cache module. For example, the prediction cache module can accommodate 8000 branch records, and can set a period of time as the duration of 8000 predictions, or select a duration parameter according to performance exploration data.
On the basis of the original prediction pipeline (i.e., the first prediction pipeline), high prediction accuracy gates the other prediction pipeline (i.e., the second prediction pipeline) on, increasing the prediction bandwidth; otherwise the second prediction pipeline is gated off, reducing the prediction bandwidth. As a result, the prediction bandwidth is doubled when the prediction accuracy is high, improving prediction performance, and halved when the prediction accuracy is low, reducing prediction power consumption and cache pollution.
Embodiments of the present application also provide a branch predictor configured to perform any one of the branch prediction methods described above.
In some embodiments, the branch predictor may include a prediction control module and a prediction cache module. The prediction cache module is configured to store branch information. The prediction control module is configured to control read-write access to the prediction cache module, read the branch information from the prediction cache module, derive branch prediction information from the branch information, write the branch target address in the branch prediction information to an instruction fetch target queue, and receive training data, context switch information, and the prediction accuracy corresponding to the branch target address fed back by a downstream processing module.
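As a rough illustration only (the record fields, class names, and the map-based storage are assumptions rather than the actual hardware organization), the branch information kept in the prediction cache module could be modelled as:

    // Illustrative layout of the branch records held by the prediction cache module.
    #include <cstdint>
    #include <unordered_map>

    struct BranchInfo {
        uint64_t targetAddress = 0;       // branch target address (e.g., a virtual address)
        uint8_t  type          = 0;       // branch type
        bool     taken         = false;   // branch direction
    };

    class PredictionCacheModule {
    public:
        // Read access issued by the prediction control module.
        bool lookup(uint64_t pc, BranchInfo& out) const {
            auto it = entries_.find(pc);
            if (it == entries_.end()) return false;
            out = it->second;
            return true;
        }
        // Write access driven by training feedback from the downstream processing module.
        void train(uint64_t pc, const BranchInfo& info) { entries_[pc] = info; }
    private:
        std::unordered_map<uint64_t, BranchInfo> entries_;
    };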
The embodiment of the application also provides a branch prediction system, which comprises a branch predictor, a precision counter, and an instruction fetch target queue.
The branch predictor is configured to: perform prediction using a first prediction pipeline to obtain first branch prediction information; increment the precision counter when a preset flush condition is not met but a preset confidence condition is met; activate a second prediction pipeline when the count of the precision counter reaches a preset value, so as to predict simultaneously using the first prediction pipeline and the second prediction pipeline and obtain first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline; and write the branch target address in the branch prediction information into the instruction fetch target queue; the branch prediction information includes the first branch prediction information and the second branch prediction information.
The instruction fetch target queue is to decouple a prediction pipeline and an instruction cache pipeline, the prediction pipeline including the first prediction pipeline and the second prediction pipeline.
In some embodiments, the precision counter is a stand-alone module. In other embodiments, the precision counter is part of the branch predictor.
Referring to fig. 5, fig. 5 is a block diagram of a branch prediction system, an instruction cache module, and a downstream processing module according to an embodiment of the present application.
In a specific application scenario, an embodiment of the present application further provides a branch prediction system, as shown in fig. 5, where the branch prediction system includes a branch predictor, a precision counter (not shown in the figure), and an instruction fetch target queue.
The branch predictor includes a prediction control module 100 and a prediction cache module 101.
The prediction cache module 101 is the memory bank of the predictor and is a high-power-consumption module. The prediction cache module 101 is configured to store branch information resulting from branches executed by the branch execution unit in the downstream processing module, including the branch direction, the branch type, and the branch target address (e.g., a virtual address).
The prediction control module 100 controls read-write access to the prediction cache module 101, collects the prediction results, writes them to the instruction fetch target queue 200, and receives feedback from the downstream processing module 300 to compile prediction accuracy statistics. Specifically, the prediction control module 100 is configured to: perform prediction using the first prediction pipeline to obtain first branch prediction information, and write the branch target address in the first branch prediction information to the instruction fetch target queue 200; clear the precision counter when the flush condition is satisfied; keep the count of the precision counter unchanged when the flush condition is not satisfied and the preset confidence condition is not satisfied; increment the precision counter when the flush condition is not satisfied but the confidence condition is satisfied; continue to predict using the first prediction pipeline when the count of the precision counter is below the preset value; activate the second prediction pipeline when the count of the precision counter reaches the preset value, predict simultaneously using the first prediction pipeline and the second prediction pipeline to obtain the first branch prediction information corresponding to the first prediction pipeline and the second branch prediction information corresponding to the second prediction pipeline, and write the branch target address in the first branch prediction information and the branch target address in the second branch prediction information to the instruction fetch target queue 200; at the same time, increase the number of instruction channels and/or the capacity of the cache structure of the computing device to promote instruction prefetching; close the second prediction pipeline and predict using the first prediction pipeline when the downstream processing module 300 feeds back a misprediction while the first prediction pipeline and the second prediction pipeline are predicting simultaneously; and reduce the number of instruction channels and/or the capacity of the cache structure of the computing device to suppress instruction prefetching when the downstream processing module 300 feeds back a misprediction or a context switch. Reference numeral 102 denotes the prediction control module 100 accessing the prediction cache module 101 through the first prediction pipeline; 103 denotes the prediction control module 100 accessing the prediction cache module 101 through the second prediction pipeline; 104 denotes the prediction cache module 101 returning the result of the first-prediction-pipeline access to the prediction control module 100; 105 denotes the prediction cache module 101 returning the result of the second-prediction-pipeline access to the prediction control module 100; 106 denotes the prediction control module 100 writing the branch target address in the first branch prediction information to the instruction fetch target queue 200 through the first prediction pipeline; and 107 denotes the prediction control module 100 writing the branch target address in the second branch prediction information to the instruction fetch target queue 200 through the second prediction pipeline.
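Purely as a behavioural sketch of this per-cycle control flow (the types, the helper signature, and the ordering of the counter update relative to the lookup are assumptions, not the actual design), the decision sequence could be written as:

    // Illustrative per-cycle behaviour of a prediction control module that gates a
    // second prediction pipeline on the measured prediction accuracy.
    #include <cstdint>
    #include <optional>

    struct Prediction { uint64_t target = 0; bool highConfidence = false; };

    struct CycleResult {
        Prediction first;                    // always produced by the first pipeline
        std::optional<Prediction> second;    // produced only when the second pipeline is active
    };

    struct ControlState {
        unsigned count  = 0;                 // precision counter
        unsigned preset = 64;                // activation threshold (assumed value)
    };

    CycleResult predictOneCycle(ControlState& s,
                                bool flushFeedback,                  // condition 1
                                bool lastPredictionHighConfidence,   // condition 2
                                Prediction (*lookup)(int pipeline)) {
        // Update the precision counter from the previous prediction's outcome.
        if (flushFeedback)                                            s.count = 0;
        else if (lastPredictionHighConfidence && s.count < s.preset) ++s.count;

        // The first prediction pipeline is always active.
        CycleResult out{lookup(1), std::nullopt};

        // Condition 3: run the second pipeline only when accuracy has stayed high,
        // which doubles the number of prediction blocks handled per cycle.
        if (s.count >= s.preset)
            out.second = lookup(2);
        return out;
    }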
The instruction fetch target queue 200, i.e., the fetch target buffer, is used to decouple the prediction pipeline and the instruction cache pipeline. The instruction fetch target queue 200 holds the speculatively derived branch target addresses provided by the prediction control module 100 for accessing the subsequent instruction cache module 201.
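As a simplified sketch of this decoupling role (the FIFO discipline, the depth, and the flush interface are assumptions), the instruction fetch target queue can be viewed as a small queue of speculative branch target addresses sitting between the prediction pipelines and the instruction cache pipeline:

    // Minimal FIFO of speculative branch target addresses; the depth is assumed.
    #include <cstdint>
    #include <deque>

    class FetchTargetQueue {
    public:
        bool push(uint64_t branchTarget) {            // written by the prediction pipelines
            if (q_.size() >= kDepth) return false;    // full: back-pressure the predictor
            q_.push_back(branchTarget);
            return true;
        }
        bool pop(uint64_t& branchTarget) {            // drained by the instruction cache pipeline
            if (q_.empty()) return false;
            branchTarget = q_.front();
            q_.pop_front();
            return true;
        }
        void flush() { q_.clear(); }                  // on misprediction or context switch
    private:
        static constexpr unsigned kDepth = 16;
        std::deque<uint64_t> q_;
    };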
The instruction cache module 201 is a memory bank for storing cache lines (i.e., instructions) and is a high-power-consumption module. Reference numeral 202 denotes the instruction fetch target queue 200 initiating an access to the instruction cache module 201, and 203 denotes the instruction cache module 201 delivering the cache line, after an instruction hit, to the downstream processing module 300.
The downstream processing module 300 covers functions such as decode, dispatch, issue, execute, and commit, and provides prediction accuracy, post-misprediction training, and context-switch information to the prediction control module 100. The downstream processing module 300 may also feed back mispredictions and/or context switches to the instruction fetch target queue 200. The prediction accuracy fed back by the downstream processing module 300 may, for example, be classified into correct prediction and misprediction. The representation of the prediction accuracy is not limited and may be one or more of Chinese characters, letters, numbers, symbols, and graphics. For example, "correct", "Y", "1", and "√" may indicate a correct prediction, and "error", "N", "0", and "×" may indicate a misprediction.
Reference numeral 301 denotes the downstream processing module 300 detecting a misprediction, feeding back to the prediction control module 100 to train the branch predictor (including allocation, update, correction, etc.), notifying the prediction control module 100 of the prediction accuracy statistics so that the prediction control module 100 updates the PHR, and notifying the instruction fetch target queue 200 to flush.
Reference numeral 302 denotes the downstream processing module 300 detecting a context switch, notifying the prediction control module 100 for the prediction accuracy statistics, and notifying the instruction fetch target queue 200 to flush.
In practice, after the hardware is powered on and reset, the downstream processing module 300 notifies the prediction control module 100 to start fetching instructions from the first branch target address.
The prediction control module 100 initiates an access to the prediction cache module 101; at this point only the 102 access is enabled, and the 103 access is not enabled because the precision counter has not yet reached the threshold (i.e., the preset value).
The prediction cache module 101 feeds back the prediction result of the 102 access to the prediction control module 100 via 104; nothing is fed back via 105 because 103 is inactive.
The prediction control module 100 receives the feedback result via 104 and updates the state of the precision counter, i.e., it checks the confidence corresponding to the prediction result: if the confidence is high, the precision counter is incremented; if not, the precision counter is held. At the same time, the branch target address in the prediction result is written to the instruction fetch target queue 200 via 106 (107 is not activated because 103 is not activated), and the predicted branch target address drives the prediction control module 100 itself to initiate the access of the next cycle.
The instruction fetch target queue 200 sequentially sends the branch target addresses it holds to the instruction cache module 201 for instruction fetching, and the instructions parsed from the cache line are delivered to the downstream processing module 300 via 203. After the downstream processing module 300 executes a branch instruction, it notifies the prediction control module 100 via 301 to perform training and update the precision counter. When the downstream processing module 300 needs to perform a context switch, it notifies the prediction control module 100 via 302 to update the precision counter.
After training for a period of time, the prediction cache module 101 accumulates a number of branch records; once it can provide higher-accuracy predictions for the branch target addresses, the precision counter in the prediction control module 100 starts approaching the threshold.
Once the threshold is reached, the prediction control module 100 activates the second prediction pipeline via 103, and the 105 and 107 paths are activated synchronously. At this point two prediction blocks are processed per cycle, i.e., the prediction control module 100 reads two pieces of branch information from the prediction cache module 101 per cycle and writes two branch target addresses to the instruction fetch target queue 200 per cycle. The prediction speed is thus doubled compared with before.
After both prediction pipelines have been operating for a period of time, suppose a misprediction occurs: the downstream processing module 300 will notify the prediction control module 100 to adjust the precision counter and shut down the second prediction pipeline. Because the prediction accuracy has decreased at this point, if no adjustment were made, more wrong-path instructions would be fetched quickly and sent to the downstream processing module 300 for execution, and the execution of wrong-path branches could train the prediction cache module 101 prematurely, polluting it and degrading accuracy; at the same time, an instruction cache miss would cause cache lines on the wrong path to be refilled, polluting the instruction cache module 201 and increasing the cache miss rate; furthermore, executing wrong-path instructions would also waste power in the downstream processing module 300.
In this embodiment, prediction accuracy statistics are implemented by the precision counter, and prediction bandwidth adaptation can be performed on the basis of these statistics. In addition, the method is also applicable to scenarios such as instruction fetch target queue adaptation, fetch bandwidth adaptation, instruction queue adaptation, decode bandwidth adaptation, issue queue adaptation, and out-of-order resource adaptation, in which the relevant parameters of the corresponding resources are adaptively adjusted based on the prediction accuracy statistics.
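To illustrate this broader adaptation only (the resource set, the scaling policy, and the field names are assumptions and not taken from this application), the same accuracy signal could drive several front-end parameters at once:

    // Sketch: scale front-end resources up or down according to the accuracy signal.
    struct FrontEndKnobs {
        unsigned fetchBandwidth  = 1;   // instructions fetched per cycle
        unsigned decodeBandwidth = 1;   // instructions decoded per cycle
        unsigned fetchQueueDepth = 8;   // instruction fetch target queue entries in use
    };

    void adaptResources(FrontEndKnobs& k, bool accuracyHigh) {
        if (accuracyHigh) {             // precision counter reached the preset value
            k.fetchBandwidth  = 2;
            k.decodeBandwidth = 2;
            k.fetchQueueDepth = 16;     // promote prefetching along the predicted path
        } else {                        // flush condition observed: throttle the front end
            k.fetchBandwidth  = 1;
            k.decodeBandwidth = 1;
            k.fetchQueueDepth = 8;      // suppress prefetching to limit pollution
        }
    }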
The embodiment of the application also provides a processor, which comprises any one of the branch prediction systems.
Referring to fig. 6, fig. 6 is a block diagram of a computing device according to an embodiment of the present application.
Embodiments of the present application also provide a computing device including any of the branch prediction systems described above.
The computing device is not limited in this embodiment, and may be, for example, a computer server, a cloud server, a distributed server, or the like. The processor may include, for example, an out-of-order superscalar processor, or the like.
In alternative embodiments, the computing device may include the branch prediction system described above. As shown in fig. 6, a computing device may include: memory 110, processor 120, communication interface 130. Wherein the memory 110, the processor 120 and the communication interface 130 are connected by an internal connection path.
Memory 110 is used to store instructions and code, which in some implementations may be code for implementing methods of embodiments of the present application.
The processor 120 is configured to execute instructions and codes stored in the memory 110, to control the communication interface 130 to receive input data and information, and output data such as operation results. In some implementations, when the aspects of the embodiments of the present application are implemented in software or firmware, code for implementing the aspects of the embodiments of the present application may be stored in the processor 120 and executed by the processor 120.
The memory 110 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable Programmable ROM (EPROM), an Electrically Erasable Programmable ROM (EEPROM), or a flash memory, among others. The volatile memory may be Random Access Memory (RAM). It should be noted that the memory 110 described herein is intended to comprise, without being limited to, any of these and other suitable types of memory. By way of example, memory 110 includes Random Access Memory (RAM), cache memory, and Read Only Memory (ROM). Wherein the memory 110 stores a computer program executable by the processor 120 such that the processor 120 implements the steps of any of the methods described above.
The processor 120 may be a central processing unit (Central Processing Unit, CPU), the processor 120 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate arrays (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor 120 may be any conventional processor or the like.
In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or instructions in software in the processor 120. The methods disclosed in connection with the embodiments of the present application may be embodied directly in hardware processor execution or in a combination of hardware and software modules in the processor 120. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 110, and the processor 120 reads the information in the memory 110 and, in combination with its hardware, performs the steps of the method described above. To avoid repetition, a detailed description is not provided herein.
In some implementations, the computing device may include software modules, such as an operating system, a basic input output system (Basic Input Output System, BIOS), application software (Application Software), and the like, in addition to the hardware elements described above.
The operating system, which is used to manage the hardware and/or software resources of the computing device, is the kernel and keystone of the computing device. The operating system needs to handle basic transactions such as managing and configuring memory, prioritizing the supply and demand of system resources, controlling input and output devices, operating networks, and managing file systems. To facilitate user operation, most operating systems provide an operator interface that allows a user to interact with the system.
The BIOS is used to run hardware initialization during the power-on boot phase and to provide runtime services for the operating system and applications. In some implementations, the BIOS may also monitor the display processor temperature and perform functions such as adjusting temperature protection policies.
Application software, also known as application program (Application Program), may be understood as software written for some special application purpose of the user, as one of the main classifications of computer software. For example, the application software may be a program for achieving the purpose of power control, temperature management, and the like.
Embodiments of the present application also provide a computer readable storage medium storing instructions of any one of the above branch prediction methods.
Embodiments of the present application also provide a computer program product comprising instructions of any of the branch prediction methods described above.
The computer program product may employ a portable compact disc read only memory (CD-ROM) and comprise program code and may run on a terminal device, such as a personal computer. However, the computer program product of the present application is not limited thereto, and the computer program product may employ any combination of one or more computer readable media.
User information or user account information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, etc.) referred to in various embodiments of the present description are information and data that are authorized by the user or sufficiently authorized by the parties, and the collection, use, and processing of relevant data requires compliance with relevant legal regulations and standards of the relevant country and region, and is provided with corresponding instruction entries for the user to select authorization or denial.
It will be appreciated that the specific examples in this specification are intended only to assist those skilled in the art in better understanding the embodiments of the present application and are not intended to limit the scope of the present application.
It should be understood that, in various embodiments of the present disclosure, the sequence number of each process does not mean that the execution sequence is sequential, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the present application.
It will be appreciated that the various embodiments described in this specification may be implemented either alone or in combination, and are not limited in this regard.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this specification belongs. The terminology used in the description is for the purpose of describing particular embodiments only and is not intended to limit the scope of the description. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present specification.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above described embodiments may refer to corresponding procedures in other embodiments, and will not be described herein.
In the several embodiments provided in this specification, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all units in the system can be selected according to actual needs to achieve the purpose of the technical scheme.
In addition, each functional unit in each embodiment of the present specification may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present specification that in essence contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present specification. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, or the like.
The above is only a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the present disclosure, and such changes or substitutions should be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of branch prediction for use in a branch predictor in a branch prediction system, the system including the branch predictor and a precision counter, the method comprising:
predicting by using a first prediction pipeline to obtain first branch prediction information;
executing an increment operation of the precision counter under the condition that a preset flush condition is not met but a preset confidence condition is met;
and activating a second prediction pipeline under the condition that the count of the precision counter reaches a preset value, so as to predict simultaneously using the first prediction pipeline and the second prediction pipeline to obtain first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline.
2. The branch prediction method of claim 1, wherein the branch prediction system further comprises an instruction fetch target queue for decoupling a prediction pipeline and an instruction cache pipeline, the prediction pipeline comprising the first prediction pipeline and the second prediction pipeline; the flush condition comprises: a downstream processing module feeding back a misprediction and/or a context switch;
The method further comprises the steps of:
writing a branch target address in the branch prediction information into the instruction fetch target queue; the branch prediction information includes the first branch prediction information and the second branch prediction information.
3. The branch prediction method according to claim 1 or 2, characterized in that the method further comprises:
and clearing the precision counter under the condition that the flush condition is met.
4. The branch prediction method according to claim 1 or 2, characterized in that the method further comprises:
in the case that the flush condition is met and the first prediction pipeline and the second prediction pipeline are predicting simultaneously:
closing the second prediction pipeline to predict by using the first prediction pipeline; or,
and closing the first prediction pipeline to predict by using the second prediction pipeline.
5. The branch prediction method of claim 1, wherein the method further comprises:
keeping the count of the precision counter unchanged in the case that the flush condition is not met and the confidence condition is not met;
and/or,
and under the condition that the count of the precision counter is lower than the preset value, continuing to predict by using the first prediction pipeline.
6. The branch prediction method according to claim 1, wherein the obtaining of the confidence of the prediction corresponding to the first branch prediction information includes: recording the branch direction of a branch corresponding to the first branch prediction information in the last N times by using an N-bit saturation counter so as to determine the confidence of the current prediction, wherein N is an integer greater than 1; or,
the process for acquiring the prediction confidence corresponding to the first branch prediction information and the second branch prediction information comprises the following steps: and recording the branch directions of branches corresponding to the first branch prediction information and the second branch prediction information in the last M times by using an M-bit saturation counter so as to determine the confidence of the current prediction, wherein M is an integer greater than 1.
7. The branch prediction method of claim 1, wherein the method further comprises:
increasing the number of instruction channels and/or the capacity of a cache structure of the computing device under the condition that the count of the precision counter reaches the preset value, so as to promote instruction prefetching;
or,
reducing the number of instruction channels and/or the capacity of the cache structure of the computing device in the event that the flush condition is satisfied, so as to suppress instruction prefetching.
8. The branch prediction method according to claim 7, wherein the instruction channel comprises a decode bandwidth and/or an issue bandwidth;
and/or,
the cache structure includes one or more of the instruction fetch target queue, instruction queue, issue queue, and out-of-order resource.
9. A branch predictor configured to perform the branch prediction method of any one of claims 1-8.
10. The branch predictor as recited in claim 9, wherein the branch predictor comprises a prediction control module and a prediction cache module, the prediction cache module for storing branch information, the prediction control module for controlling read-write access of the prediction cache module, reading the branch information from the prediction cache module, predicting branch prediction information based on the branch information, writing a branch target address in the branch prediction information to a fetch target queue, and receiving training data fed back by a downstream processing module, context switch information, and prediction accuracy corresponding to the branch target address.
11. A branch prediction system, the system comprising a branch predictor, a precision counter, and an instruction fetch target queue;
The branch predictor is configured to: perform prediction using a first prediction pipeline to obtain first branch prediction information; increment the precision counter under the condition that a preset flush condition is not met but a preset confidence condition is met; activate a second prediction pipeline under the condition that the count of the precision counter reaches a preset value, so as to predict simultaneously using the first prediction pipeline and the second prediction pipeline and obtain first branch prediction information corresponding to the first prediction pipeline and second branch prediction information corresponding to the second prediction pipeline; and write the branch target address in the branch prediction information into the instruction fetch target queue; the branch prediction information includes the first branch prediction information and the second branch prediction information;
the instruction fetch target queue is to decouple a prediction pipeline and an instruction cache pipeline, the prediction pipeline including the first prediction pipeline and the second prediction pipeline.
12. A processor comprising the branch prediction system of claim 11.
13. A computer readable storage medium storing instructions of the branch prediction method of any one of claims 1-8.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination