CN115480826B - Branch predictor, branch prediction method, branch prediction device and computing equipment - Google Patents


Info

Publication number
CN115480826B
CN115480826B (Application CN202211149347.3A)
Authority
CN
China
Prior art keywords
branch
burst
instruction
cache
target cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211149347.3A
Other languages
Chinese (zh)
Other versions
CN115480826A (en)
Inventor
张克松
李桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN202211149347.3A priority Critical patent/CN115480826B/en
Publication of CN115480826A publication Critical patent/CN115480826A/en
Application granted granted Critical
Publication of CN115480826B publication Critical patent/CN115480826B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3804: Instruction prefetching for branches, e.g. hedging, branch folding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer; banks

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present disclosure provides a branch predictor, a branch prediction method, a branch prediction apparatus, and a computing device. The branch predictor includes a burst branch target cache, a general branch target cache, and a training fill filter. The training fill filter is configured to: obtain a training branch instruction and determine whether it satisfies a burst condition; store information of the training branch instruction in the burst branch target cache in response to the burst condition being satisfied; and store information of the training branch instruction in the general branch target cache in response to the burst condition not being satisfied. The information stored in the burst branch target cache and the general branch target cache is used together for branch prediction.

Description

Branch predictor, branch prediction method, branch prediction device and computing equipment
Technical Field
The present disclosure relates to branch prediction techniques, and in particular, to a branch predictor, a branch prediction method, a branch prediction apparatus, and a computing device.
Background
In branch prediction techniques, the main design approach is to store information about branch instructions in a branch target buffer (BTB). In current designs, most BTBs use a multi-way set-associative structure, with instruction address information (or a hash of the instruction address) serving as the index. The instruction address information (or its hash) is also used as a tag that is compared to determine whether the instruction address hits a particular entry of the BTB. With the rise of big data and similar applications, pressure on the instruction end of the pipeline has grown, and during branch prediction some branch instructions occur only a few times and have a bursty character. In current product designs, storing such bursty instructions in the BTB alongside non-bursty instructions degrades branch prediction accuracy.
Disclosure of Invention
Some embodiments of the present disclosure provide a branch predictor, a branch prediction method, a branch prediction apparatus, and a computing device that store burst branch instructions in a specially configured burst branch target cache, separate from the general branch target cache that holds non-burst branch instructions. Training on branch instructions with burst characteristics can thereby be filtered, damage to already-trained branch instructions is reduced, and branch prediction accuracy is improved, which in turn improves the performance of the computing device.
According to an aspect of the present disclosure, there is provided a branch predictor including: a burst branch target cache, a general branch target cache, and a training fill filter, wherein the training fill filter is configured to: obtain a training branch instruction and determine whether it satisfies a burst condition; store information of the training branch instruction in the burst branch target cache in response to the burst condition being satisfied; and store information of the training branch instruction in the general branch target cache in response to the burst condition not being satisfied, wherein the information stored in the burst branch target cache and the general branch target cache is used together for branch prediction.
According to some embodiments of the present disclosure, the burst condition includes one or more of the following: the training branch instruction is an absolute jump instruction; the target jump address of the training branch instruction crosses a page boundary; and the occurrence frequency of the training branch instruction is less than a first preset number of times.
According to some embodiments of the present disclosure, the general branch target cache includes two levels of cache, denoted respectively as a first-level cache and a second-level cache.
According to some embodiments of the present disclosure, for branch instructions stored in the burst branch target cache, information of branch instructions satisfying a migration condition is stored in the second-level cache of the general branch target cache and deleted from the burst branch target cache, wherein the burst branch target cache participates in the branch prediction process as a peer of the second-level cache.
According to some embodiments of the present disclosure, the general branch target cache is configured to: obtain a predicted branch instruction; look up the first-level cache of the general branch target cache using the instruction address of the predicted branch instruction to obtain a first prediction result, and send the first prediction result to the second-level cache; and look up the second-level cache of the general branch target cache using the instruction address and compare with the first prediction result to obtain a second prediction result. The burst branch target cache is configured to: look up the burst branch target cache using the instruction address of the predicted branch instruction to obtain a third prediction result. The branch predictor further comprises a determiner configured to determine a final branch prediction result based on the first, second, and third prediction results.
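As an illustrative sketch only, the three lookups and the determiner described above can be modeled in software. The class name, the dict-based caches, and the determiner's priority order (L1 first, then L2, then the burst BTB) are assumptions for illustration, not the patent's hardware implementation.

```python
class BranchPredictor:
    """Toy model of the two-level general BTB plus burst BTB lookup."""

    def __init__(self):
        self.l1 = {}     # first-level general BTB: instr addr -> target addr
        self.l2 = {}     # second-level general BTB: instr addr -> target addr
        self.burst = {}  # burst BTB: instr addr -> target addr

    def predict(self, addr):
        # First prediction result: look up the first-level cache.
        r1 = self.l1.get(addr)
        # Second prediction result: look up the second-level cache.
        r2 = self.l2.get(addr)
        # Third prediction result: look up the burst branch target cache.
        r3 = self.burst.get(addr)
        # Determiner: choose a final result from the three (assumed priority).
        for result in (r1, r2, r3):
            if result is not None:
                return result
        return None  # no prediction; fetch continues sequentially
```

For example, an address present only in the burst BTB still yields a prediction, which is the point of keeping the burst entries available to the determiner.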
According to some embodiments of the present disclosure, for branch instructions stored in the burst branch target cache, information of branch instructions satisfying a migration condition is stored in the first-level cache of the general branch target cache and deleted from the burst branch target cache, wherein the burst branch target cache participates in the branch prediction process as a peer of the first-level cache.
According to some embodiments of the present disclosure, the migration condition includes one or more of the following: the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times; the branch instruction is not an absolute jump instruction; or the jump of the branch instruction does not cross an address page.
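A minimal sketch of this migration check, assuming each burst-BTB entry tracks a hit count and two flags; the function name, parameter names, and the default threshold value are illustrative assumptions.

```python
def should_migrate(hit_count, is_absolute_jump, crosses_page,
                   second_preset_count=4):
    """True if a burst-BTB entry should move to the general BTB.

    Any one of the three conditions suffices: the entry has been hit
    often enough, or it turns out not to be an absolute jump, or its
    jump does not actually cross an address page.
    """
    return (hit_count >= second_preset_count
            or not is_absolute_jump
            or not crosses_page)
```

An entry that keeps all burst characteristics and stays below the threshold remains in the burst BTB.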
According to some embodiments of the disclosure, the branch predictor further comprises a burst filter located between the first-level cache and the burst branch target cache and configured to: upon a miss in the first-level cache, determine whether the instruction address of the predicted branch instruction hits the burst filter before the burst branch target cache is looked up with that address; and, upon a hit, cause the burst branch target cache to be looked up using the instruction address. As an example, whether the instruction address hits may be determined from index information, stored in the burst filter, of the branch instructions held in the burst branch target cache; this reduces unnecessary accesses to the burst branch target cache and thereby reduces power consumption.
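As a sketch of this idea, the burst filter can be modeled as a small set of index tags derived from the high-order bits of the instruction address; the tag width and the set-based structure are assumptions for illustration.

```python
class BurstFilter:
    """Gates access to the burst BTB to save power on guaranteed misses."""

    def __init__(self, tag_shift=12):
        self.tag_shift = tag_shift  # assumed: drop low 12 bits of the address
        self.tags = set()

    def _tag(self, addr):
        # Use the high-order bits of the instruction address as index info.
        return addr >> self.tag_shift

    def record(self, addr):
        # Called when a branch instruction is filled into the burst BTB.
        self.tags.add(self._tag(addr))

    def may_hit(self, addr):
        # Only if this returns True is the burst BTB actually searched.
        return self._tag(addr) in self.tags
```

Because only high-order bits are kept, the filter may report false positives, but a negative answer reliably skips the burst-BTB access.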
According to some embodiments of the present disclosure, when the lookup hits the burst branch target cache, the result from the burst branch target cache is used as the branch prediction result, and when the occurrence frequency of a branch instruction in the burst branch target cache becomes greater than or equal to the second preset number of times, the branch instruction is deleted from the burst branch target cache and migrated to the second-level cache of the general branch target cache; when the lookup misses the burst branch target cache, the second-level cache of the general branch target cache is searched.
According to some embodiments of the disclosure, the branch predictor further comprises a burst filter located before the burst branch target cache and configured to: determine whether the instruction address of the predicted branch instruction hits the burst filter before the burst branch target cache is looked up with that address; and, upon a hit, cause the burst branch target cache to be looked up using the instruction address.
According to some embodiments of the present disclosure, when the lookup hits the burst branch target cache, the result from the burst branch target cache is used as the branch prediction result, and when the occurrence frequency of a branch instruction in the burst branch target cache becomes greater than or equal to the second preset number of times, the branch instruction is deleted from the burst branch target cache and migrated to the first-level cache of the general branch target cache; when the lookup misses the burst branch target cache, the first-level cache of the general branch target cache is searched.
According to another aspect of the present disclosure, there is also provided a branch prediction method including: obtaining a training branch instruction and determining whether it satisfies a burst condition; storing information of the training branch instruction in a burst branch target cache in response to the burst condition being satisfied; and storing information of the training branch instruction in a general branch target cache in response to the burst condition not being satisfied, wherein the information stored in the burst branch target cache and the general branch target cache is used together for branch prediction.
According to some embodiments of the present disclosure, the burst condition includes one or more of the following: the training branch instruction is an absolute jump instruction; the target jump address of the training branch instruction crosses a page boundary; and the occurrence frequency of the training branch instruction is less than a first preset number of times.
According to some embodiments of the present disclosure, the general branch target cache includes two levels of cache, denoted respectively as a first-level cache and a second-level cache.
According to some embodiments of the disclosure, the method further comprises: for branch instructions stored in the burst branch target cache, storing information of branch instructions satisfying a migration condition in the second-level cache of the general branch target cache and deleting that information from the burst branch target cache, wherein the burst branch target cache participates in the branch prediction process as a peer of the second-level cache.
According to some embodiments of the disclosure, the method further comprises: obtaining a predicted branch instruction; looking up the first-level cache of the general branch target cache using the instruction address of the predicted branch instruction to obtain a first prediction result, and sending the first prediction result to the second-level cache; looking up the second-level cache of the general branch target cache using the instruction address and comparing with the first prediction result to obtain a second prediction result; looking up the burst branch target cache using the instruction address to obtain a third prediction result; and determining a final branch prediction result based on the first, second, and third prediction results.
According to some embodiments of the disclosure, the method further comprises: for branch instructions stored in the burst branch target cache, storing information of branch instructions satisfying a migration condition in the first-level cache of the general branch target cache and deleting that information from the burst branch target cache, wherein the burst branch target cache participates in the branch prediction process as a peer of the first-level cache.
According to some embodiments of the present disclosure, the migration condition includes one or more of the following: the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times; the branch instruction is not an absolute jump instruction; or the jump of the branch instruction does not cross an address page.
According to some embodiments of the disclosure, the method further comprises: before the burst branch target cache is looked up using the instruction address of the predicted branch instruction, determining whether the instruction address hits a burst filter, wherein the burst filter contains index information of the branch instructions in the burst branch target cache; and, upon determining a hit in the burst filter, looking up the burst branch target cache using the instruction address. According to embodiments of the present disclosure, the burst filter serves as a power-saving mechanism: the burst branch target cache is searched only when the burst filter is determined to be hit, further reducing the power consumption of branch prediction.
According to yet another aspect of the present disclosure, there is also provided an apparatus for performing branch prediction, the apparatus including a branch prediction unit. The branch prediction unit is configured to: obtain a training branch instruction and determine whether it satisfies a burst condition; store information of the training branch instruction in a burst branch target cache in response to the burst condition being satisfied; and store information of the training branch instruction in a general branch target cache in response to the burst condition not being satisfied, wherein the information stored in the burst branch target cache and the general branch target cache is used together for branch prediction.
According to some embodiments of the present disclosure, the burst condition includes one or more of the following: the training branch instruction is an absolute jump instruction; the target jump address of the training branch instruction crosses a page boundary; and the occurrence frequency of the training branch instruction is less than a first preset number of times.
According to some embodiments of the present disclosure, the general branch target cache includes two levels of cache, denoted respectively as a first-level cache and a second-level cache.
According to some embodiments of the disclosure, the branch prediction unit is further configured to: for branch instructions stored in the burst branch target cache, store information of branch instructions satisfying a migration condition in the second-level cache of the general branch target cache and delete that information from the burst branch target cache, wherein the burst branch target cache participates in the branch prediction process as a peer of the second-level cache.
According to some embodiments of the disclosure, the branch prediction unit is further configured to: obtain a predicted branch instruction; look up the first-level cache of the general branch target cache using the instruction address of the predicted branch instruction to obtain a first prediction result, and send the first prediction result to the second-level cache; look up the second-level cache of the general branch target cache using the instruction address and compare with the first prediction result to obtain a second prediction result; look up the burst branch target cache using the instruction address to obtain a third prediction result; and determine a final branch prediction result based on the first, second, and third prediction results.
According to some embodiments of the disclosure, the branch prediction unit is further configured to: for branch instructions stored in the burst branch target cache, store information of branch instructions satisfying a migration condition in the first-level cache of the general branch target cache and delete that information from the burst branch target cache, wherein the burst branch target cache participates in the branch prediction process as a peer of the first-level cache.
According to some embodiments of the present disclosure, the migration condition includes one or more of the following: the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times; the branch instruction is not an absolute jump instruction; or the jump of the branch instruction does not cross an address page.
According to some embodiments of the disclosure, the branch prediction unit is further configured to: before the burst branch target cache is looked up using the instruction address of the predicted branch instruction, determine whether the instruction address hits a burst filter, wherein the burst filter contains index information of the branch instructions in the burst branch target cache; and, upon determining a hit in the burst filter, look up the burst branch target cache using the instruction address. According to embodiments of the present disclosure, the burst filter serves as a power-saving mechanism. As an example, the index information may be the high-order bits of the branch instruction address; the burst branch target cache is searched only when the burst filter is determined to be hit, further reducing the power consumption of branch prediction.
According to yet another aspect of the present disclosure, there is also provided a computing device, comprising: a processor; and a memory, wherein the memory has stored therein computer readable code which, when executed by the processor, performs the branch prediction method as described above.
According to the branch predictor, branch prediction method, branch prediction apparatus, and computing device provided by embodiments of the present disclosure, information of training branch instructions satisfying the burst condition is stored in the burst branch target cache, information of training branch instructions not satisfying it is stored in the general branch target cache, and the information stored in both is used together for branch prediction. Because bursty branch instructions are stored separately from regular branch instructions, their information can still be used for prediction while their impact on regular branch prediction performance is avoided, improving branch prediction accuracy and processing performance. In addition, storing burst branch instructions separately avoids frequent replacement of entries in the regular BTB, thereby reducing system power consumption.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 shows a schematic diagram of a branch predictor in the related art;
FIG. 2 illustrates a schematic diagram of a branch predictor according to some embodiments of the present disclosure;
FIG. 3 illustrates a schematic diagram of a burst branch target cache according to some embodiments of the present disclosure;
FIG. 4 illustrates another schematic diagram of a branch predictor according to some embodiments of the present disclosure;
FIG. 5 illustrates yet another schematic diagram of a branch predictor according to some embodiments of the present disclosure;
FIG. 6 illustrates a data format schematic of a burst filter according to some embodiments of the present disclosure;
FIG. 7 illustrates yet another schematic diagram of a branch predictor according to some embodiments of the present disclosure;
FIG. 8 illustrates a schematic flow diagram of a branch prediction method according to some embodiments of the present disclosure;
FIG. 9 illustrates a schematic block diagram of an apparatus to perform branch prediction according to some embodiments of the present disclosure;
FIG. 10 illustrates a schematic block diagram of a computing device, according to some embodiments of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings. The described embodiments are merely some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this disclosure without inventive effort fall within the scope of the present disclosure.
Furthermore, as used in the present disclosure and claims, unless the context clearly indicates otherwise, the words "a," "an," and "the" do not denote the singular and may include the plural. The terms "first," "second," and the like do not denote any order, quantity, or importance, but are used to distinguish one element from another. Likewise, words such as "comprising" or "including" mean that the elements or items preceding the word cover the elements or items listed after the word and their equivalents, without excluding other elements or items. Terms such as "connected" or "coupled" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect.
In branch prediction techniques, current designs store information of branch instructions in a branch target buffer (BTB). Because of timing constraints in hardware circuit design, once the capacity of the BTB exceeds a certain limit, the BTB as a whole adopts a multi-level structure. As an example, FIG. 1 shows a schematic diagram of a branch predictor in the related art that uses a two-level BTB arrangement (shown as L1 BTB and L2 BTB), the two levels being used to store branch instructions and to make branch predictions.
Specifically, as shown in FIG. 1, the branch predictor in the related art involves both a prediction process and a training process. For convenience of description, a branch instruction used in the prediction process is denoted a predicted branch instruction, and a branch instruction used in the training process is denoted a training branch instruction. In the prediction process, the predicted branch instruction is first received by the first-level L1 BTB, which performs the lookup and branch prediction; the second-level L2 BTB then performs branch prediction on the same instruction, and the final branch prediction result is obtained from the prediction results of the two levels. Because the BTB contains information about previously encountered branch instructions, a BTB hit means that the branch prediction unit has found a branch instruction, and the target address of the jump instruction is obtained as the branch prediction result by combining the branch direction prediction algorithm and the branch instruction type.
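The index/tag hit check described above can be sketched as follows; the set count and the simple modulo index are illustrative assumptions (real designs may hash the address or use selected bit fields).

```python
NUM_SETS = 512  # assumed number of BTB sets for illustration

def btb_fill(btb, addr, target):
    """btb: list of NUM_SETS dicts mapping tag -> predicted target address."""
    index = addr % NUM_SETS    # low-order portion selects the set
    tag = addr // NUM_SETS     # remaining portion is stored as the tag
    btb[index][tag] = target

def btb_lookup(btb, addr):
    """Return the predicted target on a tag match (hit), else None (miss)."""
    index = addr % NUM_SETS
    tag = addr // NUM_SETS
    return btb[index].get(tag)
```

A lookup hits only when both the index selects a set containing an entry and that entry's stored tag matches the tag portion of the address.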
The branch predictor's ability to make predictions derives from the training process, during which training branch instructions are received for branch prediction training. Between the L1 BTB and the L2 BTB, the instruction information stored in each may be updated during training, for example by transferring some information from the L1 BTB to the L2 BTB, or other information from the L2 BTB to the L1 BTB, which is not described in detail here.
It will be appreciated that the capacity of the BTB and what it stores affect branch prediction capability. In recent years, with the advent of big data and similar applications, pressure on the instruction end of the pipeline has also grown. The code footprint of such applications is relatively large, and bursty branch jump instructions often occur in them. As an example, such jump instructions have the following characteristics: the jump span is large, typically crossing address pages; they are absolute jump instructions, i.e., both the jump direction and the jump target are fixed; and they occur infrequently, for example only a few times per 100M instructions.
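The cross-page characteristic above can be tested as sketched below; the 4 KiB page size is an assumption for illustration and would match the page granularity of the target architecture in practice.

```python
PAGE_SIZE = 4096  # assumed 4 KiB address pages

def crosses_page(branch_addr, target_addr, page_size=PAGE_SIZE):
    """True if the jump target lies on a different address page
    than the branch instruction itself."""
    return (branch_addr // page_size) != (target_addr // page_size)
```

A jump whose target stays within the same page would fail this test and so would lack one of the burst characteristics listed above.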
For branch instructions with such a bursty nature, storing them together with other, non-bursty instructions in the original BTB structure reduces prediction performance. For example, burst branch instructions may corrupt branch prediction information that has already been trained into the BTB, lowering prediction accuracy; moreover, as prediction proceeds, these burst branch instructions stored in the BTB are replaced by other branch instructions, and frequent replacement in the BTB increases power consumption.
The present disclosure addresses the impact of such bursty branch instructions on the branch prediction process by providing a branch predictor that stores branch instructions satisfying a burst condition in a specially configured burst branch target cache (referred to as the burst BTB), distinct from the conventional branch target caches in which non-burst branch instructions reside (e.g., the L1 BTB and L2 BTB shown in FIG. 1). This enables accurate branch prediction and further improves the performance of the computing device.
FIG. 2 illustrates a schematic diagram of a branch predictor according to some embodiments of the present disclosure. The structure of the branch predictor and its operation are described below in conjunction with FIG. 2.
A branch predictor according to some embodiments of the present disclosure may include a burst branch target cache, a general branch target cache, and a training fill filter. Specifically, the burst branch target cache serves as the burst BTB and stores branch instructions that satisfy a burst condition during branch prediction, hereinafter referred to as burst branch instructions. The general branch target cache stores normal branch instructions, hereinafter referred to as non-burst branch instructions or normal branch instructions. According to embodiments of the present disclosure, the branch instruction information stored in the burst branch target cache and the general branch target cache is used together to make branch predictions.
According to some embodiments of the present disclosure, during training the training fill filter may be configured to obtain a training branch instruction and determine whether it satisfies a burst condition. The burst condition includes one or more of the following: the training branch instruction is an absolute jump instruction, i.e., both the jump direction and the jump target are fixed; the target jump address of the training branch instruction crosses a page boundary, indicating a large jump span; and the occurrence frequency of the training branch instruction is less than a first preset number of times, for example 1, 2, or 4 times per 100M instructions (i.e., the first preset number of times is set to 1, 2, or 4); the specific value is not limited here. As one implementation, the burst condition may include all three of the above. Other burst conditions may also be set according to the actual application. It should be understood that the terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but are used to distinguish one element from another.
In accordance with some embodiments of the present disclosure, after determining whether the training branch instruction satisfies the burst condition, the training fill filter is further configured to: store information of the training branch instruction in the burst branch target cache in response to the burst condition being met, that is, store branch instructions meeting the burst condition (burst branch instructions) in the dedicated burst BTB; and store information of the training branch instruction in the general branch target cache in response to the burst condition not being met, that is, store branch instructions that do not meet the burst condition (non-burst branch instructions) in the conventional BTB. As described above, the information stored in the burst branch target cache and the general branch target cache is used together to make branch predictions.
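The routing decision of the training fill filter described above can be sketched in pseudocode form. This is an illustrative model only, not the patented hardware: the function names, the 4 KiB page size, and the choice of 2 as the first preset number of times are assumptions made for the sketch.

```python
# Illustrative sketch of the training fill filter's routing decision.
# PAGE_SIZE and FIRST_PRESET_COUNT are assumed values, not from the patent.
PAGE_SIZE = 4096          # assume 4 KiB address pages
FIRST_PRESET_COUNT = 2    # "first preset number of times", e.g. 1, 2, or 4

def is_burst(pc, target, is_absolute, occurrence_count):
    """True if the training branch instruction meets the burst condition
    (here, all three sub-conditions must hold, as in one implementation)."""
    crosses_page = (pc // PAGE_SIZE) != (target // PAGE_SIZE)
    rare = occurrence_count < FIRST_PRESET_COUNT
    return is_absolute and crosses_page and rare

def train(pc, target, is_absolute, occurrence_count, burst_btb, general_btb):
    """Route the trained branch into the burst BTB or the general BTB."""
    if is_burst(pc, target, is_absolute, occurrence_count):
        burst_btb[pc] = target
    else:
        general_btb[pc] = target
```

In this sketch a branch is stored in the burst BTB only when it is an absolute jump, its target lies on a different page, and it has occurred fewer times than the preset threshold; everything else falls through to the conventional BTB.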
Fig. 3 illustrates a schematic diagram of a burst branch target cache according to some embodiments of the present disclosure. The memory structure of the burst BTB according to an embodiment of the present disclosure will be described below with reference to fig. 3.
As one implementation, the burst BTB may have a small capacity, e.g., significantly smaller than the data capacity of a conventional BTB. The burst BTB is dedicated to storing information of burst branch instructions. As shown in fig. 3, a size of 512 SETs x 8 WAYs is taken as an example, shown as SET-0 to SET-511 and WAY-0 to WAY-7, respectively. This only illustrates one capacity setting of the burst BTB; a burst BTB of any format or capacity may be set according to the actual situation. Specifically, each memory unit may include a branch instruction Tag (Tag), a branch instruction target address (Branch Target), an offset address (offset) of the branch instruction relative to its Cache Line, and a hit count of the branch instruction. Due to the restriction imposed by the burst condition, a branch instruction stored in the burst BTB is generally an absolute jump instruction, in which case it is not necessary to provide a prediction unit for the corresponding branch type. Similar to the conventional BTB for storing non-burst branch instructions, the burst BTB may also employ a Least Recently Used (LRU) replacement policy, in which the least recently used instruction information is selected for eviction, i.e., replaced by information of other branch instructions.
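One set of the burst BTB, with the entry fields and LRU replacement just described, can be modeled as follows. The class and field layout are illustrative assumptions; only the 8-way geometry and the tag/target/offset/hit-count fields come from the description above.

```python
# Illustrative model of one 8-way burst BTB set with LRU replacement.
# Field layout (target, offset-in-line, hit count) follows the entry
# format described in the text; the class itself is an assumption.
from collections import OrderedDict

class BurstBTBSet:
    def __init__(self, ways=8):
        self.ways = ways
        self.entries = OrderedDict()  # tag -> [target, offset, hit_count]

    def fill(self, tag, target, offset):
        """Train a burst branch into this set, evicting the LRU way if full."""
        if tag in self.entries:
            self.entries.move_to_end(tag)
            return
        if len(self.entries) == self.ways:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[tag] = [target, offset, 0]

    def lookup(self, tag):
        """Return the predicted target on a hit, bumping the hit count."""
        entry = self.entries.get(tag)
        if entry is None:
            return None
        entry[2] += 1                  # hit count, later used for migration
        self.entries.move_to_end(tag)  # mark most recently used
        return entry[0]
```

The hit count recorded on each lookup is what the migration condition described later compares against the second preset number of times.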
With the branch predictor provided by embodiments of the present disclosure (for example, as shown in fig. 2), information of training branch instructions meeting the burst condition is stored in the burst branch target cache, information of training branch instructions not meeting the burst condition is stored in the general branch target cache, and the branch instruction information stored in both caches is used together for branch prediction. Storing branch instructions with burst properties separately from conventional branch instructions allows the information of burst branch instructions to be used for prediction while avoiding their influence on conventional branch prediction performance, thereby improving the accuracy of branch prediction and the processing performance.
As one implementation, in a branch predictor according to some embodiments of the present disclosure, the general branch target cache includes two levels of cache, denoted as a first-level cache and a second-level cache, respectively. In this implementation, the conventional BTB employs a two-level cache organization. It will be appreciated that in other implementations, the general branch target cache may employ other cache organizations, including, for example, three levels of cache. Embodiments of the present disclosure do not limit the specific structure of the general branch target cache.
According to the branch predictor of some embodiments of the present disclosure, for branch instructions stored in the burst branch target cache, information of branch instructions satisfying a migration condition is stored in the second-level cache of the general branch target cache and deleted from the burst branch target cache. In these embodiments, the burst branch target cache performs the branch instruction prediction process as a cache at the same level as the second-level cache.
According to some embodiments of the present disclosure, the migration condition includes one or more of the following: the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times; the branch instruction is not an absolute jump instruction; or the jump of the branch instruction does not cross an address page. Specifically, FIG. 4 illustrates another schematic diagram of a branch predictor according to some embodiments of the present disclosure; a training process and a prediction process for branch prediction according to some embodiments of the present disclosure will be described below in conjunction with FIG. 4.
As shown in fig. 4, the burst branch target cache (burst BTB) performs the branch instruction prediction process as a cache at the same level as the second-level cache. Accordingly, for the branch instructions stored in the burst BTB, information of branch instructions satisfying the migration condition is stored in the second-level cache (L2 BTB) of the general branch target cache during the training process.
According to the branch predictor of some embodiments of the present disclosure, the training fill filter performs filtering according to the characteristics of burst branch instructions; meanwhile, the hardware sets a configuration attribute that records the occurrence frequency of jump instructions, so that instructions meeting the migration condition can be migrated into the L2 BTB. For example, if the migration condition is defined as the instruction occurring 2 times, i.e., the second preset number of times is set to 2, then entries in the burst BTB whose hit-count parameter is greater than or equal to 2 may be treated as non-burst branch instructions and their information updated into the L2 BTB.
Referring to the training path shown in fig. 4, for training triggered by a branch prediction error, the training fill filter is consulted first: if the branch instruction satisfies the characteristics of a burst branch instruction, i.e., the set burst condition, it is trained into the burst BTB; otherwise it is trained into the first-level cache (L1 BTB) of the general branch target cache.
As one implementation, the migration condition may include the following cases. Case 1) the burst condition is satisfied, but the hit count of the branch instruction is greater than or equal to the set hit count for burst branch instructions; the branch instruction is then updated from the burst BTB into the L2 BTB. Such a branch instruction stored in the L2 BTB is thereafter treated as a non-burst branch instruction, i.e., a burst/non-burst switch of the instruction is effected. Case 2) the branch instruction is found not to be an absolute jump instruction; the corresponding branch instruction in the burst BTB is then migrated into the L2 BTB. Case 3) the jump of the branch instruction is found not to cross an address page; the corresponding branch instruction in the burst BTB is then migrated into the L2 BTB.
It is important to note that during migration, each entry in the L2 BTB may store the information of more than one branch instruction, while each entry in the burst BTB stores only one. Thus, for a branch instruction that needs to migrate to the L2 BTB, it must first be determined whether to insert it into an existing L2 BTB entry or to create a new entry in the L2 BTB. Each data move from the burst BTB to the L2 BTB therefore requires a lookup of the L2 BTB: on a hit, the offset address of the branch instruction within the Cache Line is compared, and if it is determined that the branch instruction can be placed into the L2 BTB, the corresponding branch instruction in the burst BTB is updated directly into the existing L2 BTB entry; on a miss, an L2 BTB way selected by the LRU policy is used to store the branch instruction from the burst BTB.
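The migration check and the burst-BTB-to-L2-BTB move described above can be sketched as follows. This is a simplified software model under stated assumptions: the threshold value 2, the keying of the burst BTB by (cache-line tag, offset), and the keying of the L2 BTB by cache-line tag with a list of branches per entry are all illustrative choices, not the patented structure.

```python
# Hedged sketch of the migration condition and the burst BTB -> L2 BTB move.
# SECOND_PRESET_COUNT is an assumed value; entry keying is illustrative.
SECOND_PRESET_COUNT = 2

def should_migrate(hit_count, is_absolute, crosses_page):
    """A burst entry migrates if ANY of the three migration cases holds."""
    return (hit_count >= SECOND_PRESET_COUNT   # case 1: occurred often enough
            or not is_absolute                 # case 2: not an absolute jump
            or not crosses_page)               # case 3: jump does not cross a page

def migrate(line_tag, offset, target, burst_btb, l2_btb):
    """Move one branch from the burst BTB into the L2 BTB.

    An L2 entry may hold several branches of one cache line, so the
    branch is appended to an existing entry on a tag hit; on a miss a
    new entry is allocated (real hardware would pick a victim by LRU).
    """
    del burst_btb[(line_tag, offset)]
    l2_btb.setdefault(line_tag, []).append((offset, target))
```

A branch that migrates this way is subsequently treated as a non-burst branch instruction, realizing the burst/non-burst switch mentioned in case 1.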
It should be noted here that in other embodiments, the migrated branch instruction in the burst BTB may also be updated into the L1 BTB, as will be described below in connection with fig. 7.
In addition, for the two-level cache organization of the conventional BTB comprising the L1 BTB and the L2 BTB, an instruction information update channel exists between the L1 BTB and the L2 BTB for exchanging data between them; reference may be made to two-level BTB data exchange implementations in the related art, which will not be described in detail herein.
According to embodiments of the present disclosure, since the burst BTB is provided specifically for burst branch instructions, an information transmission channel between the burst BTB and, e.g., the L2 BTB is provided to perform the training update process for instructions.
In accordance with some embodiments of the present disclosure, during prediction by the branch predictor, the general branch target cache is configured to: obtain a predicted branch instruction; search the first-level cache of the general branch target cache using the instruction address of the predicted branch instruction to obtain a first prediction result, and send the first prediction result to the second-level cache; and search the second-level cache of the general branch target cache using the instruction address of the predicted branch instruction and compare against the first prediction result to obtain a second prediction result. The burst branch target cache is configured to: search the burst branch target cache using the instruction address of the predicted branch instruction to obtain a third prediction result. The branch predictor further comprises a determiner configured to: determine a final branch prediction result based on the first prediction result, the second prediction result, and the third prediction result.
According to some embodiments of the present disclosure, the burst BTB predicts as a cache at the same level as the L2 BTB, i.e., the prediction process is performed as a level-2 prediction unit.
Next, as shown in fig. 4, on the prediction path, for a received predicted branch instruction, the L1 BTB is first searched using the instruction address; for the specific lookup, reference may be made to branch prediction implementations for two-level caches in the related art, which will not be described in detail. The lookup result of the L1 BTB is then sent along the branch prediction pipeline into the level-2 prediction components, i.e., into the L2 BTB and the burst BTB.
For the L2 BTB, after the level-1 prediction by the L1 BTB, the L2 BTB may then be looked up using the instruction address; for the specific lookup, reference may again be made to two-level cache branch prediction implementations in the related art. The lookup result of the L2 BTB is then compared with the prediction result of the L1 BTB to obtain the prediction result of the L2 BTB.
Further, for the added burst BTB, after the level-1 prediction (i.e., the L1 BTB), the burst BTB may be looked up with the instruction address in accordance with an embodiment of the present disclosure. The final branch prediction result is then obtained from the lookup results of the L2 BTB and the burst BTB, combined with the prediction result of the L1 BTB.
In the present disclosure, the burst condition restricts the type of branch instruction to absolute jump instructions; therefore, no prediction means for the branch jump direction is needed at the burst BTB level. With the burst condition restricting the branch instruction type in this way, branch instructions in the burst BTB may default to the taken state.
Table 1 below lists the prediction result lookup table of the branch predictor shown in FIG. 4.
TABLE 1
In Table 1 above, offset_1 represents the offset address of the first branch instruction hitting the L1 BTB relative to the Cache Line in which it resides; offset_2 represents the offset address of the first branch instruction hitting the L2 BTB relative to the Cache Line in which it resides; offset_3 represents the offset address of the first branch instruction hitting the burst BTB relative to the Cache Line in which it resides.
Taking the first line of results in Table 1 as an example, T represents that the branch instruction hits the corresponding BTB, and NT represents a miss. For example, in the analysis of the level-2 prediction results, offset_1 <= (offset_2; offset_3) indicates that the branch instruction address hits the L1 BTB, the L2 BTB, and the burst BTB, yielding offset addresses offset_1, offset_2, and offset_3, respectively, with offset_1 <= offset_2 and offset_1 <= offset_3. In this case, the prediction result of the L1 BTB is taken as the final branch prediction result. For another example, offset_2 <= (offset_1; offset_3) means that offset_2 <= offset_1 and offset_2 <= offset_3; in this case, the prediction result of the L2 BTB is taken as the final branch prediction result. As one implementation, the per-level branch prediction determination described above in connection with Table 1 may be performed by the determiner shown in FIG. 4, yielding the branch prediction result of the branch predictor.
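The selection rule the examples above imply can be sketched as follows: among the levels that hit, the result whose first branch has the smallest offset within the cache line wins, since that branch is reached first in program order. This is an inference from the two examples given (the full Table 1 is not reproduced here), and the data structure is an assumption for the sketch.

```python
# Illustrative sketch of the determiner's selection rule inferred from
# the Table 1 examples: smallest in-line offset among hitting levels wins.
def select_prediction(results):
    """results: list of (source, hit, offset_in_line, prediction),
    ordered L1, L2, burst so that earlier levels win offset ties."""
    hits = [r for r in results if r[1]]
    if not hits:
        return None  # no BTB hit; no prediction from these components
    # min() keeps the first of equal keys, matching e.g.
    # offset_1 <= (offset_2; offset_3) selecting the L1 BTB result.
    best = min(hits, key=lambda r: r[2])
    return best[0], best[3]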
According to an example prediction process such as Table 1 above, for a branch predictor including a burst BTB according to an embodiment of the present disclosure, the burst BTB can perform the branch prediction process as a second-level cache, thereby implementing a process in which the information stored in the burst branch target cache and the general branch target cache is used together for branch prediction. It may be appreciated that in embodiments according to the present disclosure, burst branch instructions satisfying the burst condition are stored separately from normal branch instructions, so that not only are the training and prediction processes of the normal BTBs preserved, but the information of the burst branch instructions can also participate in the branch prediction process; during training, instructions in the burst BTB can be migrated into, for example, the L2 BTB according to the migration condition, thus implementing instruction migration and update.
According to some embodiments of the present disclosure, the branch predictor may further include a burst filter, which may be located between the first-level cache and the burst branch target cache as a power consumption control unit and configured to: in the event of a miss in the first-level cache, determine whether the instruction address hits the burst filter before searching the burst branch target cache using the instruction address of the predicted branch instruction; and in the event of a hit, cause the burst branch target cache to be looked up using the instruction address of the predicted branch instruction. As an example, it may be determined whether the instruction address hits index information, stored in the burst filter, of branch instructions in the burst branch target cache; the index information may be, for example, the high-order bits of the branch instruction address, thereby reducing unnecessary accesses to the burst branch target cache.
According to some embodiments of the present disclosure, in the case where the lookup hits the burst branch target cache, the result of the burst branch target cache is used as the branch prediction result, and when the occurrence frequency of the branch instruction in the burst branch target cache is greater than or equal to the second preset number of times, the branch instruction is deleted from the burst branch target cache and migrated to the second-level cache of the general branch target cache; in the case where the lookup misses the burst branch target cache, the second-level cache of the general branch target cache is looked up. As described above, the second preset number of times may serve as the migration condition for migrating branch instructions from the burst branch target cache to the general branch target cache.
FIG. 5 illustrates yet another schematic diagram of a branch predictor according to some embodiments of the present disclosure, wherein the implementation of a burst filter is shown.
In the branch predictor scheme shown in fig. 4, the burst BTB serves as a branch instruction prediction unit at the same level as the L2 BTB and completes the lookup of branch instructions. In this case, before the burst-BTB-based prediction, it is not known whether a burst branch instruction is present, so the L2 BTB and the burst BTB must be searched simultaneously, which adds extra power consumption. In view of this, a prediction unit for burst branch instructions, i.e., the burst filter shown in fig. 5, is added to the branch predictor shown in fig. 4.
As shown in FIG. 5, a burst filter (Lookup Filter) is added between the L1 BTB and the burst BTB, and the burst BTB is searched only after the instruction address hits the burst filter. The burst filter can be regarded as a coarse-grained burst BTB: the burst BTB tracks burst branch instruction conditions within a single Cache Line, while each entry of the burst filter covers the burst branch instruction conditions of multiple Cache Lines. Specifically, how many Cache Lines each entry of the burst filter covers can be configured by hardware; for example, 16, 32, or 64 Cache Lines may be set. If set to 16 Cache Lines, each entry of the burst filter contains bit 10 of the address up to the high-order address bits. Likewise, bit 10 up to the high-order bits of the predicted address is used to hit the burst filter.
Fig. 6 illustrates a data format schematic diagram of a burst filter according to some embodiments of the present disclosure. As shown in fig. 6, the burst filter includes 16 SETs, shown as SET-0 through SET-15. The number of entries of the burst filter is matched to the size of the burst BTB; in the example of fig. 6, the burst filter contains 16 entries. Each entry contains a Tag and a count of the burst branch instructions it covers (Cold Branch Number). The size of the Tag in a burst filter entry depends on how many Cache Lines of the burst BTB the entry covers; as an example, when each burst filter entry covers 16 Cache Lines, the Tag is longest. The Cold Branch Number parameter marks how many burst branch instructions reside in the Cache Lines covered by the burst filter entry. This requires that when the burst BTB is trained, the burst filter entries are trained in the same way, i.e., the content of the burst filter entries is updated along with the training of the burst BTB. Notably, each burst branch instruction contributes exactly +1 to the Cold Branch Number in its burst filter entry, even though that burst branch instruction may be hit multiple times. When a branch instruction in the burst BTB is updated into the L2 BTB, the Cold Branch Number in the corresponding burst filter entry is decremented by 1; likewise, when an entry is evicted from the burst BTB, the Cold Branch Number is decremented by 1. When Cold Branch Number != 0, a burst branch instruction can be considered to exist in the address range. When Cold Branch Number == 0, the entry is set to the replaceable state and will be overwritten by a subsequent burst filter entry.
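The Cold Branch Number bookkeeping described above can be sketched as follows, under the assumptions stated in the text (16 Cache Lines of 64 bytes per entry, so bit 10 and above of the address form the region tag). The class and method names are hypothetical; real hardware would use a fixed 16-entry structure rather than a dictionary.

```python
# Illustrative Cold Branch Number bookkeeping for the burst filter.
# REGION_SHIFT assumes 16 cache lines x 64 B = 1 KiB region per entry.
REGION_SHIFT = 10

class BurstFilter:
    def __init__(self):
        self.counts = {}  # region tag -> Cold Branch Number

    def on_burst_fill(self, pc):
        """Called once per burst branch trained into the burst BTB
        (a branch contributes +1 even if later hit many times)."""
        tag = pc >> REGION_SHIFT
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def on_burst_remove(self, pc):
        """Called when a burst branch is migrated to the L2 BTB or evicted."""
        tag = pc >> REGION_SHIFT
        self.counts[tag] -= 1
        if self.counts[tag] == 0:
            del self.counts[tag]  # entry becomes replaceable

    def hit(self, pc):
        """True if the region may hold a burst branch, so the burst BTB
        should be looked up; False skips the lookup, saving power."""
        return self.counts.get(pc >> REGION_SHIFT, 0) != 0
```

Only when `hit` returns True does the predictor spend the access on the burst BTB; a zero count means the entry can be replaced by a new region.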
By providing the above burst filter, after the first-level prediction by the branch predictor, a filtering step can be performed before the burst BTB is searched, for example based on the index information of burst branch instructions stored in the burst filter: the burst BTB is searched only if the instruction address hits the burst filter; if the burst filter is missed, the burst BTB is not searched. In that case the second-level branch prediction is performed based only on the L2 BTB, not the burst BTB, which avoids performing the second-level branch prediction on both the L2 BTB and the burst BTB every time, thereby reducing branch prediction power consumption.
In the branch predictors described above in connection with fig. 4 and 5, the burst BTB performs the prediction process as a level-2 BTB. Similarly, the burst BTB may also perform prediction as a level-1 BTB.
In a branch predictor according to some embodiments of the present disclosure, the burst branch target cache performs the branch instruction prediction process as a cache at the first-level-cache stage. In these embodiments, for the branch instructions stored in the burst branch target cache, information of branch instructions satisfying the migration condition is stored in the first-level cache of the general branch target cache and deleted from the burst branch target cache.
Similarly, the migration condition includes one or more of the following: the occurrence frequency of the branch instruction is greater than or equal to the second preset number of times; the branch instruction is not an absolute jump instruction; or the jump of the branch instruction does not cross an address page.
In these embodiments, the branch predictor may further comprise a burst filter, wherein the burst filter is located before the burst branch target cache and is configured to: determine whether the instruction address hits the burst filter before searching the burst branch target cache using the instruction address of the predicted branch instruction; and in the event of a hit, cause the burst branch target cache to be looked up using the instruction address of the predicted branch instruction. As an example, it may first be determined whether the instruction address hits index information, stored in the burst filter, of branch instructions in the burst branch target cache; the index information may be, for example, the high-order bits of the branch instruction address, thereby reducing unnecessary accesses to the burst branch target cache.
According to some embodiments of the present disclosure, in the case where the lookup hits the burst branch target cache, the result of the burst branch target cache is used as the branch prediction result, and when the occurrence frequency of the branch instruction in the burst branch target cache is greater than or equal to the second preset number of times, the branch instruction is deleted from the burst branch target cache and migrated to the first-level cache of the general branch target cache; in the case where the lookup misses the burst branch target cache, the first-level cache of the general branch target cache is looked up. As described above, the second preset number of times may serve as the migration condition for migrating branch instructions from the burst branch target cache to the general branch target cache.
FIG. 7 illustrates yet another schematic diagram of a branch predictor according to some embodiments of the present disclosure. Compared with the branch predictor illustrated in FIG. 5, the burst BTB of FIG. 7 predicts as a level-1 BTB; accordingly, as illustrated in FIG. 7, instruction information satisfying the migration condition is migrated from the burst BTB to the L1 BTB, and the migration condition and migration process may refer to the above description, which will not be repeated here. In addition, since the burst BTB performs branch prediction as a level-1 prediction unit, the burst filter is located before the burst BTB. In contrast, the burst filter in FIG. 5 is located between the L1 BTB and the burst BTB, since there the burst BTB performs the prediction process as a level-2 prediction component after the L1 BTB. It will be appreciated that the specific implementation of the burst filter may refer to the above description and is not repeated here. The branch predictor structure shown in FIG. 7 can produce branch prediction results faster than the structure of FIG. 5.
With the branch predictor provided by some embodiments of the present disclosure, storing burst branch instructions in a specially configured burst branch target cache (burst BTB), distinct from the general branch target caches (such as the L1 BTB and L2 BTB) in which non-burst branch instructions reside, enables accurate branch prediction, thereby further improving the branch prediction performance of a computing device.
According to another aspect of the present disclosure, a branch prediction method is also provided. FIG. 8 illustrates a schematic flow diagram of a branch prediction method according to some embodiments of the present disclosure.
As shown in fig. 8, a branch prediction method according to some embodiments of the present disclosure may include steps S101 and S102. In step S101, a training branch instruction is obtained, and it is determined whether the training branch instruction satisfies a burst condition. Next, in step S102, information of the training branch instruction is stored in the burst branch target cache in response to the burst condition being satisfied, and in the general branch target cache in response to the burst condition not being satisfied, wherein the information stored in the burst branch target cache and the general branch target cache is used together for branch prediction. It is to be appreciated that steps S101-S102 may be, for example, steps performed by the training Fill Filter in a branch predictor described in accordance with an embodiment of the present disclosure. Specifically, during the training of the branch predictor, the Fill Filter can determine whether the training branch instruction meets the burst condition and store burst instructions meeting the burst condition in the specially configured burst branch target cache, for use in performing branch prediction together with the information stored in the conventional general branch target cache. With this implementation, the information of burst branch instructions can be used for prediction, the influence of burst branch instructions on conventional branch prediction performance can be avoided, and branch prediction accuracy is improved.
According to some embodiments of the present disclosure, the burst condition includes one or more of the following: the training branch instruction is an absolute jump instruction; the target jump address of the training branch instruction crosses a page; and the occurrence frequency of the training branch instruction is less than the first preset number of times.
According to some embodiments of the present disclosure, the general branch target cache includes two levels of cache, denoted as a first-level cache and a second-level cache, respectively.
According to some embodiments of the present disclosure, the branch prediction method may further include: for branch instructions stored in the burst branch target cache, storing information of branch instructions satisfying the migration condition into the second-level cache of the general branch target cache, and deleting that information from the burst branch target cache, wherein the burst branch target cache performs the branch instruction prediction process as a cache at the same level as the second-level cache.
According to some embodiments of the present disclosure, the branch prediction method may further include: obtaining a predicted branch instruction; searching the first-level cache of the general branch target cache using the instruction address of the predicted branch instruction to obtain a first prediction result, and sending the first prediction result to the second-level cache; searching the second-level cache of the general branch target cache using the instruction address of the predicted branch instruction and comparing against the first prediction result to obtain a second prediction result; searching the burst branch target cache using the instruction address of the predicted branch instruction to obtain a third prediction result; and determining a final branch prediction result based on the first prediction result, the second prediction result, and the third prediction result.
According to some embodiments of the present disclosure, the branch prediction method may further include: for branch instructions stored in the burst branch target cache, storing information of branch instructions satisfying the migration condition into the first-level cache of the general branch target cache, and deleting that information from the burst branch target cache, wherein the burst branch target cache performs the branch instruction prediction process as a cache at the same level as the first-level cache.
According to some embodiments of the present disclosure, the migration condition includes one or more of the following: the occurrence frequency of the branch instruction is greater than or equal to the second preset number of times; the branch instruction is not an absolute jump instruction; or the jump of the branch instruction does not cross an address page.
According to some embodiments of the present disclosure, the branch prediction method may further include: before searching the burst branch target cache using the instruction address of the predicted branch instruction, determining whether the instruction address hits the burst filter, wherein the burst filter contains index information of branch instructions in the burst branch target cache; and in the event that a hit in the burst filter is determined, searching the burst branch target cache using the instruction address of the predicted branch instruction.
Regarding the specific implementation of the steps involved in the branch prediction method according to embodiments of the present disclosure, reference may be made to the branch predictors according to some embodiments of the present disclosure described above in connection with figs. 2-7; the description will not be repeated here. Similar branch prediction processes can be performed, and similar technical effects achieved, using the branch prediction methods of the present disclosure.
According to yet another aspect of the present disclosure, an apparatus for performing branch prediction is also provided. FIG. 9 illustrates a schematic block diagram of an apparatus to perform branch prediction according to an embodiment of the present disclosure.
As shown in fig. 9, an apparatus 1000 that performs branch prediction may include a branch prediction unit 1010, which may be configured to: obtain a training branch instruction and determine whether the training branch instruction meets a burst condition; store information of the training branch instruction in the burst branch target cache in response to the burst condition being met; and store information of the training branch instruction in the general branch target cache in response to the burst condition not being met, wherein the information stored in the burst branch target cache and the general branch target cache is used together for branch prediction.
According to some embodiments of the present disclosure, the burst condition includes one or more of the following: the training branch instruction is an absolute jump instruction; the target jump address of the training branch instruction crosses a page boundary; and the occurrence frequency of the training branch instruction is less than a first preset number of times.
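For illustration only, the training-fill decision described above can be sketched in Python. This is a minimal behavioral model, not the claimed hardware: the page size, the value of the first preset count, and all names (`is_burst`, `fill`, the entry fields) are illustrative assumptions.

```python
# Behavioral sketch of routing a trained branch into the burst BTB or the
# general BTB. Constants and names are illustrative assumptions only.

PAGE_SIZE = 4096          # assumed page size for the cross-page check
FIRST_PRESET_COUNT = 4    # stand-in for the "first preset number of times"

def is_burst(instr_addr, target_addr, is_absolute_jump, occurrence_count):
    """True if a training branch meets any of the burst conditions above."""
    crosses_page = (instr_addr // PAGE_SIZE) != (target_addr // PAGE_SIZE)
    return (is_absolute_jump
            or crosses_page
            or occurrence_count < FIRST_PRESET_COUNT)

def fill(burst_btb, general_btb, instr_addr, info, burst):
    """Store the branch's information in the burst BTB or the general BTB."""
    (burst_btb if burst else general_btb)[instr_addr] = info
```

In this sketch a branch whose target lands on a different page (or that is an absolute jump, or that has occurred fewer times than the preset count) is filled into the burst BTB; all other branches go to the general BTB.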
According to some embodiments of the present disclosure, the general branch target cache area includes two levels of cache, denoted as a first-level cache and a second-level cache, respectively.
According to some embodiments of the present disclosure, the branch prediction unit 1010 may be further configured to: for the branch instructions stored in the burst branch target cache, store the information of any branch instruction that meets a migration condition in the second-level cache of the general branch target cache and delete that information from the burst branch target cache, wherein the burst branch target cache serves as a cache at the same level as the second-level cache in the branch instruction prediction process.
According to some embodiments of the present disclosure, the branch prediction unit 1010 may be further configured to: obtain a predicted branch instruction; look up the first-level cache in the general branch target cache area using the instruction address of the predicted branch instruction to obtain a first prediction result, and send the first prediction result to the second-level cache; look up the second-level cache in the general branch target cache area using the instruction address of the predicted branch instruction and compare with the first prediction result to obtain a second prediction result; look up the burst branch target cache area using the instruction address of the predicted branch instruction to obtain a third prediction result; and determine a final branch prediction result based on the first prediction result, the second prediction result, and the third prediction result.
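For illustration only, the three-way lookup flow above can be sketched as follows. Treating each cache as a Python dict from instruction address to predicted target, and resolving the final result by a simple first-level / second-level / burst-BTB priority, are both simplifying assumptions about the determiner, not a statement of the claimed circuit.

```python
# Behavioral sketch of the lookup flow: probe L1, L2, and the burst BTB with
# the same instruction address, then combine the three results.

def predict(l1, l2, burst_btb, instr_addr):
    first = l1.get(instr_addr)          # first prediction result (L1)
    second = l2.get(instr_addr)         # second result, compared with first
    if first is not None and second == first:
        second = None                   # L2 agrees with L1; nothing to add
    third = burst_btb.get(instr_addr)   # third result (burst BTB)
    # Final result: the first non-empty prediction in priority order
    # (an assumed policy for this sketch).
    for result in (first, second, third):
        if result is not None:
            return result
    return None                         # no hit anywhere: fall through
```

A burst branch that was never filled into the general BTB is still predicted here via the third lookup, which is the point of keeping the burst BTB separate.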
According to some embodiments of the present disclosure, the branch prediction unit 1010 may be further configured to: for the branch instructions stored in the burst branch target cache, store the information of any branch instruction that meets a migration condition in the first-level cache of the general branch target cache and delete that information from the burst branch target cache, wherein the burst branch target cache serves as a cache at the same level as the first-level cache in the branch instruction prediction process.
According to some embodiments of the present disclosure, the migration condition includes one or more of the following: the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times; the branch instruction is not an absolute jump instruction; or the jump of the branch instruction does not cross an address page.
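For illustration only, the migration of a branch that is no longer bursty out of the burst BTB can be sketched as below. The threshold value and the per-entry field names (`count`, `absolute`, `cross_page`) are illustrative assumptions; the sketch applies equally whether the destination is the first-level or the second-level cache of the general BTB.

```python
# Behavioral sketch of migration: entries meeting any migration condition are
# removed from the burst BTB and stored in a level of the general BTB.

SECOND_PRESET_COUNT = 4   # stand-in for the "second preset number of times"

def should_migrate(entry):
    """An entry migrates when any one migration condition holds."""
    return (entry["count"] >= SECOND_PRESET_COUNT
            or not entry["absolute"]
            or not entry["cross_page"])

def migrate(burst_btb, general_level):
    """Move qualifying entries from the burst BTB into a general-BTB level."""
    for addr in [a for a, e in burst_btb.items() if should_migrate(e)]:
        general_level[addr] = burst_btb.pop(addr)
```

Migrating frequently occurring branches back to the general BTB keeps the burst BTB small and reserved for genuinely bursty branches.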
According to some embodiments of the present disclosure, the branch prediction unit 1010 may be further configured to: before the burst branch target cache area is looked up with the instruction address of the predicted branch instruction, determine whether the instruction address hits a burst filter, where the burst filter contains index information for the branch instructions stored in the burst branch target cache area; and, in the event of a hit in the burst filter, look up the burst branch target cache using the instruction address of the predicted branch instruction.
For the specific implementation of the steps performed by the apparatus 1000 for performing branch prediction, reference may be made to the branch predictor or the branch prediction method according to some embodiments of the present disclosure described above in conjunction with the accompanying drawings; the description is not repeated here.
According to yet another aspect of the present disclosure, a computing device is also provided. FIG. 10 illustrates a schematic block diagram of a computing device according to some embodiments of the present disclosure.
As shown in FIG. 10, a computing device 2000 may include a processor 2010 and a memory 2020. According to an embodiment of the present disclosure, the memory 2020 stores computer-readable code which, when executed by the processor 2010, may perform the branch prediction method described above.
The processor 2010 may perform various actions and processes according to programs stored in the memory 2020. In particular, the processor 2010 may be an integrated circuit having signal processing capability; it may be a general-purpose processor such as a microprocessor, or any other conventional processor. For example, the processor here may refer to a CPU with a branch prediction function in which a branch predictor according to some embodiments of the present disclosure is configured: by storing burst branch instructions in a specially configured burst branch target cache, separate from the conventional branch target cache in which non-burst branch instructions reside, accurate branch prediction is enabled and the computing performance of the computing device is improved.
The memory 2020 stores computer-executable instruction code that, when executed by the processor 2010, implements the branch prediction method according to embodiments of the present disclosure. The memory 2020 may be volatile memory, non-volatile memory, or a combination of both; any suitable type of memory may be used. As an example, a processor such as a CPU can perform fast branch prediction by executing the computer-executable instruction code in the memory 2020.
According to the branch predictor, branch prediction method, branch prediction apparatus, and computing device provided by the embodiments of the present disclosure, information of training branch instructions that meet the burst condition is stored in the burst branch target cache area, information of training branch instructions that do not meet the burst condition is stored in the general branch target cache area, and the information stored in the two cache areas is used together for branch prediction. Branch instructions of a bursty nature are thus stored separately from conventional branch instructions: the information of the burst branch instructions can still be used for prediction, while their influence on conventional branch prediction performance is avoided, which improves branch prediction accuracy and processing performance. In addition, storing burst branch instructions separately from regular branch instructions avoids frequent replacement of entries in the conventional BTB, thereby reducing system power consumption.
Those skilled in the art will appreciate that various modifications and improvements can be made to the disclosure. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.
Further, although the present disclosure makes various references to certain units in systems according to embodiments of the present disclosure, any number of different units may be used and run on a client and/or a server. The units are merely illustrative, and different aspects of the systems and methods may use different units.
A flowchart is used in this disclosure to describe the steps of a method according to an embodiment of the present disclosure. It should be understood that the preceding or following steps are not necessarily performed in the exact order shown; the steps may be processed in reverse order or in parallel, and other operations may be added to these processes.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a computer program instructing related hardware, and the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present disclosure is not limited to any specific combination of hardware and software.
Unless defined otherwise, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present disclosure and is not to be construed as limiting thereof. Although a few exemplary embodiments of this disclosure have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims. It is to be understood that the foregoing is illustrative of the present disclosure and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The disclosure is defined by the claims and their equivalents.

Claims (28)

1. A branch predictor, the branch predictor comprising: a burst branch target cache, a general branch target cache, and a training fill filter, wherein the training fill filter is configured to:
obtaining a training branch instruction and determining whether the training branch instruction meets a burst condition;
storing information of the training branch instruction in the burst branch target cache in response to the burst condition being satisfied; and
storing information of the training branch instruction in the general branch target cache in response to the burst condition not being met, wherein the burst branch target cache and the information stored in the general branch target cache are used together for branch prediction.
2. The branch predictor as recited in claim 1, wherein the burst condition comprises one or more of:
the training branch instruction is an absolute jump instruction;
the target jump address of the training branch instruction spans pages; and
the frequency of occurrence of the training branch instruction is less than a first preset number of times.
3. The branch predictor as recited in claim 1, wherein the general branch target cache region comprises two levels of cache, denoted as a first level cache and a second level cache, respectively.
4. The branch predictor as set forth in claim 3, wherein, for the branch instructions stored in said burst branch target cache, information of a branch instruction satisfying a migration condition is stored in said second-level cache of said general branch target cache, and said information of the branch instruction satisfying the migration condition is deleted from said burst branch target cache,
and the burst branch target cache area serves as a cache at the same level as the second-level cache in the branch instruction prediction process.
5. The branch predictor as recited in claim 4, wherein the general branch target cache is configured to: obtain a predicted branch instruction; look up the first-level cache in the general branch target cache area using the instruction address of the predicted branch instruction to obtain a first prediction result, and send the first prediction result to the second-level cache; and look up the second-level cache in the general branch target cache area using the instruction address of the predicted branch instruction and compare with the first prediction result to obtain a second prediction result, wherein the burst branch target cache area is configured to: look up the burst branch target cache using the instruction address of the predicted branch instruction to obtain a third prediction result, and wherein the branch predictor further comprises a determiner configured to:
determine a final branch prediction result based on the first prediction result, the second prediction result, and the third prediction result.
6. The branch predictor as set forth in claim 3, wherein, for the branch instructions stored in said burst branch target cache, information of a branch instruction satisfying a migration condition is stored in said first-level cache of said general branch target cache, and said information of the branch instruction satisfying the migration condition is deleted from said burst branch target cache,
and the burst branch target cache area serves as a cache at the same level as the first-level cache in the branch instruction prediction process.
7. The branch predictor as claimed in claim 4 or 6, wherein the migration condition comprises one or more of:
the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times;
the branch instruction is not an absolute jump instruction; or
the jump of the branch instruction does not cross an address page.
8. The branch predictor as recited in claim 4, further comprising a burst filter, wherein the burst filter is located between the first-level cache and the burst branch target cache and is configured to:
in the event of a miss in the first-level cache, determine whether an instruction address of a predicted branch instruction hits the burst filter before the instruction address is used to look up the burst branch target cache region;
in the event a hit is determined, cause the burst branch target cache to be looked up using the instruction address of the predicted branch instruction.
9. The branch predictor as claimed in claim 8, wherein, in the event that the lookup hits in the burst branch target cache, the result of the burst branch target cache is utilized as the branch prediction result, and when the occurrence frequency of a branch instruction in the burst branch target cache is greater than or equal to a second preset number of times, the branch instruction is deleted from the burst branch target cache and migrated to the second-level cache in the general branch target cache; and
in the event that the lookup misses in the burst branch target cache area, the second-level cache in the general branch target cache area is looked up.
10. The branch predictor as recited in claim 6, further comprising a burst filter, wherein the burst filter is located before the burst branch target cache and is configured to:
determine whether an instruction address of a predicted branch instruction hits the burst filter before the instruction address is used to look up the burst branch target cache;
in the event a hit is determined, cause the burst branch target cache to be looked up using the instruction address of the predicted branch instruction.
11. The branch predictor as claimed in claim 10, wherein, in the event that the lookup hits in the burst branch target cache, the result of the burst branch target cache is utilized as the branch prediction result, and when the occurrence frequency of a branch instruction in the burst branch target cache is greater than or equal to a second preset number of times, the branch instruction is deleted from the burst branch target cache and migrated to the first-level cache in the general branch target cache; and
in the event that the lookup misses in the burst branch target cache area, the first-level cache in the general branch target cache area is looked up.
12. A method of branch prediction, the method comprising:
obtaining a training branch instruction and determining whether the training branch instruction meets a burst condition;
storing information of the training branch instruction in a burst branch target cache in response to the burst condition being satisfied; and
storing information of the training branch instruction in a general branch target cache in response to the burst condition not being met, wherein the burst branch target cache and the information stored in the general branch target cache are used together for branch prediction.
13. The method of claim 12, wherein the burst condition comprises one or more of:
the training branch instruction is an absolute jump instruction;
the target jump address of the training branch instruction spans pages; and
the frequency of occurrence of the training branch instruction is less than a first preset number of times.
14. The method of claim 12, wherein the general branch target cache region comprises two levels of cache, denoted as a first level cache and a second level cache, respectively.
15. The method of claim 14, wherein the method further comprises:
for the branch instructions stored in the burst branch target cache, storing information of a branch instruction meeting a migration condition in the second-level cache of the general branch target cache, and deleting the information of the branch instruction meeting the migration condition from the burst branch target cache,
and the burst branch target cache area serves as a cache at the same level as the second-level cache in the branch instruction prediction process.
16. The method of claim 15, wherein the method further comprises:
obtaining a predicted branch instruction;
searching the first-level cache in the general branch target cache area using the instruction address of the predicted branch instruction to obtain a first prediction result, and sending the first prediction result to the second-level cache;
searching the second-level cache in the general branch target cache area using the instruction address of the predicted branch instruction and comparing with the first prediction result to obtain a second prediction result;
searching the burst branch target cache area using the instruction address of the predicted branch instruction to obtain a third prediction result; and
determining a final branch prediction result based on the first prediction result, the second prediction result, and the third prediction result.
17. The method of claim 14, wherein the method further comprises:
for the branch instructions stored in the burst branch target cache, storing information of a branch instruction meeting a migration condition in the first-level cache of the general branch target cache, and deleting the information of the branch instruction meeting the migration condition from the burst branch target cache,
and the burst branch target cache area serves as a cache at the same level as the first-level cache in the branch instruction prediction process.
18. The method of claim 15 or 17, wherein the migration conditions include one or more of:
the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times;
the branch instruction is not an absolute jump instruction; or
the jump address of the branch instruction does not span an address page.
19. The method according to claim 15 or 17, characterized in that the method further comprises:
determining whether an instruction address of a predicted branch instruction hits a burst filter before the instruction address is used to look up the burst branch target cache, wherein the burst filter includes index information of the branch instructions in the burst branch target cache;
and searching the burst branch target cache area using the instruction address of the predicted branch instruction in the event that a hit in the burst filter is determined.
20. An apparatus for performing branch prediction, the apparatus comprising a branch prediction unit configured to:
obtaining a training branch instruction and determining whether the training branch instruction meets a burst condition;
storing information of the training branch instruction in a burst branch target cache in response to the burst condition being satisfied; and
storing information of the training branch instruction in a general branch target cache in response to the burst condition not being met, wherein the burst branch target cache and the information stored in the general branch target cache are used together for branch prediction.
21. The apparatus of claim 20, wherein the burst condition comprises one or more of:
the training branch instruction is an absolute jump instruction;
the target jump address of the training branch instruction spans pages; and
the frequency of occurrence of the training branch instruction is less than a first preset number of times.
22. The apparatus of claim 20, wherein the general branch target cache region comprises two levels of cache, denoted as a first level cache and a second level cache, respectively.
23. The apparatus of claim 22, wherein the branch prediction unit is further configured to:
for the branch instructions stored in the burst branch target cache, storing information of a branch instruction meeting a migration condition in the second-level cache of the general branch target cache, and deleting the information of the branch instruction meeting the migration condition from the burst branch target cache,
and the burst branch target cache area serves as a cache at the same level as the second-level cache in the branch instruction prediction process.
24. The apparatus of claim 23, wherein the branch prediction unit is further configured to:
obtaining a predicted branch instruction;
searching the first-level cache in the general branch target cache area using the instruction address of the predicted branch instruction to obtain a first prediction result, and sending the first prediction result to the second-level cache;
searching the second-level cache in the general branch target cache area using the instruction address of the predicted branch instruction and comparing with the first prediction result to obtain a second prediction result;
searching the burst branch target cache area using the instruction address of the predicted branch instruction to obtain a third prediction result; and
determining a final branch prediction result based on the first prediction result, the second prediction result, and the third prediction result.
25. The apparatus of claim 24, wherein the branch prediction unit is further configured to:
for the branch instructions stored in the burst branch target cache, storing information of a branch instruction meeting a migration condition in the first-level cache of the general branch target cache, and deleting the information of the branch instruction meeting the migration condition from the burst branch target cache,
and the burst branch target cache area serves as a cache at the same level as the first-level cache in the branch instruction prediction process.
26. The apparatus of claim 23 or 25, wherein the migration conditions include one or more of:
the occurrence frequency of the branch instruction is greater than or equal to a second preset number of times;
the branch instruction is not an absolute jump instruction; or
the jump of the branch instruction does not cross a page.
27. The apparatus of claim 23 or 25, wherein the branch prediction unit is further configured to:
determining whether an instruction address of a predicted branch instruction hits a burst filter before the instruction address is used to look up the burst branch target cache, wherein the burst filter includes index information of the branch instructions in the burst branch target cache;
and searching the burst branch target cache area using the instruction address of the predicted branch instruction in the event that a hit in the burst filter is determined.
28. A computing device, comprising:
a processor; and
a memory, wherein the memory has stored therein computer readable code which, when executed by the processor, performs the branch prediction method of any of claims 12-19.
CN202211149347.3A 2022-09-21 2022-09-21 Branch predictor, branch prediction method, branch prediction device and computing equipment Active CN115480826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211149347.3A CN115480826B (en) 2022-09-21 2022-09-21 Branch predictor, branch prediction method, branch prediction device and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211149347.3A CN115480826B (en) 2022-09-21 2022-09-21 Branch predictor, branch prediction method, branch prediction device and computing equipment

Publications (2)

Publication Number Publication Date
CN115480826A CN115480826A (en) 2022-12-16
CN115480826B true CN115480826B (en) 2024-03-12

Family

ID=84424367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211149347.3A Active CN115480826B (en) 2022-09-21 2022-09-21 Branch predictor, branch prediction method, branch prediction device and computing equipment

Country Status (1)

Country Link
CN (1) CN115480826B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117008979B (en) * 2023-10-07 2023-12-26 北京数渡信息科技有限公司 Branch predictor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968294A (en) * 2006-12-08 2013-03-13 高通股份有限公司 Apparatus and methods for low-complexity instruction prefetch system
CN104423929A (en) * 2013-08-21 2015-03-18 华为技术有限公司 Branch prediction method and related device
CN111095201A (en) * 2017-09-19 2020-05-01 国际商业机器公司 Predicting a table of contents pointer value in response to a branch to a subroutine
CN112543916A (en) * 2018-07-09 2021-03-23 超威半导体公司 Multi-table branch target buffer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10817298B2 (en) * 2016-10-27 2020-10-27 Arm Limited Shortcut path for a branch target buffer


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Branch Target Buffers Based on Jump Traces; Xiong Zhenya; Lin Zhenghao; Ren Haoqi; Computer Science (Issue 03); full text *

Also Published As

Publication number Publication date
CN115480826A (en) 2022-12-16

Similar Documents

Publication Publication Date Title
JP6725671B2 (en) Adaptive Value Range Profiling for Extended System Performance
US10133679B2 (en) Read cache management method and apparatus based on solid state drive
US7533230B2 (en) Transparent migration of files among various types of storage volumes based on file access properties
US7904660B2 (en) Page descriptors for prefetching and memory management
US11113245B2 (en) Policy-based, multi-scheme data reduction for computer memory
CN109522243B (en) Metadata cache management method and device in full flash storage and storage medium
CN110737399B (en) Method, apparatus and computer program product for managing a storage system
US9342455B2 (en) Cache prefetching based on non-sequential lagging cache affinity
EP3430519B1 (en) Priority-based access of compressed memory lines in memory in a processor-based system
CN109643237B (en) Branch target buffer compression
US11163573B2 (en) Hierarchical metadata predictor with periodic updates
CN115480826B (en) Branch predictor, branch prediction method, branch prediction device and computing equipment
US9552304B2 (en) Maintaining command order of address translation cache misses and subsequent hits
CN114817651B (en) Data storage method, data query method, device and equipment
WO2023035654A1 (en) Offset prefetching method, apparatus for executing offset prefetching, computer device, and medium
US20070162728A1 (en) Information processing apparatus, replacing method, and computer-readable recording medium on which a replacing program is recorded
CN116991855B (en) Hash table processing method, device, equipment, medium, controller and solid state disk
CN112612728B (en) Cache management method, device and equipment
KR20220079493A (en) Speculative execution using a page-level tracked load sequencing queue
CN109992535B (en) Storage control method, device and system
CN112711564A (en) Merging processing method and related equipment
CN108874753B (en) Method and device for searching response of subject post and computer equipment
CN107506156B (en) Io optimization method of block device
US10423538B2 (en) Bandwidth efficient techniques for enabling tagged memories
CN117472798B (en) Cache way prediction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant