CN117909254A - Branch prediction method, device and storage medium - Google Patents


Info

Publication number: CN117909254A (granted as CN117909254B)
Application number: CN202410317554.8A
Authority: CN (China)
Prior art keywords: branch, prediction, result, history, reading
Legal status: Active; granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh); other versions: CN117909254B (en)
Inventors: 洪志博, 蔡晔
Original and current assignee: Beijing Micro Core Technology Co ltd
Application filed by Beijing Micro Core Technology Co ltd; priority to CN202410317554.8A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0615Address space extension

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The application provides a branch prediction method, a device, and a storage medium, relating to the technical field of data processing. The method includes the following steps: acquiring a global branch history and an instruction address of a branch instruction; folding the global branch history and the instruction address respectively, and performing a hash operation on the folded results to obtain hash results, where the hash results serve as the lookup requests for reading the logical tables; inputting the hash results into a mapper to obtain lookup requests for the physical tables, where the mapper maps the lookup requests of the logical tables into lookup requests of the physical tables; and reading prediction information based on the physical-table lookup requests to obtain a branch prediction result. The method alleviates the load imbalance across the different tables of a TAGE branch predictor, improving the predictor's space utilization and prediction accuracy.

Description

Branch prediction method, device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for branch prediction, and a storage medium.
Background
A branch predictor is a digital circuit that predicts the direction a branch will take before the branch instruction finishes executing, in order to improve the performance of the processor's instruction pipeline.
When a traditional TAGE (TAgged GEometric history length) branch predictor runs different applications, the load on each table differs: short-history tables are often heavily loaded and short of space, while long-history tables are lightly loaded and under-utilized. When a table's load proportion differs greatly from its size proportion, space is wasted and prediction accuracy suffers.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
Therefore, a first object of the present application is to propose a branch prediction method to improve the space utilization and prediction accuracy of a TAGE branch predictor.
A second object of the present application is to provide a branch prediction apparatus.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
A fifth object of the application is to propose a computer program product.
To achieve the above object, an embodiment of a first aspect of the present application provides a branch prediction method, including:
acquiring a global branch history and an instruction address of a branch instruction;
folding the global branch history and the instruction address respectively, and performing a hash operation on the folded result to obtain a hash result, wherein the hash result serves as a lookup request for reading a logical table;
inputting the hash result into a mapper to obtain a lookup request of a physical table, wherein the mapper is used for mapping the lookup request of the logical table into the lookup request of the physical table;
and reading prediction information based on the lookup request of the physical table to obtain a branch prediction result.
To achieve the above object, an embodiment of a second aspect of the present application provides a branch prediction apparatus, including:
an acquisition module, used for acquiring a global branch history and an instruction address of a branch instruction;
a hash module, used for folding the global branch history and the instruction address respectively, and performing a hash operation on the folded result to obtain a hash result, wherein the hash result serves as a lookup request for reading a logical table;
a mapping module, used for inputting the hash result into a mapper to obtain a lookup request of a physical table, wherein the mapper is used for mapping the lookup request of the logical table into the lookup request of the physical table;
and a prediction module, used for reading prediction information based on the lookup request of the physical table and obtaining a branch prediction result.
To achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory to implement a branch prediction method according to the embodiment of the first aspect of the present application.
To achieve the above object, an embodiment of a fourth aspect of the present application provides a computer-readable storage medium having stored therein computer-executable instructions, which when executed by a processor, are configured to implement a branch prediction method according to the embodiment of the first aspect of the present application.
To achieve the above object, an embodiment of a fifth aspect of the present application provides a computer program product, including a computer program, which when executed by a processor implements a branch prediction method according to an embodiment of the first aspect of the present application.
The application provides a branch prediction method, a device, and a storage medium. A global branch history and an instruction address of a branch instruction are acquired; the global branch history and the instruction address are folded respectively, and a hash operation is performed on the folded results to obtain hash results, which serve as the lookup requests for reading the logical tables; the hash results are input into a mapper to obtain lookup requests for the physical tables, where the mapper maps the lookup requests of the logical tables into lookup requests of the physical tables; and prediction information is read based on the physical-table lookup requests to obtain a branch prediction result. The method alleviates the load imbalance across the different tables of a TAGE branch predictor, improving the predictor's space utilization and prediction accuracy.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a branch prediction method according to an embodiment of the present application;
FIG. 2 is a flow chart of another branch prediction method according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating a branch prediction method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a branch prediction apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The branch prediction method and apparatus according to the embodiments of the present application are described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a branch prediction method according to an embodiment of the present application.
When a traditional TAGE branch predictor runs different applications, the load on each table differs: short-history tables often carry a heavy load with insufficient space, while long-history tables carry a light load with low utilization. Under this load imbalance, new prediction entries may overwrite history entries; since the TAGE branch predictor makes predictions based on its history tables, overwritten entries lead to inaccurate predictions.
Two methods exist to address this problem:
1. Tune parameters according to the application's behavior, assigning different sizes to different tables.
2. Use the same Static Random-Access Memory (SRAM) for all tables, which requires the SRAM to have as many read ports and write ports as there are TAGE tables.
However, for method 1, no single tuning fits the behavior of all applications, so load imbalance persists in some scenarios; for method 2, the implementation requires an SRAM with a very large number of read and write ports, which typically occupies a large area.
In view of the above problems, an embodiment of the present application provides a branch prediction method to improve the space utilization and prediction accuracy of a TAGE branch predictor. As shown in fig. 1, the branch prediction method includes the following steps:
step 101, obtaining global branch history and instruction address of branch instruction.
It should be noted that the TAGE predictor is a set of prediction tables, each entry of which includes a three-bit prediction counter used to predict the direction of a branch instruction. In addition, each prediction table is associated with a global branch history of a different length.
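As an illustrative sketch only (the patent text specifies just the three-bit counter; the tag and "useful" fields shown here are typical TAGE additions, and all names are hypothetical), a prediction-table entry can be modeled as:

```python
from dataclasses import dataclass


@dataclass
class TageEntry:
    """Hypothetical TAGE prediction-table entry."""
    ctr: int = 4   # 3-bit saturating counter, range 0..7; values >= 4 predict "taken"
    tag: int = 0   # partial tag, compared against the tag hash on lookup
    u: int = 0     # "useful" bits, commonly used by TAGE replacement policies

    def predict_taken(self) -> bool:
        # The counter's top half predicts taken, the bottom half not-taken.
        return self.ctr >= 4

    def update(self, taken: bool) -> None:
        # Saturating increment/decrement within the 3-bit range.
        if taken:
            self.ctr = min(7, self.ctr + 1)
        else:
            self.ctr = max(0, self.ctr - 1)
```

The three-bit counter saturates at 0 and 7, so repeated outcomes in one direction cannot overflow the field.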
Note that the program counter (PC) is a component in the CPU used to store the address of the instruction currently to be executed. The instruction address (PC value) of a branch instruction is the address of the branch instruction to be executed.
Alternatively, the global branch history is obtained along with the instruction address of the branch instruction by reading the TAGE branch predictor.
Step 102, folding the global branch history and the instruction address respectively, and performing a hash operation on the folded results to obtain hash results, where the hash results serve as the lookup requests for reading the logical tables.
When the length of the global branch history or the instruction address is greater than the target bit length, the global branch history or the instruction address needs to be folded to obtain a folding result of the target bit length.
Optionally, a hash operation is performed on the folded global branch history and the folded instruction address to obtain hash results, which serve as the tag hashes and index hashes for reading the TAGE logical tables.
It should be noted that a logical table is the table position seen when interacting with the TAGE predictor from the outside, while a physical table is the SRAM in which the TAGE predictor actually stores prediction information. In a conventional TAGE predictor, logical tables and physical tables correspond one-to-one.
In the embodiment of the application, the lookup requests of the logical tables are computed by hashing; each request is embodied as a tag hash and an index hash of a logical table.
Step 103, inputting the hash result into a mapper to obtain a lookup request of the physical table, wherein the mapper is used for mapping the lookup request of the logical table to the lookup request of the physical table.
Optionally, the hash result is input into the mapper, which performs dynamic mapping according to the hash output of the hash computation, mapping the lookup request of the logical table into a lookup request of a physical table.
In the embodiment of the application, the mapper maps the logical tables onto the physical tables uniformly, so that each logical table has an equal opportunity of being mapped to each physical table, avoiding load imbalance.
Step 104, reading the prediction information based on the lookup request of the physical table, and obtaining the branch prediction result.
Optionally, based on the lookup request of the physical table, the prediction information of the physical table is read; a demapper then maps the prediction information of the physical table back to the logical tables, arbitration and prediction are performed on the demapped prediction information, and the branch prediction result is obtained.
In this embodiment, the global branch history and the instruction address of a branch instruction are acquired; the global branch history and the instruction address are folded respectively, and a hash operation is performed on the folded results to obtain hash results, which serve as the lookup requests for reading the logical tables; the hash results are input into a mapper to obtain lookup requests for the physical tables, where the mapper maps the lookup requests of the logical tables into lookup requests of the physical tables; and prediction information is read based on the physical-table lookup requests to obtain a branch prediction result. The method alleviates the load imbalance across the different tables of a TAGE branch predictor, improving the predictor's space utilization and prediction accuracy.
In this embodiment, another branch prediction method is provided, and fig. 2 is a schematic flow chart of another branch prediction method provided in the embodiment of the present application.
As shown in fig. 2, the branch prediction method may include the steps of:
Step 201, a global branch history is obtained along with the instruction address of the branch instruction.
Step 201 may refer to the description of the corresponding steps in the foregoing embodiment, which is not repeated in this embodiment.
Step 202, folding the global branch history and the instruction address respectively, and performing a hash operation on the folded results to obtain hash results, where the hash results serve as the lookup requests for reading the logical tables.
In the embodiment of the present application, the number of prediction tables in the TAGE branch predictor is m, where m is a power of 2. The target bit length is determined based on the number of prediction tables; for example, the target bit length is determined as w = log2(m). The global branch history and the instruction address are then segmented and folded every w bits, respectively.
Wherein the global branch history is folded by:
Reading a plurality of history data of different lengths from the global branch history; dividing the history data of each length into a plurality of segments according to the target bit length w; and XORing the segments corresponding to the history data of each length to obtain the folded histories of the global branch history.
Alternatively, a plurality of history data of different lengths may be read by:
Reading the leading portion of the global branch history of a first preset proportion as first history data; reading the leading portion of the global branch history of a second preset proportion as second history data, where the second preset proportion is larger than the first preset proportion; and reading the global branch history according to different preset proportions in this way to obtain a plurality of history data of different lengths, until the whole global branch history is read as the final history data.
For example, first the leading 5% of the global branch history is read as the first history data, then the leading 10% is read as the second history data, and so on, until all of the global branch history is read as the final history data.
For history data of any length, it is segmented every w bits into a plurality of segments, and the segments are XORed together to obtain a folded history of width w; that is, the global branch history yields a plurality of folded histories of width w.
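The segment-and-XOR folding described above can be sketched as follows. This is an illustrative model, not the patent's hardware implementation; the bit string is treated as a Python integer, and the function name is hypothetical.

```python
def fold(value: int, nbits: int, w: int) -> int:
    """Fold an nbits-wide value into w bits by XORing its w-bit segments,
    starting from the least-significant end."""
    mask = (1 << w) - 1
    folded = 0
    for i in range(0, nbits, w):
        folded ^= (value >> i) & mask
    return folded
```

For example, with w = 3, the 6-bit history 0b101011 splits into segments 0b101 and 0b011, which XOR to 0b110. In hardware this reduces to a tree of XOR gates, one per output bit.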
The instruction address is folded as follows:
dividing the instruction address into a plurality of segments according to the target bit length; and XORing the segments to obtain the folded address after folding the instruction address.
Similarly to the history, the instruction address is segmented every w bits into a plurality of segments, and the segments are XORed to obtain the folded address (the folded PC).
Further, the folded history corresponding to the shortest (first) history data is denoted h0. The folded PC and h0 are taken as inputs to a mapping hash function, whose XOR output is the mapping hash of width w. The folded history corresponding to the second history data is denoted h1, and so on, with the folded history corresponding to the final history data denoted hn. h1 through hn are each XORed with the folded PC to compute the plurality of tag hashes and index hashes used to read the TAGE logical tables.
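A minimal sketch of this hash step, under the same assumptions as above (values as Python integers, all function and variable names illustrative rather than from the patent claims): h0 XORed with the folded PC gives the mapping hash, and h1..hn XORed with the folded PC give the per-table lookup hashes.

```python
def mapping_hash(folded_pc: int, h0: int, w: int) -> int:
    """Mapping hash of width w from the folded PC and the shortest folded history."""
    return (folded_pc ^ h0) & ((1 << w) - 1)


def table_hashes(folded_pc: int, folded_histories: list[int], w: int) -> list[int]:
    """Per-logical-table hashes: each folded history h1..hn XORed with the folded PC."""
    mask = (1 << w) - 1
    return [(folded_pc ^ h) & mask for h in folded_histories]
```

Because each hash mixes a different history length with the same PC, different logical tables receive different, history-dependent lookup requests.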
Step 203, inputting the hash result into a mapper to obtain a lookup request of the physical table, where the mapper is configured to map the lookup request of the logical table to the lookup request of the physical table.
Optionally, the mapper maps the lookup requests of the logical tables into lookup requests of the physical tables. The hash result has width w and can therefore represent 2^w values, corresponding to m mapping relations. In each of the m mapping relations, logical tables and physical tables are in one-to-one correspondence; one-to-many or many-to-one relations cannot occur.
As shown in fig. 3, a mapper (Mapper) is added to the conventional TAGE branch predictor. The mapping hash is computed from the first folded history and the folded PC; the second, third, and fourth folded histories are each XORed with the folded PC, and the results are input to the mapper together with the mapping hash. The mapper maps the lookup requests of the logical tables into lookup requests of different physical tables, i.e., it replaces the fixed one-to-one correspondence between logical tables and physical tables.
To equalize the mapping, each logical table must have an equal opportunity of being mapped to each physical table. In the embodiment of the application, a cyclic shifter or a multiplexer can be used to shift or select according to the hash result.
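A rotation-based mapper of the kind suggested by the cyclic shifter can be sketched as below. This is a hypothetical model, assuming m physical tables and a mapping hash of w = log2(m) bits (consistent with the 2^w values / m mapping relations described above); the function name is illustrative.

```python
def map_logical_to_physical(logical_idx: int, mapping_hash: int, m: int) -> int:
    """Rotate logical table index i to physical table (i + hash) mod m.

    For any fixed hash value this is a permutation of the m tables, so the
    logical-to-physical mapping is always one-to-one, and over all hash
    values each logical table reaches each physical table equally often."""
    return (logical_idx + mapping_hash) % m
```

The key property is that a rotation is a bijection for every hash value: no two logical tables can collide on one physical table, which is exactly the "no one-to-many or many-to-one" requirement stated above.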
Step 204, based on the lookup request of the physical table, the prediction information is read in the physical table; inputting the prediction information into a demapper to obtain a demapping result; and carrying out arbitration and prediction based on the demapping result to obtain a branch prediction result.
Wherein the demapper is used for mapping the prediction information of the physical table back to the logical table.
Alternatively, the demapper may also use a cyclic shifter or a multiplexer to equalize the opportunities for each logical table to be mapped to each physical table.
As shown in fig. 3, after the mapper maps the lookup requests of the logical tables into lookup requests of the physical tables, the physical-table lookup requests are used to read the prediction information from the physical tables. The prediction information is input to the demapper (Demapper), which maps it back to the logical tables and outputs the demapped prediction information, i.e., the demapping result. Arbitration and prediction are then performed on the demapped prediction information to obtain the branch prediction result (prediction).
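Under the same hypothetical rotation-based mapper as sketched earlier (names and the rotation form are illustrative assumptions, not the patent's circuit), the demapper is simply the inverse rotation, so prediction information read from a physical table returns to the logical table that issued the request:

```python
def map_logical_to_physical(i: int, h: int, m: int) -> int:
    """Mapper: logical table i -> physical table (i + h) mod m."""
    return (i + h) % m


def demap_physical_to_logical(j: int, h: int, m: int) -> int:
    """Demapper: physical table j -> logical table (j - h) mod m,
    the exact inverse of the mapper for the same mapping hash h."""
    return (j - h) % m
```

Because the demapper uses the same mapping hash as the mapper, the round trip logical -> physical -> logical is the identity for every table and every hash value.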
In the embodiment of the application, the mapper and the demapper are structurally simple; adding these mapping components to a conventional TAGE branch predictor introduces little extra delay, and the added area is small enough not to burden the hardware.
After hash mapping, the load is spread evenly across the physical tables, so each TAGE SRAM is fully utilized, improving the space utilization and prediction accuracy of the TAGE branch predictor.
Further, in simulator tests on the data set, the average MPKI (mispredictions per thousand instructions) is reduced by 2.6%.
In this embodiment, the global branch history and the instruction address of a branch instruction are acquired; the global branch history and the instruction address are folded respectively, and a hash operation is performed on the folded results to obtain hash results, which serve as the lookup requests for reading the logical tables; the hash results are input into a mapper to obtain lookup requests for the physical tables, where the mapper maps the lookup requests of the logical tables into lookup requests of the physical tables; based on the physical-table lookup requests, the prediction information is read from the physical tables; the prediction information is input into a demapper to obtain a demapping result; and arbitration and prediction are performed based on the demapping result to obtain a branch prediction result. The method alleviates the load imbalance across the different tables of a TAGE branch predictor, improving the predictor's space utilization and prediction accuracy.
In order to implement the above embodiment, the present application further provides a branch prediction apparatus.
FIG. 4 is a schematic diagram of a branch prediction apparatus according to an embodiment of the present application.
As shown in fig. 4, the branch prediction apparatus 400 includes: an acquisition module 401, a hash module 402, a mapping module 403 and a prediction module 404.
an acquisition module 401, configured to acquire a global branch history and an instruction address of a branch instruction;
a hash module 402, configured to fold the global branch history and the instruction address respectively, and perform a hash operation on the folded result to obtain a hash result, where the hash result serves as a lookup request for reading a logical table;
a mapping module 403, configured to input the hash result into the mapper to obtain a lookup request of a physical table, where the mapper is configured to map the lookup request of the logical table into the lookup request of the physical table;
and a prediction module 404, configured to read the prediction information based on the lookup request of the physical table and obtain the branch prediction result.
Further, in one possible implementation of the embodiment of the present application, the prediction module 404 further includes:
a reading unit, configured to read a lookup result from the physical table based on the lookup request of the physical table;
a demapping unit, configured to input the lookup result into the demapper to obtain a demapping result, where the demapper is configured to map the lookup result of the physical table back to the logical table;
and a prediction unit, configured to perform arbitration and prediction based on the demapping result to obtain the branch prediction result.
Further, in one possible implementation of the embodiment of the present application, the mapper and demapper use a cyclic shifter or a multiplexer.
Further, in one possible implementation of the embodiment of the present application, the hash module 402 is specifically configured to:
reading a plurality of history data with different lengths in the global branch history;
dividing history data of any length into a plurality of fragments according to the target bit length;
And performing exclusive OR on a plurality of fragments corresponding to the history data with any length to obtain folding histories for folding the global branch histories.
Reading history data of a first preset proportion of the global branch history as first history data;
reading history data of a first second preset proportion of the global branch history as second history data, wherein the second preset proportion is larger than the first preset proportion;
and reading the global branch histories according to different preset proportions, and obtaining a plurality of history data with different lengths until the global branch histories are read as final history data.
Dividing an instruction address into a plurality of fragments according to a target bit length;
and carrying out exclusive OR on each segment to obtain a folded address after folding the instruction address.
The target bit length is determined based on the number of tag prediction tables in the branch predictor.
It should be noted that the foregoing explanation of the embodiment of the branch prediction method is also applicable to the branch prediction apparatus of this embodiment, and will not be repeated here.
In the embodiment of the application, the global branch history and the instruction address of a branch instruction are acquired; the global branch history and the instruction address are folded respectively, and a hash operation is performed on the folded results to obtain hash results, which serve as the lookup requests for reading the logical tables; the hash results are input into a mapper to obtain lookup requests for the physical tables, where the mapper maps the lookup requests of the logical tables into lookup requests of the physical tables; and prediction information is read based on the physical-table lookup requests to obtain a branch prediction result. The method alleviates the load imbalance across the different tables of a TAGE branch predictor, improving the predictor's space utilization and prediction accuracy.
In order to achieve the above embodiment, the present application further provides an electronic device, including: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored by the memory to implement the branch prediction method provided by the foregoing embodiments.
In order to implement the above embodiment, the present application also proposes a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, are configured to implement the branch prediction method provided in the above embodiment.
In order to implement the above embodiments, the present application also proposes a computer program product comprising a computer program which, when executed by a processor, implements the branch prediction method provided by the above embodiments.
The collection, storage, use, processing, transmission, provision, and disclosure of users' personal information in this application comply with the relevant laws and regulations and do not violate public order and good customs.
It should be noted that personal information from users should be collected for legitimate and reasonable uses and not shared or sold outside of these legitimate uses. In addition, such collection/sharing should be performed after receiving user informed consent, including but not limited to informing the user to read user agreements/user notifications and signing agreements/authorizations including authorization-related user information before the user uses the functionality. In addition, any necessary steps are taken to safeguard and ensure access to such personal information data and to ensure that other persons having access to the personal information data adhere to their privacy policies and procedures.
The present application contemplates embodiments that may provide a user with selective prevention of use or access to personal information data. That is, the present disclosure contemplates that hardware and/or software may be provided to prevent or block access to such personal information data. Once personal information data is no longer needed, risk can be minimized by limiting data collection and deleting data. In addition, personal identification is removed from such personal information, as applicable, to protect the privacy of the user.
In the foregoing description of embodiments, reference has been made to the terms "one embodiment," "some embodiments," "example," "a particular example," or "some examples," etc., meaning that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one, or a combination, of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of, or a combination of, the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the application.

Claims (10)

1. A method of branch prediction, comprising the steps of:
acquiring a global branch history and an instruction address of a branch instruction;
folding the global branch history and the instruction address respectively, and performing a hash operation on the folding results to obtain a hash result, wherein the hash result is used as a table look-up request for a logical table;
inputting the hash result into a mapper to obtain a table look-up request for a physical table, wherein the mapper is configured to map the table look-up request for the logical table into the table look-up request for the physical table; and
reading prediction information based on the table look-up request for the physical table to obtain a branch prediction result.
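As an illustrative sketch only, not the claimed implementation, the four steps of claim 1 can be outlined in Python. Every width, table count, history length, and the mapping policy below are assumptions made for the example:

```python
# Illustrative sketch of the claimed flow: fold -> hash -> map -> read.
# All widths, table sizes, and the mapping policy are assumptions for
# the example, not details taken from the patent.

TARGET_BITS = 10          # assumed index width per prediction table
NUM_TABLES = 4            # assumed number of logical tables

def fold(bits: int, length: int, target: int) -> int:
    """XOR-fold the low `length` bits of `bits` into `target` bits."""
    value = bits & ((1 << length) - 1)
    folded = 0
    while value:
        folded ^= value & ((1 << target) - 1)
        value >>= target
    return folded

def logical_requests(history: int, pc: int) -> list[int]:
    """One look-up index (hash result) per logical table."""
    lengths = [8, 16, 32, 64]  # assumed geometric history lengths
    folded_pc = fold(pc, 32, TARGET_BITS)
    return [fold(history, n, TARGET_BITS) ^ folded_pc for n in lengths]

def map_to_physical(logical: list[int], rotation: int) -> list[int]:
    """Toy mapper: cyclically rotate which physical table serves each
    logical request (one possible reading of the claimed mapper)."""
    return [logical[(i + rotation) % NUM_TABLES] for i in range(NUM_TABLES)]

reqs = logical_requests(history=0b1011_0110_1100_1010, pc=0x4000_1234)
phys = map_to_physical(reqs, rotation=1)
print(len(phys))  # → 4, one request per physical table
```

In this sketch each physical-table index stays within the assumed table size, so the mapped requests can be used directly to read prediction entries.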
2. The branch prediction method of claim 1, wherein reading the prediction information based on the table look-up request for the physical table to obtain the branch prediction result comprises:
reading the prediction information from the physical table based on the table look-up request for the physical table;
inputting the prediction information into a demapper to obtain a demapping result, wherein the demapper is configured to map the prediction information of the physical table back to the logical table; and
performing arbitration and prediction based on the demapping result to obtain the branch prediction result.
3. The branch prediction method of claim 2, wherein the mapper and the demapper are each implemented using a cyclic shifter or a multiplexer.
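A cyclic shifter can be viewed as a rotation over table slots. The following sketch, in which the slot names and rotation amount are assumptions, shows how a rotation-based mapper and its inverse demapper pair up:

```python
# Sketch of a cyclic-shift mapper/demapper pair, as one way to realize
# the cyclic shifter mentioned in claim 3. Slot layout and rotation
# amount are assumptions for illustration.

def rotate(items: list, amount: int) -> list:
    """Cyclic shift: the element in slot (i + amount) mod n moves to slot i."""
    n = len(items)
    return [items[(i + amount) % n] for i in range(n)]

def demap(items: list, amount: int) -> list:
    """Inverse rotation, mapping physical slots back to logical slots."""
    return rotate(items, -amount)

logical_slots = ["T0", "T1", "T2", "T3"]
physical_slots = rotate(logical_slots, 1)   # → ["T1", "T2", "T3", "T0"]
assert demap(physical_slots, 1) == logical_slots
```

Because rotation is a bijection, demapping with the same rotation amount always restores the original logical assignment, which is the property a demapper needs.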
4. The branch prediction method of claim 1, wherein folding the global branch history and the instruction address respectively comprises:
reading a plurality of pieces of history data of different lengths from the global branch history;
dividing the history data of each length into a plurality of segments according to a target bit length; and
performing an exclusive OR over the plurality of segments corresponding to the history data of each length to obtain a folded history resulting from folding the global branch history.
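The segment-and-XOR fold of claim 4 can be sketched as follows; the 6-bit target length, the zero-padding of the final segment, and the sample history are assumptions for illustration:

```python
# XOR-fold one history string into a target bit length, in the manner
# of claim 4. The 6-bit target length and the sample history are made up.

def fold_history(history_bits: str, target: int) -> int:
    """Split the bit string into target-length segments (zero-padding
    the final segment) and XOR all segments together."""
    folded = 0
    for i in range(0, len(history_bits), target):
        segment = history_bits[i:i + target].ljust(target, "0")
        folded ^= int(segment, 2)
    return folded

h = "110100101101"               # 12-bit history sample
print(bin(fold_history(h, 6)))   # → 0b11001 (110100 XOR 101101)
```

The fold compresses an arbitrarily long history into a fixed index width while still letting every history bit influence the result.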
5. The branch prediction method of claim 4, wherein reading the plurality of pieces of history data of different lengths from the global branch history comprises:
reading history data amounting to a first preset proportion of the global branch history as first history data;
reading history data amounting to a second preset proportion of the global branch history as second history data, wherein the second preset proportion is larger than the first preset proportion; and
continuing to read the global branch history at successively larger preset proportions to obtain the plurality of pieces of history data of different lengths, until the entire global branch history is read as final history data.
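The progression of preset proportions in claim 5 can be illustrated with an assumed doubling ratio; the 64-bit total history length is likewise an assumption:

```python
# Sketch of claim 5: take successively larger prefixes of the global
# history until the full history is consumed. The doubling ratio and
# the 64-bit total history length are assumptions.

def history_lengths(total_bits: int, first: int, ratio: int = 2) -> list[int]:
    """Preset proportions as absolute lengths: first, first*ratio, ...,
    with the last entry always reading the entire history."""
    lengths = []
    n = first
    while n < total_bits:
        lengths.append(n)
        n *= ratio
    lengths.append(total_bits)   # final history data: the whole history
    return lengths

print(history_lengths(64, 8))  # → [8, 16, 32, 64]
```

A geometric progression of this kind lets short lengths capture recent correlations while the final entry covers the full history.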
6. The branch prediction method of claim 1, wherein folding the global branch history and the instruction address respectively comprises:
dividing the instruction address into a plurality of segments according to a target bit length; and
performing an exclusive OR over the segments to obtain a folded address resulting from folding the instruction address.
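The address fold of claim 6 uses the same segment-and-XOR scheme. The sketch below also shows one possible hash step consistent with claim 1, in which the folded address and a folded history are combined by XOR; the combination rule, widths, and sample values are all assumptions:

```python
# Fold an instruction address into target-length segments and XOR them
# (in the manner of claim 6), then combine with a folded history to form
# a table look-up index as one possible hash (the combination rule is
# an assumption, not taken from the patent).

def fold_bits(value: int, width: int, target: int) -> int:
    """XOR-fold the low `width` bits of `value` into `target` bits."""
    value &= (1 << width) - 1
    folded = 0
    while value:
        folded ^= value & ((1 << target) - 1)
        value >>= target
    return folded

TARGET = 8                               # assumed target bit length
pc = 0x0040_12A4                         # example instruction address
folded_addr = fold_bits(pc, 32, TARGET)
folded_hist = fold_bits(0xBEEF, 16, TARGET)
index = folded_addr ^ folded_hist        # assumed hash combination
assert 0 <= index < (1 << TARGET)
```

Folding both inputs to the same target bit length is what makes the single XOR combination well-defined as an index into a table of 2^TARGET entries.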
7. The branch prediction method of claim 4 or 6, wherein the target bit length is determined based on a number of target prediction tables in the branch predictor.
8. A branch prediction apparatus, comprising:
an acquisition module configured to acquire a global branch history and an instruction address of a branch instruction;
a hash module configured to fold the global branch history and the instruction address respectively, and to perform a hash operation on the folding results to obtain a hash result, wherein the hash result is used as a table look-up request for a logical table;
a mapping module configured to input the hash result into a mapper to obtain a table look-up request for a physical table, wherein the mapper is configured to map the table look-up request for the logical table into the table look-up request for the physical table; and
a prediction module configured to read prediction information based on the table look-up request for the physical table to obtain a branch prediction result.
9. The branch prediction apparatus of claim 8, wherein the prediction module further comprises:
a reading unit configured to read the prediction information from the physical table based on the table look-up request for the physical table;
a demapping unit configured to input the prediction information into a demapper to obtain a demapping result, wherein the demapper is configured to map the prediction information of the physical table back to the logical table; and
a prediction unit configured to perform arbitration and prediction based on the demapping result to obtain the branch prediction result.
10. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
CN202410317554.8A 2024-03-20 2024-03-20 Branch prediction method, device and storage medium Active CN117909254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410317554.8A CN117909254B (en) 2024-03-20 2024-03-20 Branch prediction method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410317554.8A CN117909254B (en) 2024-03-20 2024-03-20 Branch prediction method, device and storage medium

Publications (2)

Publication Number Publication Date
CN117909254A true CN117909254A (en) 2024-04-19
CN117909254B CN117909254B (en) 2024-05-31

Family

ID=90686355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410317554.8A Active CN117909254B (en) 2024-03-20 2024-03-20 Branch prediction method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117909254B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213524A (en) * 2017-06-29 2019-01-15 英特尔公司 Fallout predictor for difficult predicted branches
US20190065196A1 (en) * 2017-08-24 2019-02-28 Qualcomm Incorporated Reduced logic level operation folding of context history in a history register in a prediction system for a processor-based system
CN112328306A (en) * 2020-11-06 2021-02-05 海光信息技术股份有限公司 Branch predictor isolation method, prediction method and branch predictor
US20210232400A1 (en) * 2020-01-29 2021-07-29 Arm Limited Branch predictor
US20230078582A1 (en) * 2021-09-16 2023-03-16 International Business Machines Corporation Neuron cache-based hardware branch prediction
US20230315475A1 (en) * 2022-03-30 2023-10-05 Advanced Micro Devices, Inc. Managing large tage histories
CN117130665A (en) * 2023-08-16 2023-11-28 中国科学院计算技术研究所 Method and system for predicting execution result of processor branch instruction


Also Published As

Publication number Publication date
CN117909254B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
CN111832050B (en) Paillier encryption scheme based on FPGA chip implementation for federal learning
CN109450956B (en) Network security evaluation method, system, medium, and computer system
US20190079976A1 (en) Optimized access for hierarchical low cardinality value synopsis in analytical databases
CN106570025B (en) Data filtering method and device
CN108108190B (en) Calculation method and related product
CN111064471B (en) Data processing method and device and electronic equipment
CN113360911A (en) Malicious code homologous analysis method and device, computer equipment and storage medium
CN112068958A (en) Bloom filter and data processing method
CN113867685A (en) Multiplier conversion method, device and equipment and readable storage medium
CN117909254B (en) Branch prediction method, device and storage medium
US11216275B1 (en) Converting floating point data into integer data using a dynamically adjusted scale factor
CN110780820A (en) Method and device for determining continuous storage space, electronic equipment and storage medium
CN114708138B (en) Network disk image watermark adding method and device, network disk and storage medium
CN107832021B (en) Electronic evidence fixing method, terminal equipment and storage medium
CN114816252A (en) Reading method and device of vehicle-mounted storage equipment, vehicle and storage medium
CN110741552B (en) Data processing device for checking limits and method for operating the same
CN117392038B (en) Medical image histogram equalization method and device, electronic equipment and storage medium
CN114629895B (en) File fragment breakpoint continuous transmission method, device, terminal equipment and medium
CN114895971B (en) Data loading method, device, terminal equipment and medium
CN116226854B (en) Malware detection method, system, readable storage medium and computer
CN116860180B (en) Distributed storage method and device, electronic equipment and storage medium
CN116756472B (en) Convolution operator computing device and method
CN116719483B (en) Data deduplication method, apparatus, storage device and computer readable storage medium
CN116596043B (en) Convolutional neural network calculation method, system, electronic equipment and storage medium
CN113068043B (en) PNG image compression method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant