CN117093271A - Branch instruction prefetching method and device - Google Patents

Branch instruction prefetching method and device

Info

Publication number
CN117093271A
CN117093271A (application CN202311141969.6A)
Authority
CN
China
Prior art keywords
branch instruction
instruction
lookup table
destination address
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311141969.6A
Other languages
Chinese (zh)
Inventor
王圣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yaoxin Electronic Technology Co ltd
Original Assignee
Shanghai Yaoxin Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yaoxin Electronic Technology Co ltd filed Critical Shanghai Yaoxin Electronic Technology Co ltd
Priority to CN202311141969.6A
Publication of CN117093271A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3804Instruction prefetching for branches, e.g. hedging, branch folding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The application discloses a branch instruction prefetching method and device. In the method, when a lower-level Cache sends instruction line data containing a target instruction to the first-level Cache, the instruction line data is checked for branch instructions; if a branch instruction is detected, a pre-constructed branch instruction lookup table is queried for that instruction; if the detected branch instruction exists in the lookup table, the destination address recorded for it in the table is determined; and a prefetch request for the instruction line containing that destination address is sent to the lower-level Cache. By prefetching branch destinations into the Cache, the application avoids access misses on branch instructions and thereby helps improve processor performance.

Description

Branch instruction prefetching method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for prefetching a branch instruction.
Background
To execute a program, the processor must first load it into memory and then fetch it from memory for execution. However, because the running speed of the processor far exceeds that of the memory, the processor wastes much of its performance waiting on memory reads. To solve this problem, a small but fast storage device, a Cache (high-speed buffer memory), is added between the processor and the memory, saving the processor time when reading instructions and data.
A traditional fetch path generally uses a multi-level Cache. The fetch unit in the processor first accesses the first-level Cache; if a valid instruction is found there, it is output to the fetch unit immediately. If the first-level Cache misses, the next-level Cache is searched, level by level, until a valid instruction is found; if no level of Cache holds a valid copy, the memory is finally accessed.
A branch instruction is an instruction that can change the program flow: after it is executed, the processor continues execution at another memory address. In general, when the instruction pipeline has to execute a branch instruction, the fetch unit is forced to stall and wait, which hurts the execution performance of the processor.
Disclosure of Invention
To avoid branch instruction access misses as far as possible and to improve the access performance of the processor, the application provides a branch instruction prefetching method and device.
Specifically, the technical scheme of the application is as follows:
in one aspect, the present application provides a method for prefetching a branch instruction, including:
when a lower-level Cache sends instruction line data containing a target instruction to a first-level Cache, detecting whether a branch instruction exists in the instruction line data containing the target instruction;
inquiring whether the detected branch instruction exists in a pre-constructed branch instruction lookup table when the branch instruction exists in the instruction line data containing the target instruction;
if the detected branch instruction exists in the branch instruction lookup table, determining a destination address corresponding to the detected branch instruction in the branch instruction lookup table according to the detected branch instruction;
sending a prefetch request for acquiring a destination address corresponding to the detected branch instruction to the lower Cache; when the detected branch instruction needs to be executed, the instruction line data containing the destination address corresponding to the detected branch instruction is directly obtained from the first-stage Cache.
In some embodiments, further comprising: after the instruction pipeline executes any branch instruction, updating the branch instruction lookup table according to the executed branch instruction and the destination address thereof.
In some embodiments, the updating the branch instruction lookup table according to the executed branch instruction and the destination address thereof includes:
querying the branch instruction lookup table for the presence of the same instruction as the executed branch instruction;
if an instruction identical to the executed branch instruction exists, the destination address recorded for that instruction in the branch instruction lookup table is overwritten with the destination address corresponding to the executed branch instruction;
and if the instruction which is the same as the executed branch instruction does not exist, storing the executed branch instruction and the corresponding destination address thereof into the branch instruction lookup table.
In some embodiments, if the instruction identical to the executed branch instruction does not exist, storing the executed branch instruction and the corresponding destination address thereof in the branch instruction lookup table, including:
if the instruction which is the same as the executed branch instruction does not exist, judging whether a vacancy exists in the branch instruction lookup table;
when a vacancy exists in the branch instruction lookup table, adding the executed branch instruction and a destination address thereof into the branch instruction lookup table;
when the branch instruction lookup table does not have a vacancy, the executed branch instruction and the destination address thereof are used for replacing the instruction with the earliest update time and the corresponding destination address thereof in the branch instruction lookup table respectively.
In some embodiments, the branch instruction lookup table further includes instruction history data of each branch instruction, where the instruction history data includes all destination addresses of jumps after each branch instruction is executed and a number of jumps corresponding to each destination address; if the detected branch instruction exists in the branch instruction lookup table, determining, in the branch instruction lookup table, a destination address corresponding to the detected branch instruction according to the detected branch instruction, including:
if the detected branch instruction exists in the branch instruction lookup table, selecting the destination address with the largest number of jumps from the instruction history data corresponding to the detected branch instruction as the destination address of the jump after the next execution of the detected branch instruction.
In another aspect, the present application provides a prefetch apparatus of a branch instruction, comprising:
the detection module is used for detecting whether a branch instruction exists in the instruction line data containing the target instruction when the lower-level Cache sends the instruction line data containing the target instruction to the first-level Cache;
a query module, configured to query a pre-built branch instruction lookup table for the presence of a detected branch instruction when the presence of the branch instruction in the instruction line data including the target instruction is detected;
a determining module, configured to determine, according to the detected branch instruction, a destination address corresponding to the detected branch instruction in the branch instruction lookup table if the detected branch instruction exists in the branch instruction lookup table;
the sending module is used for sending a pre-fetching request for acquiring a destination address corresponding to the detected branch instruction to the lower-level Cache; when the detected branch instruction needs to be executed, the instruction line data containing the destination address corresponding to the detected branch instruction is directly obtained from the first-stage Cache.
In some embodiments, further comprising: and the updating module is used for updating the branch instruction lookup table according to the executed branch instruction and the destination address thereof after the instruction pipeline executes any branch instruction.
In some embodiments, the update module comprises:
a query sub-module, configured to query the branch instruction lookup table for whether the same instruction as the executed branch instruction exists;
the covering sub-module is used for overwriting, if an instruction identical to the executed branch instruction exists, the destination address recorded for that instruction in the branch instruction lookup table with the destination address corresponding to the executed branch instruction;
and the storage sub-module is used for storing the executed branch instruction and the corresponding destination address thereof into the branch instruction lookup table if the instruction which is the same as the executed branch instruction does not exist.
In some embodiments, the storage sub-module comprises:
a judging unit, configured to judge whether a vacancy exists in the branch instruction lookup table if the instruction identical to the executed branch instruction does not exist;
an adding unit, configured to add the executed branch instruction and a destination address thereof to the branch instruction lookup table when a vacancy exists in the branch instruction lookup table;
and the replacing unit is used for respectively replacing the instruction with the earliest update time and the corresponding destination address in the branch instruction lookup table by the executed branch instruction and the destination address thereof when the vacancy does not exist in the branch instruction lookup table.
In some embodiments, the branch instruction lookup table further includes instruction history data of each branch instruction, where the instruction history data includes all destination addresses of jumps after each branch instruction is executed and a number of jumps corresponding to each destination address;
and the determining module is used for selecting the destination address with the largest jump frequency from the instruction history data corresponding to the detected branch instruction as the destination address of the jump after the next execution of the detected branch instruction if the detected branch instruction exists in the branch instruction lookup table.
Compared with the prior art, the application has at least one of the following beneficial effects:
(1) In the application, in the process that the lower-level Cache sends instruction line data to the first-level Cache, the first-level Cache is used for carrying out branch prefetching, and the branch instruction possibly executed by a subsequent instruction pipeline is taken into the first-level Cache in advance. When the branch instruction needs to be executed subsequently in the instruction pipeline, corresponding instruction line data can be obtained in the first-stage Cache in time, and the instruction does not need to be taken out from the next-stage Cache or the memory, so that the instruction taking process is accelerated, cache access failure is avoided, and the access performance of a processor can be improved.
(2) According to the application, the branch instruction lookup table is pre-constructed, when the lower-level Cache sends data to the first-level Cache, the first-level Cache performs branch detection based on the branch instruction lookup table, and the destination address corresponding to the branch instruction possibly executed by the instruction pipeline is pre-fetched to the first-level Cache in advance, so that the waiting time of the instruction pipeline is reduced, and the access failure of the Cache is avoided.
(3) The application fetches the instruction line data containing the destination address of a branch instruction into the cache before that branch instruction is executed. When the branch instruction does need to be executed, the instruction line data containing the corresponding destination address can be loaded quickly from the cache into the processor for execution, accelerating processor accesses and further improving system efficiency.
(4) The branch instruction lookup table has an update mechanism: executed branch instructions are used to update the contents of the table in real time, so the table is continuously optimized and comes to cover the commonly used branch instructions.
(5) The present application optimizes the branch instruction lookup table based on the instruction history data. In the scheme, when the branch instruction lookup table is used, the destination address of the branch instruction is selected according to the instruction history data, so that the accuracy of the prefetching of the branch instruction is improved.
(6) The application can prefetch different types of branch instructions, can be suitable for processors with different architectures, and has wide application scenes.
Drawings
The above features, technical features, advantages and implementation of the present application will be further described in the following description of preferred embodiments with reference to the accompanying drawings in a clear and easily understood manner.
FIG. 1 is a flow diagram of one embodiment of a method of prefetching branch instructions of the present application;
FIG. 2 is a schematic diagram of a conventional fetching process described in one embodiment of a prefetch method of branch instructions of the present application;
FIG. 3 is a partial flow chart of another embodiment of a method of prefetching branch instructions of the present application;
FIG. 4 is a schematic instruction prefetch diagram of yet another embodiment of a branch instruction prefetch method of the present application;
FIG. 5 is a schematic diagram illustrating an embodiment of a prefetch apparatus of a branch instruction according to the present application.
Reference numerals illustrate:
the device comprises a detection module 10, a query module 20, a determination module 30 and a sending module 40.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the following description will explain the specific embodiments of the present application with reference to the accompanying drawings. It is evident that the drawings in the following description are only examples of the application, from which other drawings and other embodiments can be obtained by a person skilled in the art without inventive effort.
For simplicity of the drawing, only the parts relevant to the application are schematically shown in each drawing, and they do not represent the actual structure thereof as a product. Additionally, in order to simplify the drawing for ease of understanding, components having the same structure or function in some of the drawings are shown schematically with only one of them, or only one of them is labeled. Herein, "a" means not only "only this one" but also "more than one" case.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In addition, in the description of the present application, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Referring to FIG. 1 of the drawings, a method for prefetching branch instructions according to one embodiment of the present application includes the steps of:
s101, when a lower-level Cache sends instruction line data comprising a target instruction to a first-level Cache, detecting whether a branch instruction exists in the instruction line data comprising the target instruction.
Specifically, the Cache (high-speed buffer memory) is used to bridge the large speed gap between the processor and the memory, saving the processor time when reading instructions and data. A multi-level Cache forms a high-speed data buffer between the processor and the memory; the closer a Cache level is to the processor, the faster it runs and the smaller its capacity.
When the processor accesses data, it first searches the first-level Cache for the required data; if the data is not found, it searches the other Cache levels step by step. Take a two-level Cache as an example, as shown in fig. 2, comprising an L1 Cache and an L2 Cache: the CPU first searches L1; if the data is not found there, it searches L2; and if it is not found there either, it accesses the memory. Once the required data is found, it is sent back level by level to the first-level Cache, and from the first-level Cache to the instruction pipeline in the CPU.
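The conventional level-by-level search can be pictured with a minimal software model; the sketch below is an illustration only, with a plain map standing in for each Cache level and a stub standing in for the memory access.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Level-by-level lookup as in Fig. 2: try L1, then L2, then memory, and fill
// the data back toward L1 on the way out.
struct Cache {
    std::unordered_map<uint64_t, uint32_t> lines;

    std::optional<uint32_t> lookup(uint64_t addr) const {
        auto it = lines.find(addr);
        if (it == lines.end()) return std::nullopt;   // miss at this level
        return it->second;                            // hit
    }
    void fill(uint64_t addr, uint32_t data) { lines[addr] = data; }
};

// Stub standing in for the backing memory; a real system performs a memory read here.
uint32_t read_memory(uint64_t addr) { return static_cast<uint32_t>(addr); }

uint32_t fetch(uint64_t addr, Cache& l1, Cache& l2) {
    if (auto hit = l1.lookup(addr)) return *hit;   // found in L1: fastest case
    uint32_t data;
    if (auto hit = l2.lookup(addr)) {
        data = *hit;                               // found in L2
    } else {
        data = read_memory(addr);                  // both levels missed: access memory
        l2.fill(addr, data);
    }
    l1.fill(addr, data);                           // send the data back up to L1
    return data;
}
```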
The data in each stage of Cache is stored in a line form, and when the Cache sends an instruction, all the data in the whole instruction line where the target instruction is located are sent instead of directly sending a single instruction.
When the instruction pipeline needs to execute the target instruction and the target instruction is not in the first-level Cache, the level-by-level search described above is performed until the instruction is found, and the instruction line data containing it is sent level by level back to the first-level Cache. During this process, the data arriving at the first-level Cache is inspected as it is sent, in order to judge whether the instruction line contains a branch instruction.
The branch instructions include absolute address branch instructions and relative address branch instructions. The destination address of an absolute address branch instruction can be obtained by decoding the instruction itself, whereas the destination address of a relative address branch instruction is known only after the instruction pipeline has executed it (the destination address of an absolute address branch instruction can, of course, also be obtained after execution). In this scheme, the Cache is given only a simple decoding capability: it does not detect the specific type of branch instruction, but only detects whether an instruction is a branch instruction.
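As a rough sketch of this Cache-side detection, the check below assumes a hypothetical fixed-width 32-bit encoding in which two opcode values mark branch instructions; the constants are illustrative only, and a real detector depends entirely on the ISA.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoding: bits [31:26] hold the opcode, and two opcode values
// are assumed to mark branch instructions.
constexpr uint32_t kOpcodeShift = 26;
constexpr uint32_t kBranchOpcodeAbs = 0x10;  // assumed absolute-address branch
constexpr uint32_t kBranchOpcodeRel = 0x11;  // assumed relative-address branch

// The Cache only needs to answer "is this a branch?", not which kind.
bool is_branch(uint32_t instruction) {
    uint32_t opcode = instruction >> kOpcodeShift;
    return opcode == kBranchOpcodeAbs || opcode == kBranchOpcodeRel;
}

// Scan every instruction slot of an incoming instruction line and collect the
// branches it contains, mirroring the detection done during the line fill.
std::vector<uint32_t> detect_branches(const std::vector<uint32_t>& instruction_line) {
    std::vector<uint32_t> branches;
    for (uint32_t insn : instruction_line) {
        if (is_branch(insn)) {
            branches.push_back(insn);
        }
    }
    return branches;
}
```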
S102, when detecting that a branch instruction exists in the instruction line data containing the target instruction, inquiring whether the detected branch instruction exists in a pre-constructed branch instruction lookup table.
Specifically, the detected branch instruction is looked up in a pre-constructed branch instruction lookup table.
The lookup table is constructed as follows: when the first branch instruction finishes executing, the destination address of its jump is obtained; the branch recorder records the branch instruction and its destination address and sends them to the first-level Cache to establish the lookup table. The lookup table thus records a destination address for every branch instruction it contains.
S103, if the detected branch instruction exists in the branch instruction lookup table, determining a destination address corresponding to the detected branch instruction in the branch instruction lookup table according to the detected branch instruction.
S104, sending a prefetch request for acquiring a destination address corresponding to the detected branch instruction to the lower Cache; when the detected branch instruction needs to be executed, the instruction line data containing the destination address corresponding to the detected branch instruction is directly obtained from the first-stage Cache.
Specifically, the lookup table includes the executed branch instruction and the destination address of each branch instruction in the table, so if the branch instruction detected in the above step exists in the lookup table, the destination address of the branch instruction can be determined according to the lookup table. Then, the first-level Cache sends a prefetch request to the lower-level Cache to acquire instruction line data containing the destination address.
If the instruction pipeline subsequently needs to execute the branch instruction, the destination address no longer has to be obtained through a level-by-level Cache search; the instruction line data containing the destination address can be fetched directly from the first-level Cache. This avoids a fetch miss in the first-level Cache, accelerates the fetch process, and improves the data access speed.
In this embodiment, in the process that the lower-level Cache sends the instruction line data including the target instruction to the first-level Cache, the first-level Cache performs branch prefetching, and the branch instruction that may be executed by the subsequent instruction pipeline is fetched into the first-level Cache in advance. When the branch instruction needs to be executed subsequently in the instruction pipeline, corresponding instruction line data can be obtained in the first-stage Cache in time, and the instruction does not need to be taken out of the next-stage Cache or the memory, so that the instruction taking process is accelerated, and the access performance of the processor can be improved. Furthermore, the branch instruction prefetching method proposed by the scheme is applicable to both the Harvard architecture processor and the von Neumann architecture processor.
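The flow of steps S101-S104 can be modeled in software roughly as follows, reusing the is_branch helper from the detection sketch above; the map-based lookup table and the LowerCache interface are illustrative assumptions, not the actual hardware structures.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

bool is_branch(uint32_t insn);  // detection helper from the sketch above

// Stand-in for the lower-level Cache's prefetch interface.
struct LowerCache {
    void prefetch_line(uint64_t /*dest_addr*/) {
        // In hardware this would issue a fill request for the instruction line
        // containing the destination address; the stub only marks the intent.
    }
};

// Software model of S101-S104, run while the lower-level Cache sends an
// instruction line to the first-level Cache. The lookup table is modeled as a
// map from branch-instruction address to destination address.
void on_line_fill(const std::vector<std::pair<uint64_t, uint32_t>>& line,  // (address, instruction)
                  const std::unordered_map<uint64_t, uint64_t>& lookup_table,
                  LowerCache& lower) {
    for (const auto& [addr, insn] : line) {
        if (!is_branch(insn)) continue;             // S101: a branch is detected in the line
        auto it = lookup_table.find(addr);          // S102: query the lookup table
        if (it == lookup_table.end()) continue;     // not recorded yet: nothing to prefetch
        lower.prefetch_line(it->second);            // S103/S104: prefetch the destination line
    }
}
```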
On the basis of the above embodiment, the method further comprises: after the instruction pipeline executes any branch instruction, the branch instruction lookup table is updated according to the executed branch instruction and the destination address thereof.
Specifically, the instruction execution process generally includes three parts, namely instruction fetching, decoding and executing, which are respectively implemented by an instruction fetching unit, a decoding unit and an executing unit in an instruction pipeline. When the instruction fetching unit fetches the branch instruction, the decoding unit and the executing unit operate on the branch instruction. The corresponding destination address can be obtained after the instruction is executed, and the instruction and the corresponding destination address are used for updating the branch instruction lookup table.
In this embodiment, after the instruction pipeline finishes executing a branch instruction, the executed branch instruction is used to update the branch instruction lookup table, so as to optimize the branch instruction lookup table, so that the branch instruction lookup table can cover the commonly used branch instruction.
In one embodiment, based on the above embodiment, the method updates the branch instruction lookup table according to the executed branch instruction and the destination address thereof, referring to fig. 3 of the accompanying drawings, and includes the steps of:
s201, inquiring whether the instruction which is the same as the executed branch instruction exists in a branch instruction lookup table, and if so, executing step S202; otherwise, step S203 is performed.
S202, the destination address recorded for that instruction in the branch instruction lookup table is overwritten with the destination address corresponding to the executed branch instruction.
Specifically, the entry in the branch instruction lookup table includes the branch instruction and the destination address of the subsequent jump access of the branch instruction. The most recently accessed instructions are always most likely to be needed next, according to program locality principles. Thus, when a detected branch instruction already exists in the table, the original destination address in the table is overwritten with the current destination address of the branch instruction.
S203, judging whether a vacancy exists in the branch instruction lookup table, if so, executing step S204; otherwise, step S205 is performed.
Specifically, because the capacity of the branch instruction lookup table is limited, in order to make better use of the storage space, when the executed branch instruction is not found in the lookup table it is further determined whether the table still has a free entry in which to store the branch instruction and its corresponding destination address.
S204, the executed branch instruction and its destination address are added to the branch instruction lookup table.
S205, the executed branch instruction and the destination address thereof are used to replace the instruction with the earliest update time and the corresponding destination address in the branch instruction lookup table.
This embodiment describes the process of updating the branch instruction lookup table in detail. If the executed branch instruction can be found in the existing branch instruction lookup table, the table is updated by overwriting; if it is not found, the method first judges whether the lookup table has a free entry, and only when a free entry exists are the executed branch instruction and its destination address stored by adding a new table entry.
Because of program locality, the most recently accessed instructions are the ones most likely to execute again; therefore, when there is no room in the table, an existing entry is replaced with the currently executed branch instruction and its destination address. Specifically, the victim can be the branch instruction with the earliest update time (ordering the entries by update time) or the instruction that entered the lookup table earliest (ordering by entry time). Eviction need not be based on entry time at all, of course: the table contents can also be evicted according to other strategies, such as the execution frequency of the branch instruction or the jump frequency of the destination address.
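A minimal sketch of the update path S201-S205 follows, assuming the table is keyed by the branch instruction's address and capped at a fixed number of entries; the eviction shown replaces the entry with the earliest update time, and the alternative policies mentioned above would only change the victim-selection loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

class BranchLookupTable {
public:
    explicit BranchLookupTable(std::size_t capacity) : capacity_(capacity) {}

    // Called after the instruction pipeline has executed a branch.
    void update(uint64_t branch_addr, uint64_t dest_addr) {
        ++clock_;
        auto it = entries_.find(branch_addr);
        if (it != entries_.end()) {                      // S201/S202: same instruction found,
            it->second = {dest_addr, clock_};            // overwrite its recorded destination
            return;
        }
        if (entries_.size() < capacity_) {               // S203/S204: a vacancy exists
            entries_.emplace(branch_addr, Entry{dest_addr, clock_});
            return;
        }
        auto victim = entries_.begin();                  // S205: evict the oldest-updated entry
        for (auto e = entries_.begin(); e != entries_.end(); ++e) {
            if (e->second.last_update < victim->second.last_update) victim = e;
        }
        entries_.erase(victim);
        entries_.emplace(branch_addr, Entry{dest_addr, clock_});
    }

private:
    struct Entry {
        uint64_t dest_addr;
        uint64_t last_update;  // logical time of the most recent update
    };
    std::size_t capacity_;
    uint64_t clock_ = 0;
    std::unordered_map<uint64_t, Entry> entries_;
};
```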
According to the scheme, through the process, the updating of the branch instruction lookup table is realized, the branch instruction lookup table is optimized, and the cache access failure is avoided.
In one embodiment of the present application, a method for prefetching a branch instruction includes the steps of:
s301, when a lower-level Cache sends instruction line data containing a target instruction to a first-level Cache, detecting whether a branch instruction exists in the instruction line data containing the target instruction.
S302, when detecting that there is a branch instruction in the instruction line data containing the target instruction, queries whether there is a detected branch instruction in a pre-constructed branch instruction lookup table.
S303, if the detected branch instruction exists in the branch instruction lookup table, selecting the destination address with the largest number of jumps from the instruction history data corresponding to the detected branch instruction as the destination address of the jump after the next execution of the branch instruction, and sending a prefetch request for that destination address to the lower-level Cache, so that when the instruction pipeline needs to execute the detected branch instruction, the instruction line data containing the destination address is obtained directly from the first-level Cache.
Specifically, in this embodiment, the branch instruction lookup table further includes instruction history data of each branch instruction, and further includes an instruction address of the branch instruction itself. Referring to fig. 4, assuming that only two levels of caches are provided, when the second level Cache sends instruction line data including a target instruction to the first level Cache, branch detection is performed on the instruction line data in the first level Cache: detecting whether a branch instruction exists in the instruction line; if yes, inquiring whether the same branch instruction exists in the branch instruction lookup table; if the same instruction exists in the table, the instruction address and the destination address of the branch instruction are determined according to the branch instruction lookup table.
Each time the instruction pipeline finishes executing an instruction, the corresponding instruction data is stored to build up the instruction history data, which specifically comprises all destination addresses to which each branch instruction has jumped after execution and the number of jumps to each destination address. During the table lookup, the destination address with the largest number of jumps is used as the destination address of the current branch instruction, which further refines the instruction prefetching process, improves branch prediction accuracy, and makes branch instruction prefetching more reliable.
In addition, the jump strategy used in the actual table lookup can be chosen flexibly: the next jump address can be determined not only from the destination address with the largest number of jumps, but also from the branch instruction's last jump. For example, a DSP (Digital Signal Processor) frequently computes matrix multiplications, most of which are loop computations; in such loops a large proportion of jumps go to the same destination as the previous jump, so the destination address of the previous jump can be used as the destination address of the current jump.
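Both selection policies can be sketched on top of a per-branch history record; the field names and types below are illustrative assumptions, not structures defined by the application.

```cpp
#include <cstdint>
#include <unordered_map>

// Per-branch history: every destination the branch has jumped to, with the
// number of times each destination was taken.
struct BranchHistory {
    std::unordered_map<uint64_t, uint64_t> jump_count;  // destination addr -> times taken
    uint64_t last_dest = 0;

    void record_jump(uint64_t dest_addr) {
        ++jump_count[dest_addr];
        last_dest = dest_addr;
    }

    // Policy of S303: prefetch the destination taken most often so far.
    uint64_t most_frequent_dest() const {
        uint64_t best_dest = 0, best_count = 0;
        for (const auto& [dest, count] : jump_count) {
            if (count > best_count) { best_count = count; best_dest = dest; }
        }
        return best_dest;
    }

    // Alternative policy for the loop-heavy DSP case: repeat the previous jump.
    uint64_t repeat_last_dest() const { return last_dest; }
};
```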
Preferably, as shown in fig. 4, a branch recorder can additionally be arranged in the instruction pipeline to temporarily store instructions: the branch instructions, together with their instruction addresses and destination addresses, are recorded by the branch recorder and then sent to the branch instruction lookup table. The branch recorder can be implemented in a variety of ways, for example with registers or SRAM.
According to the embodiment, the accuracy and the stability of the branch instruction lookup table are improved through the instruction history data, so that the process of searching the branch instruction by the Cache is shortened, the speed of accessing the instruction by the processor is improved as a whole, the running performance of the processor is improved, and the user experience is improved.
Referring to fig. 5 of the drawings, a prefetch apparatus for a branch instruction according to an embodiment of the present application includes a detection module 10, a query module 20, a determination module 30, and a sending module 40, where:
the detection module 10 is configured to detect whether a branch instruction exists in the instruction line data including the target instruction when the lower-level Cache sends the instruction line data including the target instruction to the first-level Cache.
And a query module 20, configured to query a pre-constructed branch instruction lookup table for the presence of a detected branch instruction when the presence of the branch instruction in the instruction line data including the target instruction is detected.
The determining module 30 is configured to determine, in the branch instruction lookup table, a destination address corresponding to the detected branch instruction according to the detected branch instruction if the detected branch instruction exists in the branch instruction lookup table.
A sending module 40, configured to send a prefetch request to the lower Cache to obtain a destination address corresponding to the detected branch instruction; when the detected branch instruction needs to be executed, the instruction line data containing the destination address corresponding to the detected branch instruction is directly obtained from the first-stage Cache.
In this embodiment, in the process that the lower-level Cache sends the data of the instruction line including the target instruction to the first-level Cache, the branch instructions in the instruction line are prefetched, and these prefetched branch instructions may be executed in the instruction pipeline. Before the branch instruction enters the instruction pipeline, the instruction line data containing the destination address is prefetched into the Cache so as to accelerate the instruction fetching process, avoid the access failure of the branch instruction in the Cache and improve the access performance of the processor.
In one embodiment, further comprising: the updating module 50 is configured to update the branch instruction lookup table according to the executed branch instruction and the destination address thereof after the instruction pipeline has executed any branch instruction. In this embodiment, after the instruction pipeline executes a branch instruction, the update module updates the established branch instruction lookup table to optimize the branch instruction lookup table.
In one embodiment, the update module 50 includes a query sub-module 41, an overlay sub-module 42, and a storage sub-module 43, wherein:
a query sub-module 41 for querying the branch instruction lookup table for the presence of the same instruction as the executed branch instruction.
And the overlay sub-module 42 is configured to, if an instruction identical to the executed branch instruction exists, overwrite the destination address recorded for that instruction in the branch instruction lookup table with the destination address corresponding to the executed branch instruction.
The storage sub-module 43 is configured to store the executed branch instruction and the destination address thereof in the branch instruction lookup table if the same instruction as the executed branch instruction does not exist.
In this embodiment, when updating the branch instruction lookup table, the query sub-module 41 first searches the table for whether the same instruction as the executed branch instruction exists; if the same instruction can be found, the destination address of the same instruction in the lookup table is covered by the covering submodule 42; if the same instruction is not found, the executed branch instruction and its corresponding destination address are stored in the branch instruction lookup table by the storage sub-module 43. And updating the contents of the branch instruction lookup table through the query submodule, the coverage submodule and the storage submodule.
In one embodiment, the storage sub-module 43 includes:
and the judging unit is used for judging whether a vacancy exists in the branch instruction lookup table if the instruction which is the same as the executed branch instruction does not exist.
And the adding unit is used for adding the executed branch instruction and the destination address thereof into the branch instruction lookup table when the vacancy exists in the branch instruction lookup table.
And the replacing unit is used for respectively replacing the instruction with the earliest update time and the corresponding destination address in the branch instruction lookup table by the executed branch instruction and the destination address thereof when the vacancy does not exist in the branch instruction lookup table.
In this embodiment, the procedure of updating the branch instruction lookup table in the case where there is a gap in the branch instruction lookup table and in the case where there is no gap is described: when a vacancy exists in the table, directly adding the executed branch instruction and the destination address thereof into the table; when no empty space exists in the table, replacing the contents in the table according to a certain strategy.
In one embodiment, the determining module is configured to select, if the detected branch instruction exists in the branch instruction lookup table, the destination address with the largest number of jumps from the instruction history data corresponding to the detected branch instruction as the destination address of the jump after the next execution of the detected branch instruction. The branch instruction lookup table further comprises instruction history data of each branch instruction, and the instruction history data comprises all destination addresses to which each branch instruction has jumped after execution and the number of jumps to each destination address.
In this embodiment, the branch instruction lookup table includes not only each branch instruction and its corresponding destination address but also the instruction history data, specifically all destination addresses to which each branch instruction has jumped after execution and the number of jumps to each destination address. During the table lookup, the destination address with the largest number of jumps is used as the destination address of the current branch instruction, which further refines the instruction prefetching process, improves branch prediction accuracy, and makes branch instruction prefetching more reliable.
It should be noted that, the embodiments of the branch instruction prefetching apparatus provided by the present application and the embodiments of the branch instruction prefetching method provided by the present application are both based on the same inventive concept, and can achieve the same technical effects, so that other specific contents of the embodiments of the branch instruction prefetching apparatus may refer to the description of the embodiment contents of the branch instruction prefetching method.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
It should be noted that the above embodiments can be freely combined as needed. The foregoing is merely a preferred embodiment of the present application; those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations should also be regarded as falling within the scope of the present application.

Claims (10)

1. A method of prefetching a branch instruction, comprising:
when a lower-level Cache sends instruction line data containing a target instruction to a first-level Cache, detecting whether a branch instruction exists in the instruction line data containing the target instruction;
inquiring whether the detected branch instruction exists in a pre-constructed branch instruction lookup table when the branch instruction exists in the instruction line data containing the target instruction;
if the detected branch instruction exists in the branch instruction lookup table, determining a destination address corresponding to the detected branch instruction in the branch instruction lookup table according to the detected branch instruction;
sending a prefetch request for acquiring a destination address corresponding to the detected branch instruction to the lower Cache; when the detected branch instruction needs to be executed, the instruction line data containing the destination address corresponding to the detected branch instruction is directly obtained from the first-stage Cache.
2. The method of claim 1, further comprising:
after the instruction pipeline executes any branch instruction, updating the branch instruction lookup table according to the executed branch instruction and the destination address thereof.
3. The method according to claim 2, wherein updating the branch instruction lookup table based on the executed branch instruction and the destination address thereof comprises:
querying the branch instruction lookup table for the presence of the same instruction as the executed branch instruction;
if an instruction identical to the executed branch instruction exists, overwriting the destination address recorded for that instruction in the branch instruction lookup table with the destination address corresponding to the executed branch instruction;
and if the instruction which is the same as the executed branch instruction does not exist, storing the executed branch instruction and the corresponding destination address thereof into the branch instruction lookup table.
4. A method according to claim 3, wherein storing the executed branch instruction and its corresponding destination address in the branch instruction lookup table if there is no instruction identical to the executed branch instruction, comprises:
if the instruction which is the same as the executed branch instruction does not exist, judging whether a vacancy exists in the branch instruction lookup table;
when a vacancy exists in the branch instruction lookup table, adding the executed branch instruction and a destination address thereof into the branch instruction lookup table;
when the branch instruction lookup table does not have a vacancy, the executed branch instruction and the destination address thereof are used for replacing the instruction with the earliest update time and the corresponding destination address thereof in the branch instruction lookup table respectively.
5. The method according to claim 1, wherein the branch instruction lookup table further includes instruction history data of each branch instruction, the instruction history data including all destination addresses of jumps after each branch instruction is executed and a number of jumps corresponding to each destination address; if the detected branch instruction exists in the branch instruction lookup table, determining, in the branch instruction lookup table, a destination address corresponding to the detected branch instruction according to the detected branch instruction, including:
if the detected branch instruction exists in the branch instruction lookup table, selecting the destination address with the largest number of jumps from the instruction history data corresponding to the detected branch instruction as the destination address of the jump after the next execution of the detected branch instruction.
6. A prefetch apparatus of a branch instruction, comprising:
the detection module is used for detecting whether a branch instruction exists in the instruction line data containing the target instruction when the lower-level Cache sends the instruction line data containing the target instruction to the first-level Cache;
a query module, configured to query a pre-built branch instruction lookup table for the presence of a detected branch instruction when the presence of the branch instruction in the instruction line data including the target instruction is detected;
a determining module, configured to determine, according to the detected branch instruction, a destination address corresponding to the detected branch instruction in the branch instruction lookup table if the detected branch instruction exists in the branch instruction lookup table;
the sending module is used for sending a pre-fetching request for acquiring a destination address corresponding to the detected branch instruction to the lower-level Cache; when the detected branch instruction needs to be executed, the instruction line data containing the destination address corresponding to the detected branch instruction is directly obtained from the first-stage Cache.
7. The apparatus of claim 6, further comprising:
and the updating module is used for updating the branch instruction lookup table according to the executed branch instruction and the destination address thereof after the instruction pipeline executes any branch instruction.
8. The apparatus of claim 7, wherein the update module comprises:
a query sub-module, configured to query the branch instruction lookup table for whether the same instruction as the executed branch instruction exists;
the covering sub-module is used for overwriting, if an instruction identical to the executed branch instruction exists, the destination address recorded for that instruction in the branch instruction lookup table with the destination address corresponding to the executed branch instruction;
and the storage sub-module is used for storing the executed branch instruction and the corresponding destination address thereof into the branch instruction lookup table if the instruction which is the same as the executed branch instruction does not exist.
9. The apparatus of claim 8, wherein the memory sub-module comprises:
a judging unit, configured to judge whether a vacancy exists in the branch instruction lookup table if the instruction identical to the executed branch instruction does not exist;
an adding unit, configured to add the executed branch instruction and a destination address thereof to the branch instruction lookup table when a vacancy exists in the branch instruction lookup table;
and the replacing unit is used for respectively replacing the instruction with the earliest update time and the corresponding destination address in the branch instruction lookup table by the executed branch instruction and the destination address thereof when the vacancy does not exist in the branch instruction lookup table.
10. The apparatus of claim 6, wherein
the branch instruction lookup table also comprises instruction history data of each branch instruction, wherein the instruction history data comprises all destination addresses of each branch instruction which jump after being executed and the corresponding jump times of each destination address;
and the determining module is used for selecting the destination address with the largest jump frequency from the instruction history data corresponding to the detected branch instruction as the destination address of the jump after the next execution of the detected branch instruction if the detected branch instruction exists in the branch instruction lookup table.
CN202311141969.6A 2023-09-06 2023-09-06 Branch instruction prefetching method and device Pending CN117093271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311141969.6A CN117093271A (en) 2023-09-06 2023-09-06 Branch instruction prefetching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311141969.6A CN117093271A (en) 2023-09-06 2023-09-06 Branch instruction prefetching method and device

Publications (1)

Publication Number Publication Date
CN117093271A true CN117093271A (en) 2023-11-21

Family

ID=88783365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311141969.6A Pending CN117093271A (en) 2023-09-06 2023-09-06 Branch instruction prefetching method and device

Country Status (1)

Country Link
CN (1) CN117093271A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070186049A1 (en) * 2006-02-03 2007-08-09 International Business Machines Corporation Self prefetching L2 cache mechanism for instruction lines
CN101495962A (en) * 2006-08-02 2009-07-29 高通股份有限公司 Method and apparatus for prefetching non-sequential instruction addresses
CN103279324A (en) * 2013-05-29 2013-09-04 华为技术有限公司 Method and device capable of prefetching orders in internal storage to cache in advance
US20130311760A1 (en) * 2012-05-16 2013-11-21 Qualcomm Incorporated Multi Level Indirect Predictor using Confidence Counter and Program Counter Address Filter Scheme
CN104793921A (en) * 2015-04-29 2015-07-22 深圳芯邦科技股份有限公司 Instruction branch prediction method and system
CN111124493A (en) * 2019-12-17 2020-05-08 天津国芯科技有限公司 Method and circuit for reducing program jump overhead in CPU
CN112579175A (en) * 2020-12-14 2021-03-30 海光信息技术股份有限公司 Branch prediction method, branch prediction device and processor core

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070186049A1 (en) * 2006-02-03 2007-08-09 International Business Machines Corporation Self prefetching L2 cache mechanism for instruction lines
CN101495962A (en) * 2006-08-02 2009-07-29 高通股份有限公司 Method and apparatus for prefetching non-sequential instruction addresses
US20130311760A1 (en) * 2012-05-16 2013-11-21 Qualcomm Incorporated Multi Level Indirect Predictor using Confidence Counter and Program Counter Address Filter Scheme
CN103279324A (en) * 2013-05-29 2013-09-04 华为技术有限公司 Method and device capable of prefetching orders in internal storage to cache in advance
CN104793921A (en) * 2015-04-29 2015-07-22 深圳芯邦科技股份有限公司 Instruction branch prediction method and system
CN111124493A (en) * 2019-12-17 2020-05-08 天津国芯科技有限公司 Method and circuit for reducing program jump overhead in CPU
CN112579175A (en) * 2020-12-14 2021-03-30 海光信息技术股份有限公司 Branch prediction method, branch prediction device and processor core

Similar Documents

Publication Publication Date Title
JP4027620B2 (en) Branch prediction apparatus, processor, and branch prediction method
US7783868B2 (en) Instruction fetch control device and method thereof with dynamic configuration of instruction buffers
US6539458B2 (en) Hierarchical memory for efficient data exchange control
CN110069285B (en) Method for detecting branch prediction and processor
US20150186293A1 (en) High-performance cache system and method
JP3542021B2 (en) Method and apparatus for reducing set associative cache delay by set prediction
US5210831A (en) Methods and apparatus for insulating a branch prediction mechanism from data dependent branch table updates that result from variable test operand locations
US7769983B2 (en) Caching instructions for a multiple-state processor
US20030200396A1 (en) Scheme for reordering instructions via an instruction caching mechanism
JPH1074166A (en) Multilevel dynamic set predicting method and its device
KR930002945A (en) Information processing device applying prefetch buffer and prefetch buffer
US9753855B2 (en) High-performance instruction cache system and method
US7447883B2 (en) Allocation of branch target cache resources in dependence upon program instructions within an instruction queue
KR101049319B1 (en) Method and apparatus for reducing lookups in branch target address cache
KR970076253A (en) Methods of predicting the outcome of data processing systems and branch instructions
JPH0773104A (en) Cache system
JP2010102623A (en) Cache memory and control method therefor
US20150193348A1 (en) High-performance data cache system and method
US11797308B2 (en) Fetch stage handling of indirect jumps in a processor pipeline
CN117093271A (en) Branch instruction prefetching method and device
US7793085B2 (en) Memory control circuit and microprocessory system for pre-fetching instructions
US20050050280A1 (en) Data accessing method and system for processing unit
CN117331602A (en) Prefetch method and device for jump instruction
CN117311814A (en) Instruction fetch unit, instruction reading method and chip
CN117648131A (en) Target address acquisition method, branch predictor training method and related devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination