CN108255518B - Processor and loop program branch prediction method - Google Patents

Processor and loop program branch prediction method

Info

Publication number
CN108255518B
Authority
CN
China
Prior art keywords
branch
counter value
program
program counter
loop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611248875.9A
Other languages
Chinese (zh)
Other versions
CN108255518A (en)
Inventor
埃德温·苏坦托
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd
Priority to CN201611248875.9A
Publication of CN108255518A
Application granted
Publication of CN108255518B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3848 Speculative instruction execution using hybrid branch prediction, e.g. selection between prediction techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a processor and a loop program branch prediction method. The processor comprises at least a Branch Prediction Unit (BPU) and a Loop Prediction Unit (LPU). The LPU is configured to identify whether the program currently run by the processor is a loop program and, if so, to obtain a second program counter value corresponding to a second branch before the BPU predicts the next branch of a first branch; that is, the LPU predicts that the second branch is the next branch of the first branch. Because the LPU predicts the next branch before the BPU does, bubbles between one branch and the next in the loop program can be avoided, which reduces the delay and improves the execution efficiency of the loop program.

Description

Processor and loop program branch prediction method
Technical Field
The present invention relates to the field of computers, and in particular, to a processor and a loop program branch prediction method.
Background
A Central Processing Unit (CPU) includes an Instruction Fetch Unit (IFU), a Branch Execution Unit (BEU), and the like. When the CPU executes a loop program, such as a for loop or a while loop, the IFU fetches all instructions in a branch and must then wait for an indeterminate delay to obtain the Program Counter (PC) value corresponding to the next branch from the BEU; only then can the IFU fetch the instructions in the next branch based on this PC value.
To bound and reduce this delay, a Branch Prediction Unit (BPU) is configured in the CPU to predict the next branch before the BEU acquires the PC value corresponding to the next branch of the loop program. Specifically, one clock cycle after a branch is complete, the IFU obtains from the BPU the PC value corresponding to the predicted next branch and fetches the instructions in the next branch based on this PC value. Even so, a delay of several idle clock cycles (i.e., bubbles) remains between the last instruction of one branch and the first instruction of the next branch in the loop program. For a loop program in which each branch includes only a few instructions, this delay is not negligible and has a significant impact on the execution efficiency of the whole loop program.
Disclosure of Invention
Embodiments of the present invention provide a processor and a loop program branch prediction method that reduce the delay between one branch and the next in a loop program, thereby improving the execution efficiency of the loop program.
An embodiment of the present invention provides a processor including a branch prediction unit and a loop prediction unit, where the loop prediction unit is configured to: identify whether a program currently run by the processor is a loop program; and, if the currently running program is a loop program, obtain a second program counter value corresponding to a second branch before the branch prediction unit predicts a next branch of a first branch, wherein the currently running program includes the first branch, and the loop prediction unit predicts that the second branch is the next branch of the first branch.
In some embodiments, the loop prediction unit is further configured to: before identifying whether the currently running program is a loop program, identify whether the number of instructions included in a third branch exceeds a threshold, wherein the currently running program further includes the third branch, and the third branch is the branch previous to the first branch; and, if so, initialize the loop prediction unit. In some embodiments, the threshold is 8 or a value less than 8.
In some embodiments, the loop prediction unit is further configured to: if the number of instructions included in the third branch does not exceed the threshold, based on the third branch, obtaining an offset value and a first program counter value corresponding to the first branch, where the offset value represents an offset between the first program counter value and a third program counter value corresponding to the third branch.
In some embodiments, the loop prediction unit configured to identify whether the currently running program is a loop program comprises: identifying whether an instruction operation in the first branch is the same as an instruction operation in the third branch; if the instruction operation in the first branch is the same as the instruction operation in the third branch, obtaining the second program counter value based on the first program counter value and the offset value; and if the instruction operation in the first branch is different from the instruction operation in the third branch, initializing.
In some embodiments, the processor further comprises an instruction fetch unit configured to: after the second program counter value is obtained, fetch the instructions in the second branch according to the second program counter value, wherein the currently running program further comprises the second branch.
In some embodiments, the branch prediction unit is configured to: predict a next branch of the first branch and send a predicted program counter value to the instruction fetch unit and the loop prediction unit.
In some embodiments, the loop prediction unit is further configured to: determine whether the predicted program counter value and the second program counter value are consistent; and, if they are not consistent, instruct the instruction fetch unit to stop fetching instructions in the second branch and to fetch, based on the predicted program counter value, the instructions in the branch corresponding to the predicted program counter value.
In some embodiments, the loop prediction unit is further configured to: if the predicted program counter value and the second program counter value match, instructing the instruction fetch unit to fetch an instruction in a fourth branch based on the second program counter value and the offset value before the branch prediction unit makes a prediction for a next branch of the second branch, wherein the fourth branch is the next branch of the second branch predicted by the loop prediction unit, and a sum of the second program counter value and the offset value corresponds to the fourth branch.
In some embodiments, the processor further includes an execution unit to instruct the loop prediction unit to initialize if the execution unit determines that the prediction made by the branch prediction unit is incorrect.
An embodiment of the present invention also provides a loop program branch prediction method, including: identifying whether a current program is a loop program; and, if the current program is a loop program, obtaining a second program counter value corresponding to a second branch before a branch prediction unit predicts the next branch of a first branch, thereby predicting that the second branch is the next branch of the first branch, wherein the current program includes the first branch.
In some embodiments, the method further comprises: before identifying whether the current program is a loop program, identifying whether the number of instructions included in a third branch exceeds a threshold, wherein the current program further includes the third branch, and the third branch is the branch previous to the first branch; and, if so, performing initialization. In some embodiments, the threshold is 8 or a value less than 8.
In some embodiments, the method further comprises: if the number of instructions included in the third branch does not exceed the threshold, based on the third branch, obtaining an offset value and a first program counter value corresponding to the first branch, where the offset value represents an offset between the first program counter value and a third program counter value corresponding to the third branch.
In some embodiments, identifying whether the current program is a loop program comprises: identifying whether an instruction operation in the first branch is the same as an instruction operation in the third branch; if the instruction operation in the first branch is the same as the instruction operation in the third branch, obtaining the second program counter value based on the first program counter value and the offset value; and if the instruction operation in the first branch is different from the instruction operation in the third branch, initializing.
In some embodiments, the method further comprises: after obtaining the second program counter value, instructing an instruction fetch unit to fetch the instructions in the second branch according to the second program counter value, wherein the current program further includes the second branch.
In some embodiments, the method further comprises: the branch prediction unit obtains a predicted program counter value after predicting a next branch of the first branch. In some embodiments, the method further comprises: determining whether the predicted program counter value and the second program counter value are consistent; if the predicted program counter value and the second program counter value are not consistent, the instruction fetch unit is instructed to stop fetching instructions in the second branch, and instructions in the branch corresponding to the predicted program counter value are fetched according to the predicted program counter value.
In some embodiments, the method further comprises: if the predicted program counter value and the second program counter value are consistent, instructing the instruction fetch unit to fetch an instruction in a fourth branch based on the second program counter value and the offset value to predict that the fourth branch is the next branch of the second branch before the branch prediction unit makes a prediction of the next branch of the second branch, wherein the sum of the second program counter value and the offset value corresponds to the fourth branch.
In some embodiments, the method further comprises: initializing if the prediction made by the branch prediction unit is wrong.
Compared with the prior art, the technical solutions of the embodiments of the present invention have the following advantages:
The Loop Prediction Unit (LPU) is configured to predict the next branch before the BPU predicts the next branch of the current branch, and to instruct the IFU to fetch the instructions in the predicted next branch. Bubbles between one branch and the next in the loop program are thereby avoided, which reduces the delay and improves the execution efficiency of the loop program.
Drawings
FIG. 1 shows a schematic block diagram of a processor according to an embodiment of the invention;
FIG. 2 illustrates a flow diagram of a portion of a branch of a computer program according to an embodiment of the present invention;
FIG. 3 shows a flow diagram of a portion of a branch of a computer program according to an embodiment of the invention;
FIG. 4 shows a flow diagram of a portion of a branch of a computer program according to an embodiment of the invention;
FIG. 5 shows a flow diagram of a portion of a branch of a computer program of one embodiment of the invention; and
FIG. 6 is a flowchart illustrating a method for loop program branch prediction according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention provide a processor and a loop program branch prediction method that reduce the delay between one branch and the next in a loop program, thereby improving the execution efficiency of the loop program.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. The invention will now be described with reference to specific examples. Accordingly, the disclosed embodiments should not be construed as unduly limiting this invention.
Fig. 1 shows a schematic diagram of a processor 100 according to an embodiment of the invention. Processor 100 includes a Branch Prediction Unit (BPU) 101, an Instruction Fetch Unit (IFU) 103, a Loop Prediction Unit (LPU) 105, a storage unit 107, and a Branch Execution Unit (BEU) 109.
In some embodiments, processor 100 is running a computer program. Fig. 2 shows a flow diagram of a portion 200 of the branches of a computer program according to an embodiment of the invention. The computer program includes a branch 210 and a branch 230, where branch 230 is the next branch of branch 210 as predicted by BPU 101. Instruction fetch flow 2101 corresponds to the last instruction of branch 210, and instruction fetch flows 2301 and 2303 correspond to the first and last instructions of branch 230, respectively. In some embodiments, an instruction fetch flow includes three steps, namely step 1, step 2, and step 3 in Figs. 2-5. In some embodiments, each step takes one clock cycle.
In some embodiments, BPU 101 predicts the next branch of branch 210 when IFU 103 executes step 3 of instruction fetch flow 2101. Branch 230 is the next branch of branch 210 as predicted by BPU 101. After branch 210 is complete, IFU 103 obtains from BPU 101 the PC value p1 corresponding to instruction fetch flow 2301 and, one clock cycle after branch 210 is complete, executes instruction fetch flow 2301 according to p1 to begin branch 230.
In some embodiments, when IFU 103 executes instruction fetch flow 2101, LPU 105 identifies whether the number of instructions included in branch 210 is less than a threshold, where the threshold depends on the number of bubbles and the number of steps per flow. In some embodiments, when there are 3 bubbles and each flow includes three steps, the threshold may be a value less than 20, where the smaller the threshold, the more significant the impact of the 3 bubbles on the loop program. In some embodiments, the threshold may be 8 or a value below 8.
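The relationship between branch length and bubble cost can be illustrated with a rough estimate. The sketch below assumes, purely for illustration, that one instruction fetch flow is started per clock cycle and that every branch transition costs a fixed number of bubble cycles; the function name `bubble_overhead` is hypothetical and does not appear in the patent.

```python
def bubble_overhead(instructions_per_branch: int, bubbles: int = 3) -> float:
    """Fraction of fetch cycles that are idle.

    Assumption (not from the patent): one fetch flow starts per cycle,
    and each branch is followed by `bubbles` idle cycles.
    """
    if instructions_per_branch <= 0 or bubbles < 0:
        raise ValueError("need a positive branch length and non-negative bubbles")
    total_cycles = instructions_per_branch + bubbles
    return bubbles / total_cycles


# Shorter branches lose a larger share of their cycles to bubbles.
for n in (4, 8, 20, 100):
    print(f"{n:>3} instructions/branch: {bubble_overhead(n):.1%} of cycles idle")
```

Under these assumptions the idle share falls steadily as branches get longer, which is consistent with restricting the loop prediction unit to branches below a small threshold.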
In some embodiments, if LPU 105 identifies that the number of instructions in branch 210 is greater than the threshold, LPU 105 initializes and returns to identifying whether the number of instructions in the next branch is less than the threshold. If LPU 105 identifies that the number of instructions in branch 210 is less than the threshold, then after branch 210 is complete it obtains from BPU 101 both p1 and an offset value, which represents the offset from the PC value corresponding to the first instruction in branch 210 to p1, and stores p1 and the offset value in storage unit 107. In some embodiments, storage unit 107 may be a transitory or non-transitory storage medium known to those skilled in the art, such as a register, cache, memory, optical disk, and so forth.
In some embodiments, if LPU 105 identifies that the number of instructions in branch 210 is less than the threshold, it identifies, before branch 230 is complete, whether the instruction operations in branch 210 are all consistent with those in branch 230. In some embodiments, when IFU 103 executes instruction fetch flow 2303, LPU 105 is instructed to perform this identification.
In some embodiments, LPU 105 initializes if the instruction operations in branch 210 are inconsistent with those in branch 230. If they are consistent, which indicates that the running computer program is a loop program, LPU 105 sums p1 and the offset value to obtain the PC value p2, where p2 is the PC value corresponding to the next branch of branch 230 as predicted by LPU 105; p2 may be stored in storage unit 107. Illustratively, each branch of the loop program contains the same number of instructions, so the offset between the PC values of one branch and the next should be fixed.
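The screening step described above can be sketched as follows. This is a minimal illustration rather than the patented hardware: the names `LoopRecord` and `screen_branch` are hypothetical, the threshold of 8 is the example value from the text, and treating a branch with exactly the threshold number of instructions as eligible is an assumption taken from the "does not exceed" wording used elsewhere in the document.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 8  # example value from the text


@dataclass
class LoopRecord:
    p1: int      # PC value of the first instruction of the next branch
    offset: int  # p1 minus the PC value of the screened branch's first instruction


def screen_branch(first_pc: int, instruction_count: int, p1: int) -> Optional[LoopRecord]:
    """Return the values to store, or None when the unit should initialize."""
    if instruction_count > THRESHOLD:
        return None  # branch too long: initialize and screen the next branch
    return LoopRecord(p1=p1, offset=p1 - first_pc)


print(screen_branch(first_pc=0x1000, instruction_count=4, p1=0x1010))
print(screen_branch(first_pc=0x1000, instruction_count=12, p1=0x1030))
```

Here the returned record stands in for the values written to the storage unit; a real design would hold them in registers or a small table.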
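The loop check and the computation of p2 can be sketched as below. Modelling the instruction operations of a branch as a list of mnemonic strings is an assumption made for illustration, and `predict_p2` is a hypothetical name; only the rule p2 = p1 + offset comes from the text.

```python
from typing import Optional, Sequence


def predict_p2(ops_210: Sequence[str], ops_230: Sequence[str],
               p1: int, offset: int) -> Optional[int]:
    """Return p2 when both branches perform the same operations, else None."""
    if list(ops_210) != list(ops_230):
        return None          # not recognized as a loop: the unit initializes
    return p1 + offset       # fixed stride between consecutive branches


body = ["load", "add", "branch"]
print(hex(predict_p2(body, body, p1=0x1010, offset=0x10)))
print(predict_p2(body, ["load", "mul", "branch"], p1=0x1010, offset=0x10))
```

The comparison is deliberately strict: any difference in the sequence of operations is treated as evidence that the program is not a loop.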
Fig. 3 shows a flow diagram of another portion 300 of the computer program according to an embodiment of the invention. The computer program may further include a branch 310, where branch 310 is the next branch of branch 230 as predicted by LPU 105. Instruction fetch flows 3101 and 3103 correspond to the first and second instructions of branch 310, respectively, where p2 corresponds to the first instruction of branch 310. In some embodiments, one clock cycle after beginning execution of instruction fetch flow 2303, IFU 103 executes instruction fetch flow 3101 according to p2 to begin branch 310. In some embodiments, when IFU 103 performs step 3 of instruction fetch flow 2303, BPU 101 predicts the next branch of branch 230 and, upon completion of branch 230, sends the PC value pb corresponding to the next branch predicted by BPU 101 to IFU 103 and LPU 105.
Fig. 4 shows a flow diagram of another portion 400 of the computer program according to an embodiment of the invention. Referring to Fig. 4, instruction fetch flow 3105 corresponds to the last instruction of branch 310, the computer program further includes branch 410, and instruction fetch flow 4101 corresponds to the first instruction of branch 410. In some embodiments, if pb equals p2, IFU 103 continues to execute the subsequent instruction fetch flows in branch 310. While step 1 of instruction fetch flow 3105 is executed, LPU 105 predicts the next branch of branch 310, i.e., branch 410, and obtains the PC value p3 corresponding to the first instruction of branch 410 based on p2 and the offset value, where p3 may be stored in storage unit 107. One clock cycle after beginning execution of instruction fetch flow 3105, IFU 103 executes instruction fetch flow 4101 according to p3 to begin branch 410.
Fig. 5 shows a flow diagram of another portion 500 of the computer program according to an embodiment of the invention. Referring to Fig. 5, the computer program may also include a branch 510, where instruction fetch flow 5101 corresponds to the first instruction of branch 510 and pb corresponds to the first instruction of branch 510. In some embodiments, if pb is not equal to p2, then after branch 230 is complete, LPU 105 instructs IFU 103 to stop instruction fetch flows 3101 and 3103 of branch 310 and to execute instruction fetch flow 5101 based on pb, beginning branch 510 one clock cycle after step 1 of instruction fetch flow 3103.
In some embodiments, BEU 109 is configured to determine the next branch of the computer program to be executed and to send the PC value pe corresponding to the determined next branch to IFU 103. IFU 103 executes the instruction fetch flows in the determined next branch according to pe. Moreover, BEU 109 may also send an initialization signal to LPU 105, and LPU 105 may initialize upon receiving the initialization signal.
An embodiment of the invention provides a processor comprising an LPU, wherein the LPU predicts the next branch before the BPU predicts the next branch of the current branch and instructs the IFU to fetch the instructions in the predicted next branch, so that bubbles between one branch and the next in a loop program can be avoided, the delay can be reduced, and the execution efficiency of the loop program can be improved.
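The two outcomes shown in Fig. 4 and Fig. 5 amount to a comparison between pb and p2. The sketch below is one way to express that decision in software; `FetchDecision` and `check_prediction` are hypothetical names, and the `squash` flag is an illustrative stand-in for the instruction to stop the speculative fetch flows.

```python
from typing import NamedTuple


class FetchDecision(NamedTuple):
    next_pc: int   # PC value the instruction fetch unit should use next
    squash: bool   # True when the speculative fetch flows must be stopped


def check_prediction(pb: int, p2: int, offset: int) -> FetchDecision:
    """Compare the BPU's value pb with the LPU's value p2."""
    if pb == p2:
        # Fig. 4 case: keep fetching and run ahead to p3 = p2 + offset.
        return FetchDecision(next_pc=p2 + offset, squash=False)
    # Fig. 5 case: abandon the speculative branch and follow the BPU.
    return FetchDecision(next_pc=pb, squash=True)


print(check_prediction(pb=0x1020, p2=0x1020, offset=0x10))
print(check_prediction(pb=0x2000, p2=0x1020, offset=0x10))
```

In the matching case the returned value corresponds to p3, so the same stride is reused for each further branch of the loop.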
The embodiment of the invention also provides a loop program branch prediction method. The method identifies whether a current program is a loop program, and if the current program is the loop program, acquires a second PC value corresponding to a second branch to predict that the second branch is a next branch of a first branch before a BPU makes a prediction of the next branch of the first branch, wherein the current program comprises the first branch. FIG. 6 is a flowchart illustrating a method 600 for loop program branch prediction according to an embodiment of the invention.
In step 601, it is identified whether the number of instructions included in a third branch exceeds a threshold, wherein the current program further includes the third branch, and the third branch is the branch previous to the first branch. If so, initialization is performed. If the number of instructions included in the third branch does not exceed the threshold, then in some embodiments step 602 is executed to obtain, based on the third branch, an offset value and a first PC value corresponding to the first branch, where the offset value represents the offset between the first PC value and a third PC value corresponding to the third branch. In some embodiments, the threshold may be 8 or a value below 8.
In step 603, it is identified whether the instruction operations in the first branch are the same as those in the third branch. If they differ, initialization is performed. If they are the same, then in some embodiments step 604 is performed to obtain the second PC value based on the first PC value and the offset value.
In step 605, after the second PC value is obtained, the IFU is instructed to fetch the instructions in the second branch according to the second PC value, where the current program further includes the second branch.
In step 606, the BPU predicts the next branch of the first branch and obtains a predicted PC value, and it is determined whether the predicted PC value is consistent with the second PC value. If the two values are not consistent, then in some embodiments step 607 is performed to instruct the IFU to stop fetching instructions in the second branch and to fetch, according to the predicted PC value, the instructions in the branch corresponding to the predicted PC value. If the two values are consistent, then in some embodiments step 608 is performed to instruct the IFU, before the BPU predicts the next branch of the second branch, to fetch the instructions in a fourth branch based on the second PC value and the offset value, thereby predicting that the fourth branch is the next branch of the second branch, wherein the sum of the second PC value and the offset value corresponds to the fourth branch.
In some embodiments, initialization is performed if a prediction made by the BPU is determined to be incorrect.
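Method 600 as a whole can be modelled as a small state machine. The sketch below is an illustrative software model under stated assumptions, not the hardware described: the class and method names are hypothetical, instruction operations are modelled as lists of mnemonics, and the predictor's state is left unchanged in step 607 because the text does not say whether initialization also occurs there.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Branch:
    first_pc: int      # PC value of the branch's first instruction
    ops: List[str]     # instruction operations, modelled as mnemonics


@dataclass
class LoopPredictor:
    threshold: int = 8
    offset: Optional[int] = None
    predicted_pc: Optional[int] = None

    def initialize(self) -> None:
        self.offset = None
        self.predicted_pc = None

    def train(self, third: Branch, first: Branch) -> Optional[int]:
        """Steps 601-604: return the second PC value, or None after initializing."""
        if len(third.ops) > self.threshold:            # step 601
            self.initialize()
            return None
        offset = first.first_pc - third.first_pc       # step 602
        if first.ops != third.ops:                     # step 603
            self.initialize()
            return None
        self.offset = offset
        self.predicted_pc = first.first_pc + offset    # step 604
        return self.predicted_pc

    def verify(self, bpu_pc: int) -> int:
        """Steps 606-608: return the PC value to fetch from next."""
        if self.predicted_pc is None or self.offset is None:
            raise RuntimeError("verify() called before a successful train()")
        if bpu_pc != self.predicted_pc:                # step 607: follow the BPU
            return bpu_pc
        self.predicted_pc += self.offset               # step 608: fourth branch
        return self.predicted_pc


lpu = LoopPredictor()
body = ["load", "add", "branch"]
p2 = lpu.train(Branch(0x1000, body), Branch(0x1010, body))
print(hex(p2), hex(lpu.verify(0x1020)))
```

Calling `initialize()` from outside the class would model the case in which the execution unit finds the BPU's prediction to be wrong.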
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (18)

1. A processor comprising a branch prediction unit and a loop prediction unit, the loop prediction unit configured to:
identifying whether a third branch comprises a number of instructions that exceeds a threshold, wherein a program currently run by the processor comprises the third branch;
if the number of instructions included in the third branch exceeds the threshold, initializing;
identifying whether the currently running program is a loop program; and
if the currently running program is a loop program, before the branch prediction unit predicts a next branch of a first branch, a second program counter value corresponding to a second branch is obtained, wherein the currently running program further includes the first branch, the third branch is a previous branch of the first branch, and the loop prediction unit predicts that the second branch is the next branch of the first branch.
2. The processor of claim 1, wherein the threshold is 8 or a value less than 8.
3. The processor as in claim 1 wherein the loop prediction unit is further configured to: if the number of instructions included in the third branch does not exceed the threshold, based on the third branch, obtaining an offset value and a first program counter value corresponding to the first branch, where the offset value represents an offset between the first program counter value and a third program counter value corresponding to the third branch.
4. The processor as in claim 3 wherein the loop prediction unit configured to identify whether the currently running program is a loop program comprises:
identifying whether an instruction operation in the first branch is the same as an instruction operation in the third branch;
if the instruction operation in the first branch is the same as the instruction operation in the third branch, obtaining the second program counter value based on the first program counter value and the offset value;
and if the instruction operation in the first branch is different from the instruction operation in the third branch, initializing.
5. The processor of claim 4, further comprising an instruction fetch unit configured to: after the second program counter value is obtained, fetch the instructions in the second branch according to the second program counter value, wherein the currently running program further comprises the second branch.
6. The processor as in claim 5 wherein the branch prediction unit is configured to: predict a next branch of the first branch and send a predicted program counter value to the instruction fetch unit and the loop prediction unit.
7. The processor as in claim 6 wherein the loop prediction unit is further configured to:
determining whether the predicted program counter value and the second program counter value are consistent;
if the predicted program counter value and the second program counter value do not match, the instruction fetch unit is instructed to stop fetching instructions in the second branch, and instructions in the branch corresponding to the predicted program counter value are fetched based on the predicted program counter value.
8. The processor as in claim 6 wherein the loop prediction unit is further configured to:
if the predicted program counter value and the second program counter value match, instructing the instruction fetch unit to fetch an instruction in a fourth branch based on the second program counter value and the offset value before the branch prediction unit makes a prediction for a next branch of the second branch, wherein the fourth branch is the next branch of the second branch predicted by the loop prediction unit, and a sum of the second program counter value and the offset value corresponds to the fourth branch.
9. The processor of claim 1, further comprising an execution unit to instruct the loop prediction unit to initialize if the execution unit determines that the prediction made by the branch prediction unit is incorrect.
10. A method for loop program branch prediction, comprising:
identifying whether a number of instructions included in a third branch, which is included in the current program, exceeds a threshold;
if the number of instructions included in the third branch exceeds the threshold, initializing;
identifying whether the current program is a loop program; and
if the current program is a loop program, before the branch prediction unit predicts the next branch of the first branch, a second program counter value corresponding to a second branch is obtained to predict that the second branch is the next branch of the first branch, wherein the current program further comprises the first branch, and the third branch is the previous branch of the first branch.
11. The method of claim 10, wherein the threshold is 8 or a value less than 8.
12. The method of claim 10, further comprising: if the number of instructions included in the third branch does not exceed the threshold, based on the third branch, obtaining an offset value and a first program counter value corresponding to the first branch, where the offset value represents an offset between the first program counter value and a third program counter value corresponding to the third branch.
13. The method of claim 12, wherein identifying whether the current program is a loop program comprises:
identifying whether an instruction operation in the first branch is the same as an instruction operation in the third branch;
if the instruction operation in the first branch is the same as the instruction operation in the third branch, obtaining the second program counter value based on the first program counter value and the offset value;
and if the instruction operation in the first branch is different from the instruction operation in the third branch, initializing.
14. The method of claim 13, further comprising: after obtaining the second program counter value, instructing an instruction fetch unit to fetch instructions in the second branch according to the second program counter value, wherein the current program further includes the second branch.
15. The method of claim 14, further comprising: obtaining, by the branch prediction unit, a predicted program counter value after the branch prediction unit predicts a next branch of the first branch.
16. The method of claim 15, further comprising:
determining whether the predicted program counter value and the second program counter value are consistent;
if the predicted program counter value and the second program counter value are not consistent, instructing the instruction fetch unit to stop fetching instructions in the second branch and to fetch, according to the predicted program counter value, instructions in the branch corresponding to the predicted program counter value.
17. The method of claim 15, further comprising:
if the predicted program counter value and the second program counter value are consistent, instructing the instruction fetch unit to fetch, before the branch prediction unit makes a prediction for the next branch of the second branch, an instruction in a fourth branch based on the second program counter value and the offset value, thereby predicting that the fourth branch is the next branch of the second branch, wherein the sum of the second program counter value and the offset value corresponds to the fourth branch.
18. The method of claim 10, further comprising: initializing if the prediction made by the branch prediction unit is wrong.
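The steps recited in method claims 10 to 18 can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented hardware: the `Branch` and `LoopPredictor` names, the representation of a branch as a start program counter value plus a tuple of operation names, and the choice of 8 as the threshold are all hypothetical. The claims also do not say whether state is reset when the two program counter values disagree, so the sketch leaves it unchanged in that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

THRESHOLD = 8  # assumption: claim 11 only requires a threshold of at most 8


@dataclass(frozen=True)
class Branch:
    """Hypothetical model of a branch: start PC and its instruction operations."""
    pc: int
    ops: Tuple[str, ...]


class LoopPredictor:
    """Toy model of the loop prediction steps (all names are illustrative)."""

    def __init__(self, threshold: int = THRESHOLD) -> None:
        self.threshold = threshold
        self.initialize()

    def initialize(self) -> None:
        """Clear recorded state (the 'initializing' step of claims 10, 13, 18)."""
        self.offset: Optional[int] = None
        self.predicted_pc: Optional[int] = None

    def predict_next(self, third: Branch, first: Branch) -> Optional[int]:
        """Return the second program counter value, or None if no loop is seen."""
        # Claim 10: a previous (third) branch that is too long triggers a reset.
        if len(third.ops) > self.threshold:
            self.initialize()
            return None
        # Claim 13: the program is treated as a loop only if the operations
        # of the first branch repeat those of the third branch.
        if first.ops != third.ops:
            self.initialize()
            return None
        # Claim 12: offset between the first and third program counter values.
        self.offset = first.pc - third.pc
        # Claim 13: second program counter value = first value + offset.
        self.predicted_pc = first.pc + self.offset
        return self.predicted_pc

    def verify(self, bpu_pc: int) -> int:
        """Compare against the branch prediction unit; return the PC to fetch."""
        if self.predicted_pc is None or self.offset is None:
            return bpu_pc
        if bpu_pc != self.predicted_pc:
            # Claim 16: stop the speculative fetch and follow the predicted PC.
            return bpu_pc
        # Claim 17: on agreement, run ahead to the fourth branch.
        self.predicted_pc += self.offset
        return self.predicted_pc

    def on_misprediction(self) -> None:
        """Claim 18: a wrong branch prediction triggers initialization."""
        self.initialize()


if __name__ == "__main__":
    third = Branch(pc=0x100, ops=("load", "add", "store"))
    first = Branch(pc=0x110, ops=("load", "add", "store"))
    lp = LoopPredictor()
    print(hex(lp.predict_next(third, first)))  # second branch PC
    print(hex(lp.verify(0x120)))               # fourth branch PC on agreement
```

In this model the second program counter value is available as soon as the first branch is seen, which mirrors the claimed ordering: the fetch for the second branch can start before the branch prediction unit has produced its own predicted value, and `verify` only confirms or redirects afterwards.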
CN201611248875.9A 2016-12-29 2016-12-29 Processor and loop program branch prediction method Active CN108255518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611248875.9A CN108255518B (en) 2016-12-29 2016-12-29 Processor and loop program branch prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611248875.9A CN108255518B (en) 2016-12-29 2016-12-29 Processor and loop program branch prediction method

Publications (2)

Publication Number Publication Date
CN108255518A (en) 2018-07-06
CN108255518B (en) 2020-08-11

Family

ID=62721387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611248875.9A Active CN108255518B (en) 2016-12-29 2016-12-29 Processor and loop program branch prediction method

Country Status (1)

Country Link
CN (1) CN108255518B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658455B (en) * 2022-12-07 2023-03-21 北京开源芯片研究院 Processor performance evaluation method and device, electronic equipment and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8904155B2 (en) * 2006-03-17 2014-12-02 Qualcomm Incorporated Representing loop branches in a branch history register with multiple bits
US9146745B2 (en) * 2006-06-29 2015-09-29 Intel Corporation Method and apparatus for partitioned pipelined execution of multiple execution threads
US9710276B2 (en) * 2012-11-09 2017-07-18 Advanced Micro Devices, Inc. Execution of instruction loops using an instruction buffer
US20150074353A1 (en) * 2013-09-06 2015-03-12 Futurewei Technologies, Inc. System and Method for an Asynchronous Processor with Multiple Threading
CN105511838B (en) * 2014-09-29 2018-06-29 上海兆芯集成电路有限公司 Processor and its execution method

Also Published As

Publication number Publication date
CN108255518A (en) 2018-07-06

Similar Documents

Publication Publication Date Title
KR101827747B1 (en) Controlling the execution of adjacent instructions that are dependent upon a same data condition
EP2972798B1 (en) Method and apparatus for guest return address stack emulation supporting speculation
CN108595210B (en) Processor implementing zero overhead loops
US11163574B2 (en) Method for maintaining a branch prediction history table
KR20140010930A (en) Method and apparatus for providing efficient context classification
US11182318B2 (en) Processor and interrupt controller
US11288047B2 (en) Heterogenous computer system optimization
US20130080755A1 (en) Method for speeding up the boot time of electric device and electric device using the same
CN113196244A (en) Macro operation fusion
CN105190540A (en) Optimizing performance for context-dependent instructions
JP7232331B2 (en) loop end predictor
US20190369999A1 (en) Storing incidental branch predictions to reduce latency of misprediction recovery
US20140250289A1 (en) Branch Target Buffer With Efficient Return Prediction Capability
CN108255518B (en) Processor and loop program branch prediction method
US9652245B2 (en) Branch prediction for indirect jumps by hashing current and previous branch instruction addresses
KR102571623B1 (en) Branch target buffer with early return lookahead
US20150058602A1 (en) Processor with adaptive pipeline length
JP2010152843A (en) Circuit for estimating reliability of branch prediction and method thereof
CN112740175A (en) Load path history based branch prediction
US11579886B2 (en) System and method for multi-level classification of branches
US10956159B2 (en) Method and processor for implementing an instruction including encoding a stopbit in the instruction to indicate whether the instruction is executable in parallel with a current instruction, and recording medium therefor
CN113918225A (en) Instruction prediction method, instruction data processing apparatus, processor, and storage medium
US9489204B2 (en) Method and apparatus for precalculating a direct branch partial target address during a misprediction correction process
US9983879B2 (en) Operation of a multi-slice processor implementing dynamic switching of instruction issuance order
US9436473B2 (en) Scheduling program instructions with a runner-up execution position

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant