WO2019019719A1 - Branch prediction method and apparatus - Google Patents
Branch prediction method and apparatus
- Publication number
- WO2019019719A1 (PCT/CN2018/084134, CN2018084134W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- branch
- instruction
- instructions
- branch prediction
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
- G06F9/3806—Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3844—Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
Definitions
- the present application relates to the field of processor technologies, and in particular, to a branch prediction method and apparatus.
- In order to meet the ever-increasing user demand for processor performance, processors typically use pipelining, which overlaps the execution of instructions to increase efficiency. To achieve flexible execution of instructions, branch instructions are often used to change the order in which instructions are executed. Since the branch jump result of a branch instruction can only be obtained in a later pipeline stage, when the processor acquires a branch instruction it would need to stall the pipeline until the branch jump result of the branch instruction is obtained, and only then continue fetching instructions in the jump direction indicated by the branch jump result. However, stalling the pipeline interrupts its operation and creates "bubbles" in the pipeline, which greatly affects processor performance.
- Therefore, when the processor acquires a branch instruction, it generally performs branch prediction on the branch instruction in the early stage of the pipeline and continues to acquire instructions in the jump direction indicated by the branch prediction result, without waiting for the pipeline to return the branch jump result of the branch instruction. This can reduce the "bubbles" in the pipeline and improve the efficiency of the processor.
- branch prediction for branch instructions can be implemented through the Global History Register (GHR) and the Pattern History Table (PHT). It is assumed that k-bit history branch information is stored in the GHR.
- For the first branch instruction, the entry at one address in the PHT is accessed based on the k-bit information stored in the GHR, and the branch prediction result of the first branch instruction is determined according to that entry.
- For the second branch instruction, the entries at two consecutive addresses in the PHT are accessed based on the k-1 bits of the GHR other than the most significant bit; one of the two entries is selected based on the branch prediction result of the first branch instruction, and the branch prediction result of the second branch instruction is determined according to the selected entry.
- For the third branch instruction, the entries at four consecutive addresses in the PHT are accessed based on the k-2 bits of the GHR other than the two most significant bits; one of the four entries is selected based on the branch prediction results of the first and second branch instructions, and the branch prediction result of the third branch instruction is determined according to the selected entry.
- In this way, the branch prediction result of each of the plurality of branch instructions is determined sequentially.
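- As a rough, illustrative Python sketch of this conventional scheme (not taken from the patent): each later branch in the same cycle is predicted from a PHT index built from fewer GHR bits plus the predictions already made. The 2-bit saturating-counter entries are a common but here assumed choice.

```python
# Illustrative sketch of the conventional per-cycle multi-branch prediction described above.
# Assumptions (not from the patent): a k-bit GHR held in the integer `ghr`,
# `pht` is a list of 2**k two-bit saturating counters (0..3), "taken" when counter >= 2,
# and num_branches <= k.
def predict_branches_conventional(pht, ghr, k, num_branches):
    predictions = []
    for i in range(num_branches):
        # Keep the low (k - i) GHR bits (drop the highest i bits) ...
        history_bits = ghr & ((1 << (k - i)) - 1)
        # ... and append the i predictions already made this cycle,
        # which selects one of the 2**i consecutive PHT entries.
        index = history_bits
        for taken in predictions:
            index = (index << 1) | (1 if taken else 0)
        predictions.append(pht[index] >= 2)
    return predictions
```

This makes the drawback visible: branch i cannot be predicted before branches 1 through i-1 have been predicted, and several PHT entries may be touched in a single cycle.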
- In the above scheme, the branch prediction result of a branch instruction depends on the branch prediction results of all branch instructions before it, which makes the branch prediction process cumbersome and the control logic complicated; as a result, branch prediction for the multiple branch instructions is less efficient and consumes more power.
- Moreover, the number of PHT addresses accessed grows exponentially with the number of branch instructions: when there are two branch instructions, (2^0 + 2^1) addresses are accessed; when there are three, (2^0 + 2^1 + 2^2) addresses are accessed; and so on. The PHT therefore needs more entries and occupies more storage resources.
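- As a quick worked check of that growth (the loop below is only an illustration, not part of the patent): with m branch instructions the conventional scheme touches 2^0 + 2^1 + ... + 2^(m-1) = 2^m - 1 PHT addresses per cycle, whereas the method described below touches exactly one.

```python
# Worked arithmetic: PHT addresses accessed per cycle for m branch instructions.
def conventional_accesses(m):
    return sum(2 ** i for i in range(m))   # 2^0 + 2^1 + ... + 2^(m-1) = 2^m - 1

for m in (1, 2, 3, 4):
    print(m, conventional_accesses(m), 1)  # conventional scheme vs. one access per instruction cluster
# prints: 1 1 1 / 2 3 1 / 3 7 1 / 4 15 1
```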
- In a first aspect, a branch prediction method is provided. In this method:
- Only one piece of branch prediction information is obtained from the PHT, and from it the branch prediction result of each branch instruction among the plurality of valid instructions is obtained, which simplifies the branch prediction process and the control logic for the branch instructions among the plurality of valid instructions, and in turn improves branch prediction efficiency and reduces power consumption.
- In addition, only the entry at one address in the PHT needs to be accessed to perform branch prediction, that is, fewer PHT entries are needed, so the storage resources occupied by the PHT can be reduced.
- the obtaining the branch prediction information of the instruction cluster from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, includes:
- Since the address of each instruction cluster is generally preset, the address of the instruction cluster can be associated with an entry in the PHT. However, if the address of the instruction cluster is directly mapped to an entry in the PHT, each entry in the PHT can be used by only one instruction cluster for branch prediction; when the number of instruction clusters is large, the PHT needs many entries and occupies more storage resources. To solve this problem, the lower bits of the address of the instruction cluster are usually associated with the entries in the PHT, so that instruction clusters with the same lower address bits can use the same entry in the PHT for branch prediction, thereby reducing the number of PHT entries.
- However, instruction clusters with the same lower address bits will then access and modify the same entry in the PHT, causing the branch predictions of the branch instructions included in different instruction clusters to interfere with each other.
- the branch prediction information of the instruction cluster may be obtained from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs.
- Specifically, the lower bits of the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs may be combined by a hash function to obtain an index value.
- the hash function may include an addition hash, a bit operation hash (such as shift, XOR, etc.), a multiplication hash, a hybrid hash, and the like, which are not limited in this embodiment of the present invention.
- the index value may be a value corresponding to the entry in the PHT.
- the PHT stores not only multiple entries but also index values corresponding to the respective entries.
- Alternatively, the index value may be the address of an entry included in the PHT; this is not limited in this embodiment of the present invention.
- The index value is then used as an index to query the PHT, and the branch prediction information of the instruction cluster is obtained; that is, the entry corresponding to the index value is read from the PHT based on the index value.
- The determining, based on the branch prediction information of the instruction cluster, of the branch prediction result of each branch instruction among the plurality of valid instructions includes:
- when the number of branch instructions among the plurality of valid instructions is not less than n, determining, according to the order of the storage addresses of the plurality of valid instructions, that the branch prediction result of the nth branch instruction among the plurality of valid instructions is a jump, and determining that the branch prediction results of the first n-1 branch instructions among the plurality of valid instructions are no jump, where n is the jump instruction position indicated by the branch prediction information of the instruction cluster.
- the specified value may be set in advance, and the specified value is used to indicate that all branch instructions are predicted not to jump.
- The jump instruction position is denoted by n and is the position of the branch instruction that is predicted to jump. For an instruction cluster, the jump instruction position is the position of the predicted-to-jump branch instruction among all branch instructions of the plurality of valid instructions included in the instruction cluster, when those branch instructions are ordered by their storage addresses. For example, if all branch instructions among the plurality of valid instructions included in the instruction cluster are sorted by their storage addresses and the jump instruction position is 2, the branch instruction predicted to jump in the instruction cluster is the second of those branch instructions.
- When the number of branch instructions among the plurality of valid instructions is less than n, it indicates that the position of the last branch instruction among the plurality of valid instructions has not reached the jump instruction position indicated by the branch prediction information of the instruction cluster, so it can be determined that the branch prediction results of all branch instructions among the plurality of valid instructions are no jump.
- In this way, the branch prediction result of each branch instruction among the plurality of valid instructions in the instruction cluster can be determined from the branch prediction information of the instruction cluster alone. The branch prediction process and the control logic for the branch instructions among the plurality of valid instructions are therefore relatively simple, which improves branch prediction efficiency and reduces the power consumption of the branch prediction process.
- the method further includes:
- The historical branch information of the thread to which the instruction cluster belongs may be updated based on the branch prediction result of each branch instruction among the plurality of valid instructions, so that branch prediction for subsequently acquired branch instructions can continue to be performed normally based on the updated historical branch information.
- The entry of the PHT includes branch prediction information and status information, where the status information is used to indicate whether the branch prediction information is in a strong state or a weak state;
- the method further includes:
- When the branch prediction results of all branch instructions among the plurality of valid instructions are no jump but the branch jump result of a branch instruction among the plurality of valid instructions is a jump, or when the branch jump result of the branch instruction among the plurality of valid instructions whose branch prediction result is a jump is no jump: if the status information in the entry of the PHT in which the branch prediction information of the instruction cluster is located indicates a strong state, the status information in the entry is updated to status information indicating a weak state; if the status information in that entry indicates a weak state, the branch prediction information in the entry is updated to branch prediction information indicating the jump instruction position of the instruction cluster, where the jump instruction position of the instruction cluster is the position, among all branch instructions of the plurality of valid instructions ordered by their storage addresses, of the branch instruction whose branch jump result is a jump.
- That is, the PHT may be updated based on the branch jump result of each branch instruction among the plurality of valid instructions, so that branch prediction for subsequently acquired branch instructions can be performed based on the updated PHT, ensuring the accuracy of subsequent branch prediction.
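- A minimal Python sketch of this strong/weak correction update, under stated assumptions: the entry is modeled as a (prediction info, state) pair, 00 is the specified "no branch taken" value, and both the behavior on a correct prediction and the state kept after rewriting the prediction info are assumptions, since the text above only specifies the misprediction cases.

```python
STRONG, WEAK = 1, 0
SPECIFIED = 0b00   # example encoding: "all branch instructions predicted not to jump"

def correction_update(entry, mispredicted, actual_jump_position):
    """entry: (prediction_info, state) of the cluster's PHT entry.
    actual_jump_position: 1-based position of the branch that actually jumped,
    or None if no branch among the valid instructions jumped (illustrative convention)."""
    info, state = entry
    if not mispredicted:
        return (info, STRONG)          # assumption: a correct prediction strengthens the entry
    if state == STRONG:
        return (info, WEAK)            # described above: strong -> weak on a misprediction
    # Weak and mispredicted: rewrite the prediction info itself.  A real encoding would go
    # through a mapping such as Table 1 (e.g. position 1 -> info 01); kept direct here.
    new_info = SPECIFIED if actual_jump_position is None else actual_jump_position
    return (new_info, WEAK)            # assumption: the rewritten entry starts out weak
```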
- the obtaining the branch prediction information of the instruction cluster from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, includes:
- when branch prediction is performed on the branch instructions of multiple running threads based on the PHT, the branch prediction information of the instruction cluster is obtained from the PHT based on the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of the thread to which the instruction cluster belongs.
- In this way, the thread identifier of the thread is also used as a basis for selecting the entry in the PHT; that is, the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of that thread are used together, so that threads whose instruction clusters have the same lower address bits and the same historical branch information can still use different entries in the PHT for branch prediction, thereby improving the accuracy of branch prediction.
- the branch prediction information of the instruction cluster may be obtained from the PHT based on the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of the thread to which the instruction cluster belongs.
- the obtaining the branch prediction information of the instruction cluster from the PHT based on the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of the thread to which the instruction cluster belongs, includes:
- In a second aspect, a branch prediction apparatus is provided, having the function of implementing the behavior of the branch prediction method in the first aspect described above.
- the branch prediction apparatus includes at least one module for implementing the branch prediction method provided by the above first aspect.
- In a third aspect, a branch prediction apparatus is provided, including a processor and a memory, where the memory is configured to store a program that supports the branch prediction apparatus in performing the branch prediction method provided by the first aspect, and to store data related to implementing the branch prediction method described in the first aspect.
- the processor is configured to execute a program stored in the memory.
- the branch prediction device can also include a communication bus for establishing a connection between the processor and the memory.
- In a fourth aspect, a computer readable storage medium is provided, the storage medium storing instructions that, when run on a computer, cause the computer to perform the branch prediction method of the first aspect described above.
- In a fifth aspect, a computer program product comprising instructions is provided, which, when run on a computer, causes the computer to perform the branch prediction method of the first aspect described above.
- The technical solution provided by the present application has the following beneficial effects: after an instruction cluster to be executed in the current cycle is acquired, if a branch instruction exists among the plurality of valid instructions included in the instruction cluster, branch prediction is started for the branch instructions among the plurality of valid instructions. The branch prediction information of the instruction cluster is obtained from the PHT, and the branch prediction result of each branch instruction among the plurality of valid instructions is then determined based on the branch prediction information of the instruction cluster, thereby completing branch prediction for the branch instructions among the plurality of valid instructions.
- In this process, only one piece of branch prediction information is obtained from the PHT, and from it the branch prediction result of each branch instruction among the plurality of valid instructions is obtained, which simplifies the branch prediction process and the control logic for the branch instructions among the plurality of valid instructions, and in turn improves branch prediction efficiency and reduces power consumption. In addition, only the entry at one address in the PHT needs to be accessed to perform branch prediction, that is, fewer PHT entries are needed, so the storage resources occupied by the PHT can be reduced.
- FIG. 1A is a schematic diagram of an instruction cluster according to an embodiment of the present invention.
- FIG. 1B is a schematic diagram of a system architecture according to an embodiment of the present invention.
- FIG. 1C is a schematic diagram of a GHR according to an embodiment of the present invention.
- FIG. 1D is a schematic diagram of a PHT according to an embodiment of the present invention.
- FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
- FIG. 3A is a flowchart of a branch prediction method according to an embodiment of the present invention.
- FIG. 3B is a schematic diagram of another PHT according to an embodiment of the present invention.
- FIG. 3C is a flowchart of an operation for determining a branch prediction result of each branch instruction in a plurality of valid instructions according to an embodiment of the present invention
- FIG. 3D is a schematic diagram of a speculative update process of a GHR according to an embodiment of the present invention.
- FIG. 3E is a schematic diagram of a modified branch information according to an embodiment of the present invention.
- FIG. 3F is a schematic diagram of another modified branch information according to an embodiment of the present invention.
- FIG. 3G is a schematic diagram of a PHT correction and update process according to an embodiment of the present invention.
- FIG. 4A is a schematic structural diagram of a branch prediction apparatus according to an embodiment of the present invention.
- FIG. 4B is a schematic structural diagram of a second acquiring module according to an embodiment of the present invention.
- FIG. 4C is a schematic structural diagram of a determining module according to an embodiment of the present invention.
- FIG. 4D is a schematic structural diagram of another branch prediction apparatus according to an embodiment of the present invention.
- FIG. 4E is a schematic structural diagram of still another branch prediction apparatus according to an embodiment of the present invention.
- A pipeline usually consists of a plurality of associated pipeline stages, each of which has a dedicated functional module for processing instructions. The processing steps of the plurality of pipeline stages can be staggered in time, so that multiple instructions can be processed in parallel.
- For example, a five-stage pipeline includes five pipeline stages, which are, in order, Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB).
- Instruction cluster: the instructions stored in the memory can be divided into multiple instruction clusters according to their storage addresses. The address of each instruction cluster is the storage address of the first instruction included in that instruction cluster, and the address of each instruction cluster is aligned to a preset instruction boundary, where a preset number of instructions lie between every two adjacent instruction boundaries and the preset number of instructions included between every two adjacent instruction boundaries constitutes one instruction cluster; that is, each instruction cluster includes a preset number of instructions.
- For example, if the address of each instruction cluster is aligned to a 4-instruction boundary, four instructions are included between every two adjacent instruction boundaries, and the four instructions included between every two adjacent instruction boundaries form one instruction cluster; that is, each instruction cluster includes 4 instructions.
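- A small sketch of this alignment, assuming a fixed 4-byte instruction width (the instruction width is an assumption for illustration, not stated here): masking off the low-order address bits yields the address of the instruction cluster, which is also the storage address of the first instruction in the cluster.

```python
INSTR_BYTES = 4                      # assumed instruction width
CLUSTER_INSTRS = 4                   # preset number of instructions per cluster (example above)
CLUSTER_BYTES = INSTR_BYTES * CLUSTER_INSTRS

def cluster_address(instr_address):
    # Clear the low-order bits so the result is aligned to the 4-instruction boundary.
    return instr_address & ~(CLUSTER_BYTES - 1)

assert cluster_address(0x1004) == 0x1000   # second instruction of the cluster at 0x1000
assert cluster_address(0x100C) == 0x1000   # fourth instruction of the same cluster
assert cluster_address(0x1010) == 0x1010   # first instruction of the next cluster
```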
- the processor fetches instructions in units of instruction clusters. That is, the processor can only fetch instructions from one instruction cluster per cycle, and cannot acquire instructions across instruction clusters.
- Branch instruction: an instruction for changing the execution order of instructions. Normally, the processor sequentially fetches instructions from the memory and executes them in the order of their storage addresses in the memory. When the instructions fetched by the processor include a branch instruction, the branch instruction is executed to obtain a branch jump result. When the branch jump result is no jump, the execution order of the instructions is not changed, and the processor continues to acquire the instructions after the branch instruction and executes them in turn. When the branch jump result is a jump, the execution order of the instructions is changed; the processor no longer acquires the instructions located after the branch instruction in order, but instead sequentially acquires and executes instructions starting from the target storage address indicated by the branch instruction. For example, as shown in FIG. 1A:
- Suppose the processor acquires instruction cluster 1 from the memory in a certain cycle, instruction cluster 1 includes instruction 1, instruction 2, and instruction 3, and the next instruction cluster in the memory adjacent to instruction cluster 1 is instruction cluster 2. When the execution order is not changed, the processor continues to acquire instruction cluster 2 for execution in the next cycle. If, however, instruction 2 included in instruction cluster 1 is a branch instruction and the branch jump result obtained after executing instruction 2 is a jump, the processor will not continue to acquire instruction cluster 2 in the next cycle; instead, it will acquire and execute instructions in sequence starting from the target storage address indicated by instruction 2. Assuming that the instruction at the target storage address indicated by instruction 2 is instruction 16 and that instruction 16 belongs to instruction cluster 5, instruction cluster 5 is the instruction cluster to which instruction 2 jumps, and the processor will start acquiring and executing instructions from instruction cluster 5 in the next cycle.
- Valid instruction: an instruction required by the running thread during the current cycle. For an instruction cluster acquired by the processor, when the instruction cluster is acquired sequentially after its adjacent previous instruction cluster, all instructions in the instruction cluster are valid instructions. When the instruction cluster is the instruction cluster to which a branch instruction jumps, that is, when the instruction cluster is the instruction cluster to which the instruction at the target storage address indicated by the branch instruction belongs, the processor acquires and executes instructions starting from the target storage address indicated by the branch instruction, so the valid instructions are the instruction at the target storage address in the instruction cluster, the last instruction in the instruction cluster, and the instructions between the instruction at the target storage address and the last instruction. For example, as shown in FIG. 1A:
- Suppose instruction cluster 1 includes instruction 1, instruction 2, and instruction 3, and instruction cluster 1 is acquired sequentially after its adjacent previous instruction cluster; then instruction 1, instruction 2, and instruction 3 included in instruction cluster 1 are all valid instructions. Suppose the processor acquires instruction cluster 5 from the memory in a certain cycle, instruction cluster 5 includes instruction 15, instruction 16, and instruction 17, instruction cluster 5 is the instruction cluster to which the branch instruction in instruction cluster 1 jumps, and the instruction at the target storage address indicated by that branch instruction is instruction 16; then instruction 16 and instruction 17 in instruction cluster 5 are valid instructions.
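- As an illustrative sketch of how the valid instructions of a cluster could be selected (the list-based representation and the `target_offset` parameter are assumptions made for this example, not structures from the patent):

```python
def valid_instructions(cluster_instrs, target_offset=None):
    """cluster_instrs: the cluster's instructions ordered by storage address.
    target_offset: index of the jump-target instruction within the cluster when the
    cluster was reached by a taken branch, or None when it was reached sequentially."""
    if target_offset is None:
        return cluster_instrs                  # reached sequentially: every instruction is valid
    return cluster_instrs[target_offset:]      # reached by a jump: target through last instruction

# From the example above: instruction cluster 5 = [instruction 15, 16, 17], target is instruction 16.
assert valid_instructions(["i15", "i16", "i17"], target_offset=1) == ["i16", "i17"]
assert valid_instructions(["i1", "i2", "i3"]) == ["i1", "i2", "i3"]
```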
- Branch instructions are frequently occurring instructions in a thread that enable flexible execution of instructions, resulting in a variety of thread behavior.
- Branch prediction therefore comes into being. Branch prediction refers to predicting, in the early stages of the pipeline, whether a branch instruction will jump before the branch jump result of the branch instruction is obtained, so as to obtain a branch prediction result. The processor can then continue to acquire instructions in the jump direction indicated by the branch prediction result, so that "bubbles" in the pipeline can be effectively avoided. When the jump direction indicated by the branch prediction result is no jump, the processor continues to acquire, in order, the instructions after the branch instruction; when the branch prediction result is a jump, the jump direction indicated by the branch prediction result is a jump to the target storage address indicated by the branch instruction, and the processor acquires instructions in sequence starting from that target storage address.
- To this end, an embodiment of the present invention provides a branch prediction method, which simplifies the branch prediction process of branch instructions while reducing the number of PHT entries, thereby improving the branch prediction efficiency of branch instructions and reducing power consumption while occupying fewer storage resources.
- FIG. 1B is a schematic diagram of a system architecture according to an embodiment of the present invention.
- the system architecture can include a memory 101 and a processor 102.
- the memory 101 is used to store instructions, and the stored instructions belong to a plurality of instruction clusters.
- the processor 102 is configured to acquire a cluster of instructions to be executed in the cycle from the memory 101 in each cycle, and execute a plurality of valid instructions included in the instruction cluster to run the thread.
- the processor 102 is further configured to perform branch prediction on the branch instruction in the plurality of valid instructions to obtain a branch prediction result, so as to subsequently jump in the branch prediction result. Continue to get instructions in the direction.
- The processor 102 has a pipeline structure and may include a plurality of functional modules, such as an instruction fetching module 1021 and a decoding module 1022, where the instruction fetching module 1021 can perform operations such as acquiring instructions and branch prediction.
- Specifically, when the processor 102 performs branch prediction on the branch instructions among the plurality of valid instructions included in the acquired instruction cluster, it may obtain the branch prediction information of the instruction cluster from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, and determine, based on the branch prediction information of the instruction cluster, the branch prediction result of each branch instruction among the plurality of valid instructions.
- the historical branch information of a thread includes the branch result of the branch instruction acquired when the thread is run.
- Before a branch instruction is executed, the branch result of the branch instruction is the branch prediction result of the branch instruction; after the branch instruction is executed, the branch result of the branch instruction is the branch jump result of the branch instruction.
- For example, if the instructions acquired when thread 1 is run include instruction 2 and instruction 16, and instruction 2 and instruction 16 are branch instructions, the branch results of instruction 2 and instruction 16 may be included in the historical branch information of thread 1; for instance, the branch jump result of instruction 2 and the branch prediction result of instruction 16 may be included in the historical branch information of thread 1.
- The historical branch information of a thread can be recorded by the GHR of the thread. The GHR is a shift register; each thread run by a multi-threaded processor can have an independent GHR, and for each of the multiple threads, the thread's GHR can be updated based on the branch results of the thread's branch instructions.
- a branch result may be recorded on each bit of the GHR.
- the historical branch information recorded in the GHR of a thread may be 110...101, where 0 and 1 are branch results, and 0 means no jump, 1 means jump.
- The PHT is a table for predicting whether a branch instruction jumps. Multiple threads run by the multi-threaded processor can share one PHT, and for each of the multiple threads, the PHT can be updated based on the branch jump results of that thread's branch instructions.
- the PHT has multiple entries, and each entry includes branch prediction information and status information.
- the branch prediction information is used to predict whether the branch instruction jumps, and when the branch prediction information is a specified value, the branch prediction information is used to indicate that all branch instructions are predicted not to jump, and when the branch prediction information is not the specified value, the branch prediction The information is used to indicate the location of the jump instruction, which is the location of the branch instruction that is predicted to jump.
- The status information is used to indicate whether the branch prediction information is in a strong state or a weak state, and it reflects the prediction accuracy of the branch prediction information: when the status information indicates a strong state, the prediction accuracy of the branch prediction information is high; when the status information indicates a weak state, the prediction accuracy of the branch prediction information is low.
- For example, the plurality of entries in the PHT are 000, 001, ..., 110, 111, where the value of the highest 2 bits of each entry is the branch prediction information and the value of the lowest 1 bit is the status information, with status information 1 indicating a strong state and status information 0 indicating a weak state. In entry 000, the branch prediction information 00 indicates that all branch instructions are predicted not to jump, and the status information 0 indicates that the prediction accuracy of the branch prediction information 00 is low; in entry 001, the branch prediction information 00 indicates that all branch instructions are predicted not to jump, and the status information 1 indicates that the prediction accuracy of the branch prediction information 00 is high; ...; in entry 110, the branch prediction information 11 indicates the jump instruction position, and the status information 0 indicates that the prediction accuracy of the branch prediction information 11 is low; in entry 111, the branch prediction information 11 indicates the jump instruction position, and the status information 1 indicates that the prediction accuracy of the branch prediction information 11 is high.
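- A short sketch decoding the 3-bit example entries above into (branch prediction information, strong?); the bit layout follows the example, with 00 as the "all branches predicted not to jump" value:

```python
def decode_entry(entry_3bit):
    info = (entry_3bit >> 1) & 0b11    # highest 2 bits: branch prediction information
    strong = bool(entry_3bit & 0b1)    # lowest bit: 1 = strong state, 0 = weak state
    return info, strong

assert decode_entry(0b000) == (0b00, False)   # all branches predicted not to jump, weak
assert decode_entry(0b001) == (0b00, True)    # all branches predicted not to jump, strong
assert decode_entry(0b110) == (0b11, False)   # indicates a jump instruction position, weak
assert decode_entry(0b111) == (0b11, True)    # indicates a jump instruction position, strong
```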
- The operation of obtaining the branch prediction information of the instruction cluster from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, and of determining the branch prediction result of each branch instruction among the plurality of valid instructions based on the branch prediction information of the instruction cluster, will be explained in detail in the embodiment of FIG. 3A below.
- FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
- the system architecture in FIG. 1 can be implemented by using the computer device shown in FIG. 2.
- the computer device includes at least one processor 201, a communication bus 202, a memory 203, and at least one communication interface 204.
- The processor 201 can be a general purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
- Communication bus 202 communicates information between the above components.
- The memory 203 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It can also be an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- Memory 203 may be present independently and coupled to processor 201 via communication bus 202.
- the memory 203 can also be integrated with the processor 201.
- The communication interface 204 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN), or a Wireless Local Area Network (WLAN).
- The processor 201 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 2.
- A computer device can include multiple processors, such as the processor 201 and the processor 205 shown in FIG. 2. Each of these processors can be a single-core processor (CPU) or a multi-core processor (multi-CPU).
- processors herein may refer to one or more devices, circuits, and/or processing cores for processing data, such as computer program instructions.
- the computer device may further include an output device 206 and an input device 207.
- Output device 206 is in communication with processor 201 for displaying information.
- Output device 206 can be a variety of display devices.
- the output device 206 can be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector.
- Input device 207 is in communication with processor 201 and can receive user input in a variety of ways.
- input device 207 can be a mouse, keyboard, touch screen device, or sensing device, and the like.
- the computer device described above may be a general purpose computer device or a special purpose computer device.
- the computer device may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device.
- Embodiments of the invention do not limit the type of computer device.
- the memory 203 is configured to store the program code 210 for executing the solution of the present application, and the processor 201 is configured to execute the program code 210 stored in the memory 203.
- the computer device can implement the branch prediction method provided by the embodiment of FIG. 3A below through the processor 201 and the program code 210 in the memory 203.
- FIG. 3A is a flowchart of a branch prediction method according to an embodiment of the present invention. The method is applied to a computer device, and may be specifically applied to a processor in a computer device. Referring to FIG. 3A, the method includes:
- Step 301 Acquire an instruction cluster to be executed in a current period, where the instruction cluster includes multiple valid instructions.
- an address of an instruction cluster to be executed in a current period may be acquired, and the instruction cluster is acquired based on an address of the instruction cluster.
- Specifically, the storage address stored in the program counter (PC) of the running thread may be obtained, and the obtained storage address is determined as the address of the instruction cluster to be executed in the current cycle.
- Then, the address of the instruction cluster may be used as a starting storage address, and the instructions at a preset number of consecutive storage addresses starting from that starting storage address are obtained; the multiple instructions obtained form the instruction cluster.
- The preset number may be set in advance; for example, the preset number may be 4.
- the plurality of valid instructions are instructions required by the thread to run during the current period.
- When the instruction cluster is acquired sequentially after its adjacent previous instruction cluster, the plurality of valid instructions are all instructions in the instruction cluster; when the instruction cluster is the instruction cluster to which a branch instruction jumps, the plurality of valid instructions are the instruction in the instruction cluster at the target storage address indicated by the branch instruction, the last instruction in the instruction cluster, and the instructions between the instruction at the target storage address and the last instruction.
- Step 302 When there is a branch instruction in the plurality of valid instructions, obtain branch prediction information of the instruction cluster from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs.
- the thread to which the instruction cluster belongs is the running thread described in step 301.
- That is, the thread to which the instruction cluster belongs can be run by executing the multiple valid instructions in the instruction cluster.
- The historical branch information of the thread to which the instruction cluster belongs includes the branch results of the branch instructions acquired when the thread is run. Before a branch instruction is executed, its branch result is the branch prediction result of the branch instruction; after the branch instruction is executed, its branch result is the branch jump result of the branch instruction.
- For example, if the instructions acquired when the thread is run include instruction 1, instruction 2, and instruction 16, and instruction 2 and instruction 16 are branch instructions, the branch results of instruction 2 and instruction 16 may be included in the historical branch information of the thread; for instance, the branch jump result of instruction 2 and the branch prediction result of instruction 16 may be included in the historical branch information of the thread.
- the historical branch information of the thread can be recorded by the GHR of the thread, and a branch result can be recorded on each bit of the thread's GHR.
- the PHT has a plurality of entries, each of which includes branch prediction information and status information.
- the branch prediction information is used to predict whether the branch instruction jumps, and when the branch prediction information is a specified value, the branch prediction information is used to indicate that all branch instructions are predicted not to jump, and when the branch prediction information is not the specified value, the branch prediction The information is used to indicate the location of the jump instruction, which is the location of the branch instruction that is predicted to jump.
- The status information is used to indicate whether the branch prediction information is in a strong state or a weak state, and it reflects the prediction accuracy of the branch prediction information: when the status information indicates a strong state, the prediction accuracy of the branch prediction information is high; when the status information indicates a weak state, the prediction accuracy of the branch prediction information is low.
- Based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, the operation of obtaining the branch prediction information of the instruction cluster from the PHT may be implemented in the following two manners.
- The first mode is: determining an index value based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, and using the index value as an index to query the PHT to obtain the branch prediction information of the instruction cluster.
- Specifically, the lower bits of the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs may be combined by a hash function to obtain the index value.
- the hash function may include an addition hash, a bit operation hash (such as shift, XOR, etc.), a multiplication hash, a hybrid hash, and the like, which are not limited in this embodiment of the present invention.
- The index value may be a value corresponding to an entry in the PHT; as shown in FIG. 3B, in this case the PHT stores not only multiple entries but also the index value corresponding to each entry. Alternatively, the index value may be the address of an entry included in the PHT. This is not limited in this embodiment of the present invention.
- The index value is then used as an index to query the PHT, and the branch prediction information of the instruction cluster is obtained; that is, the corresponding entry is read from the PHT based on the index value.
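- A minimal sketch of this first mode, combining the lower bits of the cluster address with the GHR value through a simple bit-operation (XOR) hash; the choice of XOR, the 10-bit index width, and the 3-bit entry encoding are illustrative assumptions, since the text above allows any of several hash functions.

```python
PHT_INDEX_BITS = 10
PHT_SIZE = 1 << PHT_INDEX_BITS
pht = [0b000] * PHT_SIZE                         # 3-bit entries as in the example above

def pht_index(cluster_address, ghr_value):
    low_bits = cluster_address & (PHT_SIZE - 1)  # lower bits of the instruction cluster address
    history = ghr_value & (PHT_SIZE - 1)         # recent historical branch information from the GHR
    return (low_bits ^ history) & (PHT_SIZE - 1) # bit-operation (XOR) hash -> index value

entry = pht[pht_index(cluster_address=0x1F40, ghr_value=0b1101)]   # branch prediction info + state
```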
- Since the address of each instruction cluster is generally preset, the address of the instruction cluster can be associated with an entry in the PHT. However, if the address of the instruction cluster is directly mapped to an entry in the PHT, each entry in the PHT can be used by only one instruction cluster for branch prediction; when the number of instruction clusters is large, the PHT needs many entries and occupies more storage resources. To solve this problem, the lower bits of the address of the instruction cluster are usually associated with the entries in the PHT, so that instruction clusters with the same lower address bits can use the same entry in the PHT for branch prediction, thereby reducing the number of PHT entries. However, instruction clusters with the same lower address bits will then access and modify the same entry in the PHT, causing the branch predictions of the branch instructions included in different instruction clusters to interfere with each other.
- To prevent the branch predictions of the branch instructions included in different instruction clusters from interfering with each other, the historical branch information of the thread to which the instruction cluster belongs can be combined with the address of the instruction cluster when accessing or modifying the PHT, so that instruction clusters with the same lower address bits can use different entries in the PHT for branch prediction, thereby improving the accuracy of branch prediction.
- the branch prediction information of the instruction cluster can be obtained from the PHT by using the foregoing first manner.
- The second mode is: when branch prediction is performed on the branch instructions of multiple running threads based on the PHT, the branch prediction information of the instruction cluster is obtained from the PHT based on the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of the thread to which the instruction cluster belongs.
- the thread identifier is used to uniquely identify the thread, and the thread identifier of each thread of the multiple threads may be set in advance.
- Specifically, the index value may be determined based on the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of the thread to which the instruction cluster belongs; the index value is then used as an index to query the PHT, and the branch prediction information of the instruction cluster is obtained.
- For example, the lower bits of the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of that thread may be combined by a hash function to obtain the index value.
- The operation of querying the PHT with the index value as an index to obtain the branch prediction information of the instruction cluster is the same as the corresponding operation in the first mode, and details are not described again in this embodiment of the present invention.
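- A variant of the earlier index sketch for this second mode, folding the thread identifier into the hash so that threads with identical lower address bits and identical history still tend to map to different entries; the multiplicative spreading constant is purely illustrative.

```python
PHT_INDEX_BITS = 10
PHT_SIZE = 1 << PHT_INDEX_BITS

def pht_index_smt(cluster_address, ghr_value, thread_id):
    low_bits = cluster_address & (PHT_SIZE - 1)
    history = ghr_value & (PHT_SIZE - 1)
    tid_bits = (thread_id * 0x9E37) & (PHT_SIZE - 1)     # spread the small thread id across the index bits
    return (low_bits ^ history ^ tid_bits) & (PHT_SIZE - 1)
```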
- In this way, the thread identifier of the thread is also used as a basis for selecting the entry in the PHT; that is, the address of the instruction cluster, the historical branch information of the thread to which the instruction cluster belongs, and the thread identifier of the thread to which the instruction cluster belongs are used together, so that threads whose instruction clusters have the same lower address bits and the same historical branch information can still use different entries in the PHT for branch prediction, thereby improving the accuracy of branch prediction.
- the branch prediction information of the instruction cluster can be obtained from the PHT by using the foregoing second manner.
- Step 303 Determine, according to the branch prediction information of the instruction cluster, a branch prediction result of each of the plurality of valid instructions.
- step 303 may include the following steps 3031-3036.
- Step 3031 Determine whether the branch prediction information of the instruction cluster is a specified value.
- the specified value may be set in advance, and the specified value is used to indicate that all branch instructions are predicted not to jump, for example, the specified value may be 00 or the like.
- If the branch prediction information of the instruction cluster is the specified value, the following step 3032 may be performed to determine the branch prediction result of each branch instruction among the plurality of valid instructions.
- If the branch prediction information of the instruction cluster is not the specified value, the following step 3033 may be performed to determine the branch prediction result of each branch instruction among the plurality of valid instructions.
- Step 3032 When the branch prediction information of the instruction cluster is a specified value, it is determined that the branch prediction result of all the branch instructions in the plurality of valid instructions does not jump.
- That is, when the branch prediction information of the instruction cluster is the specified value, it may be determined that the branch prediction results of all branch instructions among the plurality of valid instructions in the instruction cluster are no jump.
- For example, assume that the branch prediction information of the instruction cluster is the specified value, and that the plurality of valid instructions included in the instruction cluster are instruction 2, instruction 3, and instruction 4, where instruction 3 and instruction 4 are branch instructions. It may then be determined that the branch prediction results of all branch instructions among the plurality of valid instructions (that is, instruction 3 and instruction 4) are no jump.
- Step 3033 When the branch prediction information of the instruction cluster is not the specified value, determine the jump instruction position n indicated by the branch prediction information of the instruction cluster.
- The jump instruction position is denoted by n and is the position of the branch instruction that is predicted to jump. For an instruction cluster, the jump instruction position is the position of the predicted-to-jump branch instruction among all branch instructions of the plurality of valid instructions included in the instruction cluster, when those branch instructions are ordered by their storage addresses. For example, if all branch instructions among the plurality of valid instructions included in the instruction cluster are sorted by their storage addresses and the jump instruction position is 2, the branch instruction predicted to jump in the instruction cluster is the second of those branch instructions.
- For example, if the branch prediction information of the instruction cluster is 01, the corresponding jump instruction position 1 can be obtained from the correspondence between branch prediction information and jump instruction positions shown in Table 1 below, and 1 is the jump instruction position indicated by the branch prediction information of the instruction cluster.
- Step 3034 Determine whether the number of branch instructions in the plurality of valid instructions is less than n.
- If the number of branch instructions among the plurality of valid instructions is less than n, it indicates that the position of the last branch instruction among the plurality of valid instructions has not reached the jump instruction position indicated by the branch prediction information of the instruction cluster.
- If the number of branch instructions among the plurality of valid instructions is not less than n, it indicates that the position of the last branch instruction among the plurality of valid instructions has reached the jump instruction position indicated by the branch prediction information of the instruction cluster.
- In the former case, the following step 3035 may be performed to determine the branch prediction result of each branch instruction among the plurality of valid instructions.
- In the latter case, the following step 3036 may be performed to determine the branch prediction result of each branch instruction among the plurality of valid instructions.
- Step 3035 When the number of branch instructions in the plurality of valid instructions is less than n, it is determined that the branch prediction results of all the branch instructions in the plurality of valid instructions are not jumping.
- For example, assume n is 3 and the plurality of valid instructions are instruction 2, instruction 3, and instruction 4, where instruction 3 and instruction 4 are branch instructions, so the number of branch instructions among the plurality of valid instructions is 2. Since the number of branch instructions among the plurality of valid instructions is less than n, it can be determined that the branch prediction results of all branch instructions (that is, instruction 3 and instruction 4) among the plurality of valid instructions are no jump.
- Step 3036 When the number of the branch instructions in the plurality of valid instructions is not less than n, determining, according to the order of the storage addresses of the plurality of valid instructions, the branch prediction result of the nth branch instruction of the plurality of valid instructions In order to jump, it is determined that the branch prediction results of the first n-1 branch instructions of the plurality of valid instructions are all non-jumping.
- For example, assume n is 2 and the plurality of valid instructions are, in the order of their storage addresses, instruction 2, instruction 3, and instruction 4, where instruction 3 and instruction 4 are branch instructions, so the number of branch instructions among the plurality of valid instructions is 2. Since the number of branch instructions among the plurality of valid instructions is not less than n, it may be determined, according to the order of the storage addresses of the plurality of valid instructions, that the branch prediction result of the second branch instruction (that is, instruction 4) among the plurality of valid instructions is a jump, and that the branch prediction result of the preceding branch instruction (that is, instruction 3) is no jump.
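- A sketch tying steps 3031-3036 together, as discussed above: given the decoded branch prediction information (either the specified value or a jump instruction position n, e.g. via a mapping like Table 1) and the number of branch instructions among the valid instructions in storage-address order, it returns one prediction per relevant branch. The None convention for the specified value is an assumption made for this example.

```python
def predict_cluster(jump_position, num_branches):
    """jump_position: None when the branch prediction information is the specified value
    (all branches predicted not to jump), otherwise the 1-based jump instruction position n.
    Returns booleans in storage-address order (True = predicted to jump)."""
    if jump_position is None:                  # step 3032: specified value
        return [False] * num_branches
    n = jump_position                          # step 3033: decoded position n
    if num_branches < n:                       # steps 3034/3035: position not reached
        return [False] * num_branches
    # Step 3036: first n-1 branches predicted not to jump, nth predicted to jump;
    # branches after the nth lie beyond the predicted jump and are not predicted here.
    return [False] * (n - 1) + [True]

assert predict_cluster(None, 2) == [False, False]   # specified value
assert predict_cluster(3, 2) == [False, False]      # example above with n = 3
assert predict_cluster(2, 2) == [False, True]       # example above with n = 2
```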
- Further, after the branch prediction result of each branch instruction among the plurality of valid instructions is determined, the historical branch information of the thread to which the instruction cluster belongs may be updated based on these branch prediction results, so that branch prediction for subsequently acquired branch instructions can be performed normally.
- the operation of updating the historical branch information of the thread to which the instruction cluster belongs is based on the branch prediction result of each of the plurality of valid instructions, which will be described in detail in the following embodiments.
- In addition, the plurality of valid instructions may be executed to obtain the branch jump result of each branch instruction among the plurality of valid instructions, and the historical branch information of the thread to which the instruction cluster belongs and the PHT may be updated based on the branch jump result of each branch instruction among the plurality of valid instructions, so as to ensure the accuracy of branch prediction for subsequently acquired branch instructions.
- the operation of updating the historical branch information and the PHT of the thread to which the instruction cluster belongs based on the branch jump result of each of the plurality of valid instructions will be described in detail in the following embodiments.
- When the processor runs multiple threads, branch prediction may be performed on the branch instructions of each thread through the foregoing steps 301-303. In this case, the historical branch information of each of the multiple threads is recorded independently, and the PHT can be shared by the multiple threads, thereby saving storage resources while ensuring the accuracy of branch prediction.
- In this embodiment of the present invention, after the instruction cluster to be executed in the current cycle is obtained, if a branch instruction exists among the plurality of valid instructions included in the instruction cluster, branch prediction is started on the branch instructions among the plurality of valid instructions. The branch prediction information of the instruction cluster is first obtained from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, and the branch prediction result of each branch instruction among the plurality of valid instructions is then determined based on that branch prediction information, thereby completing branch prediction for the branch instructions among the plurality of valid instructions. Only one piece of branch prediction information is obtained from the PHT to yield the branch prediction result of every branch instruction among the plurality of valid instructions, which simplifies the branch prediction process and the control logic, improves branch prediction efficiency, and reduces power consumption. In addition, because only the entry at a single address in the PHT needs to be accessed during branch prediction of these branch instructions, branch prediction can be completed with a PHT that has fewer entries, so the storage resources occupied by the PHT can be reduced.
- It should be noted that the historical branch information of the thread to which the instruction cluster belongs may be both speculatively updated and correctively updated in this embodiment. The operation described in step 303 of updating the historical branch information based on the branch prediction result of each branch instruction among the plurality of valid instructions is the speculative update of the historical branch information; the operation of updating the historical branch information based on the branch jump result of each branch instruction among the plurality of valid instructions is the corrective update. The speculative update and the corrective update of the historical branch information of the thread to which the instruction cluster belongs are described separately below.
- Specifically, the speculative update of the historical branch information of the thread to which the instruction cluster belongs may be performed as follows: in the historical branch information of the thread to which the instruction cluster belongs, the branch prediction result of each branch instruction among the plurality of valid instructions is added, and the m branch results with the earliest storage time are deleted, where m is the number of branch instructions among the plurality of valid instructions.
- In practice, the historical branch information of a thread is usually recorded by the GHR of the thread. When the branch prediction result of a branch instruction of the thread is obtained, that branch prediction result may be saved into the lowest bit (or the highest bit) of the thread's GHR, and the branch results previously saved in the GHR are shifted left (or right) by one bit, so that the branch result with the earliest storage time is shifted out of the GHR, completing the speculative update of the GHR.
- For example, the plurality of valid instructions are instruction 2, instruction 3, and instruction 4, and instruction 3 and instruction 4 are branch instructions. As shown in FIG. 3D, assuming the branch prediction result of instruction 3 is no-jump (denoted by 0) and the branch prediction result of instruction 4 is a jump (denoted by 1), the branch prediction results of instruction 3 and instruction 4 may be saved into the lowest 2 bits of the GHR of the thread to which the instruction cluster belongs, and the branch results previously saved in the GHR are shifted left by two bits, so that the 2 branch results with the earliest storage time are shifted out of the GHR, completing the speculative update of the GHR. A sketch of this shift-register update is given below.
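A minimal sketch of the speculative GHR update described above, modelling the GHR as a fixed-width bit vector into which the per-branch prediction results are shifted. Representing the history as a Python list with the newest result at the end, and the function name itself, are illustrative assumptions.

```python
def speculative_update(ghr, predictions):
    """Shift the predicted outcomes of a cluster's branches into the GHR.

    ghr         -- list of 0/1 branch results, oldest first, fixed length
    predictions -- per-branch prediction results of the cluster, in
                   storage-address order (0 = no jump, 1 = jump)
    Returns the updated GHR; the m oldest results are dropped, where m is
    the number of branch instructions in the cluster.
    """
    m = len(predictions)
    return ghr[m:] + list(predictions)


# Example: a 6-bit history 110101; shifting in instruction 3's result (0)
# and instruction 4's result (1) drops the two oldest bits.
ghr = [1, 1, 0, 1, 0, 1]
assert speculative_update(ghr, [0, 1]) == [0, 1, 0, 1, 0, 1]
```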
- Specifically, the corrective update of the historical branch information of the thread to which the instruction cluster belongs may be performed as follows: for each branch instruction among the plurality of valid instructions, when the branch prediction result of the branch instruction is the same as its branch jump result, the historical branch information of the thread to which the instruction cluster belongs is not updated; when the branch prediction result of the branch instruction differs from its branch jump result, the corrective branch information corresponding to the branch instruction is obtained, and the last branch result of that corrective branch information is flipped to obtain the historical branch information of the thread to which the instruction cluster belongs.
- Flipping the last branch result of the corrective branch information corresponding to the branch instruction means updating the last branch result to the opposite jump state. For example, when the last branch result is a jump, flipping it means updating it to no-jump; when the last branch result is no-jump, flipping it means updating it to a jump.
- It should be noted that, when the historical branch information of the thread to which the instruction cluster belongs is speculatively updated based on the branch prediction result of each branch instruction among the plurality of valid instructions, the branch prediction results are usually added to the historical branch information one by one in the order of the storage addresses of those branch instructions, and the m branch results with the earliest storage time are deleted one by one at the same time. That is, in the order of the storage addresses of the branch instructions among the plurality of valid instructions, each time one branch instruction's prediction result is added to the historical branch information, the branch result with the earliest storage time is deleted. In this case, for each branch instruction among the plurality of valid instructions, the corrective branch information corresponding to that branch instruction is the branch information obtained when the branch prediction result of that branch instruction is added to the historical branch information of the thread to which the instruction cluster belongs and, at the same time, the branch result with the earliest storage time is deleted.
- For example, as shown in FIG. 3E, the historical branch information of the thread to which the instruction cluster belongs is 110...101, the branch instructions among the plurality of valid instructions are, in the order of their storage addresses, instruction 3 and instruction 4, the branch prediction result of instruction 3 is no-jump (denoted by 0), and the branch prediction result of instruction 4 is a jump (denoted by 1). When the historical branch information is speculatively updated based on these branch prediction results, the prediction results of instruction 3 and instruction 4 are added to the historical branch information in sequence, and the two branch results with the earliest storage time are deleted in sequence. The corrective branch information corresponding to instruction 3 is therefore the branch information 100...010 obtained when the branch prediction result 0 of instruction 3 is added to the historical branch information and the earliest-stored branch result 1 is deleted at the same time; the corrective branch information corresponding to instruction 4 is the branch information 001...101 obtained when the branch prediction result 1 of instruction 4 is added and the earliest-stored branch result 1 is deleted at the same time.
- Because the historical branch information of the thread to which the instruction cluster belongs is usually recorded by the thread's GHR and is used for branch prediction, in order to ensure that the corrective branch information corresponding to each branch instruction can still be obtained from the GHR after it has been speculatively updated based on the branch prediction results of the branch instructions among the plurality of valid instructions, a first number of additional bits may be added to the GHR in this embodiment. The branch results recorded on these additional bits are not used for branch prediction of branch instructions, that is, the information on these bits is not the historical branch information used for branch prediction; the branch results recorded on the bits other than the additional bits are used for branch prediction, that is, the information on those other bits is the historical branch information used for branch prediction. In this case, when the GHR is speculatively updated, after the branch prediction result of each branch instruction among the plurality of valid instructions is saved into the GHR, the m branch results with the earliest storage time are not shifted out of the GHR but are moved onto the additional bits. Afterwards, for each branch instruction among the plurality of valid instructions, the second number of consecutive branch results in the speculatively updated GHR that take the branch prediction result of that branch instruction as their last branch result form the corrective branch information corresponding to that branch instruction.
- It should be noted that the first number may be set in advance and, to ensure that the corrective branch information corresponding to each branch instruction among the plurality of valid instructions can be obtained subsequently, the first number may be not less than the number of instructions included in the instruction cluster minus 1, that is, not less than the preset value minus 1. The second number is the number of branch results included in the historical branch information.
- For example, as shown in FIG. 3F, the dashed box of the GHR of the thread to which the instruction cluster belongs marks the additional first number of bits; the branch results recorded on these bits are not used for branch prediction of branch instructions, while the branch results recorded on the other bits (the solid-line box) are used for branch prediction, that is, the information on the bits other than the additional bits is the historical branch information used for branch prediction.
- Assuming the branch instructions among the plurality of valid instructions are, in the order of their storage addresses, instruction 3 and instruction 4, the branch prediction result of instruction 3 is no-jump (denoted by 0), and the branch prediction result of instruction 4 is a jump (denoted by 1), then after the GHR is speculatively updated based on these branch prediction results, the branch results recorded in the GHR may read 011001...101 in sequence. The branch results recorded on the additional bits are 011, the historical branch information is 001...101, and the last two bits of the GHR record the branch prediction results of instruction 3 and instruction 4 in sequence. The second number of consecutive branch results in the speculatively updated GHR that take the branch prediction result 0 of instruction 3 as their last branch result then form the corrective branch information 100...010 corresponding to instruction 3, and the second number of consecutive branch results that take the branch prediction result 1 of instruction 4 as their last branch result form the corrective branch information 001...101 corresponding to instruction 4.
- Afterwards, if the branch prediction result of instruction 3 differs from its branch jump result, the last branch result 0 of the corrective branch information 100...010 corresponding to instruction 3 is flipped to 1, yielding the historical branch information 100...011 of the thread to which the instruction cluster belongs. If the branch prediction result of instruction 4 differs from its branch jump result, the last branch result 1 of the corrective branch information 001...101 corresponding to instruction 4 is flipped to 0, yielding the historical branch information 001...100 of the thread to which the instruction cluster belongs. A sketch of this corrective update is given below.
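A minimal sketch of the corrective update described above, assuming the extended GHR is kept as a list of 0/1 results (oldest first) whose front bits are the additional, non-predictive bits. The function names and the way a mispredicted branch is identified by its position among the cluster's branch instructions are illustrative assumptions.

```python
def corrective_branch_info(ext_ghr, history_len, branch_pos, num_branches):
    """Slice the corrective branch information of one branch out of the
    speculatively updated, extended GHR (the additional front bits keep the
    oldest results from being lost).

    ext_ghr      -- extended GHR after the speculative update, oldest bit first
    history_len  -- the "second number": bits of real historical branch info
    branch_pos   -- 0-based position of the branch among the cluster's branches
    num_branches -- number of branch instructions in the cluster
    The returned slice ends with the bit holding this branch's own prediction.
    """
    end = len(ext_ghr) - (num_branches - 1 - branch_pos)
    return ext_ghr[end - history_len:end]


def corrective_update(ext_ghr, history_len, branch_pos, num_branches):
    """On a misprediction of the given branch, flip the last result of its
    corrective branch information to obtain the repaired history."""
    info = corrective_branch_info(ext_ghr, history_len, branch_pos, num_branches)
    return info[:-1] + [info[-1] ^ 1]   # flip jump <-> no-jump


# Example: 3 additional bits 011 in front, 6 bits of history 001101, and two
# branches; a misprediction of the second branch flips the final bit.
ext = [0, 1, 1, 0, 0, 1, 1, 0, 1]
assert corrective_update(ext, 6, 1, 2) == [0, 0, 1, 1, 0, 0]
```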
- Further, after the last branch result of the corrective branch information corresponding to the mispredicted branch instruction is flipped to obtain the historical branch information of the thread to which the instruction cluster belongs, the instructions fetched after the branch prediction result of that branch instruction was obtained may be discarded, branch prediction may be performed, based on the repaired historical branch information, on the branch instructions fetched after that branch instruction is executed, and instruction fetching then continues based on the resulting branch prediction results, so as to ensure the accuracy of the fetched instructions. A brief sketch of this recovery flow follows.
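The following is a rough sketch, under stated assumptions, of the misprediction recovery flow just described: squash the speculatively fetched instructions, repair the thread's history, and resume fetching on the resolved direction. The pipeline and fetch interfaces, as well as the attributes on `thread` and `branch`, are hypothetical placeholders, not APIs defined by the embodiment.

```python
def recover_from_misprediction(thread, branch, resolved_taken, pipeline):
    """Recovery after a branch's jump result contradicts its prediction.

    thread         -- per-thread state holding the (extended) GHR
    branch         -- the mispredicted branch, carrying its corrective
                      branch information (hypothetical attribute)
    resolved_taken -- the actual branch jump result (True = jump)
    pipeline       -- hypothetical front-end handle used to squash and redirect
    """
    # Discard the instructions fetched after this branch was predicted.
    pipeline.squash_younger_than(branch)
    # Repair the history: flip the last result of the corrective branch info.
    info = branch.corrective_branch_info
    thread.history = info[:-1] + [info[-1] ^ 1]
    # Continue fetching on the resolved direction.
    target = branch.target_address if resolved_taken else branch.next_address
    pipeline.redirect_fetch(target)
```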
- It should be noted that the PHT may also be correctively updated in this embodiment. As described above, the operation in step 303 of updating the PHT based on the branch jump result of each branch instruction among the plurality of valid instructions is the corrective update of the PHT, which is described below.
- Specifically, the corrective update of the PHT, that is, the operation of updating the PHT based on the branch jump result of each branch instruction among the plurality of valid instructions, may involve the following four cases.
- When the branch prediction results and branch jump results of all the branch instructions among the plurality of valid instructions are no-jump, or when the branch jump result of the branch instruction whose branch prediction result is a jump is indeed a jump, that is, when the branch prediction of the branch instructions among the plurality of valid instructions is correct, the update of the PHT falls into the first case or the second case below.
- First case: if the state information in the PHT entry holding the branch prediction information of the instruction cluster indicates a strong state, the entry is not updated.
- When the state information in that entry indicates a strong state, the prediction accuracy of the branch prediction information in the entry is high, so when the branch prediction of the branch instructions among the plurality of valid instructions is correct, the entry holding the branch prediction information of the instruction cluster need not be updated.
- Second case: if the state information in the PHT entry holding the branch prediction information of the instruction cluster indicates a weak state, the state information in the entry is updated to state information indicating a strong state.
- When the state information in that entry indicates a weak state, the prediction accuracy of the branch prediction information in the entry is low, so when the branch prediction of the branch instructions among the plurality of valid instructions is correct, the state information in the entry may be updated to state information indicating a strong state, to raise the prediction accuracy attributed to the branch prediction information in the entry.
- When the branch prediction results of all the branch instructions among the plurality of valid instructions are no-jump but the branch jump result of some branch instruction among them is a jump, or when the branch jump result of a branch instruction whose branch prediction result is no-jump is a jump, that is, when the branch prediction of the branch instructions among the plurality of valid instructions is incorrect, the update of the PHT falls into the third case or the fourth case below.
- Third case: if the state information in the PHT entry holding the branch prediction information of the instruction cluster indicates a strong state, the state information in the entry is updated to state information indicating a weak state.
- When the state information in that entry indicates a strong state, the prediction accuracy of the branch prediction information in the entry is high, so when the branch prediction of the branch instructions among the plurality of valid instructions is incorrect, the state information in the entry may be updated to state information indicating a weak state, to reduce the prediction accuracy attributed to the branch prediction information in the entry.
- Fourth case: if the state information in the PHT entry holding the branch prediction information of the instruction cluster indicates a weak state, the branch prediction information in the entry is updated to branch prediction information indicating the jump instruction position of the instruction cluster.
- It should be noted that the jump instruction position of the instruction cluster here is the position, among all the branch instructions of the plurality of valid instructions, of the branch instruction whose branch jump result is a jump, when the plurality of valid instructions are ordered by their storage addresses.
- When the state information in that entry indicates a weak state, the prediction accuracy of the branch prediction information in the entry is low, so when the branch prediction of the branch instructions among the plurality of valid instructions is incorrect, the branch prediction information in the entry may be updated to branch prediction information indicating the jump instruction position of the instruction cluster, thereby updating the jump instruction position indicated by the branch prediction information in the entry.
- The PHT update process in the above four cases is described below with reference to FIG. 3G. As shown in FIG. 3G, assume the plurality of valid instructions are, in their storage order, instruction 2, instruction 3, and instruction 4, and that instruction 2, instruction 3, and instruction 4 are all branch instructions. The PHT includes eight kinds of entries, namely 000, 001, 010, 011, 100, 101, 110, and 111. The highest 2 bits of each entry are the branch prediction information: branch prediction information 00 indicates that all branch instructions are predicted not to jump, and branch prediction information 01, 10, and 11 indicates jump instruction positions 1, 2, and 3 respectively. The lowest 1 bit of each entry is the state information: state information 0 and 1 indicates a weak state and a strong state respectively.
- The eight kinds of entries in the PHT may be updated based on the operations in the above four cases, as shown in FIG. 3G, where T0 indicates that the branch jump results of all the branch instructions among the plurality of valid instructions are no-jump, T1 indicates that the branch jump result of the 1st branch instruction (that is, instruction 2) is a jump, T2 indicates that the branch jump result of the 2nd branch instruction (that is, instruction 3) is a jump, and T3 indicates that the branch jump result of the 3rd branch instruction (that is, instruction 4) is a jump.
- For example, consider entry 001 in the PHT. If the branch prediction information 00 in this entry is the branch prediction information of the instruction cluster, then based on that branch prediction information 00 it can be determined that the branch prediction results of all the branch instructions among the plurality of valid instructions are no-jump. Because the state information 1 in this entry indicates a strong state, if the branch jump results of all the branch instructions among the plurality of valid instructions are no-jump (that is, T0), the entry is not updated; if the branch jump result of some branch instruction among the plurality of valid instructions is a jump (that is, T1/T2/T3), the state information in the entry is updated to state information indicating a weak state, that is, the entry is updated to 000. The other entries in the PHT can likewise be updated based on the operations in the above four cases. A sketch of this entry-update logic is given below.
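The following is a minimal sketch of the four update cases, using the FIG. 3G encoding: an entry pairs branch prediction information with a state bit, where prediction information None stands for the specified "all predicted not taken" value, an integer stands for a jump instruction position, and the state bit is 0 for weak and 1 for strong. Representing entries as Python tuples and the function name are illustrative assumptions; only the four cases spelled out above are modelled, and in the fourth case the state bit is left unchanged since the text does not say it changes.

```python
WEAK, STRONG = 0, 1

def update_pht_entry(entry, prediction_correct, actual_jump_pos):
    """Corrective update of one PHT entry.

    entry              -- (pred_info, state): pred_info is None for "all
                          branches predicted not taken" or the 1-based jump
                          instruction position; state is WEAK or STRONG.
    prediction_correct -- whether the cluster's branch prediction was correct
    actual_jump_pos    -- jump instruction position actually taken
                          (None if every branch fell through)
    """
    pred_info, state = entry
    if prediction_correct:
        # Case 1: strong and correct -> leave the entry unchanged.
        # Case 2: weak and correct   -> strengthen the state.
        return entry if state == STRONG else (pred_info, STRONG)
    # Case 3: strong and incorrect -> weaken the state.
    if state == STRONG:
        return (pred_info, WEAK)
    # Case 4: weak and incorrect -> retrain the prediction information to
    # point at the position of the branch whose jump result was a jump.
    return (actual_jump_pos, state)
```

For the entry 001 discussed above (prediction information 00, strong state), a correct outcome leaves the entry unchanged and any T1/T2/T3 outcome weakens it to 000, which matches FIG. 3G.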
- FIG. 4A is a schematic structural diagram of a branch prediction apparatus according to an embodiment of the present invention. The branch prediction apparatus may be implemented as part or all of a computer device by software, by hardware, or by a combination of the two, and the computer device may be the computer device shown in FIG. 2. The branch prediction apparatus may be presented in the form of functional modules; in that case each functional module in the branch prediction apparatus may be implemented by the processor and the memory in FIG. 2, and the processor can execute, or control other devices to complete, each step in the method flow of the embodiments of the present invention to implement each function.
- Referring to FIG. 4A, the apparatus includes a first obtaining module 401, a second obtaining module 402, and a determining module 403.
- the first obtaining module 401 is configured to perform step 301 in the embodiment of FIG. 3A;
- a second obtaining module 402 configured to perform step 302 in the embodiment of FIG. 3A;
- the determining module 403 is configured to perform step 303 in the embodiment of FIG. 3A.
- the second obtaining module 402 includes a first determining unit 4021 and a query unit 4022.
- the first determining unit 4021 is configured to determine an index value based on an address of the instruction cluster and historical branch information of a thread to which the instruction cluster belongs;
- the query unit 4022 is configured to query the PHT using the index value as an index to obtain branch prediction information of the instruction cluster.
- the determining module 403 includes a second determining unit 4031, a third determining unit 4032, a fourth determining unit 4033, and a fifth determining unit 4034.
- a second determining unit 4031 configured to perform step 3032 in the embodiment of FIG. 3A;
- a third determining unit 4032 configured to perform step 3033 in the embodiment of FIG. 3A;
- a fourth determining unit 4033 configured to perform step 3035 in the embodiment of FIG. 3A;
- the fifth determining unit 4034 is configured to perform step 3036 in the embodiment of FIG. 3A.
- the apparatus further includes a first update module 404.
- The first update module 404 is configured to add, in the historical branch information of the thread to which the instruction cluster belongs, the branch prediction result of each branch instruction among the plurality of valid instructions, and delete the m branch results with the earliest storage time, where m is the number of branch instructions among the plurality of valid instructions.
- Optionally, each entry of the PHT includes branch prediction information and state information, and the state information indicates whether the branch prediction information is in a strong state or a weak state. In this case the apparatus further includes an execution module 405, a second update module 406, and a third update module 407.
- The execution module 405 is configured to execute the plurality of valid instructions and obtain the branch jump result of each branch instruction among the plurality of valid instructions;
- a second update module 406, configured to perform the first case and the second case in step 303 in the embodiment of FIG. 3A;
- the third update module 407 is configured to perform the third and fourth cases in step 303 in the embodiment of FIG. 3A.
- the second obtaining module 402 is configured to perform the second mode in step 302 in the embodiment of FIG. 3A.
- In this embodiment of the present invention, after the instruction cluster to be executed in the current cycle is obtained, if a branch instruction exists among the plurality of valid instructions included in the instruction cluster, branch prediction is started on the branch instructions among the plurality of valid instructions. The branch prediction information of the instruction cluster is first obtained from the PHT based on the address of the instruction cluster and the historical branch information of the thread to which the instruction cluster belongs, and the branch prediction result of each branch instruction among the plurality of valid instructions is then determined based on that branch prediction information, thereby completing branch prediction for the branch instructions among the plurality of valid instructions. Only one piece of branch prediction information is obtained from the PHT to yield the branch prediction result of every branch instruction among the plurality of valid instructions, which simplifies the branch prediction process and the control logic, improves branch prediction efficiency, and reduces power consumption. In addition, because only the entry at a single address in the PHT needs to be accessed during branch prediction of these branch instructions, branch prediction can be completed with a PHT that has fewer entries, so the storage resources occupied by the PHT can be reduced.
- It should be noted that, when the branch prediction apparatus provided in the above embodiment performs branch prediction, the division into the functional modules described above is only used as an example. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the branch prediction apparatus provided in the above embodiment and the branch prediction method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not described here again.
- In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form of a computer program product. The computer program product includes one or more computer instructions; when these computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are wholly or partly generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state drive (SSD)), or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
The present application discloses a branch prediction method and apparatus, which belong to the field of processor technologies. The method includes: obtaining an instruction cluster to be executed in the current cycle, the instruction cluster including a plurality of valid instructions; when a branch instruction exists among the plurality of valid instructions, obtaining branch prediction information of the instruction cluster from a PHT based on an address of the instruction cluster and historical branch information of a thread to which the instruction cluster belongs; and determining a branch prediction result of each branch instruction among the plurality of valid instructions based on the branch prediction information of the instruction cluster. In the present application, the branch prediction result of each branch instruction among the plurality of valid instructions can be obtained by fetching only one piece of branch prediction information from the PHT, which simplifies the branch prediction process of the branch instructions among the plurality of valid instructions, simplifies the control logic, and thereby improves the branch prediction efficiency of those branch instructions and reduces power consumption.
Description
本申请要求于2017年07月28日提交中国国家知识产权局、申请号为201710632787.7、申请名称为“分支预测方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及处理器技术领域,特别涉及一种分支预测方法及装置。
为了满足用户对处理器性能日益增高的需求,处理器通常使用能够重叠执行指令的流水线技术来提高效率。为了实现指令的灵活执行,经常会使用分支指令来改变指令的执行顺序。由于分支指令的分支跳转结果只有在流水线后级才能得到,所以处理器获取到分支指令时,需要停顿流水线直至得到该分支指令的分支跳转结果后,才能在该分支跳转结果指示的跳转方向上继续获取指令。然而,停顿流水线会造成流水线的运行被打断,使得流水线产生“气泡”(Bubble),从而大大影响了处理器性能。为此,处理器获取到分支指令时,一般在流水线前级就会对该分支指令进行分支预测,并在分支预测结果指示的跳转方向上继续获取指令,无需等待流水线后级返回该分支指令的分支跳转结果,从而可以减少流水线的“气泡”,提高处理器的效率。
目前,对分支指令的分支预测可以通过全局历史寄存器(Global History Register,GHR)和模式历史表(Pattern History Table,PHT)实现。假设GHR中存储有k位的历史分支信息,对于需要进行分支预测的多个分支指令中的第1个分支指令,可以基于GHR中存储的k位信息来访问PHT中一个地址上的表项,根据该表项确定第1个分支指令的分支预测结果。对于第2个分支指令,可以基于GHR中除最高1位之外的k-1位信息来访问PHT中连续两个地址上的表项,基于第1个分支指令的分支预测结果从这两个表项中选择一个表项,根据选择的表项确定第2个分支指令的分支预测结果。对于第3个分支指令,可以基于GHR中除最高2位之外的k-2位信息来访问PHT中连续四个地址上的表项,基于第1个分支指令和第2个分支指令的分支预测结果从这四个表项中选择一个表项,根据选择的表项确定第3个分支指令的分支预测结果。以此类推,依次确定该多个分支指令中每个分支指令的分支预测结果。
然而,上述分支预测过程中,某个分支指令的分支预测结果的确定依赖于该分支指令之前的所有分支指令的分支预测结果,从而导致分支预测过程较为繁琐,控制逻辑较为复杂,进而导致该多个分支指令的分支预测效率较低,且功耗较大。另外,由于PHT被访问的地址会随着该多个分支指令的个数的增长而以指数式增长,例如,该多个分支指令的个数为2个时,PHT被访问的地址有(2
0+2
1)个,该多个分支指令的个数为3个时,PHT被访问的地址有(2
0+2
1+2
2)个,……,所以此时PHT的表项将会较多,占用的存储资源较多。
发明内容
为了解决相关技术中分支预测过程较为繁琐以及PHT的表项较多的问题,本申请提供了一种分支预测方法及装置。所述技术方案如下:
第一方面,提供了一种分支预测方法,所述方法包括:
获取当前周期内待执行的指令簇,所述指令簇中包括多个有效指令;
当所述多个有效指令中存在分支指令时,基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,从PHT中获取所述指令簇的分支预测信息;
基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果。
本发明实施例中,仅从PHT中获取一个分支预测信息,就可以得到该多个有效指令中每个分支指令的分支预测结果,从而可以简化该多个有效指令中的分支指令的分支预测过程,简化控制逻辑,进而可以提高分支预测效率,并降低功耗。另外,由于在对该多个有效指令中的分支指令进行分支预测的过程中,仅需访问PHT中一个地址上的表项,也即是,在PHT具有较少表项的情况下就可以完成分支预测,所以可以减少PHT占用的存储资源。
其中,所述基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,从PHT中获取所述指令簇的分支预测信息,包括:
基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,确定索引值;
将所述索引值作为索引查询所述PHT,得到所述指令簇的分支预测信息。
由于各个指令簇的地址一般是预先设定的,因此,可以将指令簇的地址与PHT中的表项进行对应。然而,如果直接将指令簇的地址与PHT中的表项进行对应,则PHT中的每个表项只能被一个指令簇所使用来进行分支预测,在指令簇的数量较多的情况下,PHT的表项也会较多,从而导致PHT占用的存储资源较多。为了解决这个问题,通常会将指令簇的地址的低位与PHT中的表项进行对应,此时具有相同低位地址的指令簇可以使用PHT中的同一表项来进行分支预测,从而可以减少PHT中的表项。然而,此时具有相同低位地址的指令簇将会访问和修改PHT中的同一表项,从而导致不同指令簇包括的分支指令的分支预测相互干扰。在此情况下,由于某个线程中将要执行的分支指令的分支跳转结果与该线程中在该分支指令之前获取的分支指令的分支结果具有很高的相关性,所以,为了避免不同指令簇包括的分支指令的分支预测相互干扰,可以将指令簇所属线程的历史分支信息与指令簇的地址结合后来访问或修改PHT,从而可以使具有相同低位地址的指令簇使用PHT中的不同表项来进行分支预测,进而提高分支预测的准确度。综上所述,本发明实施例中可以直接基于该指令簇的地址和该指令簇所属线程的历史分支信息,来从PHT中获取该指令簇的分支预测信息。
其中,基于该指令簇的地址和该指令簇所属线程的历史分支信息,确定索引值时,可以将该指令簇的地址的低位与该指令簇所属线程的历史分支信息通过哈希(hash)函数结合在一起,得到索引值。
需要说明的是,哈希函数可以包括加法哈希、位运算哈希(如移位、异或等)、乘法哈希、混合哈希等,本发明实施例对此不作限定。
另外,索引值可以为PHT中存储的与表项对应的值,此时PHT中不仅存储有多个表项, 还存储有各个表项对应的索引值;或者,索引值可以为PHT包括的表项的地址,本发明实施例对此不作限定。
其中,将该索引值作为索引查询该PHT,得到该指令簇的分支预测信息时,如果索引值为PHT中存储的与表项对应的值,则可以基于该索引值,从PHT中获取对应的表项,并将所获取的表项中的分支预测信息确定为该指令簇的分支预测信息;如果索引值为PHT包括的表项的地址,则可以将该索引值作为地址,来获取PHT中存储在该地址上的表项,并将所获取的表项中的分支预测信息确定为该指令簇的分支预测信息。
其中,所述基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果,包括:
当所述指令簇的分支预测信息为指定值时,确定所述多个有效指令中所有分支指令的分支预测结果均为不跳转;
当所述指令簇的分支预测信息不为指定值时,确定所述指令簇的分支预测信息所指示的跳转指令位置n,所述n为按照所述多个有效指令的存储地址对所述多个有效指令排序时,所述指令簇中被预测跳转的分支指令在所述多个有效指令中的所有分支指令中的位置;
当所述多个有效指令中的分支指令的个数小于所述n时,确定所述多个有效指令中所有分支指令的分支预测结果均为不跳转;
当所述多个有效指令中的分支指令的个数不小于所述n时,按照所述多个有效指令的存储地址的顺序,确定所述多个有效指令中的第n个分支指令的分支预测结果为跳转,并确定所述多个有效指令中的前n-1个分支指令的分支预测结果均为不跳转。
需要说明的是,指定值可以预先进行设置,且指定值用于指示所有分支指令均被预测不跳转。
另外,本发明实施例中将跳转指令位置用n表示,跳转指令位置为被预测跳转的分支指令的位置,对于该指令簇来说,跳转指令位置即为按照该指令簇包括的多个有效指令中的所有分支指令的存储地址对该所有分支指令排序时,该指令簇中被预测跳转的分支指令在该所有分支指令中的位置。例如,按照该指令簇包括的多个有效指令中的所有分支指令的存储地址对该所有分支指令进行排序,假设跳转指令位置为2,则该指令簇中被预测跳转的分支指令为该所有分支指令中的第2个分支指令。
再者,当该多个有效指令中的分支指令的个数小于n时,表明该多个有效指令中最后一个分支指令的位置尚未达到该指令簇的分支预测信息所指示的跳转指令位置,则此时可以确定该多个有效指令中所有分支指令的分支预测结果均为不跳转。当该多个有效指令中的分支指令的个数不小于n时,表明该多个有效指令中最后一个分支指令的位置达到该指令簇的分支预测信息所指示的跳转指令位置,则此时可以按照该多个有效指令的存储地址的顺序,确定该跳转指令位置上的分支指令(即该多个有效指令中的第n个分支指令)的分支预测结果为跳转,并确定该跳转指令位置之前的位置上的分支指令(即该多个有效指令中的前n-1个分支指令)的分支预测结果均为不跳转。
在本发明实施例中,仅根据该指令簇的分支预测信息,就可以确定该指令簇中的多个有效指令中每个分支指令的分支预测结果,此时该多个有效指令中的分支指令的分支预测过程较为简单,控制逻辑也比较简单,从而可以提高该多个有效指令中的分支指令的分支预测效率,并降低分支预测过程的功耗。
进一步地,所述基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果之后,还包括:
在所述指令簇所属线程的历史分支信息中,增加所述多个有效指令中每个分支指令的分支预测结果,并删除m个存储时间最早的分支结果,所述m为所述多个有效指令中的分支指令的个数。
在本发明实施例中,可以基于该多个有效指令中每个分支指令的分支预测结果,来更新该指令簇所属线程的历史分支信息,从而使得后续获取的分支指令的分支预测可以继续基于更新后的历史分支信息来正常进行。
进一步地,所述PHT的表项包括分支预测信息和状态信息,所述状态信息用于指示所述分支预测信息为强状态或弱状态;
所述基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果之后,还包括:
执行所述多个有效指令,得到所述多个有效指令中每个分支指令的分支跳转结果;
当所述多个有效指令中的所有分支指令的分支预测结果和分支跳转结果均为不跳转时,或者当所述多个有效指令中分支预测结果为跳转的分支指令的分支跳转结果为跳转时,如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示强状态,则不对所述表项进行更新;如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将所述表项中的状态信息更新为指示强状态的状态信息;
当所述多个有效指令中的所有分支指令的分支预测结果均为不跳转且所述多个有效指令中存在分支指令的分支跳转结果为跳转时,或者当所述多个有效指令中分支预测结果为不跳转的分支指令的分支跳转结果为跳转时,如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示强状态,则将所述表项中的状态信息更新为指示弱状态的状态信息;如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将所述表项中的分支预测信息更新为用于指示所述指令簇的跳转指令位置的分支预测信息,所述指令簇的跳转指令位置为按照所述多个有效指令的存储地址对所述多个有效指令排序时,所述多个有效指令中分支跳转结果为跳转的分支指令在所述多个有效指令中的所有分支指令中的位置。
在本发明实施例中,可以基于该多个有效指令中每个分支指令的分支跳转结果,来更新PHT,从而保证后续获取的分支指令的分支预测可以基于更新后的PHT来进行,保证后续分支预测时的准确度。
其中,所述基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,从PHT中获取所述指令簇的分支预测信息,包括:
当基于所述PHT对正在运行的多个线程的分支指令进行分支预测时,基于所述指令簇的地址、所述指令簇所属线程的历史分支信息和所述指令簇所属线程的线程标识,从所述PHT中获取所述指令簇的分支预测信息。
由于正在运行的多个线程可以共享一个PHT,所以,具有相同低位地址和历史分支信息的线程将会使用PHT中的同一表项来进行分支预测,从而导致不同线程的分支指令的分支预测相互干扰。为了解决这个问题,可以将线程的线程标识也作为使用PHT中的表项的一个依据,也即是,可以将指令簇的地址、指令簇所属线程的历史分支信息与指令簇所属 线程的线程标识结合后来访问或修改PHT,从而可以使具有相同低位地址和历史分支信息的线程使用PHT中的不同表项来进行分支预测,进而提高分支预测的准确度。综上所述,本发明实施例中可以基于该指令簇的地址、该指令簇所属线程的历史分支信息和该指令簇所属线程的线程标识,从PHT中获取该指令簇的分支预测信息。
其中,所述基于所述指令簇的地址、所述指令簇所属线程的历史分支信息和所述指令簇所属线程的线程标识,从所述PHT中获取所述指令簇的分支预测信息,包括:
基于所述指令簇的地址、所述指令簇所属线程的历史分支信息和所述指令簇所属线程的线程标识,确定索引值;
将所述索引值作为索引查询所述PHT,得到所述指令簇的分支预测信息。
第二方面,提供了一种分支预测装置,所述分支预测装置具有实现上述第一方面中分支预测方法行为的功能。所述分支预测装置包括至少一个模块,所述至少一个模块用于实现上述第一方面所提供的分支预测方法。
第三方面,提供了一种分支预测装置,所述分支预测装置的结构中包括处理器和存储器,所述存储器用于存储支持分支预测装置执行上述第一方面所提供的分支预测方法的程序,以及存储用于实现上述第一方面所述的分支预测方法所涉及的数据。所述处理器被配置为用于执行所述存储器中存储的程序。所述分支预测装置还可以包括通信总线,所述通信总线用于在所述处理器与所述存储器之间建立连接。
第四方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述第一方面所述的分支预测方法。
第五方面,提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面所述的分支预测方法。
上述第二方面、第三方面、第四方面和第五方面所获得的技术效果与上述第一方面中对应的技术手段获得的技术效果近似,在这里不再赘述。
本申请提供的技术方案带来的有益效果是:获取当前周期内待执行的指令簇后,如果该指令簇包括的多个有效指令中存在分支指令,则开始对该多个有效指令中的分支指令进行分支预测。此时,先基于该指令簇的地址和该指令簇所属线程的历史分支信息,从PHT中获取该指令簇的分支预测信息,再基于该指令簇的分支预测信息,确定该多个有效指令中每个分支指令的分支预测结果,从而完成对该多个有效指令中的分支指令的分支预测。本发明实施例中仅从PHT中获取一个分支预测信息,就可以得到该多个有效指令中每个分支指令的分支预测结果,从而简化了该多个有效指令中的分支指令的分支预测过程,简化了控制逻辑,进而提高了分支预测效率,并降低了功耗。另外,由于在对该多个有效指令中的分支指令进行分支预测的过程中,仅需访问PHT中一个地址上的表项,也即是,在PHT具有较少表项的情况下就可以完成分支预测,所以可以减少PHT占用的存储资源。
图1A是本发明实施例提供的一种指令簇的示意图;
图1B是本发明实施例提供的一种系统架构的示意图;
图1C是本发明实施例提供的一种GHR的示意图;
图1D是本发明实施例提供的一种PHT的示意图;
图2是本发明实施例提供的一种计算机设备的结构示意图;
图3A是本发明实施例提供的一种分支预测方法的流程图;
图3B是本发明实施例提供的另一种PHT的示意图;
图3C是本发明实施例提供的一种确定多个有效指令中每个分支指令的分支预测结果的操作的流程图;
图3D是本发明实施例提供的一种GHR的投机更新过程的示意图;
图3E是本发明实施例提供的一种修正分支信息的示意图;
图3F是本发明实施例提供的另一种修正分支信息的示意图;
图3G是本发明实施例提供的一种PHT的修正更新过程的示意图;
图4A是本发明实施例提供的一种分支预测装置的结构示意图;
图4B是本发明实施例提供的一种第二获取模块的结构示意图;
图4C是本发明实施例提供的一种确定模块的结构示意图;
图4D是本发明实施例提供的另一种分支预测装置的结构示意图;
图4E是本发明实施例提供的又一种分支预测装置的结构示意图。
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请的实施方式作进一步地详细描述。
首先,对本发明实施例涉及的名词进行说明。
流水线:流水线通常由有联系的多个流水段组成,该多个流水段均有专门的功能模块对指令进行处理,且该多个流水段对指令的处理过程在时间上可以相互错开,从而使得多个指令能够被并行处理。例如,五级流水线包括5个流水段,依次可以为取指(Instruction Fetch,IF)、译码(Instruction Decode,ID)、执行(Execute,EX)、访问(Memory Access,MEM)、写回(Write Back,WB)。
指令簇(instruction bundle):存储器中存储的指令可以按照其存储地址被划分为多个指令簇,各个指令簇的地址为各个指令簇包括的第一个指令的存储地址,且各个指令簇的地址对齐于预设的指令边界(instruction boundary),此时每相邻两个指令边界之间均包括预设数值个指令,且每相邻两个指令边界之间包括的预设数值个指令组成一个指令簇,即各个指令簇均包括预设数值个指令。例如,各个指令簇的地址对齐于4-instruction boundary,此时每相邻两个指令边界之间均包括4个指令,且每相邻两个指令边界之间包括的4个指令组成一个指令簇,即各个指令簇均包括4个指令。处理器以指令簇为单位来获取指令,也即是,处理器每个周期只能从一个指令簇中获取指令,不能跨指令簇获取指令。
分支指令:分支指令为用于改变指令的执行顺序的指令。通常情况下,处理器按照存储器中存储的指令的存储地址的顺序,依次从存储器中获取指令并执行。当处理器获取的 指令中包括分支指令时,该分支指令被执行后得到分支跳转结果。当该分支跳转结果为不跳转时,指令的执行顺序不会被改变,此时处理器会继续依次获取位于该分支指令之后的指令来执行;当该分支跳转结果为跳转时,指令的执行顺序会被改变,此时处理器将不再继续依次获取位于该分支指令之后的指令来执行,而是会从该分支指令指示的目标存储地址开始依次获取指令并执行。例如,如图1A所示,处理器在某个周期从存储器中获取到指令簇1,指令簇1包括指令1、指令2和指令3,存储器中与指令簇1相邻的下一个指令簇为指令簇2。通常情况下,处理器在该周期获取到指令簇1后,会在下个周期继续获取指令簇2来执行。然而,如果指令簇1包括的指令2为分支指令,且指令2执行后得到的分支跳转结果为跳转,则处理器在下个周期将不再继续获取指令簇2来执行,而是会从指令2指示的目标存储地址开始依次获取指令并执行。假设指令2指示的目标存储地址上的指令为指令16,且指令16属于指令簇5,则指令簇5为指令2所跳转到的指令簇,处理器在下个周期将从指令簇5开始依次获取指令并执行。
有效指令:有效指令为线程在当前周期内运行时所需的指令。对于处理器获取到的某个指令簇,当该指令簇是在其相邻的前一个指令簇后依次获取得到时,该指令簇中的所有指令均为有效指令;当该指令簇是某个分支指令所跳转到的指令簇时,即当该指令簇为某个分支指令指示的目标存储地址上的指令所属的指令簇时,由于处理器是从该分支指令指示的目标存储地址开始依次获取指令并执行,所以,有效指令为该指令簇中目标存储地址上的指令、该指令簇中的最后一个指令以及目标存储地址上的指令与该最后一个指令之间的指令。例如,如图1A所示,假设处理器在某个周期从存储器中获取到指令簇1,指令簇1包括指令1、指令2和指令3,指令簇1是在其相邻的前一个指令簇后依次获取得到的,则指令簇1包括的指令1、指令2和指令3均为有效指令。假设处理器在某个周期从存储器中获取到指令簇5,指令簇5包括指令15、指令16和指令17,指令簇5是指令簇1中的分支指令所跳转到的指令簇,且该分支指令指示的目标存储地址上的指令为指令16,则指令簇5中的指令16和指令17为有效指令。
其次,对本发明实施例涉及的应用场景进行说明。
分支指令是线程中频繁出现的指令,其使得指令得以灵活执行,带来了线程行为的多样性。在具有流水线结构的处理器中,由于分支指令的分支跳转结果只有在流水线后级才能得到,所以分支指令的存在会导致流水线产生“气泡”,从而影响了处理器的性能。为此,分支预测应运而生,分支预测是指在流水线后级返回分支指令的分支跳转结果之前,就在流水线前级对分支指令进行分支预测来得到分支预测结果,此时处理器可以在分支预测结果指示的跳转方向上继续获取指令,从而可以有效避免流水线产生“气泡”。其中,当分支预测结果为不跳转时,该分支预测结果指示的跳转方向为不跳转,此时处理器继续依次获取位于该分支指令之后的指令,当分支预测结果为跳转时,该分支预测结果指示的跳转方向为跳转到分支指令指示的目标存储地址,此时处理器从目标存储地址开始依次获取指令。
目前,对多个分支指令进行分支预测时所需的PHT表项往往比较多,从而导致PHT占用的存储资源较多。另外,多个分支指令的分支预测过程往往也比较繁琐,控制逻辑比较复杂,从而导致多个分支指令的分支预测效率较低,且功耗较大。为此,本发明实施例提供了一种分支预测方法,来在减少PHT表项的情况下,简化分支指令的分支预测过程,从而在占用较少存储资源的情况下,提高分支指令的分支预测效率,并降低功耗。
最后,对本发明实施例涉及的系统架构进行说明。
图1B是本发明实施例提供的一种系统架构的示意图。参见图1B,该系统架构可以包括:存储器101和处理器102。
存储器101用于存储指令,且所存储的指令分属于多个指令簇。处理器102用于在每个周期从存储器101中获取该周期内待执行的指令簇,并执行该指令簇包括的多个有效指令来运行线程。另外,当该多个有效指令中存在分支指令时,处理器102还用于对该多个有效指令中的分支指令进行分支预测得到分支预测结果,以便后续可以在该分支预测结果指示的跳转方向上继续获取指令。再者,处理器102具有流水线结构,该流水线结构中可以包括多个功能模块,如可以包括取指模块1021、译码模块1022等,其中,取指模块1021可以执行获取指令、分支预测等操作。
具体地,处理器102对所获取的指令簇包括的多个有效指令中的分支指令进行分支预测时,可以基于该指令簇的地址和该指令簇所属线程的历史分支信息,从PHT中获取该指令簇的分支预测信息,并基于该指令簇的分支预测信息,确定该多个有效指令中的每个分支指令的分支预测结果。
需要说明的是,某个线程的历史分支信息包括在运行该线程时所获取的分支指令的分支结果。其中,对于在运行该线程时所获取的每个分支指令,当该分支指令未被执行但已进行分支预测时,该分支指令的分支结果为该分支指令的分支预测结果,当该分支指令已被执行时,该分支指令的分支结果为该分支指令的分支跳转结果。例如,在运行线程1时所获取的指令包括指令1、指令2和指令16,且指令2和指令16均为分支指令,则线程1的历史分支信息中可以包括指令2和指令16的分支结果。假设指令2已被执行,指令16未被执行但已进行分支预测,则线程1的历史分支信息中可以包括指令2的分支跳转结果和指令16的分支预测结果。
另外,某个线程的历史分支信息可以由该线程的GHR来记录,GHR是一个移位寄存器,多线程处理器正在运行的多个线程各自均可以具有独立的GHR,且对于该多个线程中的每个线程,该线程的GHR可以基于该线程的分支指令的分支结果来进行更新。其中,GHR的每个位上可以记录一个分支结果,例如,如图1C所示,某个线程的GHR中记录的历史分支信息可以为110…101,其中,0和1均为分支结果,且0表示不跳转,1表示跳转。
需要说明的是,PHT是用于预测分支指令是否跳转的表,多线程处理器正在运行的多个线程可以共享一个PHT,且对于该多个线程中的每个线程,PHT可以基于该线程的分支指令的分支跳转结果来进行更新。其中,PHT具有多个表项,每个表项包括分支预测信息和状态信息。分支预测信息用于预测分支指令是否跳转,且当分支预测信息为指定值时,分支预测信息用于指示所有分支指令均被预测不跳转,当分支预测信息不为指定值时,分支预测信息用于指示跳转指令位置,该跳转指令位置为被预测跳转的分支指令的位置。状态信息用于指示该分支预测信息为强状态或弱状态,且该状态信息用于体现该分支预测信息的预测准确度,也即是,当该状态信息为强状态时,表明该分支预测信息的预测准确度较高,当该状态信息为弱状态时,表明该分支预测信息的预测准确度较低。
例如,如图1D所示,PHT中的多个表项为000、001…110、111,其中,每个表项的最高2位上的数值为分支预测信息,最低1位上的数值为状态信息,且状态信息1用于指示强状态,状态信息0用于指示弱状态。假设指定值为00,则表项000中的分支预测信息 00用于指示所有分支指令均被预测不跳转,且状态信息0用于指示分支预测信息00的预测准确度较低;表项001中的分支预测信息00用于指示所有分支指令均被预测不跳转,且状态信息1用于指示分支预测信息00的预测准确度较高;……;表项110中的分支预测信息11用于指示跳转指令位置,且状态信息0用于指示分支预测信息11的预测准确度较低;表项111中的分支预测信息11用于指示跳转指令位置,且状态信息1用于指示分支预测信息11的预测准确度较高。
其中,基于该指令簇的地址和该指令簇所属线程的历史分支信息,从PHT中获取该指令簇的分支预测信息,并基于该指令簇的分支预测信息,确定该多个有效指令中的每个分支指令的分支预测结果的操作将在下文图3A实施例中进行详细阐述。
图2是本发明实施例提供的一种计算机设备的结构示意图,图1中的系统架构可以通过图2所示的计算机设备来实现。参见图2,该计算机设备包括至少一个处理器201,通信总线202,存储器203以及至少一个通信接口204。
处理器201可以是一个通用中央处理器(Central Processing Unit,CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本申请方案程序执行的集成电路。
通信总线202在上述组件之间传送信息。
存储器203可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其它类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其它类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其它光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其它磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其它介质,但不限于此。存储器203可以是独立存在,通过通信总线202与处理器201相连接。存储器203也可以和处理器201集成在一起。
通信接口204,使用任何收发器一类的装置,用于与其它设备或通信网络通信,如以太网,无线接入网(Radio Access Network,RAN),无线局域网(Wireless Local Area Networks,WLAN)等。
在具体实现中,作为一种实施例,处理器201可以包括一个或多个CPU,例如图2中所示的CPU0和CPU1。
在具体实现中,作为一种实施例,计算机设备可以包括多个处理器,例如图2中所示的处理器201和处理器205。这些处理器中的每一个可以是一个单核处理器(single-CPU),也可以是一个多核处理器(multi-CPU)。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,计算机设备还可以包括输出设备206和输入设备207。输出设备206和处理器201通信,用于显示信息。输出设备206可以为多种形态的显示设备。例如,输出设备206可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影 仪(projector)等。输入设备207和处理器201通信,可以以多种方式接收用户的输入。例如,输入设备207可以是鼠标、键盘、触摸屏设备或传感设备等。
上述的计算机设备可以是一个通用计算机设备或者是一个专用计算机设备。在具体实现中,计算机设备可以是台式机、便携式电脑、网络服务器、掌上电脑(Personal Digital Assistant,PDA)、移动手机、平板电脑、无线终端设备、通信设备或者嵌入式设备。本发明实施例不限定计算机设备的类型。
其中,存储器203用于存储执行本申请方案的程序代码210,处理器201用于执行存储器203中存储的程序代码210。该计算机设备可以通过处理器201以及存储器203中的程序代码210,来实现下文图3A实施例提供的分支预测方法。
图3A是本发明实施例提供的一种分支预测方法的流程图,该方法应用于计算机设备,具体可以应用于计算机设备中的处理器。参见图3A,该方法包括:
步骤301:获取当前周期内待执行的指令簇,该指令簇中包括多个有效指令。
具体地,可以获取当前周期内待执行的指令簇的地址,基于该指令簇的地址获取该指令簇。
其中,获取当前周期内待执行的指令簇的地址时,可以获取正在运行的线程的程序计数器(Program Counter,PC)中存放的存储地址,并将获取的存储地址确定为当前周期内待执行的指令簇的地址。
其中,基于该指令簇的地址获取该指令簇时,可以将该指令簇的地址作为起始存储地址,获取从该起始存储地址开始的预设数值个连续的存储地址上的指令,所获取的多个指令即可组成该指令簇。其中,预设数值可以预先进行设置,如预设数值可以为4等。
需要说明的是,该多个有效指令为该线程在当前周期内运行时所需的指令。当该指令簇是在其相邻的前一个指令簇后依次获取得到时,该多个有效指令为该指令簇中的所有指令;当该指令簇是某个分支指令所跳转到的指令簇时,该多个有效指令为该指令簇中在该分支指令指示的目标存储地址上的指令、该指令簇中的最后一个指令以及目标存储地址上的指令与该最后一个指令之间的指令。
步骤302:当该多个有效指令中存在分支指令时,基于该指令簇的地址和该指令簇所属线程的历史分支信息,从PHT中获取该指令簇的分支预测信息。
需要说明的是,该指令簇所属线程即为步骤301中所述的正在运行的线程,换句话说,该指令簇所属线程即为通过执行该指令簇中的多个有效指令就能够实现其运行的线程。
另外,该指令簇所属线程的历史分支信息中包括在运行该线程时所获取的分支指令的分支结果。其中,对于在运行该线程时所获取的每个分支指令,当该分支指令未被执行但已进行分支预测时,该分支指令的分支结果为该分支指令的分支预测结果,当该分支指令已被执行时,该分支指令的分支结果为该分支指令的分支跳转结果。例如,在运行该线程时所获取的指令包括指令1、指令2和指令16,且指令2和指令16均为分支指令,则该线程的历史分支信息中可以包括指令2和指令16的分支结果。假设指令2已被执行,指令16未被执行但已进行分支预测,则该线程的历史分支信息中可以包括指令2的分支跳转结果和指令16的分支预测结果。实际应用中,该线程的历史分支信息可以通过该线程的GHR来进行记录,此时该线程的GHR的每个位上可以记录一个分支结果。
再者,PHT具有多个表项,每个表项包括分支预测信息和状态信息。分支预测信息用于预测分支指令是否跳转,且当分支预测信息为指定值时,分支预测信息用于指示所有分支指令均被预测不跳转,当分支预测信息不为指定值时,分支预测信息用于指示跳转指令位置,该跳转指令位置为被预测跳转的分支指令的位置。状态信息用于指示该分支预测信息为强状态或弱状态,且该状态信息用于体现该分支预测信息的预测准确度,也即是,当该状态信息为强状态时,表明该分支预测信息的预测准确度较高,当该状态信息为弱状态时,表明该分支预测信息的预测准确度较低。
其中,基于该指令簇的地址和该指令簇所属线程的历史分支信息,从PHT中获取该指令簇的分支预测信息的操作可以通过如下两种方式实现。
第一种方式:基于该指令簇的地址和该指令簇所属线程的历史分支信息,确定索引值;将该索引值作为索引查询PHT,得到该指令簇的分支预测信息。
其中,基于该指令簇的地址和该指令簇所属线程的历史分支信息,确定索引值时,可以将该指令簇的地址的低位与该指令簇所属线程的历史分支信息通过哈希函数结合在一起,得到索引值。
需要说明的是,哈希函数可以包括加法哈希、位运算哈希(如移位、异或等)、乘法哈希、混合哈希等,本发明实施例对此不作限定。
另外,索引值可以为PHT中存储的与表项对应的值,如图3B所示,此时PHT中不仅存储有多个表项,还存储有各个表项对应的索引值;或者,索引值可以为PHT包括的表项的地址,本发明实施例对此不作限定。
其中,将该索引值作为索引查询该PHT,得到该指令簇的分支预测信息时,如果索引值为PHT中存储的与表项对应的值,则可以基于该索引值,从PHT中获取对应的表项,并将所获取的表项中的分支预测信息确定为该指令簇的分支预测信息;如果索引值为PHT包括的表项的地址,则可以将该索引值作为地址,来获取PHT中存储在该地址上的表项,并将所获取的表项中的分支预测信息确定为该指令簇的分支预测信息。
由于各个指令簇的地址一般是预先设定的,因此,可以将指令簇的地址与PHT中的表项进行对应。然而,如果直接将指令簇的地址与PHT中的表项进行对应,则PHT中的每个表项只能被一个指令簇所使用来进行分支预测,在指令簇的数量较多的情况下,PHT的表项也会较多,从而导致PHT占用的存储资源较多。为了解决这个问题,通常会将指令簇的地址的低位与PHT中的表项进行对应,此时具有相同低位地址的指令簇可以使用PHT中的同一表项来进行分支预测,从而可以减少PHT中的表项。然而,此时具有相同低位地址的指令簇将会访问和修改PHT中的同一表项,从而导致不同指令簇包括的分支指令的分支预测相互干扰。在此情况下,由于某个线程中将要执行的分支指令的分支跳转结果与该线程中在该分支指令之前获取的分支指令的分支结果具有很高的相关性,所以,为了避免不同指令簇包括的分支指令的分支预测相互干扰,可以将指令簇所属线程的历史分支信息与指令簇的地址结合后来访问或修改PHT,从而可以使具有相同低位地址的指令簇使用PHT中的不同表项来进行分支预测,进而提高分支预测的准确度。综上所述,本发明实施例中可以通过上述第一种方式来从PHT中获取该指令簇的分支预测信息。
第二种方式:当基于PHT对正在运行的多个线程的分支指令进行分支预测时,基于该指令簇的地址、该指令簇所属线程的历史分支信息和该指令簇所属线程的线程标识,从PHT 中获取该指令簇的分支预测信息。
需要说明的是,线程标识用于唯一标识该线程,且该多个线程中每个线程的线程标识可以预先进行设置。
具体地,可以基于该指令簇的地址、该指令簇所属线程的历史分支信息和该指令簇所属线程的线程标识,确定索引值;将该索引值作为索引查询PHT,得到该指令簇的分支预测信息。
其中,基于该指令簇的地址、该指令簇所属线程的历史分支信息和该指令簇所属线程的线程标识,确定索引值时,可以将该指令簇的地址的低位、该指令簇所属线程的历史分支信息与该指令簇所属线程的线程标识通过哈希函数结合在一起,得到索引值。
其中,将该索引值作为索引查询该PHT,得到该指令簇的分支预测信息的操作与上述第一种方式中将该索引值作为索引查询PHT,得到该指令簇的分支预测信息的操作相同,本发明实施例对此不再赘述。
由于正在运行的多个线程可以共享一个PHT,所以,具有相同低位地址和历史分支信息的线程将会使用PHT中的同一表项来进行分支预测,从而导致不同线程的分支指令的分支预测相互干扰。为了解决这个问题,可以将线程的线程标识也作为使用PHT中的表项的一个依据,也即是,可以将指令簇的地址、指令簇所属线程的历史分支信息与指令簇所属线程的线程标识结合后来访问或修改PHT,从而可以使具有相同低位地址和历史分支信息的线程使用PHT中的不同表项来进行分支预测,进而提高分支预测的准确度。综上所述,本发明实施例中可以通过上述第二种方式来从PHT中获取该指令簇的分支预测信息。
步骤303:基于该指令簇的分支预测信息,确定该多个有效指令中每个分支指令的分支预测结果。
具体地,参见图3C,步骤303可以包括如下步骤3031-3036。
步骤3031:判断该指令簇的分支预测信息是否为指定值。
需要说明的是,指定值可以预先进行设置,且指定值用于指示所有分支指令均被预测不跳转,如指定值可以为00等。
当该指令簇的分支预测信息为指定值时,可以继续执行如下步骤3032来确定该多个有效指令中每个分支指令的分支预测结果。当该指令簇的分支预测信息不为指定值时,可以继续执行如下步骤3033来确定该多个有效指令中每个分支指令的分支预测结果。
步骤3032:当该指令簇的分支预测信息为指定值时,确定该多个有效指令中所有分支指令的分支预测结果均为不跳转。
由于指定值用于指示所有分支指令均被预测不跳转,因此,当该指令簇的分支预测信息为指定值时,可以确定该指令簇中的多个有效指令中的所有分支指令的分支预测结果均为不跳转。
例如,指定值为00,该指令簇的分支预测信息为00,则可以确定该指令簇的分支预测信息为指定值。假设该指令簇包括的多个有效指令为指令2、指令3、指令4,且指令3和指令4为分支指令,则可以确定该多个有效指令中的所有分支指令(即指令3和指令4)的分支预测结果均为不跳转。
步骤3033:当该指令簇的分支预测信息不为指定值时,确定该指令簇的分支预测信息所指示的跳转指令位置n。
需要说明的是,本发明实施例中将跳转指令位置用n表示,跳转指令位置为被预测跳转的分支指令的位置,对于该指令簇来说,跳转指令位置即为按照该指令簇包括的多个有效指令中的所有分支指令的存储地址对该所有分支指令排序时,该指令簇中被预测跳转的分支指令在该所有分支指令中的位置。例如,按照该指令簇包括的多个有效指令中的所有分支指令的存储地址对该所有分支指令进行排序,假设跳转指令位置为2,则该指令簇中被预测跳转的分支指令为该所有分支指令中的第2个分支指令。
其中,确定该指令簇的分支预测信息所指示的跳转指令位置(即n)时,可以基于该指令簇的分支预测信息,从存储的分支预测信息与跳转指令位置之间的对应关系中,获取对应的跳转指令位置,获取的跳转指令位置即为该指令簇的分支预测信息所指示的跳转指令位置。
例如,该指令簇的分支预测信息为01,则可以基于该指令簇的分支预测信息01,从下表1所示的分支预测信息与跳转指令位置之间的对应关系中,获取对应的跳转指令位置为1,1即为该指令簇的分支预测信息所指示的跳转指令位置。
表1
分支预测信息 | 跳转指令位置 |
01 | 1 |
10 | 2 |
…… | …… |
需要说明的是,本发明实施例中仅以上表1所示的分支预测信息与跳转指令位置之间的对应关系为例进行说明,上表1并不对本发明实施例构成限定。
步骤3034:判断该多个有效指令中的分支指令的个数是否小于n。
需要说明的是,当该多个有效指令中的分支指令的个数小于n时,表明该多个有效指令中最后一个分支指令的位置尚未达到该指令簇的分支预测信息所指示的跳转指令位置。当该多个有效指令中的分支指令的个数不小于n时,表明该多个有效指令中最后一个分支指令的位置已达到该指令簇的分支预测信息所指示的跳转指令位置。
当该多个有效指令中的分支指令的个数小于n时,可以继续执行如下步骤3035来确定该多个有效指令中每个分支指令的分支预测结果。当该多个有效指令中的分支指令的个数不小于n时,可以继续执行如下步骤3036来确定该多个有效指令中每个分支指令的分支预测结果。
步骤3035:当该多个有效指令中的分支指令的个数小于n时,确定该多个有效指令中所有分支指令的分支预测结果均为不跳转。
当该多个有效指令中的分支指令的个数小于n时,表明该多个有效指令中最后一个分支指令的位置尚未达到该指令簇的分支预测信息所指示的跳转指令位置,则此时可以确定该多个有效指令中所有分支指令的分支预测结果均为不跳转。
例如,n为3,该多个有效指令为指令2、指令3、指令4,且指令3和指令4为分支指令,该多个有效指令中的分支指令的个数为2。则此时该多个有效指令中的分支指令的个数小于n,可以确定该多个有效指令中所有分支指令(即指令3和指令4)的分支预测结果均为不跳转。
步骤3036:当该多个有效指令中的分支指令的个数不小于n时,按照该多个有效指令的存储地址的顺序,确定该多个有效指令中的第n个分支指令的分支预测结果为跳转,并确定该多个有效指令中的前n-1个分支指令的分支预测结果均为不跳转。
当该多个有效指令中的分支指令的个数不小于n时,表明该多个有效指令中最后一个分支指令的位置已达到该指令簇的分支预测信息所指示的跳转指令位置,则此时可以按照该多个有效指令的存储地址的顺序,确定该跳转指令位置上的分支指令(即该多个有效指令中的第n个分支指令)的分支预测结果为跳转,并确定该跳转指令位置之前的位置上的分支指令(即该多个有效指令中的前n-1个分支指令)的分支预测结果均为不跳转。
例如,n为2,该多个有效指令按照其存储地址的顺序为指令2、指令3、指令4,且指令3和指令4为分支指令,该多个有效指令中的分支指令的个数为2。则此时该多个有效指令中的分支指令的个数不小于n,可以按照该多个有效指令的存储地址的顺序,确定该多个有效指令中的第2个分支指令(即指令4)的分支预测结果为跳转,确定该多个有效指令中的前1个分支指令(即指令3)的分支预测结果为不跳转。
进一步地,确定该多个有效指令中每个分支指令的分支预测结果之后,还可以基于该多个有效指令中每个分支指令的分支预测结果,来更新该指令簇所属线程的历史分支信息,以便后续获取的分支指令的分支预测可以正常进行。其中,基于该多个有效指令中每个分支指令的分支预测结果,更新该指令簇所属线程的历史分支信息的操作将在接下来的实施例中进行详细阐述。
更进一步地,确定该多个有效指令中每个分支指令的分支预测结果之后,还可以执行该多个有效指令,得到该多个有效指令中每个分支指令的分支跳转结果,并基于该多个有效指令中每个分支指令的分支跳转结果,来更新该指令簇所属线程的历史分支信息和PHT,以保证后续获取的分支指令的分支预测的准确度。其中,基于该多个有效指令中每个分支指令的分支跳转结果,更新该指令簇所属线程的历史分支信息和PHT的操作将在接下来的实施例中进行详细阐述。
需要说明的是,本发明实施例如果应用于多线程处理器,则对于多线程处理器正在运行的多个线程中的每个线程,均可以通过上述步骤301-303来对该线程的分支指令进行分支预测,此时该多个线程中的每个线程的历史分支信息独立记录,而PHT可以被该多个线程所共享,从而可以在保证分支预测的准确度的情况下节省存储资源。
在本发明实施例中,获取当前周期内待执行的指令簇后,如果该指令簇包括的多个有效指令中存在分支指令,则开始对该多个有效指令中的分支指令进行分支预测。此时,先基于该指令簇的地址和该指令簇所属线程的历史分支信息,从PHT中获取该指令簇的分支预测信息,再基于该指令簇的分支预测信息,确定该多个有效指令中每个分支指令的分支预测结果,从而完成对该多个有效指令中的分支指令的分支预测。本发明实施例中仅从PHT中获取一个分支预测信息,就可以得到该多个有效指令中每个分支指令的分支预测结果,从而简化了该多个有效指令中的分支指令的分支预测过程,简化了控制逻辑,进而提高了分支预测效率,并降低了功耗。另外,由于在对该多个有效指令中的分支指令进行分支预测的过程中,仅需访问PHT中一个地址上的表项,也即是,在PHT具有较少表项的情况下就可以完成分支预测,所以可以减少PHT占用的存储资源。
需要说明的是,本发明实施例中可以对该指令簇所属线程的历史分支信息进行投机更新和修正更新。如前文所述,步骤303中基于该多个有效指令中每个分支指令的分支预测结果,更新该指令簇所属线程的历史分支信息的操作,即是对该指令簇所属线程的历史分支信息进行投机更新。步骤303中基于该多个有效指令中每个分支指令的分支跳转结果,更新该指令簇所属线程的历史分支信息的操作,即是对该指令簇所属线程的历史分支信息进行修正更新。下面分别对该指令簇所属线程的历史分支信息的投机更新和修正更新进行说明。
具体地,该指令簇所属线程的历史分支信息的投机更新,即基于该多个有效指令中每个分支指令的分支预测结果,更新该指令簇所属线程的历史分支信息的操作可以为:在该指令簇所属线程的历史分支信息中,增加该多个有效指令中每个分支指令的分支预测结果,并删除m个存储时间最早的分支结果,m为该多个有效指令中的分支指令的个数。
实际应用中,某个线程的历史分支信息往往是由该线程的GHR进行记录的,此时如果得到了该线程的某个分支指令的分支预测结果,则可以将该分支指令的分支预测结果保存到该线程的GHR的最低1位(或最高1位),并将该GHR中原先保存的分支结果均左移(或右移)一位,以将该GHR中存储时间最早的1个分支结果移出该GHR,完成对该GHR的投机更新。
例如,该多个有效指令为指令2、指令3、指令4,且指令3和指令4为分支指令。如图3D所示,假设指令3的分支预测结果为不跳转(用0表示),指令4的分支预测结果为跳转(用1表示),则可以将指令3和指令4的分支预测结果保存到该指令簇所属线程的GHR的最低2位,并将该GHR中原先保存的分支结果均左移两位,以将该GHR中存储时间最早的2个分支结果移出该GHR,完成对该GHR的投机更新。
具体地,该指令簇所属线程的历史分支信息的修正更新,即基于该多个有效指令中每个分支指令的分支跳转结果,更新该指令簇所属线程的历史分支信息的操作可以为:对于该多个有效指令中的每个分支指令,当该分支指令的分支预测结果与分支跳转结果相同时,不更新该指令簇所属线程的历史分支信息;当该分支指令的分支预测结果与分支跳转结果不同时,获取该分支指令对应的修正分支信息,将该分支指令对应的修正分支信息的最后一个分支结果进行翻转,得到该指令簇所属线程的历史分支信息。
其中,将该分支指令对应的修正分支信息的最后一个分支结果进行翻转,是指将该最后一个分支结果更新为相反跳转状态。例如,当该最后一个分支结果为跳转时,将该最后一个分支结果进行翻转,即是将该最后一个分支结果更新为不跳转;当该最后一个分支结果为不跳转时,将该最后一个分支结果进行翻转,即是将该最后一个分支结果更新为跳转。
需要说明的是,在基于该多个有效指令中每个分支指令的分支预测结果,对该指令簇所属线程的历史分支信息进行投机更新时,通常是按照该多个有效指令中的分支指令的存储地址的顺序,在该指令簇所属线程的历史分支信息中,依次增加该多个有效指令中每个分支指令的分支预测结果,并依次删除m个存储时间最早的分支结果。也即是,按照该多个有效指令中的分支指令的存储地址的顺序,每在该指令簇所属线程的历史分支信息中增加一个分支指令的分支预测结果,就同时删除一个存储时间最早的分支结果。此时,对于该多个有效指令中的每个分支指令,该分支指令对应的修正分支信息即为在该指令簇所属线程的历史分支信息中增加该分支指令的分支预测结果,并同时删除一个存储时间最早的 分支结果时,得到的分支信息。
例如,如图3E所示,该指令簇所属线程的历史分支信息为110…101,该多个有效指令中的分支指令按照其存储地址的顺序为指令3、指令4,且指令3的分支预测结果为不跳转(用0表示),指令4的分支预测结果为跳转(用1表示)。则在基于该多个有效指令中每个分支指令的分支预测结果,对该指令簇所属线程的历史分支信息进行投机更新时,可以在该指令簇所属线程的历史分支信息中,依次增加指令3和指令4的分支预测结果,并依次删除2个存储时间最早的分支结果。此时,指令3对应的修正分支信息即为在该指令簇所属线程的历史分支信息中增加指令3的分支预测结果0,并同时删除一个存储时间最早的分支结果1时,得到的分支信息100…010。指令4对应的修正分支信息即为在该指令簇所属线程的历史分支信息中增加指令4的分支预测结果1,并同时删除一个存储时间最早的分支结果1时,得到的分支信息001…101。
由于该指令簇所属线程的历史分支信息往往是由该线程的GHR进行记录的,且该历史分支信息用于对分支指令进行分支预测,因此,为了保证在基于该多个有效指令中每个分支指令的分支预测结果,对该GHR进行投机更新后,可以从投机更新后的GHR中获取每个分支指令对应的修正分支信息,本发明实施例中可以在GHR中额外增加第一数值个位,第一数值个位上记录的分支结果不用于对分支指令进行分支预测,即第一数值个位上的信息不是用于对分支指令进行分支预测的历史分支信息,除第一数值个位之外的其它位上记录的分支结果用于对分支指令进行分支预测,即除第一数值个位之外的其它位上的信息是用于对分支指令进行分支预测的历史分支信息。此时,在基于该多个有效指令中每个分支指令的分支预测结果对该GHR进行投机更新时,该多个有效指令中每个分支指令的分支预测结果被保存到该GHR中后,m个存储时间最早的分支结果将不会被移出该GHR,而是移到了第一数值个位上。之后,对于该多个有效指令中的每个分支指令,可以将投机更新后的GHR中以该分支指令的分支预测结果作为最后一个分支结果的第二数值个连续的分支结果组成该分支指令对应的修正分支信息。
需要说明的是,第一数值可以预先进行设置,且为了保证后续可以获取到该多个有效指令中每个分支指令对应的修正分支信息,第一数值可以不小于该指令簇包括的指令的个数减1所得的数值,即不小于预设数值减1所得的数值。另外,第二数值为历史分支信息包括的分支结果的个数。
例如,如图3F所示,该指令簇所属线程的GHR的虚线框为额外增加的第一数值个位,第一数值个位上记录的分支结果不用于对分支指令进行分支预测,即第一数值个位上的信息不是用于对分支指令进行分支预测的历史分支信息,除第一数值个位之外的其它位(即实线框)上记录的分支结果用于对分支指令进行分支预测,即除第一数值个位之外的其它位上的信息是用于对分支指令进行分支预测的历史分支信息。
假设该多个有效指令中的分支指令按照其存储地址的顺序为指令3、指令4,且指令3的分支预测结果为不跳转(用0表示),指令4的分支预测结果为跳转(用1表示),则基于该多个有效指令中每个分支指令的分支预测结果对该GHR进行投机更新后,该GHR中记录的分支结果依次可以为011001…101。其中,第一数值个位上记录的分支结果依次为011,历史分支信息为001…101,且该GHR中最后两位上依次记录指令3和指令4的分支预测结果。则此时可以将投机更新后的GHR中以指令3的分支预测结果0作为最后一个分 支结果的第二数值个连续的分支结果组成指令3对应的修正分支信息100…010,将投机更新后的GHR中以指令4的分支预测结果1作为最后一个分支结果的第二数值个连续的分支结果组成指令4对应的修正分支信息001…101。
之后,如果指令3的分支预测结果与分支跳转结果不同时,则将指令3对应的修正分支信息100…010的最后一个分支结果0翻转为1,得到该指令簇所属线程的历史分支信息100…011。如果指令4的分支预测结果与分支跳转结果不同时,则将指令4对应的修正分支信息001…101的最后一个分支结果0翻转为1,得到该指令簇所属线程的历史分支信息001…100。
进一步地,将该分支指令对应的修正分支信息的最后一个分支结果进行翻转,得到该指令簇所属线程的历史分支信息之后,还可以丢弃在得到该分支指令的分支预测结果后获取的指令,并基于该指令簇所属线程的历史分支信息,对在该分支指令执行后获取的分支指令进行分支预测,基于得到的分支预测结果来继续获取指令,以保证获取的指令的准确度。
需要说明的是,本发明实施例中可以对PHT进行修正更新。如前文所述,步骤303中基于该多个有效指令中每个分支指令的分支跳转结果,更新PHT的操作,即是对PHT进行修正更新。下面对PHT的修正更新进行说明。
具体地,PHT的修正更新,即基于该多个有效指令中每个分支指令的分支跳转结果,更新PHT的操作可以包括如下四种情况。
其中,当该多个有效指令中的所有分支指令的分支预测结果和分支跳转结果均为不跳转时,或者当该多个有效指令中分支预测结果为跳转的分支指令的分支跳转结果为跳转时,也即是,当该多个有效指令中的分支指令的分支预测正确时,基于该多个有效指令中每个分支指令的分支跳转结果,更新PHT的操作可以包括如下第一种情况和第二种情况。
第一种情况:如果PHT中该指令簇的分支预测信息所在表项中的状态信息指示强状态,则不对该表项进行更新。
当PHT中该指令簇的分支预测信息所在表项中的状态信息指示强状态时,表明该表项中的分支预测信息的预测准确度较高。则在该多个有效指令中的分支指令的分支预测正确时,可以不对该指令簇的分支预测信息所在表项进行更新。
第二种情况:如果PHT中该指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将该表项中的状态信息更新为指示强状态的状态信息。
当PHT中该指令簇的分支预测信息所在表项中的状态信息指示弱状态时,表明该表项中的分支预测信息的预测准确度较低。则在该多个有效指令中的分支指令的分支预测正确时,可以将该表项中的状态信息更新为指示强状态的状态信息,以增高该表项中的分支预测信息的预测准确度。
其中,当该多个有效指令中的所有分支指令的分支预测结果均为不跳转且该多个有效指令中存在分支指令的分支跳转结果为跳转时,或者当该多个有效指令中分支预测结果为不跳转的分支指令的分支跳转结果为跳转时,也即是,当该多个有效指令中的分支指令的分支预测不正确时,基于该多个有效指令中每个分支指令的分支跳转结果,更新PHT的操作可以包括如下第三种情况和第四种情况。
第三种情况:如果PHT中该指令簇的分支预测信息所在表项中的状态信息指示强状态,则将该表项中的状态信息更新为指示弱状态的状态信息。
当PHT中该指令簇的分支预测信息所在表项中的状态信息指示强状态时,表明该表项中的分支预测信息的预测准确度较高。则在该多个有效指令中的分支指令的分支预测不正确时,可以将该表项中的状态信息更新为指示弱状态的状态信息,以降低该表项中的分支预测信息的预测准确度。
第四种情况:如果PHT中该指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将该表项中的分支预测信息更新为用于指示该指令簇的跳转指令位置的分支预测信息。
需要说明的是,该指令簇的跳转指令位置为按照该多个有效指令的存储地址对该多个有效指令排序时,该多个有效指令中分支跳转结果为跳转的分支指令在该多个有效指令中的所有分支指令中的位置。
当PHT中该指令簇的分支预测信息所在表项中的状态信息指示弱状态时,表明该表项中的分支预测信息的预测准确度较低。则在该多个有效指令中的分支指令的分支预测不正确时,可以将该表项中的分支预测信息更新为用于指示该指令簇的跳转指令位置的分支预测信息,以更新该表项中的分支预测信息所指示的跳转指令位置。
下面结合图3G来对上述四种情况下的PHT更新过程进行说明。如图3G所示,假设该多个有效指令按照其存储顺序为指令2、指令3、指令4,且指令2、指令3和指令4为分支指令。PHT包括八种表项,该八种表项分别为000、001、010、011、100、101、110、111,各个表项中的最高2位为分支预测信息,且分支预测信息00用于指示所有分支指令均被预测不跳转,分支预测信息01、10、11分别用于指示跳转指令位置1、2、3。各个表项中的最低1位为状态信息,且状态信息0、1分别用于指示弱状态、强状态。
PHT中的八种表项可以基于上述四种情况中的操作进行更新,该更新过程具体在图3G中示出,其中,T0表示该多个有效指令中的所有分支指令的分支跳转结果均为不跳转,T1表示该多个有效指令中的第1个分支指令(即指令2)的分支跳转结果为跳转,T2表示该多个有效指令中的第2个分支指令(即指令3)的分支跳转结果为跳转,T3表示该多个有效指令中的第3个分支指令(即指令4)的分支跳转结果为跳转。
例如,对于PHT中的表项001,如果该表项中的分支预测信息00为该指令簇的分支预测信息,则基于该指令簇的分支预测信息00,可以确定该多个有效指令中的所有分支指令的分支预测结果均为不跳转。此时由于该表项中的状态信息1用于指示强状态,所以如果该多个有效指令中的所有分支指令的分支跳转结果均为不跳转(即T0),则不对该表项进行更新。如果该多个有效指令中存在分支指令的分支跳转结果为跳转(即T1/T2/T3),则将该表项中的状态信息更新为指示弱状态的状态信息,即将该表项更新为000。同理,PHT中的其它表项也可以基于上述四种情况中的操作进行更新。
图4A是本发明实施例提供的一种分支预测装置的结构示意图,该分支预测装置可以由软件、硬件或者两者的结合实现成为计算机设备的部分或者全部,该计算机设备可以为图2所示的计算机设备。该分支预测装置可以以功能模块的形式来呈现,此时该分支预测装置中的各个功能模块可以通过图2中的处理器和存储器来实现,处理器能够执行或者控制其他器件完成本发明实施例的方法流程中的各步骤,实现各功能。
参见图4A,该装置包括第一获取模块401,第二获取模块402和确定模块403。
第一获取模块401,用于执行图3A实施例中的步骤301;
第二获取模块402,用于执行图3A实施例中的步骤302;
确定模块403,用于执行图3A实施例中的步骤303。
可选地,参见图4B,第二获取模块402包括第一确定单元4021和查询单元4022。
第一确定单元4021,用于基于指令簇的地址和指令簇所属线程的历史分支信息,确定索引值;
查询单元4022,用于将索引值作为索引查询PHT,得到指令簇的分支预测信息。
可选地,参见图4C,确定模块403包括第二确定单元4031,第三确定单元4032,第四确定单元4033和第五确定单元4034。
第二确定单元4031,用于执行图3A实施例中的步骤3032;
第三确定单元4032,用于执行图3A实施例中的步骤3033;
第四确定单元4033,用于执行图3A实施例中的步骤3035;
第五确定单元4034,用于执行图3A实施例中的步骤3036。
可选地,参见图4D,该装置还包括第一更新模块404。
第一更新模块404,用于在指令簇所属线程的历史分支信息中,增加多个有效指令中每个分支指令的分支预测结果,并删除m个存储时间最早的分支结果,m为多个有效指令中的分支指令的个数。
可选地,PHT的表项包括分支预测信息和状态信息,状态信息用于指示分支预测信息为强状态或弱状态;参见图4E,该装置还包括执行模块405,第二更新模块406和第三更新模块407。
执行模块405,用于执行多个有效指令,得到多个有效指令中每个分支指令的分支跳转结果;
第二更新模块406,用于执行图3A实施例中的步骤303中的第一种情况和第二种情况;
第三更新模块407,用于执行图3A实施例中的步骤303中的第三种情况和第四种情况。
可选地,第二获取模块402用于执行图3A实施例中的步骤302中的第二种方式。
在本发明实施例中,获取当前周期内待执行的指令簇后,如果该指令簇包括的多个有效指令中存在分支指令,则开始对该多个有效指令中的分支指令进行分支预测。此时,先基于该指令簇的地址和该指令簇所属线程的历史分支信息,从PHT中获取该指令簇的分支预测信息,再基于该指令簇的分支预测信息,确定该多个有效指令中每个分支指令的分支预测结果,从而完成对该多个有效指令中的分支指令的分支预测。本发明实施例中仅从PHT中获取一个分支预测信息,就可以得到该多个有效指令中每个分支指令的分支预测结果,从而简化了该多个有效指令中的分支指令的分支预测过程,简化了控制逻辑,进而提高了分支预测效率,并降低了功耗。另外,由于在对该多个有效指令中的分支指令进行分支预测的过程中,仅需访问PHT中一个地址上的表项,也即是,在PHT具有较少表项的情况下就可以完成分支预测,所以可以减少PHT占用的存储资源。
需要说明的是:上述实施例提供的分支预测装置在分支预测时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。 另外,上述实施例提供的分支预测装置与分支预测方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意结合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如:同轴电缆、光纤、数据用户线(Digital Subscriber Line,DSL))或无线(例如:红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如:软盘、硬盘、磁带)、光介质(例如:数字通用光盘(Digital Versatile Disc,DVD))、或者半导体介质(例如:固态硬盘(Solid State Disk,SSD))等。
以上所述为本申请提供的实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。
Claims (12)
- 一种分支预测方法,其特征在于,所述方法包括:获取当前周期内待执行的指令簇,所述指令簇中包括多个有效指令;当所述多个有效指令中存在分支指令时,基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,从模式历史表PHT中获取所述指令簇的分支预测信息;基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果。
- 如权利要求1所述的方法,其特征在于,所述基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,从PHT中获取所述指令簇的分支预测信息,包括:基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,确定索引值;将所述索引值作为索引查询所述PHT,得到所述指令簇的分支预测信息。
- 如权利要求1所述的方法,其特征在于,所述基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果,包括:当所述指令簇的分支预测信息为指定值时,确定所述多个有效指令中所有分支指令的分支预测结果均为不跳转;当所述指令簇的分支预测信息不为指定值时,确定所述指令簇的分支预测信息所指示的跳转指令位置n,所述n为按照所述多个有效指令的存储地址对所述多个有效指令排序时,所述指令簇中被预测跳转的分支指令在所述多个有效指令中的所有分支指令中的位置;当所述多个有效指令中的分支指令的个数小于所述n时,确定所述多个有效指令中所有分支指令的分支预测结果均为不跳转;当所述多个有效指令中的分支指令的个数不小于所述n时,按照所述多个有效指令的存储地址的顺序,确定所述多个有效指令中的第n个分支指令的分支预测结果为跳转,并确定所述多个有效指令中的前n-1个分支指令的分支预测结果均为不跳转。
- 如权利要求1-3任一所述的方法,其特征在于,所述基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果之后,还包括:在所述指令簇所属线程的历史分支信息中,增加所述多个有效指令中每个分支指令的分支预测结果,并删除m个存储时间最早的分支结果,所述m为所述多个有效指令中的分支指令的个数。
- 如权利要求1-3任一所述的方法,其特征在于,所述PHT的表项包括分支预测信息和状态信息,所述状态信息用于指示所述分支预测信息为强状态或弱状态;所述基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果之后,还包括:执行所述多个有效指令,得到所述多个有效指令中每个分支指令的分支跳转结果;当所述多个有效指令中的所有分支指令的分支预测结果和分支跳转结果均为不跳转时, 或者当所述多个有效指令中分支预测结果为跳转的分支指令的分支跳转结果为跳转时,如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示强状态,则不对所述表项进行更新;如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将所述表项中的状态信息更新为指示强状态的状态信息;当所述多个有效指令中的所有分支指令的分支预测结果均为不跳转且所述多个有效指令中存在分支指令的分支跳转结果为跳转时,或者当所述多个有效指令中分支预测结果为不跳转的分支指令的分支跳转结果为跳转时,如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示强状态,则将所述表项中的状态信息更新为指示弱状态的状态信息;如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将所述表项中的分支预测信息更新为用于指示所述指令簇的跳转指令位置的分支预测信息,所述指令簇的跳转指令位置为按照所述多个有效指令的存储地址对所述多个有效指令排序时,所述多个有效指令中分支跳转结果为跳转的分支指令在所述多个有效指令中的所有分支指令中的位置。
- 如权利要求1所述的方法,其特征在于,所述基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,从PHT中获取所述指令簇的分支预测信息,包括:当基于所述PHT对正在运行的多个线程的分支指令进行分支预测时,基于所述指令簇的地址、所述指令簇所属线程的历史分支信息和所述指令簇所属线程的线程标识,从所述PHT中获取所述指令簇的分支预测信息。
- 一种分支预测装置,其特征在于,所述装置包括:第一获取模块,用于获取当前周期内待执行的指令簇,所述指令簇中包括多个有效指令;第二获取模块,用于当所述多个有效指令中存在分支指令时,基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,从模式历史表PHT中获取所述指令簇的分支预测信息;确定模块,用于基于所述指令簇的分支预测信息,确定所述多个有效指令中每个分支指令的分支预测结果。
- 如权利要求7所述的装置,其特征在于,所述第二获取模块包括:第一确定单元,用于基于所述指令簇的地址和所述指令簇所属线程的历史分支信息,确定索引值;查询单元,用于将所述索引值作为索引查询所述PHT,得到所述指令簇的分支预测信息。
- 如权利要求7所述的装置,其特征在于,所述确定模块包括:第二确定单元,用于当所述指令簇的分支预测信息为指定值时,确定所述多个有效指令中所有分支指令的分支预测结果均为不跳转;第三确定单元,用于当所述指令簇的分支预测信息不为指定值时,确定所述指令簇的分支预测信息所指示的跳转指令位置n,所述n为按照所述多个有效指令的存储地址对所述多个有效指令排序时,所述指令簇中被预测跳转的分支指令在所述多个有效指令中的所有分支指令中的位置;第四确定单元,用于当所述多个有效指令中的分支指令的个数小于所述n时,确定所述多个有效指令中所有分支指令的分支预测结果均为不跳转;第五确定单元,用于当所述多个有效指令中的分支指令的个数不小于所述n时,按照所述多个有效指令的存储地址的顺序,确定所述多个有效指令中的第n个分支指令的分支预测结果为跳转,并确定所述多个有效指令中的前n-1个分支指令的分支预测结果均为不跳转。
- 如权利要求7-9任一所述的装置,其特征在于,所述装置还包括:第一更新模块,用于在所述指令簇所属线程的历史分支信息中,增加所述多个有效指令中每个分支指令的分支预测结果,并删除m个存储时间最早的分支结果,所述m为所述多个有效指令中的分支指令的个数。
- 如权利要求7-9任一所述的装置,其特征在于,所述PHT的表项包括分支预测信息和状态信息,所述状态信息用于指示所述分支预测信息为强状态或弱状态;所述装置还包括:执行模块,用于执行所述多个有效指令,得到所述多个有效指令中每个分支指令的分支跳转结果;第二更新模块,用于当所述多个有效指令中的所有分支指令的分支预测结果和分支跳转结果均为不跳转时,或者当所述多个有效指令中分支预测结果为跳转的分支指令的分支跳转结果为跳转时,如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示强状态,则不对所述表项进行更新;如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将所述表项中的状态信息更新为指示强状态的状态信息;第三更新模块,用于当所述多个有效指令中的所有分支指令的分支预测结果均为不跳转且所述多个有效指令中存在分支指令的分支跳转结果为跳转时,或者当所述多个有效指令中分支预测结果为不跳转的分支指令的分支跳转结果为跳转时,如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示强状态,则将所述表项中的状态信息更新为指示弱状态的状态信息;如果所述PHT中所述指令簇的分支预测信息所在表项中的状态信息指示弱状态,则将所述表项中的分支预测信息更新为用于指示所述指令簇的跳转指令位置的分支预测信息,所述指令簇的跳转指令位置为按照所述多个有效指令的存储地址对所述多个有效指令排序时,所述多个有效指令中分支跳转结果为跳转的分支指令在所述多个有效指令中的所有分支指令中的位置。
- 如权利要求7所述的装置,其特征在于,所述第二获取模块用于:当基于所述PHT对正在运行的多个线程的分支指令进行分支预测时,基于所述指令簇的地址、所述指令簇所属线程的历史分支信息和所述指令簇所属线程的线程标识,从所述PHT中获取所述指令簇的分支预测信息。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710632787.7 | 2017-07-28 | ||
CN201710632787.7A CN109308191B (zh) | 2017-07-28 | 2017-07-28 | 分支预测方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019019719A1 true WO2019019719A1 (zh) | 2019-01-31 |
Family
ID=65039388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/084134 WO2019019719A1 (zh) | 2017-07-28 | 2018-04-23 | 分支预测方法及装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109308191B (zh) |
WO (1) | WO2019019719A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117389629A (zh) * | 2023-11-02 | 2024-01-12 | 北京市合芯数字科技有限公司 | 分支预测方法、装置、电子设备及介质 |
CN118626150A (zh) * | 2024-08-12 | 2024-09-10 | 北京微核芯科技有限公司 | 基于覆盖分支预测器的分支预测方法及装置 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134441B (zh) * | 2019-05-23 | 2020-11-10 | 苏州浪潮智能科技有限公司 | Risc-v分支预测方法、装置、电子设备及存储介质 |
CN110336803B (zh) * | 2019-06-21 | 2020-08-11 | 中国科学院软件研究所 | 一种目标主机分支预测单元的安全性评估方法 |
CN111459549B (zh) * | 2020-04-07 | 2022-11-01 | 上海兆芯集成电路有限公司 | 具有高度领先分支预测器的微处理器 |
CN114020441B (zh) * | 2021-11-29 | 2023-03-21 | 锐捷网络股份有限公司 | 一种多线程处理器的指令预测方法及相关装置 |
CN114253821B (zh) * | 2022-03-01 | 2022-05-27 | 西安芯瞳半导体技术有限公司 | 一种分析gpu性能的方法、装置及计算机存储介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101256481A (zh) * | 2007-03-02 | 2008-09-03 | 富士通株式会社 | 数据处理器以及存储器读激活控制方法 |
CN102520914A (zh) * | 2011-11-04 | 2012-06-27 | 杭州中天微系统有限公司 | 支持多路并行预测的分支预测装置 |
CN104423929A (zh) * | 2013-08-21 | 2015-03-18 | 华为技术有限公司 | 一种分支预测方法及相关装置 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102053818B (zh) * | 2009-11-05 | 2014-07-02 | 无锡江南计算技术研究所 | 分支预测方法及装置 |
CN102184091A (zh) * | 2011-04-18 | 2011-09-14 | 孙瑞琛 | 一种分支预测方法及装置 |
CN102520913B (zh) * | 2011-11-03 | 2014-03-26 | 浙江大学 | 基于分组更新历史信息的并行分支预测装置 |
US9229723B2 (en) * | 2012-06-11 | 2016-01-05 | International Business Machines Corporation | Global weak pattern history table filtering |
GB2511949B (en) * | 2013-03-13 | 2015-10-14 | Imagination Tech Ltd | Indirect branch prediction |
US10534611B2 (en) * | 2014-07-31 | 2020-01-14 | International Business Machines Corporation | Branch prediction using multi-way pattern history table (PHT) and global path vector (GPV) |
-
2017
- 2017-07-28 CN CN201710632787.7A patent/CN109308191B/zh active Active
-
2018
- 2018-04-23 WO PCT/CN2018/084134 patent/WO2019019719A1/zh active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101256481A (zh) * | 2007-03-02 | 2008-09-03 | 富士通株式会社 | 数据处理器以及存储器读激活控制方法 |
CN102520914A (zh) * | 2011-11-04 | 2012-06-27 | 杭州中天微系统有限公司 | 支持多路并行预测的分支预测装置 |
CN104423929A (zh) * | 2013-08-21 | 2015-03-18 | 华为技术有限公司 | 一种分支预测方法及相关装置 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117389629A (zh) * | 2023-11-02 | 2024-01-12 | 北京市合芯数字科技有限公司 | 分支预测方法、装置、电子设备及介质 |
CN117389629B (zh) * | 2023-11-02 | 2024-06-04 | 北京市合芯数字科技有限公司 | 分支预测方法、装置、电子设备及介质 |
CN118626150A (zh) * | 2024-08-12 | 2024-09-10 | 北京微核芯科技有限公司 | 基于覆盖分支预测器的分支预测方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
CN109308191B (zh) | 2021-09-14 |
CN109308191A (zh) | 2019-02-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019019719A1 (zh) | 分支预测方法及装置 | |
US10552163B2 (en) | Method and apparatus for efficient scheduling for asymmetrical execution units | |
JP6744423B2 (ja) | プロセッサベースシステム内のロード経路履歴に基づくアドレス予測テーブルを使用したロードアドレス予測の実現 | |
US9946549B2 (en) | Register renaming in block-based instruction set architecture | |
US11989561B2 (en) | Method and apparatus for scheduling out-of-order execution queue in out-of-order processor | |
US9830152B2 (en) | Selective storing of previously decoded instructions of frequently-called instruction sequences in an instruction sequence buffer to be executed by a processor | |
WO2020199058A1 (zh) | 分支指令的处理方法、分支预测器及处理器 | |
CN104603747B (zh) | 响应于分支预测表调换指令而调换分支方向历史及相关的系统和方法 | |
US11442727B2 (en) | Controlling prediction functional blocks used by a branch predictor in a processor | |
US11816061B2 (en) | Dynamic allocation of arithmetic logic units for vectorized operations | |
CN103365628A (zh) | 用于执行预解码时优化的指令的方法和系统 | |
CN114968373A (zh) | 指令分派方法、装置、电子设备及计算机可读存储介质 | |
CN116860665A (zh) | 由处理器执行的地址翻译方法及相关产品 | |
US9223714B2 (en) | Instruction boundary prediction for variable length instruction set | |
JP2017537408A (ja) | アウトオブオーダー(ooo)プロセッサにおける早期命令実行を提供すること、ならびに関連する装置、方法、およびコンピュータ可読媒体 | |
US11327768B2 (en) | Arithmetic processing apparatus and memory apparatus | |
US11256543B2 (en) | Processor and instruction scheduling method | |
CN107924310A (zh) | 使用避免转出表(pat)预测计算机处理器中的存储器指令转出 | |
WO2023045250A1 (zh) | 一种内存池资源共用的方法、装置、设备及可读介质 | |
US11093401B2 (en) | Hazard prediction for a group of memory access instructions using a buffer associated with branch prediction | |
US9489246B2 (en) | Method and device for determining parallelism of tasks of a program | |
CN113778526B (zh) | 一种基于Cache的流水线的执行方法及装置 | |
WO2024087039A1 (zh) | 一种块指令的处理方法和块指令处理器 | |
US20230367596A1 (en) | Instruction prediction method and apparatus, system, and computer-readable storage medium | |
TW202340940A (zh) | 使用處理器中的機率計數器更新進行分支預測器訓練 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18839387 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18839387 Country of ref document: EP Kind code of ref document: A1 |