CN116719561B - Conditional branch instruction processing system and method
- Publication number
- CN116719561B CN202310993515.5A
- Authority
- CN
- China
- Prior art keywords
- instruction
- unit
- fetching
- state
- jump
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
- G06F9/3869—Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
- G06F9/30069—Instruction skipping instructions, e.g. SKIP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
- G06F9/3806—Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3814—Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3844—Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a conditional branch instruction processing system and method. The system comprises: an instruction transmitting unit, configured to use a first process to send the instruction data that the instruction storage unit feeds back in response to an instruction fetch request to the instruction buffer of the instruction fetching unit, and to use a second process to send a conditional branch instruction read from the instruction buffer to the instruction execution unit; an instruction execution unit, configured to execute the conditional branch instruction and generate an execution result; and an instruction fetching unit, configured to empty the instruction buffer when the jump information indicates that a jump is required, so that the target instruction is sent to the instruction execution unit through the instruction transmitting unit. Conditional branch instructions can thus be processed with simple hardware. Because the depth of the instruction buffer is the number of instructions needed to hide the pipeline execution period of the instruction execution unit, flushing the instruction buffer when a jump is required tolerates 1 branch prediction error without a deep instruction flush, which reduces the power consumption of branch prediction.
Description
Technical Field
The present invention relates to the field of processor technologies, and in particular, to a system and a method for processing a conditional branch instruction.
Background
Conditional branch instructions are common in the data pipelines of neural network processors (Neural-network Processing Unit, NPU). The instructions that follow a conditional branch instruction have a control dependency on its execution result: while the conditional branch instruction executes, the pipeline stalls, and the program counter (Program Counter, PC) of the next instruction is decided only once it is known whether the branch jumps. If the branch does not jump, execution continues with the next sequential instruction; if it jumps, execution continues at the target PC.
Many existing processors use branch prediction to improve instruction execution efficiency. However, branch prediction requires a hardware unit that records the addresses of branch instructions and their jump history and combines the two to predict the PC of the next instruction. The hardware implementation of a branch prediction unit is therefore relatively complex, and its power consumption is relatively high.
Disclosure of Invention
The present invention provides a conditional branch instruction processing system and method that process conditional branch instructions in a simpler system, tolerate 1 branch prediction error without a deep instruction flush, and thereby reduce the power consumption of branch prediction.
According to a first aspect of the present invention, there is provided a conditional branch instruction processing system, comprising an instruction fetching unit, an instruction storage unit, an instruction transmitting unit and an instruction execution unit, wherein the instruction storage unit, the instruction transmitting unit and the instruction execution unit are each connected to the instruction fetching unit, and the instruction transmitting unit is connected to both the instruction storage unit and the instruction execution unit;
the instruction fetching unit is configured to send an instruction fetch request to the instruction storage unit, wherein the instruction fetch request comprises a current program counter (PC) pointer and running state information;
the instruction transmitting unit is configured to use a first process to send the instruction data that the instruction storage unit feeds back in response to the instruction fetch request to an instruction buffer of the instruction fetching unit, and to use a second process to send a conditional branch instruction read from the instruction buffer to the instruction execution unit, wherein the depth of the instruction buffer is the number of instructions required to hide the pipeline execution period of the instruction execution unit;
the instruction execution unit is configured to execute the conditional branch instruction to generate an execution result and to send the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
the instruction fetching unit is further configured to, when the jump information indicates that a jump is required, empty the instruction buffer and send a re-fetch request generated from the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC pointer and sends it to the instruction transmitting unit, which in turn sends the target instruction to the instruction execution unit.
According to another aspect of the present invention, there is provided a conditional branch instruction processing method, comprising: sending, by an instruction fetching unit, an instruction fetch request to an instruction storage unit, wherein the instruction fetch request comprises a current program counter (PC) pointer and running state information;
using, by an instruction transmitting unit, a first process to send the instruction data that the instruction storage unit feeds back in response to the instruction fetch request to an instruction buffer of the instruction fetching unit, and using a second process to send a conditional branch instruction read from the instruction buffer to an instruction execution unit, wherein the depth of the instruction buffer is the number of instructions required to hide the pipeline execution period of the instruction execution unit;
executing, by the instruction execution unit, the conditional branch instruction to generate an execution result, and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
when the jump information indicates that a jump is required, emptying the instruction buffer by the instruction fetching unit and sending a re-fetch request generated from the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC pointer and sends it to the instruction transmitting unit, which in turn sends the target instruction to the instruction buffer.
The technical solution of the embodiments of the present invention processes conditional branch instructions with simple hardware. Because the depth of the instruction buffer is the number of instructions needed to hide the pipeline execution period of the instruction execution unit, flushing the instruction buffer when a jump is required tolerates 1 branch prediction error without a deep instruction flush, which reduces the power consumption of branch prediction.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a conditional branch instruction processing system according to a first embodiment of the present invention;
FIG. 2 is a state transition diagram of an instruction fetch unit state machine according to a first embodiment of the present invention;
FIG. 3 is a flowchart of a processing method of a conditional branch instruction according to a second embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Example 1
FIG. 1 is a schematic structural diagram of a conditional branch instruction processing system according to an embodiment of the present invention. As shown in FIG. 1, the system includes an instruction fetching unit, an instruction storage unit, an instruction transmitting unit and an instruction execution unit, where the instruction storage unit, the instruction transmitting unit and the instruction execution unit are each connected to the instruction fetching unit, and the instruction transmitting unit is connected to both the instruction storage unit and the instruction execution unit.
The instruction fetching unit is configured to send an instruction fetch request to the instruction storage unit, wherein the instruction fetch request comprises a current program counter (PC) pointer and running state information;
the instruction transmitting unit is configured to use a first process to send the instruction data that the instruction storage unit feeds back in response to the instruction fetch request to the instruction buffer of the instruction fetching unit, and to use a second process to send a conditional branch instruction read from the instruction buffer to the instruction execution unit, wherein the depth of the instruction buffer is the number of instructions required to hide the pipeline execution period of the instruction execution unit;
the instruction execution unit is configured to execute the conditional branch instruction to generate an execution result and to send the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
and the instruction fetching unit is further configured to, when the jump information indicates that a jump is required, empty the instruction buffer and send a re-fetch request generated from the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC pointer and sends it to the instruction transmitting unit, which in turn sends the target instruction to the instruction execution unit.
Specifically, the instruction fetching unit of this embodiment further includes a state machine, whose state transition diagram is shown in FIG. 2. The running state information of the instruction fetching unit accordingly takes one of four states: the idle state IDLE, the fetch state IFETCH, the re-fetch state REFETCH, and the wait-for-drain state WAIT_DRAIN. The state machine is configured to: switch the running state information from the idle state to the fetch state when the instruction fetching unit starts to operate; switch it from the fetch state to the re-fetch state when the instruction fetching unit receives, for the first time, jump information indicating that a jump is required; and switch it from the re-fetch state to the wait-for-drain state when the instruction fetching unit, while in the re-fetch state, again receives jump information indicating that a jump is required.
The state machine is further configured to: switch the running state information from the re-fetch state to the fetch state when the instruction fetching unit is in the re-fetch state and the instruction storage unit has returned all the requested instructions, namely the instructions whose REFETCH state value is 0; switch it from the fetch state to the wait-for-drain state when the instruction fetching unit, while in the fetch state, receives jump information indicating that a jump is required and the instruction storage unit has not yet returned all the instructions of the historical requests, namely the instructions whose REFETCH state value is 1; switch it from the wait-for-drain state to the fetch state when the instruction fetching unit is in the wait-for-drain state and the instruction storage unit has returned all the instructions of the historical requests; and switch it from the fetch state to the idle state when the instruction fetching unit receives the last instruction in the instruction storage unit.
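As an illustration only, the transitions listed above can be written as a small lookup function. The sketch below is a Python model, not the hardware of the embodiment. The event labels (START, JUMP, ALL_RETURNED, LAST_INSTRUCTION), the refetch_outstanding parameter and the name WAIT_DRAIN for the wait-for-drain state are assumptions introduced here for readability, and the way the two fetch-state jump rules are reconciled (by whether instructions with REFETCH state value 1 are still outstanding) is one possible reading of the text.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    IFETCH = auto()
    REFETCH = auto()
    WAIT_DRAIN = auto()  # assumed name for the wait-for-drain state


def next_state(state, event, refetch_outstanding=False):
    """Return the next running state of the instruction fetching unit.

    event is one of the hypothetical labels "START", "JUMP",
    "ALL_RETURNED" or "LAST_INSTRUCTION". refetch_outstanding is True
    when instructions with REFETCH state value 1 have not all returned.
    """
    if state is State.IDLE and event == "START":
        return State.IFETCH
    if state is State.IFETCH:
        if event == "JUMP":
            # Assumed reading: a jump with re-fetch data still in flight
            # goes to WAIT_DRAIN, otherwise to REFETCH.
            return State.WAIT_DRAIN if refetch_outstanding else State.REFETCH
        if event == "LAST_INSTRUCTION":
            return State.IDLE
    if state is State.REFETCH:
        if event == "JUMP":
            return State.WAIT_DRAIN
        if event == "ALL_RETURNED":
            return State.IFETCH
    if state is State.WAIT_DRAIN and event == "ALL_RETURNED":
        return State.IFETCH
    # Any other combination leaves the state unchanged in this sketch.
    return state
```

Keeping the transitions in one pure function makes each rule of the state diagram checkable in isolation; a hardware implementation would express the same table as combinational next-state logic.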
Optionally, the instruction fetching unit is configured to generate an instruction fetch request from the current PC pointer and the running state information when the running state information is the fetch state or the re-fetch state and the instruction buffer is not full, to send the instruction fetch request to the instruction storage unit, and to increment the current PC pointer by a specified step.
Specifically, in this embodiment, when the instruction fetching unit determines through the state machine that its running state information is the fetch state IFETCH or the re-fetch state REFETCH, and the instruction buffer is not full, it sends an instruction fetch request to the instruction storage unit. The request contains the current PC pointer, for example PC=1, and the running state information, for example the fetch state IFETCH. After sending the request, the instruction fetching unit also increments its current PC pointer by a specified step, for example a step of 1, so that the PC held in the instruction fetching unit becomes 2. This is only an example; this embodiment does not limit how the PC pointer is incremented.
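The request-generation rule just described can be sketched in a few lines of Python. This is a software illustration under stated assumptions rather than the embodiment itself: the class name FetchUnit, the FetchRequest tuple and the default buffer depth of 4 are hypothetical, while the starting PC of 1 and the step of 1 follow the example in the text.

```python
from collections import namedtuple

# Hypothetical container for the two fields the text says a request carries.
FetchRequest = namedtuple("FetchRequest", ["pc", "state"])


class FetchUnit:
    def __init__(self, pc=1, state="IFETCH", buffer_depth=4, step=1):
        self.pc = pc
        self.state = state
        self.buffer = []                  # instruction buffer contents
        self.buffer_depth = buffer_depth  # assumed value, for illustration
        self.step = step

    def make_request(self):
        """Build a fetch request, or return None when none may be sent."""
        if self.state not in ("IFETCH", "REFETCH"):
            return None
        if len(self.buffer) >= self.buffer_depth:
            return None  # buffer full: no request is sent
        request = FetchRequest(pc=self.pc, state=self.state)
        self.pc += self.step  # increment the PC after sending the request
        return request
```

With the defaults, the first call returns a request for PC=1 in state IFETCH and leaves the unit's PC at 2, matching the example above.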
Optionally, the instruction storage unit is configured to look up locally, according to the current PC pointer, the instruction corresponding to that pointer, to generate instruction data from the instruction found and the running state information, and to send the instruction data to the instruction transmitting unit.
In this embodiment, the instructions to be processed are stored in the instruction storage unit in advance, and a mapping between PC pointers and instructions is established there. When the instruction storage unit receives an instruction fetch request from the instruction fetching unit, it extracts the current PC pointer from the request, for example PC=1, and looks up the instruction corresponding to PC=1 in the pre-established mapping. It then combines that instruction with the received running state information into instruction data and sends the instruction data to the instruction transmitting unit. On receiving the instruction data, the instruction transmitting unit decodes the instruction it contains to obtain the instruction's type. Decoding is essentially a process of identifying the instruction: the type may be, for example, a conditional branch instruction or a non-conditional-branch instruction, and this embodiment does not limit the specific types. The instruction transmitting unit has two processes that do not interfere with each other, a first process for sending instructions and a second process for reading instructions. After decoding, the instruction transmitting unit therefore uses the first process to send the instruction, the running state information and the type to the instruction fetching unit.
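A minimal sketch of the lookup and decode steps follows. The patent does not specify an instruction encoding, so the textual mnemonics, the set CONDITIONAL_BRANCH_OPCODES and the function names storage_lookup and decode are all assumptions made purely for illustration.

```python
# Assumed mnemonics that count as conditional branches in this sketch.
CONDITIONAL_BRANCH_OPCODES = {"beq", "bne", "blt", "bge"}


def storage_lookup(instruction_memory, pc, state):
    """Instruction storage unit: map a PC pointer to instruction data."""
    return {"instruction": instruction_memory[pc], "state": state}


def decode(instruction_data):
    """Instruction transmitting unit: identify the instruction type."""
    opcode = instruction_data["instruction"].split()[0]
    if opcode in CONDITIONAL_BRANCH_OPCODES:
        kind = "conditional_branch"
    else:
        kind = "other"
    # The instruction, the running state information and the type are
    # passed on together to the instruction fetching unit.
    return {**instruction_data, "type": kind}
```

For example, with a memory of `{1: "beq r1, r2, 10", 2: "add r3, r1, r2"}`, decoding the data for PC=1 yields the type `conditional_branch`, and decoding the data for PC=2 yields `other`.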
Optionally, the instruction fetching unit is configured to: store the instruction and the type directly into the instruction buffer when the running state information is the fetch state; suspend storing the instruction and the type into the instruction buffer when the running state information is the wait-for-drain state; and, when the running state information is the re-fetch state, check the state value of the re-fetch state, store the instruction and the type into the instruction buffer if the state value is 1, and otherwise not store them.
Specifically, the instruction fetching unit in this embodiment does not necessarily store an instruction received from the instruction transmitting unit into the instruction buffer; whether to store it depends on the running state information. When the running state information is the fetch state IFETCH, the instruction answers an ordinary instruction fetch request sent by the instruction fetching unit, so it is accepted and stored in the instruction buffer. When the running state information is the wait-for-drain state WAIT_DRAIN, the instruction buffer must stay completely cleared until all the instructions corresponding to the earlier historical requests have been received, so the received instruction is not stored in the instruction buffer. When the running state information is the re-fetch state REFETCH, the state value of the re-fetch state is checked: if it is 0, the received instruction is not stored in the instruction buffer, and if it is 1, the instruction is stored. In this embodiment the depth of the instruction buffer of the instruction fetching unit must be set in advance, specifically to the number of instructions required to hide the pipeline execution period of the instruction execution unit. A flush can then tolerate 1 branch prediction error without a deep instruction flush, and the number of instructions held in the instruction buffer stays limited.
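The storing rule can be captured as a single predicate. The sketch below is an assumption-laden Python illustration: it takes the decision to depend on the fetching unit's running state plus the state value carried with the returned instruction (0 for a request issued in IFETCH, 1 for one issued in REFETCH), it uses WAIT_DRAIN as an assumed name for the wait-for-drain state, and it treats the idle state as "do not store", which the text does not spell out.

```python
def should_store(running_state, refetch_value):
    """Decide whether a returned instruction enters the instruction buffer.

    running_state: "IDLE", "IFETCH", "REFETCH" or "WAIT_DRAIN".
    refetch_value: state value carried with the instruction
                   (0 = issued in IFETCH, 1 = issued in REFETCH).
    """
    if running_state == "IFETCH":
        return True                 # ordinary fetch: always store
    if running_state == "WAIT_DRAIN":
        return False                # draining: storing is suspended
    if running_state == "REFETCH":
        return refetch_value == 1   # keep only re-fetched instructions
    return False                    # assumed behaviour for IDLE
```

In the re-fetch state this filter is what discards instructions that were requested before the jump and are still arriving from the instruction storage unit.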
It should be noted that, in this embodiment, the instruction transmitting unit uses the second process to read instructions from the instruction buffer at regular intervals. When the type of the read instruction is a non-conditional-branch instruction, the instruction execution unit simply executes the instructions in the instruction buffer in order, so the read instruction is sent directly to the instruction execution unit. When the type of the read instruction is a conditional branch instruction, the instruction is likewise sent to the instruction execution unit, but executing a conditional branch instruction involves a judgment. If the result of the judgment is no, the instruction execution unit simply continues to execute in sequence; if the result is yes, execution must jump to a target instruction, which is not necessarily the instruction adjacent to the branch. The second process of the instruction transmitting unit, however, reads the next adjacent instruction in sequence and sends it to the instruction execution unit, so what the instruction transmitting unit reads can conflict with what the instruction execution unit needs to execute. The instruction transmitting unit therefore sets its own read flag bit to the paused state at this point, and the second process suspends instruction reading while the read flag bit is in the paused state.
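The pause behaviour of the second process can be illustrated with a short Python sketch. The class name IssueUnit, the read_enabled attribute standing in for the read flag bit, and the (instruction, type) tuples in the buffer are hypothetical names chosen for this example only.

```python
from collections import deque


class IssueUnit:
    """Toy model of the second (reading) process of the transmitting unit."""

    def __init__(self):
        self.read_enabled = True  # read flag bit: True = running, False = paused

    def issue_next(self, buffer):
        """Read one instruction from the buffer, or None if paused or empty."""
        if not self.read_enabled or not buffer:
            return None
        instruction, kind = buffer.popleft()
        if kind == "conditional_branch":
            # Pause reading until the branch outcome is known.
            self.read_enabled = False
        return instruction
```

A buffer holding a conditional branch followed by an ordinary instruction issues the branch, then returns None on the next read, and resumes only after read_enabled is set back to True, which corresponds to the fetching unit updating the flag once jump information arrives.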
In this embodiment, the instruction execution unit receives the instructions sent by the instruction transmitting unit. It executes non-conditional-branch instructions normally, with no subsequent jump, so the focus of this embodiment is the processing of conditional branch instructions. When the instruction execution unit receives an instruction from the instruction transmitting unit and determines that its type is a conditional branch instruction, it executes the conditional branch instruction, generates an execution result containing jump information and a target PC pointer, and sends the execution result to the instruction fetching unit.
Optionally, the instruction fetching unit is configured to empty the instruction buffer when it determines that the jump information indicates that a jump is required and that it is itself in the fetch state or the re-fetch state.
Optionally, the instruction fetching unit is further configured to update the read flag bit of the instruction transmitting unit to the running state when it determines that the jump information has been received, wherein the second process starts instruction reading when the read flag bit is in the running state.
Specifically, after receiving the jump information, the instruction fetching unit in this embodiment determines whether the branch jumps. If it does not, the instruction execution unit only needs to execute the instructions in the instruction buffer in sequence, so it suffices to update the read flag bit of the instruction transmitting unit to the running state, whereupon the instruction transmitting unit restarts the second process and continues to read instructions from the instruction buffer in order. If the branch does jump, updating the read flag bit to the running state is not enough: were the instruction transmitting unit to continue reading sequentially, the instructions read from the instruction buffer would not match the target PC pointer of the jump. Therefore, when the instruction fetching unit is in the fetch state IFETCH or the re-fetch state REFETCH, receives the jump information sent by the instruction execution unit and determines that a jump is required, the instructions in the instruction buffer must be flushed.
Once the instruction buffer is confirmed to be empty, the instruction fetching unit sends to the instruction storage unit a re-fetch request generated from the received target PC pointer, for example target PC=10. The instruction storage unit determines the matching target instruction according to the target PC pointer and sends it to the instruction transmitting unit, which stores it into the instruction buffer through the first process. Because the instruction buffer now contains only instructions that the instruction execution unit is to execute, the instruction transmitting unit obtains the jump target directly when it reads the instruction buffer through the second process and sends it to the instruction execution unit for execution. Thus, when the instruction execution unit executes the conditional branch instruction at PC=1 and a jump is required, the instruction at the jump target PC=10 is fetched and executed directly. Because only the instructions buffered ahead for 1 branch are deleted from the instruction buffer, and they are few in number, conditional branch instructions can be processed without a deep flush, which reduces the power consumption of branch prediction.
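The handling of an execution result described in the preceding paragraphs can be sketched as one function. This is a hedged Python illustration, not the embodiment's logic: the function name handle_branch_result, its argument list and its tuple return value are assumptions, and the buffer is modeled as a plain list.

```python
def handle_branch_result(buffer, jump_taken, target_pc, current_pc, state):
    """Apply an execution result to the fetch unit's buffer and PC.

    Returns (next_fetch_pc, flushed_count, read_enabled).
    """
    flushed = 0
    next_pc = current_pc
    if jump_taken and state in ("IFETCH", "REFETCH"):
        flushed = len(buffer)  # at most the (shallow) buffer depth
        buffer.clear()         # flush the sequentially prefetched instructions
        next_pc = target_pc    # the re-fetch request starts at the target PC
    # In both cases the read flag bit returns to the running state.
    return next_pc, flushed, True
```

The cost of a taken branch in this model is bounded by the buffer depth: a buffer holding three prefetched instructions loses exactly those three, and fetching resumes at the target PC (10 in the example above).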
In this embodiment, when processing a conditional branch instruction, it can be implemented in simple hardware, and since the depth of the instruction buffer is the number of instructions needed to conceal the pipeline execution cycle of the instruction execution unit, 1 branch prediction error is allowed by flushing the instruction buffer when determining that a jump is required, no deep flush instruction is needed, and the power consumption of branch prediction is reduced.
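The handling of jump information described above can be illustrated with a minimal Python sketch. It is not the patented hardware: the function name `on_jump_info`, the string flag values, and the dictionary layout of the re-fetch request are assumptions made only for illustration.

```python
from collections import deque


def on_jump_info(state, buffer, jump_required, target_pc):
    """Handle the jump information returned by the execution unit.

    Returns (read_flag, refetch_request). refetch_request is None when no
    jump is required.
    """
    read_flag = "RUNNING"  # the second (reading) process resumes in both cases
    if not jump_required:
        return read_flag, None
    if state not in ("IFETCH", "REFETCH"):
        # Other states are outside the passage this sketch illustrates.
        raise ValueError("this sketch only covers the IFETCH and REFETCH states")
    buffer.clear()  # flush the sequentially prefetched instructions
    return read_flag, {"pc": target_pc}  # hypothetical request layout


# Branch at PC=1 is taken with target PC=10: the prefetched entries are dropped.
taken_buffer = deque(["insn@2", "insn@3"])
taken = on_jump_info("IFETCH", taken_buffer, True, 10)

# Branch not taken: the buffer is left untouched and no re-fetch is issued.
kept_buffer = deque(["insn@2", "insn@3"])
not_taken = on_jump_info("IFETCH", kept_buffer, False, 10)
```

Only the small buffer is cleared on a taken branch, which is the property the embodiment relies on to avoid a deep flush.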
Example two
Fig. 3 is a flowchart of a method for processing a conditional branch instruction according to an embodiment of the present invention. This embodiment is applicable to the processing of conditional branch instructions, and the method may be performed by the conditional branch instruction processing system of the above embodiment. As shown in Fig. 3, the method includes:
Step S101, the instruction fetch unit sends an instruction fetch request to the instruction storage unit.
Specifically, in this embodiment, the instruction fetch unit sends an instruction fetch request to the instruction storage unit when it determines that its own running state information is the fetch state IFETCH or the re-fetch state REFETCH and that the instruction buffer is not full, the state value being 0 for the fetch state IFETCH and 1 for the re-fetch state REFETCH. The instruction fetch request includes the current PC pointer, for example PC=1, and the running state information, for example the fetch state IFETCH. In addition, after sending the instruction fetch request, the instruction fetch unit increments its current PC pointer by a specified step, for example a step of 1, so that the PC in the instruction fetch unit is updated to 2. This is only an illustration; this embodiment does not limit how the PC pointer is incremented.
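Step S101 can be sketched in Python as follows. This is an illustration under assumptions: the function name `try_fetch`, the parameter `buffer_depth`, and the dictionary layout of the request are hypothetical and do not come from the patent.

```python
def try_fetch(state, pc, buffer_len, buffer_depth, step=1):
    """Return (request, next_pc); request is None when no fetch is issued."""
    can_fetch = state in ("IFETCH", "REFETCH") and buffer_len < buffer_depth
    if not can_fetch:
        return None, pc  # PC is left unchanged when nothing is requested
    request = {"pc": pc, "state": state}  # hypothetical request layout
    return request, pc + step  # the PC is incremented after the request is sent


# Fetch state, empty buffer: a request for PC=1 is issued and the PC becomes 2.
request, next_pc = try_fetch("IFETCH", pc=1, buffer_len=0, buffer_depth=4)

# Full buffer: no request is issued.
blocked = try_fetch("IFETCH", pc=2, buffer_len=4, buffer_depth=4)
```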
Step S102, the instruction transmitting unit adopts a first process to transmit the instruction data fed back by the instruction storage unit based on the instruction fetching request to the instruction buffer area of the instruction fetching unit, and adopts a second process to transmit the conditional branch instruction read from the instruction buffer area to the instruction executing unit.
All instructions to be processed are stored in the instruction storage unit in advance, and a mapping between PC pointers and instructions is established in the instruction storage unit. When the instruction storage unit receives an instruction fetch request from the instruction fetching unit, it extracts the current PC pointer from the request, for example PC=1, and looks up the instruction corresponding to PC=1 in the pre-established mapping. It then combines the instruction found with the received running state information to form instruction data, and sends the instruction data, containing the instruction and the running state information, to the instruction transmitting unit. After receiving the instruction data, the instruction transmitting unit decodes the instruction it contains to obtain the type of the instruction. Decoding is essentially a process of identifying the instruction; the type is, for example, a conditional branch instruction or a non-conditional-branch instruction, and this embodiment does not limit the specific types. In this embodiment, the instruction transmitting unit runs two processes that do not interfere with each other: a first process for transmitting instructions and a second process for reading instructions. After decoding the instruction, the instruction transmitting unit sends the instruction, the running state information, and the type to the instruction fetching unit using the first process.
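The lookup and decode path described above can be sketched in Python. The instruction strings, the opcode set, and the names `INSTRUCTION_MEMORY`, `storage_lookup`, and `decode` are hypothetical; the patent does not specify an instruction set or a data layout.

```python
# Hypothetical PC-to-instruction mapping held by the instruction storage unit.
INSTRUCTION_MEMORY = {
    1: "beq r1, r2, 10",
    2: "add r3, r1, r2",
    10: "sub r4, r3, r1",
}
CONDITIONAL_BRANCH_OPCODES = {"beq", "bne"}  # assumed opcodes, for illustration


def storage_lookup(request):
    """Form instruction data from the mapped instruction and the request state."""
    return {
        "instruction": INSTRUCTION_MEMORY[request["pc"]],
        "state": request["state"],
    }


def decode(instruction_data):
    """Identify the instruction type and attach it to the instruction data."""
    opcode = instruction_data["instruction"].split()[0]
    is_branch = opcode in CONDITIONAL_BRANCH_OPCODES
    kind = "conditional_branch" if is_branch else "other"
    return {**instruction_data, "type": kind}


decoded_branch = decode(storage_lookup({"pc": 1, "state": "IFETCH"}))
decoded_add = decode(storage_lookup({"pc": 2, "state": "IFETCH"}))
```

The running state travels with the instruction, so the fetching unit can later decide whether a returned instruction still belongs to a request it cares about.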
Specifically, the instruction fetching unit in this embodiment does not necessarily store an instruction in the instruction buffer after receiving it from the instruction transmitting unit; it decides whether to store the instruction based on the running state information. When the running state information is the instruction fetching state IFETCH, the instruction answers an ordinary fetch request sent by the instruction fetching unit, and it is received and stored directly in the instruction buffer. When the running state information is the WAIT-empty (waiting-to-empty) state, everything must be cleared once all instructions corresponding to the earlier history requests have been received, so the received instructions are not cached in the instruction buffer. When the running state information is the re-fetch state REFETCH, the state value of the re-fetch state is checked: when the state value is 0, the received instruction is not stored in the instruction buffer, and when the state value is 1, it is cached in the instruction buffer. In this embodiment, the depth of the instruction buffer of the instruction fetching unit is set in advance; specifically, the depth is set to the number of instructions needed to hide the pipeline execution cycle of the instruction execution unit. One branch misprediction can therefore be tolerated when the buffer is flushed, no deep flush of instructions is needed, and the number of instructions stored in the instruction buffer is limited.
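The store-or-discard decision above can be summarised in a short Python sketch. The function name `should_store` and the state label `WAIT_EMPTY` (standing in for the waiting-to-empty state) are hypothetical names chosen for illustration.

```python
def should_store(state, state_value=None):
    """Decide whether a returned instruction is written to the instruction buffer."""
    if state == "IFETCH":
        return True  # answer to an ordinary fetch request
    if state == "WAIT_EMPTY":
        return False  # stale history responses are dropped
    if state == "REFETCH":
        return state_value == 1  # only state value 1 is cached
    raise ValueError(f"unexpected running state: {state!r}")


decisions = [
    should_store("IFETCH"),
    should_store("WAIT_EMPTY"),
    should_store("REFETCH", state_value=0),
    should_store("REFETCH", state_value=1),
]
```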
It should be noted that, in this embodiment, the instruction transmitting unit uses the second process to read instructions from the instruction buffer at regular intervals. When the type of the instruction read is a non-conditional-branch instruction, the instruction execution unit simply executes the instructions in the instruction buffer in order, so the instruction read is sent directly to the instruction execution unit. When the type of the instruction read is a conditional branch instruction, it is likewise sent to the instruction execution unit, but executing a conditional branch instruction is a judging operation: if the result is no, the instructions in the instruction buffer continue to be executed in order; if the result is yes, execution must jump to the target instruction, which is not necessarily the instruction immediately following the branch. Because the second process of the instruction transmitting unit reads instructions sequentially and sends them to the instruction execution unit, continued reading would inevitably conflict with the execution of the instruction execution unit. The instruction transmitting unit therefore updates its read flag bit to the pause state, and the second process stops reading instructions while the read flag bit is in the pause state.
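The behaviour of the second (reading) process and its read flag bit can be sketched in Python. The class name `Reader`, the method `tick`, and the flag values `"RUNNING"` and `"PAUSED"` are assumptions for illustration; the patent describes hardware, not this code.

```python
from collections import deque


class Reader:
    """Models the second process: timed, in-order reads gated by a read flag."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.read_flag = "RUNNING"

    def tick(self):
        """Perform one timed read; return the forwarded entry or None."""
        if self.read_flag == "PAUSED" or not self.buffer:
            return None
        entry = self.buffer.popleft()
        if entry["type"] == "conditional_branch":
            # Stop reading until the branch outcome is known.
            self.read_flag = "PAUSED"
        return entry


reader = Reader(deque([
    {"pc": 1, "type": "conditional_branch"},
    {"pc": 2, "type": "other"},
]))
first = reader.tick()          # the branch is forwarded, reading pauses
while_paused = reader.tick()   # nothing is read while paused
reader.read_flag = "RUNNING"   # the fetching unit resumes the reader
after_resume = reader.tick()   # in-order reading continues
```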
Step S103, executing the conditional branch instruction by the instruction execution unit to generate an execution result, and sending the execution result to the instruction fetching unit.
In this embodiment, the instruction execution unit receives the instructions sent by the instruction transmitting unit. A non-conditional-branch instruction is executed normally and involves no subsequent jump, so this embodiment focuses on the processing of conditional branch instructions. When the instruction execution unit receives an instruction from the instruction transmitting unit and determines that its type is a conditional branch instruction, it executes the conditional branch instruction and generates an execution result that includes jump information and a target PC pointer, and sends the execution result to the instruction fetching unit.
Step S104, when the instruction fetching unit determines from the jump information that a jump is required, it empties the instruction buffer and sends a re-fetch request generated according to the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC and then sends the target instruction to the instruction transmitting unit, and the instruction transmitting unit sends the target instruction to the instruction buffer.
Specifically, after receiving the jump information, the instruction fetching unit in this embodiment determines whether a jump is required. If no jump is required, the instruction executing unit simply executes the instructions in the instruction buffer in order, so the instruction fetching unit only needs to update the read flag bit of the instruction transmitting unit to the running state, whereupon the instruction transmitting unit resumes the second process and continues to read instructions from the instruction buffer in sequence. If a jump is required, updating the read flag bit to the running state is not sufficient: if the instruction transmitting unit simply continued to read sequentially, the instructions read from the instruction buffer would not match the target PC pointer of the jump. Therefore, when the instruction fetching unit is in the instruction fetching state IFETCH or in the re-fetch state REFETCH, receives the jump information sent by the instruction executing unit, and determines that a jump is required, the instructions in the instruction buffer are flushed.
Once the instruction buffer is determined to be empty, the instruction fetching unit sends to the instruction storage unit a re-fetch request generated from the received target PC pointer, for example target PC=10. The instruction storage unit determines the matching target instruction according to the target PC and sends it to the instruction transmitting unit, which stores it in the instruction buffer through the first process. Because the instruction buffer now contains only instructions that the instruction executing unit is to execute, the instruction transmitting unit obtains the jump target instruction directly when it reads the instruction buffer through the second process, and sends it to the instruction executing unit for execution. Thus, when the instruction executing unit executes the conditional branch instruction at PC=1 and a jump is required, the instruction at the jump target PC=10 is fetched and executed directly. Because only the instructions prefetched for a single branch are discarded from the instruction buffer, and these are few in number, conditional branch instructions can be processed without a deep flush, which reduces the power consumption of branch prediction.
In this embodiment, conditional branch instructions can be processed with simple hardware. Because the depth of the instruction buffer equals the number of instructions needed to hide the pipeline execution cycle of the instruction execution unit, flushing the instruction buffer when a jump is required tolerates one branch misprediction without a deep flush of instructions, which reduces the power consumption of branch prediction.
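The running states this embodiment refers to are tied together by the state machine recited in claim 1. A minimal Python sketch of those transitions follows. The event names (`"start"`, `"jump"`, `"all_returned"`, `"last_instruction"`), the label `WAIT_EMPTY`, and the `history_pending` parameter are hypothetical, and resolving the two jump transitions out of the fetch state by whether history requests are still outstanding is an interpretation of the claim wording, not something the patent states in this form.

```python
def next_state(state, event, history_pending=False):
    """Return the fetch unit's next running state for one event."""
    if state == "IDLE" and event == "start":
        return "IFETCH"
    if state == "IFETCH":
        if event == "jump":
            # Assumed tie-break: outstanding history requests force a drain first.
            return "WAIT_EMPTY" if history_pending else "REFETCH"
        if event == "last_instruction":
            return "IDLE"
    if state == "REFETCH":
        if event == "jump":
            return "WAIT_EMPTY"  # a second jump while re-fetching
        if event == "all_returned":
            return "IFETCH"  # all instructions of the current request returned
    if state == "WAIT_EMPTY" and event == "all_returned":
        return "IFETCH"  # all instructions of the history requests returned
    return state  # any other event leaves the state unchanged


# Walk one path: start, taken branch, re-fetch completes, program ends.
trace = ["IDLE"]
for event in ["start", "jump", "all_returned", "last_instruction"]:
    trace.append(next_state(trace[-1], event))
```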
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (9)
1. A processing system for conditional branch instructions, comprising: an instruction fetching unit, an instruction storage unit, an instruction transmitting unit and an instruction executing unit, wherein the instruction storage unit, the instruction transmitting unit and the instruction executing unit are each connected with the instruction fetching unit, and the instruction transmitting unit is connected with the instruction storage unit and with the instruction executing unit;
the instruction fetching unit is used for sending an instruction fetching request to the instruction storage unit, wherein the instruction fetching request comprises a current program counter PC pointer and running state information;
the instruction transmitting unit is used for transmitting the instruction data fed back by the instruction storage unit based on the instruction fetching request to an instruction buffer area of the instruction fetching unit by adopting a first process, and transmitting the conditional branch instruction read from the instruction buffer area to the instruction executing unit by adopting a second process, wherein the depth of the instruction buffer area is the instruction number required for hiding the pipeline execution period of the instruction executing unit;
the instruction execution unit is used for executing the conditional branch instruction to generate an execution result and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
the instruction fetching unit is used for emptying the instruction buffer area when it is determined from the jump information that a jump is required, and for sending a re-fetch request generated according to the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC and then sends the target instruction to the instruction transmitting unit, which sends the target instruction to the instruction executing unit;
the instruction fetching unit comprises a state machine, wherein the state machine is used for switching the running state information of the instruction fetching unit from an idle state to an instruction fetching state when the instruction fetching unit is determined to start working;
when the instruction fetching unit is determined to receive the jump information needing to jump for the first time, switching the running state information from the instruction fetching state to the instruction re-fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump again in the re-fetching state, switching the running state information from the re-fetching state to the waiting emptying state;
when the instruction fetching unit is determined to be in the re-fetching state and the instruction storage unit returns all instructions requested at this time, switching the running state information from the re-fetching state to the fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump in the instruction fetching state, and the instruction storage unit does not return all instructions of the history request, switching the running state information from the instruction fetching state to a waiting emptying state;
when it is determined that the instruction fetching unit is in the waiting emptying state and the instruction storage unit has returned all instructions of the history request, switching the running state information from the waiting emptying state to the instruction fetching state;
and when the instruction fetching unit is determined to receive the last instruction in the instruction storage unit, switching the running state information from the instruction fetching state to the idle state.
2. The system according to claim 1, wherein the instruction fetch unit is configured to generate the instruction fetch request according to a current PC pointer and the running state information when it is determined that the running state information is a fetch state or a re-fetch state and the instruction buffer is not full;
and sending the instruction fetching request to the instruction storage unit, and self-increasing the current PC pointer of the instruction fetching request according to the specified step length.
3. The system according to claim 2, wherein the instruction storage unit is configured to search locally according to the current PC pointer to obtain an instruction corresponding to the current PC pointer;
and generating the instruction data according to the searched instruction and the running state information, and sending the instruction data to the instruction transmitting unit.
4. The system according to claim 3, wherein the instruction transmitting unit is configured to decode an instruction in the instruction data to obtain a type of the instruction, wherein the type includes a conditional branch instruction or a non-conditional-branch instruction;
and adopting a first process to send the instruction, the running state information and the type to the instruction fetching unit.
5. The system of claim 4, wherein the instruction fetch unit is configured to directly store the instruction and the type into the instruction buffer when the running state information is determined to be an instruction fetch state;
suspending storing the instruction and the type in the instruction buffer when the running state information is determined to be a waiting to empty state;
and when the running state information is determined to be a re-instruction taking state, checking a state value of the re-instruction taking state, judging whether the state value is 1, if so, storing the instruction and the type into the instruction buffer area, and otherwise, stopping storing the instruction and the type into the instruction buffer area.
6. The system of claim 1, wherein the instruction transmitting unit is configured to employ a second process to read instructions from the instruction buffer at regular intervals, and to send the read instruction directly to the instruction execution unit when the type of the instruction read is a non-conditional-branch instruction;
when the type of the read instruction is a conditional branch instruction, the read conditional branch instruction is sent to the instruction execution unit, and meanwhile, the read flag bit of the second process is updated to be in a pause state, wherein the second process suspends the instruction reading work when the read flag bit is in the pause state.
7. The system of claim 1, wherein the instruction fetching unit is configured to empty the instruction buffer when it determines from the jump information that a jump is required and that the instruction fetching unit is in the fetch state or the re-fetch state.
8. The system of claim 6, wherein the instruction fetching unit is further configured to update the read flag bit of the instruction transmitting unit to a running state upon determining that the jump information has been received, wherein the second process starts reading instructions when the read flag bit is in the running state.
9. A method for processing conditional branch instructions, applied to the system as claimed in any one of claims 1 to 8, comprising:
transmitting an instruction fetching request to an instruction storage unit through an instruction fetching unit, wherein the instruction fetching request comprises a current program counter PC pointer and running state information;
the instruction transmitting unit adopts a first process to transmit the instruction data fed back by the instruction storage unit based on the instruction fetching request to an instruction buffer area of the instruction fetching unit, and adopts a second process to transmit the conditional branch instruction read from the instruction buffer area to the instruction executing unit, wherein the depth of the instruction buffer area is the instruction number required for hiding the execution period of an instruction executing unit pipeline;
executing the conditional branch instruction through an instruction execution unit to generate an execution result, and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
when it is determined from the jump information that a jump is required, emptying the instruction buffer area through the instruction fetching unit, and sending a re-fetch request generated according to the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC and then sends the target instruction to the instruction transmitting unit, and the instruction transmitting unit sends the target instruction to the instruction buffer area;
the instruction fetching unit comprises a state machine, wherein the state machine is used for switching the running state information of the instruction fetching unit from an idle state to an instruction fetching state when the instruction fetching unit is determined to start working;
when the instruction fetching unit is determined to receive the jump information needing to jump for the first time, switching the running state information from the instruction fetching state to the instruction re-fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump again in the re-fetching state, switching the running state information from the re-fetching state to the waiting emptying state;
when the instruction fetching unit is determined to be in the re-fetching state and the instruction storage unit returns all instructions requested at this time, switching the running state information from the re-fetching state to the fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump in the instruction fetching state, and the instruction storage unit does not return all instructions of the history request, switching the running state information from the instruction fetching state to a waiting emptying state;
when it is determined that the instruction fetching unit is in the waiting emptying state and the instruction storage unit has returned all instructions of the history request, switching the running state information from the waiting emptying state to the instruction fetching state;
and when the instruction fetching unit is determined to receive the last instruction in the instruction storage unit, switching the running state information from the instruction fetching state to the idle state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310993515.5A CN116719561B (en) | 2023-08-09 | 2023-08-09 | Conditional branch instruction processing system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116719561A (en) | 2023-09-08 |
CN116719561B (en) | 2023-10-31 |
Family
ID=87871963
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310993515.5A Active CN116719561B (en) | 2023-08-09 | 2023-08-09 | Conditional branch instruction processing system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116719561B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5574871A (en) * | 1994-01-04 | 1996-11-12 | Intel Corporation | Method and apparatus for implementing a set-associative branch target buffer |
CN102968293A (en) * | 2012-11-28 | 2013-03-13 | 中国人民解放军国防科学技术大学 | Dynamic detection and execution method of program loop code based on instruction queue |
CN105718241A (en) * | 2016-01-18 | 2016-06-29 | 北京时代民芯科技有限公司 | SPARC V8 system structure based classified type mixed branch prediction system |
CN106293642A (en) * | 2016-08-08 | 2017-01-04 | 合肥工业大学 | A kind of branch process module and branch process mechanism thereof calculating system for coarseness multinuclear |
CN112540792A (en) * | 2019-09-23 | 2021-03-23 | 阿里巴巴集团控股有限公司 | Instruction processing method and device |
CN113076090A (en) * | 2021-04-23 | 2021-07-06 | 中国人民解放军国防科技大学 | Side channel safety protection-oriented loop statement execution method and device |
CN113760366A (en) * | 2021-07-30 | 2021-12-07 | 浪潮电子信息产业股份有限公司 | Method, system and related device for processing conditional jump instruction |
CN114528024A (en) * | 2022-02-21 | 2022-05-24 | 安徽芯纪元科技有限公司 | Instruction fetching assembly line for storage and calculation fusion processor |
CN114579479A (en) * | 2021-11-16 | 2022-06-03 | 中国科学院上海高等研究院 | Low-pollution cache prefetching system and method based on instruction flow mixed mode learning |
CN115454504A (en) * | 2022-09-05 | 2022-12-09 | 山东大学 | Four-emission RISC-V processor micro-architecture and working method thereof |
CN116089346A (en) * | 2023-04-07 | 2023-05-09 | 芯砺智能科技(上海)有限公司 | Method, system, medium and device for retransmitting error data on embedded bus |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9092225B2 (en) * | 2012-01-31 | 2015-07-28 | Freescale Semiconductor, Inc. | Systems and methods for reducing branch misprediction penalty |
US9940262B2 (en) * | 2014-09-19 | 2018-04-10 | Apple Inc. | Immediate branch recode that handles aliasing |
CN112540797A (en) * | 2019-09-23 | 2021-03-23 | 阿里巴巴集团控股有限公司 | Instruction processing apparatus and instruction processing method |
Non-Patent Citations (1)
Title |
---|
Design of the M5-EDGE distributed instruction fetch model; Zhang Chao; Yu Mingyan; Journal of Harbin Institute of Technology (No. 05); pp. 16-21 * |
Also Published As
Publication number | Publication date |
---|---|
CN116719561A (en) | 2023-09-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||