CN116719561B - Conditional branch instruction processing system and method - Google Patents

Publication number
CN116719561B
Authority
CN
China
Prior art keywords
instruction
unit
fetching
state
jump
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310993515.5A
Other languages
Chinese (zh)
Other versions
CN116719561A (en)
Inventor
孙华庆
郑杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinli Intelligent Technology Shanghai Co ltd
Original Assignee
Xinli Intelligent Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinli Intelligent Technology Shanghai Co ltd
Priority to CN202310993515.5A
Publication of CN116719561A
Application granted
Publication of CN116719561B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30069 Instruction skipping instructions, e.g. SKIP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G06F9/3806 Instruction prefetching for branches, e.g. hedging, branch folding using address prediction, e.g. return stack, branch history buffer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3814 Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G06F9/3844 Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a system and a method for processing conditional branch instructions. The system comprises an instruction transmitting unit, an instruction executing unit and an instruction fetching unit. The instruction transmitting unit uses a first process to send the instruction data that the instruction storage unit returns in response to an instruction fetch request to the instruction buffer of the instruction fetching unit, and uses a second process to send the conditional branch instruction read from the instruction buffer to the instruction executing unit. The instruction executing unit executes the conditional branch instruction to generate an execution result. The instruction fetching unit empties the instruction buffer when the jump information indicates that a jump is required, and the target instruction is then sent to the instruction executing unit through the instruction transmitting unit. Conditional branch instructions can thus be processed with simple hardware. Because the depth of the instruction buffer equals the number of instructions needed to hide the pipeline execution period of the instruction executing unit, emptying the instruction buffer when a jump is required tolerates one branch misprediction without a deep flush of instructions, which reduces the power consumption of branch prediction.

Description

Conditional branch instruction processing system and method
Technical Field
The present invention relates to the field of processor technologies, and in particular, to a system and a method for processing a conditional branch instruction.
Background
Conditional branch instructions are common in the data pipelines of neural network processors (Neural Network Processing Unit, NPU). The instruction to be executed after a conditional branch instruction has a control dependency on the execution result of that branch. While the conditional branch instruction executes, the pipeline stalls until it is known whether the branch jumps, and only then is the program counter (Program Counter, PC) of the next instruction decided. If the branch does not jump, execution continues with the next sequential instruction; if it jumps, execution continues at the target PC.
Many existing processors use branch prediction to improve instruction execution efficiency. However, branch prediction requires a dedicated hardware unit that records the addresses of branch instructions and the history of their jumps, and combines the two to predict the PC of the next instruction. The hardware implementation of a branch prediction unit is therefore relatively complex, and its power consumption is relatively high.
Disclosure of Invention
The present invention provides a conditional branch instruction processing system and method that process conditional branch instructions with a simpler system, tolerate one branch misprediction without a deep flush of instructions, and reduce the power consumption of branch prediction.
According to a first aspect of the present invention there is provided a processing system for conditional branch instructions, comprising an instruction fetching unit, an instruction storage unit, an instruction transmitting unit and an instruction executing unit, wherein the instruction storage unit, the instruction transmitting unit and the instruction executing unit are each connected with the instruction fetching unit, and the instruction transmitting unit is connected with both the instruction storage unit and the instruction executing unit;
the instruction fetching unit is used for sending an instruction fetching request to the instruction storage unit, wherein the instruction fetching request comprises a current program counter PC pointer and running state information;
the instruction transmitting unit is used for transmitting the instruction data fed back by the instruction storage unit based on the instruction fetching request to an instruction buffer area of the instruction fetching unit by adopting a first process, and transmitting the conditional branch instruction read from the instruction buffer area to the instruction executing unit by adopting a second process, wherein the depth of the instruction buffer area is the instruction number required for hiding the pipeline execution period of the instruction executing unit;
the instruction execution unit is used for executing the conditional branch instruction to generate an execution result and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
the instruction fetching unit is used for emptying the instruction buffer area when the jump information is determined to be needed to jump, sending a re-fetching instruction request generated according to the target PC pointer to the instruction storage unit, enabling the instruction storage unit to determine a target instruction according to the target PC, then sending the target instruction to the instruction transmitting unit, and sending the target instruction to the instruction executing unit through the instruction transmitting unit.
According to another aspect of the present invention, there is provided a method of processing a conditional branch instruction, comprising: transmitting an instruction fetching request to an instruction storage unit through an instruction fetching unit, wherein the instruction fetching request comprises a current program counter PC pointer and running state information;
the instruction transmitting unit adopts a first process to transmit the instruction data fed back by the instruction storage unit based on the instruction fetching request to an instruction buffer area of the instruction fetching unit, and adopts a second process to transmit the conditional branch instruction read from the instruction buffer area to the instruction executing unit, wherein the depth of the instruction buffer area is the instruction number required for hiding the execution period of an instruction executing unit pipeline;
executing the conditional branch instruction through an instruction execution unit to generate an execution result, and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
when the jump information is determined to be needed to jump, the instruction buffer area is emptied through the instruction fetching unit, and a re-fetching instruction request generated according to the target PC pointer is sent to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC and then sends the target instruction to the instruction transmitting unit, and the target instruction is sent to the instruction buffer area through the instruction transmitting unit.
With the technical solution of the embodiments of the invention, conditional branch instructions can be processed with simple hardware. Because the depth of the instruction buffer equals the number of instructions needed to hide the pipeline execution period of the instruction executing unit, emptying the instruction buffer when a jump is determined to be required tolerates one branch misprediction without a deep flush of instructions, which reduces the power consumption of branch prediction.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a conditional branch instruction processing system according to a first embodiment of the present invention;
FIG. 2 is a state transition diagram of an instruction fetch unit state machine according to a first embodiment of the present invention;
FIG. 3 is a flowchart of a processing method of a conditional branch instruction according to a second embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Example 1
FIG. 1 is a schematic structural diagram of a conditional branch instruction processing system according to the first embodiment of the present invention. As shown in FIG. 1, the system includes an instruction fetching unit, an instruction storage unit, an instruction transmitting unit and an instruction executing unit, where the instruction storage unit, the instruction transmitting unit and the instruction executing unit are each connected to the instruction fetching unit, and the instruction transmitting unit is connected to both the instruction storage unit and the instruction executing unit.
The instruction fetching unit is used for sending an instruction fetching request to the instruction storage unit, wherein the instruction fetching request comprises a current program counter PC pointer and running state information;
the instruction transmitting unit is used for transmitting the instruction data fed back by the instruction storage unit based on the instruction fetching request to the instruction buffer zone of the instruction fetching unit by adopting the first process, and transmitting the conditional branch instruction read from the instruction buffer zone to the instruction executing unit by adopting the second process, wherein the depth of the instruction buffer zone is the instruction quantity required for hiding the pipeline execution period of the instruction executing unit;
the instruction execution unit is used for executing the conditional branch instruction to generate an execution result and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
and the instruction fetching unit is used for emptying the instruction buffer area when the jump information is determined to be required to jump, and sending a re-fetching instruction request generated according to the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC, and then sends the target instruction to the instruction transmitting unit, and the instruction transmitting unit sends the target instruction to the instruction executing unit.
Specifically, the instruction fetching unit of this embodiment further includes a state machine, whose state transition diagram is shown in FIG. 2. The running state information of the instruction fetching unit takes one of four states: the idle state IDLE, the instruction fetching state IFETCH, the re-fetching state REFETCH, and the waiting-for-drain state WAIT_DRAIN. The state machine switches the running state information as follows: when the instruction fetching unit starts to operate, from the idle state to the instruction fetching state; when the instruction fetching unit receives, for the first time, jump information indicating that a jump is required, from the instruction fetching state to the re-fetching state; when the instruction fetching unit, while in the re-fetching state, again receives jump information indicating that a jump is required, from the re-fetching state to the waiting-for-drain state.
When the instruction fetching unit is in the re-fetching state and the instruction storage unit has returned all instructions of the request, namely the instructions whose REFETCH state value is 0, the running state information switches from the re-fetching state to the instruction fetching state. When the instruction fetching unit, while in the instruction fetching state, receives jump information indicating that a jump is required and the instruction storage unit has not yet returned all instructions of the earlier requests, namely the instructions whose REFETCH state value is 1, the running state information switches from the instruction fetching state to the waiting-for-drain state. When the instruction fetching unit is in the waiting-for-drain state and the instruction storage unit has returned all instructions of the earlier requests, the running state information switches from the waiting-for-drain state to the instruction fetching state. When the instruction fetching unit receives the last instruction in the instruction storage unit, the running state information switches from the instruction fetching state to the idle state.
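As a non-limiting illustration, the transitions of FIG. 2 described above may be sketched as a pure function. The names `FetchState` and `next_state` and the flags `start`, `jump`, `outstanding` and `last` are assumptions introduced for this sketch; in particular, a single `outstanding` flag stands in for the bookkeeping that the embodiment performs with the REFETCH state value, and giving a jump priority over the other conditions is likewise an assumption.

```python
from enum import Enum, auto


class FetchState(Enum):
    IDLE = auto()
    IFETCH = auto()
    REFETCH = auto()
    WAIT_DRAIN = auto()


def next_state(state, *, start=False, jump=False, outstanding=False, last=False):
    """Return the next running state of the instruction fetching unit.

    start:       the unit begins to operate
    jump:        jump information indicating a required jump was received
    outstanding: the instruction storage unit still owes instructions
                 from earlier requests (assumed single flag)
    last:        the last instruction in the instruction storage unit arrived
    """
    if state is FetchState.IDLE:
        return FetchState.IFETCH if start else state
    if state is FetchState.IFETCH:
        if jump:
            # Earlier answers still in flight: wait for them to drain first.
            return FetchState.WAIT_DRAIN if outstanding else FetchState.REFETCH
        return FetchState.IDLE if last else state
    if state is FetchState.REFETCH:
        if jump:  # a second jump while re-fetching
            return FetchState.WAIT_DRAIN
        return state if outstanding else FetchState.IFETCH
    # WAIT_DRAIN: leave only once every earlier request has been answered.
    return state if outstanding else FetchState.IFETCH
```

For example, `next_state(FetchState.REFETCH, jump=True)` yields `FetchState.WAIT_DRAIN`, matching the case in which a second jump arrives during re-fetching.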
Optionally, the instruction fetching unit is configured to generate an instruction fetch request from the current PC pointer and the running state information when the running state information is the instruction fetching state or the re-fetching state and the instruction buffer is not full, to send the instruction fetch request to the instruction storage unit, and to increment the current PC pointer by a specified step.
Specifically, in this embodiment, when the instruction fetching unit determines through the state machine that its running state information is the instruction fetching state IFETCH or the re-fetching state REFETCH, and the instruction buffer is not full, it sends an instruction fetch request to the instruction storage unit. The request includes the current PC pointer, for example PC=1, and the running state information, for example the instruction fetching state IFETCH. After sending the request, the instruction fetching unit also increments its current PC pointer by a specified step, for example a step of 1, so that the PC in the instruction fetching unit is updated to 2. This is only an example, and this embodiment does not limit how the PC pointer is incremented.
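The request generation just described may be sketched as follows, purely for illustration. `FetchRequest`, `FetchUnit`, `try_fetch` and the default buffer depth of 4 are hypothetical names and values chosen for this sketch; the embodiment only requires that a request carry the current PC pointer and the running state information, and that the PC pointer advance by the specified step after the request is sent.

```python
from dataclasses import dataclass, field


@dataclass
class FetchRequest:
    pc: int      # current PC pointer
    state: str   # running state information, e.g. "IFETCH"


@dataclass
class FetchUnit:
    pc: int = 1
    step: int = 1
    depth: int = 4            # assumed instruction buffer depth
    state: str = "IFETCH"
    buffer: list = field(default_factory=list)

    def try_fetch(self):
        """Return a fetch request, or None when no request may be sent."""
        if self.state not in ("IFETCH", "REFETCH"):
            return None
        if len(self.buffer) >= self.depth:   # instruction buffer is full
            return None
        request = FetchRequest(self.pc, self.state)
        self.pc += self.step                 # self-increment after sending
        return request
```

With the defaults above, the first call returns a request for PC=1 in state IFETCH and leaves the unit's PC at 2, as in the example of this embodiment.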
Optionally, the instruction storage unit is used for searching locally according to the current PC pointer to obtain an instruction corresponding to the current PC pointer; and generating instruction data according to the searched instruction and running state information, and sending the instruction data to an instruction transmitting unit.
In this embodiment, the instructions to be processed are stored in the instruction storage unit in advance, and a mapping between PC pointers and instructions is established there. When the instruction storage unit receives an instruction fetch request from the instruction fetching unit, it extracts the current PC pointer from the request, for example PC=1, and looks up the instruction corresponding to PC=1 in the pre-established mapping. It then combines the instruction with the received running state information into instruction data and sends the instruction data to the instruction transmitting unit. After receiving the instruction data, the instruction transmitting unit decodes the instruction it contains to obtain the instruction type. Decoding is essentially a process of identifying the instruction; for example, the type is either a conditional branch instruction or a non-conditional-branch instruction, and this embodiment does not limit the specific instruction types. The instruction transmitting unit runs two processes that do not interfere with each other: a first process for transmitting instructions and a second process for reading instructions. After decoding the instruction, the instruction transmitting unit therefore uses the first process to send the instruction, the running state information and the type to the instruction fetching unit.
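By way of example only, the lookup in the instruction storage unit and the decoding in the instruction transmitting unit may be sketched as below. The instruction memory contents, the mnemonics `beq` and `bne`, and the function names `lookup` and `decode` are hypothetical; the embodiment does not specify an instruction encoding.

```python
# Hypothetical instruction memory: PC pointer -> instruction text.
INSTRUCTION_MEMORY = {1: "beq r1 r2 10", 2: "add r3 r1 r2", 10: "mul r4 r3 r3"}

# Assumed set of opcodes that count as conditional branches.
CONDITIONAL_OPCODES = {"beq", "bne"}


def lookup(request):
    """Instruction storage unit: build instruction data from a (pc, state) request."""
    pc, state = request
    return {"instruction": INSTRUCTION_MEMORY[pc], "state": state}


def decode(instruction_data):
    """Instruction transmitting unit: identify the instruction type."""
    opcode = instruction_data["instruction"].split()[0]
    kind = "conditional_branch" if opcode in CONDITIONAL_OPCODES else "other"
    # The instruction, the running state information and the type travel together.
    return {**instruction_data, "type": kind}
```

Here `decode(lookup((1, "IFETCH")))` carries the instruction, the running state information and the type together, which is what the first process forwards to the instruction fetching unit.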
Optionally, the instruction fetching unit is configured to directly store the instruction and the type into the instruction buffer when determining that the running state information is an instruction fetching state; when the running state information is determined to be in a waiting and draining state, suspending storing the instruction and the type into an instruction buffer; when the running state information is determined to be the re-instruction fetching state, checking the state value of the re-instruction fetching state, judging whether the state value is 1, if so, storing the instruction and the type into the instruction buffer, otherwise, stopping storing the instruction and the type into the instruction buffer.
Specifically, after receiving an instruction from the instruction transmitting unit, the instruction fetching unit in this embodiment does not necessarily store it in the instruction buffer; whether to store it depends on the running state information. When the running state information is the instruction fetching state IFETCH, the instruction answers an ordinary instruction fetch request sent by the instruction fetching unit, so it is received and stored in the instruction buffer directly. When the running state information is the waiting-for-drain state WAIT_DRAIN, the instruction buffer must be completely emptied once all instructions corresponding to the earlier requests have been received, so the received instructions are not cached in the instruction buffer. When the running state information is the re-fetching state REFETCH, the state value of the re-fetching state is checked: when the state value is 0, the received instruction is not stored in the instruction buffer, and when the state value is 1, the instruction is cached in the instruction buffer. In this embodiment, the depth of the instruction buffer of the instruction fetching unit is set in advance to the number of instructions required to hide the pipeline execution period of the instruction executing unit. As a result, one branch misprediction can be tolerated when the buffer is emptied, no deep flush of instructions is required, and the number of instructions stored in the instruction buffer is limited.
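The storing rule above reduces to a small predicate, sketched here for illustration. The function name `should_store` and the string encoding of the states are assumptions; `tag` denotes the state value carried with the returned instruction, 0 for a request issued in IFETCH and 1 for a request issued in REFETCH.

```python
def should_store(unit_state, tag):
    """Decide whether a returned instruction enters the instruction buffer.

    unit_state: running state of the instruction fetching unit
    tag:        state value carried by the returned instruction
                (0 = requested in IFETCH, 1 = requested in REFETCH)
    """
    if unit_state == "IFETCH":
        return True            # ordinary request: store directly
    if unit_state == "WAIT_DRAIN":
        return False           # buffer is being drained: do not store
    if unit_state == "REFETCH":
        return tag == 1        # keep only answers to re-fetch requests
    return False               # IDLE: nothing is expected
```

In the REFETCH state this drops stale instructions that were requested before the jump and keeps those requested from the target PC.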
It should be noted that, in this embodiment, the instruction transmitting unit uses the second process to read instructions from the instruction buffer at regular intervals. When the type of the read instruction is a non-conditional-branch instruction, the instruction executing unit simply executes the instructions in the instruction buffer in order, so the read instruction is sent directly to the instruction executing unit. When the type of the read instruction is a conditional branch instruction, it is likewise sent to the instruction executing unit, but executing a conditional branch instruction involves a judgment. If the result is no, the instruction executing unit only needs to continue executing in order; if the result is yes, execution must jump to the target instruction, which is not necessarily the instruction adjacent to the branch. The second process of the instruction transmitting unit, however, reads the next adjacent instruction in order and sends it to the instruction executing unit, which would inevitably cause the reads of the instruction transmitting unit to conflict with the execution of the instruction executing unit. The instruction transmitting unit therefore updates its own read flag bit to the paused state, and the second process suspends instruction reading while the read flag bit is in the paused state.
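The pausing behaviour of the second process may be sketched as follows, as an illustration under stated assumptions. The class name `Reader`, the attribute `paused` (standing for the read flag bit) and the `(instruction, type)` tuples are hypothetical; the sketch also assumes the flag is set to the paused state immediately after a conditional branch instruction has been issued.

```python
from collections import deque


class Reader:
    """Second process of the instruction transmitting unit (sketch)."""

    def __init__(self, entries):
        self.buffer = deque(entries)   # (instruction, type) pairs, in order
        self.paused = False            # read flag bit: False = running

    def read_next(self):
        """Return the next instruction to issue, or None if nothing is issued."""
        if self.paused or not self.buffer:
            return None
        instruction, kind = self.buffer.popleft()
        if kind == "conditional_branch":
            # Stop reading until the branch outcome is known.
            self.paused = True
        return instruction

    def resume(self):
        """Set the read flag bit back to the running state."""
        self.paused = False
```

After a conditional branch has been issued, further calls to `read_next` return `None` until `resume` is called, which corresponds to the read flag bit being set back to the running state when jump information arrives.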
In this embodiment, the instruction executing unit receives the instructions sent by the instruction transmitting unit. It executes non-conditional-branch instructions normally, and no instruction jump follows them, so the focus of this embodiment is the processing of conditional branch instructions. When the instruction executing unit receives an instruction from the instruction transmitting unit and determines that its type is a conditional branch instruction, it executes the conditional branch instruction, generates an execution result that includes jump information and a target PC pointer, and sends the execution result to the instruction fetching unit.
Optionally, the instruction fetching unit is configured to empty the instruction buffer when it determines that the jump information indicates that a jump is required and that it is in the instruction fetching state or the re-fetching state.
Optionally, the instruction fetching unit is further configured to update a read flag bit of the instruction transmitting unit to an operating state when it is determined that the jump information is received, where the second process starts the instruction reading operation when the read flag bit is in the operating state.
Specifically, after receiving the jump information, the instruction fetching unit in this embodiment determines whether the branch jumps. If it does not, the instruction executing unit only needs to execute the instructions in the instruction buffer in order, so it suffices to update the read flag bit of the instruction transmitting unit to the running state, whereupon the instruction transmitting unit restarts the second process and continues to read instructions from the instruction buffer in order. If the branch does jump, updating the read flag bit to the running state is not enough: if the instruction transmitting unit simply continued to read sequentially, the instructions read from the instruction buffer would not match the target PC pointer of the jump. Therefore, when the instruction fetching unit is in the instruction fetching state IFETCH or in the re-fetching state REFETCH and receives jump information from the instruction executing unit indicating that a jump is required, the instructions in the instruction buffer must be emptied.
Once the instruction buffer has been emptied, the instruction fetching unit sends the instruction storage unit a re-fetch request generated from the received target PC pointer, for example target PC=10. The instruction storage unit determines the matching target instruction from the target PC and sends it to the instruction transmitting unit, which stores it in the instruction buffer through the first process. Because the instruction buffer now contains only instructions that the instruction executing unit is to execute, the instruction transmitting unit obtains the jump target directly when it reads the instruction buffer through the second process, and sends it to the instruction executing unit for execution. Thus, when the instruction executing unit executes the conditional branch instruction at PC=1 and a jump is required, the instruction at PC=10 is fetched and executed directly. Because only the instructions prefetched for one branch are discarded from the instruction buffer, and they are few in number, conditional branch instructions can be processed without a deep flush, which reduces the power consumption of branch prediction.
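The handling of an execution result may be sketched as below, for illustration only. The function name `handle_result`, the dictionary form of the re-fetch request and its `"REFETCH"` label are assumptions; the count of discarded instructions is returned only to show that the cost of a taken branch is bounded by the buffer depth.

```python
def handle_result(buffer, jump, target_pc):
    """Instruction fetching unit reacting to an execution result (sketch).

    buffer:    instruction buffer, modified in place
    jump:      True when the jump information says a jump is required
    target_pc: target PC pointer from the execution result
    Returns (re-fetch request or None, number of discarded instructions).
    """
    if not jump:
        # No jump: the buffered sequential instructions remain valid.
        return None, 0
    discarded = len(buffer)
    buffer.clear()  # empty only the shallow instruction buffer
    request = {"pc": target_pc, "state": "REFETCH"}
    return request, discarded
```

With a buffer holding the instructions at PC 2, 3 and 4 and an execution result of jump with target PC=10, the call empties the buffer, reports three discarded instructions, and yields a re-fetch request for PC=10.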
In this embodiment, when processing a conditional branch instruction, it can be implemented in simple hardware, and since the depth of the instruction buffer is the number of instructions needed to conceal the pipeline execution cycle of the instruction execution unit, 1 branch prediction error is allowed by flushing the instruction buffer when determining that a jump is required, no deep flush instruction is needed, and the power consumption of branch prediction is reduced.
Example two
Fig. 3 is a flowchart of a method for processing a conditional branch instruction according to an embodiment of the present invention. This embodiment is applicable to the case where a conditional branch instruction is processed, and the method may be performed by the conditional branch instruction processing system of the above embodiment. As shown in Fig. 3, the method includes:
Step S101, the instruction fetching unit sends an instruction fetch request to the instruction storage unit.
Specifically, in this embodiment, when the instruction fetching unit determines that its own running state information is the fetch state IFETCH or the re-fetch state REFETCH (the state value is 0 in the fetch state IFETCH and 1 in the re-fetch state REFETCH) and that the instruction buffer is not full, it sends an instruction fetch request to the instruction storage unit. The instruction fetch request contains the current PC pointer, for example PC=1, and the running state information, for example the fetch state IFETCH. After sending the instruction fetch request, the instruction fetching unit also increments its current PC pointer by a specified step, for example a step of 1, so that the PC in the instruction fetching unit is updated to 2. This is only an illustration; this embodiment does not limit the way in which the PC pointer is incremented.
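The request generation of step S101 can be sketched in software as follows. This is only an illustrative model, not the hardware of the embodiment: the names `FetchRequest`, `FetchUnit`, `make_request` and the default buffer depth of 4 are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional

IFETCH, REFETCH = "IFETCH", "REFETCH"
# State values as stated in the text: 0 for IFETCH, 1 for REFETCH.
STATE_VALUE = {IFETCH: 0, REFETCH: 1}


@dataclass
class FetchRequest:
    pc: int           # current PC pointer carried by the request
    state: str        # running state information
    state_value: int  # 0 (IFETCH) or 1 (REFETCH)


class FetchUnit:
    """Hypothetical software model of the instruction fetching unit."""

    def __init__(self, pc=1, state=IFETCH, buffer_depth=4, step=1):
        self.pc = pc
        self.state = state
        self.buffer = []                  # instruction buffer
        self.buffer_depth = buffer_depth  # assumed depth, for illustration
        self.step = step                  # specified PC step

    def make_request(self) -> Optional[FetchRequest]:
        # A request is only sent in IFETCH or REFETCH ...
        if self.state not in STATE_VALUE:
            return None
        # ... and only while the instruction buffer is not full.
        if len(self.buffer) >= self.buffer_depth:
            return None
        request = FetchRequest(self.pc, self.state, STATE_VALUE[self.state])
        self.pc += self.step  # increment the PC after sending the request
        return request
```

With `pc=1` and a step of 1, the first request carries PC=1 and leaves the unit's PC at 2, matching the example in the text.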
Step S102, the instruction transmitting unit adopts a first process to transmit the instruction data fed back by the instruction storage unit based on the instruction fetching request to the instruction buffer area of the instruction fetching unit, and adopts a second process to transmit the conditional branch instruction read from the instruction buffer area to the instruction executing unit.
All instructions to be processed are stored in the instruction storage unit in advance, and a mapping between PC pointers and instructions is established there. When the instruction storage unit receives an instruction fetch request from the instruction fetching unit, it extracts the current PC pointer from the request, for example PC=1, and looks up the instruction corresponding to PC=1 according to the pre-established mapping. Having found the instruction, the instruction storage unit combines it with the received running state information into instruction data, and sends the instruction data, which contains the instruction and the running state information, to the instruction transmitting unit. On receiving the instruction data, the instruction transmitting unit decodes the instruction it contains to obtain the type of the instruction. Decoding is essentially a process of identifying the instruction; the type is, for example, a conditional branch instruction or a non-conditional-branch instruction, and this embodiment does not limit the specific types. In this embodiment, the instruction transmitting unit has two processes that do not interfere with each other: a first process for sending instructions and a second process for reading instructions. After decoding the instruction, the instruction transmitting unit therefore uses the first process to send the instruction, the running state information, and the type to the instruction fetching unit.
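The lookup and decode path described above can be modelled roughly as follows. The textual instruction format and the opcode set `CONDITIONAL_OPCODES` are hypothetical and exist only to make the sketch runnable; the patent does not specify an instruction encoding.

```python
from dataclasses import dataclass


@dataclass
class InstructionData:
    instruction: str  # instruction found for the requested PC
    state: str        # running state information echoed from the request


class InstructionStore:
    """Hypothetical model of the instruction storage unit."""

    def __init__(self, program):
        self.program = dict(program)  # pre-established PC -> instruction map

    def lookup(self, pc, state):
        # Combine the instruction with the received running state information.
        return InstructionData(self.program[pc], state)


# Assumed opcodes, used only to illustrate type identification.
CONDITIONAL_OPCODES = {"beq", "bne", "blt", "bge"}


def decode_type(instruction):
    """Identify the instruction type from its (assumed) leading opcode."""
    opcode = instruction.split()[0]
    if opcode in CONDITIONAL_OPCODES:
        return "conditional_branch"
    return "non_conditional_branch"


def first_process(store, pc, state):
    """Return what the first process forwards to the instruction fetching unit."""
    data = store.lookup(pc, state)
    return data.instruction, data.state, decode_type(data.instruction)
```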
Specifically, the instruction fetching unit in this embodiment does not necessarily store an instruction in the instruction buffer after receiving it from the instruction transmitting unit; whether to store it is decided from the running state information. When the running state information is the fetch state IFETCH, the instruction answers an ordinary fetch request sent by the instruction fetching unit, and it is received and stored in the instruction buffer directly. When the running state information is the waiting-to-empty state, the instruction fetching unit is waiting until all instructions corresponding to earlier (history) requests have been received so that they can be discarded completely, so the acquired instruction is not cached in the instruction buffer. When the running state information is the re-fetch state REFETCH, the state value is checked: if the state value is 0, the received instruction is not stored in the instruction buffer; if it is 1, the instruction is cached in the instruction buffer. In this embodiment, the depth of the instruction buffer of the instruction fetching unit is set in advance to the number of instructions required to hide the pipeline execution cycle of the instruction executing unit. One branch prediction error can then be tolerated when the buffer is emptied, no deep instruction flush is needed, and the number of instructions stored in the instruction buffer is therefore limited.
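The storage decision above amounts to a small predicate. A minimal sketch follows, assuming the states are represented as strings and using `WAIT_EMPTY` as a hypothetical identifier for the waiting-to-empty state; treating any other state as "do not store" is also an assumption.

```python
from collections import deque


def should_store(running_state: str, state_value: int) -> bool:
    """Decide whether a received instruction is cached in the instruction buffer."""
    if running_state == "IFETCH":
        return True              # ordinary fetch request: store directly
    if running_state == "WAIT_EMPTY":
        return False             # history requests are being discarded
    if running_state == "REFETCH":
        return state_value == 1  # store only when the state value is 1
    return False                 # assumption: nothing is stored in other states


def receive(buffer: deque, instruction: str, running_state: str, state_value: int) -> bool:
    """Append the instruction to the buffer if the decision above allows it."""
    if should_store(running_state, state_value):
        buffer.append(instruction)
        return True
    return False
```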
It should be noted that, in this embodiment, the instruction transmitting unit uses the second process to read instructions from the instruction buffer at regular intervals. When the type of the read instruction is a non-conditional-branch instruction, the instruction executing unit simply executes the instructions in the instruction buffer in order, so the read instruction is sent directly to the instruction executing unit. When the type of the read instruction is a conditional branch instruction, the instruction is likewise sent to the instruction executing unit, but executing it involves a decision: if the result is no, the instructions in the instruction buffer continue to be executed in order; if the result is yes, execution must jump to a target instruction, which is not necessarily the instruction immediately following the branch. Because the second process of the instruction transmitting unit reads sequentially and sends what it reads to the instruction executing unit, continued reading would inevitably conflict with what the instruction executing unit must execute. The instruction transmitting unit therefore updates its read flag bit to the pause state, and the second process stops reading instructions while the read flag bit is in the pause state.
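The behaviour of the second process and its read flag bit can be illustrated with the sketch below. The class `Transmitter`, the method names, and the tick-based model of "reading at regular intervals" are assumptions made for the example.

```python
from collections import deque

RUNNING, PAUSED = "running", "paused"


class Transmitter:
    """Hypothetical model of the second (read) process of the transmitting unit."""

    def __init__(self, buffer):
        self.buffer = buffer      # deque of (instruction, type) pairs
        self.read_flag = RUNNING  # read flag bit
        self.sent = []            # instructions forwarded to the executing unit

    def second_process_tick(self):
        """One periodic read; returns the forwarded instruction or None."""
        if self.read_flag == PAUSED or not self.buffer:
            return None
        instruction, kind = self.buffer.popleft()
        self.sent.append(instruction)
        if kind == "conditional_branch":
            # Stop sequential reading until the branch outcome is known.
            self.read_flag = PAUSED
        return instruction

    def resume(self):
        """Called when the fetching unit sets the flag back to the running state."""
        self.read_flag = RUNNING
```

Reading stalls immediately after a conditional branch is forwarded and resumes only after `resume()` is called, which corresponds to the read flag bit being set back to the running state.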
Step S103, executing the conditional branch instruction by the instruction execution unit to generate an execution result, and sending the execution result to the instruction fetching unit.
In this embodiment, the instruction executing unit receives the instructions sent by the instruction transmitting unit. It executes non-conditional-branch instructions normally, with no subsequent instruction jump, so this embodiment focuses on the processing of conditional branch instructions. When the instruction executing unit receives an instruction from the instruction transmitting unit and determines that its type is a conditional branch instruction, it executes the conditional branch instruction, generates an execution result that contains the jump information and the target PC pointer, and sends the execution result to the instruction fetching unit.
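As a rough illustration of the execution result, the sketch below evaluates a branch-if-equal condition. The comparison used and the names `ExecutionResult` and `execute_conditional_branch` are assumptions; the text only states that the result contains jump information and a target PC pointer.

```python
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    jump: bool      # jump information: True if a jump is required
    target_pc: int  # target PC pointer


def execute_conditional_branch(lhs: int, rhs: int, target_pc: int) -> ExecutionResult:
    """Evaluate an assumed branch-if-equal condition and package the result."""
    return ExecutionResult(jump=(lhs == rhs), target_pc=target_pc)
```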
Step S104, when the instruction fetching unit determines from the jump information that a jump is required, it empties the instruction buffer and sends a re-fetch request generated according to the target PC pointer to the instruction storage unit, so that the instruction storage unit determines the target instruction according to the target PC and sends it to the instruction transmitting unit, and the instruction transmitting unit sends the target instruction to the instruction buffer.
Specifically, after receiving the jump information, the instruction fetching unit in this embodiment determines whether a jump is required. If no jump is required, the instruction executing unit only needs to execute the instructions in the instruction buffer in order, so the instruction fetching unit only needs to update the read flag bit of the instruction transmitting unit to the running state, whereupon the instruction transmitting unit restarts the second process and continues to read instructions from the instruction buffer in sequence. If a jump is required, updating the read flag bit to the running state is not sufficient: if the instruction transmitting unit simply continued to read sequentially, the instructions read from the instruction buffer would not match the target PC pointer of the jump. Therefore, when the instruction fetching unit is in the fetch state IFETCH or in the re-fetch state REFETCH, receives the jump information sent by the instruction executing unit, and determines that a jump is required, the instructions in the instruction buffer must be emptied.
Once the instruction buffer is determined to be empty, the instruction fetching unit sends the instruction storage unit a re-fetch request generated from the received target PC pointer, for example target PC=10. The instruction storage unit determines the matching target instruction according to the target PC and sends it to the instruction transmitting unit, which stores the acquired instruction into the instruction buffer through the first process. Because the instruction buffer now contains only instructions that the instruction executing unit is to execute, the instruction transmitting unit obtains the jump target instruction directly when it reads the instruction buffer through the second process, and sends it to the instruction executing unit for execution. Thus, when the instruction executing unit executes the conditional branch instruction corresponding to PC=1 and a jump is required, the instruction corresponding to PC=10 is acquired and executed directly. Because only the instructions prefetched for one branch are deleted from the instruction buffer, and these are few in number, conditional branch instructions can be processed without a deep flush, which reduces the power consumption of branch prediction.
In this embodiment, conditional branch instructions can be processed with simple hardware. Because the depth of the instruction buffer is the number of instructions needed to hide the pipeline execution cycle of the instruction executing unit, emptying the instruction buffer when a jump is determined to be required tolerates one branch prediction error without a deep instruction flush, which reduces the power consumption of branch prediction.
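Step S104 can be sketched end to end as follows. This simplified model collapses the storage and transmitting units into a dictionary lookup and omits the state machine; the class `FetchUnitSketch` and its method names are assumptions for illustration only.

```python
from collections import deque


class FetchUnitSketch:
    """Hypothetical model of buffer emptying and re-fetch on a taken branch."""

    def __init__(self, program):
        self.program = program     # PC -> instruction (stands in for the storage unit)
        self.buffer = deque()      # instruction buffer
        self.pc = 1                # current PC pointer
        self.read_flag = "paused"  # paused while a conditional branch executes

    def on_execution_result(self, jump, target_pc):
        # In both cases the second process is allowed to read again.
        self.read_flag = "running"
        if not jump:
            return None            # keep the prefetched sequential instructions
        self.buffer.clear()        # empty the instruction buffer
        self.pc = target_pc        # re-fetch starts at the target PC pointer
        return self.refetch()

    def refetch(self):
        # Storage lookup and first-process store, collapsed into one step.
        instruction = self.program[self.pc]
        self.buffer.append(instruction)
        self.pc += 1               # assumed step of 1
        return instruction
```

For a branch at PC=1 that jumps to PC=10, the prefetched instructions for PC=2 onward are discarded and the buffer afterwards holds only the instruction at PC=10.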
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A processing system for conditional branch instructions, comprising: an instruction fetching unit, an instruction storage unit, an instruction transmitting unit and an instruction executing unit, wherein the instruction storage unit, the instruction transmitting unit and the instruction executing unit are each connected with the instruction fetching unit, and the instruction transmitting unit is connected with the instruction storage unit and the instruction executing unit respectively;
the instruction fetching unit is used for sending an instruction fetching request to the instruction storage unit, wherein the instruction fetching request comprises a current program counter PC pointer and running state information;
the instruction transmitting unit is used for transmitting the instruction data fed back by the instruction storage unit based on the instruction fetching request to an instruction buffer area of the instruction fetching unit by adopting a first process, and transmitting the conditional branch instruction read from the instruction buffer area to the instruction executing unit by adopting a second process, wherein the depth of the instruction buffer area is the instruction number required for hiding the pipeline execution period of the instruction executing unit;
the instruction execution unit is used for executing the conditional branch instruction to generate an execution result and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
the instruction fetching unit is further used for emptying the instruction buffer area when the jump information indicates that a jump is required, and sending a re-fetching instruction request generated according to the target PC pointer to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC and then sends the target instruction to the instruction transmitting unit, and the target instruction is sent to the instruction executing unit through the instruction transmitting unit;
the instruction fetching unit comprises a state machine, wherein the state machine is used for switching the running state information of the instruction fetching unit from an idle state to an instruction fetching state when the instruction fetching unit is determined to start working;
when the instruction fetching unit is determined to receive the jump information needing to jump for the first time, switching the running state information from the instruction fetching state to the instruction re-fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump again in the re-fetching state, switching the running state information from the re-fetching state to the waiting-to-empty state;
when the instruction fetching unit is determined to be in the re-fetching state and the instruction storage unit returns all instructions requested at this time, switching the running state information from the re-fetching state to the fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump in the instruction fetching state, and the instruction storage unit has not returned all instructions of the history requests, switching the running state information from the instruction fetching state to the waiting-to-empty state;
when the instruction fetching unit is determined to be in the waiting-to-empty state and the instruction storage unit has returned all instructions of the history requests, switching the running state information from the waiting-to-empty state to the instruction fetching state;
and when the instruction fetching unit is determined to receive the last instruction in the instruction storage unit, switching the running state information from the instruction fetching state to the idle state.
2. The system according to claim 1, wherein the instruction fetching unit is configured to generate the instruction fetching request according to the current PC pointer and the running state information when it is determined that the running state information is the instruction fetching state or the re-fetching state and the instruction buffer is not full;
and to send the instruction fetching request to the instruction storage unit and increment the current PC pointer by a specified step.
3. The system according to claim 2, wherein the instruction storage unit is configured to search locally according to the current PC pointer to obtain an instruction corresponding to the current PC pointer;
and generating the instruction data according to the searched instruction and the running state information, and sending the instruction data to the instruction transmitting unit.
4. The system according to claim 3, wherein the instruction transmitting unit is configured to decode an instruction in the instruction data to obtain a type of the instruction, wherein the type includes a conditional branch instruction or a non-conditional-branch instruction;
and adopting a first process to send the instruction, the running state information and the type to the instruction fetching unit.
5. The system of claim 4, wherein the instruction fetching unit is configured to store the instruction and the type directly into the instruction buffer when the running state information is determined to be the instruction fetching state;
to refrain from storing the instruction and the type in the instruction buffer when the running state information is determined to be the waiting-to-empty state;
and, when the running state information is determined to be the re-fetching state, to check a state value of the re-fetching state and judge whether the state value is 1; if so, to store the instruction and the type into the instruction buffer, and otherwise to refrain from storing the instruction and the type in the instruction buffer.
6. The system of claim 1, wherein the instruction transmitting unit is configured to employ the second process to read instructions from the instruction buffer at regular intervals, and to send the read instruction directly to the instruction executing unit when the type of the read instruction is a non-conditional-branch instruction;
when the type of the read instruction is a conditional branch instruction, the read conditional branch instruction is sent to the instruction executing unit, and meanwhile the read flag bit of the second process is updated to a pause state, wherein the second process suspends instruction reading while the read flag bit is in the pause state.
7. The system of claim 1, wherein the instruction fetching unit is configured to empty the instruction buffer when the jump information indicates that a jump is required and the instruction fetching unit is in the instruction fetching state or the re-fetching state.
8. The system of claim 6, wherein the instruction fetching unit is further configured to update the read flag bit of the instruction transmitting unit to a running state when the jump information is determined to be received, wherein the second process starts instruction reading when the read flag bit is in the running state.
9. A method for processing conditional branch instructions, applied to the system as claimed in any one of claims 1 to 8, comprising:
transmitting an instruction fetching request to an instruction storage unit through an instruction fetching unit, wherein the instruction fetching request comprises a current program counter PC pointer and running state information;
the instruction transmitting unit adopts a first process to transmit the instruction data fed back by the instruction storage unit based on the instruction fetching request to an instruction buffer area of the instruction fetching unit, and adopts a second process to transmit the conditional branch instruction read from the instruction buffer area to the instruction executing unit, wherein the depth of the instruction buffer area is the instruction number required for hiding the pipeline execution period of the instruction executing unit;
executing the conditional branch instruction through an instruction execution unit to generate an execution result, and sending the execution result to the instruction fetching unit, wherein the execution result comprises jump information and a target PC pointer;
when the jump information indicates that a jump is required, the instruction buffer area is emptied through the instruction fetching unit, and a re-fetching instruction request generated according to the target PC pointer is sent to the instruction storage unit, so that the instruction storage unit determines a target instruction according to the target PC and then sends the target instruction to the instruction transmitting unit, and the instruction transmitting unit sends the target instruction to the instruction buffer area;
the instruction fetching unit comprises a state machine, wherein the state machine is used for switching the running state information of the instruction fetching unit from an idle state to an instruction fetching state when the instruction fetching unit is determined to start working;
when the instruction fetching unit is determined to receive the jump information needing to jump for the first time, switching the running state information from the instruction fetching state to the instruction re-fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump again in the re-fetching state, switching the running state information from the re-fetching state to the waiting-to-empty state;
when the instruction fetching unit is determined to be in the re-fetching state and the instruction storage unit returns all instructions requested at this time, switching the running state information from the re-fetching state to the fetching state;
when the instruction fetching unit is determined to receive the jump information needing to jump in the instruction fetching state, and the instruction storage unit has not returned all instructions of the history requests, switching the running state information from the instruction fetching state to the waiting-to-empty state;
when the instruction fetching unit is determined to be in the waiting-to-empty state and the instruction storage unit has returned all instructions of the history requests, switching the running state information from the waiting-to-empty state to the instruction fetching state;
and when the instruction fetching unit is determined to receive the last instruction in the instruction storage unit, switching the running state information from the instruction fetching state to the idle state.
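The state transitions recited in claims 1 and 9 can be summarised as a transition function. The sketch below is illustrative only: the event names, the `WAIT_EMPTY` identifier, and the choice to check for outstanding history requests before entering the re-fetching state are assumptions, since the claims do not fix a precedence between those two conditions.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    IFETCH = auto()
    REFETCH = auto()
    WAIT_EMPTY = auto()  # hypothetical name for the waiting-to-empty state


def next_state(state, event, history_pending=False):
    """Return the next running state for an (assumed) event name."""
    if state is State.IDLE and event == "start":
        return State.IFETCH
    if state is State.IFETCH:
        if event == "jump":
            # Assumption: outstanding history requests take precedence.
            return State.WAIT_EMPTY if history_pending else State.REFETCH
        if event == "last_instruction":
            return State.IDLE
    if state is State.REFETCH:
        if event == "jump":
            return State.WAIT_EMPTY
        if event == "request_returned":
            return State.IFETCH
    if state is State.WAIT_EMPTY and event == "history_returned":
        return State.IFETCH
    return state  # all other events leave the state unchanged
```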
CN202310993515.5A 2023-08-09 2023-08-09 Conditional branch instruction processing system and method Active CN116719561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310993515.5A CN116719561B (en) 2023-08-09 2023-08-09 Conditional branch instruction processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310993515.5A CN116719561B (en) 2023-08-09 2023-08-09 Conditional branch instruction processing system and method

Publications (2)

Publication Number Publication Date
CN116719561A CN116719561A (en) 2023-09-08
CN116719561B true CN116719561B (en) 2023-10-31

Family

ID=87871963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310993515.5A Active CN116719561B (en) 2023-08-09 2023-08-09 Conditional branch instruction processing system and method

Country Status (1)

Country Link
CN (1) CN116719561B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5574871A (en) * 1994-01-04 1996-11-12 Intel Corporation Method and apparatus for implementing a set-associative branch target buffer
CN102968293A (en) * 2012-11-28 2013-03-13 中国人民解放军国防科学技术大学 Dynamic detection and execution method of program loop code based on instruction queue
CN105718241A (en) * 2016-01-18 2016-06-29 北京时代民芯科技有限公司 SPARC V8 system structure based classified type mixed branch prediction system
CN106293642A (en) * 2016-08-08 2017-01-04 合肥工业大学 A kind of branch process module and branch process mechanism thereof calculating system for coarseness multinuclear
CN112540792A (en) * 2019-09-23 2021-03-23 阿里巴巴集团控股有限公司 Instruction processing method and device
CN113076090A (en) * 2021-04-23 2021-07-06 中国人民解放军国防科技大学 Side channel safety protection-oriented loop statement execution method and device
CN113760366A (en) * 2021-07-30 2021-12-07 浪潮电子信息产业股份有限公司 Method, system and related device for processing conditional jump instruction
CN114528024A (en) * 2022-02-21 2022-05-24 安徽芯纪元科技有限公司 Instruction fetching assembly line for storage and calculation fusion processor
CN114579479A (en) * 2021-11-16 2022-06-03 中国科学院上海高等研究院 Low-pollution cache prefetching system and method based on instruction flow mixed mode learning
CN115454504A (en) * 2022-09-05 2022-12-09 山东大学 Four-emission RISC-V processor micro-architecture and working method thereof
CN116089346A (en) * 2023-04-07 2023-05-09 芯砺智能科技(上海)有限公司 Method, system, medium and device for retransmitting error data on embedded bus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9092225B2 (en) * 2012-01-31 2015-07-28 Freescale Semiconductor, Inc. Systems and methods for reducing branch misprediction penalty
US9940262B2 (en) * 2014-09-19 2018-04-10 Apple Inc. Immediate branch recode that handles aliasing
CN112540797A (en) * 2019-09-23 2021-03-23 阿里巴巴集团控股有限公司 Instruction processing apparatus and instruction processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
M5-EDGE分布式取指模型设计;张超;喻明艳;;哈尔滨工业大学学报(第05期);第16-21页 *

Also Published As

Publication number Publication date
CN116719561A (en) 2023-09-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant