CN112181492A - Instruction processing method, instruction processing device and chip - Google Patents

Info

Publication number
CN112181492A
Authority
CN
China
Prior art keywords
instruction
arithmetic unit
address
queue
identification value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011009527.2A
Other languages
Chinese (zh)
Inventor
文兴植
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Panyi Technology Co.,Ltd.
Original Assignee
Beijing Eswin Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eswin Computing Technology Co Ltd filed Critical Beijing Eswin Computing Technology Co Ltd
Priority to CN202011009527.2A priority Critical patent/CN112181492A/en
Priority to PCT/CN2020/139465 priority patent/WO2022062230A1/en
Publication of CN112181492A publication Critical patent/CN112181492A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 Prefetch instructions; cache control instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention provides an instruction processing method, an instruction processing device and a chip. The method comprises the following steps: when a first instruction is executed, recording a target arithmetic unit identification value carried by the first instruction; when a second instruction is executed, comparing whether the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value; and, if the two identification values are the same, sending the operand of the second instruction to the arithmetic unit for calculation, recording the result register address of the second instruction, and continuing to execute subsequent instructions. According to the instruction processing method provided by the embodiment of the invention, multi-cycle instructions are processed in a non-blocking execution mode, so that the execution time of the instructions is greatly shortened and the efficiency with which an instruction pipeline processes multi-cycle instructions is improved.

Description

Instruction processing method, instruction processing device and chip
Technical Field
The invention relates to the technical field of processors, in particular to an instruction processing method, an instruction processing device and a chip.
Background
Sequential instruction pipelining is a processor micro-architecture technique that improves the efficiency of instruction execution. A sequential instruction pipeline divides the execution of an instruction into several sub-processes, called stages, each of which runs in parallel with the others. It is called a pipeline because it works much like a production line in a plant. A common sequential instruction pipeline is divided into five stages: instruction fetch (Fetch), instruction decode (Decode), instruction execute (Execute), memory read/write (Memory Access), and result write-back to a register (Write-back). The sequential instruction pipeline is not limited to five stages and may have fewer or more. Within each clock cycle, there may be one or more instructions in the pipeline, with each stage responsible for one of them. When an instruction has passed through all stages, it has completed execution. However, some instructions (referred to as multi-cycle instructions) need more than one cycle to pass through the Execute stage. While such an instruction stays in the Execute stage, the entire sequential instruction pipeline must be halted, and the other instructions must wait in their respective stages until it completes.
Disclosure of Invention
In view of the above, the present invention provides an instruction processing method, an instruction processing apparatus, and a chip, which can solve the problem of low operation efficiency caused by frequent suspension of an instruction pipeline during execution of multi-cycle instructions in the prior art.
In order to solve the technical problems, the invention adopts the following technical scheme:
An embodiment of one aspect of the present invention provides an instruction processing method, including:
when a first instruction is executed, recording a target arithmetic unit identification value carried by the first instruction;
when a second instruction is executed, comparing whether the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, wherein the second instruction is an instruction executed after the first instruction, and the second instruction stays for more than one cycle in the execution stage of an instruction pipeline;
and, in the case that the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, sending the operand of the second instruction to the arithmetic unit for calculation, recording the result register address of the second instruction, and continuing to execute subsequent instructions.
Optionally, the step of continuing to execute subsequent instructions includes:
when a write memory instruction is executed, if the source register address of the write memory instruction is the same as the result register address of the second instruction, storing the write address of the write memory instruction in a first-in first-out address queue, and clearing the recorded result register address of the second instruction.
Optionally, after the step of sending the operand of the second instruction to the arithmetic unit for calculation, the method further includes:
after the arithmetic unit completes the calculation, storing the operation result in a first-in first-out data queue;
and, if the outlets of both the first-in first-out address queue and the first-in first-out data queue hold valid values, pairing the value at the outlet of the first-in first-out data queue with the value at the outlet of the first-in first-out address queue, and writing the paired data to the corresponding address.
Optionally, the step of continuing to execute subsequent instructions further includes:
executing a third instruction;
if the first-in first-out address queue and the first-in first-out data queue are not empty, pausing the instruction pipeline with the third instruction until both queues are empty;
and, if the first-in first-out address queue and the first-in first-out data queue are both empty, clearing the recorded target arithmetic unit identification value carried by the first instruction by using the third instruction.
Another embodiment of the present invention provides an instruction processing apparatus, including:
a recording module, configured to record a target arithmetic unit identification value carried by a first instruction when the first instruction is executed;
a comparison module, configured to compare whether the arithmetic unit identification value of the arithmetic unit enabled by a second instruction is the same as the target arithmetic unit identification value when the second instruction is executed, where the second instruction is an instruction executed after the first instruction, and the second instruction stays in the execution stage of an instruction pipeline for more than one cycle;
and a processing module, configured to, in the case that the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, send the operand of the second instruction to the arithmetic unit for calculation, record the result register address of the second instruction, and continue to execute subsequent instructions.
Optionally, the processing module includes:
a first processing unit, configured to, when a write memory instruction is executed, store the write address of the write memory instruction in a first-in first-out address queue if the source register address of the write memory instruction is the same as the result register address of the second instruction, and clear the recorded result register address of the second instruction.
Optionally, the apparatus further includes:
a storage module, configured to store an operation result in a first-in first-out data queue after the arithmetic unit completes the calculation;
and a pairing module, configured to, if the outlets of both the first-in first-out address queue and the first-in first-out data queue hold valid values, pair the value at the outlet of the first-in first-out data queue with the value at the outlet of the first-in first-out address queue, and write the paired data to the corresponding address.
Optionally, the apparatus further includes:
an execution module, configured to execute a third instruction;
a second pausing module, configured to, if the first-in first-out address queue and the first-in first-out data queue are not empty, pause the instruction pipeline using the third instruction until both queues are empty;
and an emptying module, configured to clear the recorded target arithmetic unit identification value carried by the first instruction by using the third instruction if the first-in first-out address queue and the first-in first-out data queue are both empty.
In yet another embodiment of the present invention, a chip is further provided, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement the instruction processing method described above.
In still another aspect, an embodiment of the present invention further provides a readable storage medium, on which a program or instructions are stored, and the program or instructions, when executed by a processor, implement the steps of the instruction processing method as described above.
The technical scheme of the invention has the following beneficial effects:
According to the instruction processing method provided by the embodiment of the invention, multi-cycle instructions are processed in a non-blocking execution mode, so that the execution time of the instructions is greatly shortened and the efficiency with which an instruction pipeline processes multi-cycle instructions is improved.
Drawings
FIG. 1 is a schematic diagram of instructions in an instruction pipeline provided by an embodiment of the present invention;
FIG. 2 is a diagram illustrating the execution of instructions in an instruction pipeline according to an embodiment of the present invention;
FIG. 3 is a flowchart of an instruction processing method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a non-blocking execution of an instruction pipeline according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating pseudo assembler instructions according to an embodiment of the invention;
FIG. 6 is a diagram illustrating a non-blocking execution of an instruction pipeline according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an instruction processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention, are within the scope of the invention.
FIG. 1 is a diagram illustrating instructions in an instruction pipeline according to an embodiment of the present invention. As shown in fig. 1, the instruction pipeline may execute a variety of different instructions, such as integer subtraction SUB, bitwise AND, floating-point multiplication fp.
FIG. 2 is a diagram illustrating the execution of instructions in an instruction pipeline according to an embodiment of the present invention. As shown in FIG. 2, the instruction pipeline divides the execution of an instruction into several sub-processes, called stages, each of which runs in parallel with the others. In general, an instruction pipeline may be divided into five stages: instruction fetch (Fetch), instruction decode (Decode), instruction execute (Execute), memory read/write (Memory Access), and result write-back to a register (Write-back), although the pipeline is not limited to five stages and may have fewer or more. In FIG. 2, one or more instructions may be present in the instruction pipeline during each clock cycle, with each stage responsible for one of them. For example, in the first clock cycle, the instruction SUB enters the first stage of the instruction pipeline (the fetch stage); in the second clock cycle, SUB enters the second stage (the decode stage) while the new instruction AND enters the first stage; in the third clock cycle, SUB enters the third stage, AND enters the second stage, and the new instruction XOR enters the first stage; and so on, until each instruction enters the fifth stage and exits the pipeline. When an instruction has passed through all stages, it has completed execution.
With continued reference to FIG. 2, most instructions, such as integer subtraction SUB and bitwise AND, require only one clock cycle to pass through each stage of the instruction pipeline. Some instructions, however, such as complex arithmetic instructions, usually cannot complete within one clock cycle and need to stay in the third stage (the execution stage) for multiple cycles, such as instruction fp.
Since the execution order of instructions in the instruction pipeline is consistent with the order of the compiler-generated assembly instructions, when an instruction is halted at some stage, the whole instruction pipeline is halted until that instruction resumes. For example, arithmetic instructions are typically sent to an arithmetic unit (Function Unit) at the third stage (Execute stage) for computation. If the arithmetic unit requires several cycles to complete the calculation, the instruction pipeline must be halted for several clock cycles (see FIG. 2): the pipeline cannot accept a new instruction, and the instruction in each stage waits in its own stage until the arithmetic unit completes the calculation and the pipeline can resume. In other words, the arithmetic unit and the instruction pipeline execute serially; this is the blocking execution mode that instruction pipelines usually use to process multi-cycle instructions. Consequently, when the instruction pipeline frequently executes multi-cycle instructions, it must frequently halt to wait for them to complete, which makes the pipeline run inefficiently.
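As a rough illustration of the cost of blocking execution, the following Python sketch counts clock cycles for an idealized five-stage in-order pipeline in which only the Execute stage can take more than one cycle. The function name and the 4-cycle latency used for the multi-cycle instruction are assumptions made for this example; they are not taken from the embodiment.

```python
def blocking_pipeline_cycles(exec_latencies, stages_before_execute=2, stages_after_execute=2):
    """Count cycles for an in-order pipeline that stalls on multi-cycle instructions.

    exec_latencies: cycles each instruction spends in the Execute stage, in
    program order. Every other stage is assumed to take exactly one cycle.
    """
    # The first instruction needs Fetch and Decode before it reaches Execute.
    cycle = stages_before_execute
    # Execute holds one instruction at a time, so the latencies simply add up.
    for latency in exec_latencies:
        cycle += latency
    # The last instruction still has to pass Memory Access and Write-back.
    return cycle + stages_after_execute


# Four instructions: all single-cycle, versus one assumed 4-cycle instruction.
all_single_cycle = blocking_pipeline_cycles([1, 1, 1, 1])
with_multi_cycle = blocking_pipeline_cycles([1, 1, 4, 1])
print(all_single_cycle, with_multi_cycle)  # 8 11
```

Under these assumptions, the single multi-cycle instruction adds three stall cycles during which the other stages make no progress.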
To address this, please refer to FIG. 3, which is a flowchart of an instruction processing method according to an embodiment of the present invention. As shown in FIG. 3, the instruction processing method in the embodiment of the present invention may include:
Step 31: when a first instruction is executed, recording a target arithmetic unit identification value carried by the first instruction.
In this step, when the first instruction enters the instruction pipeline, the instruction pipeline acquires and records the target arithmetic unit identification value carried by the first instruction. Each multi-cycle arithmetic unit has a fixed arithmetic unit identification value, namely a fnc id value; for example, the fnc id of integer multiplication is 1 and the fnc id of integer division is 2. The first instruction carries the target arithmetic unit identification value so that, when the instruction pipeline processes subsequent instructions, any instruction whose enabled arithmetic unit has an identification value equal to the target arithmetic unit identification value can be processed in a non-blocking execution mode, which improves the operating efficiency of the instruction pipeline.
Step 32: when a second instruction is executed, comparing whether the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, wherein the second instruction is an instruction executed after the first instruction, and the second instruction stays for more than one cycle in the execution stage of the instruction pipeline.
In the embodiment of the present invention, after the second instruction enters the instruction pipeline, the arithmetic unit identification value of the arithmetic unit it enables is compared with the target arithmetic unit identification value recorded in step 31 to determine whether the second instruction should be executed in the non-blocking execution mode. For example, an arithmetic instruction is usually sent to the arithmetic unit (Function Unit) at the third stage (Execute stage) for calculation. Here the second instruction is an instruction that enters the same instruction pipeline after the first instruction and stays for more than one cycle in the execution stage. It can be understood that an instruction that enters the instruction pipeline after the first instruction but stays in the execution stage for no more than one cycle does not halt or block the instruction pipeline.
Step 33: in the case that the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, sending the operand of the second instruction to the arithmetic unit for calculation, recording the result register address of the second instruction, and continuing to execute subsequent instructions.
In this step, if the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value recorded by the instruction pipeline, it is determined that the second instruction should be executed in the non-blocking execution mode. Specifically, the operand of the second instruction is sent to the corresponding arithmetic unit for computation; as noted above, this arithmetic unit needs more than one clock cycle to complete the computation. At the same time, the instruction pipeline records the result register address of the second instruction, that is, the address of the register that is to hold the computation result of the second instruction. The instruction pipeline then continues to execute subsequent instructions: after sending its operand to the arithmetic unit in the execution stage, the second instruction does not remain in the execution stage but moves directly to the next stage, and the instructions in every stage of the pipeline likewise advance to their next stages. The instruction pipeline can therefore keep fetching and processing instructions without pausing, and the arithmetic unit and the instruction pipeline execute in parallel, which effectively improves the efficiency of the instruction pipeline.
Referring to FIG. 4, FIG. 4 is a diagram illustrating an instruction pipeline that uses the non-blocking execution mode according to an embodiment of the present invention. As shown in FIG. 4, for instructions such as the SUB instruction and the AND instruction, the arithmetic unit identification value of the arithmetic unit they enable is different from the target arithmetic unit identification value recorded by the instruction pipeline, so these instructions do not need the non-blocking execution mode. The arithmetic unit identification value of the arithmetic unit enabled by the FP.MUL instruction is the same as the recorded target arithmetic unit identification value, so the non-blocking execution mode is used: the FP.MUL instruction does not remain in the execution stage of the instruction pipeline, but sends its operand to the corresponding arithmetic unit and moves directly to the next stage, and the instructions in every stage of the pipeline advance to their next stages.
In this embodiment of the present invention, the instruction processing method may further include:
in the case that the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is different from the target arithmetic unit identification value, pausing the instruction pipeline until the arithmetic unit completes the calculation.
That is, if the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is different from the target arithmetic unit identification value carried by the first instruction and recorded by the instruction pipeline, the instruction pipeline still executes the second instruction in the blocking execution mode: the pipeline is halted, and until the arithmetic unit completes the calculation the second instruction cannot enter the next stage and subsequent instructions cannot be executed. In effect, the first instruction specifies which multi-cycle arithmetic unit is to use non-blocking execution. The instruction pipeline uses non-blocking execution when processing an instruction that enables the specified multi-cycle arithmetic unit; when processing an instruction that enables a multi-cycle arithmetic unit whose identification value is different from the target arithmetic unit identification value, the instruction pipeline blocks and is halted until that arithmetic unit completes its computation.
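The decision made in steps 31 to 33, together with the blocking fallback, can be sketched in Python as follows. This is a minimal model under stated assumptions, not the hardware of the embodiment: the names Instr, PipelineState, record_target and execute_stage are hypothetical, as is the use of None to mark a single-cycle instruction. Only the fnc id values 1 (integer multiplication) and 2 (integer division) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    name: str
    fnc_id: Optional[int]  # id of the multi-cycle unit enabled; None = single-cycle (assumption)
    dst_reg: int           # result register address


@dataclass
class PipelineState:
    nonblk_fnc_id: Optional[int] = None  # recorded by the first instruction
    nonblk_dst: Optional[int] = None     # result register of the dispatched second instruction


def record_target(state: PipelineState, target_fnc_id: int) -> None:
    """Step 31: record the target arithmetic unit identification value."""
    state.nonblk_fnc_id = target_fnc_id


def execute_stage(state: PipelineState, instr: Instr) -> str:
    """Steps 32 and 33: decide how the Execute stage treats an instruction."""
    if instr.fnc_id is None:
        return "single-cycle"             # never stalls the pipeline
    if instr.fnc_id == state.nonblk_fnc_id:
        state.nonblk_dst = instr.dst_reg  # remember where the result belongs
        return "non-blocking"             # operand dispatched, pipeline keeps moving
    return "blocking"                     # pipeline halts until the unit finishes


state = PipelineState()
record_target(state, 1)                             # 1 = integer multiplication
print(execute_stage(state, Instr("MUL", 1, 5)))     # non-blocking
print(execute_stage(state, Instr("DIV", 2, 6)))     # blocking
print(execute_stage(state, Instr("SUB", None, 7)))  # single-cycle
```

With no target recorded, every multi-cycle instruction falls through to the blocking branch, which matches the conventional behavior described earlier.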
In this embodiment of the present invention, the step of continuing to execute the subsequent instruction includes:
when a write memory instruction is executed, if the source register address of the write memory instruction is the same as the result register address of the second instruction, storing the write address of the write memory instruction in a first-in first-out address queue, and clearing the recorded result register address of the second instruction.
A write memory instruction enters the execution stage after the second instruction. If its source register address is the same as the result register address of the second instruction, the result produced by the second instruction is to be written to memory. In that case, the write address of the write memory instruction is stored in a first-in first-out address queue, which is dedicated to storing addresses, and, because the match has been found, the recorded result register address of the second instruction can be cleared at the same time. Generally, when the second instruction is an arithmetic instruction, a corresponding write memory instruction follows it to store the result of the calculation.
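The check performed for a write memory instruction can be sketched as below. The function name on_write_memory and the use of None for "nothing recorded" are assumptions made for illustration; the embodiment does not specify how the recorded address is represented.

```python
from collections import deque
from typing import Deque, Optional


def on_write_memory(nonblk_dst: Optional[int], src_reg: int, write_addr: int,
                    addr_fifo: Deque[int]) -> Optional[int]:
    """Handle a write memory instruction; return the updated recorded register.

    If the store's source register is the recorded result register of the
    second instruction, the write address is queued and the record is cleared.
    Otherwise nothing is queued and the record is left unchanged.
    """
    if nonblk_dst is not None and src_reg == nonblk_dst:
        addr_fifo.append(write_addr)  # the data arrives later from the arithmetic unit
        return None                   # recorded result register address is cleared
    return nonblk_dst


addr_fifo = deque()
recorded = on_write_memory(5, 5, 0x100, addr_fifo)  # store whose source is register 5
print(recorded, list(addr_fifo))  # None [256]
```

A store whose source register does not match leaves both the queue and the recorded address untouched in this sketch and would proceed as an ordinary store.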
In an embodiment of the present invention, after the step of sending the operand of the second instruction to the arithmetic unit for calculation, the method further includes:
after the arithmetic unit completes the calculation, storing the operation result in a first-in first-out data queue;
and, if the outlets of both the first-in first-out address queue and the first-in first-out data queue hold valid values, pairing the value at the outlet of the first-in first-out data queue with the value at the outlet of the first-in first-out address queue, and writing the paired data to the corresponding address.
Specifically, after the second instruction sends its operand to the arithmetic unit, the arithmetic unit completes the calculation after more than one clock cycle and the operation result is then stored in a first-in first-out data queue, which is dedicated to storing data. When the outlets of both the first-in first-out address queue and the first-in first-out data queue hold valid values, the value at the outlet of the data queue is paired with the value at the outlet of the address queue, and the paired data is written to the corresponding address. Because both queues are first-in first-out and each second instruction is paired one-to-one with a subsequent write memory instruction, data and addresses are paired correctly.
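The pairing of the two queues can be modeled with a short Python sketch. The class name WritebackQueues, its method names, and the dictionary standing in for memory are hypothetical and serve only to illustrate the first-in first-out matching described above.

```python
from collections import deque


class WritebackQueues:
    """Toy model of the first-in first-out address and data queues."""

    def __init__(self):
        self.addr_fifo = deque()  # write addresses from write memory instructions
        self.data_fifo = deque()  # results from the multi-cycle arithmetic unit
        self.memory = {}          # stand-in for memory: address -> value

    def push_address(self, addr):
        self.addr_fifo.append(addr)

    def push_result(self, value):
        self.data_fifo.append(value)

    def drain_one(self):
        """Pair the two outlet values if both are valid; return True if a write happened."""
        if self.addr_fifo and self.data_fifo:
            addr = self.addr_fifo.popleft()
            self.memory[addr] = self.data_fifo.popleft()
            return True
        return False


q = WritebackQueues()
q.push_address(0x100)     # address is known before the result is ready
print(q.drain_one())      # False: the data queue has no valid outlet value yet
q.push_result(6.0)
q.push_address(0x104)
q.push_result(8.0)
while q.drain_one():
    pass
print(q.memory)           # {256: 6.0, 260: 8.0}
```

Because both queues preserve insertion order, the n-th result is always matched with the n-th queued address, which is the property the paragraph above relies on.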
In this embodiment of the present invention, the step of continuing to execute the subsequent instruction further includes:
executing a third instruction;
if the first-in first-out address queue and the first-in first-out data queue are not empty, pausing the instruction pipeline with the third instruction until both queues are empty;
and, if the first-in first-out address queue and the first-in first-out data queue are both empty, clearing the recorded target arithmetic unit identification value carried by the first instruction by using the third instruction.
While the pipeline continues to execute subsequent instructions, when the third instruction is executed and the first-in first-out address queue and the first-in first-out data queue are not empty, the calculation results and write addresses of the preceding paired instructions have not all been paired and written to their corresponding addresses, so the third instruction halts the instruction pipeline until both queues are empty. Once both queues are empty, every calculation result has been paired with its write address and written out, all operations that required non-blocking execution have finished, and the instruction pipeline continues normally with subsequent instructions. At the same time, the third instruction clears the recorded target arithmetic unit identification value, so that the instruction pipeline no longer uses the non-blocking execution mode when executing subsequent instructions.
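The behavior of the third instruction can be sketched as follows. This is a simplified model: the state dictionary, the function names, and the max_cycles safety limit are assumptions for this example, and the sketch assumes that all results have already reached the data queue, whereas in hardware results may still arrive while the pipeline is stalled.

```python
from collections import deque


def drain_step(state):
    """One clock cycle of pairing: write one address/data pair if both outlets are valid."""
    if state["addr_fifo"] and state["data_fifo"]:
        addr = state["addr_fifo"].popleft()
        state["memory"][addr] = state["data_fifo"].popleft()


def execute_third_instruction(state, max_cycles=1000):
    """Stall until both queues are empty, then clear the recorded target id.

    Returns the number of stall cycles. max_cycles is a safety limit for this
    sketch only; it is not part of the described method.
    """
    stall_cycles = 0
    while state["addr_fifo"] or state["data_fifo"]:
        if stall_cycles >= max_cycles:
            raise RuntimeError("queues did not drain")
        drain_step(state)
        stall_cycles += 1
    state["nonblk_fnc_id"] = None  # non-blocking mode is switched off
    return stall_cycles


state = {
    "nonblk_fnc_id": 1,
    "addr_fifo": deque([0x100, 0x104]),
    "data_fifo": deque([6.0, 8.0]),
    "memory": {},
}
print(execute_third_instruction(state))  # 2
print(state["nonblk_fnc_id"])            # None
```

If both queues are already empty when the third instruction executes, the loop body never runs and the recorded identification value is cleared without any stall.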
It can be known that the first instruction is used for making the instruction pipeline adopt the above-mentioned non-blocking execution mode when processing the instruction whose operation unit identification value of the subsequently enabled operation unit is the same as the target operation unit identification value carried in the first instruction, and the third instruction is used for making the instruction pipeline not adopt the above-mentioned non-blocking execution mode when processing the subsequent instruction, that is, in all the instructions (including the multi-cycle second instruction and other non-multi-cycle instructions) entering the instruction pipeline between the first instruction and the third instruction, as long as the operation unit identification value of the enabled operation unit is the same as the target operation unit identification value carried in the first instruction, the above-mentioned non-blocking execution mode is adopted to improve the efficiency of the instruction pipeline, and if the operation unit identification value of the enabled operation unit is different from the target operation unit identification value carried in the first instruction, the non-blocking execution mode is not adopted, namely a normal processing mode is adopted; it should be noted that, in all the instructions entering the instruction pipeline between the first instruction and the third instruction, the instruction pipeline does not cause blocking when processing the instruction enabling the single-cycle arithmetic unit, that is, the execution stage stays for no more than one cycle, and therefore, the non-blocking execution mode is not required, and the execution can be performed according to the conventional instruction processing flow. 
If the instruction pipeline needs to adopt a non-blocking execution mode again when the complex operation instruction is executed subsequently, the target operation unit identification value of the first instruction is reset, and the instruction pipeline records the target operation unit identification value carried in the first instruction again.
In the embodiment of the invention, a plurality of second instructions executed in the non-blocking execution mode may appear between the first instruction and the third instruction, each with a corresponding write memory instruction, so that a plurality of operation results are stored in the first-in first-out data queue and a plurality of write addresses are stored in the first-in first-out address queue, to be paired subsequently.
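Because both queues are first-in first-out, the n-th result is always paired with the n-th write address. A minimal sketch of that pairing follows, assuming software queues and a dictionary standing in for memory; `drain_pairs` and its parameters are hypothetical names, not taken from the patent.

```python
from collections import deque


def drain_pairs(data_fifo: deque, addr_fifo: deque, memory: dict) -> int:
    """Pair queue heads while both outlets hold a valid value.

    Each pairing pops one result and one write address and performs the
    memory write. Returns the number of writes performed.
    """
    writes = 0
    while data_fifo and addr_fifo:
        address = addr_fifo.popleft()
        memory[address] = data_fifo.popleft()
        writes += 1
    return writes


memory = {}
data_fifo = deque([6.0, 20.0])        # results of two non-blocking instructions
addr_fifo = deque([0x100, 0x104])     # write addresses of the two matching stores
drain_pairs(data_fifo, addr_fifo, memory)
print(memory)
```

If only one queue holds an entry, nothing is written and the entry waits at the outlet until its counterpart arrives.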
In the embodiment of the invention, the multi-cycle instruction is processed by introducing a non-blocking execution mode, so that the execution time of the instruction is greatly shortened, and the efficiency of processing the multi-cycle instruction by an instruction pipeline is improved.
The following further illustrates an instruction processing method in an embodiment of the present invention.
Referring to fig. 5 and fig. 6, fig. 5 is a schematic diagram of a plurality of pseudo-assembler instructions according to an embodiment of the present invention, and fig. 6 is a schematic diagram of an instruction pipeline employing the non-blocking execution mode according to an embodiment of the present invention. As shown in fig. 5 and fig. 6, a custom instruction NONBLK is introduced, which carries a target arithmetic unit identification value, i.e. the nonblk fnc id value; this value specifies which multi-cycle arithmetic unit is to perform non-blocking operations, for example, an instruction NONBLK.
The following describes the specific execution process of each pseudo-assembler instruction in the instruction pipeline:
First, when the instruction pipeline executes the NONBLK.FP.MUL instruction, it records the target arithmetic unit identification value carried by that instruction, i.e. the nonblk fnc id value, which indicates that the multi-cycle arithmetic unit enabled by the FP.MUL instruction is to perform non-blocking operations;
Second, when the instruction pipeline executes the multi-cycle instruction FP.MUL, it compares the arithmetic unit fnc value of FP.MUL with the previously recorded target arithmetic unit identification value; if the two are consistent, FP.MUL is to be executed in the non-blocking execution mode: the instruction pipeline is not paused and continues with the next instruction, the operands of FP.MUL are sent to the arithmetic unit, the arithmetic unit and the instruction pipeline run in parallel, and the instruction pipeline records the result register address nonblk dst of FP.MUL;
Third, when the instruction pipeline executes the write memory instruction STORE, if the source register address of STORE is the same as the previously recorded result register address nonblk dst, STORE is to write the result produced by FP.MUL into memory; in this case, the write address of STORE is stored in the first-in first-out address queue FIFO B, and the recorded result register address nonblk dst is cleared;
Fourth, when the arithmetic unit completes the calculation, the operation result is written into the first-in first-out data queue FIFO A;
Fifth, if the outlets of FIFO B and FIFO A both hold valid values, the value at the outlet of FIFO A is paired with the value at the outlet of FIFO B and written to memory (the value from FIFO A is the data and the value from FIFO B is the write address), at which point the instruction FP.MUL and the write memory instruction STORE have both completed;
Sixth, a custom instruction NONBLK.WAIT is introduced, which pauses the instruction pipeline as needed: if FIFO A and FIFO B have already been emptied, the instruction pipeline is not paused; if FIFO A and FIFO B are not empty, NONBLK.WAIT pauses the instruction pipeline until both are empty; once both FIFOs are empty, all non-blocking operations have finished and the instruction pipeline resumes execution; NONBLK.WAIT also clears the target arithmetic unit identification value recorded in the first step, i.e. the nonblk fnc id value.
If non-blocking operation needs to be started again, the NONBLK instruction must be executed again, and the nonblk fnc id value carried in the NONBLK instruction can be set anew.
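The steps described above can be sketched as a small cycle-counting model. It is a toy under stated assumptions, not the hardware of fig. 6: the class and method names, the 4-cycle multiply latency, the one-cycle cost of every other instruction, and the dictionary used as memory are all hypothetical choices made for illustration.

```python
from collections import deque


class NonBlockingPipeline:
    """Toy model of NONBLK / FP.MUL / STORE / NONBLK.WAIT (illustrative names)."""

    def __init__(self):
        self.target_unit = None    # recorded nonblk fnc id, None when cleared
        self.nonblk_dst = None     # recorded result register address
        self.fifo_a = deque()      # data queue (operation results)
        self.fifo_b = deque()      # address queue (write addresses)
        self.in_flight = deque()   # [cycles_left, value] inside the arithmetic unit
        self.regs = {}
        self.memory = {}
        self.cycles = 0

    def tick(self):
        """Advance one cycle; the arithmetic unit runs in parallel with the pipeline."""
        self.cycles += 1
        for op in self.in_flight:
            op[0] -= 1
        while self.in_flight and self.in_flight[0][0] <= 0:
            self.fifo_a.append(self.in_flight.popleft()[1])   # result enters FIFO A
        while self.fifo_a and self.fifo_b:                    # pair outlets, write memory
            self.memory[self.fifo_b.popleft()] = self.fifo_a.popleft()

    def nonblk(self, unit):
        self.target_unit = unit                               # record the target value
        self.tick()

    def fp_mul(self, dst, a, b, unit="FP.MUL", latency=4):
        if unit == self.target_unit:                          # non-blocking issue
            self.in_flight.append([latency, a * b])
            self.nonblk_dst = dst
            self.tick()                                       # pipeline moves on at once
        else:                                                 # normal mode: stall
            for _ in range(latency):
                self.tick()
            self.regs[dst] = a * b

    def store(self, src, addr):
        if src == self.nonblk_dst:                            # result is still in flight
            self.fifo_b.append(addr)
            self.nonblk_dst = None
        else:
            self.memory[addr] = self.regs[src]
        self.tick()

    def wait(self):
        while self.fifo_a or self.fifo_b:                     # pause until both are empty
            if not self.in_flight:
                raise RuntimeError("queue entry has no counterpart")
            self.tick()
        self.target_unit = None                               # clear the recorded value


fast = NonBlockingPipeline()
fast.nonblk("FP.MUL")
fast.fp_mul("f1", 2.0, 3.0)
fast.store("f1", 0x100)
fast.fp_mul("f2", 4.0, 5.0)
fast.store("f2", 0x104)
fast.wait()

slow = NonBlockingPipeline()          # same program without NONBLK: every FP.MUL stalls
slow.fp_mul("f1", 2.0, 3.0)
slow.store("f1", 0x100)
slow.fp_mul("f2", 4.0, 5.0)
slow.store("f2", 0x104)

print(fast.memory == slow.memory, fast.cycles, slow.cycles)
```

Both runs leave the same values in memory; in this model the non-blocking run finishes in fewer cycles because the two multiplications overlap with the instructions that follow them.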
In the embodiment of the invention, the multi-cycle instruction is processed by introducing a non-blocking execution mode, so that the execution time of the instruction is greatly shortened, and the efficiency of processing the multi-cycle instruction by an instruction pipeline is improved.
Fig. 7 is a schematic structural diagram of an instruction processing apparatus according to an embodiment of the present invention. As shown in fig. 7, another embodiment of the present invention further provides an instruction processing apparatus, where the instruction processing apparatus 70 may include:
a recording module 71, configured to record a target arithmetic unit identification value carried by a first instruction when the first instruction is executed;
a comparing module 72, configured to compare, when a second instruction is executed, whether the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, where the second instruction is an instruction executed after the first instruction, and the second instruction stays in the execution stage of the instruction pipeline for more than one cycle;
and a processing module 73, configured to, when the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, send the operand of the second instruction to the arithmetic unit for calculation, record the result register address of the second instruction, and continue to execute subsequent instructions.
The instruction processing apparatus in the embodiment of the present invention is an apparatus corresponding to the instruction processing method in the above embodiment, and can implement each step in the instruction processing method, and can achieve the same technical effect, and for avoiding repetition, details are not described here again.
Optionally, the apparatus further includes:
a first pause module, configured to pause the instruction pipeline until the arithmetic unit finishes the calculation when the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is different from the target arithmetic unit identification value.
Optionally, the processing module 73 includes:
a first processing unit, configured to, when a write memory instruction is executed, store the write address of the write memory instruction in a first-in first-out address queue if the source register address of the write memory instruction is the same as the result register address of the second instruction, and clear the recorded result register address of the second instruction.
Optionally, the apparatus further includes:
a storage module, configured to store the operation result in a first-in first-out data queue after the arithmetic unit completes the calculation;
and a pairing module, configured to, if the outlets of the first-in first-out address queue and the first-in first-out data queue both hold valid values, pair the value at the outlet of the first-in first-out data queue with the value at the outlet of the first-in first-out address queue and write the paired data to the corresponding address.
Optionally, the apparatus further includes:
an execution module, configured to execute a third instruction;
a second pause module, configured to, if the first-in first-out address queue and the first-in first-out data queue are not both empty, use the third instruction to pause the instruction pipeline until the first-in first-out address queue and the first-in first-out data queue are both empty;
and an emptying module, configured to, if the first-in first-out address queue and the first-in first-out data queue are both empty, use the third instruction to clear the recorded target arithmetic unit identification value carried by the first instruction.
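The combined behaviour of the second pause module and the emptying module can be sketched as a per-cycle decision. This is an illustrative reading only; the function name, the use of queue lengths as inputs, and `None` as the cleared value are assumptions.

```python
from typing import Optional, Tuple


def third_instruction_cycle(addr_queue_len: int, data_queue_len: int,
                            target_unit_id: Optional[str]
                            ) -> Tuple[bool, Optional[str]]:
    """Evaluate the third instruction for one cycle.

    Returns (stall, new_target_unit_id). While either queue still holds an
    entry the pipeline stalls and the recorded target value is kept; once
    both queues are empty the pipeline proceeds and the value is cleared.
    """
    if addr_queue_len > 0 or data_queue_len > 0:
        return True, target_unit_id
    return False, None


print(third_instruction_cycle(1, 0, "FP.MUL"))
print(third_instruction_cycle(0, 0, "FP.MUL"))
```

Re-evaluating the function each cycle while the queues drain reproduces the "pause until emptied, then clear" behaviour described for the third instruction.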
In the embodiment of the invention, the multi-cycle instruction is processed by introducing a non-blocking execution mode, so that the execution time of the instruction is greatly shortened, and the efficiency of processing the multi-cycle instruction by an instruction pipeline is improved.
An embodiment of another aspect of the present invention further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement the instruction processing method described in the above embodiment, and can achieve the same technical effect, and no further description is provided herein to avoid repetition.
In another aspect, an embodiment of the present invention further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the steps of the instruction processing method described above are implemented, and the same technical effects can be achieved, and in order to avoid repetition, details are not repeated here.
While the foregoing is directed to embodiments of the present invention, it will be appreciated by those skilled in the art that various changes and modifications may be made without departing from the principles of the invention, and it is intended that all such changes and modifications be considered as within the scope of the invention.

Claims (10)

1. An instruction processing method, comprising:
when a first instruction is executed, recording a target operation unit identification value carried by the first instruction;
when executing a second instruction, comparing whether the arithmetic unit identification value of the arithmetic unit enabled by the second instruction is the same as the target arithmetic unit identification value, wherein the second instruction is an instruction executed after the first instruction, and the second instruction stays for more than one cycle in an execution stage of an instruction pipeline;
and under the condition that the identification value of the arithmetic unit started by the second instruction is the same as the identification value of the target arithmetic unit, sending the operand of the second instruction to the arithmetic unit for calculation, recording the address of a result register of the second instruction, and continuously executing subsequent instructions.
2. The method of claim 1, wherein the step of continuing to execute the subsequent instruction comprises:
when a write memory instruction is executed, if the address of a source register of the write memory instruction is the same as the address of a result register of the second instruction, the write address of the write memory instruction is stored in a first-in first-out address queue, and the recorded address of the result register of the second instruction is emptied.
3. The method of claim 2, wherein after the step of providing the operand of the second instruction to the arithmetic unit for computation, the method further comprises:
after the calculation is completed by the operation unit, storing an operation result into a first-in first-out data queue;
and if the outlets of the first-in first-out address queue and the first-in first-out data queue have effective values, matching the outlet values of the first-in first-out data queue and the outlet values of the first-in first-out address queue, and writing the matched data into corresponding addresses.
4. The instruction processing method of claim 3, wherein the step of continuing to execute the subsequent instruction further comprises:
executing the third instruction;
if the FIFO address queue and the FIFO data queue are not empty, pausing the instruction pipeline with the third instruction until the FIFO address queue and the FIFO data queue are empty;
and if the first-in first-out address queue and the first-in first-out data queue are emptied, emptying the recorded target arithmetic unit identification value carried by the first instruction by using the third instruction.
5. An instruction processing apparatus, comprising:
the recording module is used for recording a target operation unit identification value carried by a first instruction when the first instruction is executed;
a comparison module, configured to compare whether an arithmetic unit identifier value of an arithmetic unit enabled by a second instruction is the same as the target arithmetic unit identifier value when the second instruction is executed, where the second instruction is an instruction executed after the first instruction, and the second instruction stays in an execution stage in an instruction pipeline for more than one cycle;
and the processing module is used for sending the operand of the second instruction to the arithmetic unit for calculation under the condition that the arithmetic unit identification value of the arithmetic unit started by the second instruction is the same as the target arithmetic unit identification value, recording the address of a result register of the second instruction, and continuously executing subsequent instructions.
6. The instruction processing apparatus according to claim 5, wherein the processing module comprises:
the first processing unit is configured to, when a write memory instruction is executed, store a write address of the write memory instruction in a first-in first-out address queue if a source register address of the write memory instruction is the same as a result register address of the second instruction, and clear a recorded result register address of the second instruction.
7. The instruction processing apparatus according to claim 6, further comprising:
the storage module is used for storing an operation result into a first-in first-out data queue after the operation unit completes the calculation;
and the pairing module is used for pairing the value of the outlet of the first-in first-out data queue and the value of the outlet of the first-in first-out address queue and writing the paired data into a corresponding address if the outlets of the first-in first-out address queue and the first-in first-out data queue have effective values.
8. The instruction processing apparatus according to claim 7, further comprising:
the execution module is used for executing the third instruction;
a second pausing module, configured to pause the instruction pipeline using the third instruction until the fifo address queue and the fifo data queue are emptied, if the fifo address queue and the fifo data queue are not emptied;
and the emptying module is used for emptying the recorded target operation unit identification value carried by the first instruction by using the third instruction if the first-in first-out address queue and the first-in first-out data queue are both emptied.
9. A chip comprising a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to execute a program or instructions implementing the instruction processing method of any of claims 1-4.
10. A readable storage medium, characterized in that the readable storage medium stores thereon a program or instructions which, when executed by a processor, implement the steps of the instruction processing method according to any one of claims 1 to 4.
CN202011009527.2A 2020-09-23 2020-09-23 Instruction processing method, instruction processing device and chip Pending CN112181492A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011009527.2A CN112181492A (en) 2020-09-23 2020-09-23 Instruction processing method, instruction processing device and chip
PCT/CN2020/139465 WO2022062230A1 (en) 2020-09-23 2020-12-25 Instruction processing method, instruction processing apparatus, and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011009527.2A CN112181492A (en) 2020-09-23 2020-09-23 Instruction processing method, instruction processing device and chip

Publications (1)

Publication Number Publication Date
CN112181492A true CN112181492A (en) 2021-01-05

Family

ID=73956546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011009527.2A Pending CN112181492A (en) 2020-09-23 2020-09-23 Instruction processing method, instruction processing device and chip

Country Status (2)

Country Link
CN (1) CN112181492A (en)
WO (1) WO2022062230A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116229A (en) * 2021-12-01 2022-03-01 北京奕斯伟计算技术有限公司 Method and apparatus for adjusting instruction pipeline, memory and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020129227A1 (en) * 2001-03-07 2002-09-12 Fumio Arakawa Processor having priority changing function according to threads
US20080133888A1 (en) * 2006-11-30 2008-06-05 Hitachi, Ltd. Data processor
CN106325812A (en) * 2015-06-15 2017-01-11 华为技术有限公司 Processing method and device for multiplication and accumulation operation
CN107957976A (en) * 2017-12-15 2018-04-24 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN108287730A (en) * 2018-03-14 2018-07-17 武汉市聚芯微电子有限责任公司 A kind of processor pipeline structure
CN109522254A (en) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 Arithmetic unit and method
CN110780845A (en) * 2019-10-17 2020-02-11 浙江大学 Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975252B (en) * 2016-04-29 2018-10-09 龙芯中科技术有限公司 A kind of implementation method, device and the processor of the assembly line of process instruction

Also Published As

Publication number Publication date
WO2022062230A1 (en) 2022-03-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210630

Address after: 100176 room 431, building a, building 18, Xihuan South Road, economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Beijing Panyi Technology Co.,Ltd.

Address before: 100176 no.2179, 2 / F, building D, 33, 99 Kechuang 14th Street, Beijing Economic and Technological Development Zone, Beijing (centralized office area)

Applicant before: Beijing yisiwei Computing Technology Co.,Ltd.
