Detailed Description
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, these technical solutions are described in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of protection of the embodiments of the present invention.
The following terms are used herein:
Processor core: the core of a processor. A processor may have multiple (two or more) cores, but each core belongs to only one processor. The processor core acts as the compute engine of the processor: all computation, all acceptance and storage of commands, and all data processing are performed in the processor core.
Pipelined processor: a processor whose instruction processing is divided into a pipeline of stages, with each stage performing a different task on program instructions. A standard pipelined processor typically has five stages: instruction fetch, instruction decode, operand fetch, execute, and result write-back.
Instruction fusion: a technique that replaces a plurality of instructions with fewer instructions (for example, a single fused instruction) that can be executed more efficiently, thereby improving instruction execution performance.
Masking operation: in the embodiments of the present invention, also referred to as a bit masking operation. It uses one string of binary digits (a mask) to operate on another string of binary digits (e.g., a machine instruction). The purpose is either to mask out a portion of the bits of that other string or to retrieve a portion of them.
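As a minimal illustration of the masking operation defined above, the following Python sketch uses a bitwise AND to retrieve or clear selected bits. The function names are hypothetical, and AND-based masking is assumed as the concrete form of the mask operation. The example word 0x00500093 is the RISC-V encoding of addi x1, x0, 5; its low 7 bits hold the opcode field.

```python
def keep_bits(value: int, mask: int) -> int:
    """Retrieve the bits of `value` selected by the 1-bits of `mask`."""
    return value & mask


def clear_bits(value: int, mask: int) -> int:
    """Mask out (clear) the bits of `value` selected by the 1-bits of `mask`."""
    return value & ~mask


word = 0b1011_0110
print(keep_bits(word, 0x0F))              # 6: only the low nibble survives
print(clear_bits(word, 0x0F))             # 176: the low nibble is cleared
print(hex(keep_bits(0x00500093, 0x7F)))   # 0x13: RISC-V opcode field (bits 6..0)
```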
Immediate: an immediate in assembly language is equivalent to a constant in a high-level language. It is a number that appears directly in the instruction and does not need to be read from a register or memory.
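To show how an immediate sits directly inside the instruction word, the sketch below extracts the 12-bit immediate of a RISC-V I-type instruction (such as addi or andi). This immediate occupies bits 31 to 20 and is sign-extended. RISC-V is used only because this description names it as an example instruction set, and the helper name is hypothetical.

```python
def i_type_immediate(word: int) -> int:
    """Return the sign-extended 12-bit immediate of a 32-bit RISC-V I-type instruction."""
    imm = (word >> 20) & 0xFFF  # imm[11:0] is stored in instruction bits 31..20
    if imm & 0x800:             # bit 11 is the sign bit
        imm -= 0x1000           # sign-extend into a negative Python int
    return imm


print(i_type_immediate(0x00500093))  # 5   (addi x1, x0, 5)
print(i_type_immediate(0xFFF00093))  # -1  (addi x1, x0, -1)
```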
Operand: a field of an assembly language instruction that indicates the source of the data required by the operation the instruction performs. The operand field may hold the operand itself, the address of the operand, or the method for calculating that address. Operands whose contents do not change as the instruction executes are referred to as source operands. Operands whose contents change as the instruction executes are referred to as destination operands.
Based on the above terms, the embodiments of the present invention are described below with reference to a plurality of examples.
Fig. 1 shows a schematic block diagram of a processor core according to an embodiment of the invention. The processor core comprises at least an instruction fetch unit, a fusion detection unit, a decoding unit and an execution unit.
The instruction fetch unit is configured to obtain two machine instructions that are adjacent in time sequence, i.e., consecutive. Instructions that can be fused are usually consecutive, so the processor core performs the fusion determination and processing on pairs of consecutive machine instructions.
The fusion detection unit is configured to determine whether the two machine instructions satisfy a fusion condition. At minimum, the fusion condition requires that the two machine instructions be, respectively, a first operation instruction indicating an addition or subtraction operation and a second operation instruction indicating a masking operation. If the fusion condition is satisfied, the fusion detection unit fuses the first operation instruction and the second operation instruction to obtain a fused instruction.
To ensure that instruction fusion proceeds correctly, the embodiment of the invention defines corresponding fusion conditions. When two consecutive machine instructions satisfy the fusion conditions, they are considered fusible; otherwise, they are considered not fusible. At minimum, the fusion conditions require that the two machine instructions be, respectively, a first operation instruction indicating an addition or subtraction operation and a second operation instruction indicating a masking operation. "Addition and subtraction operation" covers both addition and subtraction, so the first operation instruction may be either an addition instruction or a subtraction instruction. This pairing is chosen because most underlying machine instructions use addition and subtraction arithmetic logic, and that arithmetic is usually accompanied by a masking operation. Note that in the embodiments of the present invention, "first" and "second" only distinguish the two operation instructions and do not indicate any timing or ordering relationship between them.
The two machine instructions may appear in either order: the addition or subtraction operation may precede the masking operation, or the masking operation may precede the addition or subtraction operation. Accordingly, the requirement that the two machine instructions be the first operation instruction and the second operation instruction respectively covers both cases. In one case the first operation instruction executes before the second operation instruction; in the other, the second operation instruction executes before the first operation instruction. In this way, all situations in which instruction fusion is needed are taken into account.
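The opcode pairing and its two possible orders can be sketched in Python as follows. This is a hypothetical software model, not the claimed hardware. The Instr record, the RISC-V-style mnemonic sets and the function name fusion_type are assumptions. The returned 0/1 values follow the example encoding this description uses for the fusion type: 0 when the add/subtract instruction comes first, 1 when the mask instruction comes first.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical opcode groups, assuming RISC-V-style mnemonics.
ADD_SUB_OPS = {"add", "addi", "sub"}
MASK_OPS = {"and", "andi"}


@dataclass(frozen=True)
class Instr:
    op: str                    # mnemonic
    rd: int                    # destination register number
    rs1: int                   # first source register number (second source omitted for brevity)
    imm: Optional[int] = None  # immediate, if any


def fusion_type(earlier: Instr, later: Instr) -> Optional[int]:
    """Return 0 if add/sub precedes mask, 1 if mask precedes add/sub, else None."""
    if earlier.op in ADD_SUB_OPS and later.op in MASK_OPS:
        return 0
    if earlier.op in MASK_OPS and later.op in ADD_SUB_OPS:
        return 1
    return None  # not an add/sub + mask pair, so the pair is not fusible


print(fusion_type(Instr("addi", 5, 5, 8), Instr("andi", 5, 5, 0xFF)))  # 0
print(fusion_type(Instr("andi", 5, 5, 0xFF), Instr("sub", 5, 5)))      # 1
print(fusion_type(Instr("add", 5, 5), Instr("add", 6, 6)))             # None
```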
In practice, an instruction must write its result back to a register after execution. The execution unit may have only one write-back port for writing execution results back to registers, or it may have more than one. (In the embodiments of the present invention, "more", "multiple" and similar terms mean two or more unless otherwise specified.) When there are multiple write-back ports, the fusion condition above is sufficient to accurately determine whether instruction fusion should be performed.
In some cases, however, there is only one write-back port. To ensure that instruction fusion can still be performed successfully, the fusion condition optionally adds a further requirement. Beyond the two machine instructions being the first operation instruction and the second operation instruction respectively, the destination register of the first operation instruction must be the same as the destination register of the second operation instruction. The fused instruction then needs to write back only one register. Under this condition, the fusion detection unit fuses the first operation instruction and the second operation instruction into a fused instruction only if the two machine instructions are the first and second operation instructions respectively and their destination registers are the same. This ensures the accuracy of the determination to a certain extent.
To make the determination more accurate still, the fusion condition may optionally add one more requirement. Beyond the two machine instructions being the first and second operation instructions respectively, with the same destination register, the destination register of the first operation instruction must also be the same as the register of the first source operand of the second operation instruction. The fusion detection unit then fuses the first operation instruction and the second operation instruction into a fused instruction only if all three of these conditions are satisfied.
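The optional register constraints above can be expressed as the following check. It is applied after the opcode pairing has already identified which instruction is the add/subtract (first operation) instruction and which is the mask (second operation) instruction. The function name and the numeric level parameter are hypothetical conveniences:

- level 1 applies no register constraint;
- level 2 adds the same-destination requirement for the single write-back port case;
- level 3 additionally requires the add/subtract destination to equal the mask instruction's first source register.

```python
def register_check(arith_rd: int, mask_rd: int, mask_rs1: int, level: int) -> bool:
    """Apply the optional register constraints to an add/sub + mask pair.

    level 1: opcode pairing only, no register constraint.
    level 2: add/sub destination register == mask destination register.
    level 3: level 2, plus add/sub destination register == mask first-source register.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    if level >= 2 and arith_rd != mask_rd:
        return False
    if level >= 3 and arith_rd != mask_rs1:
        return False
    return True


# addi x5, x5, 8 ; andi x5, x5, 0xFF  -> fusible at every level
print(register_check(5, 5, 5, level=3))  # True
# addi x5, x5, 8 ; andi x6, x5, 0xFF  -> different destinations
print(register_check(5, 6, 5, level=2))  # False
# addi x5, x5, 8 ; andi x5, x7, 0xFF  -> mask does not read the add result
print(register_check(5, 5, 7, level=3))  # False
```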
In summary, when two consecutive machine instructions satisfy the above fusion condition, they can be fused into a fused instruction.
After the two machine instructions are fused into a fused instruction, processing continues on the fused instruction. So that subsequent processing units can clearly recognize it as a fused instruction, the fused instruction carries the following in addition to the instruction data:

- indication information indicating whether the current instruction is a fused instruction (for example, an indication bit, with 0 for a non-fused instruction and 1 for a fused instruction);
- fusion-type information indicating, for a fused instruction, whether the first operation instruction or the second operation instruction comes first (for example, 0 if the first operation instruction comes first and 1 if the second operation instruction comes first);
- immediate information.

Based on this information, subsequent processing units such as the execution unit and the retirement unit can clearly determine whether the instruction currently being processed is a fused instruction.
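One possible software model of the information a fused instruction carries is sketched below. The field names, the use of a tuple for the immediate information, and the 2-bit tag layout (bit 0 for the fused indicator, bit 1 for the fusion type) are assumptions made for illustration. The description itself only specifies that the indicator, the fusion type and the immediate information accompany the instruction data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FusedUop:
    is_fused: int          # 0 = ordinary instruction, 1 = fused instruction
    fusion_type: int       # 0 = add/sub first, 1 = mask first (used only when fused)
    imms: Tuple[int, ...]  # immediate information carried with the instruction
    ops: Tuple[str, ...]   # the original instruction mnemonics (instruction data)

    def tag(self) -> int:
        """Pack the indicator bits into a 2-bit tag: bit 0 = fused, bit 1 = type."""
        return (self.fusion_type << 1) | self.is_fused


uop = FusedUop(is_fused=1, fusion_type=1, imms=(0xFF, 8), ops=("andi", "addi"))
print(uop.tag())  # 3: fused, mask instruction first
```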
On this basis, the decoding unit is configured to decode the fused instruction to obtain a decoding result and the operands corresponding to the fused instruction. The execution unit is configured to operate on the operands according to the operation indicated by the fused instruction. It performs the addition or subtraction together with the masking operation to obtain an execution result of the fused instruction.
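The following sketch shows what the execution unit computes for a fused instruction. It assumes the masking operation is a bitwise AND with a mask value and that registers are 32 bits wide; the function and parameter names are hypothetical. The fusion_type value selects the order of the two original instructions, matching the example 0/1 encoding of the fusion-type information. Either way, the fused operation yields a single result.

```python
MASK32 = 0xFFFF_FFFF  # model 32-bit registers


def execute_fused(fusion_type: int, is_sub: bool, a: int, b: int, mask: int) -> int:
    """Compute an add/sub and an AND mask as one fused operation.

    fusion_type 0: result = (a +/- b) & mask   (add/sub first, then mask)
    fusion_type 1: result = (a & mask) +/- b   (mask first, then add/sub)
    """
    def arith(x: int) -> int:
        return ((x - b) if is_sub else (x + b)) & MASK32  # wrap like a 32-bit ALU

    if fusion_type == 0:
        return arith(a) & mask
    return arith(a & mask)


print(execute_fused(0, False, 0x1FF, 1, 0xFF))  # 0   -> (0x1FF + 1) & 0xFF
print(execute_fused(1, False, 0x1FF, 1, 0xFF))  # 256 -> (0x1FF & 0xFF) + 1
print(execute_fused(0, True, 0, 1, 0xFF))       # 255 -> (0 - 1) wraps, then & 0xFF
```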
In the solution of the embodiment of the invention, the fusion detection unit detects and fuses the first operation instruction and the second operation instruction. The decoding unit correspondingly decodes the fused instruction to obtain the fused operation code and its operands. Hardware acceleration of the pipelined processor is thereby achieved while remaining compatible with the processor's instruction set.
In addition, the processor core may further include a retirement unit. After the execution result of the fused instruction is obtained, the retirement unit retires the first operation instruction and the second operation instruction corresponding to the fused instruction. For example, it writes the execution result generated by the execution unit back to a corresponding storage location (e.g., a register inside the processor), so that subsequent instructions can quickly obtain the execution result from that location.
In a practical implementation, the fusion detection unit may be disposed in the instruction fetch unit or, alternatively, in the decoding unit. That is, the fusion detection unit may be incorporated into either the fetch unit or the decoding unit, and those skilled in the art may choose either arrangement according to actual needs. This embodiment takes the fusion detection unit disposed in the instruction fetch unit as its example.
In addition, the processor core may further include conventional components such as a cache, a memory management unit and a register file. These are described further below and are not detailed here.
With this embodiment, the processor core may process two consecutive machine instructions that satisfy the corresponding fusion condition, for example an instruction indicating an addition or subtraction operation followed by an instruction indicating a masking operation. In that case, the two machine instructions are fused, and the fused instruction is processed as a whole in subsequent stages. This is worthwhile because addition and subtraction arithmetic is usually accompanied by a masking operation that extracts data during the computation. Fusing the arithmetic logic with the masking operation therefore achieves hardware acceleration and improves the performance of the processor core without adding new instructions. Because no instruction is added, compatibility with the existing processor instruction set is preserved.
Based on the processor core shown in Fig. 1, an embodiment of the present invention further provides a pipelined processor, as shown in Fig. 2. Other types of processors to which the above processor core can be adapted also fall within the scope of the embodiments of the present invention.
FIG. 2 is a block diagram of an exemplary pipelined processor to which the instruction fusion scheme of embodiments of the present invention may be applied.
As shown, the pipelined processor 100 may include one or more processor cores 120 for processing instructions, and each processor core 120 may be configured to process a particular instruction set. In some embodiments, the instruction set may support Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC) or Very Long Instruction Word (VLIW) computing. Different processor cores 120 may process different instruction sets or the same instruction set. In an embodiment of the present invention, at least one of the processor cores 120 supports RISC, for example RISC-V. Fig. 2 shows processor cores 1 to m, where m is a natural number greater than 1.
In some embodiments, the cache memory 18 shown in Fig. 2 may be fully or partially integrated in the pipelined processor 100. Depending on the architecture, the cache memory 18 may be a single level or multiple levels of internal cache within and/or outside the respective processor cores 120 (e.g., the three cache levels L1 to L3 shown in Fig. 2, collectively referenced as 18). It may include an instruction cache for instructions and a data cache for data. In some embodiments, various components in the pipelined processor 100 may share at least a portion of the cache memory; for example, as shown in Fig. 2, processor cores 1 to m share the third-level cache L3.
In some embodiments, as shown in Fig. 2, the pipelined processor 100 may include a register file 126. The register file 126 may include a plurality of registers of different types for storing different kinds of data and/or instructions. For example, the register file 126 may include integer registers, floating-point registers, status registers, instruction registers, pointer registers and the like. The registers in the register file 126 may be implemented as general-purpose registers or may be specifically designed according to the actual requirements of the pipelined processor 100.
In some embodiments, the pipelined processor 100 may further include a memory management unit (MMU) 122 for translating virtual addresses to physical addresses. One or more memory management units 122 may be disposed in each processor core 120. The memory management units 122 in different processor cores 120 may be synchronized with those in other processors or processor cores, so that every processor or processor core shares a unified virtual storage system.
The pipelined processor 100 so configured executes sequences of instructions; specifically, a sequence of instructions may be executed by a processor core 120. Executing each instruction involves fetching it from the memory that stores it, decoding it, executing it, saving the execution result, and so on. This cycle repeats until all instructions in the sequence have been executed or a halt instruction is encountered.
Accordingly, to execute a sequence of instructions, the pipelined processor 100 may further include an instruction fetch unit 124, a decoding unit 125, an execution unit 121, a retirement unit (not shown) and so on. In this embodiment, these units are included in the processor core 120, which is the processor core shown in Fig. 1.
The instruction fetch unit 124 serves as the starting point of the pipelined processor 100. It moves instructions from the memory 14 into an instruction register, which may be one of the registers in the register file 126 shown in Fig. 2 that is used to store instructions. It also receives or computes the next fetch address according to a fetch algorithm, for example by incrementing or decrementing the address by the instruction length. When the processor core shown in Fig. 1 is used, the instruction fetch unit 124 also contains a fusion detection unit. The fusion detection unit determines whether two consecutive machine instructions fetched by the instruction fetch unit 124 satisfy the fusion condition and, if so, fuses them into a fused instruction. For the specific implementation of this process, refer to the description of the processor core embodiment above; details are not repeated here.
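As a concrete instance of incrementing the fetch address by the instruction length, the sketch below follows the RISC-V convention, RISC-V being the example instruction set named above. In that convention, an instruction whose lowest two bits are 11 is 32 bits long, and any other value denotes a 16-bit compressed instruction. Function names are hypothetical, the reserved longer encodings are ignored, and branch targets are not modeled.

```python
def instruction_length(low_halfword: int) -> int:
    """Byte length of a RISC-V instruction, judged from its lowest 16 bits.

    Low two bits == 0b11 -> standard 32-bit instruction; otherwise a 16-bit
    compressed instruction. The reserved longer encodings are ignored here.
    """
    return 4 if (low_halfword & 0b11) == 0b11 else 2


def next_fetch_address(pc: int, low_halfword: int) -> int:
    """Sequential next fetch address: advance the PC by the instruction length."""
    return pc + instruction_length(low_halfword)


print(hex(next_fetch_address(0x1000, 0x0093)))  # 0x1004: addi is 32 bits
print(hex(next_fetch_address(0x1004, 0x0001)))  # 0x1006: c.nop is 16 bits
```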
Next, in the instruction decoding stage, the decoding unit 125 decodes the fused instruction to obtain the operand information it requires, in preparation for execution by the execution unit 121. The operand information may be, for example, an immediate, a register, or other information capable of providing a source operand.
After the instruction has been fetched, decoded and dispatched to the corresponding execution unit 121, the execution unit 121 executes the fused instruction. That is, it performs the operation the fused instruction indicates and thereby implements the corresponding function.
The retirement unit (or instruction write-back unit) is mainly responsible for writing the execution results generated by the execution unit 121 back to corresponding storage locations (e.g., registers inside the pipelined processor 100), so that subsequent instructions can quickly obtain the results from those locations. It then retires the instructions and reclaims their resources, completing the processing of the corresponding instructions. In this embodiment, the first operation instruction and the second operation instruction corresponding to the fused instruction are retired at the same time.
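The simultaneous retirement of both constituent instructions can be modeled as below. A fused entry performs a single register write-back, as when both instructions share a destination register, but counts as two retired machine instructions. The Retirer class and its counters are hypothetical bookkeeping for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Retirer:
    regs: Dict[int, int] = field(default_factory=dict)  # architectural register state
    entries: int = 0       # pipeline entries retired
    instructions: int = 0  # machine instructions retired

    def retire(self, rd: int, result: int, is_fused: bool) -> None:
        """Write back one result, then retire one entry (two instructions if fused)."""
        if rd != 0:  # RISC-V x0 is hard-wired to zero
            self.regs[rd] = result
        self.entries += 1
        self.instructions += 2 if is_fused else 1


r = Retirer()
r.retire(rd=5, result=0x100, is_fused=True)  # andi + addi retired together
r.retire(rd=6, result=1, is_fused=False)
print(r.entries, r.instructions, r.regs)     # 2 3 {5: 256, 6: 1}
```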
The processor of this embodiment achieves hardware acceleration and improves processor performance without adding instructions. Because no instruction is added, compatibility with the existing processor instruction set is preserved.
Based on the above processor, an embodiment of the invention further provides a chip comprising at least the processor core or the processor described above. In practice, the chip may also be provided with hardware, controllers and the like for implementing various functions according to actual requirements. Any chip that includes the processor core or the processor falls within the protection scope of the present invention.
Further, an embodiment of the present invention provides a control device comprising at least the processor core, the processor or the chip described above. In practice, the control device may be implemented as any suitable device, such as a mobile control device, an industrial control device or a desktop control device.
In addition, an embodiment of the present invention further provides an instruction fusion method. Fig. 3 is a flowchart illustrating the steps of the instruction fusion method according to an embodiment of the present invention.
The instruction fusion method of the embodiment comprises the following steps:
Step S202: Obtain two machine instructions that are adjacent in time sequence.
Whatever the application or program, the code implementing its functions or operations is ultimately compiled into computer-readable machine language, i.e., machine instructions, which are sent to a processor for execution. The execution order of the instructions is determined by the compilation result. The processor fetches the machine instructions and processes them to carry out the operations and functions they indicate. For example, machine instructions may be fetched by the instruction fetch unit 124 shown in Fig. 2.
Instructions that can be fused are usually consecutive, so in this step the processor performs the fusion determination and processing on two consecutive machine instructions.
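A software model of this step is sketched below. The stream is scanned in program order and each consecutive pair is tested. A pair that is fused is consumed as a whole, so no instruction belongs to two fused instructions. The greedy left-to-right pairing, the tuple instruction format and the function names are assumptions. Only the basic opcode condition is checked; the register constraints described in step S204 could be supplied through the can_fuse parameter.

```python
from typing import Callable, List, Sequence, Tuple

Instr = Tuple[str, int, int]  # hypothetical (mnemonic, rd, rs1) triple

ADD_SUB = {"add", "addi", "sub"}
MASK = {"and", "andi"}


def is_fusible_pair(a: Instr, b: Instr) -> bool:
    """Basic condition: one add/sub and one mask instruction, in either order."""
    return (a[0] in ADD_SUB and b[0] in MASK) or (a[0] in MASK and b[0] in ADD_SUB)


def fuse_stream(
    instrs: Sequence[Instr],
    can_fuse: Callable[[Instr, Instr], bool] = is_fusible_pair,
) -> List[tuple]:
    """Scan consecutive pairs; a fused pair consumes both instructions."""
    out: List[tuple] = []
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs) and can_fuse(instrs[i], instrs[i + 1]):
            out.append(("fused", instrs[i], instrs[i + 1]))
            i += 2  # both instructions are covered by one fused entry
        else:
            out.append(("single", instrs[i]))
            i += 1
    return out


stream = [("addi", 5, 5), ("andi", 5, 5), ("add", 6, 6), ("sub", 7, 7)]
print([entry[0] for entry in fuse_stream(stream)])  # ['fused', 'single', 'single']
```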
Step S204: Determine whether the two machine instructions satisfy the fusion condition.
To ensure that instruction fusion proceeds correctly, the embodiment of the invention defines corresponding fusion conditions. When two consecutive machine instructions satisfy the fusion conditions, they are considered fusible; otherwise, they are considered not fusible.
In this embodiment, the fusion condition at minimum requires that the two machine instructions be, respectively, a first operation instruction indicating an addition or subtraction operation and a second operation instruction indicating a masking operation. "Addition and subtraction operation" covers both addition and subtraction, so the first operation instruction may be either an addition instruction or a subtraction instruction. This pairing is chosen because most underlying machine instructions use addition and subtraction arithmetic logic, and that arithmetic is usually accompanied by a masking operation. Note that in the embodiments of the present invention, "first" and "second" only distinguish the two operation instructions and do not indicate any timing or ordering relationship between them.
The two machine instructions may appear in either order: the addition or subtraction operation may precede the masking operation, or the masking operation may precede the addition or subtraction operation. Accordingly, the requirement that the two machine instructions be the first operation instruction and the second operation instruction respectively covers both cases. In one case the first operation instruction executes before the second operation instruction; in the other, the second operation instruction executes before the first operation instruction. In this way, all situations in which instruction fusion is needed are taken into account.
In practice, an instruction must write its result back to a register after execution. The execution unit may have only one write-back port for writing execution results back to registers, or it may have more than one. (In the embodiments of the present invention, "more", "multiple" and similar terms mean two or more unless otherwise specified.) When there are multiple write-back ports, the fusion condition above is sufficient to accurately determine whether instruction fusion should be performed.
In some cases, however, there is only one write-back port. To ensure that instruction fusion can still be performed successfully, the fusion condition optionally adds a further requirement. Beyond the two machine instructions being the first operation instruction and the second operation instruction respectively, the destination register of the first operation instruction must be the same as the destination register of the second operation instruction. The fused instruction then needs to write back only one register. This ensures the accuracy of the determination to a certain extent.
To make the determination more accurate still, the fusion condition may optionally add one more requirement. Beyond the two machine instructions being the first and second operation instructions respectively, with the same destination register, the destination register of the first operation instruction must also be the same as the register of the first source operand of the second operation instruction.
Step S206: If the fusion condition is satisfied, fuse the first operation instruction and the second operation instruction to obtain a fused instruction.
This step covers the following cases:
Case one: the fusion condition includes only one requirement: the two machine instructions must be, respectively, a first operation instruction indicating an addition or subtraction operation and a second operation instruction indicating a masking operation. Suppose the two consecutive machine instructions are the first operation instruction followed by the second, or the second operation instruction followed by the first. The fusion condition is then considered satisfied, and the two operation instructions are fused into a fused instruction. Otherwise, fusion cannot be performed.
Case two: the fusion condition includes two requirements. First, the two machine instructions must be, respectively, a first operation instruction indicating an addition or subtraction operation and a second operation instruction indicating a masking operation. Second, the destination register of the first operation instruction must be the same as the destination register of the second operation instruction. The two instructions may appear in either order (the first operation instruction before the second, or the second before the first). In either order, they are fused into a fused instruction only if their destination registers are the same. If any of these conditions is not satisfied, fusion cannot be performed.
Case three: the fusion condition includes three requirements:

- the two machine instructions are, respectively, a first operation instruction indicating an addition or subtraction operation and a second operation instruction indicating a masking operation;
- the destination register of the first operation instruction is the same as the destination register of the second operation instruction;
- the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction.

The two instructions may appear in either order. In either order, they are fused into a fused instruction only if both register conditions hold. If any of these conditions is not satisfied, fusion cannot be performed.
After the two machine instructions are fused into a fused instruction, processing continues on the fused instruction. So that subsequent processing units can clearly recognize it as a fused instruction, the fused instruction carries the following in addition to the instruction data:

- indication information indicating whether the current instruction is a fused instruction (for example, an indication bit, with 0 for a non-fused instruction and 1 for a fused instruction);
- fusion-type information indicating, for a fused instruction, whether the first operation instruction or the second operation instruction comes first (for example, 0 if the first operation instruction comes first and 1 if the second operation instruction comes first);
- immediate information.
Based on this information, subsequent processing units such as the execution unit and the retirement unit can clearly determine whether the instruction currently being processed is a fused instruction.
Step S208: Process the two machine instructions based on the fused instruction.
Once the two machine instructions have been fused into one fused instruction, the fused instruction can be processed as a single instruction through decoding, execution, retirement and so on.
Specifically, the instruction processing of the two machine instructions based on the fused instruction may be implemented as follows: decode the fused instruction to obtain a decoding result and the operands corresponding to the fused instruction; then perform the operation indicated by the fused instruction, including its mask operation, on the operands to obtain the execution result of the fused instruction.
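For an addw followed by an andi on RV64, executing the fused instruction amounts to computing both steps at once. Per the RISC-V specification, addw adds the low 32 bits and sign-extends the 32-bit sum to 64 bits, and andi ANDs with a sign-extended 12-bit immediate. The Python sketch below models only that add-then-mask order; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value; return an unsigned 64-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64


def execute_addw_m(rs1_val: int, rs2_val: int, imm12: int) -> int:
    """Hypothetical fused add-then-mask execution with RV64 addw/andi semantics."""
    # addw step: 32-bit sum, sign-extended to 64 bits.
    sum_w = sext(rs1_val + rs2_val, 32)
    # andi step: AND with the sign-extended 12-bit immediate.
    return sum_w & sext(imm12, 12)
```

For example, execute_addw_m(100, 50, 127) yields 150 & 127, which is 22: the same value the two separate instructions would leave in the destination register.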
Furthermore, after the fused instruction is executed successfully, that is, after its execution result is obtained, the first operation instruction and the second operation instruction corresponding to the fused instruction can be retired together, completing their life cycle and releasing the related resources.
As can be seen, with this embodiment, when the processor processes two machine instructions that are adjacent in program order and the two instructions satisfy the corresponding fusion condition (for example, one indicates an addition or subtraction operation and the other indicates a mask operation), the two instructions can be fused and the fused instruction is processed as a whole in subsequent stages. The rationale is that an addition or subtraction operation is often accompanied by a mask operation that extracts part of the resulting data. Fusing the arithmetic logic with the mask operation therefore provides hardware acceleration and improves processor performance without adding instructions. Because no new instruction is added, compatibility with the existing processor instruction set is preserved.
The following describes how the above process is executed in a processor, taking the microprocessor architecture shown in fig. 2 as an example.
First, note that the operations described above, namely determining whether the two machine instructions satisfy the fusion condition and, if so, fusing the first operation instruction and the second operation instruction into a fused instruction, can be performed either by the instruction fetch unit or by the decoding unit of the microprocessor. In this example they are performed by the instruction fetch unit. However, those skilled in the art should understand that, because instruction fusion takes place after instruction fetch and before decoding, it can equally be implemented by the decoding unit.
Further, the operation instructions in this example are taken from the RISC-V instruction set. The relevant first operation instructions, which indicate addition and subtraction operations, and second operation instructions, which indicate mask operations, are shown in the following tables:
The specific meaning of each instruction follows the RISC-V specification and is not described here.
It should also be noted that, for convenience of description in this example, the operations of determining whether two machine instructions satisfy the fusion condition and, if so, fusing the first operation instruction and the second operation instruction into a fused instruction are packaged into one unit, referred to as a fusion detection unit.
On this basis, when the pipelined microprocessor shown in fig. 2 processes instructions, the instruction fetch unit 124 first fetches instructions by accessing the bus, the cache, and other storage, and packages at least one operation instruction. The fusion detection unit in the instruction fetch unit 124 then pre-decodes two operation instructions that are adjacent in sequence and checks whether they satisfy the following conditions: (1) the first operation instruction precedes the second operation instruction, or the second operation instruction precedes the first operation instruction; (2) the destination register of the first operation instruction is the same as the destination register of the second operation instruction; (3) the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction. If all three conditions are satisfied, the fusion detection unit fuses the two operation instructions into one fused instruction and sends it to the decoding unit 125, together with additional information indicating that it is a fused instruction, its fusion type, and its immediate.
Next, the decoding unit 125 decodes the fused instruction and prepares the corresponding operands as the fused instruction requires. After receiving the fused instruction, the execution unit 121 performs the mask operation on the operand or the operation result as the fused instruction requires, obtaining the correct execution result of the fused instruction, and writes the result back to the destination register. After the execution unit 121 completes all operations corresponding to the fused instruction, it forwards the fused instruction to the retirement unit.
After receiving the fused instruction, the retirement unit identifies it as a fused instruction from the information it carries and retires the two corresponding operation instructions together.
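The joint retirement can be pictured with a simplified in-order retirement queue. In the Python sketch below, RobEntry and retire_ready are hypothetical names, and the queue is a simplification rather than the retirement unit of fig. 2. An entry whose fused flag is set accounts for two architectural instructions and is retired in one step.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class RobEntry:
    """Hypothetical retirement-queue entry; `fused` mirrors the carried indication bit."""
    name: str
    fused: bool = False
    done: bool = False


def retire_ready(rob: Deque[RobEntry], retired: List[str]) -> int:
    """Retire completed entries in order; return how many machine instructions retired."""
    count = 0
    while rob and rob[0].done:
        entry = rob.popleft()
        # A fused entry stands for two original machine instructions,
        # which are retired together.
        count += 2 if entry.fused else 1
        retired.append(entry.name)
    return count
```

Retirement stops at the first entry that has not completed, so in-order retirement is preserved while the fused pair occupies a single slot.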
For example, assume that the two instructions fetched by the instruction fetch unit 124 are instruction 1 "addw s1, a5" and instruction 2 "andi s1, 127". Under the RISC-V specification, "addw" is an addition operation and "andi" is a mask operation. The instruction format of "addw" is addw rd, rs1, rs2, where rd denotes the destination register, rs1 the register of the first source operand, and rs2 the register of the second source operand. The instruction format of "andi" is andi rd, rs1, imm, where rd denotes the destination register, rs1 the register of the source operand, and imm the immediate.
Here, instruction 1 is an addition instruction (one of the addition and subtraction operation instructions), instruction 2 is a mask operation instruction, and instruction 1 precedes instruction 2, so fusion condition (1), that the first operation instruction precedes the second operation instruction or the second operation instruction precedes the first operation instruction, is satisfied. The destination register of instruction 1 is "s1" and the destination register of instruction 2 is also "s1", so fusion condition (2), that the destination register of the first operation instruction is the same as the destination register of the second operation instruction, is satisfied. Finally, the destination register of instruction 1 is "s1" and the register of the source operand of instruction 2 is also "s1", so fusion condition (3), that the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction, is satisfied.
Instruction 1 and instruction 2 can therefore be fused into the fused instruction "addw.m rd, rs1, rs2, imm", where the ".m" suffix indicates that the instruction includes a mask operation. The fused instruction is then decoded, executed, and retired as a single instruction.
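The field mapping of the fused instruction (rd, rs1, and rs2 from the addition instruction, imm from the mask instruction) can be sketched as follows. Instruction 1 is written above with only two registers, so this Python sketch assumes a hypothetical three-operand form "addw s1, s1, a5"; the Instr record and fuse_add_mask function are likewise illustrative and handle only the add-first order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical pre-decoded view of one machine instruction."""
    op: str
    rd: str
    rs1: str
    rs2: Optional[str] = None
    imm: Optional[int] = None


def fuse_add_mask(add: Instr, mask: Instr) -> str:
    """Build the 'op.m rd, rs1, rs2, imm' text for an add/sub followed by an andi."""
    # Conditions (2) and (3): same destination, and the mask reads that destination.
    if not (add.rd == mask.rd == mask.rs1):
        raise ValueError("fusion conditions are not satisfied")
    return f"{add.op}.m {add.rd}, {add.rs1}, {add.rs2}, {mask.imm}"
```

With the assumed operands, fuse_add_mask(Instr("addw", "s1", "s1", "a5"), Instr("andi", "s1", "s1", imm=127)) produces "addw.m s1, s1, a5, 127".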
Instruction fusion therefore effectively improves the execution efficiency and performance of the processor without adding any new instruction.
In addition, because the mask operation can use more bits after instruction fusion, the range of supported mask operations is wider. For example, the mask immediate is generated by the andi instruction, so any mask within the andi immediate range can be supported, rather than only the limited set of masks specified conventionally. Furthermore, the mask operation can be applied to both the source operands and the destination operand, which is also more widely applicable than conventional approaches.
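To make "the andi immediate range" concrete: per the RISC-V specification, the andi immediate is a signed 12-bit value (-2048 to 2047) that is sign-extended to XLEN bits before the AND. The Python sketch below (andi_mask is a hypothetical helper, with RV64 assumed) lists the masks this range yields, covering both low-bit extraction masks and high-bit masks that clear low-order bits.

```python
XLEN = 64
MASK_XLEN = (1 << XLEN) - 1


def andi_mask(imm: int) -> int:
    """Return the XLEN-bit mask that an andi immediate produces after sign extension."""
    if not -2048 <= imm <= 2047:
        raise ValueError("andi immediate must fit in 12 signed bits")
    # Python ints behave as infinite two's complement, so masking a negative
    # value yields its sign-extended XLEN-bit pattern.
    return imm & MASK_XLEN


# Low-bit extraction masks 0x1, 0x3, ..., 0x7FF (positive immediates).
extract_masks = [andi_mask((1 << k) - 1) for k in range(1, 12)]
# Masks that clear the low k bits (negative immediates), e.g. for alignment.
clear_masks = [andi_mask(-(1 << k)) for k in range(1, 12)]
```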
The above embodiments are intended only to illustrate the embodiments of the present invention, not to limit them. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention; therefore, all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention shall be defined by the claims.