CN115576608A - Processor core, processor, chip, control equipment and instruction fusion method - Google Patents


Publication number
CN115576608A
Authority
CN
China
Prior art keywords
instruction
operation instruction
fusion
machine instructions
destination register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211197891.5A
Other languages
Chinese (zh)
Inventor
刘东启
赵朝君
刘畅
魏定彦
徐文健
江滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou C Sky Microsystems Co Ltd
Original Assignee
Pingtouge Shanghai Semiconductor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingtouge Shanghai Semiconductor Co Ltd filed Critical Pingtouge Shanghai Semiconductor Co Ltd
Priority to CN202211197891.5A priority Critical patent/CN115576608A/en
Publication of CN115576608A publication Critical patent/CN115576608A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a processor core, a processor, a chip, a control device, and an instruction fusion method. The processor core includes: an instruction fetch unit for acquiring two machine instructions that are adjacent in time sequence; a fusion detection unit for determining whether the two machine instructions satisfy a fusion condition and, if so, fusing the first operation instruction and the second operation instruction to obtain a fused instruction; a decoding unit for decoding the fused instruction to obtain a decoding result and an operand corresponding to the fused instruction; and an execution unit for performing a mask operation on the operand as indicated by the fused instruction to obtain the execution result of the fused instruction. The invention is applicable to various chips based on a CISC instruction set, a RISC instruction set (in particular the RISC-V instruction set), or a VLIW instruction set architecture, such as Internet of Things chips and audio/video chips.

Description

Processor core, processor, chip, control equipment and instruction fusion method
Technical Field
The embodiment of the invention relates to the technical field of processors, in particular to a processor core, a processor, a chip, control equipment and an instruction fusion method.
Background
Reduced Instruction Set Computer (RISC) instruction systems are relatively simple: the hardware only needs to execute a limited set of commonly used instructions, and complex operations are implemented with the help of compilation techniques. A reduced instruction set consists of simple instructions and can cover most usage scenarios of a pipelined processor.
As the performance requirements of modern processors rise, usage scenarios that a reduced instruction set cannot cover need hardware acceleration. At present this is mostly achieved by adding corresponding instructions, which requires additional instruction encodings and results in poor compatibility with the existing processor instruction system.
Disclosure of Invention
Embodiments of the present invention provide an instruction fusion scheme to solve at least some of the above problems.
According to a first aspect of embodiments of the present invention, there is provided a processor core, including: the device comprises an instruction fetching unit, a fusion detection unit, a decoding unit, an execution unit and a retirement unit; wherein: the instruction fetching unit is used for acquiring two machine instructions adjacent in time sequence; the fusion detection unit is configured to determine whether the two machine instructions satisfy a fusion condition, where the fusion condition at least includes: the two machine instructions are respectively a first operation instruction for indicating addition and subtraction operation and a second operation instruction for indicating mask operation; if the fusion condition is met, fusing the first operation instruction and the second operation instruction to obtain a fusion instruction; the decoding unit is used for carrying out decoding operation on the fused instruction to obtain a decoding result and an operand corresponding to the fused instruction; and the execution unit is used for performing mask operation on the operand according to the operation instruction of the fusion instruction to obtain the execution result of the fusion instruction.
According to a second aspect of embodiments of the present invention, there is provided a processor, including: the processor core of the first aspect.
According to a third aspect of an embodiment of the present invention, there is provided a chip, including: the processor core of the first aspect; or comprising a processor as described in the second aspect.
According to a fourth aspect of an embodiment of the present invention, there is provided a control apparatus characterized by comprising: the processor core of the first aspect; or, comprising a processor as described in the second aspect; alternatively, a chip as described in the third aspect is included.
According to a fifth aspect of the embodiments of the present invention, there is provided an instruction fusing method, including: acquiring two machine instructions adjacent in time sequence; judging whether the two machine instructions meet a fusion condition, wherein the fusion condition at least comprises the following steps: the two machine instructions are respectively a first operation instruction for indicating addition and subtraction operation and a second operation instruction for indicating mask operation; if the fusion condition is met, fusing the first operation instruction and the second operation instruction to obtain a fusion instruction; and performing instruction processing on the two machine instructions based on the fusion instruction.
According to the scheme of the embodiments of the present invention, when the processor core processes two machine instructions that are adjacent in time sequence, the two instructions can be fused if they satisfy the corresponding fusion condition, for example if one instruction indicates an addition or subtraction operation and the other indicates a mask operation. The fused instruction is then processed as a single unit. This is because the arithmetic logic of an addition or subtraction operation is usually accompanied by a mask operation that extracts data during the operation. On this basis, the arithmetic logic and the mask operation can be fused, so that hardware acceleration is achieved and the performance of the processor core is improved without adding instructions. Because no additional instruction is added, the problem of poor compatibility with the existing processor instruction system is also avoided.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments of the present invention, and a person skilled in the art can obtain other drawings based on them.
FIG. 1 is a schematic block diagram of a processor core in accordance with one embodiment;
FIG. 2 is a schematic block diagram of a pipelined processor, according to one embodiment.
FIG. 3 is a flowchart illustrating steps of an instruction fusing method according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be described in detail below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of the protection of the embodiments of the present invention.
The following terms are used herein:
Processor core: also called a core; it is the core component of a processor. A processor may have multiple (two or more) cores, but a core belongs to only one processor. The processor core serves as the compute engine of the processor: all computation, receiving and storing of commands, and data processing are performed in the processor core.
Pipelined processor: a processor with a multi-stage pipeline, each stage performing a different task on program instructions. In a standard pipelined processor there are typically five stages: instruction fetch, instruction decode, operand fetch, execute, and result write-back.
Instruction fusion: a technique that replaces multiple instructions with one or more instructions that are more efficient, in order to improve instruction execution performance.
Mask operation: in the embodiments of the present invention, the mask operation, also called a bit mask operation, uses one string of binary digits (the mask) to operate on another string of binary digits (e.g., a machine instruction), either to mask out part of the binary digits in that string or to extract part of them.
Immediate: an immediate in assembly language is equivalent to a constant in a high-level language; it is a number that appears directly in the instruction and does not need to be read from a register or memory.
Operand: a field of an assembly language instruction that indicates the source of the data required by the operation the instruction performs. The operand field may hold the operand itself, the address of the operand, or the method of computing that address. Operands whose contents do not change when the instruction executes are called source operands, and operands whose contents change when the instruction executes are called target (destination) operands.
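As an illustration only, the two uses of a bit mask named in the definition above (masking out bits and extracting bits) can be sketched in Python. The 8-bit value and the mask are hypothetical and are not taken from the embodiments.

```python
# Illustrative 8-bit value and mask (both hypothetical).
value = 0b1011_0110
LOW_NIBBLE_MASK = 0b0000_1111

# Extract the low four bits: AND with a mask that has 1s where bits are kept.
extracted = value & LOW_NIBBLE_MASK

# Mask out (clear) the low four bits: AND with the inverted mask, limited to 8 bits.
cleared = value & (~LOW_NIBBLE_MASK & 0xFF)

print(f"{extracted:#010b}")  # 0b00000110
print(f"{cleared:#010b}")    # 0b10110000
```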
In the following, the embodiments of the present invention will be described with reference to a plurality of examples based on the above terms used in the present invention.
FIG. 1 shows a schematic block diagram of a processor core according to an embodiment of the invention. The processor core comprises at least: an instruction fetch unit, a fusion detection unit, a decoding unit, and an execution unit.
The instruction fetch unit is used for acquiring two machine instructions that are adjacent in time sequence. Because instructions that can be fused are usually consecutive in time sequence, the processor core performs fusion determination and processing on two machine instructions that are adjacent in time sequence.
The fusion detection unit is used for judging whether the two machine instructions meet a fusion condition, wherein the fusion condition at least comprises the following steps: the two machine instructions are respectively a first operation instruction for indicating addition and subtraction operation and a second operation instruction for indicating mask operation; and if the fusion condition is met, fusing the first operation instruction and the second operation instruction to obtain a fusion instruction.
To ensure that instruction fusion proceeds smoothly, the embodiments of the present invention set corresponding fusion conditions. When two machine instructions adjacent in time sequence satisfy the fusion conditions, they are considered fusible; otherwise, they are considered not fusible. The fusion conditions at least include: the two machine instructions are respectively a first operation instruction for indicating an addition or subtraction operation and a second operation instruction for indicating a mask operation. The addition and subtraction operation covers both addition and subtraction, so the first operation instruction may be an addition instruction or a subtraction instruction. Most underlying machine instructions use the arithmetic logic of addition and subtraction, and that arithmetic logic is usually accompanied by a mask operation, which is why the fusion conditions are set this way. It should be noted that, in the embodiments of the present invention, "first" and "second" are used only to distinguish different operation instructions and do not indicate a timing or sequence relationship between them.
The two machine instructions may occur with the addition or subtraction operation first and the mask operation second, or with the mask operation first and the addition or subtraction operation second. Accordingly, the condition that the two machine instructions are respectively a first operation instruction for indicating an addition or subtraction operation and a second operation instruction for indicating a mask operation covers two cases: the first operation instruction is executed before the second operation instruction, or the second operation instruction is executed before the first operation instruction. In this way, the situations in which instruction fusion is needed are considered comprehensively.
In practical applications, an instruction needs to write its result back to a register after execution. The execution unit may have only one write-back port for writing execution results back to registers, or it may have multiple write-back ports (in the embodiments of the present invention, unless otherwise specified, terms such as "multiple" and "a plurality of" mean two or more). When there are multiple write-back ports, the above fusion condition is sufficient to accurately determine whether instruction fusion should be performed.
In some cases, however, there may be only one write-back port. To ensure that instruction fusion can still proceed smoothly in this case, the fusion condition optionally further includes: given that the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively, the destination register of the first operation instruction is the same as the destination register of the second operation instruction. Accordingly, the fusion detection unit fuses the first operation instruction and the second operation instruction to obtain a fused instruction if the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively and the destination register of the first operation instruction is the same as the destination register of the second operation instruction. This ensures the accuracy of the determination to a certain extent.
To make the determination more accurate still, the fusion condition may further optionally include: given that the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively, and that the destination register of the first operation instruction is the same as the destination register of the second operation instruction, the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction. Accordingly, the fusion detection unit fuses the first operation instruction and the second operation instruction to obtain a fused instruction if the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively, the destination register of the first operation instruction is the same as the destination register of the second operation instruction, and the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction.
As can be seen from the above, when two machine instructions adjacent in time sequence satisfy the above fusion condition, they can be subjected to instruction fusion to obtain a fused instruction.
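The fusion conditions described above can be sketched as a small Python check. This is a minimal illustration, not the claimed circuit: the `Instr` record, the `level` parameter, and the sets of mnemonics treated as addition/subtraction or mask operations are all assumptions introduced here. Level 1 is the basic condition, level 2 additionally requires identical destination registers, and level 3 additionally requires that the destination register of the first operation instruction equal the first source register of the second operation instruction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed mnemonics; the source does not enumerate the opcodes.
ADD_SUB_OPS = {"add", "addi", "sub"}   # "first operation instruction"
MASK_OPS = {"and", "andi"}             # "second operation instruction"


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded view of one machine instruction."""
    op: str
    rd: int
    rs1: int
    rs2: Optional[int] = None
    imm: Optional[int] = None


def classify(a: Instr, b: Instr) -> Optional[Tuple[Instr, Instr]]:
    """Return (first_op, second_op) if the pair is one add/sub and one mask
    instruction in either program order, otherwise None."""
    if a.op in ADD_SUB_OPS and b.op in MASK_OPS:
        return a, b
    if a.op in MASK_OPS and b.op in ADD_SUB_OPS:
        return b, a
    return None


def can_fuse(a: Instr, b: Instr, level: int = 1) -> bool:
    """Check the cumulative fusion conditions for two adjacent instructions."""
    pair = classify(a, b)
    if pair is None:
        return False
    first_op, second_op = pair
    if level >= 2 and first_op.rd != second_op.rd:
        return False          # destination registers must match
    if level >= 3 and first_op.rd != second_op.rs1:
        return False          # add/sub result must be the mask's first source
    return True
```

For example, under these assumptions `can_fuse(Instr("add", 5, 6, rs2=7), Instr("andi", 5, 5, imm=0xFF), level=3)` is true, while two consecutive `add` instructions never fuse.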
After the two machine instructions are fused to generate a fused instruction, processing must continue. So that subsequent processing units clearly know that the current instruction is a fused instruction, the generated fused instruction carries, in addition to the instruction data: indication information indicating whether the current instruction is a fused instruction (this may be an indication bit, e.g., 0 for a non-fused instruction and 1 for a fused instruction); information indicating the fusion type when the current instruction is a fused instruction (i.e., whether the first operation instruction or the second operation instruction comes first, e.g., 0 when the first operation instruction comes first and 1 when the second operation instruction comes first); and immediate information. Based on the fused instruction, subsequent processing units such as the execution unit and the retirement unit can clearly know whether the instruction currently being processed is a fused instruction.
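The side information carried by a fused instruction can be pictured as a small record. The sketch below is only one possible representation: the `FusedInstr` class, its field names, and the `make_fused` helper are hypothetical, and the 0/1 encodings follow the examples given in the paragraph above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FusedInstr:
    """Hypothetical container for a fused instruction and its side information."""
    payload: Tuple[str, str]  # instruction data of the two original instructions
    is_fused: int             # 0 = non-fused instruction, 1 = fused instruction
    fusion_type: int          # 0 = add/sub instruction first, 1 = mask instruction first
    imm: int                  # immediate information


def make_fused(older: str, younger: str, older_is_addsub: bool, imm: int) -> FusedInstr:
    """Build the fused record for two adjacent instructions given in program order."""
    return FusedInstr(
        payload=(older, younger),
        is_fused=1,
        fusion_type=0 if older_is_addsub else 1,
        imm=imm,
    )


fused = make_fused("add x5, x6, x7", "andi x5, x5, 255", older_is_addsub=True, imm=255)
print(fused.is_fused, fused.fusion_type, fused.imm)
```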
On the basis, the decoding unit is used for decoding the fused instruction to obtain a decoding result and an operand corresponding to the fused instruction. And the execution unit is used for performing mask operation on the operand according to the operation instruction of the fusion instruction to obtain an execution result of the fusion instruction.
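The source does not spell out the datapath of the execution unit, so the following Python sketch shows only one plausible reading: for fusion type 0 the addition or subtraction result is masked, and for fusion type 1 the masked operand feeds the addition or subtraction. The 32-bit register width, the function name `execute_fused`, and its parameters are assumptions made for illustration.

```python
XLEN_MASK = 0xFFFF_FFFF  # assumed 32-bit registers


def execute_fused(fusion_type: int, arith: str, src_a: int, src_b: int, mask: int) -> int:
    """Compute the result of a fused add/sub + mask pair in a single step.

    fusion_type 0: add/sub first, then mask the result.
    fusion_type 1: mask src_a first, then add/sub.
    """
    def addsub(x: int, y: int) -> int:
        # Wrap to the assumed register width, as hardware arithmetic would.
        return (x + y if arith == "add" else x - y) & XLEN_MASK

    if fusion_type == 0:
        return addsub(src_a, src_b) & mask
    return addsub(src_a & mask, src_b)
```

With these assumptions, `execute_fused(0, "add", 200, 100, 0xFF)` yields 44, the low byte of 300, which is the same value the two separate instructions would leave in the destination register.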
In the scheme of the embodiment of the invention, the fusion detection unit can detect and fuse the first operation instruction and the second operation instruction, and correspondingly, the decoding unit can decode the fusion instruction to obtain the fusion operation code and the operand thereof, so that the hardware acceleration of the pipeline processor is realized while the instruction system of the processor is compatible.
In addition, the processor core may further include a retirement unit, configured to perform retirement processing on the first operation instruction and the second operation instruction corresponding to the fused instruction after the execution result of the fused instruction is obtained, for example by writing the execution result generated by the execution unit back to a corresponding storage location (e.g., a register inside the pipelined processor 100) so that a subsequent instruction can quickly obtain the execution result from that storage location.
In one practical implementation, the fusion detection unit may be disposed in the instruction fetch unit; in another, the fusion detection unit may be disposed in the decoding unit. That is, the fusion detection unit may be incorporated in either the instruction fetch unit or the decoding unit. In specific applications, those skilled in the art can choose between these arrangements according to actual needs. In this embodiment, the fusion detection unit is disposed in the instruction fetch unit as an example.
In the scheme of the embodiment of the invention, the fusion detection unit can detect and fuse the first operation instruction and the second operation instruction, and correspondingly, the decoding unit can decode the fusion instruction to obtain the fusion operation code and the operand thereof, so that the hardware acceleration of the processor core is realized while the processor instruction system is compatible.
In addition, the processor core may further include conventional devices such as a cache, a memory management unit, and a register file, which will be further described below and will not be described in detail herein.
Through the embodiment, when the processor core processes two machine instructions adjacent to each other in time sequence, if the two machine instructions can meet the corresponding instruction fusion condition, for example, if the two machine instructions are an instruction for indicating addition and subtraction operation and an instruction for indicating mask operation, the two machine instructions can be fused, and the fused instruction as a whole is used for subsequent processing. This is because the arithmetic logic of the add-subtract operation is usually accompanied by a masking operation to perform data extraction during the operation. Based on the method, the arithmetic logic and the mask operation can be fused, so that the hardware acceleration is realized and the performance of the processor core is improved under the condition of not increasing instructions. And because no additional instruction is added, the problem of poor compatibility with the existing processor instruction system is avoided.
On the basis of the processor core shown in fig. 1, an embodiment of the present invention further provides a pipeline processor, as shown in fig. 2. It should be noted that other types of processors that can be adapted to the above-mentioned processor core are also within the scope of the embodiments of the present invention.
FIG. 2 is a block diagram of an exemplary pipelined processor to which the instruction fusion scheme of embodiments of the present invention may be applied.
As can be seen, the pipelined processor 100 may include one or more processor cores 120 for processing instructions, and each processor core 120 may be used to process a particular instruction set. In some embodiments, the instruction set may support Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC), or Very Long Instruction Word (VLIW) based computing. Different processor cores 120 may process different instruction sets or the same instruction set. In an embodiment of the present invention, at least one of the processor cores 120 is configured to support RISC, illustratively RISC-V. As an example, processor cores 1 to m are shown in FIG. 2, where m is a natural number greater than 1.
In some embodiments, the cache memory 18 shown in FIG. 2 may be fully or partially integrated in the pipelined processor 100. Depending on the architecture, the cache memory 18 may be a single-level or multi-level internal cache located within and/or outside the respective processor cores 120 (e.g., the three levels of cache memory L1 to L3 shown in FIG. 2, collectively labeled 18), and may include an instruction cache for instructions and a data cache for data. In some embodiments, components of the pipelined processor 100 may share at least a portion of the cache memory; for example, as shown in FIG. 2, processor cores 1 to m share the third-level cache memory L3.
In some embodiments, as shown in FIG. 2, pipelined processor 100 may include a Register File 126 (Register File), where Register File 126 may include a plurality of registers for storing different types of data and/or instructions, and where the registers may be of different types. For example, register file 126 may include: integer registers, floating point registers, status registers, instruction registers, pointer registers, and the like. The registers in the register file 126 may be implemented by general purpose registers, or may be designed specifically according to the actual requirements of the pipelined processor 100.
In some embodiments, the pipelined processor 100 may further include a Memory Management Unit (MMU) 122 for translating virtual addresses to physical addresses. One or more memory management units 122 may be disposed in each processor core 120, and the memory management units 122 in different processor cores 120 may also be synchronized with the memory management units 122 located in other processors or processor cores, so that each processor or processor core can share a unified virtual storage system.
The pipelined processor 100 described above is used to execute instruction sequences; specifically, an instruction sequence may be executed by a processor core 120. Executing each instruction includes fetching the instruction from the memory that stores it, decoding the fetched instruction, executing the decoded instruction, and saving the execution result. These steps repeat until all instructions in the instruction sequence have been executed or a halt instruction is encountered.
Thus, to execute a sequence of instructions, the pipelined processor 100 may also include an instruction fetch unit 124, a decoding unit 125, an execution unit 121, a retirement unit (not shown), and so on. In this embodiment, these units are included in the processor core 120 of the pipelined processor 100, and the processor core is the processor core shown in FIG. 1.
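The fetch, decode, execute, and save-result cycle described above can be illustrated with a toy interpreter in Python. The tuple encoding `(op, rd, rs1, rs2)`, the three supported operations, and the `halt` mnemonic are hypothetical simplifications, not the instruction format of the embodiments.

```python
from typing import List, Tuple

ToyInstr = Tuple[str, int, int, int]  # (op, rd, rs1, rs2), a hypothetical encoding


def run(program: List[ToyInstr], regs: List[int]) -> List[int]:
    """Execute instructions in order until the program ends or 'halt' is seen."""
    pc = 0
    while pc < len(program):
        op, rd, rs1, rs2 = program[pc]          # fetch and decode
        if op == "halt":
            break                               # stop on a halt instruction
        if op == "add":                         # execute
            result = regs[rs1] + regs[rs2]
        elif op == "sub":
            result = regs[rs1] - regs[rs2]
        elif op == "and":
            result = regs[rs1] & regs[rs2]
        else:
            raise ValueError(f"unknown op: {op}")
        regs[rd] = result                       # save the execution result
        pc += 1                                 # move to the next instruction
    return regs


program = [("add", 2, 0, 1), ("and", 2, 2, 3), ("halt", 0, 0, 0), ("add", 2, 2, 2)]
print(run(program, [200, 100, 0, 0xFF])[2])  # 44
```

The `add` followed by `and` on the same destination register is exactly the kind of adjacent pair that the fusion scheme targets; the instruction after `halt` is never executed.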
The instruction fetch unit 124 acts as the startup engine of the pipelined processor 100. It moves instructions from the memory 14 into an instruction register (which may be one of the registers in the register file 126 shown in FIG. 2 that is used to store instructions), and it receives or computes the next fetch address according to a fetch algorithm, for example by incrementing or decrementing the address according to the instruction length. When the processor core shown in FIG. 1 is used, the instruction fetch unit 124 is further provided with a fusion detection unit, which determines whether two machine instructions adjacent in time sequence fetched by the instruction fetch unit 124 satisfy a fusion condition and, when the fusion condition is satisfied, performs instruction fusion to obtain a fused instruction. For the specific implementation of this process, reference may be made to the description of the foregoing processor core embodiment; details are not repeated here.
Then, in the instruction decode stage, the decoding unit 125 decodes the fused instruction to obtain the operand information it requires, in preparation for the operation of the execution unit 121. The operand information may be, for example, an immediate, a register, or other information capable of providing a source operand.
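As a hedged illustration of computing the next fetch address from the instruction length, the Python sketch below uses the standard RISC-V length encoding, in which the two lowest bits of an instruction being `11` mark a 32-bit instruction and any other value marks a 16-bit compressed instruction. It handles only the incrementing case and ignores longer encodings and branches; the function names are hypothetical.

```python
def instruction_length(first_halfword: int) -> int:
    """Byte length of a RISC-V instruction, considering only 16- and 32-bit encodings."""
    return 4 if (first_halfword & 0b11) == 0b11 else 2


def next_fetch_address(pc: int, first_halfword: int) -> int:
    """Sequential next fetch address: current address plus the instruction length."""
    return pc + instruction_length(first_halfword)


print(hex(next_fetch_address(0x1000, 0x0013)))  # low halfword of a 32-bit nop
print(hex(next_fetch_address(0x1000, 0x0001)))  # 16-bit compressed nop
```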
After the instruction is fetched, decoded and dispatched to the corresponding execution unit 121, the execution unit 121 starts to execute the fused instruction, i.e. execute the operation indicated by the fused instruction, and implement the corresponding function.
The retirement unit (or instruction write-back unit) is mainly responsible for writing back the execution results generated by the execution units 121 to corresponding storage locations (e.g., registers inside the pipeline processor 100) so that subsequent instructions can quickly obtain the corresponding execution results from the storage locations. And then, performing instruction recovery and retirement to finish the processing process of the corresponding instruction. In this embodiment, retirement operations are performed on the first operation instruction and the second operation instruction corresponding to the fusion instruction at the same time.
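Retiring both original instructions together with a single write-back can be sketched as follows. The dictionary layout of an entry, the `retire` function, and the retirement log are hypothetical; the sketch only shows that one result is written to the register file while both source instructions are marked as completed at the same time.

```python
from typing import Dict, List


def retire(entry: Dict, regfile: List[int], retired_log: List[str]) -> None:
    """Write back one result and retire every original instruction behind the entry."""
    regfile[entry["rd"]] = entry["result"]   # single write-back of the fused result
    retired_log.extend(entry["origins"])     # a fused entry lists two original instructions


regfile = [0] * 8
retired_log: List[str] = []
fused_entry = {
    "rd": 5,
    "result": 0x2C,
    "origins": ["add x5, x6, x7", "andi x5, x5, 255"],
}
retire(fused_entry, regfile, retired_log)
print(regfile[5], len(retired_log))
```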
By the processor of the embodiment, hardware acceleration can be realized and the performance of the processor is improved under the condition that instructions are not increased. And because no additional instruction is added, the problem of poor compatibility with the existing processor instruction system is avoided.
On the basis of the processor, the embodiment of the invention further provides a chip, which at least comprises the processor core or the processor. In practical applications, the chip may further be provided with hardware, a controller, and the like for implementing various functions according to different actual requirements, but it is within the protection scope of the present invention as long as the chip includes the processor core or the processor.
Further, an embodiment of the present invention further provides a control device, which at least includes the processor core or the processor or the chip as described above. In practical applications, the control device may be implemented as any suitable device, such as a mobile control device, an industrial control device, a desktop control device, and so on.
In addition, an embodiment of the present invention further provides an instruction fusion method. FIG. 3 shows a flowchart of the steps of the instruction fusion method according to an embodiment of the present invention.
The instruction fusion method of the embodiment comprises the following steps:
Step S202: obtain two machine instructions that are adjacent in time sequence.
Whatever the application or program, the code implementing its functions or operations is ultimately compiled into computer-readable machine language, i.e., machine instructions, and sent to a processor for execution. The execution order of the instructions is determined by the compilation result; the processor obtains each machine instruction and processes it to carry out the operation and function that the instruction indicates. For example, a machine instruction may be fetched by the instruction fetch unit 124 shown in FIG. 1.
Since instructions that can be fused are usually consecutive in time sequence, in this step the processor performs the fusion determination and processing on two machine instructions that are adjacent in time sequence.
Step S204: determine whether the two machine instructions satisfy the fusion condition.
To ensure that instruction fusion proceeds smoothly, the embodiment of the invention sets corresponding fusion conditions. When two machine instructions adjacent in time sequence satisfy the fusion conditions, they are considered fusible; otherwise, they are considered not fusible.
In this embodiment, the fusion condition at least includes: the two machine instructions are respectively a first operation instruction for indicating an addition and subtraction operation and a second operation instruction for indicating a mask operation. The addition and subtraction operation covers both addition and subtraction; correspondingly, the first operation instruction may be an addition operation instruction or a subtraction operation instruction. This condition is set because most underlying machine instructions use the arithmetic logic of addition and subtraction operations, and these operations are usually accompanied by mask operations. It should be noted that, in the embodiment of the present invention, "first" and "second" are only used to distinguish different operation instructions and do not indicate a timing or sequence relationship between the two.
For the two machine instructions, the addition and subtraction operation may precede the mask operation, or the mask operation may precede the addition and subtraction operation. Accordingly, the condition that the two machine instructions are respectively a first operation instruction for indicating an addition and subtraction operation and a second operation instruction for indicating a mask operation covers two cases: the first operation instruction executes before the second operation instruction; or the second operation instruction executes before the first operation instruction. In this way, the situations in which instruction fusion is needed are fully covered.
In practical applications, an instruction needs to write its result back to a register after execution. The execution unit may have only one write-back port for writing execution results back to registers, or it may have multiple write-back ports (in the embodiment of the present invention, unless otherwise specified, terms relating to "multiple", such as "more" and "multiple", mean two or more). When there are multiple write-back ports, the fusion condition above is sufficient to determine accurately whether instruction fusion should be performed.
However, in some cases there may be only one write-back port. In this case, to ensure that instruction fusion can be performed successfully, the fusion condition optionally further includes: on the basis of determining that the two machine instructions are respectively the first operation instruction and the second operation instruction, the destination register of the first operation instruction is the same as the destination register of the second operation instruction. This ensures the accuracy of the determination to a certain extent.
To make the determination more accurate still, the fusion condition may further optionally include: on the basis of determining that the two machine instructions are respectively the first operation instruction and the second operation instruction, and that the destination register of the first operation instruction is the same as the destination register of the second operation instruction, the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction.
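The three levels of fusion condition described above can be sketched as follows. This is only an illustrative software model, not the hardware logic of the embodiment: the `Instr` record, the `ADD_SUB_OPS` and `MASK_OPS` opcode sets, and the `level` parameter are assumptions introduced here, and the opcode sets are not taken from the table of the embodiment.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical opcode groups (assumed for illustration only).
ADD_SUB_OPS = {"add", "addw", "sub", "subw"}
MASK_OPS = {"andi"}


@dataclass(frozen=True)
class Instr:
    op: str                     # mnemonic
    rd: str                     # destination register
    rs1: str                    # register of the first source operand
    rs2: Optional[str] = None   # register of the second source operand, if any
    imm: Optional[int] = None   # immediate, if any


def split_pair(a, b):
    """Return (first_op, second_op) if the adjacent pair consists of one
    add/sub instruction and one mask instruction, in either order."""
    if a.op in ADD_SUB_OPS and b.op in MASK_OPS:
        return a, b
    if a.op in MASK_OPS and b.op in ADD_SUB_OPS:
        return b, a
    return None


def meets_fusion_condition(a, b, level=3):
    """level 1: instruction types only; level 2: also same destination
    register; level 3: also rd of the first operation instruction equals
    rs1 of the second operation instruction."""
    pair = split_pair(a, b)
    if pair is None:
        return False
    first, second = pair
    if level >= 2 and first.rd != second.rd:
        return False
    if level >= 3 and first.rd != second.rs1:
        return False
    return True
```

The `level` argument simply selects how many of the optional conditions are enforced; a design with several write-back ports might stop at level 1, while a single-port design would use level 2 or 3.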
Step S206: if the fusion condition is satisfied, fuse the first operation instruction and the second operation instruction to obtain a fused instruction.
This step may include the following cases:
Case one: the fusion condition only includes that the two machine instructions are respectively a first operation instruction for indicating an addition and subtraction operation and a second operation instruction for indicating a mask operation. If, for two machine instructions adjacent in time sequence, the first operation instruction comes first and the second operation instruction follows, or the second operation instruction comes first and the first operation instruction follows, the fusion condition is considered satisfied and the two operation instructions may be fused to obtain a fused instruction. If this condition is not satisfied, fusion cannot be performed.
Case two: the fusion condition includes that the two machine instructions are respectively a first operation instruction for indicating an addition and subtraction operation and a second operation instruction for indicating a mask operation, and that the destination register of the first operation instruction is the same as the destination register of the second operation instruction. In this case, in addition to requiring that the first operation instruction comes first and the second operation instruction follows, or that the second operation instruction comes first and the first operation instruction follows, the destination register of the first operation instruction must be the same as the destination register of the second operation instruction; only then are the first operation instruction and the second operation instruction fused to obtain the fused instruction. If any of the conditions is not satisfied, fusion cannot be performed.
Case three: the fusion condition includes that the two machine instructions are respectively a first operation instruction for indicating an addition and subtraction operation and a second operation instruction for indicating a mask operation, that the destination register of the first operation instruction is the same as the destination register of the second operation instruction, and that the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction. In this case, in addition to requiring that the first operation instruction comes first and the second operation instruction follows, or that the second operation instruction comes first and the first operation instruction follows, the destination register of the first operation instruction must be the same as both the destination register of the second operation instruction and the register of the first source operand of the second operation instruction; only then are the first operation instruction and the second operation instruction fused to obtain the fused instruction. If any of the conditions is not satisfied, fusion cannot be performed.
After the two machine instructions are fused into a fused instruction, processing continues. So that subsequent processing units can clearly tell that the current instruction is a fused instruction, the generated fused instruction carries, in addition to the instruction data: indication information indicating whether the current instruction is a fused instruction (which may be an indication bit, e.g., 0 for a non-fused instruction and 1 for a fused instruction); information indicating the fusion type when the current instruction is a fused instruction (i.e., whether the first operation instruction or the second operation instruction comes first, e.g., 0 when the first operation instruction comes first and 1 when the second operation instruction comes first); and immediate information.
Based on this information, subsequent processing units such as the execution unit and the retirement unit can clearly tell whether the instruction currently being processed is a fused instruction.
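As a rough illustration of the information carried by a fused instruction, the sketch below models it as a small record. The class name `PipelineInstr`, its field names, and the helper `retire_count` are hypothetical; only the meaning of the 0/1 values follows the example encoding given above.

```python
from dataclasses import dataclass

# Example encodings from the description: 0/1 for the fusion type.
ADD_SUB_FIRST = 0   # first operation instruction (add/sub) comes first
MASK_FIRST = 1      # second operation instruction (mask) comes first


@dataclass(frozen=True)
class PipelineInstr:
    data: str             # instruction data (a mnemonic here, for readability)
    is_fused: int = 0     # indication bit: 0 non-fused, 1 fused
    fusion_type: int = 0  # meaningful only when is_fused == 1
    imm: int = 0          # immediate taken over from the mask instruction


def retire_count(instr):
    """Number of original machine instructions a later stage must retire
    for this pipeline entry: two for a fused instruction, otherwise one."""
    return 2 if instr.is_fused == 1 else 1
```

A later stage such as the retirement unit would only need to inspect `is_fused` to decide whether one or two original instructions are being completed.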
Step S208: perform instruction processing on the two machine instructions based on the fused instruction.
Once the two machine instructions have been fused into one fused instruction, the fused instruction can be processed as a single instruction, e.g., decoded, executed, and retired.
Specifically, performing instruction processing on the two machine instructions based on the fused instruction may be implemented as: decoding the fused instruction to obtain a decoding result and the operands corresponding to the fused instruction; and performing the mask operation on the operands as indicated by the fused instruction to obtain the execution result of the fused instruction.
Furthermore, after the fused instruction has executed successfully, that is, after its execution result has been obtained, the first operation instruction and the second operation instruction corresponding to the fused instruction can be retired together, completing the life cycle of the instructions and recovering the related resources.
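The execution step can be sketched as below. This is a simplified model under stated assumptions: registers are taken to be 64 bits wide, and for the mask-first fusion type the mask is assumed to be applied to the first source operand before the addition or subtraction. The description does not specify which operand is masked in that case, so that choice, like the function name `execute_fused`, is hypothetical.

```python
XLEN_MASK = (1 << 64) - 1  # assumption: 64-bit registers


def execute_fused(op, fusion_type, a, b, imm):
    """Execute a fused add/sub + mask instruction on operand values a and b.

    fusion_type 0: add/sub first, then mask the result.
    fusion_type 1: mask first (assumed: on operand a), then add/sub.
    """
    if fusion_type == 1:
        a &= imm                       # mask applied to a source operand
    result = a + b if op == "add" else a - b
    result &= XLEN_MASK                # wrap to the register width
    if fusion_type == 0:
        result &= imm                  # mask applied to the operation result
    return result
```

Either way, a single pass through the execution unit produces the value that the two original instructions would have produced together, so only one write-back is needed.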
As can be seen, with this embodiment, when the processor processes two machine instructions adjacent in time sequence and the two machine instructions satisfy the corresponding fusion condition, for example one indicates an addition and subtraction operation and the other indicates a mask operation, the two instructions may be fused and the fused instruction is processed as a whole in subsequent stages. This is because the arithmetic logic of addition and subtraction operations is usually accompanied by a mask operation that extracts data during the operation. On this basis, the arithmetic logic and the mask operation can be fused, so that hardware acceleration is achieved and processor performance is improved without adding instructions. Because no additional instruction is added, poor compatibility with the existing processor instruction system is avoided.
The execution of the above process in a processor is exemplified below by taking the microprocessor architecture shown in fig. 2 as an example.
First, it should be noted that the operations described above, namely determining whether the two machine instructions satisfy the fusion condition and, if the fusion condition is satisfied, fusing the first operation instruction and the second operation instruction to obtain the fused instruction, may be performed by the instruction fetch unit of the microprocessor or by the decoding unit of the microprocessor. In this example they are performed by the instruction fetch unit; however, those skilled in the art will understand that instruction fusion takes place after instruction fetch and before decoding, and can therefore also be implemented by the decoding unit.
Further, the operation instructions in this example are taken from the RISC-V instruction set, in which the relevant first operation instructions for indicating addition and subtraction operations and second operation instructions for indicating mask operations are shown in the following table:
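[Table image BDA0003871183940000111 not reproduced: RISC-V first operation instructions (addition and subtraction) and second operation instructions (mask)]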
The specific meaning of each instruction follows the RISC-V specification and is not described here.
It should be further noted that, in this example, for convenience of description, the operations of determining whether the two machine instructions satisfy the fusion condition and, if the fusion condition is satisfied, fusing the first operation instruction and the second operation instruction to obtain the fused instruction are encapsulated as one unit, referred to as the fusion detection unit.
On this basis, when the (pipelined) microprocessor shown in fig. 2 processes instructions, the instruction fetch unit 124 first accesses the bus, a cache, or other storage and packages the result into at least one operation instruction. The fusion detection unit in the instruction fetch unit 124 then pre-decodes and checks two operation instructions that are adjacent in sequence. If the two operation instructions satisfy: (1) the first operation instruction comes first and the second operation instruction follows, or the second operation instruction comes first and the first operation instruction follows; (2) the destination register of the first operation instruction is the same as the destination register of the second operation instruction; and (3) the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction, then the fusion detection unit fuses the two operation instructions into one fused instruction and sends it to the decoding unit 125, with additional information indicating that it is a fused instruction, the fusion type, and the immediate. Next, the decoding unit 125 decodes the fused instruction and prepares the corresponding operands and the like as the fused instruction requires. After receiving the fused instruction, the execution unit 121 performs the mask operation on the operand or on the operation result, as the fused instruction requires, to obtain the correct execution result of the fused instruction. The execution result is written back to the destination register. After the execution unit 121 completes all operations corresponding to the fused instruction, the fused instruction is forwarded to the retirement unit.
After receiving the fused instruction, the retirement unit confirms from the information it carries that it is a fused instruction, and performs retirement processing on the two operation instructions corresponding to it together.
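The flow through the fusion detection unit and the retirement unit can be modelled in software roughly as follows. This is a sketch only: the tuple layout `(op, rd, rs1, src2)`, the opcode sets, the dictionary fields, and the function names are assumptions made for illustration, and real hardware would perform these checks on pre-decoded instruction bits rather than on Python objects.

```python
# Hypothetical opcode groups (assumed for illustration only).
ADD_SUB_OPS = {"add", "addw", "sub", "subw"}
MASK_OPS = {"andi"}


def try_fuse(a, b):
    """Return a fused-instruction record for the adjacent pair (a, b), or None.
    Each instruction is a tuple (op, rd, rs1, src2); for a mask instruction
    src2 holds the immediate."""
    if a[0] in ADD_SUB_OPS and b[0] in MASK_OPS:
        first, second, fusion_type = a, b, 0      # condition (1), add/sub first
    elif a[0] in MASK_OPS and b[0] in ADD_SUB_OPS:
        first, second, fusion_type = b, a, 1      # condition (1), mask first
    else:
        return None
    if first[1] != second[1]:                     # condition (2)
        return None
    if first[1] != second[2]:                     # condition (3)
        return None
    return {"op": first[0] + ".m", "rd": first[1], "srcs": (first[2], first[3]),
            "fused": 1, "type": fusion_type, "imm": second[3], "retires": 2}


def fusion_pass(stream):
    """Scan the stream pairwise, emitting fused or unchanged entries."""
    out = []
    i = 0
    while i < len(stream):
        fused = try_fuse(stream[i], stream[i + 1]) if i + 1 < len(stream) else None
        if fused is not None:
            out.append(fused)
            i += 2                                # both instructions consumed
        else:
            op, rd, src1, src2 = stream[i]
            out.append({"op": op, "rd": rd, "srcs": (src1, src2),
                        "fused": 0, "retires": 1})
            i += 1
    return out
```

The `retires` field stands in for the retirement unit's bookkeeping: summed over the output, it always equals the number of original machine instructions, since a fused entry retires two of them together.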
Illustratively, assume that the two instructions fetched by the instruction fetch unit 124 are: instruction 1 "addw s1, a5"; instruction 2 "andi s1,127". According to the RISC-V specification, "addw" is an addition operation and "andi" is a mask operation. The instruction format of "addw" is: addw rd, rs1, rs2, where rd represents the destination register, rs1 represents the register of the first source operand, and rs2 represents the register of the second source operand. The instruction format of "andi" is: andi rd, rs1, immediate, where rd represents the destination register, rs1 represents the register of the source operand, and immediate represents the immediate.
As can be seen, instruction 1 is an addition operation instruction among the addition and subtraction operation instructions and instruction 2 is a mask operation instruction, so fusion condition (1), "the first operation instruction comes first and the second operation instruction follows, or the second operation instruction comes first and the first operation instruction follows", is satisfied. Further, the destination register of instruction 1 is "s1" and the destination register of instruction 2 is also "s1", so fusion condition (2), "the destination register of the first operation instruction is the same as the destination register of the second operation instruction", is satisfied. Further, the destination register of instruction 1 is "s1" and the register of the source operand of instruction 2 is also "s1", so fusion condition (3), "the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction", is satisfied.
Based on this, instruction 1 and instruction 2 above may be fused to generate the fused instruction "addw.m rd, rs1, rs2, imm", where ".m" indicates that the instruction carries a mask operation. The fused instruction is subsequently decoded, executed and retired as a single instruction.
Therefore, through instruction fusion, the execution efficiency and performance of the processor are effectively improved, and no additional instruction is added.
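To make the example concrete, the following sketch compares executing the two instructions separately with executing the fused form. It assumes the full three-operand forms `addw s1, s1, a5` and `andi s1, s1, imm` on a 64-bit RISC-V register file, where `addw` adds the low 32 bits and sign-extends the result; the function names and the dictionary used as a register file are illustrative only.

```python
def sext32(x):
    """Sign-extend the low 32 bits of x, as RV64 addw does for its result."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def run_unfused(regs, imm):
    """Execute the two instructions one after the other (two write-backs)."""
    regs = dict(regs)
    regs["s1"] = sext32(regs["s1"] + regs["a5"])   # assumed: addw s1, s1, a5
    regs["s1"] = regs["s1"] & imm                  # assumed: andi s1, s1, imm
    return regs


def run_fused(regs, imm):
    """Execute the fused form with a single write-back to s1."""
    regs = dict(regs)
    regs["s1"] = sext32(regs["s1"] + regs["a5"]) & imm   # addw.m s1, s1, a5, imm
    return regs
```

Because the intermediate sum is never observed by any other instruction (the destination of the addition is overwritten by the mask), both paths leave the register file in the same state.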
In addition, since the mask operation has more bits available after instruction fusion, the range of mask operations supported is wider. For example, the mask immediate is the one supplied by the andi instruction, so any mask within the andi immediate range can be supported, rather than only the limited set of masks conventionally specified. Furthermore, the mask operation may be applied to both source and destination operands, which is also more widely applicable than conventional approaches.
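For reference, the andi immediate range mentioned above comes from the RISC-V I-type encoding, in which the immediate occupies 12 bits and is sign-extended. The sketch below encodes an `andi` instruction and recovers its immediate; the helper names are hypothetical, and how the fused instruction actually transports the immediate inside the pipeline is not specified by the description.

```python
def encode_andi(rd, rs1, imm):
    """Encode RISC-V `andi rd, rs1, imm` (I-type, opcode 0x13, funct3 0b111).
    rd and rs1 are register numbers 0..31; imm must fit in 12 signed bits."""
    if not -2048 <= imm <= 2047:
        raise ValueError("andi immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b111 << 12) | (rd << 7) | 0x13


def andi_mask(word):
    """Extract the sign-extended 12-bit immediate from an I-type word."""
    imm = (word >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm
```

Under this encoding, any mask value from -2048 to 2047 can be taken over from the andi instruction into the fused instruction.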
The above embodiments are only used for illustrating the embodiments of the present invention, and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims (14)

1. A processor core, comprising: an instruction fetch unit, a fusion detection unit, a decoding unit and an execution unit;
wherein:
the instruction fetch unit is configured to acquire two machine instructions adjacent in time sequence;
the fusion detection unit is configured to determine whether the two machine instructions satisfy a fusion condition, where the fusion condition at least includes: the two machine instructions are respectively a first operation instruction for indicating an addition and subtraction operation and a second operation instruction for indicating a mask operation; and, if the fusion condition is satisfied, to fuse the first operation instruction and the second operation instruction to obtain a fused instruction;
the decoding unit is configured to decode the fused instruction to obtain a decoding result and an operand corresponding to the fused instruction;
and the execution unit is configured to perform a mask operation on the operand as indicated by the fused instruction to obtain the execution result of the fused instruction.
2. The processor core of claim 1, wherein the two machine instructions are two machine instructions in which the first operation instruction executes before the second operation instruction; or, the two machine instructions are two machine instructions in which the second operation instruction executes before the first operation instruction.
3. The processor core of claim 1 or 2, wherein the fusion instruction carries: indication information for indicating whether the current instruction is a fused instruction, information for indicating a fusion type when the current instruction is a fused instruction, and immediate information.
4. The processor core of claim 1, wherein,
the fusion conditions further include: on the basis of determining that the two machine instructions are the first operation instruction and the second operation instruction respectively, a destination register of the first operation instruction is the same as a destination register of the second operation instruction;
the fusing of the first operation instruction and the second operation instruction to obtain a fused instruction if the fusion condition is met comprises: if the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively, and the destination register of the first operation instruction is the same as the destination register of the second operation instruction, fusing the first operation instruction and the second operation instruction to obtain the fused instruction.
5. The processor core of claim 4, wherein,
the fusion conditions further include: on the basis of determining that the destination register of the first operation instruction is the same as the destination register of the second operation instruction, the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction;
the fusing of the first operation instruction and the second operation instruction to obtain a fused instruction if the fusion condition is met comprises: if the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively, the destination register of the first operation instruction is the same as the destination register of the second operation instruction, and the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction, fusing the first operation instruction and the second operation instruction to obtain the fused instruction.
6. The processor core of claim 1, wherein the fusion detection unit is disposed in combination with the fetch unit; or, the fusion detection unit and the decoding unit are combined.
7. The processor core of claim 1, wherein the processor further comprises a retirement unit;
the retirement unit is configured to perform retirement processing on the first operation instruction and the second operation instruction corresponding to the fused instruction together after the execution result of the fused instruction is obtained.
8. A processor, comprising: the processor core of any one of claims 1-7.
9. The processor of claim 8, wherein the processor is a pipelined processor.
10. A chip, comprising: the processor core of any one of claims 1-7; or, comprising a processor according to any of claims 8-9.
11. A control apparatus, characterized by comprising: the processor core of any one of claims 1-7; or, comprising a processor according to any one of claims 8-9; or, comprising a chip as claimed in claim 10.
12. An instruction fusion method, comprising:
acquiring two machine instructions adjacent in time sequence;
determining whether the two machine instructions satisfy a fusion condition, wherein the fusion condition at least comprises: the two machine instructions are respectively a first operation instruction for indicating an addition and subtraction operation and a second operation instruction for indicating a mask operation;
if the fusion condition is satisfied, fusing the first operation instruction and the second operation instruction to obtain a fused instruction;
and performing instruction processing on the two machine instructions based on the fused instruction.
13. The method of claim 12, wherein,
the fusion conditions further include: on the basis of determining that the two machine instructions are the first operation instruction and the second operation instruction respectively, a destination register of the first operation instruction is the same as a destination register of the second operation instruction;
the fusing of the first operation instruction and the second operation instruction to obtain a fused instruction if the fusion condition is met comprises: if the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively, and the destination register of the first operation instruction is the same as the destination register of the second operation instruction, fusing the first operation instruction and the second operation instruction to obtain the fused instruction.
14. The method of claim 13, wherein,
the fusion conditions further include: on the basis of determining that the destination register of the first operation instruction is the same as the destination register of the second operation instruction, the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction;
the fusing of the first operation instruction and the second operation instruction to obtain a fused instruction if the fusion condition is met comprises: if the two machine instructions are determined to be the first operation instruction and the second operation instruction respectively, the destination register of the first operation instruction is the same as the destination register of the second operation instruction, and the destination register of the first operation instruction is the same as the register of the first source operand of the second operation instruction, fusing the first operation instruction and the second operation instruction to obtain the fused instruction.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211197891.5A CN115576608A (en) 2022-09-29 2022-09-29 Processor core, processor, chip, control equipment and instruction fusion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211197891.5A CN115576608A (en) 2022-09-29 2022-09-29 Processor core, processor, chip, control equipment and instruction fusion method

Publications (1)

Publication Number Publication Date
CN115576608A true CN115576608A (en) 2023-01-06

Family

ID=84583175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211197891.5A Pending CN115576608A (en) 2022-09-29 2022-09-29 Processor core, processor, chip, control equipment and instruction fusion method

Country Status (1)

Country Link
CN (1) CN115576608A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116302114A (en) * 2023-02-24 2023-06-23 进迭时空(珠海)科技有限公司 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU
CN116302114B (en) * 2023-02-24 2024-01-23 进迭时空(珠海)科技有限公司 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU
CN116737241A (en) * 2023-05-25 2023-09-12 进迭时空(杭州)科技有限公司 Instruction fusion method, processor core, processor and computer system
CN116737241B (en) * 2023-05-25 2024-04-19 进迭时空(杭州)科技有限公司 Instruction fusion method, processor core, processor and computer system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240226

Address after: 310052 Room 201, floor 2, building 5, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: C-SKY MICROSYSTEMS Co.,Ltd.

Country or region after: China

Address before: 201208 floor 5, No. 2, Lane 55, Chuanhe Road, No. 366, Shangke Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Pingtouge (Shanghai) semiconductor technology Co.,Ltd.

Country or region before: China