CN112947999A - Method and device for expanding instruction function of simplified instruction set computer - Google Patents

Method and device for expanding instruction function of simplified instruction set computer

Info

Publication number
CN112947999A
Authority
CN
China
Prior art keywords
instruction
enhanced
function
target
instructions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110261019.1A
Other languages
Chinese (zh)
Other versions
CN112947999B (en)
Inventor
施军
叶晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaorui Technology Shanghai Co ltd
Original Assignee
Transcendence Information Technology Changsha Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transcendence Information Technology Changsha Co Ltd filed Critical Transcendence Information Technology Changsha Co Ltd
Priority to CN202110261019.1A priority Critical patent/CN112947999B/en
Publication of CN112947999A publication Critical patent/CN112947999A/en
Application granted granted Critical
Publication of CN112947999B publication Critical patent/CN112947999B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3808 Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a method and a device for extending the instruction functions of a reduced instruction set computer (RISC). The method comprises: determining a target instruction function to be enhanced; implementing the target instruction function with instructions of the existing instruction set to obtain a corresponding fixed instruction sequence; identifying the target instruction function in a compiler and implementing it with the corresponding fixed instruction sequence; detecting the fixed instruction sequence inside the processor and transforming it into an enhanced instruction, the enhanced instruction being an internal instruction of the processor that is invisible to the programmer and is not part of the existing instruction set; and implementing execution of the enhanced instruction inside the processor. The invention enhances the function of the instruction set without adding new instructions to it and while maintaining instruction set compatibility, achieves instruction function enhancement in a reduced instruction set, improves processor performance, and is simple to implement and flexible to use.

Description

Method and device for expanding instruction function of simplified instruction set computer
Technical Field
The invention relates to the field of processor microarchitecture design, and in particular to a method and a device for extending the instruction functions of a reduced instruction set computer.
Background
Each instruction of a reduced instruction set computer performs a relatively simple, single function, which makes instructions simple to implement and keeps the corresponding hardware overhead low. RISC-V, whose name denotes the fifth-generation reduced instruction set, is a currently widely used reduced instruction set. To limit complexity, RISC-V divides its instructions into separate classes, and a given computer may implement only the classes required by its use case. For example, an MCU may implement only the I (base integer) instructions, which preserves conformance with the standard instruction set while keeping the implementation as simple as possible; the chip design cost is low, and power consumption and area are advantageous in practical use.
The RISC-V standard instruction set G denotes the standard IMAFD extensions (integer, integer multiply-divide, atomic, single-precision floating-point and double-precision floating-point instructions, respectively) together with the Zicsr and Zifencei extensions. For software compatibility, implementations for general-purpose use currently typically implement the G extension.
The instruction set is functionally complete, but for some programs individual instructions may not be powerful enough; that is, a relatively self-contained function may need a combination of several instructions. This increases program size, and the instructions occupy more execution resources when executed, such as instruction queues, issue resources and execution units. In addition, since the execution delay is the sum of the delays of all the instructions in the sequence, completing the function takes longer. The performance of the processor executing the program is therefore directly affected.
An intuitive approach is to extend the instruction set: add an instruction that implements the self-contained function on its own, thereby reducing the instruction count and the execution delay and improving processor performance. The greatest disadvantage of this approach, however, is that it breaks program compatibility and aggravates fragmentation of the instruction set ecosystem.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a method and a device for extending the instruction functions of a reduced instruction set computer.
To solve the above technical problem, the invention adopts the following technical solution:
A method for extending the instruction functions of a reduced instruction set computer, comprising:
1) determining a target instruction function to be enhanced;
2) implementing the target instruction function using instructions of the existing instruction set, and obtaining a corresponding fixed instruction sequence;
3) identifying the target instruction function in a compiler and implementing it with the corresponding fixed instruction sequence;
4) detecting the fixed instruction sequence inside the processor and transforming it into an enhanced instruction, wherein the enhanced instruction is an internal instruction of the processor, is invisible to the programmer and is not part of the existing instruction set;
5) implementing execution of the enhanced instruction inside the processor.
Optionally, the target instruction function to be enhanced determined in step 1) is a function that requires a plurality of instructions of the target reduced instruction set.
Optionally, the target instruction function to be enhanced is determined in step 1) in one of the following two ways: mode 1, analyzing the instruction functions of a program directly (forward analysis) to obtain the number of instructions and the execution delay with which each function is implemented in the target reduced instruction set, and selecting an instruction function as the target instruction function to be enhanced according to the instruction count and the execution delay; mode 2, compiling a program with another instruction set, comparing the result with the result of compiling the program with the target reduced instruction set, and selecting an instruction function as the target instruction function to be enhanced with reference to that comparison.
Optionally, the criteria used when selecting an instruction function include: the implementation complexity of the instruction function, including hardware overhead measured by one or more of logic amount, area and power consumption; the delay of the instruction function, i.e. how many clock cycles the instruction function requires to complete; and whether the number and types of inputs and outputs of the instruction function exceed what the existing data path supports.
Optionally, when the target instruction function is implemented using instructions of the existing instruction set in step 2), the instructions in the resulting fixed instruction sequence share a common destination register.
Optionally, when the fixed instruction sequence is detected and transformed into the enhanced instruction inside the processor in step 4), the method further includes optimizing the preceding instruction that shares the destination register in the fixed instruction sequence into a no-operation instruction.
Optionally, when execution of the enhanced instruction is implemented inside the processor in step 5), the instruction execution steps of the processor include:
s1) fetching instructions;
s2) before decoding, first detecting through the operands whether consecutive instructions form the fixed instruction sequence; if the fixed instruction sequence is detected, determining whether the destination registers of the instructions in it are the same; if the destination registers are different, transforming the fixed instruction sequence into the enhanced instruction and keeping the preceding instruction of the fixed instruction sequence unchanged; if the destination registers are the same, transforming the fixed instruction sequence into the enhanced instruction and optimizing the preceding instruction, which shares the destination register, into a no-operation instruction; then decoding the instructions;
s3) dispatching the decoded instructions;
s4) issuing the instructions;
s5) executing the instructions, wherein an enhanced instruction is executed by the corresponding logic component in the processor;
s6) writing the instruction execution results back to the destination registers.
Optionally, the target instruction function to be enhanced determined in step 1) is a bit operation that takes specific bits from the bit string in a register rs and places them in the low-order bits of a register rd; the existing instruction set in step 2) is the RISC-V instruction set, and the resulting fixed instruction sequence comprises the two instructions slli rd0, rs1, #imm0 and srli rd0, rd0, #imm1, where slli is the logical left shift instruction with an immediate, srli is the logical right shift instruction with an immediate, rd0 and rs1 are registers, and #imm0 and #imm1 are two immediates; the instructions obtained in step 4) by detecting the fixed instruction sequence inside the processor and transforming it are nop and bext rd0, rs1, #imm0, #imm1, where nop is a no-operation instruction, bext is the internal encoding of the enhanced instruction function, and #imm0 and #imm1 are the two immediates.
The present invention also provides a device for extending the instruction functions of a reduced instruction set computer, comprising a microprocessor and a memory connected to each other, the microprocessor being programmed or configured to perform the steps of the above method.
Optionally, the microprocessor includes an enhanced instruction execution unit for executing an enhanced instruction.
Compared with the prior art, the invention has the following advantages:
1. High performance. The invention uses a single internal enhanced instruction to implement a function that otherwise takes several instructions, which reduces the instruction execution delay and improves program execution performance.
2. Good compatibility. The method expresses the added function with the existing instruction set, which preserves program compatibility and avoids fragmentation of the instruction set ecosystem.
3. Flexible use. Even without compiler support, fixed sequence detection can be applied to binary programs that have already been compiled, thereby improving processor performance.
Drawings
FIG. 1 is a core flow diagram of a method according to an embodiment of the present invention.
FIG. 2 is a diagram comparing the fixed instruction sequences that implement an enhanced instruction function according to an embodiment of the present invention.
FIG. 3 is a diagram comparing the internal instructions into which the fixed instruction sequences are transformed according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the bit extraction function according to an embodiment of the present invention.
FIG. 5 shows the assembly with which the RISC-V G extension implements the bit extraction function according to an embodiment of the present invention.
FIG. 6 shows the instruction sequences, including the function-enhanced instruction, that implement the RISC-V G instruction sequences according to an embodiment of the present invention.
Detailed Description
As shown in FIG. 1, the method for extending the instruction functions of a reduced instruction set computer in this embodiment includes:
1) determining a target instruction function to be enhanced;
2) implementing the target instruction function using instructions of the existing instruction set, and obtaining a corresponding fixed instruction sequence;
3) identifying the target instruction function in a compiler and implementing it with the corresponding fixed instruction sequence;
4) detecting the fixed instruction sequence inside the processor and transforming it into an enhanced instruction, wherein the enhanced instruction is an internal instruction of the processor, is invisible to the programmer and is not part of the existing instruction set;
5) implementing execution of the enhanced instruction inside the processor.
Generally, starting from program behavior, programs are analyzed for widely used functions in order to determine the target instruction function to be enhanced. The target instruction function determined in step 1) is one that requires a plurality of instructions, e.g. two or more, of the target reduced instruction set.
In this embodiment, the target instruction function to be enhanced is determined in step 1) in one of the following two ways:
Mode 1: analyzing the instruction functions of a program directly (forward analysis) to obtain the number of instructions and the execution delay with which each function is implemented in the target reduced instruction set, and selecting an instruction function as the target instruction function to be enhanced according to the instruction count and the execution delay;
Mode 2: compiling a program with another instruction set, comparing the result with the result of compiling the program with the target reduced instruction set, and selecting an instruction function as the target instruction function to be enhanced with reference to that comparison.
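Mode 1 can be illustrated with a minimal sketch that is not part of the patent: it counts how often each adjacent pair of opcodes occurs in an instruction trace, so that frequent pairs can be examined as candidates for enhancement. The trace format (a list of opcode mnemonics in program order), the sample trace and the name count_adjacent_pairs are assumptions made for illustration only.

```python
from collections import Counter


def count_adjacent_pairs(trace):
    """Count occurrences of each adjacent opcode pair in a trace.

    `trace` is a hypothetical list of opcode mnemonics in program order.
    """
    return Counter(zip(trace, trace[1:]))


# Hypothetical trace; a real analysis would use compiler output or a profile.
trace = ["slli", "srli", "add", "slli", "srli", "ld", "add", "slli", "srli"]
pairs = count_adjacent_pairs(trace)
most_common_pair, occurrences = pairs.most_common(1)[0]
print(most_common_pair, occurrences)  # prints: ('slli', 'srli') 3
```

A real analysis would also weigh the execution delay of each sequence, as the text requires; the count alone only ranks candidates by frequency.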
In this embodiment, the criteria used when selecting an instruction function include: the implementation complexity of the instruction function, including hardware overhead measured by one or more of logic amount, area and power consumption; the delay of the instruction function, i.e. how many clock cycles the instruction function needs to complete (from which the benefit of enhancing the function can be evaluated); and whether the number and types of inputs and outputs of the instruction function exceed what the existing data path supports.
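The selection criteria can be sketched as a simple filter. This is an illustration under stated assumptions, not the patent's procedure: the Candidate fields, the is_viable function and every threshold value are hypothetical placeholders, and hardware cost is reduced to a single abstract number.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sequence_latency: int   # cycles for the existing instruction sequence
    enhanced_latency: int   # estimated cycles for the enhanced instruction
    source_registers: int   # register inputs the function needs
    immediate_bits: int     # immediate bits the function needs
    hardware_cost: float    # abstract cost (logic, area, power), arbitrary units


def is_viable(c, max_sources=2, max_immediate_bits=12, max_cost=1.0):
    """Apply the three criteria; all limits are hypothetical placeholders."""
    saves_cycles = c.enhanced_latency < c.sequence_latency
    fits_datapath = (c.source_registers <= max_sources
                     and c.immediate_bits <= max_immediate_bits)
    affordable = c.hardware_cost <= max_cost
    return saves_cycles and fits_datapath and affordable


bit_extract = Candidate("bit extract", 2, 1, 1, 12, 0.2)
wide_op = Candidate("four-input operation", 3, 1, 4, 0, 0.5)
print(is_viable(bit_extract), is_viable(wide_op))  # prints: True False
```

The limits would in practice come from the target processor's data path and design budget.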
To maintain instruction compatibility, only instructions of the existing instruction set can be used to implement the instruction function to be enhanced, and implementing the function typically takes several existing instructions. The sequence is generally required to be fixed, i.e. the types and order of its instructions are fixed, which keeps the implementation simple for both the compiler and the target processor.
To improve the performance of the instruction function enhancement on the target architecture, the destination registers of the instruction sequence that implements the enhanced function should be reused as far as possible. In step 2) of this embodiment, therefore, when the target instruction function is implemented using instructions of the existing instruction set, the instructions in the resulting fixed instruction sequence share a common destination register. As shown in sub-diagram (a) of FIG. 2, a certain enhanced instruction function is implemented using Inst0 and Inst1: the destination register of Inst0 is rd0 and its source registers are rs1 and rs2; the destination register of Inst1 is rd1 and its source registers are rd0 and rs3. When the enhanced instruction function is implemented, the destination register rd0 is reused, i.e. the destination register of Inst1 is changed to rd0, as shown in sub-diagram (b) of FIG. 2. This guarantees that the rd0 result of Inst0 is used only by Inst1 and by no other instruction, which allows better performance optimization when the hardware implements the instruction function enhancement.
Step 3) identifies the target instruction function in the compiler and implements it with the corresponding fixed instruction sequence. During compilation of the source program, the compiler recognizes the pattern of the instruction function to be enhanced, implements it with the fixed instruction sequence of step 2), and at the same time optimizes the use of the destination register.
Step 4), detecting the fixed instruction sequence and transforming it into the enhanced instruction, mainly converts the fixed instruction sequence of step 2) into the internally implemented encoding of the enhanced instruction function. In addition, the fixed instruction sequence must be processed so as to improve processor performance while guaranteeing correct execution of the instruction stream. In this embodiment, step 4) further includes optimizing the preceding instruction that shares the destination register in the fixed instruction sequence into a no-operation instruction (nop). The instruction sequence in sub-diagram (a) of FIG. 2 is converted, once the fixed instruction sequence has been detected, into the instruction sequence in sub-diagram (a) of FIG. 3, where InstE is the internal instruction that implements the enhanced instruction in the target processor. After the sequence Inst0, Inst1 is converted to Inst0, InstE, the source operands of InstE are rs1, rs2 and rs3, so InstE does not depend on the execution of Inst0. Assuming that Inst0, Inst1 and InstE each have an execution delay of 1, the delay of the enhanced function falls from 2 to 1, which improves program execution performance. For the instruction sequence in sub-diagram (b) of FIG. 2, the destination register is reused and the result of Inst0 is used only by Inst1; after the instruction function enhancement no instruction uses the result of Inst0, so Inst0 can be optimized into a nop instruction (which performs no operation), as shown in sub-diagram (b) of FIG. 3.
In this case the delay of the enhanced function again falls from 2 to 1, and one fewer instruction is executed, which reduces the waste of execution resources and lowers power consumption.
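The delay argument can be checked with a small dataflow model. This is a sketch under simplifying assumptions that the patent does not state: each instruction starts as soon as its source registers are ready, resource limits are ignored, and a nop takes zero cycles. The tuple layout (destination, sources, latency) and the name critical_path are hypothetical.

```python
def critical_path(instrs):
    """Return the dataflow critical-path latency of a short sequence.

    Each instruction is (dest, sources, latency). Registers that are not
    written inside the sequence are treated as ready at cycle 0.
    """
    ready = {}   # register name -> cycle at which its value is ready
    finish = 0
    for dest, sources, latency in instrs:
        start = max((ready.get(s, 0) for s in sources), default=0)
        end = start + latency
        if dest is not None:
            ready[dest] = end
        finish = max(finish, end)
    return finish


# Inst0, Inst1 as in sub-diagram (a) of FIG. 2: Inst1 waits for rd0.
original = [("rd0", ("rs1", "rs2"), 1), ("rd1", ("rd0", "rs3"), 1)]
# Inst0, InstE: InstE reads rs1, rs2, rs3 directly.
fused_distinct = [("rd0", ("rs1", "rs2"), 1), ("rd1", ("rs1", "rs2", "rs3"), 1)]
# nop, InstE: shared destination register, Inst0 removed.
fused_shared = [(None, (), 0), ("rd0", ("rs1", "rs2", "rs3"), 1)]

print(critical_path(original), critical_path(fused_distinct),
      critical_path(fused_shared))  # prints: 2 1 1
```

Both transformed forms reach a delay of 1; the shared-register form additionally executes one instruction fewer.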
Finally, step 5) implements execution of the enhanced instruction inside the processor; that is, the instruction function to be enhanced is implemented in the target processor. This generally involves defining an internal encoding for the enhanced instruction and adding the corresponding control and operation logic to the decode, dispatch, issue, execute and write-back stages. In this embodiment, when execution of the enhanced instruction is implemented inside the processor in step 5), the instruction execution steps of the processor include:
s1) fetching instructions;
s2) before decoding, first detecting through the operands whether consecutive instructions form the fixed instruction sequence; if the fixed instruction sequence is detected, determining whether the destination registers of the instructions in it are the same; if the destination registers are different, transforming the fixed instruction sequence into the enhanced instruction and keeping the preceding instruction of the fixed instruction sequence unchanged; if the destination registers are the same, transforming the fixed instruction sequence into the enhanced instruction and optimizing the preceding instruction, which shares the destination register, into a no-operation instruction; then decoding the instructions;
s3) dispatching the decoded instructions;
s4) issuing the instructions;
s5) executing the instructions, wherein an enhanced instruction is executed by the corresponding logic component in the processor;
s6) writing the instruction execution results back to the destination registers.
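The pre-decode processing in step s2) can be sketched as a pass over a symbolic instruction stream. This is a minimal software model under assumptions, not the patent's hardware: the Instr type, the operation names op0, op1 and opE, and the function transform are hypothetical. The checks that the second instruction reads the first instruction's result, and that a retained first instruction does not overwrite one of its own sources, are safety conditions added here and are not stated in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    rd: str = ""
    srcs: Tuple[str, ...] = ()


NOP = Instr("nop")


def transform(stream, first_op, second_op, enhanced_op):
    """Rewrite each (first_op, second_op) pair as described in step s2)."""
    out = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        is_pair = (nxt is not None and cur.op == first_op
                   and nxt.op == second_op and cur.rd in nxt.srcs)
        # Assumed safety condition: if the first instruction is kept, it must
        # not overwrite a register that the enhanced instruction still reads.
        safe = is_pair and (cur.rd == nxt.rd or cur.rd not in cur.srcs)
        if safe:
            # The enhanced instruction reads the first instruction's sources
            # directly, so it no longer depends on the first instruction.
            srcs = cur.srcs + tuple(s for s in nxt.srcs if s != cur.rd)
            out.append(NOP if cur.rd == nxt.rd else cur)
            out.append(Instr(enhanced_op, nxt.rd, srcs))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


distinct = [Instr("op0", "rd0", ("rs1", "rs2")), Instr("op1", "rd1", ("rd0", "rs3"))]
shared = [Instr("op0", "rd0", ("rs1", "rs2")), Instr("op1", "rd0", ("rd0", "rs3"))]
print(transform(distinct, "op0", "op1", "opE"))
print(transform(shared, "op0", "op1", "opE"))
```

With different destination registers the first instruction is kept and followed by opE; with a shared destination register it is replaced by a nop.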
The method of this embodiment is described in further detail below, taking the existing G extension of the RISC-V instruction set as an example:
Analysis of the assembly generated from C source code shows that the current RISC-V G extension lacks bit-manipulation operations. As shown in FIG. 4, one operation commonly used in programs is to take specific bits from the bit string in a register rs and place them in the low-order bits of a register rd. The target instruction function to be enhanced determined in step 1) is therefore this bit operation: taking specific bits from the bit string in a register rs and placing them in the low-order bits of a register rd.
In step 2), the target instruction function is implemented using instructions of the existing instruction set to obtain the corresponding fixed instruction sequence. Because of the limitations of the RISC-V G extension, implementing this function requires first shifting the binary number left to discard the high-order bits and then shifting it right to discard the low-order bits, i.e. the instruction sequence shown in sub-diagram (a) of FIG. 5: one slli (logical left shift with immediate) instruction followed by one srli (logical right shift with immediate) instruction. Such a logically simple operation thus needs two instructions, with a delay of 2 clock cycles. The combined function of the two instructions needs 1 source register and 2 immediates (12 bits in total), which fits naturally into the existing instruction and data path; the function is therefore a suitable candidate for instruction function enhancement, implementing bit extraction.
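The equivalence between the two-shift sequence and a single bit-extraction step can be checked with a short model. It assumes 64-bit registers (consistent with two 6-bit shift amounts, 12 bits in total) and uses the standard RISC-V mnemonics slli and srli for the two shifts; the function names and the field arithmetic in extract_field are illustrative, not taken from the patent.

```python
XLEN = 64                  # assumed register width (RV64)
MASK = (1 << XLEN) - 1


def slli(value, shamt):
    """Logical left shift; bits shifted past the register width are lost."""
    return (value << shamt) & MASK


def srli(value, shamt):
    """Logical right shift; zeros are shifted in from the top."""
    return (value & MASK) >> shamt


def shift_pair(rs1, imm0, imm1):
    """The fixed sequence: slli by imm0 followed by srli by imm1."""
    return srli(slli(rs1, imm0), imm1)


def extract_field(rs1, imm0, imm1):
    """Single-step equivalent, valid when imm0 <= imm1 < XLEN."""
    width = XLEN - imm1    # number of bits that survive both shifts
    lsb = imm1 - imm0      # position of the field's lowest bit in rs1
    return (rs1 >> lsb) & ((1 << width) - 1)


value = 0x0123456789ABCDEF
# Extract bits 8..15: imm1 = 64 - 8 = 56, imm0 = 56 - 8 = 48.
print(hex(shift_pair(value, 48, 56)), hex(extract_field(value, 48, 56)))
# prints: 0xcd 0xcd
```

An enhanced instruction that computes extract_field directly produces the same result as the two dependent shifts in a single operation.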
To improve the performance of the instruction function enhancement on the target architecture, the destination registers of the instruction sequence that implements the enhanced function should be reused as far as possible; in step 2) of this embodiment, therefore, the instructions in the resulting fixed instruction sequence share a common destination register. In sub-diagram (a) of FIG. 5 the destination register of the slli instruction is not optimized, whereas in sub-diagram (b) of FIG. 5 the optimization is applied and slli and srli have the same destination register. Accordingly, the existing instruction set in step 2) is the RISC-V instruction set, and the resulting fixed instruction sequence comprises the two instructions slli rd0, rs1, #imm0 and srli rd0, rd0, #imm1, where slli is the logical left shift instruction with an immediate, srli is the logical right shift instruction with an immediate, rd0 and rs1 are registers, and #imm0 and #imm1 are two immediates. The function is supported in the compiler, which recognizes the bit extraction operation, generates the instruction sequence in sub-diagram (b) of FIG. 5, ensures that no other instruction is inserted between the two instructions, and optimizes the use of the destination register by assigning slli and srli the same destination register.
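The compiler-side step can be sketched as a small emitter that produces the adjacent, shared-destination form for a requested bit field. This is an illustration, not the patent's compiler: the function emit_bit_extract, its lsb and width parameters and the register names a0 and a1 are hypothetical, a 64-bit register width is assumed, and the output uses the standard RISC-V mnemonics slli and srli.

```python
def emit_bit_extract(rd, rs, lsb, width, xlen=64):
    """Emit the fixed sequence that extracts `width` bits starting at `lsb`.

    Both instructions write `rd`, and they are returned as an adjacent pair
    so that no other instruction is placed between them.
    """
    if width < 1 or lsb < 0 or lsb + width > xlen:
        raise ValueError("field does not fit in the register")
    imm0 = xlen - (lsb + width)   # left shift discards bits above the field
    imm1 = xlen - width           # right shift discards bits below the field
    return [f"slli {rd}, {rs}, {imm0}", f"srli {rd}, {rd}, {imm1}"]


for line in emit_bit_extract("a0", "a1", lsb=8, width=8):
    print(line)
# prints:
# slli a0, a1, 48
# srli a0, a0, 56
```

Keeping the pair adjacent and on one destination register is what lets the processor later replace the first instruction with a nop.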
Finally, the instructions obtained in step 4) by detecting the fixed instruction sequence inside the processor and transforming it are nop and bext rd0, rs1, #imm0, #imm1, where nop is a no-operation instruction, bext is the internal encoding of the enhanced instruction function, and #imm0 and #imm1 are the two immediates, as shown in sub-diagram (b) of FIG. 6.
When execution of the enhanced instruction is implemented inside the processor in step 5), the instruction execution steps of the processor are as follows:
a) Instruction fetch. The instruction fetch process is identical to that for ordinary instructions.
b) Decoding. Before decoding, the two types of instruction sequence in FIG. 5 are detected and transformed, as follows: i. Detecting: the fixed instruction sequence is identified through the operands by examining two consecutive instructions; if an instruction is an slli instruction, the fixed instruction sequence is identified by checking whether the instruction that follows it is an srli instruction. ii. Optimizing: after detection, it is further determined whether the destination register of the slli instruction is the same as that of the srli instruction, i.e. whether the instruction sequence is that of sub-diagram (a) or sub-diagram (b) of FIG. 5. iii. Transforming: if the destination registers are different, the two instructions are transformed into the internal representation of sub-diagram (a) of FIG. 6; if the destination registers are the same, the two instructions are transformed into the internal representation of sub-diagram (b) of FIG. 6. Here the bext instruction extracts the bits between imm2 and imm3 from rs1 and places them in the low-order bits of the rd register. The imm2 and imm3 of the bext instruction are calculated from the imm0 of the slli instruction and the imm1 of the srli instruction. The bext instruction is used only inside the processor; it is invisible to the programmer and is not part of the instruction set.
c) Instruction dispatch. As in ordinary processor operation, instruction dispatch typically assigns instructions of different types to different execution units, where they wait in the unit's queue. The slli instruction is sent to its original execution unit; the nop instruction completes directly without being sent to an execution unit; the bext instruction is sent to its corresponding unit. In general, bext and slli perform similar operations and can be implemented in the same execution unit, with an execution delay of one clock cycle.
d) Instruction issue. The instruction waits in the unit's queue until its source operands are ready and is then scheduled for execution.
e) Instruction execution. The slli instruction executes as before. The bext instruction is executed by the added logic. Because the function performed by bext is simple, it can usually complete in one clock cycle; the function performed serially by the two instructions slli and srli is thus shortened from two clock cycles to one, which improves processor performance.
f) Result write-back. The bext result is written to the corresponding register, and the instruction completes and commits, ending its life cycle.
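The detection and transformation in step b) can be sketched at the machine-code level. The field positions below are the standard RV64I encodings of slli and srli (opcode OP-IMM, funct6 of zero, 6-bit shift amount); everything else is an assumption for illustration: the function names, the tuple form of the internal operations, and the choice to carry imm0 and imm1 unchanged instead of deriving imm2 and imm3. The guards for register x0 and for an slli that overwrites its own source are safety conditions added here, not stated in the text.

```python
OP_IMM = 0x13       # RISC-V OP-IMM major opcode
F3_SLLI = 0b001
F3_SRLI = 0b101


def encode_shift(funct3, rd, rs1, shamt):
    """Encode an RV64I slli or srli instruction (funct6 = 0)."""
    return (shamt << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OP_IMM


def decode_shift(word):
    """Return (funct3, rd, rs1, shamt) for slli/srli, otherwise None."""
    if word & 0x7F != OP_IMM or (word >> 26) & 0x3F != 0:
        return None
    funct3 = (word >> 12) & 0x7
    if funct3 not in (F3_SLLI, F3_SRLI):
        return None
    return funct3, (word >> 7) & 0x1F, (word >> 15) & 0x1F, (word >> 20) & 0x3F


def predecode(words):
    """Replace each slli/srli pair by internal operations.

    Output items: ("raw", word), ("nop",) or ("bext", rd, rs1, imm0, imm1).
    """
    out = []
    i = 0
    while i < len(words):
        first = decode_shift(words[i])
        second = decode_shift(words[i + 1]) if i + 1 < len(words) else None
        if (first and second and first[0] == F3_SLLI and second[0] == F3_SRLI
                and first[1] != 0                # x0 never holds slli's result
                and second[2] == first[1]):      # srli reads slli's result
            _, rd0, rs1, imm0 = first
            _, rd1, _, imm1 = second
            shared = rd0 == rd1
            if shared or rd0 != rs1:             # a kept slli must not clobber rs1
                out.append(("nop",) if shared else ("raw", words[i]))
                out.append(("bext", rd1, rs1, imm0, imm1))
                i += 2
                continue
        out.append(("raw", words[i]))
        i += 1
    return out


# slli x10, x11, 48 ; srli x10, x10, 56  (shared destination register)
shared_pair = [encode_shift(F3_SLLI, 10, 11, 48), encode_shift(F3_SRLI, 10, 10, 56)]
print(predecode(shared_pair))  # prints: [('nop',), ('bext', 10, 11, 48, 56)]
```

Because the pass works on encoded instructions, it also applies to binaries that were compiled without knowledge of the enhancement, which is the flexibility advantage the text describes.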
In addition, this embodiment provides a device for extending the instruction functions of a reduced instruction set computer, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the steps of the above method. As an optional implementation, the microprocessor in this embodiment includes an enhanced instruction execution unit for executing the enhanced instruction, which improves the execution efficiency of the enhanced instruction.
Furthermore, this embodiment provides a computer-readable storage medium storing a computer program that is programmed or configured to perform the above method for extending the instruction functions of a reduced instruction set computer.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only a preferred embodiment of the present invention; the protection scope of the present invention is not limited to the above embodiment, and all technical solutions falling under the idea of the present invention fall within its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are also considered to be within the protection scope of the invention.

Claims (10)

1. A method for reduced instruction set computer instruction function extension, comprising:
1) determining a target instruction function to be enhanced;
2) implementing the target instruction function using instructions of the existing instruction set to obtain a corresponding fixed instruction sequence;
3) in a compiler, identifying the target instruction function and implementing it with the corresponding fixed instruction sequence;
4) detecting the fixed instruction sequence in the processor and converting it into an enhanced instruction, wherein the enhanced instruction is an internal instruction of the processor, is invisible to the programmer, and is not part of the existing instruction set;
5) implementing execution of the enhanced instruction within the processor.
2. The method of claim 1 wherein the target instruction functionality determined to be enhanced in step 1) comprises a plurality of instructions in the target reduced instruction set.
3. The method of claim 2, wherein the target instruction function to be enhanced is determined in step 1) in one of the following two ways: mode 1, by forward analysis of the instruction functions of a program, analyzing the number of instructions and the execution delay with which the target reduced instruction set implements each function, and selecting an instruction function as the target instruction function to be enhanced according to the number of instructions and the execution delay; mode 2, comparing the compilation result of the program using another instruction set against the compilation result using the target reduced instruction set, and selecting an instruction function as the target instruction function to be enhanced by reference to that comparison.
4. The method of claim 3, wherein the parameter metrics used in selecting an instruction function comprise: the implementation complexity of the instruction function, which includes hardware overhead, the hardware overhead comprising at least one of logic quantity, area and power consumption; the delay of the instruction function, the delay being the number of clock cycles the instruction function requires to complete; and whether the number and type of inputs and outputs of the instruction function exceed the existing data path.
5. The method of claim 1, wherein, when the target instruction function is implemented using instructions of the existing instruction set in step 2), the instructions in the resulting fixed instruction sequence share a common destination register.
6. The method of claim 1, wherein detecting the fixed instruction sequence in the processor and converting it into the enhanced instruction in step 4) further comprises the step of optimizing the preceding instruction in the fixed instruction sequence, which shares the destination register, into a no-operation instruction.
7. The method of claim 1, wherein, when execution of the enhanced instruction is implemented within the processor in step 5), the instruction execution steps of the processor comprise:
s1) fetching an instruction;
s2) before decoding, first detecting through operands whether consecutive instructions form a fixed instruction sequence; if a fixed instruction sequence is detected, determining whether the destination registers of the instructions in the fixed instruction sequence are the same; if the destination registers are different, converting the fixed instruction sequence into the enhanced instruction while keeping the preceding instruction in the fixed instruction sequence unchanged; if the destination registers are the same, converting the fixed instruction sequence into the enhanced instruction and optimizing the preceding instruction, which shares the destination register, into a no-operation instruction; then decoding the instructions;
s3) dispatching the decoded instruction;
s4) issuing the instruction;
s5) executing the instruction, and if the executed instruction is an enhanced instruction, executing it through the corresponding logic component in the processor;
s6) writing the instruction execution result back to the destination register.
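The detection and conversion in step s2) can be sketched in software as a pass over a decoded instruction stream. This is a minimal sketch under stated assumptions, not the hardware described by the claims: the Instr record, the fuse function, and the register names are hypothetical, the srli shift amount is stored in the imm0 field, and the extra safety check (skipping fusion when the kept slli overwrites its own source) is a precaution added here rather than something the claim states.

```python
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    """Hypothetical decoded-instruction record used only for this sketch."""
    op: str
    rd: Optional[str] = None
    rs: Optional[str] = None
    imm0: Optional[int] = None
    imm1: Optional[int] = None


NOP = Instr("nop")


def fuse(stream: List[Instr]) -> List[Instr]:
    """Replace each slli/srli pair with (nop or slli) followed by bext."""
    out: List[Instr] = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        is_pair = (
            nxt is not None
            and cur.op == "slli"
            and nxt.op == "srli"
            and nxt.rs == cur.rd          # srli consumes the slli result
        )
        same_dest = is_pair and cur.rd == nxt.rd
        # Added precaution: if slli is kept, it must not overwrite the
        # source register that bext will read.
        safe = is_pair and (same_dest or cur.rd != cur.rs)
        if safe:
            # Same destination: the slli result is dead, so emit a nop.
            # Different destination: keep slli so its result stays visible.
            out.append(NOP if same_dest else cur)
            out.append(Instr("bext", nxt.rd, cur.rs, cur.imm0, nxt.imm0))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


if __name__ == "__main__":
    program = [
        Instr("slli", "a0", "a1", 40),
        Instr("srli", "a0", "a0", 56),
        Instr("add", "a2", "a0"),
    ]
    for ins in fuse(program):
        print(ins)
```

Emitting a nop rather than deleting the first instruction keeps the instruction count of the stream unchanged, which matches the nop-plus-bext form of the converted sequence.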
8. The method of claim 1, wherein the target instruction function to be enhanced determined in step 1) is a bit operation that extracts a specified bit from the binary bit string in an rs register and places it in the low-order bit of an rd register; the existing instruction set in step 2) is the RISC-V instruction set, and the resulting fixed instruction sequence comprises two instructions, slli rd0, rs1, #imm0 and srli rd0, rd0, #imm1, wherein slli is the logical left shift instruction with an immediate, srli is the logical right shift instruction with an immediate, rd0 and rs1 are registers, and #imm0 and #imm1 are two immediates; and detecting the fixed instruction sequence in the processor and converting it into an enhanced instruction in step 4) yields the enhanced instructions nop and bext rd0, rs1, #imm0, #imm1, wherein nop is a no-operation instruction, bext is the encoding of the instruction corresponding to the enhanced instruction function, and #imm0 and #imm1 are the two immediates.
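As a worked illustration of how the two immediates relate to the bits being extracted, the sketch below derives them for a bit field [hi:lo] and renders both the two-instruction sequence and the single internal instruction. The helper names, the 64-bit default width, and the generalization from one bit to a field are assumptions made for illustration; extracting a single bit k is the case hi = lo = k.

```python
from typing import List, Tuple


def field_to_shifts(hi: int, lo: int, xlen: int = 64) -> Tuple[int, int]:
    """Return (imm0, imm1) so that slli by imm0 then srli by imm1 yields bits [hi:lo]."""
    if not 0 <= lo <= hi < xlen:
        raise ValueError("need 0 <= lo <= hi < xlen")
    imm0 = xlen - 1 - hi    # move bit hi up to the top bit, discarding higher bits
    imm1 = imm0 + lo        # move bit lo down to bit 0, discarding lower bits
    return imm0, imm1


def render(rd: str, rs: str, hi: int, lo: int, xlen: int = 64) -> Tuple[List[str], str]:
    """Render the fixed sequence and the corresponding internal bext (hypothetical syntax)."""
    imm0, imm1 = field_to_shifts(hi, lo, xlen)
    sequence = [f"slli {rd}, {rs}, {imm0}", f"srli {rd}, {rd}, {imm1}"]
    fused = f"bext {rd}, {rs}, {imm0}, {imm1}"
    return sequence, fused


if __name__ == "__main__":
    sequence, fused = render("a0", "a1", hi=5, lo=5)   # single bit 5
    print(sequence)
    print(fused)
```

For a single bit k on a 64-bit register this gives imm0 = 63 - k and imm1 = 63, so the extracted bit lands in bit 0 of the destination and all other bits are zero.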
9. An apparatus for reduced instruction set computer instruction function extension, comprising a microprocessor and a memory interconnected, wherein the microprocessor is programmed or configured to perform the steps of the method for reduced instruction set computer instruction function extension of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform a method of reduced instruction set computer instruction function extension according to any one of claims 1 to 8.
CN202110261019.1A 2021-03-10 2021-03-10 Method and device for expanding instruction function of reduced instruction set computer Active CN112947999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110261019.1A CN112947999B (en) 2021-03-10 2021-03-10 Method and device for expanding instruction function of reduced instruction set computer

Publications (2)

Publication Number Publication Date
CN112947999A CN112947999A (en) 2021-06-11
CN112947999B CN112947999B (en) 2022-06-28

Family

ID=76228630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110261019.1A Active CN112947999B (en) 2021-03-10 2021-03-10 Method and device for expanding instruction function of reduced instruction set computer

Country Status (1)

Country Link
CN (1) CN112947999B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050268293A1 (en) * 2004-05-25 2005-12-01 International Business Machines Corporation Compiler optimization
CN110018848A (en) * 2018-09-29 2019-07-16 安凯(广州)微电子技术有限公司 A RISC-V-based hybrid computing system and method
WO2020138663A1 (en) * 2018-12-26 2020-07-02 (주)자람테크놀로지 Device for risc-v-based operation including hardware-based fast operation supporting user-defined instruction set, and method therefor
CN111399912A (en) * 2020-03-26 2020-07-10 超验信息科技(长沙)有限公司 Instruction scheduling method, system and medium for multi-cycle instruction
CN112214242A (en) * 2020-09-23 2021-01-12 上海赛昉科技有限公司 RISC-V instruction compression method, system and computer readable medium
CN112256330A (en) * 2020-11-03 2021-01-22 中国人民解放军军事科学院国防科技创新研究院 RISC-V instruction set extension method for accelerating digital signal processing

Also Published As

Publication number Publication date
CN112947999B (en) 2022-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220531
Address after: 201210 room 2fa222, block a, building 1, No. 800, Naxian Road, Pudong New Area, Shanghai
Applicant after: Chaorui Technology (Shanghai) Co.,Ltd.
Address before: Room 2106, Great Wall wanfuhui gold block, No.9 Shuangyong Road, Kaifu District, Changsha City, Hunan Province, 410003
Applicant before: Transcendence information technology (Changsha) Co.,Ltd.
GR01 Patent grant