CN118092853A - Instruction set extension method and device based on RISC-V floating-point transcendental functions - Google Patents

Info

Publication number
CN118092853A
CN118092853A CN202410510687.7A CN202410510687A CN118092853A CN 118092853 A CN118092853 A CN 118092853A CN 202410510687 A CN202410510687 A CN 202410510687A CN 118092853 A CN118092853 A CN 118092853A
Authority
CN
China
Prior art keywords
instruction
floating point
function
risc
instruction set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410510687.7A
Other languages
Chinese (zh)
Inventor
覃博琛
蔡刚
黄志洪
魏育成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ehiway Microelectronic Science And Technology Suzhou Co ltd
Original Assignee
Ehiway Microelectronic Science And Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ehiway Microelectronic Science And Technology Suzhou Co ltd
Priority to CN202410510687.7A
Publication of CN118092853A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The invention provides an instruction set extension method and device based on RISC-V floating-point transcendental functions. The device comprises a processor based on the RISC-V architecture; the processor comprises a program counter, a decoding module, a plurality of execution units, a load-store module, a register file, an instruction memory and a data memory; the plurality of execution units include a CORDIC unit and an FPU; the execution units receive the instruction control signals sent by the decoding module and execute the CORDIC floating-point operations of the transcendental functions; the CORDIC unit uses an iteration structure built from parallel floating-point fused multiply-add, floating-point addition and floating-point multiplication circuits, and a scaling structure built from two parallel floating-point multiplication circuits. The scheme reduces the number of instructions needed to compute a transcendental function, improves compiling speed, uses the IEEE-754 floating-point data format, and offers high calculation accuracy and a large calculation range.

Description

Instruction set extension method and device based on RISC-V floating-point transcendental functions
Technical Field
The invention relates to the technical field of processors based on the RISC-V architecture, and in particular to an instruction set extension method and device for a RISC-V processor architecture that supports transcendental functions such as floating-point trigonometric functions and hyperbolic functions.
Background
RISC-V is an open-source instruction set architecture (ISA) built on reduced instruction set computer (RISC) principles; it is a new, fully open, standardized instruction set that can support a wide variety of applications. The RISC-V ISA is defined as the combination of a base instruction set and several optional extended instruction sets. The base instruction sets are the integer instruction sets RV32I and RV64I, which provide a 32-bit and a 64-bit address space respectively; the optional extensions include integer multiplication and division (M), atomic operations (A), single- and double-precision floating-point operations (F/D), compressed instructions (C), and others. Because RISC-V is customizable and extensible, system-level tasks in many fields can be completed efficiently, quickly and at low cost. From the outset, the RISC-V architecture reserved enough encoding space to support custom instruction set extensions: a user can design custom extension instructions for specific requirements and can flexibly select or extend module combinations to build a microprocessor suited to a given field, such as a high-performance CPU for advanced system applications or a small-area, low-power embedded MCU.
Demand for floating-point arithmetic in computer systems keeps growing, with applications ranging from artificial intelligence and electromagnetic scattering theory to image processing, FFT computation and eigenvalue computation. Supporting and accelerating these applications requires high-throughput computing devices. The floating-point unit (FPU) greatly improves the performance of such compute-intensive applications by acting as an accelerator that runs in parallel with the integer pipeline and takes over the large, high-latency floating-point instructions from the host processor. The RISC-V floating-point extension includes the basic operations: floating-point addition, multiplication, multiply-accumulate, division, square root, conversion, comparison, and so on.
Currently, for a high-level language program that uses transcendental functions such as trigonometric or hyperbolic functions, if the target ISA provides machine instructions for those functions (as x86 does), the compiler optimizes by compiling the relevant code into those machine instructions. The conventional RISC-V instruction set, however, has no such transcendental function instructions. If a high-level application needs, for example, a trigonometric function, RISC-V GCC (GNU Compiler Collection) compiles it into a large number of basic integer instructions, which leads to a large compiled code size and long computation latency.
In addition, existing CORDIC IP cores represent all numbers with fractional parts using fixed-point data types. In a high-level language application, a fixed-point format requires the bit widths of the integer and fractional parts to be chosen per application; with a limited bit width the representable range is small, the precision is low, and there is no unified standard. IEEE-754 floating-point data is the most widely used floating-point type. Compared with fixed-point data of the same bit width, it represents a far larger range, and its relative precision is higher. Since existing high-level applications already use the IEEE-754 standard, a CORDIC operation module that supports IEEE-754 floating-point input and output needs to be designed.
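The range argument above can be made concrete with a small sketch. It compares a hypothetical signed 32-bit fixed-point format with 16 fractional bits (the Q16.16 split is an assumption chosen for illustration, not a format named in this document) against IEEE-754 binary32, which has the same 32-bit width.

```python
import struct

def q16_16_range():
    """Minimum, maximum and step of a signed 32-bit fixed-point value
    with 16 fractional bits (hypothetical format, for illustration only)."""
    step = 2.0 ** -16
    return -(2.0 ** 15), 2.0 ** 15 - step, step

def float32_limits():
    """Largest finite and smallest positive normal IEEE-754 binary32 values,
    built from their bit patterns."""
    largest = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
    smallest_normal = struct.unpack(">f", bytes.fromhex("00800000"))[0]
    return largest, smallest_normal

if __name__ == "__main__":
    lo, hi, step = q16_16_range()
    largest, smallest_normal = float32_limits()
    print("Q16.16 range:", lo, "to", hi, "step", step)
    print("binary32 largest finite:", largest)
    print("binary32 smallest normal:", smallest_normal)
```

With the same number of bits, the fixed-point format tops out just below 32768 with a constant step, while binary32 spans many more orders of magnitude with a step that scales with the value.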
Disclosure of Invention
To address the problems in the prior art, the invention provides a RISC-V based floating-point transcendental function instruction set extension method and device, which build on the RISC-V instruction set and add floating-point transcendental function extension instructions on top of support for the floating-point instruction sets RVF/D. Specifically, the invention provides the following technical solutions:
In one aspect, the invention provides an instruction set extension device based on RISC-V floating-point transcendental functions, the device comprising:
a processor based on the RISC-V architecture; the processor comprises a program counter, a decoding module, a plurality of execution units, a load-store module, a register file, an instruction memory and a data memory;
the plurality of execution units comprise a CORDIC unit and an FPU;
the execution units receive the instruction control signals sent by the decoding module and execute the CORDIC floating-point operations of the transcendental functions;
the CORDIC unit uses parallel floating-point fused multiply-adders, a floating-point adder and a floating-point multiplier to form an iteration structure, which performs the iterative operation, and two parallel floating-point multipliers to form a scaling structure, which performs the scaling operation; the instruction control signals are derived from an extended instruction set.
Preferably, the iteration structure comprises two parallel floating-point fused multiply-adders, one floating-point multiplier and one floating-point adder.
Preferably, the iterative calculation mode of the CORDIC unit is as follows:
(1)
(2)
(3)
(4)
(5)
Here, the two coordinate variables denote the coordinate values after the i-th iteration, and the angle variable denotes the residual angle after the i-th iteration; the remaining quantities are a weight factor, a scale factor and the rotation angle, together with a coefficient that is updated according to the mode m; m denotes the mode, and m = 1, 0 and -1 correspond, in order, to the circular, linear and hyperbolic coordinate systems.
Preferably, the variables are scaled to obtain the final output values:
(6)
(7)
(8)
Here, the three variables denote the values to which the corresponding variables converge after r iterations, and the scale-factor term denotes the value of the scale factor after r iterations.
Preferably, formulas (1) and (2) each perform a complete floating-point multiply-add operation;
formulas (3) and (4) perform a single floating-point addition and a single floating-point multiplication, respectively;
formulas (6) and (7) perform floating-point multiplications in parallel.
Preferably, the instructions of the extended instruction set are floating-point instructions and are classified as R-type instructions of the RISC-V floating-point instruction set RVF; all instructions in the extended instruction set are single-operand instructions.
Preferably, in the instruction encoding, the rs1 field holds the address of source operand 1 and the rs2 field holds the address of source operand 2; the rd field holds the destination register address; the funct7 and funct3 fields distinguish the individual instructions, and the opcode field holds the instruction-type operation code;
in the decoding stage, the decoding module extracts the funct7, funct3 and opcode fields from the instruction fetched via the program counter and compares them one by one; once the instruction is confirmed to be one of the transcendental function floating-point instructions provided by the invention, the module forms a control logic instruction and sends it to the CORDIC unit; each control logic instruction corresponds to a specific function.
Preferably, source operand 2 is set to 0, since the instructions in the extended instruction set are set to single operand instructions.
Preferably, the decoding module further includes a control signal unit, where the control signal unit receives the instruction control signal and initiates an execution request to the CORDIC unit, and sends the control logic instruction to a multiplexer Mux;
the CORDIC unit executes corresponding function operation based on the instruction control signal and sends an operation result to the multiplexer Mux;
the multiplexer Mux transmits the output result of the CORDIC unit to the load store module as the output result of the multiplexer Mux based on the instruction control signal.
Preferably, in the write-back stage, the result of the calculation of the CORDIC floating point operation is written back to a floating point register in the register file.
Preferably, the decoding module fetches the corresponding floating point data from the floating point registers in the register file based on the value of the rs1 field, and inputs the floating point data to the CORDIC unit to perform the floating point function operation.
Preferably, the program counter exchanges data with the instruction memory and sends the data obtained from the instruction memory to the decoding module; the load-store module sends the operation result output by the multiplexer Mux to the data memory or the register file.
In another aspect, the invention also provides an instruction set extension method based on RISC-V floating-point transcendental functions, the method being applied to the instruction set extension device described above and comprising:
determining, according to the arithmetic type of the transcendental functions, that the data type of the extended instructions is floating point, and setting the opcode field to 1010011, so that the opcodes in Table 1 and Table 2 are the same; setting the funct7 and funct3 field values according to the different functions among the transcendental functions: following Table 1, for the four trigonometric-type functions sin, cos, tan and artan, the funct7 field is set to 1000000 and the funct3 field runs from 000 to 011 in binary; for the six functions sinh, cosh, tanh, artanh, exp and ln, the funct7 field is set to 1000100 and the funct3 field runs from 000 to 101 in binary, all as shown in Table 2; this arrangement simplifies the logic of the decoding module; and encoding the extended instructions according to the instruction format of the RISC-V instruction set, thereby constructing a RISC-V extended instruction set for evaluating floating-point transcendental functions.
Preferably, the source operand address is given by the rs1 field, and the function type is given by the funct7, funct3 and opcode fields.
Compared with the prior art, the scheme has the following beneficial effects:
1. Fewer instructions. Compared with the RISC-V base instruction set, the custom floating-point transcendental function instruction set greatly reduces the number of instructions needed to compute a floating-point transcendental function: a sin function that originally took 63 integer instructions needs only one instruction after compression, which improves software compiling speed.
2. Suited to high-level application programming. By designing floating-point transcendental function instructions and a floating-point CORDIC unit, the scheme fully supports the IEEE-754 floating-point data type, which is currently the most widely used standard; floating-point numbers can be declared directly in many high-level programming languages, such as C.
3. High calculation accuracy and a large calculation range. The scheme uses the CORDIC algorithm and processes internal data in the IEEE-754 floating-point format, which offers a larger data range and higher precision than fixed-point data types; combined with the custom extended instruction set, this lets a RISC-V processor compute trigonometric and hyperbolic functions with high precision.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings which are required in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a processor architecture and a function calculation flow according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a decode-execute portion according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a top layer structure of a CORDIC unit according to the present invention;
FIG. 4 is a schematic diagram of an iteration unit of a CORDIC apparatus in accordance with an embodiment of the invention;
FIG. 5 is a schematic diagram of a scaling unit of a CORDIC apparatus in accordance with an embodiment of the invention;
FIG. 6 is a schematic diagram illustrating instruction compression according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of decoding logic according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings; the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
This scheme is based on a RISC-V floating-point processor; that is, it improves the RISC-V floating-point processor architecture. A RISC-V floating-point processor applying the scheme supports the basic floating-point operations and, on top of them, floating-point transcendental function extension instructions. The scheme mainly concerns the FPU design of the execution units and the structure of the floating-point CORDIC unit, and also defines a corresponding method for specifying the floating-point transcendental function extended instruction set; the method can be used to construct a RISC-V instruction set for evaluating transcendental functions such as trigonometric, hyperbolic, exponential and logarithmic functions.
The following is a detailed description of the technical scheme provided by the invention with reference to the accompanying drawings:
The scheme provides a custom RISC-V extended instruction set for floating-point transcendental functions, which simplifies the floating-point computation a RISC-V processor performs to evaluate such functions. The exemplary processor used here is an in-order, single-issue processor core with a five-stage pipeline, as shown in FIG. 1. The device of this embodiment has at least the following components: a program counter, an instruction memory, a decoding module, a load-store module, a register file and a data memory. The program counter generates an instruction address; the instruction at that address is fetched from the instruction memory and sent to the decoding stage. The decoding module decodes the instruction and then reads the values at the corresponding addresses from the register file according to the source operand 1 address rs1 and the source operand 2 address rs2: for an integer instruction, integer data are read from the general-purpose registers; for a floating-point instruction, floating-point data are read from the floating-point registers. The data are then passed to the execution stage. The execution stage performs the corresponding operation on the data, uses the multiplexer to select among the results computed by the functional units according to the instruction control signal, and sends the selected result to the load-store module. The load-store module sends data to the data memory or the register file: for a load or store instruction it interacts with the data memory; otherwise it passes the data on to the write-back stage. The register file contains both general-purpose registers and floating-point registers.
The decoding module and the register file are connected bidirectionally: the register file receives the addresses of source operands rs1 and rs2 from the decoding stage, and the data read at those addresses are returned to the decoding module.
When the instructions are defined, and with reference to Table 1 and Table 2, the data type of the extended instructions is determined to be floating point from the arithmetic type of the transcendental functions. The opcode field is set first: Table 1 is the R-type encoding of the RISC-V floating-point extension, whose opcode is 1010011, and to keep the custom floating-point transcendental function instructions compatible with the RISC-V floating-point extension, the opcode of the custom instructions in Table 2 is also set to 1010011. The funct7 and funct3 fields are set next, according to the individual functions. Illustratively, following Table 1, the funct7 field of the four trigonometric-type functions sin, cos, tan and artan is set to 1000000 and the funct3 field runs from 000 to 011 in binary; the funct7 field of the six functions sinh, cosh, tanh, artanh, exp and ln is set to 1000100 and the funct3 field runs from 000 to 101 in binary, all as shown in Table 2. This arrangement simplifies the logic of the decoding module. The extended instructions are encoded according to the instruction format of the RISC-V instruction set, which yields a RISC-V instruction set for evaluating floating-point transcendental functions. The floating-point transcendental function extension instructions are also floating-point arithmetic instructions, so they are defined as R-type instructions of the RISC-V floating-point instruction set F. As the instruction compression example in FIG. 6 shows, a sin function that takes 63 integer instructions in the prior art needs only one instruction after compression.
The following details 5 main steps:
First, in the instruction fetch stage, an instruction is fetched from the instruction memory using an instruction address generated by a Program Counter (PC).
The instruction format is defined by the RISC-V standard, which specifies the base integer instruction set RVI and the floating-point extended instruction sets RVF/D. The standard does not define the custom floating-point transcendental function instructions of this embodiment, so they are defined here. RISC-V reserves a large amount of encoding space for designers, and the floating-point transcendental function extension of this embodiment supports ten instructions: sine (sin), cosine (cos), tangent (tan), hyperbolic sine (sinh), hyperbolic cosine (cosh), hyperbolic tangent (tanh), arctangent (artan), inverse hyperbolic tangent (artanh), natural exponential (exp) and natural logarithm (ln). Because all ten are floating-point instructions, this embodiment classifies them as R-type instructions of the RVF instruction set in the RISC-V standard; Table 1 gives the encoding of this type. Illustratively, the rs1, rs2 and rd fields hold register addresses: rs1 is the address of source operand 1, rs2 is the address of source operand 2, and rd is the destination register address. The opcode field, which holds the instruction-type operation code, uses the RVF value 1010011. The funct7 field, a 7-bit function code, is set to 1000000 for the circular coordinate system and 1000100 for the hyperbolic coordinate system. The funct3 field, a 3-bit function code, uses the three-bit binary codes of the decimal values 0 to 7. Because each of the ten transcendental function instructions takes a single input variable, the rs2 field can be set to 00000; that is, in a preferred implementation all instructions in the extended instruction set are single-operand instructions. Illustratively, the specific encodings may be set as shown in Table 2.
Table 1: R-type encoding of the RISC-V floating-point instruction set F
Table 2: Custom floating-point CORDIC instruction encoding
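The encoding rules described above can be sketched as a small encoder. The opcode and funct7 values and the zeroed rs2 field come from the text, and the bit positions are those of the standard RISC-V R-type format. The assignment of individual funct3 codes follows the order in which the text lists the functions, which is an assumption about Table 2; the names `CORDIC_ENCODING` and `encode` are hypothetical.

```python
OPCODE_FP = 0b1010011  # opcode shared with the RVF R-type instructions (from the text)

# (funct7, funct3) per function. The funct7 values are from the text; the
# funct3 order is ASSUMED to follow the order in which the functions are listed.
CORDIC_ENCODING = {
    "sin":    (0b1000000, 0b000),
    "cos":    (0b1000000, 0b001),
    "tan":    (0b1000000, 0b010),
    "artan":  (0b1000000, 0b011),
    "sinh":   (0b1000100, 0b000),
    "cosh":   (0b1000100, 0b001),
    "tanh":   (0b1000100, 0b010),
    "artanh": (0b1000100, 0b011),
    "exp":    (0b1000100, 0b100),
    "ln":     (0b1000100, 0b101),
}

def encode(name, rd, rs1):
    """Pack one single-operand instruction into a 32-bit R-type word."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32):
        raise ValueError("register index must be in 0..31")
    funct7, funct3 = CORDIC_ENCODING[name]
    rs2 = 0  # single-operand instruction: rs2 is fixed to 00000
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | OPCODE_FP)

if __name__ == "__main__":
    # Hypothetical usage: sin with destination register 1 and source register 2.
    print(hex(encode("sin", rd=1, rs1=2)))
```

Grouping the functions under two funct7 values means a decoder only needs to match two 7-bit patterns before looking at funct3.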
Second, in the decoding stage, the fetched instruction is sent to the decoding module, which decodes it. Depending on the instruction's opcode, funct7 and funct3 fields, the decoding module sends control logic to the subsequent execution module and reads the operand from the register file according to the register address rs1; rd is the destination register address, and after the instruction executes, the result is written to the register-file address indicated by rd. The execution module comprises at least a floating-point unit (FPU) and a coordinate rotation digital computer (CORDIC) unit. Illustratively, it may also include an arithmetic logic unit (ALU).
Specifically, the decoding logic decodes the function codes in the instruction, that is, it evaluates the opcode, funct7 and funct3 fields. Illustratively, as shown in FIG. 7, the received 32-bit instruction is split into the fields shown in Table 1, and the opcode, funct3 and funct7 fields are compared one by one against the values in Table 2 to determine which function the instruction denotes. The opcode is checked first: a value of 1010011 indicates a floating-point instruction. The funct7 field is checked next: a value of 1000000 or 1000100 indicates one of the CORDIC instructions proposed by the invention. Finally, funct3 is checked, and the logic control signal that selects the function of the CORDIC extension circuit unit is determined from Table 3, which fixes the operation the instruction performs. If the instruction is confirmed to be a sin instruction, for example, '0000000001' is output as the instruction control signal so that the subsequent circuit module processes the sin function.
Table 3: One-hot encoded control signals
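The three-step check described above (opcode, then funct7, then funct3) can be sketched as follows. Only the sin signal '0000000001' is stated in the text; the bit positions of the other nine one-hot signals, the funct3 order within each group, and the name `decode` are assumptions made for illustration.

```python
OPCODE_FP = 0b1010011
FUNCT7_CIRCULAR = 0b1000000
FUNCT7_HYPERBOLIC = 0b1000100

# funct3 order within each group is ASSUMED to follow the listing order in the text.
CIRCULAR = ["sin", "cos", "tan", "artan"]
HYPERBOLIC = ["sinh", "cosh", "tanh", "artanh", "exp", "ln"]
# ASSUMED one-hot bit order; only sin -> 0000000001 is given in the text.
ONE_HOT_ORDER = CIRCULAR + HYPERBOLIC

def decode(instr):
    """Return (name, control, rs1, rd) for a custom CORDIC instruction, else None."""
    opcode = instr & 0x7F
    rd = (instr >> 7) & 0x1F
    funct3 = (instr >> 12) & 0x7
    rs1 = (instr >> 15) & 0x1F
    funct7 = (instr >> 25) & 0x7F
    if opcode != OPCODE_FP:          # step 1: floating-point instruction?
        return None
    if funct7 == FUNCT7_CIRCULAR:    # step 2: one of the two CORDIC groups?
        group = CIRCULAR
    elif funct7 == FUNCT7_HYPERBOLIC:
        group = HYPERBOLIC
    else:
        return None
    if funct3 >= len(group):         # step 3: which function in the group?
        return None
    name = group[funct3]
    control = format(1 << ONE_HOT_ORDER.index(name), "010b")
    return name, control, rs1, rd

if __name__ == "__main__":
    print(decode(0x800100D3))
```

A one-hot signal lets each function in the execution unit be enabled by a single wire, at the cost of a wider control bus than a binary code would need.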
Third, in the execution stage, the instruction data enter the execution unit selected by the decoding result; the CORDIC module is the dedicated circuit unit that executes the 10 function instructions defined in Table 2. For example, if the previous stage determined that the instruction is a sin instruction, the control signal for that function is input to the CORDIC unit. The CORDIC unit performs the sin operation and, when it finishes, outputs the result to the multiplexer Mux in FIG. 2. The execution stage in FIG. 2 contains an arithmetic logic unit (ALU) and a floating-point unit (FPU) besides the CORDIC unit, and each of these units produces an output; the ALU, FPU and CORDIC results are all fed to the multiplexer Mux, which selects among them according to the instruction control signal generated in the decoding stage. If the instruction is a sin instruction, for example, the CORDIC output is selected from the three results as the final output data.
The floating-point transcendental function operations are performed by the CORDIC unit in FIG. 1, and FIG. 2 shows the complete decode-execute flow. First, the decoding stage produces the specific instruction control signal of Table 3 by following the decision flowchart in FIG. 7; if the signal corresponds to one of the transcendental functions proposed by the invention, an execution request is issued to the CORDIC unit. The decoding stage extracts the source operand 1 address rs1 and the source operand 2 address rs2 and, according to their values and the instruction type, reads the data at those addresses from the floating-point registers for a floating-point instruction or from the general-purpose registers for an integer instruction. The floating-point transcendental function instructions of the invention use only the address rs1: the floating-point data at rs1 are read from the floating-point registers and input to the CORDIC unit, which performs the iterative calculation. Referring to FIG. 2, the decoding module makes its decision from the opcode, funct7 and funct3 fields and, using the instruction encodings in Table 2, obtains the corresponding transcendental function control signal of Table 3. On receiving the control signal, the CORDIC unit starts the function operation it denotes; for example, the control signal '0000000001' denotes the sin function, so the unit performs the sin operation on the source operand read from the register file.
The operation yields a result. Because the execution stage contains the arithmetic logic unit (ALU), the floating-point unit (FPU) and the CORDIC unit, the result to output must be selected according to the instruction control signal. Continuing with the sin control signal '0000000001': it indicates that the result of the CORDIC unit, not the output of the other two functional modules, must be selected, so the multiplexer Mux finally selects the CORDIC result as the data output of the execution stage.
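The selection step can be illustrated with a small behavioural sketch of the multiplexer. The rule used here (a non-zero CORDIC control signal selects the CORDIC unit, otherwise the instruction type picks the FPU or the ALU) is a simplifying assumption; the names `Unit`, `select_unit` and `execute_stage_output` are hypothetical and not taken from the figures.

```python
from enum import Enum

class Unit(Enum):
    ALU = 0
    FPU = 1
    CORDIC = 2

def select_unit(cordic_control, is_float):
    """Pick the execution unit whose result the multiplexer forwards.
    ASSUMPTION: any non-zero one-hot CORDIC control selects the CORDIC unit."""
    if cordic_control != 0:
        return Unit.CORDIC
    return Unit.FPU if is_float else Unit.ALU

def execute_stage_output(cordic_control, is_float, alu_result, fpu_result, cordic_result):
    """Model of the Mux: all three units produce a result, one is forwarded."""
    results = {
        Unit.ALU: alu_result,
        Unit.FPU: fpu_result,
        Unit.CORDIC: cordic_result,
    }
    return results[select_unit(cordic_control, is_float)]

if __name__ == "__main__":
    # sin control signal 0000000001: the CORDIC result is forwarded.
    print(execute_stage_output(0b0000000001, True, 7, 1.5, 0.479))
```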
The specific iteration formulas of the floating-point CORDIC unit of this embodiment are formulas (1) to (5).
[Formulas (1) to (5): equation images not available in this text]
The iteration operates on three variables. After r iterations, the variables must be scaled to obtain the final output values, which is done by formulas (6) to (8).
[Formulas (6) to (8): equation images not available in this text]
Formulas (6) to (8) use the convergence values of the corresponding variables after r iterations and the value of the scale factor after r iterations. All variables in formulas (1) to (8) use the data format of the IEEE 754 floating-point standard. The symbols in the formulas denote, respectively: the two coordinate values after the i-th iteration; the residual angle after the i-th iteration; a weight factor; a scale factor; the rotation angle; and a coefficient that is updated according to the mode m. In formula (5), m denotes the mode, and its three values correspond to the circular, linear and hyperbolic coordinate systems: in this embodiment, m = 1 corresponds to the circular coordinate system, m = 0 to the linear coordinate system, and m = -1 to the hyperbolic coordinate system.
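For orientation, the textbook unified CORDIC recurrence in rotation mode has the shape shown below. It is given here as background only: the symbol names (x, y, z, sigma, theta, K) are assumptions of this sketch, and the notation and details may differ from formulas (1) to (8) of this embodiment. In rotation mode the direction is chosen as sigma_i = +1 when z_i >= 0 and -1 otherwise. In the hyperbolic case the index conventionally starts at i = 1 and certain iterations are repeated to guarantee convergence.

```latex
\begin{aligned}
x_{i+1} &= x_i - m\,\sigma_i\,2^{-i}\,y_i \\
y_{i+1} &= y_i + \sigma_i\,2^{-i}\,x_i \\
z_{i+1} &= z_i - \sigma_i\,\theta_i \\
\theta_i &=
  \begin{cases}
    \arctan\!\left(2^{-i}\right), & m = 1 \\
    2^{-i}, & m = 0 \\
    \operatorname{artanh}\!\left(2^{-i}\right), & m = -1
  \end{cases} \\
K_r &= \prod_{i} \sqrt{1 + m\,2^{-2i}}, \qquad
x = \frac{x_r}{K_r}, \qquad y = \frac{y_r}{K_r}
\end{aligned}
```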
In the above calculation, formulas (1) and (2) each perform a complete floating-point multiply-add operation; formulas (3) and (4) perform a single floating-point addition and a single floating-point multiplication, respectively; and formulas (6) and (7) perform floating-point multiplications in parallel.
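One plausible reading of this operation mix can be illustrated with a floating-point CORDIC in the circular coordinate system (rotation mode). The sketch below is an assumption-laden software model, not the circuit of this embodiment: the function name `cordic_sin_cos`, the variable names, and the choice of 32 iterations are all hypothetical. In hardware the angles and the scale factor would be read from precomputed storage rather than computed with `math.atan` and `math.sqrt`.

```python
import math


def cordic_sin_cos(angle: float, iterations: int = 32):
    """Approximate (sin, cos) of `angle` in radians; converges for |angle| up to about 1.74."""
    x, y, z = 1.0, 0.0, angle
    weight = 1.0   # holds 2**-i
    gain = 1.0     # accumulated CORDIC gain
    for _ in range(iterations):
        sigma = 1.0 if z >= 0.0 else -1.0
        # Two multiply-add updates that could run in parallel.
        x_next = x - sigma * weight * y
        y_next = y + sigma * weight * x
        # One addition: update the residual angle.
        z = z - sigma * math.atan(weight)
        gain *= math.sqrt(1.0 + weight * weight)
        x, y = x_next, y_next
        # One multiplication: halve the weight for the next iteration.
        weight *= 0.5
    # Scaling step: two independent multiplications by the same factor.
    scale = 1.0 / gain
    return y * scale, x * scale


s, c = cordic_sin_cos(0.5)
print(s, c)
```

The residual angle after n iterations is bounded by roughly 2**-(n-1), so 32 iterations give results that agree with `math.sin` and `math.cos` to about nine decimal places for inputs inside the convergence range.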
The structure of the CORDIC unit is shown in fig. 3. It comprises an iteration unit, a scaling unit and a storage unit. The storage unit holds the values of three factors used by the formulas: because the possible values of these three factors can be calculated in advance, they are stored in the storage unit beforehand so that the calculation formulas can use them directly. Externally, the CORDIC unit receives the data input and the instruction control signal from the decoding stage and outputs the result of one operation.
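The idea of precomputing the stored factors can be sketched as follows. The text does not spell out the table layout, so everything here is an assumption for illustration: the function names, the choice of storing weights (2**-i), rotation angles and a single scale factor per mode, and the conventional repetition of iterations 4, 13 and 40 in the hyperbolic mode.

```python
import math


def iteration_indices(mode: int, iterations: int):
    """Shift indices per mode; hyperbolic mode starts at 1 and repeats 4, 13, 40."""
    if mode != -1:
        return list(range(iterations))
    indices = []
    i = 1
    while len(indices) < iterations:
        indices.append(i)
        if i in (4, 13, 40) and len(indices) < iterations:
            indices.append(i)
        i += 1
    return indices


def build_tables(mode: int, iterations: int = 32):
    """Return (weights, angles, scale) for mode 1 (circular), 0 (linear) or -1 (hyperbolic)."""
    weights = [2.0 ** -i for i in iteration_indices(mode, iterations)]
    if mode == 1:
        angles = [math.atan(w) for w in weights]
    elif mode == 0:
        angles = list(weights)
    else:
        angles = [math.atanh(w) for w in weights]
    gain = math.prod(math.sqrt(1.0 + mode * w * w) for w in weights)
    return weights, angles, 1.0 / gain


_, circular_angles, circular_scale = build_tables(1)
_, _, hyperbolic_scale = build_tables(-1)
print(circular_scale, hyperbolic_scale)
```

With tables like these, the iteration itself needs no trigonometric evaluation at run time, only table reads plus the additions and multiplications described in the text.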
The structures of the iteration unit and the scaling unit of the CORDIC unit are shown in fig. 4 and fig. 5. The CORDIC unit uses a parallel structure to implement the iterative computation: for the quantities computed in each iteration, it provides two parallel floating-point fused multiply-add circuits, one floating-point addition circuit and one floating-point multiplication circuit. The scaling unit provides two parallel floating-point multiplication circuits.
Fourth, in the memory access and write-back phase, the results of the calculation of the CORDIC unit are written back to the floating point registers in the register file.
In yet another embodiment, the specific method used by the apparatus when performing the above transcendental function operations, such as trigonometric and hyperbolic functions, may be set as follows:
According to the operation type of the transcendental functions, the data type of the extended instructions is determined to be floating point, and the opcode field is set to 1010011; the opcode is the same in tables 1 and 2. The funct7 and funct3 field values are set according to the different functions that make up the transcendental functions. According to table 1, the funct7 field of the four trigonometric functions sin, cos, tan and artan is set to 1000000, and their funct3 field is set from 000 to 011 in binary order. The funct7 field of the six functions sinh, cosh, tanh, artanh, exp and ln is set to 1000100, and their funct3 field is set from 000 to 101 in binary order, as shown in table 2. This arrangement has the advantage of simplifying the logic of the decoding module. The extended instructions are encoded according to the instruction format of the RISC-V instruction set, constructing the RISC-V extended instruction set for evaluating floating-point transcendental functions.
Preferably, the operation type is represented by the rs2 field, the operand type is represented by the rs1 field, and the function type is represented by the funct3, funct7 and opcode fields.
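The encoding described above can be illustrated with a small encoder that packs the fields into a 32-bit word using the standard RISC-V R-type layout (funct7, rs2, rs1, funct3, rd, opcode). The opcode, funct7 and funct3 values are taken from the text. The function name `encode`, the dictionary `ENCODINGS`, and the default of 0 for the rs2 field are assumptions of this sketch, since the values used for rs2 are not given in this passage.

```python
OPCODE = 0b1010011

# (funct7, funct3) per function, following the assignment described in the text.
ENCODINGS = {
    "sin": (0b1000000, 0b000),
    "cos": (0b1000000, 0b001),
    "tan": (0b1000000, 0b010),
    "artan": (0b1000000, 0b011),
    "sinh": (0b1000100, 0b000),
    "cosh": (0b1000100, 0b001),
    "tanh": (0b1000100, 0b010),
    "artanh": (0b1000100, 0b011),
    "exp": (0b1000100, 0b100),
    "ln": (0b1000100, 0b101),
}


def encode(name: str, rd: int, rs1: int, rs2: int = 0) -> int:
    """Pack one extended instruction into a 32-bit R-type word (rs2 = 0 is a placeholder)."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32 and 0 <= rs2 < 32):
        raise ValueError("register fields are 5 bits wide")
    funct7, funct3 = ENCODINGS[name]
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | OPCODE
    )


word = encode("sin", rd=2, rs1=1)
print(hex(word))  # prints 0x80008153
```

Because all ten functions share one opcode and only two funct7 values, a decoder only has to compare funct7 against two constants and then use funct3 as an index, which is the decoding simplification the text refers to.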
In yet another embodiment, this may be implemented by means of a device, which may include corresponding modules performing each or several of the steps of the various embodiments described above. Thus, each step or several steps of the various embodiments described above may be performed by a respective module, and the apparatus may include one or more of these modules. A module may be one or more hardware modules specifically configured to perform the respective steps, or be implemented by a processor configured to perform the respective steps, or be stored within a computer-readable medium for implementation by a processor, or be implemented by some combination.
The device may be implemented using a bus architecture. The bus architecture may include any number of interconnecting buses and bridges depending on the specific application of the hardware and the overall design constraints. The bus connects together various circuits including one or more processors, memories, and/or hardware modules. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, external antennas, and the like.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiment of the present invention.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. An instruction set extension apparatus based on RISC-V floating-point transcendental functions, the apparatus comprising:
A processor based on the RISC-V architecture; the processor comprises a program counter, a decoding module, a plurality of execution units, a load store module, a register file, an instruction memory and a data memory;
The plurality of execution units include a CORDIC unit;
the execution units receive the instruction control signals sent by the decoding module to execute the CORDIC floating-point operation of the transcendental function;
The CORDIC unit forms an iterative structure from parallel floating-point fused multiply-adders, a floating-point adder and a floating-point multiplier, the iterative structure being responsible for the iterative operation; it forms a scaling structure from two parallel floating-point multipliers, the scaling structure being responsible for the scaling operation; the instruction control signal is derived from an extended instruction set.
2. The apparatus of claim 1, wherein the iterative operation of the CORDIC unit is as follows:
[Iteration formulas: equation images not available in this text]
Wherein the symbols in the formulas denote, respectively: the two coordinate values after the i-th iteration; the residual angle after the i-th iteration; a weight factor; a scale factor; the rotation angle; and a coefficient that is updated according to the mode m, where m denotes the mode.
3. The apparatus according to claim 1, wherein the iterative structure is specifically: 2 parallel floating-point fused multiply-adders, 1 floating-point multiplier and 1 floating-point adder.
4. The apparatus of claim 1 wherein the program counter interacts with the instruction memory and the program counter sends instruction data retrieved from the instruction memory to the decode module; the load store module sends the operation result output by the multiplexer Mux to the data memory or register file.
5. The apparatus of claim 1, wherein the instructions of the extended instruction set are floating-point instructions of the R type that belong to the RISC-V floating-point instruction set RVF; the instructions in the extended instruction set are all single-operand instructions.
6. The apparatus of claim 1, wherein in the extended instruction set, an rs1 field represents an address of a source operand 1, an rs2 field represents an address of a source operand 2, and an rd field represents a destination register address; funct7, funct3 fields represent different instructions, and opcode field represents instruction type operation code;
In the decoding stage, the decoding module extracts the funct7, funct3 and opcode fields from the instruction sent by the program counter and compares them one by one; after confirming that the instruction is a floating-point operation instruction of a transcendental function, it forms the instruction control signal corresponding to that transcendental function and sends the instruction control signal to the CORDIC unit; each instruction control signal corresponds to a specific function.
7. The apparatus of claim 6, wherein the decode module outputs the instruction control signal and initiates execution requests to the plurality of execution units while sending the instruction control signal to the multiplexer Mux, wherein the instruction control signal corresponding to a transcendental function can only be recognized by the CORDIC unit;
The CORDIC unit executes the corresponding function operation based on the instruction control signal corresponding to the transcendental function and sends the operation result to the multiplexer Mux;
the multiplexer Mux, based on the instruction control signal corresponding to the transcendental function, sends the output result of the CORDIC unit to the load store module as the output result of the multiplexer Mux.
8. The apparatus of claim 7, wherein the instructions in the extended instruction set are floating point instructions, and the decode module retrieves corresponding data from a floating point register in the register file based on the value of the rs1 field and inputs the retrieved data to the CORDIC unit to perform the function operation.
9. The apparatus of claim 7, wherein in the write-back stage, the result of the calculation of the CORDIC floating point operation is written back to a floating point register in the register file.
10. An instruction set extension method based on RISC-V floating-point transcendental functions, wherein the method is implemented by applying the instruction set extension apparatus based on RISC-V floating-point transcendental functions according to any one of claims 6 to 8, the method comprising:
According to the operation type of the transcendental functions, determining that the extended instructions are floating-point instructions with the opcode field 1010011; setting the funct7 and funct3 fields of the extended instructions according to the different functions that make up the transcendental functions; and encoding the extended instructions according to the instruction format of the RISC-V instruction set, constructing the RISC-V extended instruction set for evaluating floating-point transcendental functions.
CN202410510687.7A 2024-04-26 2024-04-26 Instruction set expansion method and device based on RISC-V floating point overrunning function Pending CN118092853A (en)

Publications (1)

Publication Number Publication Date
CN118092853A true CN118092853A (en) 2024-05-28

Family

ID=91157807

Country Status (1)

Country Link
CN (1) CN118092853A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003050697A1 (en) * 2001-12-06 2003-06-19 University Of Georgia Floating point intensive reconfigurable computing system for iterative applications
CN110998514A (en) * 2017-08-08 2020-04-10 Texas Instruments Incorporated Interruptible trigonometric operation
CN113138749A (en) * 2021-04-09 2021-07-20 Ehiway Microelectronic Science And Technology (Suzhou) Co., Ltd. Trigonometric function calculation device and method based on CORDIC algorithm
CN116662255A (en) * 2023-06-29 2023-08-29 Guizhou Power Grid Co., Ltd. RISC-V processor realization method and system combined with overrunning function hardware accelerator
CN117634569A (en) * 2023-11-24 2024-03-01 Zhejiang University Quantized neural network acceleration processor based on RISC-V expansion instruction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bai Yongqiang; Shen Xubang; Luo Min; Jin Zhanpeng: "Design and Implementation of a High-Order Divider", Microelectronics & Computer, no. 01, 5 February 2006 (2006-02-05) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination