CN117908962A - Nonlinear calculation method, open source processor, electronic device and storage medium - Google Patents


Info

Publication number
CN117908962A
Authority
CN
China
Prior art keywords
calculation
nonlinear
instruction
risc
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410027709.4A
Other languages
Chinese (zh)
Inventor
胡福平
顾正付
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tiantian Smart Core Semiconductor Co ltd
Original Assignee
Shanghai Tiantian Smart Core Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tiantian Smart Core Semiconductor Co ltd
Priority to CN202410027709.4A
Publication of CN117908962A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The application relates to a nonlinear calculation method, an open source processor, electronic equipment and a storage medium, and belongs to the field of data processing. The nonlinear calculation method comprises the following steps: acquiring RISC-V computing instructions for nonlinear computation; decoding the RISC-V calculation instruction, and acquiring a source register address, a destination register address and a nonlinear calculation type carried in the RISC-V calculation instruction; acquiring a source operand required by nonlinear calculation from a source register corresponding to the source register address, and performing nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result; and storing the nonlinear calculation result to a destination register corresponding to the destination register address. The method and the device can improve the efficiency of nonlinear calculation and are suitable for various nonlinear calculation scenes.

Description

Nonlinear calculation method, open source processor, electronic device and storage medium
Technical Field
The application belongs to the field of data processing, and particularly relates to a nonlinear calculation method, an open source processor, electronic equipment and a storage medium.
Background
With the continuous development of technologies such as artificial intelligence, autonomous driving, big data, and 5G (5th Generation Mobile Communication Technology), algorithms are becoming increasingly complex; for example, the structural models of deep learning neural networks are more and more complex. The demand for computing power is increasing accordingly, and the demand for special-purpose processors that serve as core infrastructure, such as AI (Artificial Intelligence) chips, GPUs (Graphics Processing Units), NPUs (Neural Processing Units), and DSPs (Digital Signal Processors), is also growing rapidly. When running complex algorithms, these special-purpose processors perform many nonlinear computations, such as trigonometric functions, exponential functions, and square roots, and how to implement nonlinear operations efficiently and flexibly in such processors is a difficult problem.
Disclosure of Invention
In view of the above, an object of the present application is to provide a nonlinear computing method, an open source processor, an electronic device, and a storage medium, so as to implement nonlinear operation efficiently and flexibly.
Embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a nonlinear calculation method, including: acquiring RISC-V computing instructions for nonlinear computation; decoding the RISC-V calculation instruction, and acquiring a source register address, a destination register address and a nonlinear calculation type carried in the RISC-V calculation instruction; acquiring a source operand required by nonlinear calculation from a source register corresponding to the source register address, and performing nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result; and storing the nonlinear calculation result to a destination register corresponding to the destination register address.
In the embodiment of the application, the RISC-V-based computing instruction is flexibly extensible and can be extended as needed to support a variety of nonlinear computations. During computation, the source register address, the destination register address, and the nonlinear computation type carried in the RISC-V computing instruction are acquired, and the nonlinear computation is performed on the source operand according to the nonlinear computation type. Nonlinear operations can thus be implemented efficiently and flexibly, which improves the efficiency of nonlinear computation and makes the method applicable to nonlinear computation in a variety of scenarios.
With reference to one possible implementation manner of the first aspect embodiment, the RISC-V computing instruction further includes an instruction operation code, and before acquiring a source register address, a destination register address, and a nonlinear computing type carried in the RISC-V computing instruction, the nonlinear computing method further includes: and determining that the RISC-V calculation instruction is a nonlinear calculation instruction according to the instruction operation code.
In the embodiment of the application, the RISC-V calculation instruction carries an instruction operation code from which it is determined to be a nonlinear calculation instruction, which can improve the accuracy of nonlinear calculation.
With reference to one possible implementation manner of the embodiment of the first aspect, if the instruction operation code is an instruction operation code of a first nonlinear type, the nonlinear computation type includes one of sin(x), cos(x), sinh(x), cosh(x), e^(-x), e^x, sqrt(x), and 1/x; if the instruction operation code is an instruction operation code of a second nonlinear type, the nonlinear computation type includes one of tan(x), tanh(x), tan^-1(x), and tanh^-1(x).
In the embodiment of the application, the instruction operation code not only identifies the RISC-V computing instruction as a nonlinear computing instruction but also indicates which group the nonlinear computation belongs to (the first nonlinear type or the second nonlinear type). The number of data bits needed to represent the nonlinear computation type can therefore be reduced, for example from 4 bits (which can represent at most 16 computation types across the two groups) to 3 bits (which can represent at most 8 computation types within each group).
With reference to a possible implementation manner of the first aspect embodiment, the RISC-V computing instruction further includes a computing precision; performing nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result, wherein the nonlinear calculation result comprises: performing nonlinear calculation on the source operand according to the nonlinear calculation type by using a CORDIC algorithm; and outputting a nonlinear calculation result according to the calculation precision when the rotation calculation cumulative value of the CORDIC algorithm is smaller than the calculation precision, wherein the number of bits of the decimal part of the nonlinear calculation result is determined by the calculation precision.
In the embodiment of the application, by introducing the calculation precision, when in nonlinear calculation, whether the CORDIC rotation calculation meets the completion condition or not can be judged according to the decimal bit number of the calculation precision, and when the completion condition is met (the rotation calculation cumulative value of the CORDIC algorithm is smaller than the calculation precision), the nonlinear calculation result can be output according to the calculation precision, so that the nonlinear calculation can be suitable for various precision requirements.
With reference to a possible implementation manner of the first aspect embodiment, the RISC-V computing instruction further includes an input data format and an output data format; and when the source operand is acquired, acquiring the source operand according to the input data format, and when the nonlinear calculation result is stored, storing the nonlinear calculation result according to the output data format.
In the embodiment of the application, by introducing the input data format and the output data format, the scheme can support nonlinear computation on a variety of data types, such as integers and fixed-point numbers, signed or unsigned, of byte (8-bit), half-word (16-bit), and word (32-bit) width, and is therefore applicable to nonlinear computation in a variety of scenarios.
With reference to a possible implementation manner of the embodiment of the first aspect, the method is applied to an open source processor, where the open source processor includes a plurality of computing modules, and the source operand is vector data; performing nonlinear computation on the source operand according to the nonlinear computation type, including: and calling the plurality of calculation modules to perform nonlinear calculation on source operands in the vector data in parallel according to the nonlinear calculation type.
In the embodiment of the application, when the source operand is vector data, a plurality of calculation modules can be simultaneously called to perform nonlinear calculation on the source operand in the vector data in parallel according to the nonlinear calculation type so as to improve the operation efficiency.
In a second aspect, an embodiment of the present application further provides an open source processor, including: register group, instruction fetch decoding module and calculating module; a register set for storing a source operand required for nonlinear computation and storing a nonlinear computation result; the instruction fetching decoding module is used for acquiring RISC-V computing instructions for nonlinear computation, decoding the RISC-V computing instructions and acquiring source register addresses, destination register addresses and nonlinear computing types carried in the RISC-V computing instructions; and the calculation module is used for acquiring a source operand required by nonlinear calculation from a source register corresponding to the source register address, carrying out nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result, and storing the nonlinear calculation result into a destination register corresponding to the destination register address.
With reference to a possible implementation manner of the second aspect embodiment, the number of the computing modules is a plurality, and the source operand is vector data; the open source processor further comprises: the control module is connected with the calculation modules and is used for calling the plurality of calculation modules to perform nonlinear calculation on source operands in the vector data in parallel according to the nonlinear calculation types.
With reference to a possible implementation manner of the second aspect embodiment, the RISC-V computing instruction further includes an instruction operation code; the instruction fetching and decoding module is further configured to determine that the RISC-V computing instruction is a nonlinear computing instruction according to the instruction operation code before acquiring a source register address, a destination register address and a nonlinear computing type carried in the RISC-V computing instruction.
With reference to a possible implementation manner of the second aspect embodiment, the RISC-V computing instruction further includes a computing precision; the calculation module is used for carrying out nonlinear calculation on the source operand according to the nonlinear calculation type by utilizing a CORDIC algorithm; and outputting a nonlinear calculation result according to the calculation precision when the rotation calculation cumulative value of the CORDIC algorithm is smaller than the calculation precision, wherein the number of bits of the decimal part of the nonlinear calculation result is determined by the calculation precision.
With reference to a possible implementation manner of the second aspect embodiment, each computing module includes n rotation computing units connected in series, where n is an integer greater than or equal to 2; the rotation calculation cumulative values of the n rotation calculation units are sequentially decreased, the first rotation calculation unit in the n rotation calculation units is used for carrying out nonlinear calculation on the source operand according to the nonlinear calculation type by utilizing a CORDIC algorithm, and each of the other rotation calculation units is used for carrying out nonlinear calculation on the operand input to the rotation calculation unit according to the nonlinear calculation type by utilizing the CORDIC algorithm, wherein when the rotation calculation cumulative value of the rotation calculation unit is smaller than the calculation precision, the nonlinear calculation result is output according to the calculation precision.
In a third aspect, an embodiment of the present application further provides an electronic device, including the open source processor as provided in the foregoing second aspect embodiment and/or in combination with any one of the possible implementation manners of the second aspect embodiment.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including: the device comprises a memory and a processor, wherein the processor is connected with the memory; the memory is used for storing programs; the processor is configured to invoke a program stored in the memory to perform the method as provided by the embodiment of the first aspect and/or any possible implementation of the embodiment of the first aspect.
In a fifth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method as provided by the embodiments of the first aspect and/or any one of the possible implementations in combination with the embodiments of the first aspect.
Additional features and advantages of the application will be set forth in the description which follows. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. The above and other objects, features and advantages of the present application will become more apparent from the accompanying drawings.
Fig. 1 shows a flow chart of a nonlinear calculation method according to an embodiment of the present application.
Fig. 2 shows a schematic structural diagram of an open source processor according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a computing module according to an embodiment of the present application.
Fig. 4 is a schematic diagram of calculating a plurality of exponential functions using a single pipeline according to an embodiment of the present application.
Fig. 5 shows a schematic diagram of single-pipeline nonlinear calculation according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a multi-pipeline nonlinear calculation according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a nonlinear calculation suspending the instruction pipeline according to an embodiment of the present application.
Fig. 8 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. The following examples are given by way of illustration for more clearly illustrating the technical solution of the present application, and are not to be construed as limiting the scope of the application. Those skilled in the art will appreciate that the embodiments described below and features of the embodiments can be combined with one another without conflict.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, relational terms such as "first," "second," and the like may be used solely to distinguish one entity or action from another entity or action in the description of the application without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Furthermore, the term "and/or" in the present application is merely an association relationship describing the association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone.
In the description of the embodiments of the present application, unless explicitly specified and limited otherwise, the term "electrically coupled" may be either directly or indirectly via an intermediate medium.
In order to efficiently and flexibly realize nonlinear operation, the embodiment of the application provides a nonlinear calculation method, an open source processor, electronic equipment and a storage medium. The nonlinear calculation method provided by the embodiment of the application is implemented based on RISC-V instructions. RISC-V is an open instruction set architecture (ISA) based on reduced instruction set computer (RISC) principles and was developed by a team at the University of California, Berkeley; the "V" indicates the fifth generation of RISC.
For better understanding, a nonlinear calculation method provided by an embodiment of the present application is described below with reference to fig. 1.
S1: RISC-V calculation instructions for non-linear calculation are fetched.
When nonlinear calculation is needed, RISC-V calculation instructions for nonlinear calculation are obtained, wherein the RISC-V calculation instructions are calculation instructions realized based on RISC-V, have flexible expansibility, and can be expanded according to the needs to support various nonlinear calculations.
The RISC-V calculation instruction carries a source register address (denoted Rs), a destination register address (denoted Rd), and a nonlinear calculation type (denoted Funct3).
The RISC-V calculation instruction provided by the embodiment of the application includes a source register address, a destination register address, and a nonlinear calculation type. The source operand required by the nonlinear calculation is acquired from the corresponding source register according to the source register address, the nonlinear calculation result is stored into the corresponding destination register according to the destination register address, and the specific nonlinear calculation to perform is determined according to the nonlinear calculation type. The nonlinear calculations in the embodiment of the application include sin(x), cos(x), sinh(x), cosh(x), e^(-x), e^x, sqrt(x), 1/x, tan(x), tanh(x), tan^-1(x), and tanh^-1(x). Different values of Funct3 represent different nonlinear calculation types; for example, Funct3 = 4'b0000 represents sin(x), 4'b0001 represents cos(x), 4'b0010 represents sinh(x), and so on. Here x is the input parameter, i.e., the source operand.
S2: and decoding the RISC-V calculation instruction, and acquiring a source register address, a destination register address and a nonlinear calculation type carried in the RISC-V calculation instruction.
After the RISC-V calculation instruction is obtained, the RISC-V calculation instruction is decoded to obtain a source register address, a destination register address and a nonlinear calculation type carried in the RISC-V calculation instruction.
S3: and acquiring a source operand required by nonlinear calculation from a source register corresponding to the source register address, and performing nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result.
After the source register address, the destination register address and the nonlinear calculation type carried in the RISC-V calculation instruction are obtained, a source operand required by nonlinear calculation can be obtained from a source register corresponding to the source register address, and nonlinear calculation is carried out on the source operand according to the nonlinear calculation type, so that a nonlinear calculation result is obtained. For example, if the nonlinear calculation type is sin (x), sin (x) calculation is performed on the source operand x.
The source operands in the present application may be integer data (an integer part only) or fixed-point data (an integer part and a fractional part), and either kind may be signed or unsigned. Both integer data and fixed-point data support the byte (8-bit), half-word (16-bit), and word (32-bit) types. Further, the source operand may be a single operand or vector data (a plurality of single operands, which may be regarded as a data set).
If the data is byte type, 8 bits of data are contained, if the data is signed, the highest 1bit is sign bit, the rest 7 bits are data, and if the data is unsigned, the 8 bits are all data. Similarly, if the data is half-word type, 16 bits of data are included, if the data is signed data, the highest 1bit is sign bit, the rest 15 bits are data, and if the data is unsigned data, the 16 bits are all data. Similarly, if the data is word type, the data contains 32 bits, if the data is signed, the highest 1bit is sign bit, the rest 31 bits are data, and if the data is unsigned, the 32 bits are all data.
Compared with integer data, fixed-point data additionally has a fractional part: of the bits that remain after the sign bit is excluded, some represent the integer part and the rest represent the fractional part. For example, for signed byte data, part of the remaining 7 bits is the integer part and the other part is the fractional part.
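The operand formats described above can be illustrated with a short Python sketch. It is only an illustration under assumptions: the text does not state how negative values are encoded or where the integer/fraction split lies, so two's complement encoding and the `frac_bits` parameter are assumed here, and the function name `decode_operand` is hypothetical.

```python
def decode_operand(raw, width, signed, frac_bits=0):
    """Interpret a raw bit pattern as an integer or fixed-point value.

    width     -- 8 (byte), 16 (half-word) or 32 (word)
    signed    -- True if the most significant bit is a sign bit
    frac_bits -- number of fractional bits (0 means integer data);
                 hypothetical parameter, not named in the source
    """
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16 or 32")
    raw &= (1 << width) - 1                  # keep only `width` bits
    if signed and raw & (1 << (width - 1)):  # sign bit set
        raw -= 1 << width                    # assumption: two's complement
    if frac_bits:
        return raw / (1 << frac_bits)        # scale by 2^-frac_bits
    return raw
```

For instance, the byte `0xFF` reads as 255 when unsigned and as -1 when signed, and the signed byte `0x18` with four fractional bits reads as 1.5.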
S4: and storing the nonlinear calculation result to a destination register corresponding to the destination register address.
After the nonlinear calculation result is obtained, the nonlinear calculation result can be stored into a destination register corresponding to the destination register address.
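Steps S3 and S4 can be sketched in Python as follows, taking an already decoded instruction (the outcome of S1 and S2) as input. This is a minimal software model, not the processor described in the application: the names `DecodedInstruction`, `OPERATIONS`, and `execute` are hypothetical, and functions from the `math` module stand in for the hardware calculation module.

```python
import math
from dataclasses import dataclass


@dataclass
class DecodedInstruction:
    rs: int    # source register address
    rd: int    # destination register address
    kind: str  # nonlinear calculation type, e.g. "sin"


# Stand-ins for the hardware calculation module (assumption).
OPERATIONS = {"sin": math.sin, "cos": math.cos, "exp": math.exp}


def execute(instr, registers):
    """Run S3 and S4 for one decoded instruction on a register list."""
    x = registers[instr.rs]             # S3: read the source operand
    result = OPERATIONS[instr.kind](x)  # S3: nonlinear calculation
    registers[instr.rd] = result        # S4: write to the destination
    return result
```

With `registers[3] = 0.0`, executing `DecodedInstruction(rs=3, rd=5, kind="cos")` leaves 1.0 in `registers[5]`.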
For more efficient nonlinear computation, in an alternative implementation, the RISC-V calculation instruction further includes an instruction operation code (denoted Opcode), from which the current RISC-V calculation instruction can be determined to be a nonlinear calculation instruction. In this embodiment, before acquiring the source register address, the destination register address, and the nonlinear calculation type carried in the RISC-V calculation instruction, the nonlinear calculation method further includes: determining that the RISC-V calculation instruction is a nonlinear calculation instruction according to the instruction operation code. For example, the instruction operation code of the nonlinear calculation instruction may be stored in advance; the instruction operation code carried in the RISC-V calculation instruction is then compared with the stored one, and if the two are consistent, the current RISC-V calculation instruction is determined to be a nonlinear calculation instruction.
In an alternative embodiment, the instruction operation code may also be used to determine the group to which the nonlinear calculation type belongs. For example, if the instruction operation code is an instruction operation code of the first nonlinear type, e.g., Opcode = 7'b1111011, the nonlinear calculation type is one of sin(x), cos(x), sinh(x), cosh(x), e^(-x), e^x, sqrt(x), and 1/x; if the instruction operation code is an instruction operation code of the second nonlinear type, e.g., Opcode = 7'b1111001, the nonlinear calculation type is one of tan(x), tanh(x), tan^-1(x), and tanh^-1(x).
When the instruction operation code is that of the first nonlinear type, which function of the first nonlinear type (sin(x), cos(x), sinh(x), cosh(x), e^(-x), e^x, sqrt(x), or 1/x) is to be computed is determined by the nonlinear calculation type field; similarly, when the instruction operation code is that of the second nonlinear type, which function of the second nonlinear type (tan(x), tanh(x), tan^-1(x), or tanh^-1(x)) is to be computed is determined by the nonlinear calculation type field.
For example, when Opcode is 7'b1111011, the relationship between the value of Funct3 and the nonlinear calculation type is shown in Table 1.
TABLE 1
When Opcode is 7'b1111001, the relationship between the value of Funct3 and the nonlinear calculation type is shown in Table 2.
TABLE 2
Funct3    Type
3'b000    tan^-1(x)
3'b001    tanh^-1(x)
3'b010    tan(x)
3'b011    tanh(x)
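The two-level selection by Opcode and Funct3 can be sketched as a lookup in Python. The opcode values and the Table 2 mapping are taken from the text; the Funct3 assignment for the first nonlinear type is not reproduced here, so the sketch returns the raw Funct3 value for that group. The names `classify`, `OPCODE_FIRST_TYPE`, `OPCODE_SECOND_TYPE`, and `SECOND_TYPE_FUNCT3` are hypothetical.

```python
OPCODE_FIRST_TYPE = 0b1111011   # first nonlinear type
OPCODE_SECOND_TYPE = 0b1111001  # second nonlinear type

# Funct3 values for the second nonlinear type, as listed in Table 2.
SECOND_TYPE_FUNCT3 = {
    0b000: "atan",
    0b001: "atanh",
    0b010: "tan",
    0b011: "tanh",
}


def classify(opcode, funct3):
    """Return (group, type) for a nonlinear instruction, else None."""
    if opcode == OPCODE_FIRST_TYPE:
        # The Funct3 assignment for this group is not modelled here,
        # so the raw field value is passed through unchanged.
        return ("first", funct3)
    if opcode == OPCODE_SECOND_TYPE:
        if funct3 not in SECOND_TYPE_FUNCT3:
            raise ValueError("unassigned Funct3 value")
        return ("second", SECOND_TYPE_FUNCT3[funct3])
    return None  # not a nonlinear calculation instruction
```

Because the group is carried by the opcode, each group needs only the 3-bit Funct3 field to select one of up to eight functions.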
In an alternative embodiment, the RISC-V calculation instruction may also include a calculation precision (denoted Immu[5:0]), which determines the number of bits of the fractional part of the nonlinear calculation result. Since nonlinear calculation is mostly implemented based on the CORDIC (Coordinate Rotation Digital Computer) algorithm, in one possible implementation the process in S3 of performing nonlinear calculation on the source operand according to the nonlinear calculation type to obtain the nonlinear calculation result may be: performing nonlinear calculation on the source operand according to the nonlinear calculation type using the CORDIC algorithm; and, when the rotation calculation cumulative value of the CORDIC algorithm (which can be understood as the precision reached so far) is smaller than the calculation precision, outputting the nonlinear calculation result according to the calculation precision, where the number of bits of the fractional part of the nonlinear calculation result is determined by the calculation precision. The calculation precision supports at most the Q15.16 format, in which Q indicates that the most significant bit is a sign bit (whether or not the data is signed), 15 indicates that the integer part has 15 bits, and 16 indicates that the fractional part has 16 bits. The nonlinear calculation judges whether the CORDIC rotation calculation meets the completion condition according to the number of fractional bits. For example, if Immu[5:0] is 8, the precision is 2^-8, i.e., 0.00390625, and the calculation is complete when the cumulative value of the CORDIC rotation calculation is smaller than this precision.
For example, the nonlinear calculations of sin(x), cos(x), sinh(x), cosh(x), e^(-x), e^x, sqrt(x), 1/x, tan(x), tanh(x), tan^-1(x), and tanh^-1(x) listed above can all be implemented with the CORDIC algorithm.
The CORDIC algorithm supports a rotation mode and a vector mode, and the vector mode performs the calculation by pseudo-rotation. For example, the trigonometric and hyperbolic functions (sin(x), cos(x), sinh(x), cosh(x)) may be calculated in the rotation mode, while the exponential functions (e^(-x), e^x), the square root sqrt(x), and the inverse tangent functions (tan^-1(x), tanh^-1(x)) may be calculated in the vector mode. The function 1/x can be obtained by dividing 1 by x, either with a vector-mode division or with a conventional division; tan(x) can be obtained by dividing sin(x) by cos(x) in the same way; and tanh(x) can be obtained by dividing sinh(x) by cosh(x) in the same way.
In an alternative embodiment, the RISC-V calculation instruction further includes an input data format and an output data format, which may be represented by Funct7. That is, the input data format and the output data format can be determined according to the value of the Funct7 field; for example, the correspondence between the value of the Funct7 field and the input and output data formats is shown in Table 3.
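One way to read the completion condition above is sketched below in Python for the sin/cos case. It is an illustration under assumptions, not the circuit of the application: the "rotation calculation cumulative value" is interpreted as the elementary rotation angle atan(2^-i) of the current iteration, floating-point arithmetic stands in for the fixed-point datapath, the input is assumed to lie within about ±π/2, and the names `cordic_sin_cos` and `frac_bits` are hypothetical.

```python
import math


def cordic_sin_cos(theta, frac_bits):
    """Rotation-mode CORDIC; returns (sin, cos, iterations).

    Iterates until the elementary rotation angle drops below
    2^-frac_bits, then quantizes both results to frac_bits
    fractional bits.
    """
    precision = 2.0 ** -frac_bits
    x, y, z = 1.0, 0.0, theta  # z holds the remaining angle
    gain = 1.0                 # accumulated CORDIC gain
    i = 0
    while True:
        step = math.atan(2.0 ** -i)       # angle of this micro-rotation
        d = 1.0 if z >= 0 else -1.0       # rotate toward z == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * step
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
        i += 1
        if step < precision:              # completion condition
            break
    scale = 1 << frac_bits
    sin_q = round(y / gain * scale) / scale
    cos_q = round(x / gain * scale) / scale
    return sin_q, cos_q, i
```

A smaller `frac_bits` ends the loop after fewer micro-rotations, which is how a precision field in the instruction can trade accuracy for latency.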
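The division step mentioned above can be illustrated with a linear vectoring-mode CORDIC sketch in Python. This is a textbook formulation offered as an assumption about what "vector mode division" may look like, not the application's circuit: it assumes a positive denominator and a quotient of magnitude at most 2, uses floating point in place of fixed point, and the name `cordic_divide` is hypothetical.

```python
def cordic_divide(num, den, iterations=24):
    """Approximate num / den with shift-and-add style updates.

    Linear vectoring mode: y is driven toward zero while z
    accumulates the quotient, keeping y + den * z == num.
    Assumes den > 0 and |num / den| <= 2.
    """
    if den <= 0:
        raise ValueError("den must be positive in this sketch")
    y, z = num, 0.0
    for i in range(iterations):
        d = 1.0 if y >= 0 else -1.0  # step in the direction that shrinks |y|
        y -= d * den * 2.0 ** -i
        z += d * 2.0 ** -i
    return z
```

With this helper, 1/x for x ≥ 0.5 is `cordic_divide(1.0, x)`, and tan(x) can be formed as `cordic_divide(s, c)` once s = sin(x) and c = cos(x) are available and c is positive.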
TABLE 3
Funct7       Data Type
6'b000000    Integer byte
6'b000001    Integer half-word
6'b000010    Integer word
6'b000011    Fixed point number byte
6'b000100    Fixed point number half-word
6'b000101    Fixed point number word
6'b000110    Integer vector byte
6'b000111    Integer vector half-word
6'b001000    Fixed point number vector byte
6'b001001    Fixed point number vector half-word
For integer calculation, the input x is an integer and the integer part of the result is output; for fixed-point calculation, both the input and the output are fixed-point numbers.
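The Funct7 selection and the integer versus fixed-point output rule can be sketched in Python as follows. The table entries come from Table 3; everything else is an assumption made for illustration: the names `DATA_TYPES` and `format_result`, the `frac_bits` parameter, truncation toward zero for integer output, and rounding down when quantizing fixed-point output.

```python
import math

# Funct7 value -> (kind, element width in bits, is_vector), per Table 3.
DATA_TYPES = {
    0b000000: ("integer", 8, False),
    0b000001: ("integer", 16, False),
    0b000010: ("integer", 32, False),
    0b000011: ("fixed", 8, False),
    0b000100: ("fixed", 16, False),
    0b000101: ("fixed", 32, False),
    0b000110: ("integer", 8, True),
    0b000111: ("integer", 16, True),
    0b001000: ("fixed", 8, True),
    0b001001: ("fixed", 16, True),
}


def format_result(value, funct7, frac_bits):
    """Shape a computed value according to the selected data type."""
    kind, _width, _is_vector = DATA_TYPES[funct7]
    if kind == "integer":
        return math.trunc(value)             # keep the integer part only
    scale = 1 << frac_bits
    return math.floor(value * scale) / scale  # keep frac_bits fraction bits
```

For example, a result of about 2.71828 is output as 2 for the integer word type and as 2.71484375 for the fixed-point word type with eight fractional bits.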
The nonlinear computation in the embodiment of the application supports single-pipeline computation and multi-pipeline parallel computation, for example, when the source operand is vector data, the nonlinear computation can be single-pipeline computation or multi-pipeline computation. For example, in one embodiment, the nonlinear computing method is applied to an open source processor (such as a RISC-V processor), where the open source processor includes a plurality of computing modules, and when the source operand is vector data, the process of performing nonlinear computation on the source operand according to the nonlinear computation type in S3 may be to call the plurality of computing modules to perform nonlinear computation on the source operand in the vector data in parallel according to the nonlinear computation type, so that the computing efficiency may be improved.
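The idea of dispatching the elements of a vector operand to several calculation modules can be mimicked in software with a worker pool. The Python sketch below is only an analogy under assumptions: threads stand in for hardware calculation modules, and the names `vector_nonlinear` and `num_modules` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def vector_nonlinear(func, vector, num_modules=4):
    """Apply one nonlinear function to every element of a vector.

    Each worker plays the role of one calculation module; results
    are returned in the same order as the input elements.
    """
    with ThreadPoolExecutor(max_workers=num_modules) as pool:
        return list(pool.map(func, vector))
```

Calling `vector_nonlinear(math.sqrt, [0.0, 1.0, 4.0, 9.0])` returns `[0.0, 1.0, 2.0, 3.0]`; with a single worker the same call models the single-pipeline case.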
It will be appreciated that the above only illustrates the case where the input data format and the output data format are the same, and that in an alternative embodiment the input data format and the output data format may be different.
For better understanding of the above-described nonlinear calculations, in one embodiment, the RISC-V calculation instruction is a 32-bit calculation instruction, the structure of which is shown in Table 4.
Table 4
Wherein, Opcode represents the instruction operation code, rd represents the destination register address, Funct3 represents the nonlinear calculation type, rs represents the source register address, immu[5:0] represents the calculation precision, and Funct7 represents the input data format and the output data format. It can be appreciated that the values of Opcode, rd, Funct3, rs, immu[5:0], and Funct7 are all configurable, so that different nonlinear calculations can be realized to suit various nonlinear calculation scenarios.
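As an illustration of how such an instruction word could be packed and unpacked, the following sketch uses a purely hypothetical bit layout. The layout of Table 4 is not reproduced in this text, so the field positions below (Opcode in bits 6:0, rd in 11:7, Funct3 in 14:12, rs in 19:15, immu in 25:20, and a 6-bit Funct7 in 31:26) are assumptions chosen only so that the six fields fill 32 bits; the example field values are also assumptions.

```python
# (name, least significant bit, width) -- HYPOTHETICAL layout, not Table 4.
FIELDS = (
    ("opcode", 0, 7),
    ("rd", 7, 5),
    ("funct3", 12, 3),
    ("rs", 15, 5),
    ("immu", 20, 6),
    ("funct7", 26, 6),
)

def encode(**values):
    """Pack the named field values into one 32-bit instruction word."""
    word = 0
    for name, shift, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word

def decode(word):
    """Extract every field of the instruction word into a dict."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}

# Example values are arbitrary: opcode 0x0B, rd = x5, rs = x6, precision field 16.
word = encode(opcode=0x0B, rd=5, funct3=1, rs=6, immu=16, funct7=0b000101)
fields = decode(word)
```

Changing one field value and re-encoding yields a different nonlinear calculation without touching the rest of the word, which is the property the paragraph above relies on.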
With the calculation instruction shown in the embodiment of the application, nonlinear calculations of different types, with different input and output data formats and different calculation precisions, can be completed quickly by changing only the values of the corresponding fields in the calculation instruction. This accelerates data processing, increases the calculation speed, improves the efficiency of nonlinear calculation, and makes the instruction usable for nonlinear calculation in a variety of scenarios.
The nonlinear calculation method provided by the application can perform a plurality of nonlinear calculations simultaneously according to different precision requirements, and the types of these nonlinear calculations may differ. For example, for a 32-bit register width (the extension to 64-bit registers is the same), nonlinear calculation in both integer and fixed-point formats is supported. With SIMD (Single Instruction Multiple Data) and vector extensions, calculation in 32/16/8-bit signed or unsigned integer formats and in 32/16/8-bit signed or unsigned fixed-point formats is supported, and the fixed-point format can be configured according to the precision requirements.
The embodiment of the application also provides an open source processor supporting parallel computing. The open source processor is a processor based on an open source instruction set architecture and may be a RISC-V processor, for example a RISC-V central processing unit (Central Processing Unit, CPU). The processor may be a central processing unit, a network processor (Network Processor, NP), a microprocessor, or the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; or it may be any conventional processor or the like.
As shown in fig. 2, the open source processor includes a register set, an instruction fetch decoding module, and a calculation module. The instruction fetch decoding module and the register set are each connected to the calculation module.
The register set is used for storing source operands required by nonlinear calculation and storing nonlinear calculation results.
The instruction fetching and decoding module is used for acquiring RISC-V computing instructions for nonlinear computation, decoding the RISC-V computing instructions, and acquiring source register addresses, destination register addresses and nonlinear computation types carried in the RISC-V computing instructions.
The calculation module is used for acquiring a source operand required by nonlinear calculation from a source register corresponding to a source register address, carrying out nonlinear calculation on the source operand according to a nonlinear calculation type to obtain a nonlinear calculation result, and storing the nonlinear calculation result into a destination register corresponding to a destination register address.
In addition, the open source processor may further include an instruction cache and a control module, where the control module is connected to the instruction fetch decoding module and the computing module, respectively, and the control module is used to control processing logic of the instruction fetch decoding module, the computing module, and the like. The instruction cache is used to cache RISC-V calculation instructions.
In order to support parallel computing, the number of the computing modules can be multiple, and when the source operands are vector data, the control module is used for calling the computing modules to perform nonlinear computing on the source operands in the vector data in parallel according to the nonlinear computing types.
In an alternative embodiment, the RISC-V calculation instruction further includes an instruction opcode; the instruction fetch decoding module is further configured to determine that the RISC-V computing instruction is a nonlinear computing instruction according to the instruction operation code before the source register address, the destination register address, and the nonlinear computing type carried in the RISC-V computing instruction are obtained.
In an alternative embodiment, the RISC-V calculation instructions further include a calculation precision; the calculation module is used for carrying out nonlinear calculation on the source operand according to the nonlinear calculation type by utilizing a CORDIC algorithm; when the rotation calculation cumulative value of the CORDIC algorithm is smaller than the calculation accuracy, outputting a nonlinear calculation result according to the calculation accuracy, wherein the number of bits of the decimal part of the nonlinear calculation result is determined by the calculation accuracy.
In an alternative embodiment, each calculation module includes n rotation calculation units connected in series, as shown in fig. 3, where n is an integer greater than or equal to 2. M0, M1, ..., Mn in fig. 3 are rotation calculation units, each of which may perform one nonlinear calculation based on the lookup table of the CORDIC algorithm. The rotation calculation units shown as M0, M1, ..., Mn in fig. 3 may be implemented using existing CORDIC operation modules.
The rotation calculation cumulative values of the n rotation calculation units decrease in sequence. The first of the n rotation calculation units performs nonlinear calculation on the source operand according to the nonlinear calculation type using the CORDIC algorithm, and each of the remaining rotation calculation units performs nonlinear calculation, according to the nonlinear calculation type and using the CORDIC algorithm, on the operand input to it. When the rotation calculation cumulative value of a rotation calculation unit is smaller than the calculation precision, the nonlinear calculation result is output according to the calculation precision. For example, the accuracy of M0 is Δ ≤ 0.1, the accuracy of M1 is Δ ≤ 0.01, the accuracy of M2 is Δ ≤ 0.001, the accuracy of M3 is Δ ≤ 0.0001, and so on.
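The cascade of rotation calculation units with a precision-controlled early output can be sketched as follows. This is one possible reading, not the application's implementation: it assumes that the "rotation calculation cumulative value" of a unit can be modeled by the CORDIC residual-angle bound after that unit (the last elementary angle it applied), and the function name, stage count, and iterations per stage are hypothetical.

```python
import math

def cascaded_cordic_sin_cos(x, precision, n_stages=8, iters_per_stage=4):
    """Chain of rotation units; stop at the first unit whose bound < precision.

    Returns (sin(x), cos(x), number_of_units_used).
    """
    a, b, z = 1.0, 0.0, x
    gain = 1.0                     # accumulated CORDIC gain, divided out at the end
    for stage in range(n_stages):  # each pass models one unit M0, M1, ...
        for j in range(iters_per_stage):
            i = stage * iters_per_stage + j
            d = 1.0 if z >= 0 else -1.0
            t = 2.0 ** -i
            a, b = a - d * b * t, b + d * a * t
            z -= d * math.atan(t)
            gain *= math.sqrt(1.0 + t * t)
        # Assumed accuracy bound of this unit: the last elementary angle used.
        bound = math.atan(t)
        if bound < precision:
            break                  # this unit is accurate enough: output here
    return b / gain, a / gain, stage + 1

coarse = cascaded_cordic_sin_cos(0.5, precision=0.1)
fine = cascaded_cordic_sin_cos(0.5, precision=1e-4)
```

A looser precision lets the result leave the chain after fewer units, which is how a precision field in the instruction can trade accuracy for latency.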
Except for the first rotation calculation unit (M0), the input of each rotation calculation unit is the output data of the previous rotation calculation unit. In fig. 3, for example, the output Data0 of M0 is the input of M1, the output Data1 of M1 is the input of M2, and so on, up to the output Datan of Mn.
The inputs a0, b0, and z0 in fig. 3 are initial values, and each rotation calculation unit may be a calculation unit based on the CORDIC algorithm that supports the rotation mode and the vector mode. In the rotation mode, a0 is the compensation factor of the CORDIC algorithm, b0 is 0, and z0 is the input value x; when sin(x) is calculated, for example, x is the angle value. All three input values are in Q15.16 fixed-point format. The values an, bn, and zn obtained after rotation are the results of the trigonometric calculation: an represents the value of cos(x) or cosh(x), bn represents the value of sin(x) or sinh(x), and zn represents the residual value after rotation (tending to 0). In the vector mode, a0 and b0 are initial coordinate values, and z0 may be 0. For exponential function calculations, only an and bn are of concern: an is the residual value after rotation (tending to 0), and bn is the calculation result e^x or e^(-x). For square root calculations, an represents √x and bn is the residual value after rotation (tending to 0). For tan^-1(x) and tanh^-1(x), zn is the final calculation result, i.e., the rotation angle.
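The vector mode and the Q15.16 format can be illustrated together with a small integer-only sketch that computes tan^-1(x) as the accumulated angle zn. The helper names, the 16-iteration count, and the choice of initial coordinates a0 = 1.0, b0 = x, z0 = 0 are assumptions following the textbook vectoring formulation; Python integers do not overflow, so the 32-bit range limit of a real Q15.16 register is not modeled here.

```python
import math

FRAC_BITS = 16
ONE = 1 << FRAC_BITS           # 1.0 in Q15.16

def to_q(x):
    """Convert a float to a Q15.16 integer (round to nearest)."""
    return int(round(x * ONE))

def from_q(q):
    """Convert a Q15.16 integer back to a float."""
    return q / ONE

# Lookup table of atan(2^-i) in Q15.16.
ATAN_TABLE = [to_q(math.atan(2.0 ** -i)) for i in range(16)]

def cordic_atan_q(x_q):
    """Vector mode: drive b toward 0 and accumulate the rotation angle in z."""
    a, b, z = ONE, x_q, 0      # assumed initial values a0 = 1.0, b0 = x, z0 = 0
    for i, angle in enumerate(ATAN_TABLE):
        if b >= 0:
            a, b, z = a + (b >> i), b - (a >> i), z + angle
        else:
            a, b, z = a - (b >> i), b + (a >> i), z - angle
    return z                   # zn: the rotation angle, i.e. atan(x)

quarter_pi = from_q(cordic_atan_q(to_q(1.0)))
```

Only shifts, additions, and table lookups appear in the loop, and the gain on `a` does not matter here because only the accumulated angle is read out.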
The nonlinear computation provided by the embodiment of the application is implemented on an instruction pipeline. The calculation principle for continuously executing a plurality of exponential functions, for example, is shown in fig. 4, which is a schematic diagram of a single pipeline calculating multiple exponential functions.
When vector data are calculated in SIMD fashion, two modes are supported: pipelining on a single computing module, and parallel pipelining on multiple computing modules. For example, the nonlinear computation of 4 bytes of data can be completed by calling either a single computing module or multiple computing modules, as shown schematically in fig. 5 and fig. 6, respectively. Fig. 5 is a schematic diagram of single-pipeline nonlinear calculation on 4 bytes of data by calling a single calculation module, and fig. 6 is a schematic diagram of multi-pipeline nonlinear calculation on 4 bytes of data by calling 4 calculation modules. By comparison, 2n+1 cycles are required to complete the nonlinear computation of 4 bytes of data in the manner shown in fig. 5, while only n+1 cycles are required in the manner shown in fig. 6.
The nonlinear computation provided by the embodiment of the application uses a five-stage pipeline (instruction fetch, decode, execute, memory access, and write-back). When the pipeline enters the execute stage, instruction fetch and decode are suspended until execution completes, after which the subsequent instruction fetch and decode start. For example, when a nonlinear computation enters the execute stage, the execute stage asserts a stall signal to halt the pipeline, so that the fetch and decode stages wait until the computation is complete, as shown schematically in fig. 7.
The nonlinear computing principle of the open source processor embodiment, its implementation, and its technical effects are the same as those of the foregoing method embodiment; for brevity, where the open source processor embodiment does not mention something, reference may be made to the corresponding content of the foregoing method embodiment.
The embodiment of the application also provides electronic equipment which comprises the open source processor. In an alternative implementation manner, a block diagram of an electronic device 200 according to an embodiment of the present application is shown in fig. 8. The electronic device 200 includes: a transceiver 210, a memory 220, a communication bus 230, and a processor 240. Processor 240 may be an open source processor, such as a RISC-V processor, among others.
The transceiver 210, the memory 220, and the processor 240 are electrically connected to one another, directly or indirectly, to realize data transmission or interaction. For example, the components may be electrically coupled to each other via one or more communication buses 230 or signal lines. The transceiver 210 is configured to transmit and receive data. The memory 220 is used to store a computer program comprising at least one software function module that may be stored in the memory 220 in the form of software or firmware, or embedded in an operating system (OS) of the electronic device 200. The processor 240 is configured to execute the computer program stored in the memory 220. For example, the processor 240 is configured to perform the nonlinear calculation method described above.
The memory 220 may be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), etc.
The electronic device 200 includes, but is not limited to, a mobile phone, a tablet, a computer, etc.
The embodiment of the present application further provides a non-volatile computer readable storage medium (hereinafter referred to as a storage medium) having a computer program stored thereon, which when executed by a computer such as the electronic device 200 described above, performs the above-described nonlinear calculation method.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for parts that are identical or similar between embodiments, the embodiments may be referred to one another.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a computer-readable storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or an electronic device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned computer-readable storage medium includes a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A nonlinear calculation method, comprising:
acquiring a RISC-V calculation instruction for nonlinear calculation;
decoding the RISC-V calculation instruction, and acquiring a source register address, a destination register address and a nonlinear calculation type carried in the RISC-V calculation instruction;
acquiring a source operand required by nonlinear calculation from a source register corresponding to the source register address, and performing nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result; and
storing the nonlinear calculation result to a destination register corresponding to the destination register address.
2. The method of claim 1, wherein the RISC-V calculation instruction further comprises an instruction opcode, the method further comprising, prior to fetching the source register address, the destination register address, and the type of nonlinear calculation carried in the RISC-V calculation instruction:
and determining that the RISC-V calculation instruction is a nonlinear calculation instruction according to the instruction operation code.
3. The method of claim 2, wherein, if the instruction opcode is a first nonlinear type instruction opcode, the nonlinear calculation types include: sin(x), cos(x), sinh(x), cosh(x), e^(-x), e^x, 1/x;
if the instruction opcode is a second nonlinear type instruction opcode, the nonlinear calculation types include: one of tan(x), tanh(x), tan^-1(x), tanh^-1(x).
4. A non-linear computing method according to any one of claims 1-3, wherein the RISC-V computing instruction further includes a computing precision; performing nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result, wherein the nonlinear calculation result comprises:
performing nonlinear calculation on the source operand according to the nonlinear calculation type by using a CORDIC algorithm;
and outputting a nonlinear calculation result according to the calculation precision when the rotation calculation cumulative value of the CORDIC algorithm is smaller than the calculation precision, wherein the number of bits of the fractional part of the nonlinear calculation result is determined by the calculation precision.
5. The method of claim 4, wherein the RISC-V calculation instructions further include an input data format and an output data format;
when the source operand is acquired, the source operand is acquired according to the input data format, and when the nonlinear calculation result is stored, the nonlinear calculation result is stored according to the output data format.
6. The method of claim 1, wherein the method is applied to an open source processor comprising a plurality of calculation modules, the source operands being vector data; performing nonlinear computation on the source operand according to the nonlinear computation type, including:
and calling the plurality of calculation modules to perform nonlinear calculation on source operands in the vector data in parallel according to the nonlinear calculation type.
7. An open source processor, comprising:
a register set for storing a source operand required for nonlinear computation and storing a nonlinear computation result;
an instruction fetch decoding module, configured to acquire a RISC-V calculation instruction for nonlinear calculation, decode the RISC-V calculation instruction, and acquire a source register address, a destination register address and a nonlinear calculation type carried in the RISC-V calculation instruction; and
a calculation module, configured to acquire a source operand required by nonlinear calculation from a source register corresponding to the source register address, perform nonlinear calculation on the source operand according to the nonlinear calculation type to obtain a nonlinear calculation result, and store the nonlinear calculation result into a destination register corresponding to the destination register address.
8. The open source processor of claim 7 wherein the number of computation modules is a plurality and the source operands are vector data; the open source processor further comprises:
a control module, connected to the calculation modules and configured to call the plurality of calculation modules to perform nonlinear calculation on the source operands in the vector data in parallel according to the nonlinear calculation type.
9. The open source processor of claim 7 wherein said RISC-V calculation instructions further comprise instruction opcodes; the instruction fetching and decoding module is further configured to determine that the RISC-V computing instruction is a nonlinear computing instruction according to the instruction operation code before acquiring a source register address, a destination register address and a nonlinear computing type carried in the RISC-V computing instruction.
10. The open source processor of claim 8 wherein said RISC-V calculation instructions further comprise a calculation precision; the calculation module is used for carrying out nonlinear calculation on the source operand according to the nonlinear calculation type by utilizing a CORDIC algorithm; and outputting a nonlinear calculation result according to the calculation precision when the rotation calculation cumulative value of the CORDIC algorithm is smaller than the calculation precision, wherein the number of bits of the decimal part of the nonlinear calculation result is determined by the calculation precision.
11. The open source processor of claim 10 wherein each of said computing modules comprises n rotating computing units in series, n being an integer greater than or equal to 2;
The rotation calculation cumulative values of the n rotation calculation units are sequentially decreased, the first rotation calculation unit in the n rotation calculation units is used for carrying out nonlinear calculation on the source operand according to the nonlinear calculation type by utilizing a CORDIC algorithm, and each of the other rotation calculation units is used for carrying out nonlinear calculation on the operand input to the rotation calculation unit according to the nonlinear calculation type by utilizing the CORDIC algorithm, wherein when the rotation calculation cumulative value of the rotation calculation unit is smaller than the calculation precision, the nonlinear calculation result is output according to the calculation precision.
12. An electronic device comprising an open source processor as claimed in any one of claims 7-11.
13. An electronic device, comprising:
a memory and a processor, the processor being connected to the memory;
the memory is used for storing programs;
the processor is configured to invoke a program stored in the memory to perform the method of any of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, performs the method according to any of claims 1-6.
CN202410027709.4A 2024-01-08 2024-01-08 Nonlinear calculation method, open source processor, electronic device and storage medium Pending CN117908962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410027709.4A CN117908962A (en) 2024-01-08 2024-01-08 Nonlinear calculation method, open source processor, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410027709.4A CN117908962A (en) 2024-01-08 2024-01-08 Nonlinear calculation method, open source processor, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN117908962A true CN117908962A (en) 2024-04-19

Family

ID=90686359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410027709.4A Pending CN117908962A (en) 2024-01-08 2024-01-08 Nonlinear calculation method, open source processor, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN117908962A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination