WO2023206832A1 - Function implementation method, approximation interval segmentation method, chip, device and medium - Google Patents

Function implementation method, approximation interval segmentation method, chip, device and medium

Info

Publication number
WO2023206832A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
processed
floating point
point number
interval
Prior art date
Application number
PCT/CN2022/107166
Other languages
English (en)
French (fr)
Inventor
孙存浩
赵芮
Original Assignee
成都登临科技有限公司
上海登临科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都登临科技有限公司, 上海登临科技有限公司
Publication of WO2023206832A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of data processing technology. Specifically, it relates to a function implementation method, an approximation interval segmentation method, a chip, a device and a medium.
  • GPU: Graphics Processing Unit.
  • floating point computing capability of digital signal processor and the reasoning speed of artificial intelligence model.
  • elementary function operation units are indispensable in these operations.
  • batch normalization and Sigmoid functions in artificial intelligence models require elementary function calculations (finding reciprocals, square roots, reciprocal square roots, exponents, etc.).
  • In chip design, increasingly high requirements are placed on chip area, power consumption and operation speed. This urgently calls for elementary function operation units that offer low power consumption, high speed and small area while meeting the required accuracy, in order to keep up with the growing demand for chip computing power.
  • Method 1: Use software to implement interpolation or approximation within the domain of definition.
  • Method 2: Set up a dedicated elementary function circuit. First, reduce the input of the elementary function to a certain range by exploiting the floating-point representation; then segment this range evenly; the elementary function circuit then calls the interpolation function coefficients corresponding to each segment to perform the approximation; finally, the result is normalized.
  • Method 2 offers better performance while ensuring calculation accuracy, but it consumes hardware to store the interpolation function coefficients corresponding to each segment.
  • the hardware area consumed is equivalent to the area corresponding to each segment.
  • the number of interpolations and the number of segments of the interpolation function are directly proportional to each other. Obviously, the more coefficients of the interpolation function are stored, the greater the storage overhead required and the lower the chip performance.
  • This application provides a function implementation method, approximation interval segmentation method, chip, device and medium to reduce storage overhead and improve device performance.
  • Embodiments of the present application provide a function implementation method, which may include: obtaining data to be processed; and sending the data to be processed to an elementary function circuit of the target function to perform a piecewise approximation operation to obtain a processing result. When the elementary function circuit performs the piecewise approximation operation, it calls the interpolation coefficients corresponding to the segmented interval in which the data to be processed is located to perform the interpolation operation. The segmented intervals are the intervals obtained by segmenting the approximation interval corresponding to the target function according to the slope change rate of the numerical change curve of the target function. Within the approximation interval, an interval whose slope change rate lies within the same preset change rate range is segmented according to the preset interval corresponding to that preset change rate range. Different preset change rate ranges do not overlap, and the smaller the upper limit of a preset change rate range, the larger the corresponding preset interval.
  • The segmented intervals are divided according to the slope change rate of the numerical change curve of the objective function: the smaller the preset change rate range in which the slope change rate lies, the larger the segmented interval. The entire approximation interval therefore no longer needs to be evenly segmented at the minimum interval, which reduces the number of segments and hence the number of stored interpolation coefficients, saving storage overhead and improving chip performance.
  • A smaller slope change rate in the numerical change curve means that, within the corresponding interval, the interpolation functions for different points differ less. Segmenting at larger intervals therefore still achieves the required interpolation accuracy, so the solution provided by this application still meets the calculation accuracy requirements of the objective function.
  • the solution provided by this application can continue to use various existing elementary function circuits, so it is hardware-friendly and is conducive to promotion and application in industrial applications.
  • obtaining the data to be processed may include: obtaining a floating point number to be processed; preprocessing the floating point number to be processed to obtain the data to be processed; and the data to be processed being located within the approximation interval.
  • the floating point numbers to be processed are preprocessed so that the data to be processed is located within the approximation interval of the objective function, thereby ensuring that the elementary function circuit of the objective function can operate correctly.
  • Preprocessing the floating-point number to be processed may include: determining the function set to which the target function belongs; if the target function belongs to the first function set, taking the mantissa bits of the floating-point number to be processed to obtain the data to be processed; if the target function belongs to the second function set, determining that the floating-point number to be processed is itself the data to be processed; if the target function belongs to the third function set, performing fixed-point conversion on the floating-point number to be processed to obtain the data to be processed.
  • the output data can meet the processing requirements of that type of function, ensuring that the elementary function circuit can perform operations correctly.
  • the objective function may be one of a reciprocal function, a square root function, a reciprocal square root function, a logarithmic function, an exponential function, a trigonometric function, a sigmoid function, a tanh function, and an erf function.
  • the objective function is a trigonometric function; the elementary function circuit of the objective function performs an interpolation operation through a Taylor interpolation function.
  • The constant term coefficient and the quadratic term coefficient among the interpolation coefficients are equal or are mutually inverse, so when storing interpolation coefficients only one of the two needs to be saved. This further reduces the number of saved interpolation coefficients, saves storage overhead, and improves chip performance.
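As a hedged illustration of why one coefficient can be left out of storage, consider a second-order Taylor expansion of sin about a point x0: the constant term is sin(x0) and the quadratic coefficient is -sin(x0)/2, so the second follows from the first by a negation and a halving. The sketch below assumes this reading of "mutually inverse" (opposite in sign, with the factor 1/2 applied at evaluation time); the function names and the choice of expansion point are illustrative and are not taken from the application.

```python
import math

def sin_taylor_coeffs(x0):
    """Coefficients kept in storage for sin around x0: constant and linear terms only."""
    return math.sin(x0), math.cos(x0)

def sin_taylor_eval(x, x0, c0, c1):
    """Second-order Taylor evaluation; the quadratic coefficient is derived, not stored."""
    c2 = -c0 / 2.0           # assumption: -sin(x0)/2, obtained from the constant term
    d = x - x0
    return c0 + d * (c1 + d * c2)   # Horner form: c0 + c1*d + c2*d^2

x0 = 0.5
c0, c1 = sin_taylor_coeffs(x0)
approx = sin_taylor_eval(0.51, x0, c0, c1)
```

Only two values per segment are stored here instead of three; whether the circuit in the application derives the third coefficient in exactly this way is not stated in the source.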
  • the method may further include: determining that the floating point number to be processed is a canonical floating point number.
  • The method may also include: if the floating-point number to be processed is a denormalized (non-canonical) floating-point number, setting every part of it except the sign bit to 0 to obtain the canonical floating-point number 0; if the floating-point number to be processed is a non-numeric number (NaN), outputting it directly.
  • The method may also include: if the floating-point number to be processed is neither a denormalized floating-point number nor a non-numeric number, determining that it is a canonical floating-point number, and then preprocessing that canonical floating-point number.
  • The method may also include judging denormalized floating-point numbers as follows: determine whether the exponent bits of the floating-point number to be processed are 0 and whether the mantissa bits are not 0; if the exponent bits are 0 and the mantissa bits are not 0, the floating-point number to be processed is a denormalized floating-point number.
  • The method may also include judging non-numeric numbers as follows: determine whether the exponent bits of the floating-point number to be processed are all 1 and whether the mantissa bits are not 0; if the exponent bits are all 1 and the mantissa bits are not 0, the floating-point number to be processed is a non-numeric number.
  • the denormalized floating point number can be converted into a normalized floating point number 0 before being input into the elementary function circuit, and then the operation can be performed. In this way, when designing the chip, the additional hardware area and power consumption in the elementary function circuit to support denormalized floating point numbers can be eliminated.
  • non-numeric numbers are output directly without calculation, which avoids the elementary function circuit from processing non-numeric numbers in piecewise approximation and reduces unnecessary processing overhead.
  • Embodiments of the present application also provide a method for segmenting the approximation interval of a function, which may include: obtaining the numerical change curve of the objective function in the approximation interval corresponding to the objective function; obtaining the slope corresponding to each sampling position on the approximation curve; determining, from these slopes, the slope change rate corresponding to the sampling interval formed by each pair of adjacent sampling positions on the approximation curve; and segmenting the total interval composed of multiple consecutive sampling intervals whose slope change rates lie within the same preset change rate range according to the preset interval corresponding to that preset change rate range. Different preset change rate ranges do not overlap, and the smaller the upper limit of a preset change rate range, the larger the corresponding preset interval.
  • The segmented intervals are divided according to the slope change rate of the numerical change curve of the objective function: the smaller the preset change rate range in which the slope change rate lies, the larger the segmented interval.
  • A smaller slope change rate in the numerical change curve means that, within the corresponding interval, the interpolation functions for different points differ less. Segmenting at larger intervals therefore still achieves the required interpolation accuracy, so the solution provided by this application still meets the calculation accuracy requirements of the objective function.
  • Embodiments of the present application also provide a function implementation circuit, which may include different elementary function circuits corresponding to different elementary functions. Each elementary function circuit is used to perform piecewise approximation operations on the data to be processed so as to implement the corresponding elementary function. When an elementary function circuit performs the piecewise approximation operation, it calls the interpolation coefficients corresponding to the segmented interval in which the data to be processed is located to perform the interpolation operation. The segmented intervals are the intervals obtained by segmenting the approximation interval corresponding to the objective function according to the slope change rate of the numerical change curve of the objective function. Within the approximation interval, an interval whose slope change rate lies within the same preset change rate range is segmented according to the preset interval corresponding to that range; different preset change rate ranges do not overlap, and the smaller the upper limit of a preset change rate range, the larger the corresponding preset interval.
  • The segmented intervals are divided according to the slope change rate of the numerical change curve of the objective function: the smaller the preset change rate range in which the slope change rate lies, the larger the segmented interval. The entire approximation interval therefore no longer needs to be evenly segmented at the minimum interval, which reduces the number of segments and hence the number of interpolation coefficients stored in the circuit, saving storage overhead and improving circuit performance.
  • A smaller slope change rate in the numerical change curve means that, within the corresponding interval, the interpolation functions for different points differ less. Segmenting at larger intervals therefore still achieves the required interpolation accuracy, so the function implementation circuit provided in this application still meets the calculation accuracy requirements of the target function.
  • the function implementation circuit may further include: a preprocessing unit configured to preprocess floating point numbers to be processed to obtain the data to be processed; the data to be processed is located within the approximation interval.
  • The preprocessing unit is specifically configured to: determine the function set to which the target function belongs; if the target function belongs to the first function set, take the mantissa bits of the floating-point number to be processed to obtain the data to be processed; if the target function belongs to the second function set, determine that the floating-point number to be processed is itself the data to be processed; if the target function belongs to the third function set, perform fixed-point conversion on the floating-point number to be processed to obtain the data to be processed.
  • the different elementary functions include at least two of a reciprocal function, a square root function, a reciprocal square root function, a logarithmic function, an exponential function, a trigonometric function, a sigmoid function, a tanh function, and an erf function.
  • the elementary function circuit of the sigmoid function is multiplexed in the elementary function circuit corresponding to the tanh function.
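One way such reuse can work is through the identity tanh(x) = 2·sigmoid(2x) − 1, which lets a tanh result be formed from a sigmoid evaluation plus a doubling and a subtraction. The source does not state how the multiplexing is wired, so the sketch below is only an assumed illustration of that identity in software, with hypothetical function names.

```python
import math

def sigmoid(x):
    """Reference sigmoid; stands in for the shared sigmoid circuit."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh computed by reusing the sigmoid evaluation (assumed wiring)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

value = tanh_via_sigmoid(0.3)
```

In hardware terms, the doubling of the input and of the output are shifts, so under this assumption only a subtraction is added on top of the existing sigmoid datapath.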
  • The function implementation circuit may also include a floating-point outlier processing unit configured to: determine the type of the floating-point number to be processed; if it is a canonical floating-point number, output it to the preprocessing unit; if it is a denormalized floating-point number, set every part of it except the sign bit to 0 to obtain the canonical floating-point number 0, and output that canonical floating-point number 0 to the preprocessing unit; if it is a non-numeric number (NaN), output it directly.
  • the different elementary function circuits multiplex multipliers and adders.
  • Multipliers and adders that can be multiplexed by the different elementary function circuits are provided, so that each elementary function circuit does not need its own multipliers and adders. This effectively reduces repetitive hardware and optimizes hardware area and power consumption.
  • the embodiment of the present application also provides a chip, which may include: any of the above function implementation circuits.
  • An embodiment of the present application also provides an electronic device, which may include: the above chip.
  • Embodiments of the present application also provide a computer-readable storage medium that stores one or more programs, and the one or more programs can be executed by one or more processing units to implement Any of the above function implementation methods.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product includes a computer program. When the computer program is executed by a processor, it implements any one of the above function implementation methods.
  • Figure 1 is a schematic flow chart of a function implementation method provided by an embodiment of the present application.
  • Figure 2 is an example numerical change curve provided by the embodiment of the present application.
  • Figure 3 is a schematic flowchart of a function approximation interval segmentation method provided by an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of a function implementation circuit provided by an embodiment of the present application.
  • Figure 5 is a schematic structural diagram of a more specific function implementation circuit provided by an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a more specific function implementation circuit provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the connection logic between processing units within an exemplary function implementation circuit provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of the processing logic of a pre-processing unit provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Figure 1 is a schematic flowchart of the function implementation method provided in the embodiment of the present application, including:
  • S102 Send the data to be processed to the elementary function circuit of the target function for piecewise approximation operation to obtain the processing result.
  • When the elementary function circuit performs the segmented approximation operation, it calls the interpolation coefficients corresponding to the segmented interval in which the data to be processed is located to perform the interpolation operation.
  • The segmented intervals may be the intervals obtained by segmenting the approximation interval corresponding to the objective function according to the slope change rate of the numerical change curve of the objective function. Within the approximation interval, an interval whose slope change rate lies within the same preset change rate range is segmented according to the preset interval corresponding to that preset change rate range. Different preset change rate ranges do not overlap, and the smaller the upper limit of a preset change rate range, the larger the corresponding preset interval.
  • this part is also intensively segmented, the calculation accuracy will not be greatly improved, but the number of interpolation coefficients will be significantly increased.
  • a larger interval will be used for segmentation, thereby reducing the number of segmentation intervals in this part, thereby reducing the number of interpolation coefficients and ensuring operation accuracy.
  • the sampling location can be preset by the engineer. For example, you can set the origin as the starting point, and generate a sampling position every preset distance to obtain the slope at that position.
  • The number of sampling positions can be set to an integer power of 2, which allows the computer to determine the sampling positions from the high-order part of the binary number alone. This is friendly to hardware implementation and reduces its cost.
  • S304 Divide the total interval consisting of multiple consecutive sampling intervals with slope change rates within the same preset change rate range into segments according to the preset intervals corresponding to the preset change rate range.
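The sampling and segmentation steps described above can be sketched in software as follows. This is a minimal sketch under stated assumptions: the slope is estimated by a central difference, the "slope change rate" of a sampling interval is taken as the absolute slope difference between its two end positions divided by the sampling step, and each preset change rate range is given as a hypothetical `(upper_limit, preset_interval)` pair. None of these specifics, nor the function name, come from the application.

```python
import math

def segment_approximation_interval(f, lo, hi, n_samples, ranges):
    """ranges: (upper_limit, preset_interval) pairs sorted by ascending upper_limit;
    a smaller upper_limit is paired with a larger preset_interval."""
    h = (hi - lo) / n_samples
    xs = [lo + i * h for i in range(n_samples + 1)]        # sampling positions
    eps = h * 1e-3
    slopes = [(f(x + eps) - f(x - eps)) / (2 * eps) for x in xs]
    # Assumed definition of the slope change rate of each sampling interval.
    rates = [abs(slopes[i + 1] - slopes[i]) / h for i in range(n_samples)]

    def range_index(rate):
        for k, (upper, _interval) in enumerate(ranges):
            if rate <= upper:
                return k
        return len(ranges) - 1

    # Merge consecutive sampling intervals that fall in the same rate range.
    groups = []
    for i, rate in enumerate(rates):
        k = range_index(rate)
        if groups and groups[-1][2] == k:
            groups[-1][1] = xs[i + 1]
        else:
            groups.append([xs[i], xs[i + 1], k])

    # Segment each total interval with the preset interval of its range.
    segments = []
    for start, end, k in groups:
        count = max(1, round((end - start) / ranges[k][1]))
        width = (end - start) / count
        for j in range(count):
            segments.append((start + j * width, start + (j + 1) * width))
    return segments

segments = segment_approximation_interval(
    lambda x: 1.0 / x, 1.0, 2.0, 8, [(0.6, 1 / 16), (math.inf, 1 / 32)])
```

For the reciprocal on 1 to 2 with these illustrative thresholds, the steeper-curving half from 1 to 1.5 is cut at 1/32 and the flatter half from 1.5 to 2 at 1/16.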
  • For example, suppose the approximation interval of the objective function is 0 to 1, where all slope change rates in the interval from 0 to 0.5 (inclusive) are within the first preset change rate range, and all slope change rates in the interval from 0.5 (inclusive) to 1 are within the second preset change rate range. The interval from 0 to 0.5 (inclusive) is then segmented by the first interval corresponding to the first preset change rate range, and the interval from 0.5 (inclusive) to 1 is segmented by the second interval corresponding to the second preset change rate range. Assuming that the first preset change rate range is 0 to 50% (inclusive) and the second preset change rate range is 50% (inclusive) to 100%, the first interval is greater than the second interval.
  • the interval from 0 to 0.5 (inclusive) is evenly divided into 64 segments, and the interval from 0.5 (inclusive) to 1 is evenly divided into 128 segments.
  • the number of segmented intervals can be reduced by 25%, thereby reducing the interpolation coefficient storage overhead by 25%.
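The 25% figure follows from simple counting, spelled out below with exact fractions. The comparison baseline is assumed to be uniform segmentation of the whole 0 to 1 interval at the finer width used in the 0.5 to 1 half; the variable names are illustrative.

```python
from fractions import Fraction

coarse_segments = 64                      # segments in 0 to 0.5
fine_segments = 128                       # segments in 0.5 to 1

# Width of one fine segment: half the interval divided into 128 parts.
fine_width = Fraction(1, 2) / fine_segments

# Uniform segmentation of 0 to 1 at the fine width (assumed baseline).
uniform_segments = int(Fraction(1) / fine_width)

nonuniform_segments = coarse_segments + fine_segments
saving = 1 - Fraction(nonuniform_segments, uniform_segments)
```

With one coefficient set stored per segment, the same ratio applies to the coefficient storage.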
  • The total number of intervals in the entire approximation interval and the total number of segmented intervals obtained by segmenting each total interval can each be an integer power of 2. When subsequently determining the segmented interval to which the data belongs, the computer can then determine it from the high-order part of the binary number alone, which is friendly to hardware implementation and reduces its cost.
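To illustrate how power-of-2 counts let the high-order bits select the segment, the sketch below uses the 64 plus 128 layout from the example above. It assumes the data to be processed is an unsigned fixed-point fraction in [0, 1) with a hypothetical width of 16 bits, and that the coefficient table stores the 64 coarse segments first; neither detail is specified in the source.

```python
FRAC_BITS = 16  # assumed width of the fixed-point fraction

def segment_index(x_fixed):
    """Map a fixed-point value in [0, 1) to a table index using only high-order bits."""
    msb = x_fixed >> (FRAC_BITS - 1)
    if msb == 0:
        # 0 to 0.5: 64 segments of width 1/128, selected by the top 7 bits.
        return x_fixed >> (FRAC_BITS - 7)
    # 0.5 to 1: 128 segments of width 1/256, selected by the 7 bits after the MSB.
    return 64 + ((x_fixed >> (FRAC_BITS - 8)) & 0x7F)

first = segment_index(0x0000)
last_coarse = segment_index(0x7FFF)
first_fine = segment_index(0x8000)
last = segment_index(0xFFFF)
```

No comparison against stored boundaries is needed: the most significant bit picks the region and a fixed bit slice picks the segment within it.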
  • the data transmitted from the device that needs to be calculated by the elementary function circuit is usually a floating point number (hereinafter referred to as a floating point number to be processed).
  • For different functions, the approximation intervals may be different, so the data requirements are also different. Therefore, to ensure that the data to be processed that is input to the elementary function circuit can be correctly processed by it, the data to be processed should be located within the approximation interval of the objective function corresponding to the elementary function circuit.
  • the floating point numbers to be processed can be preprocessed to obtain the data to be processed that is located in the approximation interval of the objective function.
  • the preprocessing process for floating point numbers to be processed may include:
  • If the target function belongs to the first function set, the mantissa bits of the floating-point number to be processed are taken to obtain the data to be processed. If the target function belongs to the second function set, the floating-point number to be processed is itself the data to be processed. If the target function belongs to the third function set, the floating-point number is converted to fixed point to obtain the data to be processed.
  • the functions included in the first function set, the second function set and the third function set can be classified by engineers according to the data requirements of each elementary function.
  • The first function set may include reciprocal functions, square root functions, reciprocal square root functions, and logarithmic functions; the second function set may include trigonometric functions (sin functions, cos functions) and erf functions; the third function set may include sigmoid functions, tanh functions, and exponential functions.
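The three preprocessing paths can be sketched as a simple dispatch on the function set. The sketch assumes IEEE 754 single-precision inputs (23 mantissa bits) and a hypothetical fixed-point format with 16 fractional bits; the source does not specify either, and the set names and function names below are illustrative.

```python
import struct

FIRST_SET = {"reciprocal", "sqrt", "rsqrt", "log"}   # take the mantissa bits
SECOND_SET = {"sin", "cos", "erf"}                   # use the float unchanged
THIRD_SET = {"sigmoid", "tanh", "exp"}               # convert to fixed point
FIXED_FRAC_BITS = 16                                 # assumed fixed-point format

def float32_bits(x):
    """Raw 32-bit pattern of x as an IEEE 754 single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def preprocess(func_name, x):
    """Return the data to be processed for the given target function."""
    if func_name in FIRST_SET:
        return float32_bits(x) & 0x007FFFFF          # 23-bit mantissa field
    if func_name in SECOND_SET:
        return x
    if func_name in THIRD_SET:
        return int(round(x * (1 << FIXED_FRAC_BITS)))
    raise ValueError(f"unknown target function: {func_name}")

mantissa = preprocess("reciprocal", 1.5)
unchanged = preprocess("sin", 0.25)
fixed = preprocess("exp", 0.5)
```

Taking only the mantissa confines the first-set inputs to a single binade, which is why one approximation interval suffices for those functions.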
  • the objective function may be one of the reciprocal function, square root function, reciprocal square root function, logarithmic function, exponential function, trigonometric function, sigmoid function, tanh function, and erf function, but not as a restriction.
  • Before preprocessing the floating-point number to be processed, it can also be determined whether it is a canonical floating-point number, so that only canonical floating-point numbers are preprocessed to obtain the data to be processed, which is then output to the elementary function circuit of the target function.
  • For this purpose, it can be determined whether the floating-point number to be processed is a denormalized floating-point number and whether it is a non-numeric number (NaN).
  • If the floating-point number to be processed is neither a denormalized floating-point number nor a non-numeric number, it can be determined to be a canonical floating-point number, and preprocessing is then performed.
  • If the floating-point number to be processed is a denormalized floating-point number, every part of it except the sign bit can be set to 0, thereby obtaining a canonical floating-point number.
  • the obtained canonical floating point number can be preprocessed and output to the elementary function circuit of the target function. In this way, when designing the chip, the additional hardware area and power consumption in the elementary function circuit to support denormalized floating point numbers can be eliminated.
  • If the floating-point number to be processed is a non-numeric number (NaN), it need not be output to the elementary function circuit of the target function for operation; instead, it can be output directly as the final processing result. This spares the elementary function circuit from running piecewise approximation on non-numeric numbers and reduces unnecessary processing overhead.
  • the judgment of non-standard floating-point numbers can be achieved in the following manner: it can be judged whether the exponent bit of the floating-point number to be processed is 0, and whether the mantissa bit is not 0. If the exponent bit of the floating point number to be processed is 0 and the mantissa bit is not 0, it is determined that the floating point number to be processed is a non-canonical floating point number.
  • the judgment of non-numeric numbers can be realized in the following manner: it can be judged whether the exponent bits of the floating point number to be processed are all 1, and whether the mantissa bit is not 0. If the exponent bits of the floating-point number to be processed are all 1 and the mantissa bit is not 0, the floating-point number to be processed is a non-numeric number.
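The two judgments above, together with the flush-to-zero and direct-output handling described earlier, can be sketched on raw bit patterns. The sketch assumes IEEE 754 single precision (1 sign bit, 8 exponent bits, 23 mantissa bits); the source does not name a format, and the function names and return labels are hypothetical.

```python
SIGN_MASK = 0x80000000
EXP_MASK = 0x7F800000
MANT_MASK = 0x007FFFFF

def classify(bits):
    """Classify a 32-bit pattern using only the exponent and mantissa fields."""
    exponent = bits & EXP_MASK
    mantissa = bits & MANT_MASK
    if exponent == 0 and mantissa != 0:
        return "denormal"
    if exponent == EXP_MASK and mantissa != 0:
        return "nan"
    # Per the rule in the text, everything else is treated as canonical.
    return "canonical"

def route(bits):
    """Decide where the value goes and with which bit pattern."""
    kind = classify(bits)
    if kind == "nan":
        return ("output", bits)                 # bypass the function circuit
    if kind == "denormal":
        return ("preprocess", bits & SIGN_MASK) # keep the sign, zero the rest
    return ("preprocess", bits)

denormal_route = route(0x80000001)
nan_route = route(0x7FC00000)
normal_route = route(0x3F800000)
```

Both checks reduce to an all-zeros or all-ones test on the exponent field plus a nonzero test on the mantissa field, which is inexpensive in hardware.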
  • If the arranged elementary function circuit supports operations on denormalized floating-point numbers, then before preprocessing it is sufficient to determine only whether the floating-point number to be processed is a non-numeric number (NaN). If it is, it need not be output to the elementary function circuit of the target function for calculation; it is output directly as the final processing result.
  • If the floating-point number to be processed is a numeric value, it can be preprocessed directly, since the elementary function circuit supports operations on denormalized floating-point numbers; the resulting data to be processed is then output to the elementary function circuit of the target function.
  • Figure 4 shows a function implementation circuit provided in an embodiment of the present application.
  • The function implementation circuit includes different elementary function circuits corresponding to different elementary functions.
  • Each elementary function circuit performs piecewise approximation operations on the data to be processed so as to realize the elementary function corresponding to that circuit.
  • When an elementary function circuit performs a piecewise approximation operation, it retrieves the interpolation coefficients of the segment interval in which the data to be processed lies and uses them for the interpolation operation.
  • For how the segment intervals are determined, refer to the earlier description; it is not repeated here.
  • The elementary function circuit may also include a preprocessing unit.
  • The preprocessing unit preprocesses the floating-point number to be processed to obtain the data to be processed, such that the data to be processed lies within the approximation interval of the target function.
  • The target function is the elementary function corresponding to the elementary function circuit required for the current operation.
  • The preprocessing unit can be specifically configured to:
  • Determine the function set to which the target function belongs. If the target function belongs to the first function set, take the mantissa field of the floating-point number to be processed as the data to be processed. If the target function belongs to the second function set, use the floating-point number to be processed itself as the data to be processed. If the target function belongs to the third function set, convert the floating-point number to fixed point to obtain the data to be processed.
  • The different elementary functions may include at least two of the reciprocal function, square root function, reciprocal square root function, logarithmic function, exponential function, trigonometric functions, sigmoid function, tanh function, and erf function. That is, the function implementation circuit can be provided with elementary function circuits for at least two of these functions.
  • When the elementary functions include both the sigmoid function and the tanh function, the elementary function circuit of the sigmoid function can be reused within the elementary function circuit corresponding to the tanh function.
  • The elementary function circuit may also include a floating-point exception handling unit.
  • The floating-point exception handling unit can be configured as follows:
  • If the floating-point number to be processed is a normalized floating-point number, it is output to the preprocessing unit.
  • If the floating-point number to be processed is a denormalized floating-point number, all of its bits except the sign bit are set to 0 to obtain a normalized floating-point 0, and this normalized floating-point 0 is output to the preprocessing unit.
  • If the floating-point number to be processed is a non-numeric number, it is output directly as the final result and is not passed to the preprocessing unit.
  • Both the preprocessing unit and the floating-point exception handling unit can be implemented as dedicated hardware circuits.
  • The preprocessing unit and the floating-point exception handling unit can also be implemented by a processing unit (such as a CPU (Central Processing Unit) or an MCU (Micro Controller Unit)) running the relevant program instructions.
  • A multiplier and an adder shared by all elementary function circuits can be provided, so that every elementary function circuit performs its operations through the same multiplier and adder. This reduces duplicated hardware and optimizes hardware area and power consumption.
  • The elementary function circuit may be configured as an arithmetic circuit for a quadratic interpolation function.
  • In that case, two sets of shared multipliers and adders can be provided to compute the quadratic interpolation function.
  • For trigonometric functions, a Taylor interpolation function can be used for the interpolation operation; that is, the elementary function circuit of a trigonometric function can be configured according to the Taylor interpolation function.
  • Because the second derivative of a trigonometric function is equal or opposite to the original function, the constant-term coefficient and the quadratic-term coefficient among the interpolation coefficients are equal or opposite to each other. Therefore only one of the two needs to be stored, which further reduces the number of stored interpolation coefficients, saves storage overhead, and improves the performance of the function implementation circuit.
  • To operate on the data to be processed and ensure that the final output is still a floating-point number, the elementary function circuit generally has a function piecewise approximation unit for the interpolation operation and an output value normalization unit for restoring the floating-point format.
  • The function piecewise approximation unit determines the segment interval to which the data to be processed belongs, retrieves the interpolation coefficients of that segment interval from memory, and performs the interpolation operation to obtain the interpolation result.
  • The output value normalization unit uses the sign bit and exponent bits associated with the data to be processed, together with the interpolation result, to restore the interpolation result to a floating-point number and output it.
  • To improve the efficiency of the function piecewise approximation unit, the bits shared by all binary values within a segment interval can serve as the index for looking up the interpolation coefficients, and only the remaining bits need to enter the computation, which reduces the amount of computation.
  • For example, if the interval [0, 1) is divided into 2^6 segment intervals, the first 6 bits of the data to be processed can be taken as the index for looking up the interpolation coefficients.
  • The interpolation operation is performed on the data that remains after removing the first 6 bits, and the first 6 bits are added to the operation result to obtain the final interpolation result.
  • The first 6 bits can also be added to the constant-term coefficient in advance, so that performing the interpolation operation on the remaining data directly yields the final interpolation result, which further reduces the amount of computation.
  • The function implementation circuit includes a floating-point exception handling unit, a preprocessing unit, a function piecewise approximation unit (only one is shown in the figure, but in practice each elementary function has its own function piecewise approximation unit), and an output value normalization unit.
  • The floating-point exception handling unit receives the input floating-point number to be processed and first determines whether it is a denormalized floating-point number. If so, the sign bit is retained and all other bits are set to 0, which converts the number into a normalized floating-point number with value 0; this number is output to the preprocessing unit.
  • If the floating-point number to be processed is not denormalized, the unit then determines whether it is a non-numeric number. If so, the number can be output directly through the output value normalization unit.
  • If the floating-point number to be processed is neither a denormalized floating-point number nor a non-numeric number, it is output to the preprocessing unit.
  • The preprocessing unit performs range reduction on the received floating-point number according to the type of elementary function to be computed, using the principles of computer floating-point representation.
  • For the reciprocal function, the mantissa field of the floating-point number to be processed can be passed directly to the function piecewise approximation unit for piecewise approximation within the range [1, 2).
  • In this case the input of the function piecewise approximation unit is in I1F23 form (a fixed-point format with 1 integer bit and 23 fraction bits).
  • For the logarithmic function and the square root function, the mantissa field of the floating-point number to be processed can likewise be passed directly to the function piecewise approximation unit for piecewise approximation within the range [1, 2).
  • In this case the input of the unit is in I1F23 form.
  • For the reciprocal square root function, the mantissa field of the floating-point number to be processed can also be passed directly to the function piecewise approximation unit for piecewise approximation within the range [1, 2).
  • In this case the input of the unit is in I0F23 form.
  • For the sigmoid function, sigmoid(16) ≈ 1 and sigmoid(-x) = 1 - sigmoid(x).
  • Inputs in the range [0, 16) can therefore be represented in fixed-point form and then passed to the function piecewise approximation unit.
  • In this case the input of the function piecewise approximation unit is in U4F24 form (the integer part of the floating-point number is converted into a 4-bit unsigned integer and the fractional part into 24 fraction bits).
  • For the tanh function, the fixed-point data to be processed can first be multiplied by 2 and then handed to the elementary function circuit of the sigmoid function for processing.
  • Finally, the sigmoid result is transformed accordingly to obtain the value of the tanh function.
  • Alternatively, the floating-point number to be processed can first be multiplied by 2 and then converted to fixed point; the sigmoid result is likewise transformed accordingly to obtain the value of the tanh function.
  • For the reciprocal, square root, reciprocal square root, logarithmic, sigmoid, tanh, and exponential functions, the sign bit and the exponent bits (exp bits) of the floating-point number before preprocessing can be transferred to the output value normalization unit for later use when restoring the result to floating-point format.
  • After the function piecewise approximation unit receives the data to be processed, it first looks up the interpolation coefficients based on that data:
  • For the reciprocal function on the interval [1, 2), first compare the data to be processed with 1.5. When the data is less than 1.5, take its first 7 bits as the index for looking up the interpolation coefficients; when the data is greater than or equal to 1.5, take its first 6 bits as the index.
  • For the cos function approximation within its approximation interval (if the approximation target is the sin function, the data to be processed must first be transformed accordingly), compare the data to be processed with 1 and 1.5. When the data is less than 1, take the first 6 bits as the index for looking up the interpolation coefficients; when the data is between 1 and 1.5, take the first 8 bits as the index; when the data is greater than 1.5, take the first 10 bits as the index.
  • For the sigmoid function approximation on the interval [0, 16), first compare the data to be processed with 4. When the data is less than 4, take its first 8 bits as the index for looking up the interpolation coefficients; when the data is greater than 4, take its first 7 bits as the index.
  • According to the determined index, the constant-term, linear-term, and quadratic-term coefficients are read from the interpolation coefficient table.
  • The interpolation coefficients of elementary functions other than trigonometric functions are obtained by selecting Chebyshev points on the segment interval corresponding to the index and performing quadratic Newton interpolation. For functions with non-uniform segmentation, multiple corresponding segment interpolation coefficient tables need to be prepared. For trigonometric functions, the interpolation coefficients can be obtained by Taylor interpolation at the midpoint of each segment, so that only two kinds of coefficients need to be stored.
  • The function piecewise approximation unit can determine the input of the interpolation function from the data to be processed and then perform the interpolation operation.
  • The input of the interpolation function may be the low-order bits that remain after the bits corresponding to the index are removed from the data to be processed.
  • The function piecewise approximation unit outputs the result of the interpolation function to the output value normalization unit.
  • the output format is I1F24.
  • the output format is I1F26.
  • After receiving the output of the function piecewise approximation unit, the output value normalization unit combines it with the sign bit and exponent bits of the floating-point number to be processed, received from the preprocessing unit, and restores the floating-point format to obtain the final floating-point result.
  • The I1F26 result output by the function piecewise approximation unit can be combined with the exponent-bit data to obtain fixed-point data of type I7F26, which is then converted into a floating-point number for output.
  • The above function implementation method and function implementation circuit can be applied to any chip that needs to implement elementary functions, such as a GPU chip. Accordingly, an embodiment of the present application also provides a chip that has the aforementioned function implementation circuit.
  • An embodiment of the present application also provides an electronic device that has the aforementioned chip.
  • The electronic devices described in the embodiments of the present application can be, but are not limited to, mobile terminals (such as mobile phones and notebook computers), fixed terminals (such as desktop computers), servers, and other devices with data processing requirements.
  • In addition to the chip, the electronic device may have other components.
  • For example, the electronic device can also have an I/O interface 902, a memory 903 (such as ROM (Read-Only Memory) or RAM (Random Access Memory)), and other components, which can be connected through bus 904.
  • The structure in FIG. 9 is only illustrative; the electronic device may include more or fewer components than shown in FIG. 9, or have a different configuration from that shown in FIG. 9.
  • This embodiment also provides a computer-readable storage medium, such as a floppy disk, optical disk, hard disk, flash memory, USB drive, SD (Secure Digital Memory Card) card, or MMC (Multimedia Card) card.
  • One or more programs are stored in the computer-readable storage medium, and the one or more programs can be executed by one or more processing units (such as a CPU or MCU) to implement the above function implementation method. Details are not repeated here.
  • This embodiment also provides a computer program product.
  • The computer program product includes a computer program.
  • When the computer program is executed by a processor, the function implementation method according to the embodiments of the present application is implemented.
  • Relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations.
  • In this document, "multiple" means two or more.
  • This application provides a function implementation method, a function approximation interval segmentation method, equipment and media, and relates to the field of data processing technology.
  • The segment intervals obtained by segmenting the approximation interval corresponding to the target function are determined according to the slope change rate of the value curve of the target function.
  • Within the approximation interval, the region whose slope change rates fall within the same preset change-rate range is segmented at the preset interval corresponding to that range; different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset interval.
  • The function implementation method, approximation interval segmentation method, chip, device, and medium of the present application are reproducible and can be used in a variety of industrial applications.
  • For example, the function implementation method, approximation interval segmentation method, chip, device, and medium of the present application can be used in the field of data processing technology.


Abstract

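The present application provides a function implementation method, a method for segmenting the approximation interval of a function, a device, and a medium, and relates to the field of data processing technology. In the present application, the segment intervals obtained by segmenting the approximation interval corresponding to a target function are determined according to the slope change rate of the value curve of the target function. Within the approximation interval, the region whose slope change rates fall within the same preset change-rate range is segmented at the preset interval corresponding to that range; different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset interval. As a result, the target function need not be uniformly segmented at the smallest interval over the entire approximation interval, which reduces the number of segments and hence the number of stored interpolation coefficients, saving storage overhead and improving chip performance.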

Description

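Function implementation method, approximation interval segmentation method, chip, device, and medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202210441226.X, entitled "Function implementation method, approximation interval segmentation method, chip, device, and medium", filed with the China National Intellectual Property Administration on April 26, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technology, and in particular to a function implementation method, an approximation interval segmentation method, a chip, a device, and a medium.
Background
With the development of three-dimensional graphics, digital signal processing, and artificial intelligence, higher demands are being placed on the rendering quality and speed of GPUs (Graphics Processing Units), on the floating-point capability of digital signal processors, and on the inference speed of artificial intelligence models, and elementary function operation units are indispensable in all of these computations. For example, batch normalization and the sigmoid function in artificial intelligence models require elementary function computations (reciprocal, square root, reciprocal square root, exponential, and so on). At present, chip design faces ever higher requirements on chip area, power consumption, and computation speed. This creates an urgent need for elementary function operation units that are low in power, fast, and small while meeting a specified precision, in order to keep up with the growing demand for chip computing power.
At present, elementary functions in GPUs are mainly computed in one of the following two ways:
Method 1: implement interpolation or approximation over the function's domain in software.
Method 2: provide a dedicated elementary function circuit. The input of the elementary function is first reduced to a certain range by exploiting the properties of computer floating-point representation; this range is then uniformly segmented; the elementary function circuit retrieves the coefficients of the interpolation function for each segment to perform the approximation; finally, the result is normalized.
Method 1 must repeatedly issue floating-point instructions (for example ADD (addition), MUL (multiplication), and FMA (Fused Multiply and Add)) and therefore performs poorly.
Method 2 performs better while preserving computational accuracy, but consumes hardware to store the interpolation coefficients of each segment. The hardware area consumed is proportional to the interpolation order of each segment's interpolation function and to the number of segments. Clearly, the more interpolation coefficients are stored, the greater the storage overhead and the lower the chip performance.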
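Summary
The present application provides a function implementation method, an approximation interval segmentation method, a chip, a device, and a medium, in order to reduce storage overhead and improve device performance.
An embodiment of the present application provides a function implementation method, which may include: obtaining data to be processed; and sending the data to be processed to the elementary function circuit of a target function for a piecewise approximation operation to obtain a processing result. When performing the piecewise approximation operation, the elementary function circuit retrieves, according to the segment interval in which the data to be processed lies, the interpolation coefficients corresponding to that segment interval and performs the interpolation operation. The segment intervals are the intervals obtained by segmenting the approximation interval corresponding to the target function according to the slope change rate of the value curve of the target function. Within the approximation interval, the region whose slope change rates fall within the same preset change-rate range is segmented at the preset interval corresponding to that range. Different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset interval.
In the above implementation, the segment intervals are divided according to the slope change rate of the value curve of the target function, and the smaller the preset change-rate range in which the slope change rate falls, the larger the segment spacing. As a result, the target function need not be uniformly segmented at the smallest interval over the entire approximation interval, which reduces the number of segments and hence the number of stored interpolation coefficients, saving storage overhead and improving chip performance.
In addition, a smaller slope change rate on the value curve indicates that, within the corresponding region, the interpolation functions at different points differ less. Segmenting that region at a larger interval can therefore still meet the required interpolation accuracy, so the solution provided by the present application still satisfies the accuracy requirements of the target function.
Furthermore, the solution provided by the present application can continue to use the various existing elementary function circuits; it is therefore hardware-friendly and easy to deploy in industrial applications.
Optionally, obtaining the data to be processed may include: obtaining a floating-point number to be processed; and preprocessing the floating-point number to be processed to obtain the data to be processed, where the data to be processed lies within the approximation interval.
In the above implementation, preprocessing the floating-point number to be processed places the data to be processed within the approximation interval of the target function, which ensures that the elementary function circuit of the target function can operate correctly.
Optionally, preprocessing the floating-point number to be processed may include: determining the function set to which the target function belongs; if the target function belongs to the first function set, taking the mantissa field of the floating-point number to be processed to obtain the data to be processed; if the target function belongs to the second function set, using the floating-point number to be processed as the data to be processed; and if the target function belongs to the third function set, converting the floating-point number to be processed to fixed point to obtain the data to be processed.
In the above implementation, applying different processing to different types of functions ensures that the output data meets the processing requirements of each type, so that the elementary function circuit can operate correctly.
Optionally, the target function may be one of the reciprocal function, square root function, reciprocal square root function, logarithmic function, exponential function, trigonometric functions, sigmoid function, tanh function, and erf function.
Optionally, the target function is a trigonometric function, and the elementary function circuit of the target function performs the interpolation operation using a Taylor interpolation function.
In practice, because the second derivative of a trigonometric function is equal or opposite to the original function, the constant-term coefficient and the quadratic-term coefficient among the interpolation coefficients are equal or opposite to each other when a Taylor interpolation function is used. Only one of the two therefore needs to be stored, which further reduces the number of stored interpolation coefficients, saves storage overhead, and improves chip performance.
Optionally, before preprocessing the floating-point number to be processed, the method may further include: determining that the floating-point number to be processed is a normalized floating-point number.
Optionally, the method may further include: if the floating-point number to be processed is a denormalized floating-point number, setting all of its bits except the sign bit to 0 to obtain a normalized floating-point 0; and if the floating-point number to be processed is a non-numeric number (NaN), outputting the floating-point number to be processed.
Optionally, the method may further include: if the floating-point number to be processed is neither a denormalized floating-point number nor a non-numeric number, determining that it is a normalized floating-point number and then preprocessing the normalized floating-point number.
Optionally, the method may further include identifying a denormalized floating-point number as follows: checking whether the exponent field of the floating-point number to be processed is 0 and whether its mantissa field is non-zero; if the exponent field is 0 and the mantissa field is not 0, the floating-point number to be processed is determined to be a denormalized floating-point number.
Optionally, the method may further include identifying a non-numeric number as follows: checking whether the exponent bits of the floating-point number to be processed are all 1 and whether its mantissa field is non-zero; if the exponent bits are all 1 and the mantissa field is not 0, the floating-point number to be processed is a non-numeric number.
In the above implementation, a denormalized floating-point number can be converted into a normalized floating-point 0 before it enters the elementary function circuit and is then operated on. In chip design, this eliminates the extra hardware area and power that the elementary function circuit would otherwise need in order to support denormalized floating-point numbers. In addition, a non-numeric number is output directly without being operated on, which spares the elementary function circuit from handling non-numeric numbers during piecewise approximation and avoids unnecessary processing overhead.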
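An embodiment of the present application also provides a method for segmenting the approximation interval of a function, which may include: obtaining the value curve of a target function within the approximation interval corresponding to the target function; obtaining the slope at each sampling position on the approximation curve; determining, from the slopes at the sampling positions, the slope change rate of each sampling interval formed by adjacent sampling positions on the approximation curve; and segmenting each total interval, formed by consecutive sampling intervals whose slope change rates fall within the same preset change-rate range, at the preset interval corresponding to that range. Different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset interval.
In the above implementation, the segment intervals are divided according to the slope change rate of the value curve of the target function, and the smaller the preset change-rate range in which the slope change rate falls, the larger the segment spacing. As a result, the target function need not be uniformly segmented at the smallest interval over the entire approximation interval, which reduces the number of segments and hence the number of stored interpolation coefficients, saving storage overhead and improving chip performance.
In addition, a smaller slope change rate on the value curve indicates that, within the corresponding region, the interpolation functions at different points differ less. Segmenting that region at a larger interval can therefore still meet the required interpolation accuracy, so the solution provided by the present application still satisfies the accuracy requirements of the target function.
An embodiment of the present application also provides a function implementation circuit, which may include different elementary function circuits corresponding to different elementary functions. Each elementary function circuit performs piecewise approximation operations on the data to be processed so as to realize the corresponding elementary function. When performing the piecewise approximation operation, the elementary function circuit retrieves, according to the segment interval in which the data to be processed lies, the interpolation coefficients corresponding to that segment interval and performs the interpolation operation. The segment intervals are the intervals obtained by segmenting the approximation interval corresponding to the target function according to the slope change rate of the value curve of the target function. Within the approximation interval, the region whose slope change rates fall within the same preset change-rate range is segmented at the preset interval corresponding to that range. Different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset interval.
In the above circuit, the segment intervals are divided according to the slope change rate of the value curve of the target function, and the smaller the preset change-rate range in which the slope change rate falls, the larger the segment spacing. As a result, the target function need not be uniformly segmented at the smallest interval over the entire approximation interval, which reduces the number of segments and hence the number of interpolation coefficients stored in the circuit, saving storage overhead and improving circuit performance.
In addition, a smaller slope change rate on the value curve indicates that, within the corresponding region, the interpolation functions at different points differ less. Segmenting that region at a larger interval can therefore still meet the required interpolation accuracy, so the function implementation circuit provided by the present application still satisfies the accuracy requirements of the target function.
Optionally, the function implementation circuit may further include a preprocessing unit configured to preprocess a floating-point number to be processed to obtain the data to be processed, where the data to be processed lies within the approximation interval.
Optionally, the preprocessing unit is specifically configured to: determine the function set to which the target function belongs; if the target function belongs to the first function set, take the mantissa field of the floating-point number to be processed to obtain the data to be processed; if the target function belongs to the second function set, use the floating-point number to be processed as the data to be processed; and if the target function belongs to the third function set, convert the floating-point number to be processed to fixed point to obtain the data to be processed.
Optionally, the different elementary functions include at least two of the reciprocal function, square root function, reciprocal square root function, logarithmic function, exponential function, trigonometric functions, sigmoid function, tanh function, and erf function.
Optionally, when the different elementary functions include the sigmoid function and the tanh function, the elementary function circuit corresponding to the tanh function reuses the elementary function circuit of the sigmoid function.
Since tanh(x) = 2 × sigmoid(2x) - 1, the elementary function circuit of the tanh function in the above circuit is designed to reuse the elementary function circuit of the sigmoid function. This reduces duplicated hardware units while preserving hardware functionality, and optimizes the area and power consumption of the hardware units.
Optionally, the function implementation circuit may further include a floating-point exception handling unit configured to: determine the type of the floating-point number to be processed; if it is a normalized floating-point number, output it to the preprocessing unit; if it is a denormalized floating-point number, set all of its bits except the sign bit to 0 to obtain a normalized floating-point 0 and output the normalized floating-point 0 to the preprocessing unit; and if it is a non-numeric number, output the floating-point number to be processed directly.
Optionally, the different elementary function circuits share a multiplier and an adder.
Since the interpolation operation is computed by invoking a multiplier and an adder, the above circuit provides a multiplier and an adder that can be shared by the different elementary function circuits. The individual elementary function circuits then need fewer multipliers and adders of their own, which effectively reduces duplicated hardware and optimizes hardware area and power consumption.
An embodiment of the present application also provides a chip, which may include any of the function implementation circuits described above.
An embodiment of the present application also provides an electronic device, which may include the above chip.
An embodiment of the present application also provides a computer-readable storage medium storing one or more programs, where the one or more programs can be executed by one or more processing units to implement any of the function implementation methods described above.
An embodiment of the present application also provides a computer program product including a computer program, where the computer program, when executed by a processor, implements any of the function implementation methods described above.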
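Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Figure 1 is a schematic flowchart of a function implementation method provided in an embodiment of the present application;
Figure 2 is an example value curve provided in an embodiment of the present application;
Figure 3 is a schematic flowchart of a method for segmenting the approximation interval of a function provided in an embodiment of the present application;
Figure 4 is a schematic structural diagram of a function implementation circuit provided in an embodiment of the present application;
Figure 5 is a schematic structural diagram of a more specific function implementation circuit provided in an embodiment of the present application;
Figure 6 is a schematic structural diagram of a still more specific function implementation circuit provided in an embodiment of the present application;
Figure 7 is a schematic diagram of the connection logic between the processing units inside an exemplary function implementation circuit provided in an embodiment of the present application;
Figure 8 is a schematic diagram of the processing logic of a preprocessing unit provided in an embodiment of the present application;
Figure 9 is a schematic structural diagram of an example electronic device in an embodiment of the present application.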
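Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments.
To reduce storage overhead and improve performance when implementing elementary functions, an embodiment of the present application provides a function implementation method. Referring to Figure 1, which is a schematic flowchart of the function implementation method provided in the embodiment, the method includes:
S101: Obtain the data to be processed.
S102: Send the data to be processed to the elementary function circuit of the target function for a piecewise approximation operation to obtain the processing result.
When performing the piecewise approximation operation, the elementary function circuit retrieves, according to the segment interval in which the data to be processed lies, the interpolation coefficients corresponding to that segment interval and performs the interpolation operation.
In the embodiments of the present application, the segment intervals may be the intervals obtained by segmenting the approximation interval corresponding to the target function according to the slope change rate of the value curve of the target function. Within the approximation interval, the region whose slope change rates fall within the same preset change-rate range is segmented at the preset interval corresponding to that range. Different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset interval.
It should be understood that, for an elementary function within its approximation interval, a smaller slope change rate on the value curve indicates that the interpolation functions at different points of the corresponding region differ less. For example, in the value curve shown in Figure 2, the part of the curve to the left of the dashed line is close to a straight line, and the change rate of its slope is very small, possibly even close to 0. Within the region corresponding to this part of the curve, the interpolation coefficients at different positions differ very little, or possibly not at all. Segmenting this part densely would do little to improve accuracy but would significantly increase the number of interpolation coefficients. In the embodiments of the present application, this part is instead segmented at a larger interval, which reduces the number of segment intervals in this part, and hence the number of interpolation coefficients, while preserving accuracy.
To segment the approximation interval of each function appropriately, the approximation interval segmentation method shown in Figure 3 can be used:
S301: Obtain the value curve of the target function within the approximation interval corresponding to the target function.
S302: Obtain the slope at each sampling position on the approximation curve.
It should be understood that the sampling positions can be preset by an engineer. For example, starting from the origin, a sampling position can be generated at every preset distance, and the slope is obtained at that position. Optionally, since data in a computer is binary, the number of sampling positions can be set to an integer power of 2. The computer can then determine the sampling position from the high-order bits of the binary number alone, which is hardware-friendly and reduces the cost of the hardware implementation.
S303: From the slopes at the sampling positions on the approximation curve, determine the slope change rate of each sampling interval formed by adjacent sampling positions.
S304: Segment each total interval, formed by consecutive sampling intervals whose slope change rates fall within the same preset change-rate range, at the preset interval corresponding to that range.
Different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset interval.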
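As an example, suppose the approximation interval of the target function is 0 to 1, where all slope change rates within 0 to 0.5 (inclusive) fall within a first preset change-rate range and all slope change rates within 0.5 (exclusive) to 1 fall within a second preset change-rate range. The interval from 0 to 0.5 (inclusive) is then segmented at the first interval corresponding to the first preset change-rate range, and the interval from 0.5 (exclusive) to 1 at the second interval corresponding to the second preset change-rate range. Suppose the first preset change-rate range is 0 to 50% (inclusive) and the second preset change-rate range is 50% (exclusive) to 100%; the first interval is then larger than the second interval. Suppose the first interval is $2^{-7}$ and the second interval is $2^{-8}$; then within the approximation interval, 0 to 0.5 (inclusive) is divided evenly into 64 segments and 0.5 (exclusive) to 1 is divided evenly into 128 segments, for 192 segments in total. Compared with segmenting the entire approximation interval densely (that is, everywhere at $2^{-8}$, giving 256 segments), this reduces the number of segment intervals by 25% and thus reduces the interpolation-coefficient storage overhead by 25%.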
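Steps S301 to S304 can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that the text does not fix: the slope is estimated by a central finite difference, the "slope change rate" of a sampling interval is taken to be the absolute slope difference divided by the interval width, and the function name `segment_by_slope_change`, the threshold 0.6, and the widths $2^{-6}$ and $2^{-7}$ are hypothetical choices made for the example.

```python
import math

def segment_by_slope_change(f, lo, hi, n_samples, rate_limits, widths):
    """Split [lo, hi) into segments whose width depends on the local slope change rate.

    rate_limits: ascending upper bounds of the preset change-rate ranges.
    widths:      segment width used for each range (wider for smaller rates).
    Returns the list of segment breakpoints, including lo and hi.
    """
    step = (hi - lo) / n_samples
    xs = [lo + i * step for i in range(n_samples + 1)]
    # S302: slope at each sampling position (central finite difference, an assumption).
    h = step * 1e-3
    slopes = [(f(x + h) - f(x - h)) / (2 * h) for x in xs]
    # S303: slope change rate of each sampling interval.
    rates = [abs(slopes[i + 1] - slopes[i]) / step for i in range(n_samples)]
    # S304: cut every sampling interval at the width of the range its rate falls in.
    breakpoints = [lo]
    for i, rate in enumerate(rates):
        k = next(j for j, limit in enumerate(rate_limits) if rate <= limit)
        pieces = max(1, round(step / widths[k]))
        for p in range(1, pieces + 1):
            breakpoints.append(xs[i] + p * step / pieces)
    return breakpoints

# Example: 1/x on [1, 2) with 8 sampling intervals and two change-rate ranges.
bps = segment_by_slope_change(lambda x: 1.0 / x, 1.0, 2.0, 8,
                              [0.6, math.inf], [2**-6, 2**-7])
print(len(bps) - 1, "segments instead of", 2**7, "uniform ones")
```

With these hypothetical settings the curve 1/x is cut finely where it bends most (near 1) and coarsely elsewhere, so fewer coefficient sets would need to be stored than with uniform segmentation at the finest width.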
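In the embodiments of the present application, the number of total intervals within the entire approximation interval, and the total number of segment intervals obtained within each total interval, can be integer powers of 2. When the interval to which a piece of data belongs is later determined, the computer can then identify the segment interval from the high-order bits of the binary number alone, which is hardware-friendly and reduces the cost of the hardware implementation.
It should be understood that the above process can be carried out in advance by engineers, for example in a laboratory: the value curves of the various elementary functions within their corresponding approximation intervals are obtained and segmented, and the result is then applied to the chip or circuit.
It should also be understood that, in practice, the data sent by a device for computation by an elementary function circuit is usually a floating-point number (referred to below as the floating-point number to be processed). Different functions may have different approximation intervals and therefore place different requirements on the data. To ensure that the data to be processed that is input to an elementary function circuit can be handled correctly by that circuit, the data to be processed should lie within the approximation interval of the target function corresponding to that circuit. For this purpose, in the embodiments of the present application, the floating-point number to be processed can be preprocessed to obtain data to be processed that lies within the approximation interval of the target function.
As an example, the preprocessing of the floating-point number to be processed may include:
Determine the function set to which the target function belongs. If the target function belongs to the first function set, take the mantissa field of the floating-point number to be processed to obtain the data to be processed. If the target function belongs to the second function set, use the floating-point number to be processed as the data to be processed. If the target function belongs to the third function set, convert the floating-point number to fixed point to obtain the data to be processed.
It should be understood that the functions in the first, second, and third function sets can be classified by engineers according to the data requirements of each elementary function. As an example, the first function set may include the reciprocal function, square root function, reciprocal square root function, and logarithmic function; the second function set may include the trigonometric functions (sin and cos) and the erf function; and the third function set may include the sigmoid function, tanh function, and exponential function.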
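The first and third kinds of preprocessing can be illustrated with a short Python sketch. It assumes IEEE 754 single precision and the I1F23 and U4F24 fixed-point layouts that appear in the detailed example of this description; the helper names `split_float32`, `mantissa_as_i1f23`, and `to_u4f24` are hypothetical, and truncation is assumed as the rounding mode for the fixed-point conversion.

```python
import struct

def split_float32(x):
    """Return (sign, biased_exponent, mantissa_field) of x encoded as IEEE 754 float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def mantissa_as_i1f23(x):
    """First function set: mantissa of a normalized float32 as an I1F23 integer in [1, 2)."""
    _, _, frac = split_float32(x)
    return (1 << 23) | frac          # restore the implicit leading 1

def to_u4f24(x):
    """Third function set (sigmoid-style input): value in [0, 16) as a U4F24 integer."""
    if not 0.0 <= x < 16.0:
        raise ValueError("U4F24 covers [0, 16)")
    return int(x * (1 << 24))        # truncation is an assumption of this sketch

# 6.0 = 1.5 * 2**2, so the mantissa value is 1.5 and the biased exponent is 129.
print(split_float32(6.0), hex(mantissa_as_i1f23(6.0)), hex(to_u4f24(2.5)))
```

For the second function set no conversion is needed, since the floating-point number itself is passed on as the data to be processed.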
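It should be noted that a chip or circuit can be provided in advance with elementary function circuits for several different elementary functions, so as to realize the functionality of those functions. During operation, according to the data processing requirements, the elementary function to be used for a given floating-point number to be processed is already determined when that number is output; that is, the target function is already determined. On this basis, the function set to which the target function belongs can easily be determined.
Optionally, in the embodiments of the present application, the target function may be, without limitation, one of the reciprocal function, square root function, reciprocal square root function, logarithmic function, exponential function, trigonometric functions, sigmoid function, tanh function, and erf function.
In the embodiments of the present application, before the floating-point number to be processed is preprocessed, it can first be determined whether it is a normalized floating-point number, so that only normalized floating-point numbers are preprocessed to obtain the data to be processed, which is then output to the elementary function circuit of the target function.
As an example, in the embodiments of the present application, it can be determined whether the floating-point number to be processed is a denormalized floating-point number and whether it is a non-numeric number (NaN).
If the floating-point number to be processed is neither a denormalized floating-point number nor a non-numeric number, it can be determined to be a normalized floating-point number and is then preprocessed.
If the floating-point number to be processed is a denormalized floating-point number, all of its bits except the sign bit can be set to 0 to obtain a normalized floating-point number. The resulting normalized floating-point number can then be preprocessed and output to the elementary function circuit of the target function. In chip design, this eliminates the extra hardware area and power that the elementary function circuit would otherwise need in order to support denormalized floating-point numbers.
If the floating-point number to be processed is a non-numeric number, it need not be output to the elementary function circuit of the target function for operation; instead, it can be output directly as the final processing result. This spares the elementary function circuit from handling non-numeric numbers during piecewise approximation and avoids unnecessary processing overhead.
In the embodiments of the present application, a denormalized floating-point number can be identified as follows: check whether the exponent field of the floating-point number to be processed is 0 and whether its mantissa field is non-zero. If the exponent field is 0 and the mantissa field is not 0, the floating-point number to be processed is determined to be a denormalized floating-point number.
In the embodiments of the present application, a non-numeric number can be identified as follows: check whether the exponent bits of the floating-point number to be processed are all 1 and whether its mantissa field is non-zero. If the exponent bits are all 1 and the mantissa field is not 0, the floating-point number to be processed is a non-numeric number.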
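The two bit-level checks, and the routing that follows from them, can be sketched as below. The sketch assumes a 32-bit IEEE 754 encoding passed in as an unsigned integer; the function names `classify` and `handle_exceptional` and the string labels are hypothetical. Zeros and infinities are not discussed in the text, so the sketch simply treats every other encoding as going on to preprocessing, which is an assumption.

```python
def classify(bits):
    """Classify a float32 bit pattern using only the exponent and mantissa fields."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 and frac != 0:
        return "denormal"            # exponent field 0, mantissa field non-zero
    if exp == 0xFF and frac != 0:
        return "nan"                 # exponent bits all 1, mantissa field non-zero
    return "normal"                  # everything else (assumption of this sketch)

def handle_exceptional(bits):
    """Return (destination, bits) as the exception handling step would route them."""
    kind = classify(bits)
    if kind == "denormal":
        return "preprocess", bits & 0x80000000   # keep the sign bit, clear the rest
    if kind == "nan":
        return "output", bits                    # bypass the function circuit
    return "preprocess", bits

print(handle_exceptional(0x80000001))  # a negative denormal becomes negative zero
```

Flushing denormals to a signed zero in this way means the downstream approximation logic only ever sees the implicit-leading-one form of the mantissa.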
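Of course, in the embodiments of the present application, if the deployed elementary function circuit supports operations on denormalized floating-point numbers, it is also possible, before preprocessing, to check only whether the floating-point number to be processed is a non-numeric number. If it is, it need not be output to the elementary function circuit of the target function; it is output directly as the final processing result. If the floating-point number to be processed is a numeric value, then because the elementary function circuit supports denormalized floating-point numbers, the number can be preprocessed to obtain the data to be processed, which is then output to the elementary function circuit of the target function.
Referring to Figure 4, which shows a function implementation circuit provided in an embodiment of the present application, the function implementation circuit includes different elementary function circuits corresponding to different elementary functions.
In the embodiments of the present application, each elementary function circuit performs piecewise approximation operations on the data to be processed so as to realize the elementary function corresponding to that circuit.
As described above, when an elementary function circuit performs a piecewise approximation operation, it retrieves, according to the segment interval in which the data to be processed lies, the interpolation coefficients corresponding to that segment interval and performs the interpolation operation. For how the segment intervals are determined, refer to the earlier description; it is not repeated here.
Referring to Figure 5, the elementary function circuit may also include a preprocessing unit. The preprocessing unit preprocesses the floating-point number to be processed to obtain the data to be processed, such that the data to be processed lies within the approximation interval of the target function. The target function is the elementary function corresponding to the elementary function circuit required for the current operation.
As an example, the preprocessing unit can be specifically configured to:
Determine the function set to which the target function belongs. If the target function belongs to the first function set, take the mantissa field of the floating-point number to be processed to obtain the data to be processed. If the target function belongs to the second function set, use the floating-point number to be processed as the data to be processed. If the target function belongs to the third function set, convert the floating-point number to fixed point to obtain the data to be processed.
For the configuration of the first, second, and third function sets, refer to the earlier description; it is not repeated here.
It should be understood that, in the embodiments of the present application, the different elementary functions may include at least two of the reciprocal function, square root function, reciprocal square root function, logarithmic function, exponential function, trigonometric functions, sigmoid function, tanh function, and erf function. That is, the function implementation circuit can be provided with elementary function circuits for at least two of these functions.
When the elementary functions include the sigmoid function and the tanh function, that is, when the function implementation circuit needs elementary function circuits for both sigmoid and tanh, the elementary function circuit corresponding to the tanh function can reuse the elementary function circuit of the sigmoid function. This is because tanh(x) = 2 × sigmoid(2x) - 1. Reusing the sigmoid circuit inside the tanh circuit reduces duplicated hardware units while preserving hardware functionality, and optimizes the area and power consumption of the hardware units.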
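The identity behind this reuse is easy to check numerically. The following Python sketch is only a software illustration of the identity, not of the circuit; the function names are hypothetical, and `sigmoid` here stands in for whatever the shared sigmoid circuit computes.

```python
import math

def sigmoid(x):
    """Reference sigmoid; in hardware this role is played by the sigmoid circuit."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh computed from a single sigmoid evaluation: scale the input, then rescale."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

for x in (-3.0, -0.5, 0.0, 0.7, 2.5):
    print(x, tanh_via_sigmoid(x), math.tanh(x))
```

Only an input doubling and an output scale-and-shift are added around the sigmoid evaluation, which is why sharing the circuit is attractive.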
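Referring to Figure 6, the elementary function circuit may also include a floating-point exception handling unit. The floating-point exception handling unit can be configured to:
Determine the type of the floating-point number to be processed. If it is a normalized floating-point number, output it to the preprocessing unit. If it is a denormalized floating-point number, set all of its bits except the sign bit to 0 to obtain a normalized floating-point 0, and output this normalized floating-point 0 to the preprocessing unit. If it is a non-numeric number, output it directly as the final output result without passing it to the preprocessing unit.
It should be understood that, in the embodiments of the present application, both the preprocessing unit and the floating-point exception handling unit can be implemented as dedicated hardware circuits. However, they can also be implemented by a processing unit (such as a CPU (Central Processing Unit) or an MCU (Micro Controller Unit)) running the relevant program instructions. The embodiments of the present application place no restriction on how the preprocessing unit and the floating-point exception handling unit are implemented.
In the embodiments of the present application, a multiplier and an adder that can be shared by all elementary function circuits can be provided, so that every elementary function circuit performs its operations through the same multiplier and adder. This effectively reduces duplicated hardware and optimizes hardware area and power consumption.
It should be understood that, in the embodiments of the present application, the elementary function circuit can be configured as an arithmetic circuit for a quadratic interpolation function. In that case, two sets of shared multipliers and adders can be provided to compute the quadratic interpolation function.
It should also be understood that, in the embodiments of the present application, the arithmetic circuit for the quadratic interpolation function can be configured according to Horner's rule, so that the quadratic interpolation polynomial is computed as $y = C_0 + (C_2 x + C_1)x$. Compared with the conventional way of computing a quadratic interpolation polynomial, $y = C_0 + C_1 x + C_2 x^2$, the amount of computation drops from three multiplications and two additions to two multiplications and two additions. One multiplier invocation is saved while the result remains accurate, which effectively increases computation speed.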
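The operation counts of the two evaluation orders can be seen directly in a small Python sketch; the function names `quad_naive` and `quad_horner` are hypothetical and the coefficients in the example are arbitrary.

```python
def quad_naive(c0, c1, c2, x):
    """Direct evaluation: 3 multiplications (c1*x, c2*x, (c2*x)*x) and 2 additions."""
    return c0 + c1 * x + c2 * x * x

def quad_horner(c0, c1, c2, x):
    """Horner's rule: 2 multiplications and 2 additions."""
    return c0 + (c2 * x + c1) * x

print(quad_naive(1.0, 2.0, 3.0, 0.5), quad_horner(1.0, 2.0, 3.0, 0.5))
```

Each Horner step is one multiply followed by one add, which matches the two shared multiplier and adder pairs mentioned above.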
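It should be noted that, when the deployed elementary function circuits include one for a trigonometric function, a Taylor interpolation function can be used for the interpolation operation; that is, the elementary function circuit of the trigonometric function can be configured according to the Taylor interpolation function. Because the second derivative of a trigonometric function is equal or opposite to the original function, the constant-term coefficient and the quadratic-term coefficient among the interpolation coefficients are then equal or opposite to each other. Only one of the two therefore needs to be stored, which further reduces the number of stored interpolation coefficients, saves storage overhead, and improves the performance of the function implementation circuit.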
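A minimal Python sketch of this saving for the cos function is given below. It assumes a second-order Taylor expansion about the segment midpoint; the names `cos_taylor_coeffs` and `cos_segment` are hypothetical. Note that in the standard Taylor formula the quadratic coefficient is $-\cos(m)/2$, so it is related to the constant term by a factor of $-1/2$; the sketch assumes this factor is applied when the coefficient is used (in binary hardware a halving could be a one-bit shift), which is an interpretation rather than something the text states.

```python
import math

def cos_taylor_coeffs(mid):
    """Coefficients stored for one segment: only the constant and linear terms."""
    return math.cos(mid), -math.sin(mid)

def cos_segment(x, mid):
    """Quadratic Taylor approximation of cos near mid, deriving c2 from the stored c0."""
    c0, c1 = cos_taylor_coeffs(mid)
    c2 = -0.5 * c0                   # cos'' = -cos, so no third stored coefficient
    d = x - mid
    return c0 + (c2 * d + c1) * d    # Horner form

mid = 0.5 + 2**-7                    # midpoint of the hypothetical segment [0.5, 0.5 + 2**-6)
print(cos_segment(0.5, mid), math.cos(0.5))
```

Storing two values per segment instead of three reduces the coefficient table for the trigonometric functions by a third in this sketch.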
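It should be understood that, in the embodiments of the present application, interpolation functions of other orders can be used instead of a quadratic interpolation function; the embodiments of the present application place no restriction on this.
It should also be noted that, to operate on the data to be processed and ensure that the final output is still a floating-point number, an elementary function circuit generally has a function piecewise approximation unit for the interpolation operation and an output value normalization unit for restoring the floating-point format.
Of course, the function implementation circuit as a whole can also have a single output value normalization unit shared by all elementary function circuits, which effectively reduces duplicated hardware and optimizes hardware area and power consumption.
In the embodiments of the present application, the function piecewise approximation unit can be specifically used to determine, from the data to be processed, the segment interval to which the data belongs, retrieve the interpolation coefficients of that segment interval from memory, and then perform the interpolation operation to obtain the interpolation result.
The output value normalization unit can be specifically used to restore the interpolation result to a floating-point number, using the sign bit and exponent bits associated with the data to be processed together with the interpolation result, and to output it.
In the embodiments of the present application, to improve the efficiency of the function piecewise approximation unit, the bits shared by all binary values within a segment interval can serve as the index for looking up the interpolation coefficients, and only the remaining bits need to enter the computation, which reduces the amount of computation.
As an example, suppose the interval [0, 1) is divided into $2^6$ segment intervals. The first 6 bits of the data to be processed can then be taken as the index for looking up the interpolation coefficients. The interpolation operation is performed on the data that remains after removing the first 6 bits, and the first 6 bits are added to the operation result to obtain the final interpolation result. It should be understood that the first 6 bits can also be added to the constant-term coefficient in advance, so that performing the interpolation operation on the remaining data directly yields the final interpolation result, which further reduces the amount of computation.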
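The index-plus-remainder scheme can be sketched in Python for $2^x$ on [0, 1) with $2^6$ uniform segments. Several things here are assumptions made for illustration: the input is an I0F23 integer, the table holds second-order Taylor coefficients taken at each segment start (the description itself uses other interpolation schemes), and the contribution of the index bits is folded into the constant term in advance. The names `split_index`, `TABLE`, and `exp2_fraction` are hypothetical.

```python
import math

FRAC_BITS = 23      # I0F23 input: value = n / 2**23, in [0, 1)
INDEX_BITS = 6      # 2**6 = 64 uniform segments

def split_index(n):
    """Split an I0F23 integer into (segment index, low-order remainder)."""
    shift = FRAC_BITS - INDEX_BITS
    return n >> shift, n & ((1 << shift) - 1)

# Hypothetical coefficient table: Taylor coefficients of 2**x at each segment start,
# so the effect of the index bits is already contained in the constant term.
LN2 = math.log(2.0)
TABLE = []
for i in range(1 << INDEX_BITS):
    base = 2.0 ** (i / (1 << INDEX_BITS))
    TABLE.append((base, base * LN2, base * LN2 * LN2 / 2.0))

def exp2_fraction(n):
    """Approximate 2**(n / 2**23) from the table and the remainder bits only."""
    index, rem = split_index(n)
    c0, c1, c2 = TABLE[index]
    d = rem / (1 << FRAC_BITS)       # offset from the segment start
    return c0 + (c2 * d + c1) * d

n = int(0.3 * 2**23)
print(split_index(n), exp2_fraction(n), 2.0 ** (n / 2**23))
```

Because the segment count is a power of two, finding the segment is a shift and a mask, with no comparison against segment boundaries.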
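Below, to facilitate understanding of the solution of the present application, a more specific implementation is described as an example:
Referring to Figure 7, the function implementation circuit includes a floating-point exception handling unit, a preprocessing unit, a function piecewise approximation unit (only one is shown in the figure, but in practice each elementary function has its own function piecewise approximation unit), and an output value normalization unit.
When a data operation is required:
The floating-point exception handling unit receives the input floating-point number to be processed and first determines whether it is a denormalized floating-point number. If it is, the sign bit of the floating-point number to be processed is retained and all other bits are set to 0, which converts it into a normalized floating-point number (with value 0); this number is output to the preprocessing unit.
If the floating-point number to be processed is not a denormalized floating-point number, the unit determines whether it is a non-numeric number (NaN). If it is, the floating-point number to be processed can be output directly through the output value normalization unit.
If the floating-point number to be processed is neither a denormalized floating-point number nor a non-numeric number, it is output to the preprocessing unit.
Referring to Figure 8, the preprocessing unit performs range reduction on the received floating-point number to be processed according to the type of elementary function to be computed, using the principles of computer floating-point representation.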
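Suppose the floating-point number to be processed, before preprocessing, is expressed as
Figure PCTCN2022107166-appb-000001
(where $S_x$, $E_x$, and $X$ are called the sign bit, the exponent bits, and the mantissa bits, respectively). The reduction rules are then as follows:
For the reciprocal function, we have
Figure PCTCN2022107166-appb-000002
so the mantissa bits of the floating-point number to be processed can be passed directly to the function piecewise approximation unit for piecewise approximation within the range [1, 2). In this case the input of the function piecewise approximation unit is in I1F23 form (a fixed-point format with 1 integer bit and 23 fraction bits).
For the exponential function, we have
Figure PCTCN2022107166-appb-000003
with $F \in [0, 1)$. The floating-point number to be processed can be converted from floating-point form into the fixed-point form I8F23 (a fixed-point format with 8 integer bits and 23 fraction bits), and the fraction part of the fixed-point number is passed to the function piecewise approximation unit for piecewise approximation within the range [0, 1). In this case the input of the function piecewise approximation unit is in I0F23 form (a fixed-point format with 0 integer bits and 23 fraction bits).
For the logarithmic function, we have $\log_2(M_x) = \log_2(X) + E_x$ with $X \in [1, 2)$, so the mantissa bits of the floating-point number to be processed can be passed directly to the function piecewise approximation unit for piecewise approximation within the range [1, 2). In this case the input of the function piecewise approximation unit is in I1F23 form.
For the square root function, when the exponent $E_x$ of the floating-point number to be processed is even, we have
Figure PCTCN2022107166-appb-000004
with $X \in [1, 2)$; when the exponent $E_x$ of the floating-point number to be processed is odd, we have
Figure PCTCN2022107166-appb-000005
with $X \in [1, 2)$. In both cases the mantissa bits of the floating-point number to be processed can be passed directly to the function piecewise approximation unit for piecewise approximation within the range [1, 2). In this case the input of the function piecewise approximation unit is in I1F23 form.
For the reciprocal square root function, when the exponent $E_x$ of the floating-point number to be processed is even, we have
Figure PCTCN2022107166-appb-000006
with $X \in [1, 2)$; when the exponent $E_x$ of the floating-point number to be processed is odd, we have
Figure PCTCN2022107166-appb-000007
with $X \in [1, 2)$. In both cases the mantissa bits of the floating-point number to be processed can be passed directly to the function piecewise approximation unit for piecewise approximation within the range [1, 2). In this case the input of the function piecewise approximation unit is in I0F23 form.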
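For trigonometric functions, $\sin/\cos(M_x) = \sin/\cos(x)$,
Figure PCTCN2022107166-appb-000008
and the input, already restricted to the range
Figure PCTCN2022107166-appb-000009
is passed directly to the function piecewise approximation unit. The input of the function piecewise approximation unit is then the standard computer floating-point representation, that is, S1E8M23 form (floating-point form, with no processing).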
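For the erf function, erf(±3.5) ≈ ±1 and erf(-x) = -erf(x), so inputs in the range [0,3.5) can be passed to the function piecewise approximation unit. The input of the function piecewise approximation unit is then the standard computer floating-point representation, that is, S1E8M23 form.
For the sigmoid function, sigmoid(16) ≈ 1 and sigmoid(-x) = 1 - sigmoid(x), so inputs in the range [0,16) can be converted to fixed point and passed to the function piecewise approximation unit. The input of the function piecewise approximation unit is then in U4F24 form (the integer part of the floating-point number is converted to a 4-bit unsigned integer and the fractional part to 24 fraction bits).
For the tanh function, tanh(x) = 2*sigmoid(2x) - 1, so the elementary function circuit of the sigmoid function can be shared directly. In the computation, the fixed-point data to be processed may first be multiplied by 2, then handed to the sigmoid elementary function circuit, and the sigmoid result is finally transformed accordingly to obtain the value of tanh. Alternatively, the floating-point number to be processed may first be multiplied by 2 and then converted to fixed point to obtain the data to be processed, which is handed to the sigmoid elementary function circuit; the sigmoid result is then transformed accordingly to obtain the value of tanh.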
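A minimal sketch of the sharing between tanh and sigmoid is given below. The `sigmoid` function stands in for the shared circuit and uses the symmetry sigmoid(-x) = 1 - sigmoid(x) together with saturation at 16; both function names are illustrative only.

```python
import math

def sigmoid(x):
    """Stand-in for the shared sigmoid circuit, evaluated only on [0, 16)."""
    if x < 0:
        return 1.0 - sigmoid(-x)      # symmetry folds negative inputs
    if x >= 16.0:
        return 1.0                    # saturates, since sigmoid(16) is about 1
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh(x) = 2 * sigmoid(2x) - 1, reusing the sigmoid path."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

inputs = [-3.0, -0.5, 0.0, 0.7, 2.5]
outputs = [tanh_via_sigmoid(x) for x in inputs]
```

The pre-scaling by 2 and the final affine step are the only tanh-specific work; everything else runs on the sigmoid path.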
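In the embodiments of this application, for the reciprocal, square root, reciprocal square root, logarithmic, sigmoid, tanh, and exponential functions, the contents of the sign bit and the exponent (exp) bits of the floating-point number before preprocessing may be transmitted to the output value normalization unit for later use when the result is restored to floating-point format.
After receiving the data to be processed, the function piecewise approximation unit first looks up the interpolation coefficients according to that data:
For the reciprocal function over [1,2), the data to be processed is first compared with 1.5. If it is less than 1.5, its first 7 bits are used as the index for looking up the interpolation coefficients; if it is greater than or equal to 1.5, its first 6 bits are used as the index.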
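The two-granularity lookup for the reciprocal function might look like the sketch below. It assumes that the index bits are taken from the 23 fraction bits that follow the integer bit of the I1F23 input; the text does not state whether the integer bit is counted, so this is an assumption. The table labels "fine" and "coarse" and the function name are hypothetical.

```python
FRAC_BITS = 23
THRESHOLD = 3 << (FRAC_BITS - 1)   # 1.5 in I1F23

def reciprocal_lookup_key(x_i1f23):
    """Return (table name, index) for X in [1, 2) given as a 24-bit I1F23 integer."""
    fraction = x_i1f23 & ((1 << FRAC_BITS) - 1)
    if x_i1f23 < THRESHOLD:
        return "fine", fraction >> (FRAC_BITS - 7)     # segments of width 2**-7
    return "coarse", fraction >> (FRAC_BITS - 6)       # segments of width 2**-6

key_low = reciprocal_lookup_key(int(1.25 * (1 << FRAC_BITS)))
key_high = reciprocal_lookup_key(int(1.75 * (1 << FRAC_BITS)))
```

Under this assumption the fine table covers [1,1.5) with 64 entries and the coarse table covers [1.5,2) with 32 entries, 96 in total instead of the 128 that a uniform 7-bit split would need.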
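For the exponential function over [0,1), the first 6 bits of the data to be processed are used as the index for looking up the interpolation coefficients.
For approximating the logarithmic function over [1,2), the first 6 bits of the data to be processed are used as the index for looking up the interpolation coefficients.
For approximating the square root function over [1,2), the first 5 bits of the data to be processed are used as the index for looking up the interpolation coefficients.
For approximating the reciprocal square root function over [1,2), the first 6 bits of the data to be processed are used as the index for looking up the interpolation coefficients.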
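For approximating the cos function over the interval
Figure PCTCN2022107166-appb-000010
(if the approximation target is the sin function, the data to be processed must first undergo the transformation
Figure PCTCN2022107166-appb-000011
), the data to be processed is compared with 1 and 1.5. If it is less than 1, its first 6 bits are used as the index for looking up the interpolation coefficients; if it is between 1 and 1.5, its first 8 bits are used as the index; if it is greater than 1.5, its first 10 bits are used as the index.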
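For approximating the erf function over [0,3.5), the handling is similar to that for trigonometric functions: the split points $\{0, 2^{-10}, 2^{-9}, \ldots, 2^{0}, 2^{1}\}$ are set according to the rules of computer floating-point representation, and in the resulting intervals the first {0,0,0,0,1,2,2,3,3,4,4,5,6} bits, respectively, are used as the index for looking up the interpolation coefficients.
For approximating the sigmoid function over [0,16), the data to be processed is first compared with 4. If it is less than 4, its first 8 bits are used as the index for looking up the interpolation coefficients; if it is greater than 4, its first 7 bits are used as the index.
The constant-term, linear-term, and quadratic-term coefficients are fetched from the interpolation coefficient table using the determined index. For elementary functions other than trigonometric functions, the interpolation coefficients are obtained by quadratic Newton interpolation at Chebyshev points chosen on the segment interval corresponding to that index; for non-uniformly segmented functions, the corresponding multiple segment interpolation coefficient tables must be prepared. For trigonometric functions, the interpolation coefficients may instead be obtained by Taylor interpolation at the segment midpoint, so that only two kinds of coefficients need to be stored.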
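One way to generate such coefficients offline is sketched below: three Chebyshev nodes are placed on a segment, a quadratic is built with Newton divided differences, and the result is expanded in powers of the offset from the left edge of the segment. The function names, the choice of the left edge as origin, and the sample segment are assumptions made for illustration.

```python
import math

def chebyshev_nodes(lo, hi, n=3):
    """Chebyshev points of the first kind mapped onto [lo, hi]."""
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]

def newton_quadratic(f, lo, hi):
    """Return (c0, c1, c2) so that f(lo + t) is approximated by c0 + c1*t + c2*t*t."""
    x0, x1, x2 = chebyshev_nodes(lo, hi)
    f0, f1, f2 = f(x0), f(x1), f(x2)
    d01 = (f1 - f0) / (x1 - x0)              # first divided differences
    d12 = (f2 - f1) / (x2 - x1)
    d012 = (d12 - d01) / (x2 - x0)           # second divided difference
    # Newton form f0 + d01*(x-x0) + d012*(x-x0)*(x-x1), expanded in t = x - lo.
    a, b = x0 - lo, x1 - lo
    return f0 - d01 * a + d012 * a * b, d01 - d012 * (a + b), d012

lo, hi = 1.0, 1.0 + 2.0 ** -6                # hypothetical segment of the reciprocal
c0, c1, c2 = newton_quadratic(lambda x: 1.0 / x, lo, hi)
t = 1.01 - lo
approx = c0 + (c2 * t + c1) * t
```

Chebyshev nodes spread the interpolation error more evenly across the segment than equally spaced nodes, which is presumably why they are chosen here.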
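After the constant-term, linear-term, and quadratic-term coefficients have been obtained, the function piecewise approximation unit determines the input of the interpolation function from the data to be processed and then performs the interpolation.
For example, the input of the interpolation function may be the low-order truncation of the data to be processed, that is, the part of the data that remains after the bits corresponding to the index are removed.
Horner's rule is then used to rewrite the original formula as $y = C_0 + (C_2 x + C_1)x$, which is evaluated to obtain the final interpolation result.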
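Horner's form needs two multiplications and two additions. The sketch below evaluates it in integer fixed-point arithmetic, assuming all operands share 23 fraction bits; the actual operand widths of the circuit are not specified at this point in the text, so `FRAC` and `horner_fixed` are illustrative.

```python
FRAC = 23  # assumed common number of fraction bits

def to_fixed(value):
    """Convert a real value to an integer scaled by 2**FRAC."""
    return int(round(value * (1 << FRAC)))

def horner_fixed(c0, c1, c2, x):
    """Evaluate c0 + (c2*x + c1)*x with every operand scaled by 2**FRAC."""
    acc = (c2 * x) >> FRAC            # c2 * x, rescaled after the multiply
    acc = ((acc + c1) * x) >> FRAC    # (c2*x + c1) * x
    return acc + c0

y_fixed = horner_fixed(to_fixed(1.0), to_fixed(-2.0), to_fixed(3.0), to_fixed(0.5))
y = y_fixed / (1 << FRAC)
```

With these exactly representable inputs the result is 1 - 2(0.5) + 3(0.25) = 0.75.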
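Finally, the function piecewise approximation unit outputs the result of the interpolation function to the output value normalization unit.
It should be understood that different elementary functions have different output forms. For example:
For the reciprocal, exponential, square root, and sigmoid functions, the output form is I1F24. For the logarithmic, trigonometric, reciprocal square root, and erf functions, the output form is I1F26.
After receiving the output of the function piecewise approximation unit, the output value normalization unit combines it with the sign-bit and exponent-bit data of the floating-point number to be processed, passed on by the preprocessing unit, and restores the format according to the floating-point specification to obtain the final floating-point result.
For example, for the logarithmic and trigonometric functions, the I1F26 result output by the function piecewise approximation unit may be combined with the exponent-bit data to obtain fixed-point data of type I7F26, which is converted to a floating-point number and output. For the tanh function, the sigmoid result may simply be shifted right by 1 bit and have the leading 1 of its most significant bit removed to obtain the final floating-point result for output.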
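For the logarithm, the combination step can be sketched from the identity $\log_2(M_x) = \log_2(X) + E_x$: the exponent becomes the integer part of a fixed-point number whose 26 fraction bits come from the approximation of $\log_2(X)$. The function name is hypothetical, and Python's unbounded integers stand in for the I7F26 register.

```python
import math

FRAC_BITS = 26  # fraction bits of the I1F26 approximation result

def log2_from_parts(exponent, log2_x_fixed):
    """Combine E_x with the fixed-point log2(X) and convert to a float."""
    fixed = (exponent << FRAC_BITS) + log2_x_fixed   # exponent is the integer part
    return fixed / (1 << FRAC_BITS)

# 10.0 = 1.25 * 2**3, so E_x = 3 and X = 1.25.
log2_x_fixed = round(math.log2(1.25) * (1 << FRAC_BITS))
result = log2_from_parts(3, log2_x_fixed)
```

The same addition works for negative exponents, for example 0.375 = 1.5 * 2**-2.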
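It should be understood that, in the embodiments of this application, the above function implementation method and function implementation circuit can be applied to chips that need to implement elementary functions, such as GPU chips. The embodiments of this application therefore also provide a chip containing the aforementioned function implementation circuit.
In addition, the embodiments of this application also provide an electronic device containing the aforementioned chip. It should be understood that the electronic device described in the embodiments of this application may be, but is not limited to, a mobile terminal (such as a mobile phone or laptop), a fixed terminal (such as a desktop computer), a server, or another device with data processing needs.
It can be understood that, besides the aforementioned chip, the electronic device may have other components. For example, as shown in Figure 9, in addition to the chip 901, the electronic device may have components such as an I/O interface 902 and a memory 903 (such as ROM (Read-Only Memory) or RAM (Random Access Memory)), which may be connected via a bus 904.
It should be understood that the structure shown in Figure 9 is only illustrative; the electronic device may include more or fewer components than shown in Figure 9, or have a configuration different from that shown in Figure 9.
This embodiment further provides a computer-readable storage medium, such as a floppy disk, optical disc, hard disk, flash memory, USB drive, SD (Secure Digital Memory Card) card, or MMC (Multimedia Card) card. The computer-readable storage medium stores one or more programs that can be executed by one or more processing units (such as a CPU or MCU) to implement the above function implementation method. Details are not repeated here.
This embodiment further provides a computer program product comprising a computer program which, when executed by a processor, implements the function implementation method according to the embodiments of this application.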
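In the embodiments provided in this application, it should be understood that the disclosed circuits and methods may be implemented in other ways. The embodiments described above are merely illustrative.
In this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations.
In this document, "multiple" means two or more.
The above are merely embodiments of this application and are not intended to limit its scope of protection. For those skilled in the art, various modifications and changes to this application are possible. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.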
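Industrial Applicability
This application provides a function implementation method, a method for segmenting the approximation interval of a function, a device, and a medium, and relates to the field of data processing technology. In this application, the segment intervals obtained by segmenting the approximation interval of the target function are determined by the slope change rate of the value curve of the target function. Within the approximation interval, an interval whose slope change rate lies within one preset change-rate range is segmented at the preset spacing corresponding to that range; different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset spacing. As a result, the entire approximation interval of the target function does not have to be segmented uniformly at the smallest spacing, which reduces the number of segments and hence the number of stored interpolation coefficients, saves storage overhead, and improves chip performance.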
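The segmentation rule summarized above can be sketched as follows. The sketch samples the slope at equally spaced positions, uses the finite difference of adjacent slopes as the slope change rate of each sample interval, and cuts each run of sample intervals that fall into the same range at that range's spacing. The threshold 0.6, the spacings, the sample count, and all function names are hypothetical choices for illustration, not values from the application.

```python
def cut(a, b, spacing):
    """Return the segment boundaries after a, up to and including b."""
    count = max(1, round((b - a) / spacing))
    return [a + (b - a) * k / count for k in range(1, count + 1)]

def segment_interval(df, lo, hi, n_samples, ranges):
    """ranges: (upper_limit, spacing) pairs sorted by ascending upper_limit."""
    step = (hi - lo) / n_samples
    xs = [lo + i * step for i in range(n_samples + 1)]
    slopes = [df(x) for x in xs]

    def spacing_for(rate):
        for upper, spacing in ranges:
            if rate <= upper:
                return spacing
        return ranges[-1][1]

    boundaries, run_start, run_spacing = [lo], lo, None
    for i in range(n_samples):
        rate = abs(slopes[i + 1] - slopes[i]) / step   # slope change rate
        spacing = spacing_for(rate)
        if run_spacing is None:
            run_spacing = spacing
        elif spacing != run_spacing:                   # range changed: close the run
            boundaries += cut(run_start, xs[i], run_spacing)
            run_start, run_spacing = xs[i], spacing
    return boundaries + cut(run_start, hi, run_spacing)

# Reciprocal on [1, 2): its slope is -1/x**2; a smaller upper limit gets a larger spacing.
ranges = [(0.6, 2.0 ** -6), (float("inf"), 2.0 ** -7)]
boundaries = segment_interval(lambda x: -1.0 / x ** 2, 1.0, 2.0, 16, ranges)
```

With these hypothetical settings the split falls at 1.5 and the interval is divided into 96 segments, compared with 128 for a uniform split at the finer spacing.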
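In addition, it can be understood that the function implementation method, approximation interval segmentation method, chip, device, and medium of this application are reproducible and can be used in a variety of industrial applications. For example, they can be used in the field of data processing technology.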

Claims (19)

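  1. A function implementation method, characterized by comprising:
    obtaining data to be processed;
    sending the data to be processed to an elementary function circuit of a target function for a piecewise approximation operation to obtain a processing result;
    wherein, when performing the piecewise approximation operation, the elementary function circuit calls, according to the segment interval in which the data to be processed lies, the interpolation coefficients corresponding to that segment interval to perform an interpolation operation;
    the segment intervals are the intervals obtained by segmenting the approximation interval corresponding to the target function according to the slope change rate of the value curve of the target function; wherein, within the approximation interval, an interval whose slope change rate lies within one preset change-rate range is segmented at the preset spacing corresponding to that preset change-rate range; wherein different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset spacing.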
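  2. The function implementation method according to claim 1, characterized in that obtaining the data to be processed comprises:
    obtaining a floating-point number to be processed;
    determining the function set to which the target function belongs;
    if the target function belongs to a first function set, taking the mantissa bits of the floating-point number to be processed to obtain the data to be processed;
    if the target function belongs to a second function set, taking the floating-point number to be processed as the data to be processed;
    if the target function belongs to a third function set, converting the floating-point number to be processed to fixed point to obtain the data to be processed.
  3. The function implementation method according to claim 2, characterized in that, before determining the function set to which the target function belongs, the method further comprises:
    determining that the floating-point number to be processed is a normal floating-point number.
  4. The function implementation method according to claim 3, characterized in that the method further comprises:
    if the floating-point number to be processed is a denormal floating-point number, setting all parts of the floating-point number to be processed other than the sign bit to 0 to obtain the normal floating-point number 0;
    if the floating-point number to be processed is a NaN, outputting the floating-point number to be processed.
  5. The function implementation method according to claim 3, characterized in that the method further comprises:
    if the floating-point number to be processed is neither a denormal floating-point number nor a NaN, determining that the floating-point number to be processed is a normal floating-point number, and preprocessing the normal floating-point number.
  6. The function implementation method according to claim 5, characterized in that the method further comprises:
    determining whether the exponent bits of the floating-point number to be processed are 0 and whether its mantissa bits are not 0; if the exponent bits are 0 and the mantissa bits are not 0, determining that the floating-point number to be processed is a denormal floating-point number.
  7. The function implementation method according to claim 5 or 6, characterized in that the method further comprises:
    determining whether the exponent bits of the floating-point number to be processed are all 1 and whether its mantissa bits are not 0; if the exponent bits are all 1 and the mantissa bits are not 0, the floating-point number to be processed is a NaN.
  8. The function implementation method according to any one of claims 1 to 7, characterized in that the target function is one of a reciprocal function, a square root function, a reciprocal square root function, a logarithmic function, an exponential function, a trigonometric function, a sigmoid function, a tanh function, and an erf function.
  9. The function implementation method according to any one of claims 1 to 7, characterized in that the target function is a trigonometric function, and the elementary function circuit of the target function performs the interpolation operation with a Taylor interpolation function.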
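  10. A method for segmenting the approximation interval of a function, characterized by comprising:
    obtaining the value curve of a target function within the approximation interval corresponding to the target function;
    obtaining the slope at each sampling position on the approximation curve;
    determining, from the slopes at the sampling positions on the approximation curve, the slope change rate of each sample interval formed by adjacent sampling positions on the approximation curve;
    segmenting each total interval formed by consecutive sample intervals whose slope change rates lie within one preset change-rate range at the preset spacing corresponding to that preset change-rate range; wherein different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset spacing.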
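  11. A function implementation circuit, characterized by comprising:
    different elementary function circuits corresponding to different elementary functions;
    the elementary function circuits being used to perform a piecewise approximation operation on data to be processed so as to implement the corresponding elementary function;
    wherein, when performing the piecewise approximation operation, the elementary function circuit calls, according to the segment interval in which the data to be processed lies, the interpolation coefficients corresponding to that segment interval to perform an interpolation operation;
    the segment intervals are the intervals obtained by segmenting the approximation interval corresponding to the target function according to the slope change rate of the value curve of the target function; wherein, within the approximation interval, an interval whose slope change rate lies within one preset change-rate range is segmented at the preset spacing corresponding to that preset change-rate range; wherein different preset change-rate ranges do not overlap, and the smaller the upper limit of a preset change-rate range, the larger the corresponding preset spacing.
  12. The function implementation circuit according to claim 11, characterized in that the function implementation circuit further comprises:
    a preprocessing unit configured to
    determine the function set to which the target function belongs;
    if the target function belongs to a first function set, take the mantissa bits of a floating-point number to be processed to obtain the data to be processed;
    if the target function belongs to a second function set, take the floating-point number to be processed as the data to be processed;
    if the target function belongs to a third function set, convert the floating-point number to be processed to fixed point to obtain the data to be processed.
  13. The function implementation circuit according to claim 11 or 12, characterized in that, when the different elementary functions include a sigmoid function and a tanh function, the elementary function circuit corresponding to the tanh function reuses the elementary function circuit of the sigmoid function.
  14. The function implementation circuit according to claim 12, characterized by further comprising:
    a floating-point exceptional-value handling unit configured to
    determine the type of the floating-point number to be processed;
    if the floating-point number to be processed is a normal floating-point number, output the floating-point number to be processed to the preprocessing unit;
    if the floating-point number to be processed is a denormal floating-point number, set all parts of the floating-point number to be processed other than the sign bit to 0 to obtain the normal floating-point number 0, and output the normal floating-point number 0 to the preprocessing unit;
    if the floating-point number to be processed is a NaN, directly output the floating-point number to be processed.
  15. The function implementation circuit according to any one of claims 11 to 14, characterized in that the different elementary function circuits share multipliers and adders.
  16. A chip, characterized by comprising the function implementation circuit according to any one of claims 11 to 15.
  17. An electronic device, characterized by comprising the chip according to claim 16.
  18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs executable by one or more processing units to implement the function implementation method according to any one of claims 1 to 9.
  19. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the function implementation method according to any one of claims 1 to 9.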
PCT/CN2022/107166 2022-04-26 2022-07-21 Function implementation method, approximation interval segmentation method, chip, device, and medium WO2023206832A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210441226.XA CN114546330B (zh) 2022-04-26 2022-04-26 Function implementation method, approximation interval segmentation method, chip, device, and medium
CN202210441226.X 2022-04-26

Publications (1)

Publication Number Publication Date
WO2023206832A1 (zh) 2023-11-02

Family

ID=81667527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107166 WO2023206832A1 (zh) 2022-04-26 2022-07-21 Function implementation method, approximation interval segmentation method, chip, device, and medium

Country Status (2)

Country Link
CN (1) CN114546330B (zh)
WO (1) WO2023206832A1 (zh)

Also Published As

Publication number Publication date
CN114546330B (zh) 2022-07-12
CN114546330A (zh) 2022-05-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22939660

Country of ref document: EP

Kind code of ref document: A1