
CN111061992A - Function fitting method and device based on parabola

Info

Publication number: CN111061992A
Application number: CN201911194243.2A
Authority: CN
Inventors: 潘红兵; 吕航; 安梦瑜; 罗元勇
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2019-11-28
Filing date: 2019-11-28
Publication date: 2020-04-24

Abstract

The invention discloses a parabola-based function fitting method and device. The method iterates by bisection within a specified interval, solves for the parabola coefficients from the coordinates of three points, calculates the error, and finally performs piecewise fitting of various curve functions within a given error range, yielding the number of segments together with the parabolic coefficients of each segment. The device comprises a data input module, a comparison module, a coefficient selection module, a calculation unit and a data output module. The method obtains the fewest segments among current function approximation methods while keeping the error of each segment to a minimum, thereby achieving both high precision and low complexity.

Description

Function fitting method and device based on parabola

Technical Field

The invention relates to the field of integrated circuit algorithms and hardware implementation, and in particular to a parabola-based function fitting method and a device implementing it.

Background

Approximate computing is a trade-off between computational quality and the performance resources consumed. As performance demands and resource budgets keep growing, approximate computing approaches are becoming attractive and increasingly necessary.

Newton's iteration method solves equations approximately over the real and complex domains, and is commonly used to implement reciprocal, division, reciprocal square root and square root calculations when designing complex computational units in VLSI. As a traditional VLSI design method for division and square root calculation, Newton's iteration converges quadratically, but this advantage is blunted by the shortcomings of the initial guess. Its hardware overhead is also too large. Taking the square root as an example, a fully unrolled implementation needs 17 clock cycles and 13 multipliers for 4 iterations, so both delay and area overhead are large. In theory, Newton's iteration can also be used to compute cube roots or even higher-order roots, but the hardware cost and latency are prohibitive.

The COordinate Rotation DIgital Computer (CORDIC) is an approximation method for computing trigonometric functions, multiplication and division. CORDIC has three trajectories (circular, linear and hyperbolic), and each trajectory has two convergence modes, rotation and vectoring. The greatest advantage of CORDIC is its simple hardware implementation, which can be either folded (time-division multiplexed) or fully unrolled. The folded mode reduces hardware cost at the expense of sampling rate, while the fully unrolled pipelined mode accepts one input per clock cycle and, because its critical path is only a shift and an addition, can run at a very high frequency and therefore a very high sampling rate. The circular and hyperbolic CORDIC consume 6 adders per iteration, while the linear CORDIC consumes 4 adders per iteration. However, CORDIC has limited approximation precision, and improving the precision incurs large latency and hardware resource overhead.

Disclosure of Invention

To address the technical shortcomings of existing methods, the invention provides a parabola-based function fitting method and a hardware device implementing it, so that all unary functions can be fitted more accurately and completely.

The technical scheme adopted by the method is as follows:

A parabola-based function fitting method comprises the following steps:

(1) using bisection iteration, dividing the whole interval of the function into a number of element intervals, each of length 2^-od, where od is the number of significant fractional digits; calculating the coefficients of the parabolic function from the true coordinates of three points, namely the two end points of the division and the midpoint;

(2) substituting the coefficients obtained in step (1) back into the parabolic function and calculating the error at the end point of each element interval: substituting the end point of each element interval into the quadratic expression to obtain the fitted value at that point, and subtracting the fitted value from the function value to obtain the error; comparing the calculated error with the set error;

(3) bisecting the current interval; if the calculated error is greater than the set error, repeating steps (1) to (2) on the first half; if the calculated error is less than or equal to the set error, bisecting the second half, appending its first half to the first half, and repeating steps (1) to (2) until the calculated error is less than or equal to the set error, so that each resulting segment is as long as possible and the total number of segments is smallest;

(4) performing quadratic fitting on each segment with the coefficients obtained in step (1), thereby completing the fitting of the function over the whole interval.

The invention relates to a function fitting device based on a parabola, which comprises a data input module, a comparison module, a coefficient selection module, a calculation unit and a data output module; the comparison module is used for comparing the input data with the segmented interval to determine the interval of the data, so as to determine the coefficient of the quadratic function; the coefficient selection module is used for selecting the parabolic coefficient of the interval according to the comparison result of the comparison module; and the calculating unit is used for calculating a quadratic function value according to the input data and the parabolic coefficient.

The fitting process does not depend on any specific form of the function. The curve to be fitted is segmented, the parabolic coefficients in each interval are determined from the coordinates of three points, the step of the variable is set by the given error, and bisection iteration is applied repeatedly to find the largest interval whose error is below the set error, until the whole target interval is covered. The intervals obtained in this way are as long as possible, so the number of intervals, i.e. the number of segments, is minimal, which effectively reduces storage and latency in hardware computation. In addition, two hardware implementations are provided according to the calculation and quantization results: the direct expansion method achieves small area and low resource usage, while the CSA and parallel processing method achieves low latency. The fitting method and device can be applied wherever approximate function evaluation is needed, such as deep learning and big data computation, and offer high accuracy, simple implementation, low hardware cost and low latency.

Drawings

FIG. 1 is an overall flow diagram of the process of the present invention;

FIG. 2 is a flow chart of an implementation of the method of the present invention;

FIG. 3 shows MATLAB quadratic piecewise approximation plots of several common functions in an embodiment of the present invention: (a) f＝e^x; (b) f＝sin(x); (c) f＝1/(1+e^-x); (d) f＝tanh(x);

FIG. 4 is a logic diagram of a hardware compute unit;

FIG. 5 is a schematic diagram of a quadratic trinomial computing unit structure using CSA;

FIG. 6 is a schematic diagram of a simplified quadratic trinomial computing unit.

Detailed Description

Assume a point (x₀, f(x₀)) lies on a general function f. When x approaches the base point x₀, the tangent to the function at x₀ can be used as an approximation of the function. The expression

f(x)≈f(x₀)+f'(x₀)(x-x₀)

is called the linear approximation of f at the point x₀.

When x ≈ x₀, the second-order approximation is

f(x)≈f(x₀)+f'(x₀)(x-x₀)+f''(x₀)(x-x₀)²/2

The geometric meaning of the second-order approximation is the parabola closest to the original function, which is more accurate than the linear approximation.

Let the parabola be

y＝ax²+bx+c(a≠0)，

and suppose it passes through three points (x₁,y₁),(x₂,y₂),(x₃,y₃). Then the coefficients a, b and c are the solution of the three linear equations

yᵢ＝axᵢ²+bxᵢ+c(i＝1,2,3)

That is, an analytic expression of the parabola can be obtained from the known coordinates of three points. The present embodiment substitutes the interval end points (x_s,f(x_s)) and (x_e,f(x_e)) together with the midpoint ((x_s+x_e)/2, f((x_s+x_e)/2)) to solve for the coefficients.
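The three-point solve can be illustrated with a short Python sketch. The function names `parabola_through` and `fit_interval` are hypothetical, and the divided-difference form used here is only one of several equivalent ways to solve for a, b and c; the patent does not state which closed form it uses.

```python
import math

def parabola_through(p1, p2, p3):
    """Return (a, b, c) of y = a*x^2 + b*x + c through three points with distinct x."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Newton divided differences, then expansion to monomial coefficients.
    d12 = (y2 - y1) / (x2 - x1)
    d23 = (y3 - y2) / (x3 - x2)
    a = (d23 - d12) / (x3 - x1)
    b = d12 - a * (x1 + x2)
    c = y1 - a * x1 * x1 - b * x1
    return a, b, c

def fit_interval(f, xs, xe):
    """Fit a parabola to f using both end points and the midpoint of [xs, xe]."""
    xm = 0.5 * (xs + xe)
    return parabola_through((xs, f(xs)), (xm, f(xm)), (xe, f(xe)))

if __name__ == "__main__":
    a, b, c = fit_interval(math.exp, 0.0, 1.0)
    print(a, b, c)
```

By construction the fitted parabola reproduces the function exactly at the three chosen points, so any error appears only between them.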

As shown in fig. 1, the fitting method of this embodiment includes the following specific steps:

(1) Set the precision od and divide the target interval into element intervals of length 2^-od.

(2) Calculate the function values at the end points and the midpoint of the target interval from the expression of the function to be fitted.

(3) Calculate the coefficients a, b and c from the formula above and the three known point coordinates. Substitute the end point of each element interval into the quadratic expression to obtain the fitted value at that point, and subtract the fitted value from the function value to obtain the error.

(4) If the error is greater than the set error, bisect the current interval and repeat steps (1) to (3) on the first half.

(5) Otherwise, bisect the second half left over from the previous bisection, append its first half to the first half from step (4) (giving 3/4 of the previous interval length), and repeat steps (1) to (3) until the error does not exceed the set error. The current interval is then fitted quadratically with the coefficients a, b and c just calculated, and its length is the longest interval that these coefficients can fit.
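One possible reading of steps (1) to (5) is sketched below in Python. It is an interpretation, not the patent's MATLAB code: the names `fit3` and `segment` are hypothetical, the bisection is expressed as a binary search over the number of element intervals in a segment, and the search assumes that the error does not shrink as a segment grows.

```python
import math

def fit3(f, xs, xe):
    """Parabola through both end points and the midpoint of [xs, xe]."""
    xm = 0.5 * (xs + xe)
    ys, ym, ye = f(xs), f(xm), f(xe)
    h = xe - xs
    a = 2.0 * (ys - 2.0 * ym + ye) / (h * h)
    b = (ye - ys) / h - a * (xs + xe)
    c = ys - a * xs * xs - b * xs
    return a, b, c

def segment(f, x_start, x_end, od, tol):
    """Return a list of (xs, xe, (a, b, c)) covering [x_start, x_end]."""
    step = 2.0 ** -od                          # element-interval length
    n = math.ceil((x_end - x_start) / step)    # number of element intervals

    def x(i):
        return min(x_start + i * step, x_end)

    def err(i, j, coef):
        a, b, c = coef
        # Error is checked at the element-interval end points only.
        return max(abs(f(x(k)) - ((a * x(k) + b) * x(k) + c))
                   for k in range(i, j + 1))

    segments, i = [], 0
    while i < n:
        lo, hi, best = 1, n - i, None
        while lo <= hi:                        # bisection on the segment length
            mid = (lo + hi) // 2
            coef = fit3(f, x(i), x(i + mid))
            if err(i, i + mid, coef) <= tol:
                best, lo = (mid, coef), mid + 1    # acceptable: try longer
            else:
                hi = mid - 1                       # too long: try shorter
        if best is None:
            raise ValueError("tolerance cannot be met at this od")
        length, coef = best
        segments.append((x(i), x(i + length), coef))
        i += length
    return segments

if __name__ == "__main__":
    segs = segment(math.sin, 0.0, 3.0, od=8, tol=1e-4)
    print(len(segs), "segments")
```

Each accepted segment has been checked against the tolerance directly, so the error bound at the grid points holds even where the monotonicity assumption fails; only the claim that each segment is the longest possible depends on it.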

The device of this embodiment comprises: a data input module, which receives input data; a data comparison module, which compares the input data with the segment intervals to determine the interval it falls in; a coefficient selection module, which selects the parabolic coefficients of that interval according to the comparison result; a calculation unit, which computes the quadratic function value from the input data and the coefficients; and a data output module, which outputs the calculation result.
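As a software analogy of this data path, the following Python sketch mirrors the comparison, coefficient selection and calculation stages. The name `make_evaluator` and the list-based coefficient table are assumptions made for illustration; the actual device is hardware.

```python
from bisect import bisect_right

def make_evaluator(breakpoints, coeffs):
    """breakpoints: sorted segment end points [x0, ..., xn]; coeffs: n tuples (a, b, c)."""
    def evaluate(x):
        if not breakpoints[0] <= x <= breakpoints[-1]:
            raise ValueError("input outside the fitted interval")
        # Comparison module: locate the segment that contains x.
        idx = min(bisect_right(breakpoints, x) - 1, len(coeffs) - 1)
        # Coefficient selection module: fetch that segment's coefficients.
        a, b, c = coeffs[idx]
        # Calculation unit: evaluate the quadratic.
        return a * x * x + b * x + c
    return evaluate

if __name__ == "__main__":
    # Two illustrative segments: y = x^2 on [0, 1] and y = 2x - 1 on [1, 2].
    ev = make_evaluator([0.0, 1.0, 2.0], [(1.0, 0.0, 0.0), (0.0, 2.0, -1.0)])
    print(ev(0.5), ev(1.5))
```

The binary search stands in for the parallel comparators a hardware comparison module would likely use; both yield the same segment index.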

The bit width, the expression of the curve function to be fitted and the fitting interval are entered in advance on a server, and the maximum error is set. A data segmentation module divides the interval according to the bit width into element intervals of length 2^-od. Taking the input interval as the initial interval, the coefficients are calculated by the method above, the error at each point is calculated by back substitution, and the calculated error is compared with the set error. If the condition is met, the interval length is increased until it is as long as possible, at which point the loop exits and the next interval is fitted. If not, the midpoint is taken as the new right end point and the steps are repeated. The implementation flow is shown in FIG. 2; the number of segments and the maximum error are computed at the end. The coefficients of each segment are stored in variables for later retrieval, so the procedure is fast and accurate. Table 1 lists the MATLAB results and elapsed times for several functions on a Ryzen 5 2600X CPU with 16 GB of memory.

TABLE 1 Calculation results

Function	Interval	Set bit width	Number of segments	Total time (s)	Time used (s)	Maximum error
f＝1/(1+e^-x)	[-π,π]	14	14	0.079	0.069	3.051605e-05
f＝sin(x)	[0,π]	14	17	0.073	0.054	3.051750e-05
f＝e^x	[0,π]	14	36	0.068	0.059	3.051651e-05
f＝tanh(x)	[0,π]	14	15	0.047	0.034	3.051645e-05

The corresponding fitted image is shown in fig. 3.

A flow chart for calculating the segmentation and parameters is shown in fig. 2.

The result is then quantized for hardware, with the number of quantization bits for the variable set according to the bit width required by the hardware. The end points of the element intervals are truncated directly to fit the bit width. For the coefficients, guard bits are used: starting from the specified bit width, the calculation error is evaluated repeatedly; if the error is smaller than the set hardware error 2^-od, the current bit width is taken as the number of quantization bits of the parabolic coefficients; if not, the number of bits is increased until the output precision matches the input precision. The quantized interval end points are then stored in the hardware data selection module, and the quantized parabolic coefficients are stored in a ROM in the hardware coefficient selection module, ready for hardware computation.
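The guard-bit search for the coefficient width can be sketched as follows in Python. The names `truncate` and `coefficient_bits` are hypothetical, and quantizing the coefficients by truncation is an assumption: the text specifies truncation only for the interval end points.

```python
import math

def truncate(value, frac_bits):
    """Quantize to frac_bits fractional bits by truncation (assumed rounding mode)."""
    scale = 1 << frac_bits
    return math.floor(value * scale) / scale

def coefficient_bits(f, xs, xe, coef, od, start_bits, max_bits=40):
    """Smallest fractional bit width whose quantized coefficients keep the error below 2^-od."""
    step = 2.0 ** -od
    n = round((xe - xs) / step)
    points = [xs + k * step for k in range(n + 1)]
    for bits in range(start_bits, max_bits + 1):
        a, b, c = (truncate(v, bits) for v in coef)
        err = max(abs(f(x) - ((a * x + b) * x + c)) for x in points)
        if err < step:                 # hardware error target 2^-od
            return bits, (a, b, c)
    raise ValueError("no bit width up to max_bits meets the error target")

if __name__ == "__main__":
    f = lambda x: 0.3 * x * x + 0.7 * x + 0.1
    print(coefficient_bits(f, 0.0, 1.0, (0.3, 0.7, 0.1), od=8, start_bits=4))
```

The loop adds one guard bit at a time, which matches the description of increasing the bit count until the output precision reaches the input precision.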

The hardware block diagram of the computing unit is shown in FIG. 4. The input data x enters the selection module through the data input module; x may be a feature map of a neural network, an input angle for an FFT or another calculation requiring trigonometric functions, or a quantity in a physical equation obeying an exponential law, such as the time signal of a zero-input or zero-state response in a circuit. The input is then compared with the quantized interval end points to determine the interval in which the variable lies, the quantized coefficients a, b and c of that interval are read, and they are sent to the computing unit to evaluate

y＝ax²+bx+c

Here x² is computed in parallel with the lookup of the coefficients a, b and c, after which ax² and bx are obtained by multiplication. With a CSA structure, ax², bx and c can then be added together. A CSA (carry-save adder) is a digital adder used in computer microarchitecture to compute the sum of three or more n-bit binary numbers. It differs from other digital adders in that it outputs two numbers of the same dimension as the inputs, one a sequence of partial sum bits and the other a sequence of carry bits. A carry-save unit consists of n full adders, each of which computes a single sum bit and carry bit from the corresponding bits of the three input numbers only. Given three n-bit numbers a, b and c, it produces a partial sum ps and a shift carry sc:

ps_i＝a_i⊕b_i⊕c_i

sc_i＝(a_i∧b_i)∨(a_i∧c_i)∨(b_i∧c_i)

The complete sum is then computed as follows: the carry sequence sc is shifted left by one position, a 0 is prepended to the most significant end of the partial sum sequence ps, and the two are added with a ripple-carry adder to produce the resulting (n+1)-bit value.
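The carry-save step can be checked with a small Python model on non-negative integers. The function names `carry_save_add` and `add3` are hypothetical; bitwise operators on whole integers stand in for the n parallel full adders.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to (partial sum, shift carry)."""
    ps = a ^ b ^ c                      # per-bit sum: XOR of the three input bits
    sc = (a & b) | (a & c) | (b & c)    # per-bit carry: majority of the three input bits
    return ps, sc

def add3(a, b, c):
    """Full three-operand sum: one carry-save stage, then one ordinary addition."""
    ps, sc = carry_save_add(a, b, c)
    return ps + (sc << 1)               # carries are worth twice their bit position

if __name__ == "__main__":
    print(add3(5, 6, 7))
```

No carry propagates inside `carry_save_add`, which is why this stage has constant delay in hardware regardless of word length; only the final addition propagates carries.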

The critical path of the whole process is about 2M+1A u.t., where M and A are the delays of multiplication and addition respectively, and u.t. is the unit of time given by the hardware platform clock signal. This approach has a lower delay; its structure is shown in FIG. 5.

Since

y＝ax²+bx+c＝(ax+b)*x+c

two multiply-add units can be cascaded for the calculation: the first unit computes ax+b, and its result is fed into the second unit of the cascade. Compared with the CSA structure, this approach has a smaller area but a larger delay, with a critical path of 2(M+A) u.t. Its hardware implementation is shown in FIG. 6.
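The cascaded form can be modelled in fixed-point arithmetic with the Python sketch below. The helper names, the single shared number of fractional bits and the truncating right shift after each product are assumptions; the patent does not specify the internal number format of the multiply-add units.

```python
def to_fixed(value, frac_bits):
    """Convert a real value to a fixed-point integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))

def multiply_add(p, q, r, frac_bits):
    """One multiply-add unit: p*q rescaled to frac_bits, then plus r (truncating shift)."""
    return ((p * q) >> frac_bits) + r

def quadratic_cascaded(x, a, b, c, frac_bits):
    """Evaluate (a*x + b)*x + c with two cascaded multiply-add units."""
    t = multiply_add(a, x, b, frac_bits)      # first unit: a*x + b
    return multiply_add(t, x, c, frac_bits)   # second unit: t*x + c

if __name__ == "__main__":
    F = 12
    y = quadratic_cascaded(to_fixed(1.5, F), to_fixed(2.0, F),
                           to_fixed(-1.0, F), to_fixed(0.5, F), F)
    print(y / (1 << F))
```

The second unit cannot start until the first has finished, which is the source of the 2(M+A) critical path, while the reuse of one unit type keeps the area small.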

Because all unary curve functions can be fitted, the method applies to a wide range of scenarios. For example, the sigmoid activation function in neural networks, f＝1/(1+e^-x), and the tanh function can both be calculated with this approximation. The exponential function e^x can be used in computing FFTs, Gaussian distributions and the like. Trigonometric functions can be used to compute periodic signals, periodic motion and so on. In almost any project where a curve function must be evaluated, the method can provide approximate calculation with high precision and low latency.

Claims

1. A function fitting method based on a parabola is characterized by comprising the following specific steps:

(1) dividing the whole interval of the function into a plurality of element intervals by adopting a binary iteration method, and calculating the coefficient of the parabolic function by using the real coordinates of three points, namely two divided end points and a middle point;

(2) substituting the coefficient obtained in the step (1) back into a parabolic function, calculating an error at an end point of each element interval, and comparing the calculated error with a set error;

2. The method according to claim 1, wherein in step (1), the length of each element interval is 2^-od, where od is the number of significant fractional digits.

3. The method of claim 1, wherein in step (2), the error is calculated by: substituting the end point of each element interval into the quadratic expression to obtain the fitted value at that point, and subtracting the fitted value from the function value to obtain the error.

4. A function fitting device based on parabola is characterized by comprising a data input module, a comparison module, a coefficient selection module, a calculation unit and a data output module; the comparison module is used for comparing the input data with the segmented interval to determine the interval of the data, so as to determine the coefficient of the quadratic function; the coefficient selection module is used for selecting the parabolic coefficient of the interval according to the comparison result of the comparison module; and the calculating unit is used for calculating a quadratic function value according to the input data and the parabolic coefficient.

5. The apparatus of claim 4, wherein the computing unit employs two cascaded multiply-add units or a carry-save adder.