CN111061992A - Function fitting method and device based on parabola - Google Patents
Function fitting method and device based on parabola
- Publication number
- CN111061992A (application number CN201911194243.2A)
- Authority
- CN
- China
- Prior art keywords
- interval
- error
- function
- coefficient
- fitting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Operations Research (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a parabola-based function fitting method and device. The method iterates by bisection over a specified interval, solves for the parabola coefficients from the coordinates of three points, and computes the fitting error. It thereby fits a wide range of curve functions piecewise within a given error bound, yielding the number of segments together with the parabola coefficients of each segment. The device comprises a data input module, a comparison module, a coefficient selection module, a calculation unit and a data output module. According to the invention, the method obtains the fewest segments among current approximate function fitting methods and ensures that the error of each segment is minimized, thereby achieving both high precision and low complexity.
Description
Technical Field
The invention relates to integrated-circuit algorithms and their hardware implementation, and in particular to a method for fitting functions with parabolas and a device implementing the method.
Background
Approximate computing is a trade-off between computational quality and the resources consumed. With ever-increasing performance demands and growing pressure on resource budgets, approximate computing approaches become attractive and increasingly necessary.
Newton's method is a method for approximately solving equations over the real and complex domains. In the design of complex arithmetic units in VLSI it is commonly used to implement reciprocal, division, reciprocal square root and square root operations. As a traditional VLSI approach to division and square root, Newton's method converges quadratically, but this advantage is weakened by its dependence on the initial guess. Its hardware overhead is also high: taking square root as an example, a fully unrolled implementation with 4 iterations requires 17 clock cycles and 13 multipliers, so both latency and area overhead are large. In theory, Newton's method can also be used to compute cube roots or even higher-order roots, but the hardware cost and latency are prohibitive.
The COordinate Rotation DIgital Computer (CORDIC) is an approximation method for computing trigonometric functions, multiplication and division. CORDIC has three trajectories (circular, linear and hyperbolic), and each trajectory has two convergence modes (rotation and vectoring). Its greatest advantage is its simple hardware implementation, which can be either folded (time-division multiplexed) or fully unrolled. The folded form reduces hardware cost at the expense of sampling rate, whereas the fully unrolled pipelined form accepts one input per clock cycle; because its critical path is only a shift and an add, it can run at a very high frequency and therefore reach a very high sampling rate. Circular and hyperbolic CORDIC consume 6 adders per iteration, while linear CORDIC consumes 4 adders per iteration. However, the precision of CORDIC is limited, and improving it incurs large latency and hardware resource overhead.
Disclosure of Invention
To address the technical shortcomings of the existing methods and to fit arbitrary single-variable functions more accurately and completely, the invention provides a parabola-based function fitting method and a hardware device implementing it.
The technical scheme adopted by the method is as follows:
A parabola-based function fitting method comprises the following steps:
(1) Using bisection iteration, divide the whole interval of the function into a number of element intervals of length $2^{-od}$, where od is the number of significant fractional digits; compute the coefficients of the parabola from the true coordinates of three points, namely the two endpoints of the current segment and its midpoint.
(2) Substitute the coefficients obtained in step (1) back into the parabola and compute the error at the endpoint of each element interval: evaluate the quadratic at each element-interval endpoint and subtract this fitted value from the true function value to obtain the error; then compare the computed error with the set error.
(3) Bisect the current interval. If the computed error is greater than the set error, repeat steps (1) to (2) on the first half. If the computed error is less than or equal to the set error, bisect the second half, append its first half to the first half, and repeat steps (1) to (2). Continue until the computed error is less than or equal to the set error; each segment obtained in this way is as long as possible, so the total number of segments is smallest.
(4) Perform quadratic fitting on each segment with the coefficients obtained in step (1), completing the fit over the whole interval of the function.
The invention also relates to a parabola-based function fitting device comprising a data input module, a comparison module, a coefficient selection module, a calculation unit and a data output module. The comparison module compares the input data with the segment boundaries to determine the segment in which the data lies, and hence which quadratic coefficients apply; the coefficient selection module selects the parabola coefficients of that segment according to the result of the comparison module; and the calculation unit computes the quadratic function value from the input data and the parabola coefficients.
The fitting process does not depend on any specific functional form. The curve to be fitted is segmented, and in each interval the parabola coefficients are determined from the coordinates of three points. The step by which the variable changes is set by the given error, and bisection iteration repeatedly finds the longest interval whose error stays below the set error, until the whole target interval is covered. The intervals obtained in this way are as long as possible, so the number of segments is minimized, which effectively reduces storage and latency in hardware computation. In addition, two hardware implementations are provided according to the calculation and quantization results: the direct expansion method achieves low area and low resource usage, while the CSA-based parallel method achieves low latency. The fitting method and device can be applied wherever approximate function evaluation is required, such as deep learning and big-data computing, and offer high accuracy, simple implementation, low hardware cost and low latency.
Drawings
FIG. 1 is an overall flow diagram of the process of the present invention;
FIG. 2 is a flow chart of an implementation of the method of the present invention;
FIG. 3 shows MATLAB piecewise quadratic approximations of several common functions in an embodiment of the present invention: (a) $f = e^{x}$, (b) $f = \sin(x)$, (c) $f = 1/(1+e^{-x})$, (d) $f = \tanh(x)$;
FIG. 4 is a logic diagram of a hardware compute unit;
FIG. 5 is a schematic diagram of a quadratic trinomial computing unit structure using CSA;
FIG. 6 is a schematic diagram of a simplified quadratic trinomial computing unit.
Detailed Description
Consider a point $(x_0, f(x_0))$ on a general function $f$. When $x$ is close to the base point $x_0$, the tangent to the function at $x_0$ can be used as an approximation of the function. The expression
$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$
is called the linear approximation of $f$ at $x_0$.
Similarly, when $x \approx x_0$, the second-order approximation is
$f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2.$
Geometrically, the second-order approximation is the parabola closest to the original function, and it is more accurate than the linear approximation.
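As a small numerical illustration of why the second-order approximation is preferred, the sketch below compares the tangent-line approximation with the standard second-order Taylor approximation for $f(x) = e^x$ around $x_0 = 0$. The function names `linear_approx` and `quadratic_approx` are hypothetical helpers introduced only for this example; they are not part of the patent.

```python
import math

def linear_approx(f0, f1, x0, x):
    """First-order (tangent-line) approximation around x0."""
    return f0 + f1 * (x - x0)

def quadratic_approx(f0, f1, f2, x0, x):
    """Second-order approximation around x0 (adds the curvature term)."""
    return f0 + f1 * (x - x0) + 0.5 * f2 * (x - x0) ** 2

# Example: f(x) = e^x around x0 = 0, where f(0) = f'(0) = f''(0) = 1.
x0, x = 0.0, 0.1
exact = math.exp(x)
err_lin = abs(exact - linear_approx(1.0, 1.0, x0, x))
err_quad = abs(exact - quadratic_approx(1.0, 1.0, 1.0, x0, x))
print(f"linear error    = {err_lin:.6f}")
print(f"quadratic error = {err_quad:.6f}")
```

At this point the quadratic error is more than an order of magnitude smaller than the linear error, which is the motivation for fitting with parabolas rather than line segments.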
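Let the parabola be
$y = ax^2 + bx + c \quad (a \neq 0),$
and suppose it passes through three points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$. Then the coefficients satisfy
$y_i = a x_i^2 + b x_i + c, \quad i = 1, 2, 3,$
that is, the analytic expression of the parabola can be obtained from the known coordinates of the three points. This embodiment substitutes the interval endpoints $(x_s, f(x_s))$ and $(x_e, f(x_e))$ together with the midpoint $(x_m, f(x_m))$, where $x_m = (x_s + x_e)/2$, and solves for the coefficients.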
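The three-point solve can be sketched as follows. The closed form used here comes from divided differences, which is one standard way to solve the three equations $y_i = a x_i^2 + b x_i + c$; the patent's own formula is not reproduced in the text, so this is an assumption about form, not about the result. The names `parabola_through` and `fit_interval` are hypothetical.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]

def parabola_through(p1: Point, p2: Point, p3: Point) -> Tuple[float, float, float]:
    """Return (a, b, c) of y = a*x**2 + b*x + c through three points with distinct x."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d12 = (y2 - y1) / (x2 - x1)      # slope between points 1 and 2
    d23 = (y3 - y2) / (x3 - x2)      # slope between points 2 and 3
    a = (d23 - d12) / (x3 - x1)      # second divided difference
    b = d12 - a * (x1 + x2)
    c = y1 - a * x1 * x1 - b * x1
    return a, b, c

def fit_interval(f: Callable[[float], float], xs: float, xe: float):
    """Fit a parabola through the two endpoints and the midpoint of [xs, xe]."""
    xm = 0.5 * (xs + xe)
    return parabola_through((xs, f(xs)), (xm, f(xm)), (xe, f(xe)))

# Usage: three points taken from y = 2x^2 + 1.
print(parabola_through((0.0, 1.0), (1.0, 3.0), (2.0, 9.0)))
```

Because a parabola is determined uniquely by three points with distinct abscissas, any other solution method (for example Cramer's rule) gives the same coefficients.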
As shown in fig. 1, the fitting method of this embodiment includes the following specific steps:
(1) Set the precision od and divide the target interval into element intervals of length $2^{-od}$.
(2) Compute the function values at the endpoints and the midpoint of the target interval using the expression of the function to be fitted.
(3) Compute the coefficients a, b and c from the formula above and the three known point coordinates. Substitute the endpoint of each element interval into the quadratic, and subtract the resulting fitted value from the true function value to obtain the error.
(4) If the error is greater than the set error, bisect the current interval and repeat steps (1) to (3) on the first half.
(5) Otherwise, bisect the second half left over from the previous bisection, append its first half to the first half from step (4) (giving 3/4 of the previous interval length), and repeat steps (1) to (3) until the error no longer exceeds the set error. The current interval is then fitted quadratically with the currently computed coefficients a, b and c, and its length is the longest interval that these coefficients can fit.
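The steps above can be sketched roughly as follows. This is a simplified reading, not the patent's exact procedure: it bisects over the index of the element-interval grid to find a long passing segment, checks the error only at element-interval endpoints, and assumes that shortening a failing segment eventually makes it pass. All names (`fit_three_points`, `segment_function`) are hypothetical.

```python
import math

def fit_three_points(f, xs, xe):
    """Parabola coefficients through both endpoints and the midpoint of [xs, xe]."""
    xm = 0.5 * (xs + xe)
    ys, ym, ye = f(xs), f(xm), f(xe)
    d1 = (ym - ys) / (xm - xs)
    d2 = (ye - ym) / (xe - xm)
    a = (d2 - d1) / (xe - xs)
    b = d1 - a * (xs + xm)
    c = ys - a * xs * xs - b * xs
    return a, b, c

def segment_function(f, lo, hi, od, tol):
    """Split [lo, hi] into segments (x_start, x_end, a, b, c) with grid error <= tol."""
    h = 2.0 ** -od                                   # element-interval length
    n = math.ceil((hi - lo) / h)                     # number of element intervals
    grid = [min(lo + i * h, hi) for i in range(n + 1)]

    def try_fit(i0, i1):
        a, b, c = fit_three_points(f, grid[i0], grid[i1])
        err = max(abs(f(x) - ((a * x + b) * x + c)) for x in grid[i0:i1 + 1])
        return err, (a, b, c)

    segments, start = [], 0
    while start < n:
        good = start + 1                             # one element interval: endpoints are interpolated
        best = try_fit(start, good)[1]
        err, coef = try_fit(start, n)                # first try everything that remains
        if err <= tol:
            good, best = n, coef
        else:
            bad = n
            while bad - good > 1:                    # bisection on the segment end index
                mid = (good + bad) // 2
                err, coef = try_fit(start, mid)
                if err <= tol:
                    good, best = mid, coef           # passes: try to extend
                else:
                    bad = mid                        # fails: shrink
        segments.append((grid[start], grid[good], *best))
        start = good
    return segments

segs = segment_function(math.sin, 0.0, math.pi, 8, 1e-4)
print(len(segs), "segments")
```

The segment count printed depends on `od` and `tol`; the values used here are illustrative and are not those of Table 1.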
The device of this embodiment comprises: a data input module, which receives the input data; a data comparison module, which compares the input data with the segment boundaries to determine the segment in which the data falls; a coefficient selection module, which selects the parabola coefficients of the corresponding segment according to the comparison result; a calculation unit, which computes the quadratic function value from the input data and the coefficients; and a data output module, which outputs the calculation result.
On a server, the bit width, the expression of the curve function to be fitted and the fitting interval are entered in advance, and the maximum error is set. The data segmentation module divides the interval, according to the bit width, into element intervals of length $2^{-od}$. Taking the input interval as the initial interval, the coefficients are computed by the method above, the error at each point is computed by back-substitution, and the computed error is compared with the set error. If the condition is met, the interval is lengthened until it is as long as possible; the loop then exits and the next segment is fitted. If the condition is not met, the midpoint is taken as the new right endpoint and the steps are repeated. The implementation flow is shown in FIG. 2; the number of segments and the maximum error are computed at the end. The coefficients of each segment are stored in variables for later retrieval, so the procedure is fast and accurate. Table 1 lists the MATLAB results and run times for several functions on a Ryzen 5 2600X CPU with 16 GB of memory.
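In software terms, the comparison and coefficient selection modules amount to a table lookup followed by a quadratic evaluation. The sketch below models this with Python's standard `bisect` module; the boundary and coefficient tables are made-up illustrative values, and the name `evaluate` is hypothetical.

```python
from bisect import bisect_right

# Illustrative tables: segment boundaries and one (a, b, c) triple per segment.
boundaries = [0.0, 1.0, 2.0]
coeffs = [(1.0, 0.0, 0.0),      # y = x^2           on [0, 1)
          (-1.0, 4.0, -2.0)]    # y = -x^2 + 4x - 2 on [1, 2]

def evaluate(x: float) -> float:
    """Comparison module + coefficient selection + calculation unit, in software."""
    idx = bisect_right(boundaries, x) - 1        # which segment contains x
    idx = max(0, min(idx, len(coeffs) - 1))      # clamp inputs at or beyond the ends
    a, b, c = coeffs[idx]
    return (a * x + b) * x + c

print(evaluate(0.5), evaluate(1.5))
```

A hardware comparison module would typically use parallel comparators against the stored boundaries rather than a binary search, but the selected segment is the same.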
TABLE 1 Calculation results
Function | Interval | Set bit width | Number of segments | Total time (s) | Time used (s) | Maximum error |
---|---|---|---|---|---|---|
$f = 1/(1+e^{-x})$ | $[-\pi, \pi]$ | 14 | 14 | 0.079 | 0.069 | 3.051605e-05 |
$f = \sin(x)$ | $[0, \pi]$ | 14 | 17 | 0.073 | 0.054 | 3.051750e-05 |
$f = e^{x}$ | $[0, \pi]$ | 14 | 36 | 0.068 | 0.059 | 3.051651e-05 |
$f = \tanh(x)$ | $[0, \pi]$ | 14 | 15 | 0.047 | 0.034 | 3.051645e-05 |
The corresponding fitted image is shown in fig. 3.
A flow chart for calculating the segmentation and parameters is shown in fig. 2.
The result is then quantized for hardware, with the number of quantization bits of the variable set according to the bit width required by the hardware. The element-interval endpoints are directly truncated to fit the bit width. For the coefficients, guard bits are used: starting from the specified bit width, the calculation error is evaluated, and if it is smaller than the set hardware error $2^{-od}$, the current bit width is taken as the number of quantization bits of the parabola coefficients. If not, the number of bits is increased until the output precision matches the input precision. The quantized interval endpoints are then stored in the hardware data selection module, and the quantized parabola coefficients are stored in a ROM in the hardware coefficient selection module, ready for hardware calculation.
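The guard-bit search for the coefficients might look like the following sketch. It assumes round-to-nearest quantization of a, b and c to a given number of fractional bits and a caller-supplied target error; the patent does not specify the rounding mode or the stopping details, so these choices and the names `quantize` and `choose_coeff_bits` are assumptions.

```python
def quantize(value: float, frac_bits: int) -> float:
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def choose_coeff_bits(f, coef, points, start_bits, target_err, max_bits=48):
    """Increase the fractional bit count until the quantized parabola meets target_err."""
    a, b, c = coef
    for bits in range(start_bits, max_bits + 1):
        qa, qb, qc = (quantize(v, bits) for v in (a, b, c))
        err = max(abs(f(x) - ((qa * x + qb) * x + qc)) for x in points)
        if err < target_err:
            return bits, (qa, qb, qc), err
    raise ValueError("target error not reachable within max_bits")

# Usage with an illustrative parabola whose coefficients are not exactly representable.
f = lambda x: x * x / 3 + 0.1 * x + 0.7
points = [i / 16 for i in range(17)]
bits, qcoef, err = choose_coeff_bits(f, (1 / 3, 0.1, 0.7), points, 4, 2.0 ** -8)
print(bits, qcoef, err)
```

The cap `max_bits` is only a safeguard for the sketch: if the unquantized fit already exceeds the target error, no number of coefficient bits can reach it.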
FIG. 4 shows the hardware block diagram of the computing unit. Input data x enters the selection module through the data input module; x may be a feature map of a neural network, an input angle for an FFT or another computation requiring trigonometric functions, or a variable of a physical equation obeying an exponential law, such as the time signal of a zero-input or zero-state response in a circuit. The input is then compared with the quantized interval endpoints to determine the segment in which the variable lies; the quantized coefficients a, b and c of that segment are read and sent to the computing unit, which evaluates
$y = ax^2 + bx + c.$
Here $x^2$ is computed in parallel with the lookup of the coefficients a, b and c, after which $ax^2$ and $bx$ are obtained by multiplication. A CSA structure then adds $ax^2$, $bx$ and $c$. A CSA (carry-save adder) is a digital adder used in computer microarchitecture to compute the sum of three or more n-bit binary numbers. It differs from other digital adders in that it outputs two numbers of the same width as the inputs: one is a sequence of partial-sum bits and the other is a sequence of carry bits. A carry-save unit consists of n full adders, each of which computes a single sum bit and a single carry bit based solely on the corresponding bits of the three input numbers. Given three n-bit numbers a, b and c, it produces a partial sum ps and a shift carry sc:
$ps_i = a_i \oplus b_i \oplus c_i$
$sc_i = (a_i \wedge b_i) \vee (a_i \wedge c_i) \vee (b_i \wedge c_i)$
The full sum is then computed as follows:
shift the carry sequence sc left by one place, append a 0 to the front (most significant bit) of the partial-sum sequence ps, and add the two with a ripple-carry adder to produce the resulting (n+1)-bit value.
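The carry-save step can be checked with a short bit-level model. The sketch below treats the three operands as unsigned n-bit integers and uses Python's bitwise operators in place of the n full adders; the final addition stands in for the ripple-carry adder. The names `carry_save` and `csa_sum` are hypothetical.

```python
def carry_save(a: int, b: int, c: int, n: int):
    """Return (ps, sc): bitwise partial sum and carry of three unsigned n-bit numbers."""
    mask = (1 << n) - 1
    a, b, c = a & mask, b & mask, c & mask
    ps = a ^ b ^ c                         # sum bit of each full adder
    sc = (a & b) | (a & c) | (b & c)       # carry bit of each full adder (majority)
    return ps, sc

def csa_sum(a: int, b: int, c: int, n: int) -> int:
    """Combine ps and the left-shifted carry with one ordinary addition."""
    ps, sc = carry_save(a, b, c, n)
    # Python integers are unbounded, so any carry out of the final adder is kept.
    return ps + (sc << 1)

print(carry_save(5, 3, 6, 4))   # (0, 7)
print(csa_sum(5, 3, 6, 4))      # 14
```

No carry propagates inside `carry_save`, which is why the three-operand reduction costs roughly one full-adder delay regardless of n; only the final addition has a carry chain.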
The critical path of the whole process is about 2M + 1A u.t., where M and A are the delays of a multiplication and an addition respectively, and u.t. is a clock unit of the hardware platform's clock signal. This approach has low latency; its structure is shown in FIG. 5.
Since
$y = ax^2 + bx + c = (ax + b)x + c,$
two multiply-add units can also be cascaded for the calculation: the first multiply-add unit computes $ax + b$, and its result is fed into the second multiply-add unit of the cascade. Compared with the CSA-based structure, this approach has smaller area and larger delay, with a critical path of 2(M + A) u.t. Its hardware implementation is shown in FIG. 6.
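The two evaluation orders can be compared in a few lines. This sketch only models the arithmetic, not the timing: the direct form uses three multiplications (two of which can run in parallel), while the cascaded form uses two multiplications in series. The function names are hypothetical, and integer inputs stand in for fixed-point values.

```python
def eval_direct(a: int, b: int, c: int, x: int) -> int:
    """CSA-style order: x*x first, then a*x2 and b*x in parallel, then one 3-input add."""
    x2 = x * x                  # multiplication 1 (can overlap with coefficient lookup)
    t1 = a * x2                 # multiplication 2
    t2 = b * x                  # multiplication 3, parallel with multiplication 2
    return t1 + t2 + c          # three-operand addition (the CSA stage)

def eval_cascaded(a: int, b: int, c: int, x: int) -> int:
    """Cascaded order: two multiply-add units in series."""
    t = a * x + b               # first multiply-add unit
    return t * x + c            # second multiply-add unit

print(eval_direct(3, -2, 5, 7), eval_cascaded(3, -2, 5, 7))   # 138 138
```

Both orders give the same value in exact arithmetic; with finite-width fixed-point hardware the intermediate rounding can differ slightly between them.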
Because the method can fit any single-variable curve function, it applies to a large number of scenarios. For example, the sigmoid activation function in neural networks, $f = 1/(1+e^{-x})$, and the tanh function can both be computed with this approximation. The exponential $e^{x}$ may be used in computing FFTs, Gaussian distributions and the like, and trigonometric functions may be used to compute periodic signals, periodic motion and the like. In almost any project where a curve function needs to be evaluated, the method can be used to obtain an approximate result with high precision and low latency.
Claims (5)
1. A function fitting method based on a parabola is characterized by comprising the following specific steps:
(1) dividing the whole interval of the function into a plurality of element intervals by adopting a binary iteration method, and calculating the coefficient of the parabolic function by using the real coordinates of three points, namely two divided end points and a middle point;
(2) substituting the coefficient obtained in the step (1) back into a parabolic function, calculating an error at an end point of each element interval, and comparing the calculated error with a set error;
(3) dividing the current element interval into two parts, and if the calculation error is greater than the set error, repeating the steps (1) to (2) on the first half section after dividing into two parts; if the calculation error is less than or equal to the set error, performing halving on the second half section after halving, adding the first half section after halving to the first half section, and repeating the steps (1) to (2) until the calculation error is less than or equal to the set error, wherein the length of each obtained section interval is longest, namely the number of integral sections is least;
(4) performing quadratic fitting on the segmented interval by using the coefficient obtained in the step (1) to complete the fitting of the whole interval of the function.
2. The method according to claim 1, wherein in step (1), the interval length of the element interval is $2^{-od}$, wherein od is the number of significant fractional digits.
3. The method of claim 1, wherein in step (2), the error is calculated by: substituting the endpoint of each element interval into the quadratic expression to calculate the fitted quadratic value at the corresponding point, and subtracting the fitted quadratic value from the true function value to obtain the error.
4. A function fitting device based on parabola is characterized by comprising a data input module, a comparison module, a coefficient selection module, a calculation unit and a data output module; the comparison module is used for comparing the input data with the segmented interval to determine the interval of the data, so as to determine the coefficient of the quadratic function; the coefficient selection module is used for selecting the parabolic coefficient of the interval according to the comparison result of the comparison module; and the calculating unit is used for calculating a quadratic function value according to the input data and the parabolic coefficient.
5. The apparatus of claim 4, wherein the computing unit employs two cascaded multiply-add units or a carry-save adder.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911194243.2A CN111061992A (en) | 2019-11-28 | 2019-11-28 | Function fitting method and device based on parabola |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911194243.2A CN111061992A (en) | 2019-11-28 | 2019-11-28 | Function fitting method and device based on parabola |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111061992A true CN111061992A (en) | 2020-04-24 |
Family
ID=70299079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911194243.2A Pending CN111061992A (en) | 2019-11-28 | 2019-11-28 | Function fitting method and device based on parabola |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111061992A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112051980A (en) * | 2020-10-13 | 2020-12-08 | 浙江大学 | Non-linear activation function computing device based on Newton iteration method |
CN112257361A (en) * | 2020-10-22 | 2021-01-22 | 东南大学 | Standard unit library construction method based on quadratic fitting model |
CN116720554A (en) * | 2023-08-11 | 2023-09-08 | 南京师范大学 | Method for realizing multi-section linear fitting neuron circuit based on FPGA technology |
-
2019
- 2019-11-28 CN CN201911194243.2A patent/CN111061992A/en active Pending
Non-Patent Citations (1)
Title |
---|
牛涛: "初等函数运算器的设计研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112051980A (en) * | 2020-10-13 | 2020-12-08 | 浙江大学 | Non-linear activation function computing device based on Newton iteration method |
CN112051980B (en) * | 2020-10-13 | 2022-06-21 | 浙江大学 | Non-linear activation function computing device based on Newton iteration method |
CN112257361A (en) * | 2020-10-22 | 2021-01-22 | 东南大学 | Standard unit library construction method based on quadratic fitting model |
CN112257361B (en) * | 2020-10-22 | 2024-02-20 | 东南大学 | Standard cell library construction method based on quadratic fit model |
CN116720554A (en) * | 2023-08-11 | 2023-09-08 | 南京师范大学 | Method for realizing multi-section linear fitting neuron circuit based on FPGA technology |
CN116720554B (en) * | 2023-08-11 | 2023-11-14 | 南京师范大学 | Method for realizing multi-section linear fitting neuron circuit based on FPGA technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200424 |