CN115983354B - High-precision adjustable general activation function implementation method - Google Patents

High-precision adjustable general activation function implementation method

Info

Publication number
CN115983354B
CN115983354B (application CN202310052328.7A)
Authority
CN
China
Prior art keywords: segments, approximation, activation function, error, precision
Prior art date
Legal status: Active
Application number
CN202310052328.7A
Other languages
Chinese (zh)
Other versions
CN115983354A (en)
Inventor
马艳华
徐琪灿
陈聪聪
宋泽睿
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310052328.7A priority Critical patent/CN115983354B/en
Publication of CN115983354A publication Critical patent/CN115983354A/en
Application granted granted Critical
Publication of CN115983354B publication Critical patent/CN115983354B/en


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical field of field-programmable gate array (FPGA) hardware accelerators and discloses a high-precision, adjustable, general-purpose activation function implementation method. The method achieves high-precision activation function approximation with a small amount of storage and on-chip resources, and the precision can be set on demand to balance accuracy against storage space. The method can accurately estimate the accuracy achievable by a given segmentation strategy for the activation function, allowing the strategy to be adjusted to a target approximation accuracy and avoiding the on-chip resource waste caused by precision overflow. Compared with traditional methods, the proposed method achieves higher precision and a larger adjustable range, and consumes fewer hardware resources than other methods capable of high precision.

Description

High-precision adjustable general activation function implementation method
Technical Field
The invention belongs to the technical field of field-programmable gate array (FPGA) hardware accelerators, and provides an approximation method that effectively handles a variety of common nonlinear activation functions. The method reduces FPGA hardware resource consumption while achieving high precision, provides a large adjustable range, and avoids the resource waste caused by precision overflow. In particular, the invention relates to a high-precision, adjustable, general-purpose activation function implementation method.
Background
Nonlinear activation functions provide the nonlinearity of a neural network and are an important component of it. In general, nonlinear activation functions are complex to calculate and difficult to implement exactly on an FPGA. Therefore, when a designer needs a nonlinear activation function on an FPGA, the function must be approximated by some approximation method.
In recent years, scholars at home and abroad have conducted research on improving activation function approximation accuracy. FPGA-oriented activation function approximation methods fall mainly into two categories. The first is piecewise approximation: the target activation function is divided into several regions according to a specific segmentation scheme, and each region is described by a different linearized expression, thereby approximating the original function ("FPGA implementation for the sigmoid with piecewise linear fitting method based on curvature analysis", Electronics, 2022). The second is the lookup table method, in which all input/output values of the activation function are stored in memory and read by table lookup ("A twofold lookup table architecture for efficient approximation of activation functions", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020). This approach can achieve very high accuracy but requires a large amount of memory, placing significant pressure on the FPGA's on-chip storage. In addition, hybrid approaches have emerged in current research that improve approximation accuracy while using only a small amount of memory ("A modular approximation methodology for efficient fixed-point hardware implementation of the sigmoid function", IEEE Transactions on Industrial Electronics, 2022). However, the above methods consider only Sigmoid and functions derived from it by simple mathematical transformations, such as Tanh, and ignore newer activation functions such as Swish and Mish. Given the progress of neural network hardware acceleration research, an activation function approximation method should be designed to be general.
If only the lookup table method is used, storing every possible output of the activation function in the table is unfriendly to the FPGA's on-chip memory. An activation function approximation method that is general, high-precision, adjustable, and small in storage footprint is therefore necessary.
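The two baseline approaches discussed above can be sketched in a few lines of Python (a minimal illustration; the segment count, sample counts, and table size below are illustrative assumptions, not values from the cited designs):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pwl_table(f, lo=0.0, hi=8.0, n_seg=16, samples=64):
    """Baseline 1: one least-squares line per equal-width segment."""
    table = []
    width = (hi - lo) / n_seg
    for i in range(n_seg):
        a, b = lo + i * width, lo + (i + 1) * width
        xs = np.linspace(a, b, samples)
        slope, icpt = np.polyfit(xs, f(xs), 1)   # degree-1 fit: slope, intercept
        table.append((a, b, slope, icpt))
    return table

def pwl_eval(table, x):
    for a, b, slope, icpt in table:
        if a <= x <= b:
            return slope * x + icpt
    raise ValueError("x outside approximated range")

def build_lut(f, lo=0.0, hi=8.0, bits=10):
    """Baseline 2: precompute the output for every quantized input."""
    xs = np.linspace(lo, hi, 1 << bits)
    return xs, f(xs)

table = pwl_table(sigmoid)
xs, ys = build_lut(sigmoid)
err_pwl = abs(pwl_eval(table, 1.3) - sigmoid(1.3))
err_lut = abs(ys[np.argmin(np.abs(xs - 1.3))] - sigmoid(1.3))
```

The trade-off is visible directly: the lookup table stores 2¹⁰ entries to the piecewise method's 16 line coefficients, which is the memory pressure the background describes.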
Disclosure of Invention
Because the hybrid methods in existing research rely on a small amount of memory combined with function-specific mathematical transformations, they apply only to the Sigmoid activation function and functions derived from it by simple calculation, such as Tanh, and therefore lack generality. Moreover, the fixed mathematical calculation process leaves the realized approximation little room for precision adjustment. To address these problems, the invention provides a general hybrid approximation method for activation functions on FPGAs, which achieves high-precision approximation with a small amount of storage and on-chip resources and allows the precision to be set on demand, balancing accuracy against storage space.
The technical scheme of the invention is as follows:
a high-precision adjustable general activation function implementation method comprises the following steps:
step 1: assuming that the bit width of the input data x is n, uniformly dividing the activation function f (x) into 16 segments, using an expected error E to represent the expected activation function precision, and dividing all segments into three types through calculation according to the expected error;
step 1.1: calculating an approximation error E when approximating an activation function using a piecewise linear approach of 16-segment equipartition 1avg Average curvature of each of the 16 segments and maximum curvature C of the entire activation function max
Step 1.2: determine the required constant coefficients K_1 and K_2; the formulas are as follows:
step 1.3: the average curvature of each segment is rearranged in order from big to small, and the segments with the largest average curvature are counted, so that k segments with the largest average curvature are obtained as the first class, wherein k is the minimum integer which needs to satisfy the following inequality:
k<C sum K 2 –16(E 1avg –E)K 1 K 2
wherein C is sum Representing the sum of the average curvatures of the first class of segments of number k;
step 1.4: estimating margin E of approximation error 2 The formula is as follows:
according to margin E 2 Size, from the smallest fraction of the average curvatureStarting counting of the segments, merging adjacent segments, calculating the error increment after merging as error summation before merging the segments and multiplying the error summation by the number of the segments; under the condition of meeting the allowance requirement, combining as many segments as possible to obtain segments to be combined as a third type segment, wherein the number of the segments is m, the rest segments are the second type segments, and the number of the segments is 16-k-m;
step 2: three different approximation methods are used for three different classes of segments;
step 2.1: for the first class of segments, a method of nonlinear approximation is used; firstly, calculating a tangent g (x) of a left end point of the segment, then squaring the lowest n-5 effective bits of the x, taking the first ten bits of the result, forming a corresponding relation with f (x) -g (x), training a single-layer perceptron as a data set, and adding the result of the single-layer perceptron with the g (x) to obtain an approximation result of the f (x);
step 2.2: aiming at the second class of segments, a linear approximation method is used, and approximation is carried out through a least square method;
step 2.3: for the third class of segments, several adjacent segments are combined and then a linear approximation method, i.e. a least squares method, is used.
Step 3: completing hardware deployment according to the algorithm designed in the step 1 and the step 2;
step 3.1: calculating to obtain all weights, biases and coefficients required in the step 1 and the step 2;
step 3.2: coding the highest 4-bit valid bit of x according to the segmentation condition, reading the coefficient and bias of the straight line as an address, if the segmentation uses nonlinear approximation, reading the weight of a single-layer perceptron, and accumulating the bias of the single-layer perceptron to the bias of the straight line;
step 3.3: and (3) cutting the weight of the single-layer perceptron, only leaving the data valid bit, calculating an approximation error, if the approximation error is smaller than the expected error, further cutting the weight, and cutting the bit width of all the weights to be consistent with the weight of the least valid bit.
The invention has the following beneficial effects: the curvature-based approximation precision prediction method can accurately estimate the precision achievable by a proposed activation function segmentation strategy, enabling the strategy to be adjusted for a given target approximation precision and avoiding the on-chip resource waste caused by precision overflow. Compared with traditional methods, the proposed method achieves higher precision and a larger adjustable range, and consumes fewer hardware resources than other methods capable of high precision.
Drawings
FIG. 1 is the hardware deployment of the method used by the present invention, where x denotes the input data, f(x) the output data, and W_i, i = 1, …, 9, the weights of the single-layer perceptron; k_1 and b_1 are the slope and offset of the straight line.
Fig. 2 is a flow chart of an algorithm of a nonlinear approximation method used in the present invention.
FIG. 3 is a schematic diagram of weight truncation for the single-layer perceptron in the present invention, where each point represents a bit: (a) the untruncated state, (b) only the significant data bits retained, (c) the significant bits of all weights truncated to the same width.
Detailed Description
The invention is further described with reference to the drawings and a specific embodiment, in which the proposed method approximates the Sigmoid activation function with an expected error of 7×10⁻⁵.
Step 1: the bit width of the input data x is 16, comprising one sign bit, three integer bits and twelve fractional bits; the formula of the Sigmoid activation function f(x) is as follows:
since the activation function is symmetrical about a point on the vertical axis, only data ranging between 0 and 8 is approximated, dividing the function in this range into 16 segments, each segment having a length of 0.5. The expected activation function approximation error is 7×10 -5 All segments are divided into three categories by calculation based on the expected error.
Step 1.1: the approximation error E_1avg when approximating the activation function with the uniform 16-segment piecewise linear method is calculated to be 7.11×10⁻³, and the maximum curvature C_max of the entire activation function is 0.0924. The average curvatures of the 16 segments are, respectively, 0.0275, 0.0717, 0.0908, 0.0862, 0.0690, 0.0496, 0.0334, 0.0216, 0.0136, 0.0084, 0.0052, 0.0032, 0.0019, 0.0012, 7.15×10⁻⁴, and 4.34×10⁻⁴.
Step 1.2: the required constant coefficients are calculated: K_1 is 1.299×10³ and K_2 is 12.6.
Step 1.3: the average curvatures are sorted in descending order and the segments with the largest average curvature are counted; the number k of first-class segments computed from the formula has a minimum integer value of 9, so the first class consists of the 9 segments with the largest average curvature, i.e., the 9 segments covering 0 to 4.5.
Step 1.4: the margin E_2 of the approximation error is estimated from the formula to be 6.7×10⁻⁶. According to this margin, counting from the segment with the smallest average curvature, adjacent segments are merged, with the post-merge error increment estimated as the sum of the pre-merge segment errors multiplied by the number of merged segments. Under the margin requirement, the segments that can be merged are 7 to 7.5 and 7.5 to 8, so the third class has 2 segments; the remaining segments form the second class, numbering 5.
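The merge-cost rule of Step 1.4 can be sketched for the two flattest segments. This is a rough illustration under assumptions: the per-segment error is taken as the mean absolute error of a least-squares line, and the margin formula itself is not reproduced here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def seg_error(f, a, b, samples=1025):
    # Mean absolute error of the least-squares line over [a, b]
    xs = np.linspace(a, b, samples)
    slope, icpt = np.polyfit(xs, f(xs), 1)
    return np.abs(slope * xs + icpt - f(xs)).mean()

# Cost of merging [7, 7.5] and [7.5, 8]: pre-merge error sum times the
# number of merged segments, per the rule stated above.
merge_cost = (seg_error(sigmoid, 7.0, 7.5) + seg_error(sigmoid, 7.5, 8.0)) * 2
merged_err = seg_error(sigmoid, 7.0, 8.0)
```

Segments are merged greedily from the flattest end while the accumulated cost fits within the margin E_2.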
Step 2: three different approaches are used for three different classes of segments.
Step 2.1: for the first class of segments, the nonlinear approximation method is used: first the tangent g(x) at the left endpoint of the segment is calculated; then the lowest 11 significant bits of x are squared and the first ten bits of the result are taken; these are paired with f(x) − g(x) to form a data set on which a single-layer perceptron is trained; the approximation of f(x) is the perceptron output added to g(x). The flow of the nonlinear approximation is shown in fig. 2.
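A sketch of this nonlinear step for one first-class segment follows. It is hedged throughout: the 1-sign/3-integer/12-fraction input format and the segment starting at 1.0 are assumptions, and ordinary least squares stands in for the unspecified perceptron training procedure. The intuition is that the tangent's residual is ≈ ½f″·δ², so the squared low bits (δ²) are a natural feature for a linear unit:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

FRAC = 12        # fractional bits of x (1-sign/3-integer/12-fraction, assumed)
SEG_LO = 1.0     # left endpoint of one first-class segment (illustrative)

def tangent(x):
    # g(x): tangent line to f at the segment's left endpoint
    f0 = sigmoid(SEG_LO)
    d0 = f0 * (1.0 - f0)                 # sigmoid'(x) = f(x) * (1 - f(x))
    return f0 + d0 * (x - SEG_LO)

# Offsets within the segment = the low 11 bits of the fixed-point input.
low_bits = np.arange(1 << 11)                     # 0 .. 2047
xs = SEG_LO + low_bits / (1 << FRAC)
sq = low_bits.astype(np.int64) ** 2               # square (at most 22 bits)
top10 = sq >> 12                                  # first ten bits of the square
bits = ((top10[:, None] >> np.arange(9, -1, -1)) & 1).astype(float)

# "Train" the single-layer perceptron (a linear unit with ten bit inputs
# and a bias) on the residual f(x) - g(x), via least squares.
A = np.hstack([bits, np.ones((len(xs), 1))])
resid = sigmoid(xs) - tangent(xs)
w, *_ = np.linalg.lstsq(A, resid, rcond=None)
max_err = np.abs(tangent(xs) + A @ w - sigmoid(xs)).max()
tangent_err = np.abs(resid).max()
```

The ten bit inputs plus one bias match the W_0–W_9 and Bias rows of Table 1, and the fitted unit reduces the tangent-only error by roughly an order of magnitude in this sketch.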
Step 2.2: for the second class of segments, a linear approximation method is used, and approximation is performed by a least square method.
Step 2.3: for the third class of segments, several adjacent segments are combined and then a linear approximation method, that is, a least square method, is used.
Step 3: the hardware deployment is completed according to the algorithm designed in Step 1 and Step 2.
Step 3.1: all weights, biases and coefficients required in the step 1 and the step 2 are obtained through calculation. The weights using the nonlinear approximation method are shown in table 1. The coefficients and offsets of the straight lines are shown in table 2.
Step 3.2: the most significant 4 bits of x are encoded according to the segmentation and used as an address to read the line's coefficients and bias; if the segment uses the nonlinear approximation, the single-layer perceptron's weights are also read and its bias is added to the line's bias. The hardware deployment is shown in fig. 1.
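The address encoding can be sketched as follows, under the assumed 1-sign/3-integer/12-fraction format: the 4 most significant magnitude bits of the input equal ⌊x / 0.5⌋, which directly indexes the 16 half-unit segments. Coefficient values are placeholders, not those of Table 2:

```python
FRAC = 12   # assumed fixed-point format: 1 sign, 3 integer, 12 fractional bits

def segment_index(x_fixed):
    """The most significant 4 magnitude bits of the fixed-point input
    select one of the 16 half-unit segments covering [0, 8)."""
    return (x_fixed >> 11) & 0xF

# Per-segment (slope, bias) table addressed by that index (placeholders):
coeffs = [(0.0, 0.5)] * 16

def line_approx(x_fixed):
    slope, bias = coeffs[segment_index(x_fixed)]
    return slope * (x_fixed / (1 << FRAC)) + bias

idx = segment_index(int(1.5 * 4096))   # x = 1.5 falls in segment [1.5, 2.0)
```

In hardware this is just a 4-bit ROM address; for nonlinear segments the same address additionally fetches the perceptron weights.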
Step 3.3: the single-layer perceptron's weights are truncated, leaving only the significant data bits; the approximation error is calculated to be 6.54×10⁻⁵, smaller than the expected error, so the weights are truncated further, cutting the bit widths of all weights to match the weight with the fewest significant bits and increasing the approximation error to 6.82×10⁻⁵.
The approximation results and resource usage of the invention are shown in Table 3. As can be seen, the invention provides an FPGA-based approximation method effective for a variety of common nonlinear activation functions. The method mainly consists of adopting different approximation methods according to the average curvature of the segments, estimating the number of segments assigned to each approximation method from the expected activation function approximation error, and the corresponding hardware deployment. It achieves high precision while reducing hardware resource consumption, and provides a large adjustable range.
Table 1 is the weights and biases used in the present invention when approximating the Sigmoid activation function using the nonlinear method

Left endpoint of interval | 0     | 0.5   | 1     | 1.5   | 2     | 2.5   | 3     | 3.5   | 4
W_0 (×10⁻⁶)               | 5.72  | 13.4  | 14.3  | 11.4  | 7.63  | 3.81  | 2.86  | 2.86  | 1.91
W_1 (×10⁻⁶)               | 9.54  | 21.9  | 24.8  | 22.9  | 15.3  | 8.58  | 6.68  | 4.77  | 3.81
W_2 (×10⁻⁶)               | 19.1  | 39.1  | 47.7  | 45.8  | 31.5  | 19.1  | 14.3  | 9.54  | 8.58
W_3 (×10⁻⁵)               | 3.53  | 7.34  | 9.25  | 9.16  | 6.39  | 4.10  | 3.05  | 2.00  | 1.62
W_4 (×10⁻⁵)               | 8.49  | 14.3  | 18.4  | 18.3  | 13.2  | 8.77  | 6.20  | 4.01  | 3.15
W_5 (×10⁻⁵)               | 14.8  | 28.4  | 36.7  | 36.4  | 26.6  | 18.0  | 12.8  | 8.30  | 6.29
W_6 (×10⁻⁴)               | 2.57  | 5.67  | 7.32  | 7.26  | 5.41  | 3.74  | 2.61  | 1.68  | 1.23
W_7 (×10⁻⁴)               | 4.84  | 11.34 | 14.66 | 14.42 | 10.94 | 7.66  | 5.27  | 3.41  | 2.40
W_8 (×10⁻⁴)               | 9.65  | 22.89 | 29.45 | 28.73 | 21.98 | 15.46 | 10.61 | 6.87  | 4.74
W_9 (×10⁻⁴)               | 19.28 | 46.49 | 58.07 | 56.46 | 43.48 | 30.59 | 20.66 | 13.41 | 8.98
Bias (×10⁻⁶)              | 132   | 7.63  | −2.86 | 1.91  | −37.2 | −42.9 | −25.7 | −16.2 | 2.86
Table 2 shows the linear slopes and offsets of all segments when approximating the Sigmoid activation function in the present invention
Table 3 shows the results of the present invention compared with other designs in terms of accuracy and hardware resource usage

Design            | Average error | Maximum error | Lookup tables | Flip-flops | Storage
Present invention | 6.82×10⁻⁵     | 2.91×10⁻⁴     | 200           | 61         | 827 bits
Piecewise linear  | 5.87×10⁻³     | 1.89×10⁻²     | 158           | 46         | 0 bits
Lookup table      | 6.17×10⁻⁵     | 1.22×10⁻⁴     | 0             | 0          | 10⁶ bits

Claims (1)

1. A high-precision adjustable general activation function implementation method, characterized by comprising the following steps:
step 1: assume the bit width of the input data x is n, uniformly divide the activation function f(x) into 16 segments, use an expected error E to represent the expected activation function precision, and divide all segments into three classes by calculation based on the expected error;
step 1.1: calculate the approximation error E_1avg when approximating the activation function with a uniform 16-segment piecewise linear method, the average curvature of each of the 16 segments, and the maximum curvature C_max of the entire activation function;
step 1.2: determine the constant coefficients K_1 and K_2; the formulas are as follows:
step 1.3: sort the segments by average curvature in descending order and count from the largest, taking the k segments with the largest average curvature as the first class, where k is the smallest integer that satisfies the following inequality:
k < C_sum·K_2 − 16·(E_1avg − E)·K_1·K_2
where C_sum denotes the sum of the average curvatures of the k first-class segments;
step 1.4: estimate the margin E_2 of the approximation error; the formula is as follows:
according to the margin E_2, count from the segment with the smallest average curvature and merge adjacent segments, computing the post-merge error increment as the sum of the pre-merge segment errors multiplied by the number of merged segments; while the margin requirement is satisfied, merge as many segments as possible; the merged segments form the third class, of number m, and the remaining segments form the second class, of number 16 − k − m;
step 2: use three different approximation methods for the three different classes of segments;
step 2.1: for the first class of segments, use a nonlinear approximation method: first calculate the tangent g(x) at the left endpoint of the segment; then square the lowest n−5 significant bits of x and take the first ten bits of the result; pair these bits with f(x) − g(x) to form a data set and train a single-layer perceptron on it; the approximation of f(x) is the perceptron output added to g(x);
step 2.2: for the second class of segments, use a linear approximation method, with the approximation performed by least squares;
step 2.3: for the third class of segments, merge the several adjacent segments and then use the linear approximation method, namely least squares;
step 3: complete the hardware deployment according to the algorithm designed in step 1 and step 2;
step 3.1: calculate all the weights, biases and coefficients required in step 1 and step 2;
step 3.2: encode the most significant 4 bits of x according to the segmentation and use them as an address to read the line's coefficient and bias; if the segment uses the nonlinear approximation, also read the single-layer perceptron's weights and add the perceptron's bias to the line's bias;
step 3.3: truncate the single-layer perceptron's weights, leaving only the significant data bits, and calculate the approximation error; if the approximation error is smaller than the expected error, truncate the weights further, cutting the bit widths of all weights to match the weight with the fewest significant bits.
CN202310052328.7A 2023-02-02 2023-02-02 High-precision adjustable general activation function implementation method Active CN115983354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310052328.7A CN115983354B (en) 2023-02-02 2023-02-02 High-precision adjustable general activation function implementation method


Publications (2)

Publication Number Publication Date
CN115983354A CN115983354A (en) 2023-04-18
CN115983354B true CN115983354B (en) 2023-08-22

Family

ID=85975984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310052328.7A Active CN115983354B (en) 2023-02-02 2023-02-02 High-precision adjustable general activation function implementation method

Country Status (1)

Country Link
CN (1) CN115983354B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537332A (en) * 2018-04-12 2018-09-14 合肥工业大学 A kind of Sigmoid function hardware-efficient rate implementation methods based on Remez algorithms
CN110210612A (en) * 2019-05-14 2019-09-06 北京中科汇成科技有限公司 A kind of integrated circuit accelerated method and system based on dispositif de traitement lineaire adapte approximating curve
CN110659015A (en) * 2018-06-29 2020-01-07 英特尔公司 Deep neural network architecture using piecewise linear approximation
CN111680782A (en) * 2020-05-20 2020-09-18 河海大学常州校区 FPGA-based RBF neural network activation function implementation method
CN113837365A (en) * 2021-09-22 2021-12-24 中科亿海微电子科技(苏州)有限公司 Model for realizing sigmoid function approximation, FPGA circuit and working method
CN115423081A (en) * 2022-09-21 2022-12-02 重庆邮电大学 Neural network accelerator based on CNN _ LSTM algorithm of FPGA


Also Published As

Publication number Publication date
CN115983354A (en) 2023-04-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant