CN116432711A  Hardware implementation method and device of SiLU activation function and computing equipment  Google Patents
Publication number: CN116432711A (application CN202310166986.9A, China)
Legal status: Granted (assumed status; not a legal conclusion)
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Neural networks
 G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
 G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
 Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
 Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
 Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
 Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a hardware implementation method and device of a SiLU activation function, and a computing device, which relate to the technical field of deep learning.
Description
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, an apparatus, and a computing device for implementing a SiLU activation function by hardware.
Background
Supported by hardware computing power and massive data, deep neural networks have developed rapidly and, by virtue of their excellent feature-extraction capability, are widely applied in fields such as image recognition, industrial fault detection and unmanned driving. The activation function is an important component of a neural network; its role is to improve the robustness and nonlinear expression capability of a neural network model.
The SiLU function is a common activation function in artificial neural networks and has the characteristics of avoiding overfitting, an excellent regularization effect, and ease of training. However, when the SiLU function is implemented on hardware, not only is a large amount of logic resources consumed, but the computation is also slow.
Disclosure of Invention
The invention aims to provide a hardware implementation method, a device and a computing device of a SiLU activation function, so that hardware consumption and computing time during SiLU function computing are reduced under the condition that the accuracy of the SiLU function is ensured.
In a first aspect, an embodiment of the present invention provides a hardware implementation method of a SiLU activation function, which is applied to an activation layer of a target neural network, where the activation layer is a SiLU function; the hardware implementation method of the SiLU activation function comprises the following steps:
acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
when the target data belong to a first interval, determining a first target fitting algorithm as a first direct proportion function; the first interval is an interval larger than or equal to a preset first segmentation value;
when the target data belong to a second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value;
when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table lookup algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value;
when the target data belong to a fourth interval, determining that the first target fitting algorithm is an absolute-value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and calculating the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
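The interval dispatch described by the steps above can be sketched in software as follows. This is a minimal sketch: the segment values a1 = 16 and a3 = 8 and the returned algorithm names are illustrative placeholders (the detailed description only fixes a3 = 8 in its example).

```python
def select_fitting_algorithm(x, a1=16.0, a3=8.0):
    """Return the name of the fitting algorithm for input x.

    a1 (first segment value) and a3 (second segment value) are
    hypothetical choices; the description fixes only a3 = 8.
    """
    if x >= a1:
        return "first direct proportion function"    # first interval [a1, +inf)
    if x >= a3:
        return "second direct proportion function"   # second interval [a3, a1)
    if x >= 0:
        return "uniform compensation table lookup"   # third interval [0, a3)
    return "absolute-value-based fitting"            # fourth interval (-inf, 0)
```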
Further, the third interval is a non-negative interval in which the sigmoid function changes rapidly; the uniform compensation table lookup algorithm refers to searching for corresponding sigmoid function values in a preset storage table through address compensation in order to calculate the SiLU function values, and the preset storage table stores, via address indexes, a preset number of sigmoid function values corresponding to the third interval.
Further, the first direct proportion function is:
f(x) = x;
the second interval comprises a plurality of subintervals, each subinterval corresponding to a different second direct proportional function.
Further, the second interval includes two subintervals obtained by splitting at a preset third segmentation value, and the second direct proportion function includes:
f(x) = x - x/k1, x ∈ [a2, a1);
f(x) = x - x/k2, x ∈ [a3, a2);
wherein k1 and k2 are preset parameter values, a1 is the first segment value, a2 is the third segment value, and a3 is the second segment value.
Further, the preset storage table uses a binary address index with a preset number of bits, the 0th bit of the address index is a compensation bit, and the other bits of the address index are valid bits; the calculating, by using the first target fitting algorithm, a SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is a uniform compensation table lookup algorithm, converting the target data into a binary initial index address value;
correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value;
searching in the preset storage table according to the target index address value to obtain a corresponding sigmoid function value;
and calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
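A software model of this uniform-compensation lookup might look as follows, assuming a3 = 8 and m = 1024 as in the embodiment below; rounding the scaled index up when its fractional part is at least one half plays the role of the compensation bit.

```python
import math

def build_sigmoid_table(m=1024, a3=8.0):
    # One sigmoid sample per table entry, uniformly over [0, a3).
    return [1.0 / (1.0 + math.exp(-(i * a3 / m))) for i in range(m)]

def lookup_silu(x, table, a3=8.0):
    """SiLU over [0, a3) via the uniform-compensation table lookup.

    The fractional part of the scaled index acts as the compensation
    bit: when it is at least one half, the valid bits are incremented.
    """
    m = len(table)
    d0 = x * m / a3              # initial index address value
    idx = int(d0)                # valid bits
    if d0 - idx >= 0.5:          # compensation bit set
        idx += 1
    idx = min(idx, m - 1)        # clamp at the table boundary
    return x * table[idx]        # f(x) = x * s(x)
```

For x = 0.1 this scales to index 12.8, which is compensated up to entry 13, matching the worked example later in the description.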
Further, the calculating, according to the corresponding sigmoid function value, a SiLU function value corresponding to the target data includes:
substituting the corresponding sigmoid function value into the following formula, and calculating to obtain the SiLU function value corresponding to the target data:
f(x) = x·s(x), x ∈ [0, a3);
wherein s(x) is the corresponding sigmoid function value.
Further, the calculating, by using the first target fitting algorithm, a SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is the absolute-value-based fitting algorithm, calculating the absolute value of the target data;
determining a second target fitting algorithm based on the interval to which the absolute value of the target data belongs; wherein the second target fitting algorithm comprises the first direct proportion function, the second direct proportion function or the uniform compensation table lookup algorithm;
calculating a SiLU function value corresponding to the absolute value of the target data by adopting the second target fitting algorithm;
and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
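The absolute-value branch described by these steps can be sketched like this. For clarity the exact SiLU stands in for the second target fitting algorithm; hardware would substitute the piecewise approximations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu_negative(x):
    """SiLU for x < 0 via the absolute-value-based fitting.

    Evaluate the non-negative branch at |x| and apply
    f(x) = f(|x|) - |x|, which follows from s(-x) = 1 - s(x).
    The exact SiLU stands in for the second target fitting
    algorithm here; hardware would use the piecewise fits.
    """
    ax = abs(x)
    f_ax = ax * sigmoid(ax)      # second target fitting algorithm on |x|
    return f_ax - ax
```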
Further, the calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data includes:
substituting the SiLU function value corresponding to the absolute value of the target data into the following formula, and calculating to obtain the SiLU function value corresponding to the target data:
f(x) = f(|x|) - |x|, x ∈ (-∞, 0);
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
In a second aspect, the embodiment of the present invention further provides a hardware implementation device of a SiLU activation function, which is applied to an activation layer of a target neural network, where the activation layer is a SiLU function; the hardware implementation device of the SiLU activation function comprises:
the data acquisition module is used for acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
the algorithm determining module is used for determining that the first target fitting algorithm is the first direct proportion function when the target data belong to a first interval; the first interval is an interval larger than or equal to a preset first segmentation value; when the target data belong to a second interval, determining that the first target fitting algorithm is the second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value; when the target data belong to a third interval, determining that the first target fitting algorithm is the uniform compensation table lookup algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segmentation value; when the target data belong to a fourth interval, determining that the first target fitting algorithm is the absolute-value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and the fitting calculation module is used for calculating and obtaining the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
In a third aspect, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor, when executing the computer program, implements the hardware implementation method of the SiLU activation function of the first aspect.
According to the hardware implementation method and device of the SiLU activation function and the computing device provided by the embodiments of the present invention, when the SiLU activation function is computed on target data, a function simplification mode of segmented fitting is adopted, including partial approximation and the additional compensation of the uniform compensation table lookup algorithm, so that hardware consumption and computation time during SiLU function computation are reduced while the accuracy of the SiLU function is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a functional image of a SiLU function;
FIG. 2 is a functional image of a sigmoid function;
fig. 3 is a flow chart of a method for implementing a hardware of a SiLU activation function according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an address index according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a hardware implementation device of a SiLU activation function according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a SiLU function calculation unit according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The activation function acts as an important component of the neural network, typically placed after a convolutional layer, with its output serving as the input of the subsequent pooling layer. The SiLU function is a common activation function, and its expression is as follows:
f(x) = x / (1 + e^(-x)) (1)
the functional image is shown in fig. 1.
The SiLU function is a nonlinear function, and involves not only exponent operation but also division operation, so when implementing the SiLU function on hardware, not only a large amount of logic resources are required to be consumed, but also calculation is slow.
Based on this, the hardware implementation method, device and computing equipment for the Silu activation function provided by the embodiments of the present invention reduce the cost of hardware resources by means of segment fitting (i.e. simplifying the functions of different intervals by segment fitting), and ensure the accuracy of the Silu function by means of partial approximation and additional compensation (i.e. fitting by means of uniform compensation lookup table in intervals where linear fitting cannot be performed).
As shown in equation (2), according to its definition the SiLU function can be divided into two parts, x and 1/(1 + e^(-x)), where the latter part is the sigmoid function s(x):
f(x) = x·s(x), s(x) = 1 / (1 + e^(-x)) (2)
The functional image of the sigmoid function, shown in fig. 2, is centrosymmetric about the point (0, 0.5), i.e. when x < 0, equation (3) holds:
s(x) = 1 - s(-x) (3)
By utilizing this property, the SiLU function can be divided into two segments, (-∞, 0) and [0, +∞), as shown in equation (4):
f(x) = x·s(x), x ∈ [0, +∞); f(x) = f(|x|) - |x|, x ∈ (-∞, 0) (4)
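The symmetry argument above can be checked numerically with a small sketch of the reference (hardware-expensive) SiLU:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    # Reference SiLU: the hardware-expensive form with exp and division.
    return x * sigmoid(x)

# Centrosymmetry about (0, 0.5): s(-x) = 1 - s(x), hence on the
# negative half f(x) = f(|x|) - |x|, the basis of equation (4).
for v in (0.5, 1.0, 3.0, 7.5):
    assert abs(sigmoid(-v) - (1.0 - sigmoid(v))) < 1e-12
    assert abs(silu(-v) - (silu(v) - v)) < 1e-12
```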
In hardware, the division operation and the exponent operation comparatively consume resources, so the embodiment of the present invention performs approximation processing only on the sigmoid part 1/(1 + e^(-x)) of the SiLU function.
For the convenience of understanding the present embodiment, a detailed description will be first given of a hardware implementation method of a SiLU activation function disclosed in the present embodiment.
Embodiments of the present invention provide a method for implementing a SiLU activation function in hardware, which may be performed by a computing device having data processing capabilities, where the hardware of the computing device may employ an FPGA (Field Programmable Gate Array). The method is applied to an activation layer of the target neural network, wherein the activation layer is a SiLU function. Referring to fig. 3, a flow chart of a hardware implementation method of a SiLU activation function is shown, and the method mainly includes steps S302 to S306 as follows:
step S302, target data to be processed is acquired.
The target data is input data of an activation layer when the target neural network processes the original data. The target neural network is a convolutional neural network that realizes a specific function such as image recognition, industrial fault detection, unmanned driving detection, or classification, and thus, raw data is input data corresponding to the specific function, for example, raw data may be an image corresponding to the image recognition function, or industrial data corresponding to the industrial fault detection function, or the like. The target neural network may include a plurality of network layers, such as a convolution layer, an activation layer, a pooling layer, a full connection layer, etc., connected in a certain order, and typically, the upper layer of the activation layer is the convolution layer, and the lower layer of the activation layer is the pooling layer.
Step S304, determining a first target fitting algorithm according to the interval to which the target data belongs.
In this embodiment, [0, +∞) is divided into three segments, [0, a3), [a3, a1) and [a1, +∞), according to the change of the sigmoid function, corresponding respectively to the third interval, the second interval and the first interval. The sigmoid function varies slowly over the second interval and the first interval, so a direct proportion function may be used; the sigmoid function changes faster in the third interval, and if a single monotonic function fit were still used the error would be larger, so a uniform compensation table lookup algorithm can be adopted. Further, as can be seen from equation (4), the negative interval (-∞, 0) may employ an absolute-value-based fitting algorithm. Based on this, the above step S304 may include:
when the target data belong to the first interval, determining that the first target fitting algorithm is the first direct proportion function; wherein the first interval is an interval greater than or equal to a preset first segment value, i.e. [a1, +∞), a1 being the first segment value;
when the target data belong to the second interval, determining that the first target fitting algorithm is the second direct proportion function; wherein the second interval is an interval greater than or equal to a preset second segment value and less than the first segment value, i.e. [a3, a1), a3 being the second segment value;
when the target data belong to the third interval, determining that the first target fitting algorithm is the uniform compensation table lookup algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value, i.e. [0, a3);
when the target data belong to the fourth interval, determining that the first target fitting algorithm is the absolute-value-based fitting algorithm; wherein the fourth interval is an interval less than 0, i.e. (-∞, 0).
Further, considering that when x ∈ [a1, +∞) the sigmoid function value tends to 1, the first direct proportion function may be:
f(x) = x (5)
Considering x ∈ [a3, a1), where the following approximation (6) holds, the second direct proportion function may take the form of equation (7):
s(x) ≈ 1 - 1/k (6)
f(x) = x - x/k (7)
wherein k is a preset parameter value.
Further, to improve the accuracy of the SiLU function, the second interval may be divided into a plurality of subintervals, each subinterval corresponding to a different second direct proportional function.
In one possible implementation manner, the second interval includes two subintervals obtained by splitting at a preset third segmentation value, and the second direct proportion function includes:
f(x) = x - x/k1, x ∈ [a2, a1); f(x) = x - x/k2, x ∈ [a3, a2) (8)
wherein k1 and k2 are preset parameter values, a1 is the first segment value, a2 is the third segment value, and a3 is the second segment value.
Further, the third interval is a non-negative interval in which the sigmoid function changes rapidly; the uniform compensation table lookup algorithm refers to searching for the corresponding sigmoid function value in a preset storage table through address compensation to calculate the SiLU function value, and the preset storage table stores, via address indexes, a preset number of sigmoid function values corresponding to the third interval. The preset number may be set according to actual requirements and is not limited herein. For example, when the preset number is 1024, the third interval may be divided into 1024 parts, and the 1024 sigmoid function values may be stored using a ROM (Read-Only Memory).
The preset storage table may use a binary address index with a preset number of bits, where the 0th bit of the address index is a compensation bit and the other bits of the address index are valid bits; the compensation bit is used for performing decimal compensation on the address index: when the compensation bit value is 0, the value of the valid bits is directly used as the address index value, and when the compensation bit value is 1, the value of the valid bits plus 1 is used as the address index value, thereby realizing the additional compensation of the address index. The preset number of bits corresponds to the preset number: the preset number of bits equals the number of valid bits plus 1, and the number of entries addressable by the valid bits is greater than or equal to the preset number. For example, when the preset number is 1024, the preset number of bits is 11, and the address index includes 10 valid bits and 1 compensation bit.
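In software terms, treating the low bit of an 11-bit fixed-point index as the compensation bit amounts to the following sketch (that the scaled value is truncated into an 11-bit fixed-point word with one fractional bit is an assumption about the format):

```python
def compensated_index(addr11):
    """Resolve an 11-bit address index into a 10-bit ROM address.

    Bits 10..1 are the valid bits and bit 0 is the compensation bit;
    a set compensation bit adds 1 to the valid bits (round-to-nearest).
    """
    valid = addr11 >> 1       # upper 10 valid bits
    comp = addr11 & 1         # bit 0: compensation bit
    return valid + comp
```

With the embodiment's numbers, x = 0.1 scales to 12.8; in this fixed-point form that truncates to 25 (binary 11001), whose compensation bit is 1, giving ROM address 12 + 1 = 13, matching the worked example later in the description.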
Step S306, a first target fitting algorithm is adopted, and SiLU function values corresponding to the target data are calculated.
Specifically, when the target data belongs to the first interval or the second interval, the target data is directly brought into the corresponding proportional function calculation, and the SiLU function value corresponding to the target data can be obtained.
When the target data belong to the third interval, the uniform compensation table lookup algorithm is adopted, and the process of calculating the SiLU function value corresponding to the target data may be as follows: converting the target data into a binary initial index address value; correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value; searching in the preset storage table according to the target index address value to obtain the corresponding sigmoid function value; and calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
In some possible embodiments, the initial index address value d0 of x may be calculated by equation (9). When the value of the compensation bit in the initial index address value is 0, the value of the valid bits of the initial index address value is taken as the target index address value; when the value of the compensation bit in the initial index address value is 1, the value of the valid bits of the initial index address value plus 1 is taken as the target index address value. Then, according to the target index address value, the sigmoid function value corresponding to x (namely the corresponding sigmoid function value) is found in the preset storage table. Finally, the sigmoid function value corresponding to x is substituted into formula (10) to calculate the SiLU function value corresponding to x:
d0 = x·m/a3 (9)
f(x) = x·s(x), x ∈ [0, a3) (10)
wherein m is the number of sigmoid function values stored in the preset storage table (i.e. the preset number), a3 is the second segment value, and s(x) is the sigmoid function value corresponding to x.
In one possible implementation, a3 = 8 and m = 1024, i.e. the interval [0, 8) is divided into 1024 parts, and the 1024 sigmoid function values are stored using a ROM (Read-Only Memory). For the ROM, an 11-bit address index is used, as shown in FIG. 4, in which the upper 10 bits are the valid bits of the address index and the 0th bit is the compensation bit of the address index. When calculating the address index value of the ROM, if the compensation bit value is 0, the value of the 10 valid bits is directly used as the index address value of the ROM; if the compensation bit value is 1, the value of the 10 valid bits plus 1 is used as the index address value of the ROM. The sigmoid function value corresponding to x is then looked up in the ROM according to the address index value, and finally the SiLU function value corresponding to x is calculated using formula (10). For example, when x = 0.1, the initial index address value is the binary number corresponding to 12.8, and the compensation bit is 1 at this time, so the target index address value is the binary number corresponding to 13; the sigmoid function value corresponding to 0.1 is found in the ROM according to the binary number corresponding to 13, and then the SiLU function value corresponding to 0.1 is calculated using equation (10).
When the target data belong to the fourth interval, the process of calculating the SiLU function value corresponding to the target data by adopting the absolute-value-based fitting algorithm may be as follows: calculating the absolute value of the target data; determining a second target fitting algorithm based on the interval to which the absolute value of the target data belongs, wherein the second target fitting algorithm comprises the first direct proportion function, the second direct proportion function or the uniform compensation table lookup algorithm; calculating the SiLU function value corresponding to the absolute value of the target data by adopting the second target fitting algorithm; and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
In some possible embodiments, the SiLU function value corresponding to the absolute value of the target data may be substituted into the following formula (11) to calculate the SiLU function value corresponding to the target data:
f(x) = f(|x|) - |x|, x ∈ (-∞, 0) (11)
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
In order to facilitate understanding and implementation, the embodiment of the present invention further provides a comprehensive calculation formula of the hardware implementation method of the SiLU activation function, as follows:
f(x) = x, x ∈ [a1, +∞);
f(x) = x - x/k1, x ∈ [a2, a1);
f(x) = x - x/k2, x ∈ [a3, a2);
f(x) = x·s(x), x ∈ [0, a3);
f(x) = f(|x|) - |x|, x ∈ (-∞, 0) (12)
wherein s(x) is the corresponding sigmoid function value obtained by searching in the preset storage table through address compensation.
Performing the SiLU function calculation through formula (12) keeps the error of the SiLU function small and the accuracy high.
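Putting the pieces together, equation (12) can be modeled end to end as below. The description fixes only a3 = 8 and m = 1024; the values of A1 and A2 and the derivation of K1 and K2 from the sigmoid tail are hypothetical placeholders.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Segment values and parameters. The description fixes a3 = 8 and
# m = 1024; A1, A2 and the derivation of K1/K2 are assumptions.
A1, A2, A3, M = 16.0, 12.0, 8.0, 1024
K1 = 1.0 / (1.0 - sigmoid((A2 + A1) / 2))        # placeholder fit on [A2, A1)
K2 = 1.0 / (1.0 - sigmoid((A3 + A2) / 2))        # placeholder fit on [A3, A2)
TABLE = [sigmoid(i * A3 / M) for i in range(M)]  # ROM contents for [0, A3)

def silu_hw(x):
    """Software model of the comprehensive formula (12)."""
    if x < 0:                         # fourth interval: f(x) = f(|x|) - |x|
        return silu_hw(-x) - (-x)
    if x >= A1:                       # first interval: f(x) = x
        return x
    if x >= A2:                       # upper second subinterval
        return x - x / K1
    if x >= A3:                       # lower second subinterval
        return x - x / K2
    d0 = x * M / A3                   # third interval: compensated lookup
    idx = int(d0)
    if d0 - idx >= 0.5:               # compensation bit set -> add 1
        idx += 1
    return x * TABLE[min(idx, M - 1)]
```

Sampling this approximation against the exact x·s(x) over [-20, 20] keeps the absolute error within a few thousandths under these placeholder parameters.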
According to the hardware implementation method of the SiLU activation function provided above, the SiLU activation function is split, and its complex exponential and division operations are simplified into linear operations over different value intervals by means of piecewise fitting; in intervals where linear fitting is not feasible, fitting is performed through the uniform compensation table lookup method. The accuracy of the convolutional neural network algorithm can thus be guaranteed at low hardware complexity, solving the hardware implementation problem of a complex activation function; consumption of hardware resources is reduced and calculation time is saved while accuracy is ensured.
The embodiment of the invention also provides a hardware implementation device of the SiLU activation function, which is applied to an activation layer of a target neural network, wherein the activation layer is the SiLU function. Referring to fig. 5, a schematic structural diagram of a hardware implementation device of a SiLU activation function is shown, where the device includes:
a data acquisition module 501, configured to acquire target data to be processed; the target data is input data of an activation layer when the target neural network processes the original data;
the algorithm determining module 502 is configured to determine that the first target fitting algorithm is the first direct proportion function when the target data belong to the first interval; the first interval is an interval larger than or equal to a preset first segmentation value; when the target data belong to the second interval, determining that the first target fitting algorithm is the second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value; when the target data belong to the third interval, determining that the first target fitting algorithm is the uniform compensation table lookup algorithm; wherein the third interval is an interval which is greater than or equal to 0 and smaller than the second segmentation value; when the target data belong to the fourth interval, determining that the first target fitting algorithm is the absolute-value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
the fitting calculation module 503 is configured to calculate the SiLU function value corresponding to the target data by using the first target fitting algorithm.
Further, the third interval is a non-negative interval in which the sigmoid function changes rapidly; the uniform compensation table lookup algorithm refers to searching for the corresponding sigmoid function value in a preset storage table through address compensation to calculate the SiLU function value, and the preset storage table stores, via address indexes, a preset number of sigmoid function values corresponding to the third interval.
Further, the first direct proportion function is:
f(x) = x;
the second interval includes a plurality of subintervals, each subinterval corresponding to a different second direct proportional function.
Further, the second interval includes two subintervals obtained by splitting at a preset third segmentation value, and the second direct proportion function includes:
f(x) = x - x/k1, x ∈ [a2, a1);
f(x) = x - x/k2, x ∈ [a3, a2);
wherein k1 and k2 are preset parameter values, a1 is the first segment value, a2 is the third segment value, and a3 is the second segment value.
Further, the preset storage table uses a binary address index with a preset number of bits, the 0th bit of the address index is a compensation bit, and the other bits of the address index are valid bits; the fitting calculation module 503 is specifically configured to:
when the first target fitting algorithm is a uniform compensation table lookup algorithm, converting target data into binary initial index address values;
correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value;
searching in a preset storage table according to the target index address value to obtain a corresponding sigmoid function value;
and calculating to obtain the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
Further, the fitting calculation module 503 is further configured to:
substitute the corresponding sigmoid function value into the following formula to calculate the SiLU function value corresponding to the target data:
f(x) = x·s(x), x ∈ [0, a₃);
where s(x) is the corresponding sigmoid function value.
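A software sketch of the uniform compensation table lookup on the third interval follows. The 8-bit address width and the boundary value 2.0 are hypothetical assumptions (the patent fixes neither); the compensation bit simply rounds the truncated address to the nearest stored entry:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical parameters: 8 address bits over [0, 2.0); bit 0 is the
# compensation bit, bits 7..1 are the valid bits.
A3 = 2.0
ADDR_BITS = 8
ENTRIES = 1 << (ADDR_BITS - 1)      # one table entry per valid-bit value
STEP = A3 / (1 << ADDR_BITS)

# Preset storage table: sigmoid sampled at the even raw addresses.
TABLE = [sigmoid(2 * i * STEP) for i in range(ENTRIES + 1)]

def silu_lookup(x):
    """SiLU on [0, A3) via table lookup with address compensation."""
    raw = int(x / STEP)              # binary initial index address value
    comp = raw & 1                   # bit 0: the compensation bit
    index = (raw >> 1) + comp        # corrected target index address value
    return x * TABLE[index]          # f(x) = x * s(x)
```

Rounding via the compensation bit halves the worst-case address quantization error relative to plain truncation, which is the accuracy gain the index compensation circuit provides in hardware.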
Further, the fitting calculation module 503 is further configured to:
when the first target fitting algorithm is an absolute valuebased fitting algorithm, calculating to obtain an absolute value of target data;
determining a second target fitting algorithm based on an interval to which the absolute value of the target data belongs; the second target fitting algorithm comprises a first direct proportion function, a second direct proportion function or a uniform compensation table lookup algorithm;
calculating a SiLU function value corresponding to the absolute value of the target data by adopting a second target fitting algorithm;
and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
Further, the fitting calculation module 503 is further configured to:
substituting the SiLU function value corresponding to the absolute value of the target data into the following formula to calculate the SiLU function value corresponding to the target data:
f(x) = f(|x|) − |x|, x ∈ (−∞, 0);
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
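The absolute-value-based branch rests on the sigmoid symmetry s(−x) = 1 − s(x), which for x < 0 gives f(x) = f(|x|) − |x| exactly, so only the non-negative half of the function needs dedicated fitting hardware. A minimal numerical check:

```python
import math

def silu(x):
    """Reference SiLU: f(x) = x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

# For x < 0:  f(x) = x*s(x) = x*(1 - s(|x|)) = x + |x|*s(|x|) = f(|x|) - |x|.
for x in (-0.25, -1.0, -3.0, -7.5):
    assert abs(silu(x) - (silu(abs(x)) - abs(x))) < 1e-12
```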
The implementation principle and technical effects of the hardware implementation device of the SiLU activation function provided in this embodiment are the same as those of the foregoing embodiment of the hardware implementation method of the SiLU activation function; for brevity, where this device embodiment omits a detail, reference may be made to the corresponding content in the foregoing method embodiment.
The embodiment of the invention also provides a SiLU function calculation unit, which is the hardware device corresponding to the hardware implementation device of the SiLU activation function. As shown in fig. 6, the SiLU function calculation unit 600 provided by the embodiment of the invention includes an input register 601, an absolute value calculation unit 602, a positive and negative judgment unit 603, an interval judgment unit 604, a linear fitting unit 0, a linear fitting unit 1, a linear fitting unit 2, an index compensation circuit 606, a lookup table 607, a selector 608, and an arithmetic logic unit 609. The input register 601 stores the variable x input to the SiLU function; the absolute value calculation unit 602 and the positive and negative judgment unit 603 respectively calculate the absolute value of the variable x and judge its sign; the linear fitting units calculate intermediate results of the SiLU function by linear fitting; the index compensation circuit 606 performs data compensation on the input of the lookup table 607 to improve calculation accuracy, as described in the SiLU function implementation principle above; the lookup table 607 stores intermediate results of the SiLU function and may be implemented with a register array, ROM, or RAM on-chip storage resources; the arithmetic logic unit 609 selects, via the selector 608, the corresponding intermediate result according to the interval to which the absolute value of the variable x belongs, and then calculates and outputs the final result through arithmetic logic according to the sign of the variable x.
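The data flow of the unit in fig. 6 can be modeled behaviorally as follows. The segmentation values and fitting slopes are hypothetical placeholders, and the lookup table path is idealized as an exact sigmoid rather than a finite ROM:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical segmentation values a1 > a2 > a3 and fitting slopes.
A1, A2, A3 = 8.0, 4.0, 2.0
K1, K2 = 0.997, 0.953

def silu_unit(x):
    """Behavioral sketch of SiLU function calculation unit 600 (fig. 6)."""
    mag = abs(x)                 # absolute value calculation unit 602
    negative = x < 0.0           # positive and negative judgment unit 603
    if mag >= A1:                # interval judgment unit 604 + selector 608
        inter = mag              # linear fitting unit 0: f(x) = x
    elif mag >= A2:
        inter = K1 * mag         # linear fitting unit 1
    elif mag >= A3:
        inter = K2 * mag         # linear fitting unit 2
    else:
        inter = mag * sigmoid(mag)   # lookup table 607, idealized here
    # arithmetic logic unit 609: fold the sign back in, f(x) = f(|x|) - |x|
    return inter - mag if negative else inter
```

Only the non-negative magnitude ever reaches the fitting units and the table, which is what lets a single set of intermediate-result generators serve the whole real line.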
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computing device 700 according to an embodiment of the present invention, where the computing device 700 may include: a processor 701, a memory 702, a communication interface 703 and a communication bus 704. The processor 701, the memory 702 and the communication interface 703 all perform communication with each other via a communication bus 704.
In an embodiment of the present invention, the processor 701 may be a central processing unit (Central Processing Unit, CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.
The processor 701 may call a program stored in the memory 702; in particular, the processor 701 may perform the operations of the hardware implementation method of the SiLU activation function described above.
The memory 702 is used for storing one or more programs; the programs may include program code comprising computer operation instructions. In the embodiment of the present invention, the memory 702 stores at least programs for implementing the following functions:
acquiring target data to be processed; the target data is input data of an activation layer when the target neural network processes the original data;
when the target data belong to a first interval, determining that a first target fitting algorithm is a first direct proportion function; the first interval is an interval larger than or equal to a preset first segmentation value;
when the target data belong to the second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value;
when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table lookup algorithm; wherein the third interval is an interval which is greater than or equal to 0 and smaller than the second segmentation value;
when the target data belong to the fourth interval, determining that the first target fitting algorithm is an absolute valuebased fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and calculating to obtain the SiLU function value corresponding to the target data by adopting a first target fitting algorithm.
In one possible implementation, the memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, and at least one application program required for functionality, etc.; the storage data area may store data created during use.
In addition, the memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
The communication interface 703 may be an interface of a communication module for connecting with other devices or systems.
Of course, it should be noted that the configuration shown in fig. 7 does not limit the computing device 700 in the embodiment of the present invention; in practical applications, the computing device 700 may include more or fewer components than those shown in fig. 7, or may combine some components.
The embodiment of the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the hardware implementation method of the SiLU activation function in the foregoing method embodiment. The computer-readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardwarebased systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (10)
1. A hardware implementation method of a SiLU activation function, applied to an activation layer of a target neural network, wherein the activation layer is a SiLU function; the hardware implementation method of the SiLU activation function comprises the following steps:
acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
when the target data belong to a first interval, determining a first target fitting algorithm as a first direct proportion function; the first interval is an interval larger than or equal to a preset first segmentation value;
when the target data belong to a second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value;
when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table lookup algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value;
when the target data belong to a fourth interval, determining that a first target fitting algorithm is an absolute valuebased fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and calculating the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
2. The hardware implementation method of the SiLU activation function according to claim 1, wherein the third interval is a non-negative interval in which the sigmoid function changes rapidly; the uniform compensation table lookup algorithm searches a preset storage table for the corresponding sigmoid function value through address compensation to calculate the SiLU function value; and the preset storage table stores, under an address index, a preset number of sigmoid function values corresponding to the third interval.
3. The hardware implementation method of the SiLU activation function according to claim 1, wherein the first direct proportion function is:
f(x) = x;
the second interval comprises a plurality of subintervals, each corresponding to a different second direct proportion function.
4. The hardware implementation method of the SiLU activation function according to claim 3, wherein the second interval includes two subintervals obtained by splitting at a preset third segmentation value, and the second direct proportion function includes:
f(x) = k₁·x, x ∈ [a₂, a₁); f(x) = k₂·x, x ∈ [a₃, a₂);
wherein k₁ and k₂ are preset parameter values, a₁ is the first segmentation value, a₂ is the third segmentation value, and a₃ is the second segmentation value.
5. The hardware implementation method of the SiLU activation function according to claim 2, wherein the preset storage table uses a binary address index with a preset number of bits; bit 0 of the address index is a compensation bit, and the other bits of the address index are valid bits; and the calculating, by using the first target fitting algorithm, the SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is a uniform compensation table lookup algorithm, converting the target data into a binary initial index address value;
correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value;
searching in the preset storage table according to the target index address value to obtain a corresponding sigmoid function value;
and calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
6. The hardware implementation method of the SiLU activation function according to claim 5, wherein the calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value includes:
substituting the corresponding sigmoid function value into the following formula to calculate the SiLU function value corresponding to the target data:
f(x) = x·s(x), x ∈ [0, a₃);
wherein s(x) is the corresponding sigmoid function value.
7. The hardware implementation method of the SiLU activation function according to claim 1, wherein the calculating, by using the first target fitting algorithm, the SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is an absolute valuebased fitting algorithm, calculating to obtain an absolute value of the target data;
determining a second target fitting algorithm based on the interval to which the absolute value of the target data belongs; wherein the second target fitting algorithm comprises the first direct scaling function, the second direct scaling function or the uniform compensation lookup table algorithm;
calculating a SiLU function value corresponding to the absolute value of the target data by adopting the second target fitting algorithm;
and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
8. The hardware implementation method of the SiLU activation function according to claim 7, wherein the calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data includes:
substituting the SiLU function value corresponding to the absolute value of the target data into the following formula to calculate the SiLU function value corresponding to the target data:
f(x) = f(|x|) − |x|, x ∈ (−∞, 0);
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
9. A hardware implementation device of a SiLU activation function, applied to an activation layer of a target neural network, wherein the activation layer is a SiLU function; the hardware implementation device of the SiLU activation function comprises:
the data acquisition module is used for acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
the algorithm determining module is used for determining that a first target fitting algorithm is a first direct proportion function when the target data belong to a first interval; the first interval is an interval larger than or equal to a preset first segmentation value; when the target data belong to a second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value; when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table lookup algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value; when the target data belong to a fourth interval, determining that a first target fitting algorithm is an absolute valuebased fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and the fitting calculation module is used for calculating and obtaining the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
10. A computing device comprising a memory and a processor, the memory having stored therein a computer program executable on the processor, wherein the processor, when executing the computer program, implements the hardware implementation method of the SiLU activation function of any one of claims 1 to 8.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN202310166986.9A CN116432711B (en)  20230213  20230213  Hardware implementation method and device of SiLU activation function and computing equipment 
Publications (2)
Publication Number  Publication Date 

CN116432711A true CN116432711A (en)  20230714 
CN116432711B CN116432711B (en)  20231205 
Family
ID=87084423
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN202310166986.9A Active CN116432711B (en)  20230213  20230213  Hardware implementation method and device of SiLU activation function and computing equipment 
Country Status (1)
Country  Link 

CN (1)  CN116432711B (en) 
Citations (16)
Publication number  Priority date  Publication date  Assignee  Title 

CN108875915A (en) *  20180612  20181123  辽宁工程技术大学  A kind of depth confrontation network optimized approach of Embedded application 
CN110610235A (en) *  20190822  20191224  北京时代民芯科技有限公司  Neural network activation function calculation circuit 
CN110659015A (en) *  20180629  20200107  英特尔公司  Deep neural network architecture using piecewise linear approximation 
CN110688088A (en) *  20190930  20200114  南京大学  General nonlinear activation function computing device and method for neural network 
CN111581593A (en) *  20200421  20200825  天津大学  Configurable reuse sectional type lookup table activation function implementation device 
CN111680782A (en) *  20200520  20200918  河海大学常州校区  FPGAbased RBF neural network activation function implementation method 
US20210133568A1 (en) *  20191101  20210506  Applied Brain Research Inc.  Methods and systems for training multibit spiking neural networks for efficient implementation on digital hardware 
CN113780545A (en) *  20211112  20211210  南京风兴科技有限公司  General fitting method and device for neural network activation function 
US20210397596A1 (en) *  20200619  20211223  Apple Inc.  Lookup table activation functions for neural networks 
US20210406645A1 (en) *  20200629  20211230  Aselsan Elektronik San. Ve Tic. A. S.  Method for Low Resource and Low Power Consuming Implementation of Nonlinear Activation Functions of Artificial Neural Networks 
CN114119338A (en) *  20200826  20220301  英特尔公司  tanh and sigmoid function execution 
CN114330656A (en) *  20211224  20220412  杭州菲数科技有限公司  Convolution operation hardware accelerator and data processing method 
CN114519419A (en) *  20220217  20220520  深圳鲲云信息科技有限公司  Method, structure, computer equipment and medium for realizing neural network activation function 
CN115526320A (en) *  20220916  20221227  南京地平线集成电路有限公司  Neural network model inference acceleration method, apparatus, electronic device and medium 
WO2023003246A1 (en) *  20210719  20230126  주식회사 사피온코리아  Function approximation device and method using multilevel lookup table 
CN115668224A (en) *  20200629  20230131  美光科技公司  Neuromorphic operation using posit 

NonPatent Citations (5)
Title 

X. FENG等: "A HighPrecision Flexible SymmetryAware Architecture for ElementWise Activation Functions", 《2021 INTERNATIONAL CONFERENCE ON FIELDPROGRAMMABLE TECHNOLOGY (ICFPT)》, pages 1  4 * 
YVINEC E等: "Powerquant: Automorphism search for nonuniform quantization", 《ARXIV:2301.09858V1 》, pages 1  22 * 
LIU Yuxuan: "Research on High-Performance Elliptic Curve Cryptography Acceleration Technology Based on FPGA", China Masters' Theses Full-text Database, Information Science and Technology, no. 2, pages 135-385 *
MI Shuo et al.: "Performance of the Swish Activation Function on Small and Medium-Scale Datasets", Technology Innovation and Application, no. 1, pages 4-5 *
XIAO Jian: "Research and Design of a Hardware Accelerator for Activation Functions Based on FPGA", China Masters' Theses Full-text Database, Information Science and Technology, no. 2, pages 135-408 *
Also Published As
Publication number  Publication date 

CN116432711B (en)  20231205 
Similar Documents
Publication  Publication Date  Title 

US20190164043A1 (en)  Lowpower hardware acceleration method and system for convolution neural network computation  
Kim et al.  Zerocentered fixedpoint quantization with iterative retraining for deep convolutional neural networkbased object detectors  
CN111507993A (en)  Image segmentation method and device based on generation countermeasure network and storage medium  
CN112085191A (en)  Neural network quantitative parameter determination method and related product  
CN111581593B (en)  Device for realizing configurable and reusable sectional lookup table activation function  
Nazari et al.  Totnet: An endeavor toward optimizing ternary neural networks  
CN110598673A (en)  Remote sensing image road extraction method based on residual error network  
CN110688088A (en)  General nonlinear activation function computing device and method for neural network  
CN111240746B (en)  Floating point data inverse quantization and quantization method and equipment  
CN112051980B (en)  Nonlinear activation function computing device based on Newton iteration method  
CN113741858A (en)  Inmemory multiplyadd calculation method, device, chip and calculation equipment  
CN110704424B (en)  Sorting method and device applied to database and related equipment  
CN116432711B (en)  Hardware implementation method and device of SiLU activation function and computing equipment  
CN112734023B (en)  Reconfigurable circuit applied to activation function of cyclic neural network  
Guan et al.  NCDCN: multifocus image fusion via nest connection and dilated convolution network  
CN113222209A (en)  Regional tail gas migration prediction method and system based on domain adaptation and storage medium  
CN110837885B (en)  Sigmoid function fitting method based on probability distribution  
CN110955405B (en)  Input data processing and index value acquisition method and device and electronic equipment  
Liu et al.  A robust regression based on weighted LSSVM and penalized trimmed squares  
CN113743593B (en)  Neural network quantization method, system, storage medium and terminal  
CN115526131A (en)  Method and device for approximately calculating Tanh function by multilevel coding  
CN111930670B (en)  Heterogeneous intelligent processing quantization device, quantization method, electronic device and storage medium  
CN114492631A (en)  Spatial attention calculation method based on channel attention  
CN114722902A (en)  Unmarked video Hash retrieval method and device based on selfsupervision learning  
CN114239949A (en)  Website access amount prediction method and system based on twostage attention mechanism 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant 