CN116432711A - Hardware implementation method and device of SiLU activation function and computing equipment

Info

Publication number
CN116432711A
Authority
CN
China
Prior art keywords
function
interval
value
target
silu
Prior art date
Legal status
Granted
Application number
CN202310166986.9A
Other languages
Chinese (zh)
Other versions
CN116432711B (en)
Inventor
刘玉宣
丁昊杰
王慧渊
Current Assignee
Hangzhou Flyslice Technologies Co ltd
Original Assignee
Hangzhou Flyslice Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Flyslice Technologies Co ltd filed Critical Hangzhou Flyslice Technologies Co ltd
Priority to CN202310166986.9A priority Critical patent/CN116432711B/en
Publication of CN116432711A publication Critical patent/CN116432711A/en
Application granted granted Critical
Publication of CN116432711B publication Critical patent/CN116432711B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a hardware implementation method and device of a SiLU activation function, and a computing device, relating to the technical field of deep learning.

Description

Hardware implementation method and device of SiLU activation function and computing equipment
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, an apparatus, and a computing device for implementing a SiLU activation function by hardware.
Background
Supported by hardware computing power and massive data, deep neural networks have developed rapidly and, by virtue of their excellent feature extraction capability, are widely applied in fields such as image recognition, industrial fault detection and unmanned driving. The activation function is an important component of the neural network; its role is to improve the robustness and nonlinear expression capability of the neural network model.
The SiLU function is a common activation function of artificial neural networks and has characteristics such as avoiding overfitting, an excellent regularization effect and ease of training. However, when the SiLU function is implemented on hardware, not only is a large amount of logic resources consumed, but computation is also slow.
Disclosure of Invention
The invention aims to provide a hardware implementation method and device of a SiLU activation function, and a computing device, so as to reduce hardware consumption and computation time of SiLU function calculation while ensuring the accuracy of the SiLU function.
In a first aspect, an embodiment of the present invention provides a hardware implementation method of a SiLU activation function, which is applied to an activation layer of a target neural network, where the activation layer is a SiLU function; the hardware implementation method of the SiLU activation function comprises the following steps:
acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
when the target data belong to a first interval, determining a first target fitting algorithm as a first direct proportion function; the first interval is an interval larger than or equal to a preset first segmentation value;
when the target data belong to a second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value;
when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table look-up algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value;
when the target data belong to a fourth interval, determining that a first target fitting algorithm is an absolute value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and calculating the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
Further, the third interval is a non-negative interval with a rapid sigmoid function change, the uniform compensation table look-up algorithm refers to searching corresponding sigmoid function values in a preset storage table through address compensation to calculate the SiLU function values, and the preset storage table stores a preset number of sigmoid function values corresponding to the third interval through address indexes.
Further, the first direct proportion function is:
f(x)=x;
the second interval comprises a plurality of subintervals, each subinterval corresponding to a different second direct proportional function.
Further, the second interval includes two subintervals obtained by splitting a preset third segmentation value, and the second direct proportion function includes:
f(x) = x - x/k1, x ∈ [a3, a2); f(x) = x - x/k2, x ∈ [a2, a1);
wherein k1 and k2 are preset parameter values, a1 is the first segmentation value, a2 is the third segmentation value, and a3 is the second segmentation value.
Further, the preset storage table uses binary address indexes with a preset number of bits, the 0th bit of the address index is a compensation bit, and the other bits of the address index are valid bits; the calculating, by using the first target fitting algorithm, a SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is a uniform compensation table look-up algorithm, converting the target data into a binary initial index address value;
correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value;
searching in the preset storage table according to the target index address value to obtain a corresponding sigmoid function value;
and calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
Further, the calculating, according to the corresponding sigmoid function value, a SiLU function value corresponding to the target data includes:
substituting the corresponding sigmoid function value into the following formula to calculate the SiLU function value corresponding to the target data:
f(x) = x·s(x), x ∈ [0, a3);
wherein s(x) is the corresponding sigmoid function value.
Further, the calculating, by using the first target fitting algorithm, a SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is an absolute value-based fitting algorithm, calculating to obtain an absolute value of the target data;
determining a second target fitting algorithm based on the interval to which the absolute value of the target data belongs; wherein the second target fitting algorithm comprises the first direct proportion function, the second direct proportion function or the uniform compensation look-up table algorithm;
calculating a SiLU function value corresponding to the absolute value of the target data by adopting the second target fitting algorithm;
and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
Further, the calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data includes:
substituting the SiLU function value corresponding to the absolute value of the target data into the following formula to calculate the SiLU function value corresponding to the target data:
f(x) = f(|x|) - |x|, x ∈ (-∞, 0);
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
In a second aspect, the embodiment of the present invention further provides a hardware implementation device of a SiLU activation function, which is applied to an activation layer of a target neural network, where the activation layer is a SiLU function; the hardware implementation device of the SiLU activation function comprises:
the data acquisition module is used for acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
the algorithm determining module is used for determining that a first target fitting algorithm is a first direct proportion function when the target data belong to a first interval; the first interval is an interval larger than or equal to a preset first segmentation value; when the target data belong to a second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value; when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table look-up algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value; when the target data belong to a fourth interval, determining that a first target fitting algorithm is an absolute value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and the fitting calculation module is used for calculating and obtaining the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
In a third aspect, an embodiment of the present invention further provides a computing device, including a memory, and a processor, where the memory stores a computer program that can run on the processor, and the processor implements a hardware implementation method of the SiLU activation function of the first aspect when the processor executes the computer program.
According to the hardware implementation method, device and computing equipment of the SiLU activation function provided by the embodiments of the present invention, when the SiLU activation function is computed on target data, a piecewise-fitting simplification of the function is adopted, including partial approximation and the additional compensation of a uniform compensation table look-up algorithm, so that hardware consumption and computation time of SiLU function calculation are reduced while the accuracy of the SiLU function is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a functional image of a SiLU function;
FIG. 2 is a functional image of a sigmoid function;
fig. 3 is a flow chart of a method for implementing a hardware of a SiLU activation function according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an address index according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a hardware implementation device of a SiLU activation function according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a SiLU function calculation unit according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The activation function is an important component of the neural network; it typically follows the convolutional layer, and its output serves as the input of the subsequent pooling layer. The SiLU function is a common activation function, and its expression is as follows:
f(x) = x / (1 + e^(-x))  (1)
the functional image is shown in fig. 1.
The SiLU function is a nonlinear function involving not only an exponent operation but also a division operation; therefore, when the SiLU function is implemented on hardware, not only is a large amount of logic resources consumed, but calculation is also slow.
Based on this, the hardware implementation method, device and computing equipment of the SiLU activation function provided by the embodiments of the present invention reduce the cost of hardware resources by means of piecewise fitting (i.e. simplifying the function on different intervals through piecewise fitting), and ensure the accuracy of the SiLU function by means of partial approximation and additional compensation (i.e. fitting with a uniform compensation look-up table on the interval where linear fitting is infeasible).
As shown in equation (2), the SiLU function can be divided according to its definition into the two parts x and 1/(1 + e^(-x)), wherein the latter part is the sigmoid function s(x):
f(x) = x·s(x), s(x) = 1/(1 + e^(-x))  (2)
The functional image of the sigmoid function, shown in fig. 2, is centrosymmetric about the point (0, 0.5), i.e. when x < 0, equation (3) holds:
s(x) = 1 - s(-x)  (3)
By utilizing this property, the SiLU function can be divided into the two segments (-∞, 0) and [0, +∞), as shown in equation (4):
f(x) = x·s(x), x ∈ [0, +∞); f(x) = f(|x|) - |x|, x ∈ (-∞, 0)  (4)
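The symmetry property of equation (3) and the two-segment split it implies in equation (4) can be checked numerically. A minimal sketch (the function names `sigmoid` and `silu` are illustrative, not from the patent):

```python
import math

def sigmoid(x):
    # s(x) = 1 / (1 + e^(-x)), the latter part of the SiLU function
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    # f(x) = x * s(x), the SiLU function
    return x * sigmoid(x)

# Central symmetry about (0, 0.5): s(x) = 1 - s(-x), equation (3)
for x in (0.5, 1.7, 4.2):
    assert abs(sigmoid(-x) - (1.0 - sigmoid(x))) < 1e-12

# Hence for x < 0 the SiLU value follows from the non-negative segment:
# f(x) = f(|x|) - |x|, the second branch of equation (4)
for x in (-0.5, -1.7, -4.2):
    assert abs(silu(x) - (silu(abs(x)) - abs(x))) < 1e-12
```

This is why the hardware only needs fitting machinery for the non-negative half-axis.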
In hardware, the division operation and the exponent operation are comparatively resource-consuming, so the embodiment of the present invention only performs approximation processing on the 1/(1 + e^(-x)) part of the SiLU function.
For the convenience of understanding the present embodiment, a detailed description will be first given of a hardware implementation method of a SiLU activation function disclosed in the present embodiment.
Embodiments of the present invention provide a method for implementing a SiLU activation function in hardware, which may be performed by a computing device having data processing capabilities, where the hardware of the computing device may employ an FPGA (Field Programmable Gate Array). The method is applied to an activation layer of the target neural network, wherein the activation layer is a SiLU function. Referring to fig. 3, a flow chart of a hardware implementation method of a SiLU activation function is shown, and the method mainly includes steps S302 to S306 as follows:
step S302, target data to be processed is acquired.
The target data is input data of an activation layer when the target neural network processes the original data. The target neural network is a convolutional neural network that realizes a specific function such as image recognition, industrial fault detection, unmanned driving detection, or classification, and thus, raw data is input data corresponding to the specific function, for example, raw data may be an image corresponding to the image recognition function, or industrial data corresponding to the industrial fault detection function, or the like. The target neural network may include a plurality of network layers, such as a convolution layer, an activation layer, a pooling layer, a full connection layer, etc., connected in a certain order, and typically, the upper layer of the activation layer is the convolution layer, and the lower layer of the activation layer is the pooling layer.
Step S304, determining a first target fitting algorithm according to the interval to which the target data belongs.
In this embodiment, according to the variation of the sigmoid function, [0, +∞) is divided into three segments [0, a3), [a3, a1) and [a1, +∞), corresponding to the third interval, the second interval and the first interval respectively. The sigmoid function varies slowly over the second interval and the first interval, so a direct proportion function may be used; the sigmoid function changes faster over the third interval, and fitting it with a single monotonic function would still yield a large error, so a uniform compensation table look-up algorithm can be adopted. Further, as can be seen from equation (4), the negative interval (-∞, 0) may employ an absolute-value-based fitting algorithm. Based on this, the above step S304 may include:
when the target data belong to the first interval, determining that the first target fitting algorithm is a first direct proportion function; wherein the first interval is the interval greater than or equal to a preset first segmentation value, i.e. [a1, +∞), where a1 is the first segmentation value;
when the target data belong to the second interval, determining that the first target fitting algorithm is a second direct proportion function; wherein the second interval is the interval greater than or equal to a preset second segmentation value and less than the first segmentation value, i.e. [a3, a1), where a3 is the second segmentation value;
when the target data belong to the third interval, determining that the first target fitting algorithm is a uniform compensation table look-up algorithm; wherein the third interval is the interval greater than or equal to 0 and less than the second segmentation value, i.e. [0, a3);
when the target data belong to the fourth interval, determining that the first target fitting algorithm is an absolute-value-based fitting algorithm; wherein the fourth interval is the interval less than 0, i.e. (-∞, 0).
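The interval dispatch of step S304 can be sketched as follows; a3 = 8 follows the worked example later in this description, while the concrete value of a1 is an illustrative assumption (the patent leaves it as a preset parameter):

```python
def select_fitting_algorithm(x, a1=16.0, a3=8.0):
    """Pick the first target fitting algorithm for input x (step S304).

    a3 = 8 follows the worked example; a1 = 16 is an illustrative
    assumption for the first segmentation value."""
    if x >= a1:
        return "first proportional"          # first interval [a1, +inf)
    if x >= a3:
        return "second proportional"         # second interval [a3, a1)
    if x >= 0.0:
        return "uniform compensation LUT"    # third interval [0, a3)
    return "absolute-value fitting"          # fourth interval (-inf, 0)

assert select_fitting_algorithm(20.0) == "first proportional"
assert select_fitting_algorithm(10.0) == "second proportional"
assert select_fitting_algorithm(0.1) == "uniform compensation LUT"
assert select_fitting_algorithm(-3.0) == "absolute-value fitting"
```

In hardware this dispatch is a pair of comparators on the input word rather than Python branches.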
Further, considering that when x ∈ [a1, +∞) the sigmoid function value tends to 1, the first direct proportion function may be:
f(x) = x  (5)
Considering that when x ∈ [a3, a1) the following equation (6) holds, the second direct proportion function may take the form of equation (7):
1/(1 + e^(-x)) ≈ 1 - 1/k  (6)
f(x) = x - x/k  (7)
wherein k is a preset parameter value.
Further, to improve the accuracy of the SiLU function, the second interval may be divided into a plurality of sub-intervals, each sub-interval corresponding to a different second direct proportional function.
In one possible implementation manner, the second interval includes two subintervals obtained by splitting a preset third segmentation value, and the second direct proportion function includes:
f(x) = x - x/k1, x ∈ [a3, a2); f(x) = x - x/k2, x ∈ [a2, a1)  (8)
wherein k1 and k2 are preset parameter values, a1 is the first segmentation value, a2 is the third segmentation value, and a3 is the second segmentation value.
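A sketch of the second direct proportion function f(x) = x - x/k under illustrative assumptions: the patent leaves a2, k1 and k2 as preset values, and choosing k as a power of two is a common hardware simplification (x - x/k then needs only a subtraction and a right shift), not something the patent mandates:

```python
import math

def second_proportional(x, a2=12.0, k1=4096, k2=65536):
    """Second direct proportion function over [a3, a1), split at a2.

    a2 = 12 and the power-of-two values k1, k2 are illustrative
    assumptions; with the second segmentation value a3 = 8 of the
    worked example, sigmoid(x) is already very close to 1 here, so
    k must be large for 1 - 1/k to match it."""
    k = k1 if x < a2 else k2
    return x - x / k

# The fit should stay close to the exact x * sigmoid(x) on [8, 16)
for x in (8.0, 10.0, 12.0, 15.9):
    exact = x / (1.0 + math.exp(-x))
    assert abs(second_proportional(x) - exact) < 0.01
```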
Further, the third interval is a non-negative interval in which the sigmoid function changes rapidly. The uniform compensation table look-up algorithm refers to searching the corresponding sigmoid function value in a preset storage table through address compensation to calculate the SiLU function value, and the preset storage table stores, via address indexes, a preset number of sigmoid function values corresponding to the third interval. The preset number may be set according to actual requirements and is not limited herein. For example, when the preset number is 1024, the third interval may be divided into 1024 parts, and a ROM (Read-Only Memory) may be used to store the 1024 values of 1/(1 + e^(-x)).
The preset storage table may use a binary address index with a preset number of bits, where the 0th bit of the address index is a compensation bit and the other bits of the address index are valid bits; the compensation bit is used for performing fractional compensation on the address index: when the compensation bit value is 0, the value of the valid bits is directly used as the address index value, and when the compensation bit value is 1, the value of the valid bits plus 1 is used as the address index value, thereby realizing the additional compensation of the address index. The preset number of bits corresponds to the preset number: the preset number of bits is equal to the number of valid bits plus 1, and the number of entries addressable by the valid bits is greater than or equal to the preset number. For example, when the preset number is 1024, the preset number of bits is 11, and the address index includes 10 valid bits and 1 compensation bit.
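The compensation bit amounts to rounding the fixed-point address to the nearest stored entry. A minimal sketch of the bit manipulation for the 11-bit case described above (the function name is illustrative):

```python
def apply_compensation(addr11):
    """Correct an 11-bit initial index address to a target index.

    Bit 0 is the compensation bit; bits 10..1 are the valid bits.
    A set compensation bit adds 1 to the valid bits, rounding the
    address to the nearest stored table entry."""
    comp = addr11 & 0x1       # compensation bit
    valid = addr11 >> 1       # 10 valid bits
    return valid + comp

# 12.8 in fixed point with one fractional bit is 0b11001 (i.e. 25):
assert apply_compensation(0b11001) == 13   # rounds up to entry 13
# 12.0 is 0b11000 (i.e. 24): the compensation bit is 0, no correction
assert apply_compensation(0b11000) == 12
```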
Step S306, a first target fitting algorithm is adopted, and SiLU function values corresponding to the target data are calculated.
Specifically, when the target data belongs to the first interval or the second interval, the target data is directly brought into the corresponding proportional function calculation, and the SiLU function value corresponding to the target data can be obtained.
When the target data belongs to the third interval, a uniform compensation table look-up algorithm is adopted, and the process of calculating the SiLU function value corresponding to the target data may be as follows: converting the target data into a binary initial index address value; correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value; searching in a preset storage table according to the target index address value to obtain a corresponding sigmoid function value; and calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
In some possible embodiments, the initial index address value d0 of x may be calculated by equation (9); when the value of the compensation bit in the initial index address value is 0, the value of the valid bits corresponding to the initial index address value is taken as the target index address value; when the value of the compensation bit in the initial index address value is 1, the value of the valid bits corresponding to the initial index address value plus 1 is taken as the target index address value; then, according to the target index address value, the sigmoid function value corresponding to x (namely the corresponding sigmoid function value) is found in the preset storage table; finally, the sigmoid function value corresponding to x is substituted into equation (10) to calculate the SiLU function value corresponding to x:
d0 = x·m/a3  (9)
f(x) = x·s(x), x ∈ [0, a3)  (10)
wherein m is the number of sigmoid function values stored in the preset storage table (i.e. the preset number), a3 is the second segmentation value, and s(x) is the sigmoid function value corresponding to x.
In one possible implementation, a3 = 8 and m = 1024, i.e. the interval [0, 8) is divided into 1024 parts, and 1024 sigmoid function values are stored using a ROM (Read-Only Memory). For the ROM, an 11-bit address index is used, as shown in fig. 4, in which bits 10 to 1 are the valid bits of the address index and bit 0 is the compensation bit of the address index. When calculating the address index value of the ROM, if the value of the compensation bit is 0, the value of the 10 valid bits is directly used as the index address value of the ROM; if the value of the compensation bit is 1, the value of the 10 valid bits plus 1 is used as the index address value of the ROM. The sigmoid function value corresponding to x is then found in the ROM according to the address index value, and finally equation (10) is used to calculate the SiLU function value corresponding to x. For example, when x = 0.1, the initial index address value is the binary number corresponding to 12.8, and the compensation bit is 1 at this time, so the target index address value is the binary number corresponding to 13; the sigmoid function value corresponding to 0.1 is found in the ROM according to the binary number corresponding to 13, and then the SiLU function value corresponding to 0.1 is calculated using equation (10).
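This worked example (a3 = 8, m = 1024, x = 0.1) can be sketched end to end as follows; storing one extra ROM entry (index m) for an address that rounds up to m is an implementation assumption, not stated in the patent:

```python
import math

A3 = 8.0   # second segmentation value, as in the example
M = 1024   # number of stored sigmoid values

# ROM contents: sigmoid sampled at i * A3 / M; the extra (M+1)-th
# entry covers an index rounded up to M (an implementation assumption).
ROM = [1.0 / (1.0 + math.exp(-i * A3 / M)) for i in range(M + 1)]

def silu_lut(x):
    """SiLU on [0, A3) via the uniform compensation table look-up."""
    d0 = int(x * M / A3 * 2)     # equation (9) with one extra bit
    comp = d0 & 0x1              # bit 0: compensation bit
    idx = (d0 >> 1) + comp       # valid bits corrected by comp
    return x * ROM[idx]          # equation (10): f(x) = x * s(x)

# x = 0.1: the initial address corresponds to 12.8, the compensation
# bit is 1, so ROM entry 13 is used
d0 = int(0.1 * M / A3 * 2)
assert (d0 >> 1) + (d0 & 0x1) == 13
assert abs(silu_lut(0.1) - 0.1 / (1.0 + math.exp(-0.1))) < 1e-3
```

With 1024 entries over [0, 8) and nearest-entry rounding, the table spacing is 1/128, which keeps the sigmoid quantization error small.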
When the target data belongs to the fourth interval, the process of calculating the SiLU function value corresponding to the target data by adopting the absolute-value-based fitting algorithm may be as follows: calculating the absolute value of the target data; determining a second target fitting algorithm based on the interval to which the absolute value of the target data belongs, wherein the second target fitting algorithm comprises the first direct proportion function, the second direct proportion function or the uniform compensation table look-up algorithm; calculating the SiLU function value corresponding to the absolute value of the target data by adopting the second target fitting algorithm; and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
In some possible embodiments, the SiLU function value corresponding to the absolute value of the target data may be substituted into the following equation (11) to calculate the SiLU function value corresponding to the target data:
f(x) = f(|x|) - |x|, x ∈ (-∞, 0)  (11)
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
In order to facilitate understanding and implementation, the embodiment of the present invention further provides a comprehensive calculation formula of a hardware implementation method of a SiLU activation function, which is as follows:
f(x) = x, x ∈ [a1, +∞); f(x) = x - x/k2, x ∈ [a2, a1); f(x) = x - x/k1, x ∈ [a3, a2); f(x) = x·s(x), x ∈ [0, a3); f(x) = f(|x|) - |x|, x ∈ (-∞, 0)  (12)
wherein s(x) is the corresponding sigmoid function value found in the preset storage table through address compensation.
Performing the SiLU function calculation through equation (12) yields a small error and high accuracy for the SiLU function.
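A sketch of the comprehensive formula of equation (12): a3 = 8 and m = 1024 follow the worked example, while the concrete values of a1, a2, k1 and k2 are illustrative assumptions (the patent treats them as preset values). The final check compares the piecewise approximation against the exact x·sigmoid(x):

```python
import math

# a3 = 8 and m = 1024 follow the worked example; a1, a2, k1, k2 are
# illustrative assumptions (the patent treats them as preset values).
A1, A2, A3 = 16.0, 12.0, 8.0
K1, K2 = 4096, 65536
M = 1024
ROM = [1.0 / (1.0 + math.exp(-i * A3 / M)) for i in range(M + 1)]

def silu_fit(x):
    """Piecewise SiLU approximation following equation (12)."""
    if x < 0.0:
        return silu_fit(abs(x)) - abs(x)   # fourth interval, eq. (11)
    if x >= A1:
        return x                           # first interval
    if x >= A2:
        return x - x / K2                  # upper part of second interval
    if x >= A3:
        return x - x / K1                  # lower part of second interval
    d0 = int(x * M / A3 * 2)               # third interval: LUT with
    idx = (d0 >> 1) + (d0 & 0x1)           # uniform compensation
    return x * ROM[idx]

# The approximation tracks the exact SiLU over a wide range
worst = max(abs(silu_fit(i / 10.0) - (i / 10.0) / (1.0 + math.exp(-i / 10.0)))
            for i in range(-200, 201))
assert worst < 0.02
```

Under these assumed parameters the only remaining arithmetic per input is a comparison, a table read, and a multiply (or a subtract and shift), which is the hardware saving the embodiment targets.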
According to the hardware implementation method of the SiLU activation function provided by this embodiment, the SiLU activation function is split; the complex exponential and division operations are simplified into linear operations over different value intervals by piecewise fitting, and a uniform compensation table look-up method is used for fitting over the interval where linear fitting is infeasible. The accuracy of the convolutional neural network algorithm can thus be guaranteed at low hardware complexity, solving the hardware implementation problem of a complex activation function, reducing the consumption of hardware resources and saving computation time while ensuring accuracy.
The embodiment of the invention also provides a hardware implementation device of the SiLU activation function, which is applied to an activation layer of a target neural network, wherein the activation layer is the SiLU function. Referring to fig. 5, a schematic structural diagram of a hardware implementation device of a SiLU activation function is shown, where the device includes:
a data acquisition module 501, configured to acquire target data to be processed; the target data is input data of an activation layer when the target neural network processes the original data;
the algorithm determining module 502 is configured to determine that the first target fitting algorithm is a first direct proportion function when the target data belongs to the first interval; the first interval is an interval larger than or equal to a preset first segmentation value; when the target data belong to the second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value; when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table look-up algorithm; wherein the third interval is an interval which is greater than or equal to 0 and smaller than the second segmentation value; when the target data belong to the fourth interval, determining that the first target fitting algorithm is an absolute value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
the fitting calculation module 503 is configured to calculate the SiLU function value corresponding to the target data by using the first target fitting algorithm.
Further, the third interval is a non-negative interval in which the sigmoid function changes rapidly; the uniform compensation table look-up algorithm refers to searching the corresponding sigmoid function value in a preset storage table through address compensation to calculate the SiLU function value, and the preset storage table stores, via address indexes, a preset number of sigmoid function values corresponding to the third interval.
Further, the first direct proportion function is:
f(x)=x;
the second interval includes a plurality of subintervals, each subinterval corresponding to a different second direct proportional function.
Further, the second interval includes two subintervals obtained by splitting a preset third segmentation value, and the second direct proportion function includes:
f(x) = x - x/k1, x ∈ [a3, a2); f(x) = x - x/k2, x ∈ [a2, a1);
wherein k1 and k2 are preset parameter values, a1 is the first segmentation value, a2 is the third segmentation value, and a3 is the second segmentation value.
Further, the preset storage table uses binary address indexes with a preset number of bits, the 0th bit of the address index is a compensation bit, and the other bits of the address index are valid bits; the fitting calculation module 503 is specifically configured to:
when the first target fitting algorithm is a uniform compensation table look-up algorithm, converting target data into binary initial index address values;
correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value;
searching in a preset storage table according to the target index address value to obtain a corresponding sigmoid function value;
and calculating to obtain the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
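One plausible reading of the compensation-bit correction is round-to-nearest addressing: the valid bits are the index shifted right by one, incremented when the compensation bit is set. The sketch below encodes that interpretation, which is an assumption rather than a detail stated explicitly here.

```python
def compensated_index(raw_index: int) -> int:
    """Correct a raw binary index using bit 0 as the compensation bit.

    The valid bits (raw_index >> 1) are incremented when the compensation
    bit is 1, turning plain truncation into round-to-nearest addressing.
    This is one plausible reading of the address compensation above.
    """
    valid_bits = raw_index >> 1
    if raw_index & 1:          # compensation bit set: round up
        valid_bits += 1
    return valid_bits
```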
Further, the fitting calculation module 503 is further configured to:
the corresponding sigmoid function value is substituted into the following formula to calculate the SiLU function value corresponding to the target data:
f(x) = x·s(x), x ∈ [0, a₃);
where s(x) is the corresponding sigmoid function value.
Further, the fitting calculation module 503 is further configured to:
when the first target fitting algorithm is an absolute value-based fitting algorithm, calculating to obtain an absolute value of target data;
determining a second target fitting algorithm based on an interval to which the absolute value of the target data belongs; the second target fitting algorithm comprises a first direct proportion function, a second direct proportion function or a uniform compensation table look-up algorithm;
calculating a SiLU function value corresponding to the absolute value of the target data by adopting a second target fitting algorithm;
and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
Further, the fitting calculation module 503 is further configured to:
substituting the SiLU function value corresponding to the absolute value of the target data into the following formula, and calculating the SiLU function value corresponding to the target data:
f(x) = f(|x|) − |x|, x ∈ (−∞, 0);
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
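The identity used here follows from sigmoid(−x) = 1 − sigmoid(x): for x < 0, x·s(x) = |x|·s(|x|) − |x|, which lets the hardware fit only the non-negative half of the function. A quick numerical check:

```python
import math

def silu(x: float) -> float:
    """Reference SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def silu_from_abs(x: float) -> float:
    """Evaluate SiLU via the identity f(x) = f(|x|) - |x| for x < 0."""
    ax = abs(x)
    return silu(ax) - ax if x < 0 else silu(x)
```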
The implementation principle and technical effects of the hardware implementation device of the SiLU activation function provided in this embodiment are the same as those of the foregoing hardware implementation method embodiment of the SiLU activation function; for brevity, where this device embodiment is silent, reference may be made to the corresponding contents of the foregoing method embodiment.
The embodiment of the invention also provides a SiLU function calculation unit, which is a hardware device corresponding to the hardware implementation device of the SiLU activation function. As shown in fig. 6, the SiLU function calculation unit 600 provided by the embodiment of the invention includes an input register 601, an absolute value calculation unit 602, a positive and negative judgment unit 603, an interval judgment unit 604, a linear fitting unit 0, a linear fitting unit 1, a linear fitting unit 2, an index compensation circuit 606, a lookup table 607, a selector 608, and an arithmetic logic unit 609. The input register 601 stores the variable x input to the SiLU function; the absolute value calculation unit 602 and the positive and negative judgment unit 603 respectively calculate the absolute value of the variable x and judge its sign; the linear fitting units calculate intermediate results of the SiLU function by linear fitting; the index compensation circuit 606 performs data compensation on the input of the lookup table 607 to improve calculation accuracy, as described in the SiLU function implementation principle above; the lookup table 607 stores intermediate results of the SiLU function and may be implemented with on-chip storage resources such as a register array, ROM, or RAM; the arithmetic logic unit 609 selects the corresponding intermediate result according to the interval to which the absolute value of the variable x belongs and, according to the sign of x, computes and outputs the final result.
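Putting the pieces together, a behavioural model of such a computation unit might look as follows. All segmentation values, slopes, and the table size are illustrative assumptions; a real implementation would use fixed-point arithmetic and an on-chip lookup table rather than computing the sigmoid inline.

```python
import math

def silu_unit(x: float,
              a1: float = 8.0, a2: float = 6.0, a3: float = 4.0,
              k1: float = 0.998, k2: float = 0.99,
              entries: int = 256) -> float:
    """Behavioural sketch: absolute value, interval judgment, fit or
    compensated lookup, then the sign fix-up f(x) = f(|x|) - |x|."""
    ax = abs(x)
    if ax >= a1:                    # first interval: f(x) ~ x
        y = ax
    elif ax >= a2:                  # upper subinterval of the second interval
        y = k1 * ax
    elif ax >= a3:                  # lower subinterval of the second interval
        y = k2 * ax
    else:                           # third interval: round-to-nearest lookup
        step = a3 / entries
        idx = min(entries - 1, round(ax / step))
        s = 1.0 / (1.0 + math.exp(-(idx * step)))  # stored table entry
        y = ax * s
    return y if x >= 0 else y - ax  # sign fix-up for negative inputs
```

Under these assumed constants the sketch stays within a few hundredths of the exact SiLU over typical input ranges, illustrating why a coarse piecewise fit plus a compensated table can suffice for inference.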
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computing device 700 according to an embodiment of the present invention, where the computing device 700 may include: a processor 701, a memory 702, a communication interface 703 and a communication bus 704. The processor 701, the memory 702 and the communication interface 703 all perform communication with each other via a communication bus 704.
In an embodiment of the present invention, the processor 701 may be a central processing unit (Central Processing Unit, CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device, etc.
The processor 701 may call a program stored in the memory 702; in particular, the processor 701 may perform the operations of the hardware implementation method of the SiLU activation function described above.
The memory 702 is used for storing one or more programs, and the programs may include program codes, and the program codes include computer operation instructions, in the embodiment of the present invention, at least the programs for implementing the following functions are stored in the memory 702:
acquiring target data to be processed; the target data is input data of an activation layer when the target neural network processes the original data;
when the target data belong to a first interval, determining that a first target fitting algorithm is a first direct proportion function; the first interval is an interval larger than or equal to a preset first segmentation value;
when the target data belong to the second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value;
when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table look-up algorithm; wherein the third interval is an interval which is greater than or equal to 0 and smaller than the second segmentation value;
when the target data belong to the fourth interval, determining that the first target fitting algorithm is an absolute value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and calculating to obtain the SiLU function value corresponding to the target data by adopting a first target fitting algorithm.
In one possible implementation, the memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, and at least one application program required for functionality, etc.; the storage data area may store data created during use.
In addition, the memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device.
The communication interface 703 may be an interface of a communication module for connecting with other devices or systems.
Of course, it should be noted that the configuration shown in fig. 7 does not limit the computing device 700 in the embodiment of the present invention, and the computing device 700 may include more or less components than those shown in fig. 7 or may combine some components in practical applications.
The embodiment of the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the hardware implementation method of the SiLU activation function in the foregoing method embodiment. The computer-readable storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A hardware realization method of a SiLU activation function is characterized by being applied to an activation layer of a target neural network, wherein the activation layer is the SiLU function; the hardware implementation method of the SiLU activation function comprises the following steps:
acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
when the target data belong to a first interval, determining a first target fitting algorithm as a first direct proportion function; the first interval is an interval larger than or equal to a preset first segmentation value;
when the target data belong to a second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value;
when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table look-up algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value;
when the target data belong to a fourth interval, determining that a first target fitting algorithm is an absolute value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and calculating the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
2. The hardware implementation method of a SiLU activation function according to claim 1, wherein the third interval is a non-negative interval in which the sigmoid function changes rapidly, the uniform compensation table look-up algorithm refers to looking up a corresponding sigmoid function value in a preset storage table through address compensation to calculate a SiLU function value, and the preset storage table stores, via an address index, a preset number of sigmoid function values corresponding to the third interval.
3. The hardware implementation method of the SiLU activation function according to claim 1, wherein the first direct proportion function is:
f(x)=x;
the second interval comprises a plurality of subintervals, each subinterval corresponding to a different second direct proportion function.
4. The hardware implementation method of a SiLU activation function according to claim 3, wherein the second interval includes two subintervals obtained by splitting at a preset third segmentation value, and the second direct proportion function includes:
f(x) = k₁·x, x ∈ [a₂, a₁); f(x) = k₂·x, x ∈ [a₃, a₂);
wherein k₁ and k₂ are both preset parameter values, a₁ is the first segmentation value, a₂ is the third segmentation value, and a₃ is the second segmentation value.
5. The hardware implementation method of the SiLU activation function according to claim 2, wherein the preset storage table uses a binary address index with a preset number of bits, bit 0 of the address index is a compensation bit, and the remaining bits of the address index are valid bits; the calculating, by using the first target fitting algorithm, the SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is a uniform compensation table look-up algorithm, converting the target data into a binary initial index address value;
correcting the initial index address value according to the value of the compensation bit in the initial index address value to obtain a target index address value;
searching in the preset storage table according to the target index address value to obtain a corresponding sigmoid function value;
and calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value.
6. The hardware implementation method of the SiLU activation function according to claim 5, wherein the calculating the SiLU function value corresponding to the target data according to the corresponding sigmoid function value includes:
substituting the corresponding sigmoid function value into the following formula, and calculating the SiLU function value corresponding to the target data:
f(x) = x·s(x), x ∈ [0, a₃);
wherein s(x) is the corresponding sigmoid function value.
7. The hardware implementation method of the SiLU activation function according to claim 1, wherein the calculating, by using the first target fitting algorithm, the SiLU function value corresponding to the target data includes:
when the first target fitting algorithm is an absolute value-based fitting algorithm, calculating to obtain an absolute value of the target data;
determining a second target fitting algorithm based on the interval to which the absolute value of the target data belongs; wherein the second target fitting algorithm comprises the first direct proportion function, the second direct proportion function, or the uniform compensation table look-up algorithm;
calculating a SiLU function value corresponding to the absolute value of the target data by adopting the second target fitting algorithm;
and calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data.
8. The hardware implementation method of the SiLU activation function according to claim 7, wherein the calculating the SiLU function value corresponding to the target data according to the SiLU function value corresponding to the absolute value of the target data includes:
substituting the SiLU function value corresponding to the absolute value of the target data into the following formula, and calculating the SiLU function value corresponding to the target data:
f(x) = f(|x|) − |x|, x ∈ (−∞, 0);
wherein f(|x|) is the SiLU function value corresponding to |x|, and |x| is the absolute value of x.
9. The hardware realization device of the SiLU activation function is characterized by being applied to an activation layer of a target neural network, wherein the activation layer is the SiLU function; the hardware implementation device of the SiLU activation function comprises:
the data acquisition module is used for acquiring target data to be processed; the target data are input data of the activation layer when the target neural network processes the original data;
the algorithm determining module is used for determining that a first target fitting algorithm is a first direct proportion function when the target data belong to a first interval; the first interval is an interval larger than or equal to a preset first segmentation value; when the target data belong to a second interval, determining that the first target fitting algorithm is a second direct proportion function; the second interval is an interval which is larger than or equal to a preset second segmentation value and smaller than the first segmentation value; when the target data belong to a third interval, determining that the first target fitting algorithm is a uniform compensation table look-up algorithm; wherein the third interval is an interval greater than or equal to 0 and less than the second segment value; when the target data belong to a fourth interval, determining that a first target fitting algorithm is an absolute value-based fitting algorithm; wherein the fourth interval is an interval smaller than 0;
and the fitting calculation module is used for calculating and obtaining the SiLU function value corresponding to the target data by adopting the first target fitting algorithm.
10. A computing device comprising a memory, a processor, the memory having stored therein a computer program executable on the processor, wherein the processor, when executing the computer program, implements a hardware implementation of the SiLU activation function of any of claims 1-8.
CN202310166986.9A 2023-02-13 2023-02-13 Hardware implementation method and device of SiLU activation function and computing equipment Active CN116432711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310166986.9A CN116432711B (en) 2023-02-13 2023-02-13 Hardware implementation method and device of SiLU activation function and computing equipment

Publications (2)

Publication Number Publication Date
CN116432711A true CN116432711A (en) 2023-07-14
CN116432711B CN116432711B (en) 2023-12-05

Family

ID=87084423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310166986.9A Active CN116432711B (en) 2023-02-13 2023-02-13 Hardware implementation method and device of SiLU activation function and computing equipment

Country Status (1)

Country Link
CN (1) CN116432711B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875915A (en) * 2018-06-12 2018-11-23 辽宁工程技术大学 A deep adversarial network optimization method for embedded applications
CN110610235A (en) * 2019-08-22 2019-12-24 北京时代民芯科技有限公司 Neural network activation function calculation circuit
CN110659015A (en) * 2018-06-29 2020-01-07 英特尔公司 Deep neural network architecture using piecewise linear approximation
CN110688088A (en) * 2019-09-30 2020-01-14 南京大学 General nonlinear activation function computing device and method for neural network
CN111581593A (en) * 2020-04-21 2020-08-25 天津大学 Configurable reuse sectional type lookup table activation function implementation device
CN111680782A (en) * 2020-05-20 2020-09-18 河海大学常州校区 FPGA-based RBF neural network activation function implementation method
US20210133568A1 (en) * 2019-11-01 2021-05-06 Applied Brain Research Inc. Methods and systems for training multi-bit spiking neural networks for efficient implementation on digital hardware
CN113780545A (en) * 2021-11-12 2021-12-10 南京风兴科技有限公司 General fitting method and device for neural network activation function
US20210397596A1 (en) * 2020-06-19 2021-12-23 Apple Inc. Lookup table activation functions for neural networks
US20210406645A1 (en) * 2020-06-29 2021-12-30 Aselsan Elektronik San. Ve Tic. A. S. Method for Low Resource and Low Power Consuming Implementation of Nonlinear Activation Functions of Artificial Neural Networks
CN114119338A (en) * 2020-08-26 2022-03-01 英特尔公司 tanh and sigmoid function execution
CN114330656A (en) * 2021-12-24 2022-04-12 杭州菲数科技有限公司 Convolution operation hardware accelerator and data processing method
CN114519419A (en) * 2022-02-17 2022-05-20 深圳鲲云信息科技有限公司 Method, structure, computer equipment and medium for realizing neural network activation function
CN115526320A (en) * 2022-09-16 2022-12-27 南京地平线集成电路有限公司 Neural network model inference acceleration method, apparatus, electronic device and medium
WO2023003246A1 (en) * 2021-07-19 2023-01-26 주식회사 사피온코리아 Function approximation device and method using multi-level look-up table
CN115668224A (en) * 2020-06-29 2023-01-31 美光科技公司 Neuromorphic operation using posit


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
X. FENG et al.: "A High-Precision Flexible Symmetry-Aware Architecture for Element-Wise Activation Functions", 2021 International Conference on Field-Programmable Technology (ICFPT), pages 1-4
YVINEC E et al.: "PowerQuant: Automorphism Search for Non-Uniform Quantization", arXiv:2301.09858v1, pages 1-22
LIU Yuxuan: "Research on FPGA-based High-Performance Elliptic Curve Cryptography Acceleration Technology", China Master's Theses Full-text Database, Information Science and Technology, no. 2, pages 135-385
MI Shuo et al.: "Performance of the Swish Activation Function on Small and Medium-Scale Datasets", Technology Innovation and Application, no. 1, pages 4-5
XIAO Jian: "Research and Design of an FPGA-based Activation Function Hardware Accelerator", China Master's Theses Full-text Database, Information Science and Technology, no. 2, pages 135-408

Also Published As

Publication number Publication date
CN116432711B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
US20190164043A1 (en) Low-power hardware acceleration method and system for convolution neural network computation
Kim et al. Zero-centered fixed-point quantization with iterative retraining for deep convolutional neural network-based object detectors
CN111507993A (en) Image segmentation method and device based on generation countermeasure network and storage medium
CN112085191A (en) Neural network quantitative parameter determination method and related product
CN111581593B (en) Device for realizing configurable and reusable sectional lookup table activation function
Nazari et al. Tot-net: An endeavor toward optimizing ternary neural networks
CN110598673A (en) Remote sensing image road extraction method based on residual error network
CN110688088A (en) General nonlinear activation function computing device and method for neural network
CN111240746B (en) Floating point data inverse quantization and quantization method and equipment
CN112051980B (en) Non-linear activation function computing device based on Newton iteration method
CN113741858A (en) In-memory multiply-add calculation method, device, chip and calculation equipment
CN110704424B (en) Sorting method and device applied to database and related equipment
CN116432711B (en) Hardware implementation method and device of SiLU activation function and computing equipment
CN112734023B (en) Reconfigurable circuit applied to activation function of cyclic neural network
Guan et al. NCDCN: multi-focus image fusion via nest connection and dilated convolution network
CN113222209A (en) Regional tail gas migration prediction method and system based on domain adaptation and storage medium
CN110837885B (en) Sigmoid function fitting method based on probability distribution
CN110955405B (en) Input data processing and index value acquisition method and device and electronic equipment
Liu et al. A robust regression based on weighted LSSVM and penalized trimmed squares
CN113743593B (en) Neural network quantization method, system, storage medium and terminal
CN115526131A (en) Method and device for approximately calculating Tanh function by multi-level coding
CN111930670B (en) Heterogeneous intelligent processing quantization device, quantization method, electronic device and storage medium
CN114492631A (en) Spatial attention calculation method based on channel attention
CN114722902A (en) Unmarked video Hash retrieval method and device based on self-supervision learning
CN114239949A (en) Website access amount prediction method and system based on two-stage attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant