CN113780545A - General fitting method and device for neural network activation function

General fitting method and device for neural network activation function

Info

Publication number: CN113780545A
Authority
CN
China
Prior art keywords: fitting, function, parameter, activation, segmentation
Prior art date
Legal status: Pending
Application number: CN202111335738.XA
Other languages: Chinese (zh)
Inventor: 王丹阳 (Wang Danyang), 杨东天 (Yang Dongtian), 王中风 (Wang Zhongfeng), 林军 (Lin Jun), 刘阳 (Liu Yang)
Current Assignee: Nanjing Fengxing Technology Co ltd
Original Assignee: Nanjing Fengxing Technology Co ltd
Application filed by Nanjing Fengxing Technology Co ltd filed Critical Nanjing Fengxing Technology Co ltd
Priority to CN202111335738.XA priority Critical patent/CN113780545A/en
Publication of CN113780545A publication Critical patent/CN113780545A/en

Classifications

    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/045: Combinations of networks (under G06N3/04, Architecture, e.g. interconnection topology)
    • G06N3/048: Activation functions (under G06N3/04, Architecture, e.g. interconnection topology)


Abstract

The application discloses a general fitting method and device for neural network activation functions. The method includes loading an activation layer in a neural network; generating four segmentation parameters according to the preset activation function in the activation layer; obtaining a five-segment piecewise fitting function from the four segmentation parameters to generate a target fitting function; and resetting the activation layer according to the target fitting function. The method fits a complex activation function with a simple piecewise linear function, maintains the accuracy of the convolutional neural network algorithm at low hardware complexity, solves the problem of implementing complex activation functions in hardware, and improves the processing performance of the NPU.

Description

General fitting method and device for neural network activation function
Technical Field
The present application relates to the field of convolutional neural networks, and in particular, to a general fitting method and device for a neural network activation function.
Background
Activation functions in neural networks, such as SiReLu, Mish and Hard-Swish, improve the accuracy of a neural network and strengthen its learning capability through their nonlinear characteristics. However, because the computational formulas of these activation functions are complex, a conventional NPU (Neural-network Processing Unit) cannot directly support them in hardware.
Currently, when an NPU processes a neural network, it generally offloads any activation function it encounters to a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU), which implements the function in software.
However, this software approach imposes a significant data-transfer burden, and software processing is slow, which increases the waiting time of the NPU and reduces its effective processing performance.
Disclosure of Invention
In order to solve the problems that processing the activation functions of a convolutional neural network in software increases the waiting time of the NPU and reduces its effective processing performance, the application discloses a general fitting method and a general fitting device for neural network activation functions through the following aspects.
The application discloses, in a first aspect, a general fitting method for a neural network activation function, comprising:
loading an activation layer in a neural network, the activation layer being a preset activation function;
generating a first segmentation parameter, a second segmentation parameter, a third segmentation parameter and a fourth segmentation parameter according to the preset activation function;
when the input data is less than or equal to the first segmentation parameter, outputting a first-segment fitting function;
when the input data is greater than the first segmentation parameter and less than or equal to the second segmentation parameter, outputting a second-segment fitting function;
when the input data is greater than the second segmentation parameter and less than or equal to the third segmentation parameter, outputting a third-segment fitting function;
when the input data is greater than the third segmentation parameter and less than or equal to the fourth segmentation parameter, outputting a fourth-segment fitting function;
when the input data is greater than the fourth segmentation parameter, outputting a fifth-segment fitting function;
generating a target fitting function according to the first-, second-, third-, fourth- and fifth-segment fitting functions;
and resetting the activation layer according to the target fitting function.
Optionally, the first segmentation parameter is determined from the input value at the approximate zero point of the preset activation function;
the second segmentation parameter is determined from the input value at which the output of the preset activation function is minimal;
the third segmentation parameter is zero;
and the fourth segmentation parameter is determined from the input value at which the output of the preset activation function is maximal.
Optionally, the output of the first-segment fitting function is zero; the second-segment fitting function is a unary linear function; the third-segment fitting function is a direct proportional function; the output value of the fourth-segment fitting function is the input value; and the output of the fifth-segment fitting function is the fourth segmentation parameter.
Optionally, the target fitting function satisfies the following relationship:
$$
f(x)=\begin{cases}
0, & x \le x_1 \\
-K \cdot 2^{-n} \cdot x + b, & x_1 < x \le x_2 \\
K \cdot x, & x_2 < x \le 0 \\
x, & 0 < x \le x_3 \\
x_3, & x > x_3
\end{cases}
$$
where x1 is the first segmentation parameter, x2 is the second segmentation parameter, and x3 is the fourth segmentation parameter;
and n, K and b are respectively the first, second and third fitting parameters, determined as the values for which the squared error between the function expression of the target fitting function and the function expression of the preset activation function is minimal.
Optionally, the fourth segmentation parameter is the maximum value that a single datum in the neural network can represent.
A second aspect of the application discloses a general fitting device for a neural network activation function, applied with the general fitting method according to the first aspect, the general fitting device comprising:
a plurality of fitting units, wherein the plurality of fitting units operate in a pipelined manner;
any fitting unit comprises an activation function unit and a control unit, wherein:
the control unit is used for configuring the fitting parameters in the corresponding activation function unit according to the preset activation function and for temporarily storing the piecewise fitting result output by the previous fitting unit, the fitting parameters comprising a comparison object, a coefficient, a shift amount and a bias term;
the activation function unit is used for carrying out the piecewise fitting operation of the activation function on the received data, generating a piecewise fitting result and outputting the piecewise fitting result to the next fitting unit.
Optionally, any of the activation function units includes a selector, a processing unit, a judging unit and a data through path, wherein:
the processing unit is used for fitting the input data and outputting a fitting result;
the judging unit is used for storing the segmentation judging conditions corresponding to the processing unit;
the data through path is used for directly transmitting the input data to the selector;
the selector is used for selecting between the fitting result and the input data according to the segmentation judgment condition, generating a piecewise fitting result, and transmitting it to the next fitting unit.
Optionally, only one of the fitting units includes a multiplier, and the multiplier uses a 4-bit unsigned number as its coefficient.
Optionally, the control unit is a register.
Optionally, the general fitting device includes two fitting units; or the general fitting device includes four fitting units.
The application discloses a general fitting method and device for neural network activation functions. The method includes loading an activation layer in a neural network; generating four segmentation parameters according to the preset activation function in the activation layer; obtaining a five-segment piecewise fitting function from the four segmentation parameters to generate a target fitting function; and resetting the activation layer according to the target fitting function. The method fits a complex activation function with a simple piecewise linear function, maintains the accuracy of the convolutional neural network algorithm at low hardware complexity, solves the problem of implementing complex activation functions in hardware, and improves the processing performance of the NPU.
Drawings
FIG. 1 is a schematic flow chart illustrating a general fitting method for neural network activation functions according to an embodiment of the present disclosure;
FIG. 2 is a diagram illustrating fitting results of a general fitting method for neural network activation functions disclosed in an embodiment of the present application;
fig. 3 is a schematic diagram of a fitting result of a SiReLu function implemented by a general fitting method for a neural network activation function disclosed in an embodiment of the present application;
fig. 4 is a schematic diagram of a fitting result of the Mish function implemented by the general fitting method for a neural network activation function disclosed in an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a fitting result of a Hard-Swish function implemented by a general fitting method for a neural network activation function disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a generic fitting apparatus for neural network activation functions according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an internal structure of a fitting unit in a generic fitting apparatus for neural network activation functions disclosed in an embodiment of the present application;
fig. 8 is a schematic workflow diagram of a hardware fitting of a SiReLu function implemented by the general fitting apparatus for a neural network activation function disclosed in the embodiment of the present application.
Detailed Description
In order to solve the problems that processing an activation function in software increases the waiting time of the NPU and reduces its effective processing performance, the application discloses a general fitting method and a general fitting device for neural network activation functions through the following embodiments.
Referring to fig. 1, which is a schematic flow chart, the general fitting method for a neural network activation function disclosed in the first embodiment of the application includes the following steps.
Step 01: load an activation layer in the neural network.
The activation layer is a preset activation function.
Step 02: generate the first, second, third and fourth segmentation parameters according to the preset activation function.
The first segmentation parameter is determined from the input value at the approximate zero point of the preset activation function; the second segmentation parameter is determined from the input value at which the output of the preset activation function is minimal; the third segmentation parameter is zero; and the fourth segmentation parameter is determined from the input value at which the output of the preset activation function is maximal.
Further, the approximate zero point of the preset activation function refers to the input value at which the function's output magnitude falls below the resolution of the hardware's data representation. Because hardware represents data in fixed point, the range and resolution of the data are fixed. Taking an 8-bit fixed-point number in the (1, 4, 3) format (1 sign bit, 4 integer bits, 3 fractional bits) as an example, the resolution is 0.125. That is, a value less than 0.125 and greater than -0.125 cannot be represented exactly, and distinguishing it precisely is meaningless.
Further, for a (1, p, q) format, the data resolution is 2^(-q).
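To make this concrete, here is a minimal Python sketch (illustrative only, not part of the patent; `fxp_resolution` and `fxp_quantize` are hypothetical helper names) of how a (1, p, q) fixed-point format fixes the resolution and range:

```python
def fxp_resolution(q: int) -> float:
    """Resolution of a (1, p, q) fixed-point format: 2**(-q)."""
    return 2.0 ** -q

def fxp_quantize(x: float, p: int, q: int) -> float:
    """Round x to the nearest value representable in a (1, p, q) format,
    saturating at the format's range limits."""
    step = fxp_resolution(q)
    max_val = 2.0 ** p - step      # (1, 4, 3): 15.875, raw code 0x7F
    min_val = -(2.0 ** p)          # two's-complement lower bound: -16.0
    return max(min_val, min(max_val, round(x / step) * step))

# (1, 4, 3): resolution 0.125, so small values collapse to zero.
assert fxp_resolution(3) == 0.125
assert fxp_quantize(0.06, 4, 3) == 0.0
```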
Further, in practical applications, some activation functions yield a target fitting function whose fourth segmentation parameter is positive infinity. In that case, correspondingly in hardware, the fourth segmentation parameter is the maximum value that a single datum in the convolutional neural network can represent.
Step 03: when the input data is less than or equal to the first segmentation parameter, output the first-segment fitting function.
The output of the first-segment fitting function is zero.
Step 04: when the input data is greater than the first segmentation parameter and less than or equal to the second segmentation parameter, output the second-segment fitting function.
The second-segment fitting function is a unary linear function.
Step 05: when the input data is greater than the second segmentation parameter and less than or equal to zero (the third segmentation parameter), output the third-segment fitting function.
The third-segment fitting function is a direct proportional function.
Step 06: when the input data is greater than the third segmentation parameter and less than or equal to the fourth segmentation parameter, output the fourth-segment fitting function.
The output value of the fourth-segment fitting function is the input value.
Step 07: when the input data is greater than the fourth segmentation parameter, output the fifth-segment fitting function.
The output of the fifth-segment fitting function is the fourth segmentation parameter.
Step 08: generate the target fitting function from the first-, second-, third-, fourth- and fifth-segment fitting functions.
Step 09: reset the activation layer according to the target fitting function.
Fig. 2 is a schematic diagram of a fitting result of the general fitting method for a neural network activation function disclosed in this embodiment.
Referring to fig. 2, the target fitting function satisfies the following relationship:
$$
f(x)=\begin{cases}
0, & x \le x_1 \\
-K \cdot 2^{-n} \cdot x + b, & x_1 < x \le x_2 \\
K \cdot x, & x_2 < x \le 0 \\
x, & 0 < x \le x_3 \\
x_3, & x > x_3
\end{cases}
$$
where x1 is the first segmentation parameter, x2 is the second segmentation parameter, and x3 is the fourth segmentation parameter;
n, K and b are respectively the first, second and third fitting parameters.
The first, second and third fitting parameters are determined from the function expression of the target fitting function for which the squared error against the preset activation function is minimal. That is, the function expressions of the target fitting function and the preset activation function are evaluated for different values of n, K and b, the squared error between the two expressions is computed, and the n, K and b that minimize this squared error, i.e. the fitting error, are selected.
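The following Python sketch (illustrative, not from the patent; `target_fit` and `search_params` are hypothetical names, and the second-segment expression follows the relation above) shows one way to carry out this search by brute force:

```python
import numpy as np

def target_fit(x, x1, x2, x3, n, K, b):
    """Five-segment piecewise fit: 0 | -K*2^-n*x + b | K*x | x | x3."""
    return np.select(
        [x <= x1, x <= x2, x <= 0.0, x <= x3],          # checked in order
        [np.zeros_like(x), -K * 2.0**-n * x + b, K * x, x],
        default=x3,
    )

def search_params(act, x1, x2, x3, ns, Ks, bs, grid):
    """Brute-force the (n, K, b) minimizing squared error against act.
    ns/Ks/bs are candidate sets; hardware favors shifts (n) and short
    coefficients (K), so the sets can be kept small."""
    best, best_err = None, np.inf
    ref = act(grid)
    for n in ns:
        for K in Ks:
            for b in bs:
                err = np.sum((target_fit(grid, x1, x2, x3, n, K, b) - ref) ** 2)
                if err < best_err:
                    best, best_err = (n, K, b), err
    return best, best_err
```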
In a first example, a fitting scheme for the SiReLu function is implemented by applying the general fitting method of the neural network activation function disclosed in this embodiment.
The mathematical expression of the SiReLu function is:
$$
f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}
$$
the corresponding target fitting function satisfies the following relation:
$$
f(x)=\begin{cases}
0, & x \le -3.6 \\
-0.125x - 0.45, & -3.6 < x \le -1.2 \\
0.25x, & -1.2 < x \le 0 \\
x, & x > 0
\end{cases}
$$
where x1 = -3.6 corresponds to the approximate zero point and x2 = -1.2 to the lowest point; n = 1, K = 0.25, b = -0.45.
The fitted curve of the above fitting scheme is shown in fig. 3.
When the neural network is run, the accuracy achieved with the SiReLu function is 81.9%, the accuracy achieved with the corresponding target fitting function is 81.2%, and the accuracy loss is only 0.7%.
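As a usage sketch, taking SiReLu as x·sigmoid(x) per the expression above, the parameters of this embodiment could be recovered with the `search_params` sketch given earlier (the exact optimum depends on the search grid and range):

```python
import numpy as np

silu = lambda x: x / (1.0 + np.exp(-x))   # SiReLu expression from above
grid = np.linspace(-8.0, 8.0, 2001)
(n, K, b), err = search_params(
    silu, x1=-3.6, x2=-1.2, x3=float(grid.max()),
    ns=[0, 1, 2], Ks=[0.125, 0.25, 0.5],
    bs=np.arange(-0.9, 0.0, 0.05), grid=grid,
)
# Expected to land near the embodiment's values n=1, K=0.25, b=-0.45.
```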
In a second example, the fitting scheme of the Mish function is realized by applying the general fitting method of the neural network activation function disclosed in the present embodiment.
The mathematical expression of the Mish function is:
$$
f(x) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right)
$$
the corresponding target fitting function satisfies the following relation:
$$
f(x)=\begin{cases}
0, & x \le -3.6 \\
-0.0625x - 0.225, & -3.6 < x \le -1.2 \\
0.125x, & -1.2 < x \le 0 \\
x, & x > 0
\end{cases}
$$
where x1 = -3.6 corresponds to the approximate zero point and x2 = -1.2 to the lowest point; n = 1, K = 0.125, b = -0.225.
The fitted curve of the above fitting scheme is shown in fig. 4.
When the neural network is run, the accuracy achieved with the Mish function is 81.7%, the accuracy achieved with the corresponding target fitting function is 81.1%, and the accuracy loss is only 0.6%.
In a third example, a fitting scheme for the Hard-Swish function is implemented by applying the general fitting method of the neural network activation function disclosed in this embodiment.
The mathematical expression of the Hard-Swish function is:
$$
f(x) = x \cdot \frac{\mathrm{ReLU6}(x+3)}{6} = \begin{cases}
0, & x \le -3 \\
x(x+3)/6, & -3 < x < 3 \\
x, & x \ge 3
\end{cases}
$$
the corresponding target fitting function satisfies the following relation:
$$
f(x)=\begin{cases}
0, & x \le -3 \\
-0.25x - 0.75, & -3 < x \le -1.5 \\
0.25x, & -1.5 < x \le 0 \\
x, & x > 0
\end{cases}
$$
where x1 = -3 corresponds to the approximate zero point and x2 = -1.5 to the lowest point; n = 0, K = 0.25, b = -0.75.
The fitted curve of the above fitting scheme is shown in fig. 5.
When the neural network is run, the accuracy achieved with the Hard-Swish function is 81.7%, the accuracy achieved with the corresponding target fitting function is 81.1%, and the accuracy loss is only 0.6%.
It should be noted that the general fitting method for neural network activation functions disclosed in this embodiment may fit any activation function, and is not limited to the three activation functions described above.
The embodiment discloses a general fitting method for a neural network activation function, which includes loading an activation layer in a neural network; generating four segmentation parameters according to the preset activation function in the activation layer; obtaining a five-segment piecewise fitting function from the four segmentation parameters to generate a target fitting function; and resetting the activation layer according to the target fitting function. The method fits a complex activation function with a simple piecewise linear function, maintains the accuracy of the convolutional neural network algorithm at low hardware complexity, solves the problem of implementing complex activation functions in hardware, and improves the processing performance of the NPU.
The application also discloses a general fitting device for neural network activation functions, which applies the general fitting method of the first embodiment of the application. The general fitting device comprises a plurality of fitting units, and the fitting units operate in a pipelined manner.
Fig. 6 is a schematic structural diagram of a generic fitting apparatus for neural network activation functions disclosed in this embodiment.
Referring to fig. 6, each fitting unit includes an activation function unit and a control unit. The control unit configures the corresponding fitting parameters in the activation function unit according to the preset activation function and temporarily registers the piecewise fitting result output by the previous fitting unit; the activation function unit performs the piecewise fitting operation of the activation function on the received data, generates a piecewise fitting result, and outputs it to the next fitting unit.
Further, the fitting parameters include the comparison object, coefficient, shift amount, bias term, and the like.
In a specific application, the control unit is generally a register.
Fig. 7 is a schematic diagram of an internal structure of the activation function unit of the generic fitting apparatus disclosed in this embodiment.
Referring to fig. 7, any of the activation function units includes a selector, a processing unit, a judging unit, and a data through path.
The processing unit is used for fitting the input data and outputting a fitting result; the judging unit is used for storing the segmentation judgment condition corresponding to the processing unit; the data through path is used for transmitting the input data directly to the selector; and the selector is used for selecting between the fitting result and the input data according to the segmentation judgment condition, generating the piecewise fitting result, and transmitting it to the next fitting unit.
Further, only one of the fitting units contains a multiplier, and the multiplier uses a 4-bit unsigned number as its coefficient.
The general fitting device fits the preset activation function piecewise in a pipelined manner, and the number of segments can be adjusted to the specific application scenario. The activation function units at different pipeline stages are relatively independent, which enables better circuit timing and area.
Furthermore, the fitting of the whole activation function is completed by several fitting units working in relay; the work done by each fitting unit is configurable, which allows circuit and timing optimization.
In practical applications, the general fitting device comprises four fitting units, so the fitting completes in four cycles, with one segment fitted per cycle. In some application scenarios, to achieve lower power consumption, two fitting units may be used instead, completing the fitting in two cycles, with two segments of the target fitting function fitted per cycle.
Fig. 8 is a block diagram of the general fitting device of this embodiment with four fitting units, where Reg denotes a register and Step 1 denotes the first fitting unit, and so on.
The input to the generic fitting device is Y = X.
The first fitting unit judges whether the input data is less than or equal to x1; if so, it sets the value to zero, otherwise it leaves it unchanged.
The second fitting unit judges whether the input data is less than or equal to x2; if so, it shifts the value right by n bits and adds the bias term, otherwise it leaves it unchanged. Here n is the first fitting parameter of the target fitting function, and the bias term corresponds to the third fitting parameter of the target fitting function.
The third fitting unit judges whether the input data is less than or equal to 0; if so, it multiplies the value by the coefficient K, otherwise it leaves it unchanged. The coefficient K is the second fitting parameter of the target fitting function.
The fourth fitting unit judges whether the input data is less than or equal to x3; if so, it leaves the value unchanged, otherwise it outputs x3.
The fitting of the whole activation function thus completes over four cycles, which relieves timing pressure.
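This four-stage relay can be modeled in software. The Python sketch below is an illustrative model, not the patent's circuit; the sign inversion in the second unit follows the SiReLu walkthrough that comes next, where that unit computes y = -0.5x - 1.8. It shows how a datum passes through the four units, and how the second unit's shift-plus-bias composes with the third unit's multiply to produce the second-segment expression -K*2^(-n)*x + b of the target fitting function:

```python
def pipeline_fit(x: float, x1: float, x2: float, x3: float,
                 n: int, hw_bias: float, K: float) -> float:
    """Relay model of the four fitting units (one pipeline stage each).
    hw_bias is the bias added inside unit 2; unit 3's multiply by K then
    turns it into the effective fitting parameter b = K * hw_bias."""
    if x <= x1:                       # unit 1: zero the first segment
        x = 0.0
    if x <= x2:                       # unit 2: negate, shift right n bits, add bias
        x = -x * 2.0**-n + hw_bias
    if x <= 0.0:                      # unit 3: the only unit with a multiplier
        x = K * x                     #         (4-bit unsigned coefficient)
    return x if x <= x3 else x3       # unit 4: clip at x3
```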
Taking the SiReLu function as an example, the general fitting device comprises four fitting units and completes the fitting in four cycles, one segment per cycle: the four fitting units respectively fit the segments x <= -3.6, -3.6 < x <= -1.2, -1.2 < x <= 0, and x > 0.
The first fitting unit judges whether the input data is less than or equal to -3.6; if so, it sets the value to zero, otherwise it leaves it unchanged; it then outputs the processed data to the next fitting unit.
The second fitting unit judges whether the input data is less than or equal to -1.2; if so, it applies y = -0.5x - 1.8 (a one-bit right shift with sign inversion, plus the bias term -1.8), otherwise it leaves the value unchanged; it then outputs the processed data to the next fitting unit.
The third fitting unit judges whether the input data is less than or equal to 0; if so, it multiplies the value by the coefficient 0.25, otherwise it leaves it unchanged; it then outputs the processed data to the next fitting unit.
The fourth fitting unit: the SiReLu function has no fourth segmentation parameter x3. In practical applications, x3 is then the maximum value that data in the general fitting device can represent; for example, the maximum value representable by 8-bit data is 0x7F (hexadecimal). The unit judges whether the input data is less than 0x7F; if so, it leaves the value unchanged, otherwise it sets the output to 0x7F.
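Using the `pipeline_fit` sketch above, this SiReLu configuration can be exercised end to end (the clipping bound 15.875 assumes the (1, 4, 3) fixed-point format, whose raw 8-bit code 0x7F corresponds to 15.875):

```python
# SiReLu configuration: x1=-3.6, x2=-1.2, n=1, hw_bias=-1.8
# (so b = K * hw_bias = 0.25 * -1.8 = -0.45), K=0.25, x3=15.875.
for x in (-5.0, -2.0, -0.6, 1.0, 20.0):
    y = pipeline_fit(x, x1=-3.6, x2=-1.2, x3=15.875, n=1, hw_bias=-1.8, K=0.25)
    print(f"f({x}) = {y}")   # 0.0, -0.2, -0.15, 1.0, 15.875
```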
The processing logic is clear and the circuit structure is cleanly staged, which improves circuit timing.
Similar parts between the various embodiments and examples may be referred to one another.

Claims (10)

1. A method for universal fitting of a neural network activation function, comprising:
loading an activation layer in a neural network; the activation layer is a preset activation function;
generating a first segmentation parameter, a second segmentation parameter, a third segmentation parameter and a fourth segmentation parameter according to the preset activation function;
when the input data is less than or equal to the first segmentation parameter, outputting a first segment fitting function;
when the input data is greater than the first segmentation parameter and less than or equal to the second segmentation parameter, outputting a second segment fitting function;
when the input data is larger than the second segmentation parameter and smaller than or equal to the third segmentation parameter, outputting a third segment fitting function;
when the input data is larger than the third segmentation parameter and smaller than or equal to the fourth segmentation parameter, outputting a fourth segment fitting function;
when the input data is larger than the fourth segmentation parameter, outputting a fifth segment fitting function;
generating a target fitting function according to the first section of fitting function, the second section of fitting function, the third section of fitting function, the fourth section of fitting function and the fifth section of fitting function;
and resetting the activation layer according to the target fitting function.
2. The method according to claim 1, wherein the first segmentation parameter is determined according to an input value corresponding to an approximate zero point of the preset activation function;
the second segmentation parameter is determined according to the input value corresponding to the minimum output value of the preset activation function;
the third segmentation parameter is zero;
and the fourth segmentation parameter is determined according to the input value corresponding to the maximum output value of the preset activation function.
3. The method of claim 1, wherein the output of the first segment fitting function is zero;
the second segment fitting function is a unary linear function;
the third segment fitting function is a direct proportional function;
the output value of the fourth segment fitting function is the input value;
and the output of the fifth segment fitting function is the fourth segmentation parameter.
4. A method as claimed in any one of claims 1 to 3, wherein the target fitting function satisfies the following relationship:
$$
f(x)=\begin{cases}
0, & x \le x_1 \\
-K \cdot 2^{-n} \cdot x + b, & x_1 < x \le x_2 \\
K \cdot x, & x_2 < x \le 0 \\
x, & 0 < x \le x_3 \\
x_3, & x > x_3
\end{cases}
$$
wherein x1 is the first segmentation parameter, x2 is the second segmentation parameter, and x3 is the fourth segmentation parameter;
and wherein n, K and b are respectively a first fitting parameter, a second fitting parameter and a third fitting parameter, determined as the values for which the squared error between the function expression of the target fitting function and the function expression of the preset activation function is minimal.
5. The method of claim 2, wherein the fourth segmentation parameter is the maximum value that can be represented by a single datum in the neural network.
6. A general fitting apparatus for a neural network activation function, the apparatus being applied to the general fitting method for a neural network activation function according to any one of claims 1 to 5, the general fitting apparatus comprising:
a plurality of fitting units; wherein the plurality of fitting units operate in a pipelined manner;
wherein any of said fitting units comprises an activation function unit and a control unit; wherein:
the control unit is used for configuring corresponding fitting parameters in the activation function unit according to a preset activation function and temporarily storing the piecewise fitting result output by the previous fitting unit; wherein the fitting parameters include a comparison object, a coefficient, a shift number, and a bias term;
the activation function unit is used for carrying out the piecewise fitting operation of the activation function on the received data, generating a piecewise fitting result and outputting the piecewise fitting result to the next fitting unit.
7. The apparatus of claim 6, wherein any of said activation function units comprises a selector, a processing unit, a judging unit, and a data through path; wherein:
the processing unit is used for fitting the input data and outputting a fitting result;
the judging unit is used for storing the segmentation judging conditions corresponding to the processing unit;
the data through path is used for directly transmitting input data to the selector;
the selector is used for selecting between the fitting result and the input data according to the segmentation judgment condition, generating the piecewise fitting result, and transmitting it to the next fitting unit.
8. The apparatus of claim 6, wherein only one of said fitting units comprises a multiplier; wherein the multiplier uses a 4-bit unsigned number as a coefficient.
9. The apparatus of claim 6, wherein the control unit is a register.
10. The device of claim 6, wherein the general fitting device comprises two fitting units;
or the general fitting device comprises four fitting units.
CN202111335738.XA 2021-11-12 2021-11-12 General fitting method and device for neural network activation function Pending CN113780545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111335738.XA CN113780545A (en) 2021-11-12 2021-11-12 General fitting method and device for neural network activation function


Publications (1)

Publication Number Publication Date
CN113780545A 2021-12-10

Family

ID=78956917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111335738.XA Pending CN113780545A (en) 2021-11-12 2021-11-12 General fitting method and device for neural network activation function

Country Status (1)

Country Link
CN (1) CN113780545A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116432711A (en) * 2023-02-13 2023-07-14 杭州菲数科技有限公司 Hardware implementation method and device of SiLU activation function and computing equipment
CN116432711B (en) * 2023-02-13 2023-12-05 杭州菲数科技有限公司 Hardware implementation method and device of SiLU activation function and computing equipment

Similar Documents

Publication Publication Date Title
CN110688088B (en) General nonlinear activation function computing device and method for neural network
CN111984227B (en) Approximation calculation device and method for complex square root
CN111507465B (en) Configurable convolutional neural network processor circuit
WO2023011002A1 (en) Overflow-aware quantization model training method and apparatus, medium and terminal device
CN112651496A (en) Hardware circuit and chip for processing activation function
CN110109646A (en) Data processing method, device and adder and multiplier and storage medium
CN113780545A (en) General fitting method and device for neural network activation function
CN114091655A (en) Neural network quantization method, device, storage medium and terminal
KR20200134281A (en) Stochastic rounding logic
CN116466910A (en) Floating point number-based table lookup method and device, electronic equipment and storage medium
CN114139683A (en) Neural network accelerator model quantization method
CN113902089A (en) Device, method and storage medium for accelerating operation of activation function
US20210044303A1 (en) Neural network acceleration device and method
CN114819159A (en) Inference method, device, equipment and storage medium of deep learning model
CN110837885B (en) Sigmoid function fitting method based on probability distribution
CN112580776A (en) Information processing apparatus, information processing method, and computer-readable recording medium
US6606641B1 (en) System for varying the dynamic range of coefficients in a digital filter
CN114444688A (en) Neural network quantization method, apparatus, device, storage medium, and program product
JP2645422B2 (en) Floating point processor
CN112346703B (en) Global average pooling circuit for convolutional neural network calculation
US20240111525A1 (en) Multiplication hardware block with adaptive fidelity control system
CN111815541A (en) Image real-time Gamma correction method based on FPGA
WO2024012388A1 (en) Data type processing method and related apparatus
CN117371498A (en) Data processing method, multiply accumulator, computing architecture, device and storage medium
Basavaraju et al. Exploring Hardware Activation Function Design: CORDIC Architecture in Diverse Floating Formats

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination