CN113743601A - Method for retraining compensation activation function of low-bit quantization network - Google Patents

Method for retraining compensation activation function of low-bit quantization network

Info

Publication number
CN113743601A
CN113743601A
Authority
CN
China
Prior art keywords
activation function
size
quantization
relu
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010460267.4A
Other languages
Chinese (zh)
Inventor
周飞飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Ingenic Technology Co ltd
Original Assignee
Hefei Ingenic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Ingenic Technology Co ltd filed Critical Hefei Ingenic Technology Co ltd
Priority to CN202010460267.4A
Publication of CN113743601A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a method for retraining a compensation activation function of a low-bit quantization network. In the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size; that is, the full-precision data distribution is kept and only the dispersion of the feature data is modified.

Description

Method for retraining compensation activation function of low-bit quantization network
Technical Field
The invention relates to the technical field of neural networks, and in particular to a method for retraining a compensation activation function of a low-bit quantization network.
Background
With the rapid development of computer technology, algorithms based on convolutional neural networks have been applied successfully in many recognition fields. In the prior art, as the number of stacked layers in convolutional neural network models increases, the features must be quantized to low bits so that the models can run normally on mobile terminals with a reduced amount of computation. However, adding a series of extra quantization operation nodes to the activation function causes precision loss and long model convergence times during low-bit quantization retraining. In the prior art, the corresponding quantization operation nodes are added to the activation function on top of the full-precision model, and training is started from scratch based on the full-precision parameter information.
In the prior art, when a 32-bit model is quantized to low bits, fine tuning must be performed on the basis of the original full-precision model in order to preserve the accuracy after quantization. Because the corresponding quantization operations are added to the activation function, the parameter distribution of the low-bit model differs considerably from that of the full-precision model. The low-bit model therefore has to be retrained on the basis of the full-precision model, yet the full-precision level is often difficult to reach, and the model may even fail to converge.
Furthermore, the common terminology in the prior art is as follows:
Activation function: each neuron node in a neural network receives the output values of the neurons in the previous layer as its input and passes its output on to the next layer; the neuron nodes of the input layer pass the input attribute values directly to the next layer (hidden layer or output layer). In a multi-layer neural network, there is a functional relationship between the output of an upper-layer node and the input of a lower-layer node, and this function is called the activation function (also called the excitation function). As shown in FIG. 2, Relu(x) = max(0, x); the figure shows the Relu function and its derivative.
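For reference, the Relu function and its derivative described above can be written as the following minimal sketch (Python with NumPy is used here purely for illustration and is not part of the patent):

import numpy as np

def relu(x):
    # Relu(x) = max(0, x): positive inputs pass through, negative inputs become 0
    return np.maximum(x, 0.0)

def relu_derivative(x):
    # Derivative of Relu: 1 where x > 0, 0 elsewhere (the value at x = 0 is taken as 0)
    return (x > 0).astype(np.float64)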
Disclosure of Invention
The invention aims to compensate for the difficult or even failed model convergence caused by adding the corresponding quantization operation nodes to the activation function, to overcome the defects of the prior art, and to solve the problems of network non-convergence and long training time during low-bit quantization fine tuning and retraining.
The method belongs to the class of techniques that perform quantization retraining on the basis of a full-precision deep neural network. It effectively improves the convergence of the model during network retraining and removes the non-convergence that the quantization nodes added to the activation function would otherwise cause, which prevents the model from reaching the convergence of the full-precision model. With this method the model reaches the full-precision level faster, and the risk that the model fails to converge because of the quantization nodes added to the activation function is effectively reduced.
Specifically, the invention provides a method for retraining a compensation activation function of a low-bit quantization network, characterized in that, in the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size; that is, the full-precision data distribution is kept and only the dispersion of the feature data is modified.
The method further comprises the steps of:
S1, assume that the feature of the i-th layer is X_i, and quantize the feature with an activation function;
S2, add the corresponding quantization nodes on top of the full-precision model so that the feature is quantized to low bits, specifically by the following formulas:
X_i = clip(X_i, relu_size)
X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
where relu_size is the activation-function compression size, the feature is compressed to [0, relu_size], and the feature compression range is defined by the compression size;
S3, multiply the result of the formulas in S2 by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data (see the sketch below).
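A minimal sketch of steps S1 to S3, assuming the layer feature X_i is a NumPy array; the function and argument names are illustrative and not taken from the patent:

import numpy as np

def quantize_feature(x_i, relu_size, bit):
    # S2: X_i = clip(X_i, relu_size), i.e. compress the feature to [0, relu_size]
    x_i = np.clip(x_i, 0.0, relu_size)
    # S2: X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
    levels = 2 ** bit
    x_i = np.round((x_i / relu_size) * levels) / levels
    # S3: multiply back by the corresponding relu_size so the full-precision data
    # distribution is kept and only the dispersion (quantization step) changes
    return x_i * relu_size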
In step S1, quantizing the feature with an activation function includes using the relu6 activation function for 8-bit feature quantization and the relu3 activation function for 4-bit feature quantization.
The relu6 activation function is:
relu6(x)=min(max(x,0),6)∈[0,6];
relu6(x) = 0 for x ≤ 0; relu6(x) = x for 0 < x < 6; relu6(x) = 6 for x ≥ 6.
Thus, the present application has the following advantages:
1. the problem of model non-convergence caused by adding quantization nodes to the activation function during low-bit model quantization is solved;
2. the quantization retraining time of the model from full precision to low bit is reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is an image of the Relu function and its derivative in the method of the present invention.
FIG. 3 is an image of the Relu6 activation function and its derivative in the method of the present invention.
Detailed Description
In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.
The invention relates to a method for retraining a compensation activation function of a low-bit quantization network: in the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size; that is, the full-precision data distribution is kept and only the dispersion of the feature data is modified.
As shown in fig. 1, the method further comprises the steps of:
S1, assume that the feature of the i-th layer is X_i, and quantize the feature with an activation function;
S2, add the corresponding quantization nodes on top of the full-precision model so that the feature is quantized to low bits, specifically by the following formulas:
X_i = clip(X_i, relu_size)
X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
where relu_size is the activation-function compression size, the feature is compressed to [0, relu_size], and the feature compression range is defined by the compression size;
S3, multiply the result of the formulas in S2 by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data.
The activation function is: f(x) = max(0, x).
In step S1, quantizing the feature with an activation function includes using the relu6 activation function for 8-bit feature quantization, and the relu3 activation function relu3(x) = min(max(x, 0), 3) for 4-bit feature quantization.
As shown in FIG. 3, Relu is linearly activated (equal to x) in the region x > 0, which may make the activated values too large and affect the stability of the model. To counteract the linear-growth part of the Relu excitation function, the Relu6 function may be used. The relu6 activation function is:
relu6(x)=min(max(x,0),6)∈[0,6];
relu6(x) = 0 for x ≤ 0; relu6(x) = x for 0 < x < 6; relu6(x) = 6 for x ≥ 6.
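For illustration, relu6 and relu3 can be written as the following sketch (NumPy assumed; equivalent to the formulas above):

import numpy as np

def relu6(x):
    # relu6(x) = min(max(x, 0), 6): Relu with its linear growth capped at 6,
    # used for 8-bit feature quantization
    return np.minimum(np.maximum(x, 0.0), 6.0)

def relu3(x):
    # relu3(x) = min(max(x, 0), 3): the analogous cap used for 4-bit feature quantization
    return np.minimum(np.maximum(x, 0.0), 3.0)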
in step S2, a feature compression range is defined according to the compressed data size, including when relu _ size is equal to 6, the feature is compressed to [0, 6 ]. Under normal conditions, the relu _ size of 8 bits is selected to be 6, and the relu _ size of 4 bits is selected to be 3, which are only recommended values under normal conditions, and specific numerical values can be finely adjusted according to a training model.
In fact, for ease of understanding, the technical solution of the present invention can also be explained as follows:
suppose feature of i-th layer is XiIn order to quantize the feature parameters to low bits and keep the fixed point number fully utilized, a relu6 activation function is generally adopted for 8-bit quantization features, and a relu3 activation function is adopted for 4-bit quantization features.
Xi=clip(Xi,relu_size)
Xi=Round{(Xi/relu_size)*2bit}/2bit
The above formula is a process of adding corresponding quantization nodes on the basis of full precision to realize feature quantization to low bit;
the relu _ size is the activation function compressed data size, such as relu _ size equal to 6, feature compressed to [0, 6], defined according to the compressed data size.
With these quantization operation nodes, the original activation function in the full-precision quantization model is divided by relu_size, so the distribution of the full-precision model is distorted; retraining is needed to ensure that the low-bit quantization model keeps high precision and low loss, and during quantization a large learning rate has to be used for the quantized training, so that the convergence of the model is difficult to guarantee.
The improvement is as follows: the result of the formulas above is multiplied by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data, so the low-bit model can keep low loss and high precision by directly fine tuning with a low learning rate.
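To make the improvement concrete, the following sketch (NumPy; the names are hypothetical and not part of the patent text) contrasts the uncompensated quantization node, whose output lies in [0, 1] and therefore shifts the feature distribution away from the full-precision model, with the compensated node, which multiplies back by relu_size so that only the quantization step of the features changes:

import numpy as np

def quantize_uncompensated(x, relu_size, bit):
    # Original node: divide by relu_size and round; the output lies in [0, 1],
    # so the feature distribution no longer matches the full-precision model
    x = np.clip(x, 0.0, relu_size)
    return np.round((x / relu_size) * 2 ** bit) / 2 ** bit

def quantize_compensated(x, relu_size, bit):
    # Improved node: multiply back by relu_size; the output stays in [0, relu_size],
    # so only the dispersion (grid spacing relu_size / 2^bit) is modified
    return quantize_uncompensated(x, relu_size, bit) * relu_size

# Example with the recommended 8-bit setting relu_size = 6
features = np.random.uniform(0.0, 8.0, size=(4, 4))
quantized = quantize_compensated(features, relu_size=6.0, bit=8)
# 'quantized' lies on the grid {0, 6/256, 12/256, ..., 6} and stays close to
# clip(features, 0, 6), which is why a low learning rate suffices for fine tuning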
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for retraining a compensation activation function of a low-bit quantization network, characterized in that, in the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size, i.e. the full-precision data distribution is kept and only the dispersion of the feature data is modified.
2. The method of claim 1, wherein the method further comprises the steps of:
S1, assuming that the feature of the i-th layer is X_i, quantizing the feature with an activation function;
S2, adding the corresponding quantization nodes on top of the full-precision model so that the feature is quantized to low bits, specifically by the following formulas:
X_i = clip(X_i, relu_size)
X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
wherein relu_size is the activation-function compression size, the feature is compressed to [0, relu_size], and the feature compression range is defined by the compression size;
S3, multiplying the result of the formulas in S2 by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data.
3. The method of claim 2, wherein the activation function is selected from the group consisting of: f(x) = max(0, x).
4. The method for retraining a compensation activation function of a low-bit quantization network as claimed in claim 3, wherein quantizing the feature with an activation function in step S1 comprises using the relu6 activation function for 8-bit feature quantization and the relu3 activation function relu3(x) = min(max(x, 0), 3) for 4-bit feature quantization.
5. The method of claim 4, wherein the relu6 activation function is:
relu6(x)=min(max(x,0),6)∈[0,6];
relu6(x) = 0 for x ≤ 0; relu6(x) = x for 0 < x < 6; relu6(x) = 6 for x ≥ 6.
6. The method for retraining a compensation activation function of a low-bit quantization network as claimed in claim 2, wherein step S2 defines the feature compression range according to the compression size, including compressing the feature to [0, 6] when relu_size is equal to 6.
CN202010460267.4A 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network Pending CN113743601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460267.4A CN113743601A (en) 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460267.4A CN113743601A (en) 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network

Publications (1)

Publication Number Publication Date
CN113743601A (en) 2021-12-03

Family

ID=78723670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460267.4A Pending CN113743601A (en) 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network

Country Status (1)

Country Link
CN (1) CN113743601A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination