CN110135568B

CN110135568B - Full-integer neural network method applying bounded linear rectification unit

Info

Publication number: CN110135568B
Application number: CN201910453798.8A
Authority: CN
Inventors: 赵恒锐
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-05-28
Filing date: 2019-05-28
Publication date: 2022-03-04
Anticipated expiration: 2039-05-28
Also published as: CN110135568A

Abstract

The invention relates to the technical field of artificial intelligence, in particular to a full-integer neural network method applying a bounded linear rectification unit, which comprises the following stepsThe following steps: firstly, changing the format of input data by using a new normalization method to adapt to a full integer network; step two, after training is finished, calculating discrete step length for each convolution kernel, and for the convolution kernels with batch normalization layers, for W of different channels_fAnd b_fCalculating discrete step independently; discretizing network parameters of floating point numbers; step four, discretizing W_fThen, training a floating-point number network by using an accumulative gradient updating algorithm; replacing the traditional linear rectifying unit with a bounded linear rectifying unit; step six, an N variance principle is applied to find out a proper BLU parameter h, and the distribution of the output data of the ReLU is firstly counted before the BLU is adopted; the technical scheme provided by the invention can effectively overcome the defects that the prior art is difficult to realize the consideration of speed, storage space, performance and practicability and has lower performance.

Description

Full-integer neural network method applying bounded linear rectification unit

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a full-integer neural network method applying a bounded linear rectification unit.

Background

With the development of science and technology, people are increasingly researching artificial intelligence, which is abbreviated as AI. The method is a new technical science for researching and developing theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. At present, a neural network compression algorithm mainly comprises a method for reducing bit width of parameters and reducing the number of the parameters, so that the storage volume of a neural network is smaller, the operation speed is higher, and the existing neural network optimization method is difficult to realize the consideration of speed, storage space, performance and practicability. Although the existing full-integer neural network can perform full-integer operation, the performance is low, and therefore, the development of a full-integer neural network method applying a bounded linear rectifying unit is the key point for solving the problems.

An effective convolutional neural network compression method (CNNpack) is disclosed in patent application publication No. CN 106557812 a, application publication No. 2017.04.05. Deep convolutional neural networks are widely used as a basic deep learning structure in many computer vision tasks. However, most convolutional neural networks are limited in their application to mobile devices due to their large memory and high computational load. The invention aims at the problem that the network is compressed in the frequency domain; by considering the convolution kernel as an image, the representation in the frequency domain is decomposed into a common part (cluster center) and a private part (residual), and then the low-energy coefficients are discarded without affecting the network accuracy. Furthermore, by linearly combining the convolution responses of the DCT bases, network computation cost can be reduced. The invention has the characteristics of high compression ratio and high speed-up ratio, and can be used for compressing a common deep convolutional network.

However, the effective convolutional neural network compression method is difficult to achieve the compromise of speed, storage space, performance and practicability, and the performance is low.

Disclosure of Invention

Technical problem to be solved

Aiming at the defects in the prior art, the invention provides a full-integer neural network method applying a bounded linear rectification unit, which can effectively overcome the defects of the prior art that the speed, the storage space, the performance and the practicability are difficult to be considered and the performance is lower.

(II) technical scheme

In order to achieve the purpose, the invention is realized by the following technical scheme:

a full integer neural network method employing bounded linear rectification units, comprising the steps of:

step one, subtracting a fixed value from the pixel of each image by using a new normalization method, converting the pixel of each image into a signed integer with the same number of bits, multiplying all images by a fixed floating point number for a floating point number network, maintaining the integer network unchanged, directly sending the images into the network, and training the floating point number network by adopting the conventional random gradient descent algorithm;

step two, after training is finished, calculating discrete step length for each convolution kernel, and for the convolution kernels with batch normalization layers, for W of different channels_fAnd b_fCalculating discrete step independently;

discretizing floating point number network parameters, and discretizing W_fThe method of invariable fixed zero point is adopted to calculate W_fDiscrete step size of (a);

step four, discretizing W_fThen, training a floating-point number network by using an accumulative gradient updating algorithm;

replacing the traditional linear rectifying unit with a bounded linear rectifying unit;

step six, an N variance principle is applied to find out a proper BLU parameter h, the distribution of the output data of the ReLU is firstly counted before the BLU is adopted, and according to the selected N, the minimum value in the data of the maximum corresponding proportion in the output data is used as the initial parameter h of the BLU;

step seven, quantizing and dequantizing h to calculate final parameters of a floating point network and an integer network, introducing multiplication mul and right shift to a feature map for cross-layer connection or a connection layer, and then modifying discrete step;

step eight, after BLU parameters of each layer of the floating-point number network and the integer network are determined, the floating-point number network is trained again by utilizing an accumulative gradient updating algorithm;

step nine, calculating an integer network parameter W_iAnd b_iAnd obtaining the final integer neural network.

Preferably, W is calculated in the third step_fMaximum of absolute values of parameters: maxabs ═ max (abs (W)_f))。

Preferably, W is calculated in the third step_fTo W_iDiscrete step of (1):

for convolutional layers with batch normalization, step needs to be recalculated from different channels.

Preferably, the specific algorithm of the step seven is as follows:

h_i＝[h_f*ratio_Y]

mul，shift＝mul_shift(h_i，m)

ratio_v＝ratio_Y*mul*2^-shift，h_f＝(2^m-1-1)/ratio_v

h_i＝[h_f*ratio_Y]。

preferably, said step nine

(III) advantageous effects

Compared with the prior art, the invention provides a full integer neural network method applying a bounded linear rectification unit, which has the following beneficial effects:

firstly, changing a processing method of input data of a neural network to make the input data suitable for integer network operation;

secondly, discretizing the weight parameters of the floating-point number network, keeping the zero point unchanged, and reducing the operation amount of the target integer network;

third, the floating-point number network is trained using the cumulative gradient descent algorithm, using different cumulative step sizes for the batch normalization layer. Therefore, the accuracy rate after training can be improved;

fourth, replace the ReLU with BLU. Therefore, the accuracy of the target integer network is improved;

fifthly, a method for searching for a BLU parameter suitable for a floating point network is provided, so that the accuracy of the floating point network is ensured, and the accuracy of the integer network is improved;

sixthly, a quantization and inverse quantization method of the BLU parameter is provided, so that the result of the integer network is closer to a floating point network;

seventh, processing of the special network structure: cross-layer connections and connection layers. In the case of adding or connecting a plurality of layers, the ratio of the layers can be ensured to be equal by adjusting three parameters of mul, shift and step.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic diagram of a single layer convolution according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating BLU definition (left) and effects (right) according to an embodiment of the present invention;

FIG. 4 is a process of adjusting BLU parameters (left) and the result of the adjustment (right) in an embodiment of the present invention;

FIG. 5 is a diagram of an integer cross-layer connection (left) and a floating point cross-layer connection (right) according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

A full integer neural network method employing bounded linear rectification units, as shown in fig. 1-5, comprising the steps of:

step one, subtracting a fixed value from a pixel of each image by using a new normalization method, converting the pixel of each image into a signed integer with the same number of bits, for example, subtracting 128 bits from an 8-bit image, subtracting 32768 bits from a 16-bit image, multiplying all images by a fixed floating point number for a floating point number network, directly sending the integer network into the network while maintaining the integer network unchanged, and training the floating point number network by adopting the conventional random gradient descent algorithm, so that the floating point number network is easy to train under the condition that the input data of the floating point number network and the integer network only differ by a fixed multiplying power;

step two, after training is finished, calculating discrete step length for each convolution kernel, and for the convolution kernels with batch normalization layers, for W of different channels_fAnd b_fThe discrete step is independently calculated, so that the accuracy of the trained neural network is higher;

discretizing floating point number network parameters, and discretizing W_fThe method of invariable fixed zero point is adopted to calculate W_fCompared with the prior method of changing the zero point, the discrete step length of the method reduces the calculated amount;

step five, the traditional linear rectification unit is replaced by the bounded linear rectification unit, so that the output range of the convolution layer is fixed, the learning capability of the floating-point number network is kept, and the precision of the corresponding integer network is higher;

and step six, applying an N variance principle to find a proper BLU parameter h, counting the distribution of the output data of the ReLU before adopting the BLU, and according to the selected N, using the minimum value in the data of the maximum corresponding proportion in the output data as the initial parameter h of the BLU, for example, when the N is 3, selecting the minimum value in the data of the maximum 0.15% of all the output data of the ReLU of one layer as the initial h of the layer. By reasonably selecting the size of N, most data can be smaller than h, and h is not too large;

step seven, quantizing and dequantizing h to calculate final parameters of the floating point network and the integer network, introducing multiplication mul and right shift to the feature map for cross-layer connection or connection layers, then modifying discrete step, multiplying mul _ d and right shift _ d for other layers by taking one of the convolution layers as a reference layer, and modifying corresponding step, taking direct addition across two layers as an example, as shown in fig. 5, the specific algorithm is as follows:

first, according to the known ratio_Y1Recursion layer by layer to obtain ratio_V2And ratio_Y3；

Second, calculate mul_d，shift_d＝mul_shift_f(ratio_Y1，ratio_Y3) To make

Third, modify the step size of the latter convolutional layer:

for tie layers, e.g. Y_1iAnd Y_2iConnecting, selecting the one with the minimum ratio, and setting Y as the rest_1iAfter quantization is V_1iCorresponding to ratio_V1. Step, mul and shift corresponding to all other Y are adjusted to make the ratio corresponding to the other Y_VWith ratio_V1Equal;

Specifically, the step three is to calculate W_fMaximum of absolute values of parameters: maxabs ═ max (abs (W)_f) For convolution layers with batch normalization, step needs to be recalculated according to different channels; calculating W in the third step_fTo W_iDiscrete step of (1):

so that the denominator takes 2^n-1-1 instead of 2^n-1To avoid overflow; the specific algorithm of the seventh step is as follows:

h_i＝[h_f*ratio_Y]

mul，shift＝mul+shift(h_i，m)

ratio_V＝ratio_Y*mul*2^-shift，h_f＝(2^m-1-1)/ratio_V

h_i＝[h_f*ratio_Y]，[]representing a rounding operation. This has the advantage that an integer network of 2 can be made^m-11 ratio strictly_VH corresponding to floating-point number network_fThereby reducing errors; in the ninth step

The purpose of this scheme is to convert a given floating-point number network (down) into a full integer network (up, hereinafter simply referred to as an integer network), as shown in fig. 2. FIG. 2 is a structure of a layer of convolutional layers of a full-integer network, a network being formed by connecting a plurality of such layers;

given a floating-point number network, by setting the parameter W of the floating-point number network_fAnd b_fThe quantization can obtain the parameter W corresponding to the integer network_iAnd b_i. All data in a floating point network is of the float32 type. For integer networks, the weight W is defined in this scheme_iAnd input data X_iN, output of integer convolution operation U_iAnd parameter b_iAre all k-bit integers, which are added to give Y_iAlso a k-bit integer. Y is_iObtaining m-bit V through Bounded Linear rectification Unit (BLU) and quantization_iDefinition of the scheme V_iThe number of bits of (d) is m. In general V_i> 0, so V can also be represented by unsigned integers_iThus, one bit of data can be saved;

the bounded linear rectification unit may be represented in the form:

for a certain layer of floating-point number network, the scheme defines h as h_fThe corresponding integer network is defined as h_i. It can be seen that all data output is limited to between 1 and h, thus being output data Y_iQuantization to V_iBringing great convenience. To quantize Y_iTo V_iThe scheme adopts a method of first multiplying and then moving to the right proposed by Google;

by multiplying and right shifting, the effective data bit width will be reduced. The method for calculating mul and shift adopts the existing algorithm, and the scheme is defined as follows:

mul，shift＝mul_shift(Y_i，m)

and

mul，shift＝mul_shift_f(x，y)

in the scheme, all integer network parameters and data have corresponding objects in a floating-point network, and the difference between the integer network parameters and the floating-point network data is a multiple. Parameters or data X for an integer network_iAnd object X in a corresponding floating-point number network_fThe relationship between the two is as follows:

X_i≈X_f*ratrio_X

obviously, ratio_XIs propagated layer by layer, e.g. ratio_U＝ratio_X*ratio_W。

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art; the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A full integer neural network method employing bounded linear rectification units, characterized by: the method comprises the following steps:

discretizing floating point number network parameters, and discretizing W_fThe method of invariable fixed zero point is adopted to calculate W_fCalculating W from the discrete step size of_fMaximum of absolute values of parameters:

maxabs＝max(abs(W_f))；

calculating W_fTo W_iDiscrete step of (1):

for convolution layers with batch normalization, step needs to be recalculated according to different channels;

2. The full integer neural network method with the applied bounded linear rectification unit of claim 1, wherein: the specific algorithm of the seventh step is as follows:

h_i＝[h_f*ratio_Y]

mul，shift＝mul_shift(h_i，m)

ratio_v＝ratio_Y*mul*2^-shift，h_f＝(2^m-1-1)/ratio_v，

h_i＝[h_f*ratio_Y]。

3. the full integer neural network method with the applied bounded linear rectification unit of claim 1, wherein: in the ninth step