CN114937186A - Neural network data-free quantization method based on heterogeneous generated data - Google Patents

Neural network data-free quantization method based on heterogeneous generated data

Info

Publication number: CN114937186A (granted as CN114937186B)
Application number: CN202210673423.4A
Authority: CN (China)
Prior art keywords: network, quantization, loss, false, data
Inventors: 纪荣嵘 (Rongrong Ji), 钟云山 (Yunshan Zhong), 林明宝 (Mingbao Lin), 南宫瑞 (Gongrui Nan)
Original and current assignee: Xiamen University
Filing / priority date: 2022-06-14
Publication date: 2022-08-23 (CN114937186A); grant published 2024-06-07 (CN114937186B)
Other languages: Chinese (zh)
Legal status: Granted; Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

A neural network data-free quantization method based on heterogeneous generated data, relating to the compression and acceleration of artificial neural networks. The method comprises the following steps: 1) randomly initialize fake pictures from a standard Gaussian distribution; 2) optimize the initialized fake pictures until the iteration limit is reached, updating them with local object reinforcement, the boundary distance limit, the soft perception loss and the BN loss; 3) quantize the neural network, then train the quantized network on the optimized fake pictures with a distillation loss and a cross-entropy loss until a preset number of training epochs is reached; 4) after training, keep the weights of the quantization network to obtain the final quantized network. No real data is required, a quantized network can be trained from scratch, and network compression and acceleration can be achieved on general-purpose hardware platforms without specific hardware support.

Description

Neural network data-free quantization method based on heterogeneous generated data
Technical Field
The invention relates to the compression and acceleration of artificial neural networks, and in particular to a neural network data-free quantization method based on heterogeneous generated data.
Background
In recent years, deep neural networks (DNNs) have been widely used in many fields such as computer vision and natural language processing. Despite their tremendous success, the ever-increasing size of DNNs hinders their deployment on resource-limited platforms such as mobile phones and embedded devices. To overcome this dilemma, academia and industry have explored a variety of ways to reduce the complexity of DNNs; network quantization, which represents full-precision DNNs in a low-precision format, is a promising direction.
Most existing methods belong to quantization-aware training, where quantization is performed on the premise that the original complete training dataset is available. This dependence on training data, however, is also their weakness. In many practical situations the original training data cannot be accessed because of worsening privacy and security concerns: people may not want their medical records disclosed to others, and companies do not want business materials disseminated over the Internet. In such cases, quantization-aware training is no longer applicable.
How to obtain quantized DNNs without data has therefore attracted great attention from academia and industry. Existing data-free quantization studies can be broadly divided into two categories:
the first class of dataless quantization methods does not utilize any data at all, but instead focuses on the calibration parameters. For example, DFQ (Nagel M, Baalen M, Blankovort T, et al. data-free quantization and bias correction [ C ]// Proceedings of the IEEE/CVF International Conference on Computer Vision.2019: 1325-. Simple parameter calibration tends to result in severe performance degradation. This problem is even magnified for ultra low precision cases. For example, when ResNet-18(He K, Zhang X, Ren S, et al. deep residual learning for image recognition [ C ]// Proceedings of the IEEE Conference on Computer vision and pattern recognition.2016:770-778.) is quantized to 4 bits, only the 0.10% top-1 precision of DFQ on image is reported in the GDFQ' S (Xu S, Li H, Zhuang B, et al. genetic low-bit data free quantization [ C ]// European Conference on Computer vision. Springer, Cham 2020:1-17.) appendix.
The second category helps to train the quantized network with synthetic fake images. An intuitive solution is to deploy a generator to synthesize training data, but generator-based approaches incur a large computational overhead because the introduced generator must be trained from scratch for every bit setting. ZeroQ (Cai Y, Yao Z, Dong Z, et al. ZeroQ: A novel zero shot quantization framework [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 13169-13178.) and DSG (Zhang X, Qin H, Ding Y, et al. Diversifying sample generation for accurate data-free quantization [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 15658-15667.) instead cast data synthesis as an optimization problem, in which random inputs drawn from a standard Gaussian distribution are iteratively updated to fit the real data distribution. The benefit of this research route is that the synthetic images can be reused to calibrate or fine-tune networks at different bit widths, enabling resource-friendly quantization. However, comparing the feature visualizations of ZeroQ and DSG with real data shows a non-negligible quality gap in the synthetic images, since conventional Gaussian synthesis fits the whole dataset and ignores the finer class decision boundaries; the quantized model therefore usually suffers a large performance drop. To preserve class decision boundaries in the fake images, the inception loss IL (Haroush M, Hubara I, Hoffer E, et al. The knowledge within: Methods for data-free model compression [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 8494-8502.) optimizes each fake image toward a given class label. The fake data then shows a separable distribution, but such fake pictures do not capture intra-class heterogeneity well: images from the same class often contain different content, and features extracted from real pictures of the same class are scattered and heterogeneous. Feature clustering of ZeroQ+IL and DSG+IL shows that synthetic images of the same class are mostly homogeneous, so quantized models fine-tuned with these fake pictures do not generalize well to real, heterogeneous test datasets.
Disclosure of Invention
The invention aims to provide a neural network data-free quantization method based on heterogeneous generated data, addressing the performance degradation caused by current data-free quantization methods. No real data is required, a quantized network can be trained from scratch with higher performance, especially when quantizing small networks, and compression and acceleration of the network can be achieved on general-purpose hardware platforms without specific hardware support.
The invention comprises the following steps (a high-level sketch follows the list):
1) randomly initialize fake pictures from a standard Gaussian distribution;
2) optimize the fake pictures until the iteration limit is reached, updating them with local object reinforcement, the boundary distance limit, the soft perception loss and the BN loss;
3) quantize the neural network, then train the quantized network on the fake pictures optimized in step 2) with a distillation loss and a cross-entropy loss until a preset number of training epochs is reached;
4) after training, keep the weights of the quantization network to obtain the final quantized network.
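The four steps can be sketched end to end as follows. This is a minimal PyTorch sketch; all helper names (generation_loss, step_image_optimizer, quantize_network, fine_tune_epoch) are hypothetical stand-ins for the components detailed below, not functions defined by the patent, and the input shape and label assignment are illustrative assumptions.

```python
import torch

def data_free_quantization(fp_net, num_iters=1000, num_epochs=150):
    """High-level sketch of steps 1)-4); helper names are hypothetical."""
    # 1) fake pictures initialized from a standard Gaussian (shape assumed 3x224x224)
    fake_images = torch.randn(256, 3, 224, 224, requires_grad=True)
    labels = torch.randint(0, 1000, (256,))          # assigned target classes (assumed)
    # 2) optimize the fake pictures with the four generation losses
    for _ in range(num_iters):
        loss = generation_loss(fp_net, fake_images, labels)  # BN + boundary + soft perception
        loss.backward()
        step_image_optimizer(fake_images)
    # 3) quantize the network, then fine-tune it on the optimized fake pictures
    q_net = quantize_network(fp_net)
    for _ in range(num_epochs):
        fine_tune_epoch(q_net, fp_net, fake_images, labels)  # cross-entropy + distillation
    # 4) the trained weights of q_net are kept as the final quantized network
    return q_net
```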
In step 1), randomly initializing fake pictures with a standard Gaussian distribution means sampling, from the standard Gaussian distribution, initial fake pictures of the same size as the real pictures.
In step 2), the specific method of local object reinforcement may be: before the fake pictures are input into the pre-trained network, randomly crop and resize them with probability p = 50%:

$$\tilde{x}_f = \mathrm{resize}\big(\mathrm{crop}_\eta(x_f)\big)$$

where the crop scale of $\mathrm{crop}_\eta$ is sampled from the uniform distribution $U(\eta, 1)$ and $\tilde{x}_f$ denotes the fake picture after local object reinforcement;
the specific method of the boundary distance limit may be: the fake pictures are constrained to keep a certain distribution in the feature space of the pre-trained network, the distance between each feature and its class center being held between a lower bound $\lambda_l$ and an upper bound $\lambda_u$ (the original formula is an image; a hinge constraint of roughly this form):

$$\mathcal{L}_{BD} = \max\big(\lambda_l - d(v_F, \bar{v}_c),\, 0\big) + \max\big(d(v_F, \bar{v}_c) - \lambda_u,\, 0\big)$$

where $v_F$ denotes the feature extracted with the pre-trained network and the class center is

$$\bar{v}_c = \frac{1}{|M_c|} \sum_{v \in M_c} v$$

where $M_c$ is the set of features of all fake pictures in the same category as the $i$-th fake picture;
the soft perceptual loss is to provide a soft target for the false picture:
Figure BDA0003693989130000036
wherein U (e, 1) represents a uniform distribution from e to 1, and mes represents an average square error;
the BN loss:
Figure BDA0003693989130000037
wherein, mu l (x f ),σ l (x f ) Representing a false picture x f At the mean and variance of the l-th layer of the pre-training network,
Figure BDA0003693989130000038
representing BN parameters stored in the first layer of the pre-training network during training;
BN losses combining the above
Figure BDA0003693989130000039
Boundary distance limitation
Figure BDA00036939891300000310
Loss of soft feel
Figure BDA00036939891300000311
The total losses that can be obtained are:
Figure BDA00036939891300000312
in the step 3), the quantization neural network quantizes the pre-trained full-precision network to obtain a quantization network Q; the quantization is as follows:
Figure BDA00036939891300000313
Figure BDA0003693989130000041
wherein clip (F, l, u) ═ min (max (F, l), u), l, u denote the upper and lower clipping boundaries; representing a full precision input, which may be a network weight or an activation value; round denotes rounding its input to the nearest integer;
Figure BDA0003693989130000042
is a scaling factor for interconverting a full precision number and an integer, b denotes the quantization bit width; for the weight, a channel-by-channel quantization mode is used, and for the activation value, a layer-by-layer quantization mode is used; after the quantization value q is obtained, it is dequantized back by the scaling factor
Figure BDA0003693989130000043
The quantization network is trained with a distillation loss and a cross-entropy loss. The cross-entropy loss:

$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}^{\,i,y_i}$$

where $\hat{p}^{\,i,y_i}$ denotes the predicted value of the pre-trained full-precision network for the $i$-th input picture belonging to its class $y_i$, and $N$ denotes the total number of input pictures.

The distillation loss:

$$\mathcal{L}_{KD} = \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \hat{p}_F^{\,i,c} \log \frac{\hat{p}_F^{\,i,c}}{\hat{p}_Q^{\,i,c}}$$

where $\hat{p}_Q^{\,i,c}$ denotes the quantization network's predicted value for the $i$-th input picture belonging to class $c$, $\hat{p}_F^{\,i,c}$ the corresponding prediction of the full-precision network, $C$ the number of dataset categories, and $N$ the total number of input pictures.
Compared with the prior art, the invention has the following outstanding advantages:
1) The heterogeneity of the fake pictures is preserved, which greatly improves their quality.
2) Extensive experiments show that the neural network data-free quantization method based on heterogeneous generated data is simple to implement and outperforms various mainstream data-free quantization methods, especially when all layers are quantized to very low bit widths or when smaller networks are quantized.
3) No real data is required, a quantized network can be trained from scratch with higher performance, especially when quantizing small networks, and compression and acceleration of the network can be achieved on general-purpose hardware platforms without specific hardware support. The method can be applied to convolutional neural networks in the field of image classification.
Drawings
FIG. 1 is a block diagram of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is further described below through embodiments with reference to the accompanying drawings.
A method block diagram of an embodiment of the invention is shown in fig. 1.
1. Description of the symbols
$F(W_1, W_2, \ldots, W_L)$ denotes an L-layer full-precision convolutional neural network (CNN), where $W_i$ denotes the $i$-th convolutional layer. The number of convolution kernels of the $i$-th convolutional layer is $out_i$, and the kernel weights of this layer can be expressed as:

$$W_i = \{W_i^1, W_i^2, \ldots, W_i^{out_i}\}$$

where $W_i^j$ denotes the $j$-th convolution kernel of the $i$-th convolutional layer, and each kernel satisfies

$$W_i^j \in \mathbb{R}^{in_i \times width_i \times height_i}$$

where $in_i$, $width_i$ and $height_i$ are the number of input channels of the $i$-th layer and the width and height of the convolution kernel, respectively. Given the input $A_{i-1}$ of the $i$-th convolutional layer (i.e., the output of the previous layer), the convolution result of the $i$-th convolutional layer can be expressed as:

$$O_i^j = W_i^j \circledast A_{i-1}$$

where $O_i^j$ is the $j$-th channel of the convolution result, all channels together give $O_i$, and $\circledast$ denotes the convolution operation. The convolution result is then passed through an activation function to obtain the final output activation value of the layer:

$$A_i = \sigma(O_i)$$

where $\sigma$ denotes the activation function.
The goal of the quantization algorithm is to obtain a neural network that can operate at low bit width, in which case the convolution is expressed as:

$$O_i^j = \hat{W}_i^j \circledast \hat{A}_{i-1}$$

where $\hat{W}_i^j$ and $\hat{A}_{i-1}$ denote the quantized $j$-th convolution kernel of the $i$-th layer and the quantized input of the $i$-th layer. The quantization algorithm thus yields an L-layer low-precision convolutional neural network

$$Q(\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_L)$$

where $\hat{W}_i$ denotes the $i$-th convolutional layer after quantization.
To obtain the quantized network, the pre-trained full-precision network is quantized as follows:

$$q = \mathrm{round}\!\left(\frac{\mathrm{clip}(m, l, u)}{s}\right), \qquad s = \frac{u - l}{2^b - 1}$$

where $\mathrm{clip}(m, l, u) = \min(\max(m, l), u)$ and $l, u$ denote the lower and upper clipping boundaries; $m$ represents a full-precision input, which may be a network weight $W$ or an activation value $A$; round rounds its input to the nearest integer; $s$ is a scaling factor for interconverting full-precision numbers and integers; and $b$ denotes the quantization bit width. For the weights, channel-by-channel quantization is used, i.e., each output channel has its own clipping bounds and scaling factor. For the activation values, layer-by-layer quantization is used, i.e., each layer shares the same clipping bounds and scaling factor. After the quantized value $q$ is obtained, it can be dequantized back with the scaling factor as $\hat{m} = s \cdot q$ before further operations. The convolution of two quantized values can be computed as:

$$\hat{O} = s_1 s_2 \, (q_1 \circledast q_2)$$

where $s_1, s_2$ can be pre-computed and stored, and $q_1, q_2$ are low-precision values, so the original full-precision operation can be replaced by a purely low-precision convolution.
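Under these definitions, a minimal sketch of the quantize/dequantize pair is given below. The function names and the b = 4 default are illustrative assumptions; in practice the clipping bounds l, u would be calibrated or learned per channel (weights) or per layer (activations), as described above.

```python
import torch

def quantize(x: torch.Tensor, lower: float, upper: float, bits: int = 4):
    """Uniform quantization as described above: clip to [lower, upper],
    divide by the scaling factor, round to the nearest integer."""
    s = (upper - lower) / (2 ** bits - 1)   # scaling factor s = (u - l) / (2^b - 1)
    q = torch.round(torch.clamp(x, lower, upper) / s)
    return q, s

def dequantize(q: torch.Tensor, s: float) -> torch.Tensor:
    """Convert the integer representation back to (approximate) full precision."""
    return s * q

# For two quantized operands the convolution can run in low precision and be
# rescaled once at the end:  O_hat = s1 * s2 * conv(q1, q2).
```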
2. Heterogeneous data analysis
Existing neural network data-free quantization methods are limited by the poor quality of their fake pictures, and their performance drops markedly when all network layers are quantized to low bit widths. To improve the performance of the quantized network, the invention provides a neural network data-free quantization method based on heterogeneous generated data. Based on the observation that real data is heterogeneous, the fake pictures are updated with local object reinforcement, the boundary distance limit, the soft perception loss and the BN loss, so that heterogeneous pictures are generated.
3. Description of the training
The embodiment of the invention comprises the following steps:
1) randomly initialize fake pictures from a standard Gaussian distribution;
2) optimize the fake pictures until the iteration limit is reached, updating them with local object reinforcement, the boundary distance limit, the soft perception loss and the BN loss;
3) quantize the neural network, then train the quantized network on the optimized fake pictures with a distillation loss and a cross-entropy loss until a preset number of training epochs is reached;
4) after training, keep the weights of the quantization network to obtain the final quantized network.
In step 1), initial fake pictures with the same size as the real pictures are sampled from the standard Gaussian distribution (a minimal sketch follows).
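In the sketch below, the batch size, the 3x224x224 ImageNet-style shape and the class-label assignment are assumptions made for illustration, not values fixed by this step.

```python
import torch

batch_size, num_classes = 256, 1000
# Fake pictures drawn from a standard Gaussian, same size as the real inputs
fake_images = torch.randn(batch_size, 3, 224, 224, requires_grad=True)
# One target class per fake picture, used by the class-dependent losses below
# (the label-assignment scheme is an assumption, not stated in this step)
fake_labels = torch.randint(0, num_classes, (batch_size,))
```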
In step 2), the initialized fake pictures are optimized until the iteration limit is reached, updated with local object reinforcement, the boundary distance limit, the soft perception loss and the BN loss.

Local object reinforcement: before a fake picture is input into the pre-trained network, it is randomly cropped and resized with probability p = 50%:

$$\tilde{x}_f = \mathrm{resize}\big(\mathrm{crop}_\eta(x_f)\big)$$

where the crop scale of $\mathrm{crop}_\eta$ is sampled from the uniform distribution $U(\eta, 1)$ and $\tilde{x}_f$ denotes the fake picture after local object reinforcement.
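One plausible implementation of this augmentation is sketched below; using torchvision's RandomResizedCrop with the crop-area fraction drawn from scale=(η, 1) and a fixed aspect ratio is an assumption that matches the description, not a construction mandated by the patent.

```python
import random
import torch
import torchvision.transforms as T

def local_object_reinforcement(x_f: torch.Tensor, eta: float = 0.5,
                               p: float = 0.5) -> torch.Tensor:
    """With probability p, crop a region whose scale is drawn from U(eta, 1)
    and resize it back to the original resolution (sketch)."""
    if random.random() >= p:
        return x_f                       # unchanged with probability 1 - p
    h, w = x_f.shape[-2:]
    crop = T.RandomResizedCrop((h, w), scale=(eta, 1.0), ratio=(1.0, 1.0))
    return crop(x_f)
```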
Boundary distance limit: the fake pictures are constrained to keep a certain distribution in the feature space of the pre-trained network, the distance between each feature and its class center being held between the lower bound $\lambda_l$ and the upper bound $\lambda_u$ (the original formula is an image; a hinge constraint of roughly this form):

$$\mathcal{L}_{BD} = \max\big(\lambda_l - d(v_F, \bar{v}_c),\, 0\big) + \max\big(d(v_F, \bar{v}_c) - \lambda_u,\, 0\big)$$

where $v_F$ denotes the feature extracted with the pre-trained network and the class center is

$$\bar{v}_c = \frac{1}{|M_c|} \sum_{v \in M_c} v$$

where $M_c$ is the set of features of all fake pictures in the same category as the $i$-th fake picture.
Soft perception loss: a soft target is provided for the fake picture, whose target-class probability is sampled from the uniform distribution $U(\epsilon, 1)$; the loss is the mean squared error (MSE) between the network's prediction and this soft target:

$$\mathcal{L}_{SP} = \mathrm{MSE}\big(p(\tilde{x}_f),\, \tilde{y}\big), \qquad \tilde{y}_{y_i} \sim U(\epsilon, 1)$$
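A sketch of one way to realize this loss follows; spreading the remaining probability mass uniformly over the other classes is an assumption made for illustration, since only the U(ε, 1) sampling and the MSE are stated.

```python
import torch
import torch.nn.functional as F

def soft_perception_loss(logits: torch.Tensor, labels: torch.Tensor,
                         eps: float = 0.9) -> torch.Tensor:
    """MSE between predicted class probabilities and a soft target whose
    true-class probability is sampled from U(eps, 1) (assumed construction)."""
    n, c = logits.shape
    target_prob = torch.empty(n, device=logits.device).uniform_(eps, 1.0)
    soft_target = ((1.0 - target_prob) / (c - 1)).unsqueeze(1).expand(n, c).clone()
    soft_target[torch.arange(n), labels] = target_prob
    return F.mse_loss(logits.softmax(dim=1), soft_target)
```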
BN loss:

$$\mathcal{L}_{BN} = \sum_{l=1}^{L} \Big( \big\|\mu_l(x_f) - \hat{\mu}_l\big\|_2^2 + \big\|\sigma_l(x_f) - \hat{\sigma}_l\big\|_2^2 \Big)$$

where $\mu_l(x_f)$ and $\sigma_l(x_f)$ denote the mean and variance of the fake picture $x_f$ at the $l$-th layer of the pre-trained network, and $\hat{\mu}_l$ and $\hat{\sigma}_l$ denote the BN parameters of the $l$-th layer stored during training.

Combining the above losses, the total loss is:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{BN} + \mathcal{L}_{BD} + \mathcal{L}_{SP}$$
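The BN term can be sketched as below. Collecting per-layer batch statistics with forward hooks is an implementation choice rather than something the patent prescribes, and matching the standard deviation against the square root of the stored running variance is likewise an assumption.

```python
import torch
import torch.nn as nn

def bn_statistics_loss(model: nn.Module, fake_images: torch.Tensor) -> torch.Tensor:
    """Match mean/std of fake-image activations against the running statistics
    stored in every BatchNorm2d layer of the pre-trained network (sketch)."""
    model.eval()                                   # keep the stored BN statistics fixed
    losses, hooks = [], []

    def make_hook(bn: nn.BatchNorm2d):
        def hook(module, inputs, output):
            x = inputs[0]                          # activations entering the BN layer
            mu = x.mean(dim=(0, 2, 3))
            sigma = x.std(dim=(0, 2, 3))
            losses.append(((mu - bn.running_mean) ** 2).sum()
                          + ((sigma - (bn.running_var + bn.eps).sqrt()) ** 2).sum())
        return hook

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))
    model(fake_images)                             # one forward pass fills `losses`
    for h in hooks:
        h.remove()
    return torch.stack(losses).sum()
```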
and 3), quantizing the neural network, and training the quantized network by using distillation loss and cross entropy loss by using the optimized false picture until a preset training round number is reached.
And in the quantization neural network, quantizing the pre-trained full-precision network to obtain a quantization network Q. The quantization is as follows:
Figure BDA0003693989130000076
Figure BDA0003693989130000077
where clip (F, l, u) ═ min (max (F, l), u), and l, u denote the upper and lower clipping boundaries. F represents a full precision input, which may be a network weight or an activation value. round means rounding its input to the nearest integer.
Figure BDA0003693989130000078
Is a scaling factor for interconverting a full precision number and an integer, b denotes the quantization bit width. For the weights, a channel-by-channel quantization approach is used, and for the activation values, a layer-by-layer quantization approach is used. After the quantized value q is obtained, it can be dequantized back with a scaling factor
Figure BDA0003693989130000079
The quantization network is trained with the distillation loss and the cross-entropy loss, where the cross-entropy loss is:

$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}^{\,i,y_i}$$

where $\hat{p}^{\,i,y_i}$ denotes the predicted value of the pre-trained full-precision network for the $i$-th input picture belonging to its class $y_i$, and $N$ denotes the total number of input pictures.

The distillation loss:

$$\mathcal{L}_{KD} = \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \hat{p}_F^{\,i,c} \log \frac{\hat{p}_F^{\,i,c}}{\hat{p}_Q^{\,i,c}}$$

where $\hat{p}_Q^{\,i,c}$ denotes the quantization network's predicted value for the $i$-th input picture belonging to class $c$, $\hat{p}_F^{\,i,c}$ the corresponding prediction of the full-precision network, $C$ the number of dataset categories, and $N$ the total number of input pictures.
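A hedged sketch of the two fine-tuning losses follows. The KL form of the distillation term matches the symbols above; taking the cross-entropy on the quantized network's outputs is an assumption (the text attributes the predicted value to the full-precision network, but the quantized network is the one being trained), and the unweighted sum of the two terms is likewise assumed.

```python
import torch
import torch.nn.functional as F

def fine_tune_loss(fp_logits: torch.Tensor, q_logits: torch.Tensor,
                   labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on the assigned labels plus KL distillation from the
    full-precision teacher to the quantized student (assumed combination)."""
    ce = F.cross_entropy(q_logits, labels)
    kd = F.kl_div(F.log_softmax(q_logits, dim=1),
                  F.softmax(fp_logits, dim=1),
                  reduction="batchmean")
    return ce + kd
```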
4. Implementation details
The neural network data-free quantization method based on heterogeneous generated data is evaluated on the ImageNet dataset and implemented with the PyTorch deep learning framework on an NVIDIA RTX 3090 GPU. For the generation of fake data, the optimizer is Adam with momentum 0.9; whenever the loss does not decrease within 50 iterations, the learning rate is multiplied by 0.1; the total number of iterations is 1000; the batch size is set to 256; and η, λ_l, λ_u and ε are set to 0.5, 0.3, 0.8 and 0.9, respectively. A total of 5120 pictures are generated. For training the quantization network, the optimizer is stochastic gradient descent (SGD) with momentum 0.9 and weight decay 10^-4. The batch size is set to 16, the initial learning rate is 10^-6 and is multiplied by 0.1 every 100 epochs, and the total number of training epochs is set to 150. A sketch of this setup follows.
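These hyper-parameters translate roughly into the setup below; the scheduler classes are assumptions chosen to match the described behavior, the Adam learning rate is a placeholder (its initial value is not stated), and fake_images / q_net are assumed to exist from the earlier steps.

```python
import torch

# Fake-data generation: Adam with momentum 0.9; lr x0.1 whenever the loss
# has not decreased for 50 iterations (initial lr not given; placeholder value)
image_opt = torch.optim.Adam([fake_images], lr=0.5, betas=(0.9, 0.999))
image_sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    image_opt, mode="min", factor=0.1, patience=50)

# Quantized-network fine-tuning: SGD, momentum 0.9, weight decay 1e-4,
# initial lr 1e-6, lr x0.1 every 100 epochs, 150 epochs in total
net_opt = torch.optim.SGD(q_net.parameters(), lr=1e-6,
                          momentum=0.9, weight_decay=1e-4)
net_sched = torch.optim.lr_scheduler.StepLR(net_opt, step_size=100, gamma=0.1)
```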
5. Field of application
The method can be applied to deep convolutional neural networks (CNNs) to achieve their compression and acceleration. Table 1 compares the results of the method with other neural network post-training quantization methods for ResNet-18 (He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.) on the ImageNet dataset, where "G" marks methods that generate fake pictures with a generator, and WbAb denotes quantizing the model weights and activation values to b bits.
TABLE 1
[Table 1 is rendered as an image in the original and is not reproduced here.]
As can be seen from Table 1, when all layers of the model are quantized to 5 bits, the method of the present invention (IntraQ) and the latest neural network data-free quantization methods all maintain high performance. In addition, compared with BRECQ, when ResNet-18 is quantized to 4 bits the method of the present invention achieves a large improvement of 1.97%.
Table 2 compares the results of the method with other neural network post-training quantization methods for MobileNetV1 (Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications [J]. arXiv preprint arXiv:1704.04861, 2017.) on the ImageNet dataset.
TABLE 2
[Table 2 is rendered as an image in the original and is not reproduced here.]
From Table 2, it can be seen that the present invention improves the previous best performance by 9.17% when quantizing MobileNetV1 to 4 bits.
Table 3 compares the results of the method with other neural network post-training quantization methods for MobileNetV2 (Sandler M, Howard A, Zhu M, et al. MobileNetV2: Inverted residuals and linear bottlenecks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510-4520.) on the ImageNet dataset.
TABLE 3
[Table 3 is rendered as an image in the original and is not reproduced here.]
From Table 3, it can be seen that the present invention improves the previous best performance by 4.65% when quantizing MobileNetV2 to 4 bits.
Thus, the performance advantage of the present invention is more pronounced when quantizing lightweight models such as MobileNet, especially at lower precision (e.g., 4 bits). Moreover, the invention achieves the best results on all networks and bit widths, which demonstrates its effectiveness.
The above-described embodiments are merely preferred embodiments of the present invention and should not be construed as limiting its scope. All equivalent changes and modifications made within the scope of the present invention shall fall within its protection scope.

Claims (8)

1. A neural network data-free quantization method based on heterogeneous generated data, characterized by comprising the following steps:
1) randomly initializing fake pictures from a standard Gaussian distribution;
2) optimizing the fake pictures until the iteration limit is reached, updating them with local object reinforcement, the boundary distance limit, the soft perception loss and the BN loss;
3) first quantizing the neural network, and then training the quantized network on the optimized fake pictures with a distillation loss and a cross-entropy loss until a preset number of training epochs is reached;
4) after training, keeping the weights of the quantization network to obtain the final quantized network.
2. The method according to claim 1, wherein in step 1), randomly initializing fake pictures with the standard Gaussian distribution means sampling, from the standard Gaussian distribution, initial fake pictures of the same size as the real pictures.
3. The neural network data-free quantization method based on heterogeneous generated data according to claim 1, wherein in step 2), the specific method of local object reinforcement is: before the fake pictures are input into the pre-trained network, randomly crop and resize them with probability p = 50%:

$$\tilde{x}_f = \mathrm{resize}\big(\mathrm{crop}_\eta(x_f)\big)$$

where the crop scale of $\mathrm{crop}_\eta$ is sampled from the uniform distribution $U(\eta, 1)$ and $\tilde{x}_f$ denotes the fake picture after local object reinforcement.
4. The neural network data-free quantization method based on heterogeneous generated data according to claim 1, wherein in step 2), the specific method of the boundary distance limit is: the fake pictures are constrained to keep a certain distribution in the feature space of the pre-trained network, the distance between each feature and its class center being held between the lower bound $\lambda_l$ and the upper bound $\lambda_u$ (the original formula is an image; a hinge constraint of roughly this form):

$$\mathcal{L}_{BD} = \max\big(\lambda_l - d(v_F, \bar{v}_c),\, 0\big) + \max\big(d(v_F, \bar{v}_c) - \lambda_u,\, 0\big)$$

where $v_F$ denotes the feature extracted with the pre-trained network and the class center is

$$\bar{v}_c = \frac{1}{|M_c|} \sum_{v \in M_c} v$$

where $M_c$ is the set of features of all fake pictures in the same category as the $i$-th fake picture.
5. The neural network data-free quantization method based on heterogeneous generated data according to claim 1, wherein in step 2), the soft perception loss provides a soft target for the fake picture, whose target-class probability is sampled from the uniform distribution $U(\epsilon, 1)$; the loss is the mean squared error (MSE) between the network's prediction and this soft target:

$$\mathcal{L}_{SP} = \mathrm{MSE}\big(p(\tilde{x}_f),\, \tilde{y}\big), \qquad \tilde{y}_{y_i} \sim U(\epsilon, 1)$$
6. The neural network data-free quantization method based on heterogeneous generated data according to claim 1, wherein in step 2), the BN loss is:

$$\mathcal{L}_{BN} = \sum_{l=1}^{L} \Big( \big\|\mu_l(x_f) - \hat{\mu}_l\big\|_2^2 + \big\|\sigma_l(x_f) - \hat{\sigma}_l\big\|_2^2 \Big)$$

where $\mu_l(x_f)$ and $\sigma_l(x_f)$ denote the mean and variance of the fake picture $x_f$ at the $l$-th layer of the pre-trained network, and $\hat{\mu}_l$ and $\hat{\sigma}_l$ denote the BN parameters of the $l$-th layer stored during training;

combining the BN loss $\mathcal{L}_{BN}$, the boundary distance limit $\mathcal{L}_{BD}$ and the soft perception loss $\mathcal{L}_{SP}$ above, the total loss is:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{BN} + \mathcal{L}_{BD} + \mathcal{L}_{SP}$$
7. The neural network data-free quantization method based on heterogeneous generated data according to claim 1, wherein in step 3), quantizing the neural network means quantizing the pre-trained full-precision network to obtain the quantization network Q; the quantization is as follows:

$$q = \mathrm{round}\!\left(\frac{\mathrm{clip}(F, l, u)}{s}\right), \qquad s = \frac{u - l}{2^b - 1}$$

where $\mathrm{clip}(F, l, u) = \min(\max(F, l), u)$ and $l, u$ denote the lower and upper clipping boundaries; $F$ represents a full-precision input, which may be a network weight or an activation value; round rounds its input to the nearest integer; $s$ is a scaling factor for interconverting full-precision numbers and integers; and $b$ denotes the quantization bit width; channel-by-channel quantization is used for the weights and layer-by-layer quantization for the activation values; after the quantized value $q$ is obtained, it is dequantized back with the scaling factor as $\hat{F} = s \cdot q$.
8. The neural network data-free quantization method based on heterogeneous generated data according to claim 1, wherein in step 3), the quantization network is trained with the distillation loss and the cross-entropy loss, the cross-entropy loss being:

$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}^{\,i,y_i}$$

where $\hat{p}^{\,i,y_i}$ denotes the predicted value of the pre-trained full-precision network for the $i$-th input picture belonging to its class $y_i$, and $N$ denotes the total number of input pictures;

the distillation loss being:

$$\mathcal{L}_{KD} = \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \hat{p}_F^{\,i,c} \log \frac{\hat{p}_F^{\,i,c}}{\hat{p}_Q^{\,i,c}}$$

where $\hat{p}_Q^{\,i,c}$ denotes the quantization network's predicted value for the $i$-th input picture belonging to class $c$, $\hat{p}_F^{\,i,c}$ the corresponding prediction of the full-precision network, $C$ the number of dataset categories, and $N$ the total number of input pictures.
CN202210673423.4A 2022-06-14 Neural network data-free quantization method based on heterogeneous generated data Active CN114937186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210673423.4A CN114937186B (en) 2022-06-14 Neural network data-free quantization method based on heterogeneous generated data


Publications (2)

Publication Number Publication Date
CN114937186A 2022-08-23
CN114937186B (granted) 2024-06-07




Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200104721A1 (en) * 2018-09-27 2020-04-02 Scopemedia Inc. Neural network image search
US20210192352A1 (en) * 2019-12-19 2021-06-24 Northeastern University Computer-implemented methods and systems for compressing deep neural network models using alternating direction method of multipliers (admm)
CN112686367A (en) * 2020-12-01 2021-04-20 广东石油化工学院 Novel normalization mechanism
CN112861602A (en) * 2020-12-10 2021-05-28 华南理工大学 Face living body recognition model compression and transplantation method based on depth separable convolution
CN113850385A (en) * 2021-10-12 2021-12-28 北京航空航天大学 Coarse and fine granularity combined neural network pruning method
CN114037714A (en) * 2021-11-02 2022-02-11 大连理工大学人工智能大连研究院 3D MR and TRUS image segmentation method for prostate system puncture
CN114429209A (en) * 2022-01-27 2022-05-03 厦门大学 Neural network post-training quantification method based on fine-grained data distribution alignment
CN114581552A (en) * 2022-03-15 2022-06-03 南京邮电大学 Gray level image colorizing method based on generation countermeasure network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUNSHAN ZHONG et al.: "IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 27 September 2022 (2022-09-27) *
尹文枫; 梁玲燕; 彭慧民; 曹其春; 赵健; 董刚; 赵雅倩; 赵坤: "Research progress on convolutional neural network compression and acceleration techniques" (卷积神经网络压缩与加速技术研究进展), 计算机系统应用 (Computer Systems & Applications), no. 09, 15 September 2020 (2020-09-15) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689044A (en) * 2024-02-01 2024-03-12 厦门大学 Quantification method suitable for vision self-attention model

Similar Documents

Publication Publication Date Title
US20230342616A1 (en) Systems and Methods for Contrastive Learning of Visual Representations
Xu et al. Accelerating federated learning for iot in big data analytics with pruning, quantization and selective updating
Barnes et al. rTop-k: A statistical estimation approach to distributed SGD
CN108614992B (en) Hyperspectral remote sensing image classification method and device and storage device
CN108717512B (en) Malicious code classification method based on convolutional neural network
Dodge et al. Quality robust mixtures of deep neural networks
Luo et al. Anti-forensics of JPEG compression using generative adversarial networks
JPH1055444A (en) Recognition of face using feature vector with dct as base
CN113674334B (en) Texture recognition method based on depth self-attention network and local feature coding
CN109949200B (en) Filter subset selection and CNN-based steganalysis framework construction method
Gao et al. Hyperspectral image classification using joint sparse model and discontinuity preserving relaxation
CN111539444A (en) Gaussian mixture model method for modified mode recognition and statistical modeling
CN111935487B (en) Image compression method and system based on video stream detection
CN116258874A (en) SAR recognition database sample gesture expansion method based on depth condition diffusion network
Huang et al. Compressing multidimensional weather and climate data into neural networks
Li et al. Incoherent dictionary learning with log-regularizer based on proximal operators
CN114429209A (en) Neural network post-training quantification method based on fine-grained data distribution alignment
Chang et al. Randnet: Deep learning with compressed measurements of images
An et al. RBDN: Residual bottleneck dense network for image super-resolution
CN104463922A (en) Image feature coding and recognizing method based on integrated learning
US20080252499A1 (en) Method and system for the compression of probability tables
CN114937186A (en) Neural network data-free quantization method based on heterogeneous generated data
CN117172301A (en) Distribution flexible subset quantization method suitable for super-division network
CN113554047A (en) Training method of image processing model, image processing method and corresponding device
CN109558819B (en) Depth network lightweight method for remote sensing image target detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant