CN112183742B - Neural network hybrid quantization method based on progressive quantization and Hessian information - Google Patents

Neural network hybrid quantization method based on progressive quantization and Hessian information

Info

Publication number
CN112183742B
CN112183742B
Authority
CN
China
Prior art keywords
quantization
neural network
precision
model
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010915248.6A
Other languages
Chinese (zh)
Other versions
CN112183742A (en)
Inventor
王振宁
许金泉
王溢
蔡碧颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanqiang Zhishi Xiamen Technology Co ltd
Original Assignee
Nanqiang Zhishi Xiamen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanqiang Zhishi Xiamen Technology Co ltd filed Critical Nanqiang Zhishi Xiamen Technology Co ltd
Priority to CN202010915248.6A
Publication of CN112183742A
Application granted
Publication of CN112183742B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a neural network hybrid quantization method based on progressive quantization and Hessian information, comprising the following steps: dividing a given set of image-label pairs into a sample set and a calibration set; defining the quantization precision range selectable by each layer of the neural network; randomly selecting quantization layers for bit-down quantization and repeating the sampling n times to obtain n basic mixed-precision models; taking the image-label pairs in the calibration set as input and running a forward pass over all candidate mixed-precision neural network models; computing a Hessian approximation from Adam's second-order momentum information; computing the performance evaluation index $\Omega$ for each candidate; sorting the n $\Omega$ values and selecting the mixed-precision strategy corresponding to the minimum as the best-performing mixed-precision hyper-parameter combination at the current step; calculating the computational cost of the model and then training the current mixed-precision network model; and iterating until the termination condition is met. The method greatly reduces the mixed-precision search space optimized at each step and improves the performance evaluation efficiency of the quantized model.

Description

Neural network hybrid quantization method based on progressive quantization and Hessian information
Technical Field
The invention belongs to the technical field of neural network compression and acceleration, and in particular relates to a method that uses quantization to increase the computation speed of a neural network model and reduce its memory footprint; it can be used in application fields such as image classification, object detection, and super-resolution reconstruction.
Background
In recent years, artificial intelligence products have gradually entered thousands of households; intelligent robots, VR devices, autonomous vehicles, and the like are slowly coming into public view. However, these devices all face the problems of limited computing performance and limited storage capacity, which prevents many effective algorithms from being deployed on them.
With the development of neural network algorithms, network performance has become stronger and stronger, but this comes at the cost of computing resources and memory. To obtain more efficient network structures that can be deployed on mobile terminals, neural network compression has become a research hotspot in recent years. There are five main network compression approaches: quantization, pruning, low-rank decomposition, teacher-student networks, and lightweight network design. Quantization converts floating-point storage and arithmetic into fixed-point storage and arithmetic, so that values previously represented with 32 or 64 bits are stored and operated on in forms occupying far less memory, such as 1 or 2 bits; it is therefore an effective technique with very good application prospects.
As described in the reference (Jacob B, Kligys S, Chen B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018), a neural network quantization algorithm requires the determination of three main variables: the quantization bit number, the scale factor, and, in non-uniform quantization algorithms, the zero offset.
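By way of illustration only (this sketch is not part of the patent text), the interplay of these three variables can be seen in a minimal uniform affine quantizer in Python; the function names and the min/max range calibration are our own assumptions:

    import numpy as np

    def quantize(w, num_bits=8):
        """Uniform affine quantization: maps floats to {0, ..., 2^b - 1}."""
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (w.max() - w.min()) / (qmax - qmin)   # scale factor
        zero_point = round(qmin - w.min() / scale)    # zero offset
        q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
        return q, scale, zero_point

    def dequantize(q, scale, zero_point):
        return scale * (q - zero_point)

    w = np.random.randn(64).astype(np.float32)
    q, s, z = quantize(w, num_bits=4)
    w_hat = dequantize(q, s, z)   # recovered weights; error shrinks as num_bits grows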
Traditional quantization methods compress and accelerate by setting the same quantization bit number for the entire network structure; despite significant advances, they still suffer from reduced network performance, especially at lower bit widths. For example, binary networks (Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks. In Advances in Neural Information Processing Systems, 2016), XNOR-style quantized networks (Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision. Springer, Cham, 2016), and low-bit networks (Zhou S, Wu Y, Ni Z, et al. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160, 2016) are limited by their smaller expression spaces, and neural network performance drops dramatically as the number of quantization bits decreases. Recently, neural network quantization algorithms based on reinforcement learning, evolutionary learning, and differentiable search have been proposed in succession, improving quantized network performance through mixed bit-precision representation. However, the performance estimation used to distinguish mixed-precision strategies is still training-based, and the search space grows exponentially with the size of the candidate bit-number set: with 8 candidate bit widths and L quantizable layers, for instance, there are 8^L possible bit assignments.
Disclosure of Invention
The invention aims to solve two problems that arise when searching for optimal weight and activation quantization bit widths during the actual deployment of low-bit quantized neural network models: an oversized search space and unreasonable performance evaluation indexes. Based on the idea of progressive optimization and on the sensitivity of the Hessian to noise perturbations of the neural network, it provides a hybrid-bit quantization method based on progressive quantization and Hessian information, which greatly reduces the mixed-precision search space optimized at each step while improving both the Hessian-based performance evaluation index of the mixed-bit neural network and the performance evaluation efficiency of the quantized model. Compared with traditional fixed-precision quantization, the method achieves good results in both model accuracy and compression rate.
In order to achieve the above object, the solution of the present invention is:
a neural network hybrid quantization method based on progressive quantization and Hessian information comprises the following steps:
step 1, a labeled set of image-label pairs is given and divided into a training sample set, a test sample set, and a verification sample set, and a part is randomly sampled from the training sample set as a calibration set; the selectable quantization precision range of each layer of the neural network is defined, and a trained-to-convergence floating-point network and the required computational cost constraint are given;
step 2, randomly selecting quantization layers for sampling, performing bit-down quantization on all selected quantization layers, and repeating the sampling n times, thereby obtaining n basic mixed-precision models;
step 3, taking the image-label pairs in the calibration set as input, and running a forward pass over all candidate mixed-precision neural network models obtained by sampling in step 2; computing the Hessian approximation $\bar{H}$ from Adam's second-order momentum information, and based on this approximation computing the performance index

$$\Omega(\mathcal{D}, \widetilde{W}) = \sum_{i} \bar{H}_{i} \left\| \widetilde{w}_{i} - w_{i} \right\|_{2}^{2}$$

wherein $\mathcal{D}$ represents the calibration set data, $W$ represents the parameters of the whole neural network model, $\bar{H}$ represents the Hessian information, $w_{i}$ represents the trainable weights of the i-th layer in the originally given trained floating-point model, and $\widetilde{w}_{i}$ represents the trainable parameters of the corresponding network layer of the mixed-bit model at the current step;
step 4, the performance indexes $\Omega$ corresponding to all sampled models are sorted, and the mixed-precision strategy corresponding to the smallest $\Omega$ value is taken as the best-performing mixed-precision hyper-parameter combination at the current step;
step 5, calculating the computational cost of the model based on the best-performing mixed-precision hyper-parameter combination at the current step, and training the current mixed-precision network model with the training sample set from step 1;
step 6, judging whether the termination condition is met; if so, taking the current mixed-precision network model as the final neural network model; otherwise, using the model trained in step 5 as initialization and repeating steps 2 to 5 until the termination condition is met.
In the step 2, performing bit-down quantization on a selected quantization layer means setting its quantization bit number to one less than the quantization bit number of the corresponding quantization layer in the previous step.
In the step 2, the initial quantization bit number of each layer of the neural network is set to be the maximum value of the selectable quantization range of the layer.
In the above step 5, the computational cost of the model is calculated as:

$$\mathrm{cost} = \frac{\sum_{i} P^{W}_{i} \, b_{i}}{P^{W}}$$

wherein $P^{W}_{i}$ and $b_{i}$ respectively represent the quantized parameter count and the corresponding quantization bit number of the i-th layer in the network, and $P^{W}$ represents the total parameter count of the whole network.
In the above step 6, the termination condition means that the quantization bit number of all quantized layers in the network has reached 2, or the computational cost of the model has reached the given computational cost constraint.
With the above scheme, compared with the prior art, the invention has the following outstanding advantages:
firstly, the invention provides a progressive quantization scheme based on discrete optimization, which greatly reduces the mixed-bit quantization space in each iterative optimization step and improves optimization efficiency;
secondly, based on Hessian information, the invention derives an index of the influence of the quantization bits on the performance of the whole neural network model; by approximating this computation with the Adam algorithm, the traditional training-based performance verification mode is abandoned, greatly improving model performance verification efficiency and, to a certain extent, solving the problem that the performance of mixed-bit strategy models is difficult to measure;
thirdly, the invention provides, for the first time, a method that simultaneously addresses the search space and performance verification in the mixed-bit quantization problem. Verification experiments on different network structures and data sets show that even a simple random search strategy obtains optimal results.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an algorithm flow pseudocode of the present invention;
FIG. 3 is a comparison of the present invention with a prior quantization algorithm;
FIG. 4 is a graph showing the results of the mixed-bit strategy of the present invention on ResNet-50 with CIFAR-10.
Detailed Description
The technical scheme and beneficial effects of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a neural network hybrid quantization method based on progressive quantization and Hessian information, comprising the steps of:
Step 1, a labeled set of image-label pairs is given and divided into a training sample set, a test sample set, and a verification sample set, and a part is randomly sampled from the training sample set as a calibration set; the possible search space of the neural network is defined, namely the quantization precision range selectable by each layer (for example, 1 bit to 8 bits); at the same time, a trained-to-convergence floating-point network and the required computational cost constraint (described in detail below) are given.
Step 2, quantization layers are randomly selected for sampling, and bit-down quantization is performed on all selected layers; specifically, the quantization bit number of a selected layer is set to the quantization bit number of the corresponding layer in the previous step minus one. The sampling is repeated 500 times, thereby obtaining 500 basic mixed-precision models. In particular, the initial quantization bit number of each layer of the neural network is set to the maximum value of that layer's selectable quantization range; for example, if the quantization range in step 1 is 1 to 8 bits, each layer initially has its quantization bits set to 8.
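As an illustrative sketch of this sampling step (not the patent's reference implementation), the random bit-down sampling could be written as follows in Python; the helper name sample_candidates and the list representation of per-layer bit numbers are our assumptions:

    import random

    def sample_candidates(bits, n=500, min_bit=1):
        """Randomly lower the bit width of a subset of layers by one.

        bits -- current quantization bit number of each layer, e.g. [8] * 18
        n    -- number of candidate mixed-precision models to sample
        """
        candidates = []
        for _ in range(n):
            cand = list(bits)
            # layers that can still be lowered within the selectable range
            lowerable = [i for i, b in enumerate(cand) if b > min_bit]
            if not lowerable:
                break
            for i in random.sample(lowerable, random.randint(1, len(lowerable))):
                cand[i] -= 1   # bit-down: previous step's bit number minus one
            candidates.append(cand)
        return candidates

    init_bits = [8] * 18                    # e.g. 18 quantizable layers at 8 bits
    models = sample_candidates(init_bits)   # 500 basic mixed-precision configs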
Step 3, taking the image-label pairs in the calibration set as input, a forward pass is run over all candidate mixed-precision neural network models obtained by sampling in step 2. The Hessian approximation $\bar{H}$ is computed from Adam's second-order momentum information, and based on this approximation the performance index $\Omega$ is computed:

$$\Omega(\mathcal{D}, \widetilde{W}) = \sum_{i} \bar{H}_{i} \left\| \widetilde{w}_{i} - w_{i} \right\|_{2}^{2}$$

wherein $\mathcal{D}$ represents the calibration set data and $W$ represents the parameters of the whole network model. $\bar{H}$ is the Hessian information, i.e. second-derivative information; Adam is an algorithm that computes second-moment information over batches of data and derives a suitable learning rate for network optimization, and the second-moment information computed inside Adam is used here directly as an estimate of the Hessian, the two being approximately equal. In addition, $w_{i}$ represents the trainable weights of a given layer in the originally given trained floating-point model, and $\widetilde{w}_{i}$ represents the trainable parameters of the corresponding network layer of the mixed-bit model at the current step.
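A minimal PyTorch sketch of this scoring step follows (PyTorch is the framework named in the simulation section). The function name omega, the per-layer reduction of exp_avg_sq by .mean(), and the assumption that the floating-point and quantized models expose their parameters in the same order are ours, not the patent's; Adam is run with lr=0 so that only its second-moment statistics accumulate while the weights stay fixed:

    import torch

    def omega(float_model, quant_model, calib_loader, loss_fn):
        """Hessian-guided performance index for one candidate mixed-precision model."""
        opt = torch.optim.Adam(quant_model.parameters(), lr=0.0)
        for x, y in calib_loader:                 # forward pass over the calibration set
            opt.zero_grad()
            loss_fn(quant_model(x), y).backward()
            opt.step()                            # lr=0: only exp_avg_sq is accumulated
        score = 0.0
        for w, w_tilde in zip(float_model.parameters(), quant_model.parameters()):
            h_bar = opt.state[w_tilde]["exp_avg_sq"].mean()   # Hessian proxy H̄_i
            score += (h_bar * (w_tilde - w).pow(2).sum()).item()
        return score

The smallest score over the 500 candidates then selects the winner in step 4.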
Step 4, the performance indexes $\Omega$ corresponding to all sampled models are sorted; the mixed-precision strategy corresponding to the smallest $\Omega$ value is the best-performing mixed-precision hyper-parameter combination at the current step.
Step 5, calculating the computational cost (cost) of the model based on the current best-performing mixed-precision hyper-parameter combination, with the specific formula:

$$\mathrm{cost} = \frac{\sum_{i} P^{W}_{i} \, b_{i}}{P^{W}}$$

wherein $P^{W}_{i}$ is the quantized parameter count (or floating-point operation count) of the i-th layer in the network, $b_{i}$ is the quantization bit number corresponding to the i-th layer, and $P^{W}$ is the parameter count (or floating-point operation count) of the entire network. To compensate for the performance loss of the low-precision model, the training sample set from step 1 is used to train the model corresponding to the minimum value obtained in step 4 (i.e., the mixed-precision network model).
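A one-function sketch of this cost (not from the patent; the helper name and list inputs are our assumptions) makes clear that it is simply the parameter-weighted average bit width:

    def model_cost(param_counts, bits):
        """Average effective bit width: sum_i P_i^W * b_i / P^W."""
        return sum(p * b for p, b in zip(param_counts, bits)) / sum(param_counts)

    # three layers with 1000/2000/1000 parameters at 8/4/2 bits:
    print(model_cost([1000, 2000, 1000], [8, 4, 2]))   # -> 4.5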
Step 6, if the quantization bit number of all quantized layers in the network is 2, or the computational cost of the model has reached the given computational cost constraint, the algorithm terminates. Otherwise, the model trained in step 5 is used as initialization and steps 2 to 5 are repeated, recording the mixed-precision strategy of each iteration together with the corresponding network model trained in step 5.
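Putting steps 1 to 6 together, the overall progressive loop can be outlined as below. This is an illustrative sketch under our own assumptions, not the patent's reference implementation; build_quant_model and finetune are hypothetical helpers, while the other functions come from the sketches above:

    def progressive_mixed_quantization(float_model, calib_loader, train_loader,
                                       loss_fn, param_counts, target_cost,
                                       num_layers=18, max_bit=8):
        bits = [max_bit] * num_layers                    # step 1: top of the range
        current = float_model
        while True:
            cands = sample_candidates(bits, n=500)       # step 2: bit-down sampling
            scores = [omega(float_model, build_quant_model(current, c),
                            calib_loader, loss_fn) for c in cands]       # step 3
            bits = cands[scores.index(min(scores))]      # step 4: smallest Ω wins
            cost = model_cost(param_counts, bits)        # step 5: average bit width
            current = finetune(build_quant_model(current, bits), train_loader)
            if all(b == 2 for b in bits) or cost <= target_cost:         # step 6
                return current, bits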
The effects of the present invention are further described by the following simulation experiments.
1) Simulation conditions
The invention was developed on the VSCode platform, with PyTorch as the deep learning framework. The main language used is Python.
2) Emulation content
We ran simulations on the MNIST, CIFAR-10, and CIFAR-100 datasets. MNIST is a handwritten digit dataset whose training set contains 60000 samples and whose test set contains 10000 samples; each picture consists of 28 x 28 pixels, each represented by a gray value. The CIFAR-10 dataset consists of 32 x 32 color pictures in 10 classes, 60000 pictures in total with 6000 per class; 50000 pictures are used for training and 10000 for testing. CIFAR-10 is divided into 5 training batches and 1 test batch, each containing 10000 pictures; the test batch consists of 1000 pictures randomly selected from each category, and the training batches contain the remaining 50000 pictures in random order. The CIFAR-100 dataset is much like CIFAR-10, except that it has 100 classes of 600 images each, with 500 training images and 100 test images per class.
In summary, existing mixed-precision quantization algorithms consider only the design of the search algorithm, ignoring the oversized search space and the inaccurate performance evaluation standard for quantization precision. The invention greatly reduces the mixed-precision search space through its progressive quantization design and improves the accuracy of quantized-model performance verification through staged retraining. Experiments show that a good result can be obtained even with a random search algorithm. Specifically, given a neural network structure and hardware requirements, the algorithm provided by the invention produces a mixed-precision neural network model meeting those requirements; after compilation, the model can be applied directly on terminal devices, reducing their storage pressure and relieving their computational pressure while still obtaining good results.
Therefore, the invention provides, from the perspective of discrete optimization, a Hessian-guided mixed-precision progressive quantization method that effectively reduces both the search space and the performance verification time of mixed-bit quantization models. It is applicable to datasets of various scales and to different network structures, and because of its low-bit representation the final network model significantly reduces computation and memory storage while maintaining accuracy, improving neural network performance and facilitating deployment on mobile terminals. In addition, a single run of the algorithm yields mixed-bit models under different hardware-device constraints, effectively improving its applicability and effectiveness. Specifically, with the progressive quantization scheme, our search space on ResNet-18 is reduced 19-fold compared with conventional mixed-precision algorithms. Meanwhile, with weights and activations compressed 16-fold (i.e., from 32-bit floating point down to an average of 2 bits), accuracy above 90% can still be maintained.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (5)

1. A neural network hybrid quantization method based on progressive quantization and Hessian information, characterized by comprising the following steps:
step 1, a labeled set of image-label pairs is given and divided into a training sample set, a test sample set, and a verification sample set, and a part is randomly sampled from the training sample set as a calibration set; the selectable quantization precision range of each layer of the neural network is defined, and a trained-to-convergence floating-point network and the required computational cost constraint are given;
step 2, randomly selecting quantization layers for sampling, performing bit-down quantization on all selected quantization layers, and repeating the sampling n times, thereby obtaining n basic mixed-precision models;
step 3, taking the image-label pairs in the calibration set as input, and running a forward pass over all candidate mixed-precision neural network models obtained by sampling in step 2; computing the Hessian approximation $\bar{H}$ from Adam's second-order momentum information, and based on this approximation computing

$$\Omega(\mathcal{D}, \widetilde{W}) = \sum_{i} \bar{H}_{i} \left\| \widetilde{w}_{i} - w_{i} \right\|_{2}^{2}$$

wherein $\mathcal{D}$ represents the calibration set data, $W$ represents the model parameters of the whole neural network, $w_{i}$ represents the trainable weights of the i-th layer in the originally given trained floating-point model, and $\widetilde{w}_{i}$ represents the trainable parameters of the corresponding network layer of the mixed-bit model at the current step;
step 4, the performance indexes $\Omega$ corresponding to all sampled models are sorted, and the mixed-precision strategy corresponding to the smallest $\Omega$ value is taken as the best-performing mixed-precision hyper-parameter combination at the current step;
step 5, calculating the computational cost of the model based on the best-performing mixed-precision hyper-parameter combination at the current step, and training the current mixed-precision network model with the training sample set from step 1;
step 6, judging whether the termination condition is met; if so, taking the current mixed-precision network model as the final neural network model; otherwise, using the model trained in step 5 as initialization and repeating steps 2 to 5 until the termination condition is met.
2. The neural network hybrid quantization method based on progressive quantization and Hessian information as recited in claim 1, wherein: in the step 2, performing bit-down quantization on a selected quantization layer means setting its quantization bit number to one less than the quantization bit number of the corresponding quantization layer in the previous step.
3. The neural network hybrid quantization method based on progressive quantization and Hessian information as recited in claim 1, wherein: in the step 2, the initial quantization bit number of each layer of the neural network is set as the maximum value of the selectable quantization range of the layer.
4. The neural network hybrid quantization method based on progressive quantization and Hessian information as recited in claim 1, wherein: in the step 5, the computational cost of the model is calculated as:

$$\mathrm{cost} = \frac{\sum_{i} P^{W}_{i} \, b_{i}}{P^{W}}$$

wherein $P^{W}_{i}$ and $b_{i}$ respectively represent the quantized parameter count and the corresponding quantization bit number of the i-th layer in the network, and $P^{W}$ represents the total parameter count of the whole network.
5. The neural network hybrid quantization method based on progressive quantization and Hessian information as recited in claim 1, wherein: in the step 6, the termination condition means that the quantization bit number of all quantized layers in the network is 2, or the computational cost of the model reaches the given computational cost constraint.
CN202010915248.6A 2020-09-03 2020-09-03 Neural network hybrid quantization method based on progressive quantization and Hessian information Active CN112183742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010915248.6A CN112183742B (en) 2020-09-03 2020-09-03 Neural network hybrid quantization method based on progressive quantization and Hessian information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010915248.6A CN112183742B (en) 2020-09-03 2020-09-03 Neural network hybrid quantization method based on progressive quantization and Hessian information

Publications (2)

Publication Number Publication Date
CN112183742A CN112183742A (en) 2021-01-05
CN112183742B true CN112183742B (en) 2023-05-12

Family

ID=73925052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010915248.6A Active CN112183742B (en) 2020-09-03 2020-09-03 Neural network hybrid quantization method based on progressive quantization and Hessian information

Country Status (1)

Country Link
CN (1) CN112183742B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112908446B (en) * 2021-03-20 2022-03-22 张磊 Automatic mixing control method for liquid medicine in endocrinology department
CN113222148B (en) * 2021-05-20 2022-01-11 浙江大学 Neural network reasoning acceleration method for material identification
CN113256657B (en) * 2021-06-03 2022-11-04 上海交通大学烟台信息技术研究院 Efficient medical image segmentation method and system, terminal and medium
CN113762403B (en) * 2021-09-14 2023-09-05 杭州海康威视数字技术股份有限公司 Image processing model quantization method, device, electronic equipment and storage medium
CN116611493B (en) * 2023-05-16 2024-06-07 上海交通大学 Hardware perception hybrid precision quantization method and system based on greedy search

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967517A (en) * 2016-10-19 2018-04-27 三星电子株式会社 Method and apparatus for quantizing a neural network
CN110674326A (en) * 2019-08-06 2020-01-10 厦门大学 Neural network structure retrieval method based on polynomial distribution learning
CN111062474A (en) * 2018-10-16 2020-04-24 北京大学 Neural network optimization method for solving and improving adjacent computer machines
CN111553193A (en) * 2020-04-01 2020-08-18 东南大学 Visual SLAM closed-loop detection method based on lightweight deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11645493B2 (en) * 2018-05-04 2023-05-09 Microsoft Technology Licensing, Llc Flow for quantized neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967517A (en) * 2016-10-19 2018-04-27 三星电子株式会社 Method and apparatus for quantizing a neural network
CN111062474A (en) * 2018-10-16 2020-04-24 北京大学 Neural network optimization method for solving and improving adjacent computer machines
CN110674326A (en) * 2019-08-06 2020-01-10 厦门大学 Neural network structure retrieval method based on polynomial distribution learning
CN111553193A (en) * 2020-04-01 2020-08-18 东南大学 Visual SLAM closed-loop detection method based on lightweight deep neural network

Also Published As

Publication number Publication date
CN112183742A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112183742B (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN109711413B (en) Image semantic segmentation method based on deep learning
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN108900346B (en) Wireless network flow prediction method based on LSTM network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN112116030A (en) Image classification method based on vector standardization and knowledge distillation
CN111310672A (en) Video emotion recognition method, device and medium based on time sequence multi-model fusion modeling
CN111460936A (en) Remote sensing image building extraction method, system and electronic equipment based on U-Net network
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
CN116503676B (en) Picture classification method and system based on knowledge distillation small sample increment learning
CN113591978B (en) Confidence penalty regularization-based self-knowledge distillation image classification method, device and storage medium
CN112686376A (en) Node representation method based on timing diagram neural network and incremental learning method
CN113988357B (en) Advanced learning-based high-rise building wind induced response prediction method and device
CN110647990A (en) Cutting method of deep convolutional neural network model based on grey correlation analysis
CN115358305A (en) Incremental learning robustness improving method based on iterative generation of boundary samples
CN114819143A (en) Model compression method suitable for communication network field maintenance
CN115902806A (en) Multi-mode-based radar echo extrapolation method
CN111325259A (en) Remote sensing image classification method based on deep learning and binary coding
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN114004152A (en) Multi-wind-field wind speed space-time prediction method based on graph convolution and recurrent neural network
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN117636183A (en) Small sample remote sensing image classification method based on self-supervision pre-training
CN113643303A (en) Three-dimensional image segmentation method based on two-way attention coding and decoding network
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant