CN116721305A - Hybrid precision quantization-aware training method based on neural network structure search - Google Patents

Hybrid precision quantization-aware training method based on neural network structure search

Info

Publication number
CN116721305A
Authority
CN
China
Prior art keywords
network
precision
training
super
model
Prior art date
Legal status
Pending
Application number
CN202310377705.4A
Other languages
Chinese (zh)
Inventor
尚凡华
陈飞
刘红英
刘园园
任岩
万亮
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202310377705.4A priority Critical patent/CN116721305A/en
Publication of CN116721305A publication Critical patent/CN116721305A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 - Arrangements using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

A hybrid precision quantization-aware training method based on neural network structure search comprises the following steps: inputting an original image dataset and dividing it into a training dataset and a validation dataset; acquiring the gradients of the super network on the training set; updating the weights of the super network; acquiring the gradients of the super network on the validation set; updating the bit importance parameters of the super network; saving the current mixed precision configuration; repeating until the set number of iterations is completed or the complexity of the current mixed precision configuration falls below the expected complexity; acquiring the set of mixed precision configurations of the target network under different constraints; and, starting from the flatness of the minimum region of the model loss function, performing quantization-aware training on the mixed precision networks under the different constraints. By exploiting parameter sharing and the computational equivalence of the convolution operator, the method searches for the optimal mixed precision configuration of a model under given constraints at low computational cost; by simultaneously minimizing the target loss value and the sharpness of the quantization loss, it further improves the generalization ability of low-bit quantization models and of hybrid precision models containing low-bit quantization layers.

Description

Hybrid precision quantization-aware training method based on neural network structure search
Technical Field
The invention belongs to the technical field of computer vision, relates generally to model quantization of deep neural networks, and relates in particular to a hybrid precision quantization-aware training method based on neural network structure search.
Background
Model quantization is an important research direction for the industrialization of deep learning. Most existing quantization methods employ fixed precision quantization (also known as uniform precision quantization): the weights and activation values of all layers in the network are quantized with the same bit width. Fixed-precision quantized network models are favored because they are well supported on conventional hardware such as CPUs and FPGAs. However, fixed precision quantization ignores differences in the position, structure, parameter count, and computational load of the individual network layers, so under the same parameter count and computational complexity it may yield suboptimal performance. Hybrid precision quantization was therefore developed: it assigns different quantization bit widths to the weights and activation values of different layers, resolving the above limitation to some extent. Compared with fixed precision quantization, hybrid precision quantization is more flexible and can further save memory and computational cost without sacrificing network performance. In addition, hardware supporting hybrid precision inference (e.g., the Apple A12 and NVIDIA Turing GPUs) is accelerating the practical deployment of hybrid precision models.
Existing hybrid precision quantization techniques can be divided into rule-based methods and learning-based methods. Rule-based methods use specific metrics to determine the optimal quantization bit width for each layer. For example, the HAWQ method uses the Hessian matrix as a metric to determine the layer-wise quantization bit widths of the network. However, such rule-based metrics typically rely on heuristics provided by domain experts and therefore have limited scalability in practice. Inspired by neural network structure search (NAS) techniques, researchers have proposed learning-based methods that automatically search for the optimal bit width of each network layer; these algorithms are built on deep reinforcement learning (DRL) or on differentiable NAS methods. Although existing hybrid precision quantization methods have achieved certain results, they still suffer from drawbacks such as low search efficiency and high computational cost.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a hybrid precision quantization-aware training method based on neural network structure search. The method searches for the optimal mixed precision configuration of a model under given constraints: parameter sharing reduces the memory requirement of the search process, and, by exploiting the computational equivalence of the convolution operator, a composite convolution module replaces the expensive parallel convolution module, so that the size of the search space is decoupled from the computational load of the super network while the computational complexity of the super network remains unchanged; the mixed precision configuration of a large network can thus be searched directly, without relying on a proxy task. After the mixed precision configuration of the target model is obtained, the generalization ability of the quantized model is improved by simultaneously minimizing the target loss value and the sharpness of the quantization loss, which alleviates the training difficulty and the significant performance degradation caused by low-bit quantization and further improves the generalization ability of low-bit quantization models and of hybrid precision models containing low-bit quantization layers.
In order to achieve the above purpose, the invention adopts the technical means that:
a hybrid precision quantized perception training method based on neural network structure search comprises the following specific steps:
(1) Processing an original image dataset and dividing it into a training dataset Dtrain and a validation dataset Dval;
(2) Acquiring the gradients of the super network on the training set: sampling a batch of data samples from the training dataset Dtrain of step (1), inputting them into the super network for forward inference, and obtaining the current target loss value; obtaining, by back propagation, the gradients of the weights, the weight bit importance parameters, and the activation-value bit importance parameters of the super network with respect to the target loss value;
(3) Updating the weights of the super network: updating the weights of the current super network model by gradient descent;
(4) Acquiring the gradients of the super network on the validation set: sampling a batch of data samples from the validation dataset Dval of step (1), inputting them into the super network for forward inference, and obtaining the current target loss value; obtaining, by back propagation, the gradients of the bit importance parameters of the super network with respect to the target loss value;
(5) Updating the bit importance parameters of the super network: updating the weight bit importance parameters and the activation-value bit importance parameters of the super network by gradient descent;
(6) Saving the current mixed precision configuration: taking, from the weight bit importance parameters and activation-value bit importance parameters of the super network, the quantization bits corresponding to the maximum-probability entries as the mixed precision configuration of the current target network, and saving this configuration to a file;
(7) Repeating steps (2) to (5), as sketched in the training loop after this list, until the set number of iterations is reached or the complexity of the current mixed precision configuration falls below the expected complexity;
(8) Acquiring the set of mixed precision configurations of the target network under different constraints by reading the contents of the mixed precision configuration files saved in step (6);
(9) Starting from the flatness of the minimum region of the model loss function, performing quantization-aware training on the mixed precision networks under the different constraints.
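The search loop of steps (2) to (7) can be summarized in code. The following is a minimal sketch under assumed names: a PyTorch-style super network whose bit importance parameters contain `bit_alpha` in their names, and hypothetical helpers `save_config` and `config_complexity`. It illustrates the described flow, not the patented implementation itself.

```python
import torch
import torch.nn.functional as F

def search_mixed_precision(supernet, train_loader, val_loader,
                           w_opt, bit_opt, max_iters, expected_complexity,
                           candidate_bits=(2, 3, 4, 5)):
    """Bilevel search: weights on the training set, bit importance on the validation set."""
    for it, ((x_tr, y_tr), (x_val, y_val)) in enumerate(zip(train_loader, val_loader)):
        # Steps (2)-(3): gradients on a training batch update the super network weights.
        w_opt.zero_grad()
        F.cross_entropy(supernet(x_tr), y_tr).backward()
        w_opt.step()

        # Steps (4)-(5): gradients on a validation batch update the bit importance parameters.
        bit_opt.zero_grad()
        F.cross_entropy(supernet(x_val), y_val).backward()
        bit_opt.step()

        # Step (6): the argmax entry of each bit importance vector selects a candidate bit width.
        config = {name: candidate_bits[int(p.argmax())]
                  for name, p in supernet.named_parameters() if "bit_alpha" in name}
        save_config(config, f"mp_config_{it}.json")   # hypothetical helper

        # Step (7): stop at the iteration budget or when the complexity target is met.
        if it + 1 >= max_iters or config_complexity(config) < expected_complexity:
            return config
```

Because the weights are updated on training batches and the bit importance parameters on validation batches, the loop realizes the bilevel optimization described above.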
The expected complexity in step (7) is a complexity budget set by the user according to the computational-cost requirements of the model in the actual application scenario.
The different constraints in steps (8) and (9) include different model sizes and computational costs, i.e., different model computational complexities.
The specific method of step (9) comprises the following steps:
(9-1) reading, by a program, the configuration file of the mixed precision configuration obtained in step (6), mapping it to the bit allocation of the target neural network model, and setting the maximum perturbation coefficient and the training configuration;
(9-2) performing end-to-end quantization-aware training on the learnable parameters of the quantized model, namely the weights and the quantization step sizes, wherein the weights are updated with an optimization method based on loss sharpness and the quantization step sizes are updated with standard gradient descent;
the step (9) uses a loss sharpness-based optimization method, and the magnitude of the disturbance needs to be attenuated as the number of training iterations increases.
Compared with the prior art, the invention has the advantages that:
the invention utilizes the calculation equivalence of parameter sharing and convolution operator to search the optimal mixed precision configuration of the model under the constraint condition with low cost calculation cost. In the mixed precision searching process, in order to alleviate the overfitting phenomenon of the bit importance parameters in the single-layer optimization and double-layer optimization processes, the invention utilizes the regularization idea, takes the loss item of the super-network model on the verification set as a regular item, and minimizes the training loss and the verification set loss about the bit importance parameters by adding the additional constraint item so as to obtain better mixed precision configuration. In addition, considering that the loss value in the local optimal solution range possibly fluctuates seriously due to the tiny change of the weight caused by quantization noise or gradient update, the probability that the low-precision model falls into a worse local minimum value in the optimization process is higher, and the precision of the model is obviously reduced.
Because no proxy task is used, the invention reduces the extremely high computational demand of directly searching for the optimal mixed precision configuration of the target network on a large dataset; step (6) saves mixed precision configurations satisfying different constraints (model size and computational cost), and the construction of the super network in step (2), together with the use of parameter sharing and the computational equivalence of the convolution operator while training the super network in steps (2) to (5), reduces both the memory consumption and the computational complexity of the search process.
The invention also considers the training difficulty and the markedly reduced performance caused by low-bit quantization. Compared with existing methods that optimize the model from the training perspective, the invention proposes, starting from the flatness of the minimum region of the model loss function, a quantization-aware training strategy, QSAM, based on minimizing the sharpness of the target loss, embodied in step (9): the sharpness-based optimization method flattens the target loss surface of the quantized model while minimizing the task loss, thereby improving the generalization performance of low-bit quantization models.
Drawings
FIG. 1 compares the computational-graph design used in the hybrid precision search process with a conventional computational graph, wherein FIG. 1(a) shows that conventional hybrid precision search algorithms must maintain multiple copies of the full-precision weight parameters, and FIG. 1(b) shows the method of the present invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 compares the computational-graph design of the hybrid precision search process of the invention with the conventional design: FIG. 1(a) shows that a conventional hybrid precision search algorithm must keep multiple copies of the full-precision weight parameters, while FIG. 1(b) shows the method of the present invention. The method stores only one weight tensor, aggregates the outputs of the several quantization branches, and finally executes only one convolution operation. It is emphasized that the branches have independent learnable quantization step sizes, which guarantees that each branch has a dynamic and flexible quantization function during super network training and further improves the convergence speed and search efficiency of super network training.
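The composite convolution of FIG. 1(b) can be sketched as follows, assuming an LSQ-style fake quantizer with straight-through gradients; the class and parameter names are illustrative, not the patented module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w, step, bits):
    """LSQ-style symmetric fake quantization with a learnable step size."""
    qmax = 2 ** (bits - 1) - 1
    w_q = torch.clamp(torch.round(w / step), -qmax - 1, qmax) * step
    return w + (w_q - w).detach()   # quantized forward pass, identity backward pass

class CompositeConv2d(nn.Module):
    """One shared weight tensor, several quantization branches, a single convolution."""
    def __init__(self, in_ch, out_ch, k, candidate_bits=(2, 3, 4, 5)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.bits = candidate_bits
        self.steps = nn.Parameter(torch.full((len(candidate_bits),), 0.05))  # one step size per branch
        self.bit_alpha = nn.Parameter(torch.zeros(len(candidate_bits)))      # bit importance parameters

    def forward(self, x):
        probs = F.softmax(self.bit_alpha, dim=0)
        # Aggregate the quantized branches BEFORE the convolution, so only one
        # convolution runs no matter how many candidate bit widths there are.
        w_mix = sum(p * fake_quant(self.weight, s, b)
                    for p, s, b in zip(probs, self.steps, self.bits))
        return F.conv2d(x, w_mix, padding=self.weight.shape[-1] // 2)
```

The aggregation before the convolution relies on the linearity of the convolution operator: convolving with the probability-weighted sum of the quantized weight tensors equals the weighted sum of the per-branch convolutions, so the computational load of the super network is decoupled from the number of candidate bit widths.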
Example 1
A mixed precision search is performed with the ResNet-20 network on the CIFAR-10 dataset. The performance of the EMPS method is compared with the classical fixed precision quantization methods DoReFa, PACT, LQ-Net, and LSQ. Furthermore, to demonstrate the effectiveness of the hybrid bilevel optimization strategy of the invention during the super network search, the performance of the EMPS method is also compared with the existing mixed precision methods HAWQ [5], BP-NAS [6], and SSPS [7]. In the mixed precision search of the target network ResNet-20:
constructing a super-network: firstly, constructing a super-network with the same macro architecture as a ResNet20 network, and then initializing a super-network model by utilizing pre-trained full-precision model parameters to search;
dividing the data set: to train the super-network, 50% of the training samples in the CIFAR-10 dataset were used
As a training set, 50% of the remaining training samples were used as a validation set, with batch size B set to 128;
setting a search bit width set: in each search unit in the super-network, the candidate bit width set of weights and activation values is set to {2,3,4,5};
updating the super network parameters: for the weight W of the super-network, the SGD optimizer is utilized to optimize, the initial learning rate is set to be 0.2, the momentum is set to be 0.9, and the weight attenuation coefficient is set to be 5e-4; for the bit importance parameter of the super-network weight and the bit importance parameter of the activation value, an Adam optimizer is used, the initial learning rate is 5e-3, and the weight attenuation coefficient is 1e-3. In the whole training process of the super-network, cosine attenuation is used for controlling the change of learning rate;
acquiring a mixed precision configuration set MPCs: after the super-network training is finished by using the training flow, a mixed precision configuration set MPCs of the target network under different calculation constraints can be obtained
Retraining the mixed precision models: quantization-aware training is performed on the quantized models under the mixed precision configurations of different complexities, retraining for 200 epochs;
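The optimizer settings above can be written down as follows, assuming the `bit_alpha` naming of the earlier sketch and predefined `supernet` and `total_epochs`; this is a configuration sketch, not the exact training script.

```python
import torch

# Separate the super network's weights from its bit importance parameters.
weights    = [p for n, p in supernet.named_parameters() if "bit_alpha" not in n]
bit_params = [p for n, p in supernet.named_parameters() if "bit_alpha" in n]

w_opt   = torch.optim.SGD(weights, lr=0.2, momentum=0.9, weight_decay=5e-4)
bit_opt = torch.optim.Adam(bit_params, lr=5e-3, weight_decay=1e-3)

# Cosine decay controls the learning rate throughout super network training.
w_sched = torch.optim.lr_scheduler.CosineAnnealingLR(w_opt, T_max=total_epochs)
```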
table 1 results of quantization of ResNet-20 network on CIFAR-10 dataset. Wherein 'MP' represents mixed precision quantification, 'WComp' represents compression ratio of model weight, and 'B-Comp' represents model reasoning
Cheng Zhongwei compression ratio of the number of operations (BOPs), "Ave-Bits" represents the average bit operand.
From the results in Table 1, the EMPS hybrid precision quantization method proposed by the invention outperforms all comparison methods. Relative to fixed precision quantization, under the same model weight compression ratio (W-Comp) the EMPS method achieves 93.29% Top-1 accuracy, an improvement of 2.19%, 1.69%, and 0.99% over the PACT, LQ-Net, and LSQ methods, respectively. At a nearly identical BOPs compression ratio (B-Comp), the EMPS method achieves 93.00% Top-1 accuracy, 1.90%, 1.40%, and 0.70% higher than the PACT, LQ-Net, and LSQ methods, respectively.
In addition, compared with the existing BP-NAS and SSPS mixed precision methods, the mixed precision quantization model obtained by the method of the invention shows clearly improved accuracy under the model parameter compression ratio (W-Comp), bit operation compression ratio (B-Comp), and average bit operation (Ave-Bits) indexes. The mixed precision configurations of the target model found by the EMPS method thus achieve better performance under different computational-cost constraints.
Example 2
A mixed precision search is performed with the MobileNetV2 network on the CIFAR-100 dataset. The procedure is similar to the mixed precision search of the ResNet-20 model on the CIFAR-10 dataset: 50% of the training samples in the CIFAR-100 dataset are used as the training set and the remaining 50% as the validation set, with the batch size set to 128; the difference is that, after the different mixed precision configurations are obtained, all training samples are used to retrain the quantized model for 300 epochs, and no weight decay is applied to the learnable parameters of the batch normalization layers.
Table 2. Quantization results of the MobileNetV2 network on the CIFAR-100 dataset, where "MP" denotes mixed precision quantization, "W-Comp" denotes the compression ratio of the model weights, "B-Comp" denotes the compression ratio of the number of bit operations (BOPs) during model inference, "Ave-Bits" denotes the average number of bit operations, and "-" denotes an unreported result.
Table 2 compares the accuracy of the MobileNetV2 model under 4-bit and 3-bit constraints for the different quantization methods. Compared with fixed precision quantization, EMPS shows significant advantages both at the same model compression ratio and at the same bit operation compression ratio. For example, when weights and activations are quantized to about 4 bits, EMPS is not only 0.6% higher in Top-1 accuracy than LSQ, the currently best-performing fixed precision quantization method, but also 1.78× higher in model compression ratio. In addition, compared with the existing HAQ and SAMQ mixed precision methods, the quantization results show clear accuracy advantages under similar average bit operation (Ave-Bits) indexes.
Example 3
To verify the effectiveness of the proposed hybrid precision algorithm on a large dataset and on network models of different depths, experimental verification was performed on the ResNet-18, ResNet-50, and MobileNetV2 network models on the ImageNet ILSVRC-2012 dataset. When training the super network, 75% of the training samples are selected as the training set Dtrain and the remaining 25% as the validation set Dval. Detailed search and retraining hyperparameters for the networks of different depths are shown in Table 3. In the experiments, training images were randomly cropped and resized to 224×224, with random horizontal flipping and color jitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1) as data augmentation; validation images were resized and center-cropped to 224×224 resolution. Furthermore, a label smoothing strategy (label_smooth=0.1) was used during training to add regularization.
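A sketch of the described pre-processing, assuming standard torchvision transforms; the resize to 256 before the center crop is an assumption, as the description only states the final 224×224 resolution.

```python
import torch
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),              # random crop, resized to 224x224
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.1),
    transforms.ToTensor(),
])
val_tf = transforms.Compose([
    transforms.Resize(256),                         # assumed resize before the center crop
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
# Label smoothing of 0.1 adds regularization to the training loss.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```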
Table 3. Training hyperparameter settings. * indicates that the weight decay coefficient of the BN-layer parameters is 0.
The method proposed by the invention is compared with currently popular quantization methods (such as PACT, LSQ, BP-NAS, SAMQ, and SSPS). As in the experiments above, the results were obtained by retraining the mixed precision models found by the EMPS method with the QSAMDecay training method. Table 4 shows the quantization results of networks of different depths on the ImageNet dataset. First, compared with fixed precision quantization methods, the method of the invention achieves better Top-1 accuracy under the same model compression ratio and a higher BOPs compression ratio. For example, with ResNet-18, ResNet-50, and MobileNetV2 as target networks, our method is 0.19%, 0.34%, and 0.32% higher in Top-1 accuracy than LSQ, the currently best-performing fixed precision quantization method.
Second, compared with the existing differentiable hybrid precision search methods DNAS, BP-NAS, and SSPS, our method saves only one weight tensor shared by all branches during the search and performs an equivalent transformation in the forward computation by exploiting the computational equivalence of the convolution operator, thereby avoiding a large memory requirement and training cost. Finally, compared with the reinforcement-learning-based hybrid precision search methods HAQ, AutoQ, and SAMQ, our method requires neither a long training period nor massive computing resources (e.g., thousands of GPUs), and is therefore better suited to directly searching the hybrid precision configuration of deep networks on large datasets. In terms of quantized model accuracy, our algorithm is 1.8% higher than HAQ on the ResNet-50 network, and 1.06% and 0.26% higher than HAQ and SAMQ, respectively, on the MobileNetV2 network.
Table 4. Performance comparison of quantized network models on the ImageNet dataset. "-" denotes an unreported result.
Example 4
To verify the effectiveness of the method on the object detection task, the mixed precision ResNet-50 network selected in Example 3 is used as the backbone network on the COCO detection dataset to train mixed precision models for the two-stage Faster-RCNN detector and the single-stage RetinaNet detector. For the Faster-RCNN detector, in addition to the backbone network, the weights and activation values in the RPN region proposal module and the ROIHead module are also quantized to 4 bits. During quantization-aware training, the Faster-RCNN model is fine-tuned for 12 epochs on 4 GPUs with the batch size set to 8. SGD is selected as the base optimizer of QSAMDecay, and the QSAMDecay optimization method then performs quantization-aware training of the detection network, with the initial learning rate set to 0.02, momentum 0.9, weight decay coefficient 1e-4, maximum perturbation coefficient ρmax = 0.05, and a MultiStepLR (milestones = [5, 8, 10]) learning-rate decay strategy during training (a configuration sketch follows). For the RetinaNet framework, the feature pyramid and the rest of the detection head use 4-bit quantization, except for the last layer of the detection head, which uses 8-bit quantization.
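The fine-tuning setup just described might be configured as follows; `detector` and the `QSAMDecay` wrapper interface are assumptions based on the description, not a documented API.

```python
import torch

# Base SGD optimizer with the stated hyperparameters.
base_opt = torch.optim.SGD(detector.parameters(), lr=0.02,
                           momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(base_opt, milestones=[5, 8, 10])
# QSAMDecay wraps the base optimizer and decays the maximum perturbation
# coefficient rho_max = 0.05 over the 12 fine-tuning epochs.
optimizer = QSAMDecay(base_opt, rho_max=0.05)   # hypothetical wrapper interface
```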
Table 5. Performance comparison of the Faster-RCNN and RetinaNet detection models on the COCO dataset.
In Table 5, the hybrid precision quantization-aware training method EMPS proposed by the invention is compared with the FQN, Auxi, BP-NAS, and SSPS algorithms, where FQN and Auxi are fixed precision quantization methods and BP-NAS and SSPS are hybrid precision quantization algorithms. The experimental results show that when the mixed precision configuration of the ResNet-50 network obtained with the EMPS method on the ImageNet classification task is applied directly to the backbone of the Faster-RCNN detection model and then fine-tuned, the resulting detector performs 4.5%, 1.8%, and 0.2% better than the FQN, BP-NAS, and SSPS methods, respectively. On the RetinaNet detection model, the proposed model quantization method likewise achieves the best detection results among the compared quantization methods.
These experimental results show that the proposed method transfers well to the object detection task.
In summary, the invention discloses an efficient mixed precision search method (EMPS) which, thanks to an efficient search computation-graph design, can perform a mixed precision search for the target network directly on the target dataset without any proxy task. In addition, the bit importance parameters in the super network are updated with a bilevel hybrid optimization strategy, and embedding more training information alleviates overfitting during super network training. Considering the training difficulty and markedly reduced performance of network models under low-bit quantization, the invention proposes a quantization-aware training method (QSAMDecay) that minimizes the sharpness of the target loss; analyzed from the perspective of model parameter perturbation, it improves the generalization ability of the quantized model by simultaneously minimizing the target loss and the sharpness of the quantization loss. Through image classification tasks on the CIFAR-10, CIFAR-100, and ImageNet datasets and object detection tasks on the COCO dataset, the proposed mixed precision search method achieves better performance than existing comparison methods under the same computational constraints, further accelerating the practical deployment of hybrid precision models.

Claims (5)

1. A hybrid precision quantization-aware training method based on neural network structure search, characterized in that the method comprises the following specific steps:
(1) Processing an original image dataset and dividing it into a training dataset Dtrain and a validation dataset Dval;
(2) Acquiring the gradients of the super network on the training set: sampling a batch of data samples from the training dataset Dtrain of step (1), inputting them into the super network for forward inference, and obtaining the current target loss value; obtaining, by back propagation, the gradients of the weights, the weight bit importance parameters, and the activation-value bit importance parameters of the super network with respect to the target loss value;
(3) Updating the weights of the super network: updating the weights of the current super network model by gradient descent;
(4) Acquiring the gradients of the super network on the validation set: sampling a batch of data samples from the validation dataset Dval of step (1), inputting them into the super network for forward inference, and obtaining the current target loss value; obtaining, by back propagation, the gradients of the bit importance parameters of the super network with respect to the target loss value;
(5) Updating the bit importance parameters of the super network: updating the weight bit importance parameters and the activation-value bit importance parameters of the super network by gradient descent;
(6) Saving the current mixed precision configuration: taking, from the weight bit importance parameters and activation-value bit importance parameters of the super network, the quantization bits corresponding to the maximum-probability entries as the mixed precision configuration of the current target network, and saving this configuration to a file;
(7) Repeating steps (2) to (5) until the set number of iterations is reached or the complexity of the current mixed precision configuration falls below the expected complexity;
(8) Acquiring the set of mixed precision configurations of the target network under different constraints by reading the contents of the mixed precision configuration files saved in step (6);
(9) Starting from the flatness of the minimum region of the model loss function, performing quantization-aware training on the mixed precision networks under the different constraints.
2. The hybrid precision quantization-aware training method based on neural network structure search of claim 1, characterized in that: the expected complexity in step (7) is a complexity budget set by the user according to the computational-cost requirements of the model in the actual application scenario.
3. The hybrid precision quantization-aware training method based on neural network structure search of claim 1, characterized in that: the different constraints in steps (8) and (9) include different model sizes and computational costs, i.e., different model computational complexities.
4. The hybrid precision quantization-aware training method based on neural network structure search of claim 1, characterized in that the specific method of step (9) is as follows:
(9-1) reading, by a program, the configuration file of the mixed precision configuration obtained in step (6), mapping it to the bit allocation of the target neural network model, and setting the maximum perturbation coefficient and the training configuration;
(9-2) performing end-to-end quantization-aware training on the learnable parameters of the quantized model, namely the weights and the quantization step sizes, wherein the weights are updated with an optimization method based on loss sharpness and the quantization step sizes are updated with standard gradient descent.
5. The hybrid precision quantization-aware training method based on neural network structure search of claim 1, characterized in that: step (9) uses an optimization method based on loss sharpness, and the magnitude of the perturbation is attenuated as the number of training iterations increases.
CN202310377705.4A 2023-04-11 2023-04-11 Hybrid precision quantization-aware training method based on neural network structure search Pending CN116721305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310377705.4A CN116721305A (en) 2023-04-11 2023-04-11 Hybrid precision quantization-aware training method based on neural network structure search

Publications (1)

Publication Number Publication Date
CN116721305A true CN116721305A (en) 2023-09-08

Family

ID=87868515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310377705.4A Pending CN116721305A (en) 2023-04-11 2023-04-11 Hybrid precision quantization-aware training method based on neural network structure search

Country Status (1)

Country Link
CN (1) CN116721305A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893975A (en) * 2024-03-18 2024-04-16 南京邮电大学 Multi-precision residual error quantization method in power monitoring and identification scene
CN117893975B (en) * 2024-03-18 2024-05-28 南京邮电大学 Multi-precision residual error quantization method in power monitoring and identification scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination