CN113076663A - Dynamic hybrid precision model construction method and system - Google Patents

Dynamic hybrid precision model construction method and system

Info

Publication number
CN113076663A
CN113076663A (application number CN202110491111.7A)
Authority
CN
China
Prior art keywords
precision
model
mixed
parameter
hybrid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110491111.7A
Other languages
Chinese (zh)
Inventor
郭锴凌
杨弈才
徐向民
邢晓芬
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110491111.7A priority Critical patent/CN113076663A/en
Publication of CN113076663A publication Critical patent/CN113076663A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a dynamic hybrid precision model construction method and system relating to deep neural network technology. Addressing the limited number of convertible states and the precision problems of the prior art, the method constructs a mixed precision state conversion table from the average traces of the Hessian matrices of the parameters of different blocks in a full-precision model and an optional parameter precision table S. During training, several mixed precision sub-models are sampled at random in each training iteration, and an improved quantization function performs the quantization operation, yielding a mixed precision model. A mixed bit deployment state table is then formed according to the actual deployment requirements for actual deployment. By using the average Hessian traces and random sampling, the search space and the computation required for training are reduced; at the same time, the improved quantization function transfers directly between different quantization bit widths with small quantization error, so the mixed precision model can be deployed adaptively at lower bit widths while precision at higher bit widths is also improved.

Description

Dynamic hybrid precision model construction method and system
Technical Field
The invention relates to deep neural network technology, and in particular to a dynamic hybrid precision model construction method and system.
Background
In recent years, more and more smart devices, such as mobile phones and VR glasses, deploy deep neural models. Deep neural models typically require a large memory footprint and expensive computation, while the memory and computing resources of smart devices are limited. In more complex practical scenarios, even on the same device, the requirements on the deep neural model may change with battery condition, hardware aging, and other factors. This makes deployment on smart devices more difficult and hinders lightweight application of neural network algorithms.
To address the excessive memory and computation requirements of deep neural networks, researchers have proposed quantization compression of neural network models. A quantized neural network model has a smaller memory footprint and computation amount, significantly reduces storage and computing overhead, and is better suited to deployment on resource-limited lightweight devices.
For the real-time variation of the practical application scenarios of smart devices, researchers have proposed designing and training one or more neural network models for different application requirements, either by manual design or by neural architecture search. However, both approaches consume excessive design and training cost and are impractical to popularize. A more efficient solution is an adaptive model: only one model is designed and trained, with multiple transition states, each corresponding to a sub-model. The adaptive model adaptively changes its own structural parameters according to changes in the actual usage scenario and converts into different sub-models.
An adaptive quantization model combines the adaptive model with the quantization method, so that the parameters and computation precision of the model can be changed adaptively. However, current adaptive quantization models have two shortcomings: on the one hand, they cannot be applied at low parameter and computation precision; on the other hand, the available transition states are limited, each being a single-bit-width sub-model.
To solve the mismatch of sub-model statistics in adaptive models, adaptive pruning models replace the traditional BN layer with a Switch BN (Batch Normalization) layer, which comprises several BN layers of the same dimension but different parameters, each corresponding to one channel number. Adaptive quantization models also employ Switch BN layers, where each BN layer corresponds to one bit precision. Learning a parameterized Clipping Level for the activation layers of a quantization model effectively improves performance, and adaptive quantization models likewise extend the Clipping Level into a Switch Clipping Level layer, in which each ReLU activation layer receives several different learnable parameters, each corresponding to one bit precision.
In recent years, researchers have found that the Hessian matrices of the parameters of a deep neural model can be used to analyze the sensitivity of different layers or blocks of the model to noise disturbance, and have proposed training mixed precision models using Hessian matrix information. Unlike single-precision quantization models, different layers or blocks of a mixed precision model have different parameter precisions. Research shows that mixed precision models achieve better performance and precision than single-precision quantization models.
However, current adaptive quantization models are difficult to apply in lower-bit scenarios, and their convertible states are limited, which greatly restricts their popularization and application. How to effectively increase the conversion states of the adaptive quantization model using Hessian matrix information while improving precision has become an urgent problem.
Disclosure of Invention
The invention aims to provide a dynamic hybrid precision model construction method and a dynamic hybrid precision model construction system, which are used for solving the problems in the prior art.
The invention relates to a dynamic hybrid precision model construction method, which comprises the following steps:
s1, preprocessing the original data;
s2, training a full-precision model;
s3, giving an optional parameter precision table S;
s4, constructing a mixed precision state conversion table according to the traces of the Hessian matrix of the parameters of different blocks in the full-precision model and the optional parameter precision table S;
s5, initializing the full-precision model to an adaptive quantization model;
s6, training the self-adaptive quantization model, and performing quantization operation in a mode of randomly sampling a plurality of mixed precision sub-models in each iteration in the training process;
s7, obtaining a mixed precision model after the adaptive quantization model is trained;
S8, selecting suitable mixed precision sub-models according to the actual deployment requirements and the mixed precision state conversion table to form a mixed bit deployment state table for actual deployment.
In step S5, the BN layer of the adaptive quantization model is replaced with the Switch BN layer, and the ReLU activation function is replaced with the Switch Clipping Level layer.
The traces of the Hessian matrices are represented by their approximate values. The approximate value is calculated as

Tr(H_i) ≈ (1/T) · Σ_{t=1}^{T} z_i^T H_i z_i

wherein Tr(H_i) is the approximation, T is the number of iterations, H_i is the Hessian matrix, and z_i is a random matrix regenerated for each iteration. The Hessian-vector product is obtained without forming H_i explicitly:

H_i z_i = ∂(g_i^T z_i) / ∂W_i

wherein g_i is the gradient obtained from the loss function during training and W_i is the parameter matrix.
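The formula above is the standard Hutchinson trace estimator. A minimal NumPy sketch follows, using a small explicit matrix in place of the per-block Hessian-vector products so the estimate can be checked against the exact trace; the matrix values and sample count are illustrative, not from the patent:

```python
import numpy as np

def hutchinson_trace(H, T=20000, seed=0):
    """Estimate Tr(H) as (1/T) * sum_t z^T H z with z ~ N(0, I).

    In the method above, H_i z_i would come from the Hessian-vector
    product d(g_i^T z_i)/dW_i rather than an explicit matrix; a small
    explicit H is used here so the result is checkable."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    acc = 0.0
    for _ in range(T):
        z = rng.standard_normal(n)  # a fresh z is drawn every iteration
        acc += z @ H @ z
    return acc / T

H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
est = hutchinson_trace(H)  # converges to Tr(H) = 9.0 as T grows
```

Since E[z^T H z] = Tr(H) for z with identity covariance, averaging over many random draws recovers the trace without ever materializing the Hessian.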
In step S2, before training, the parameter matrix is scaled to the interval [-1, +1] and then propagated forward; during backward propagation, the parameter matrix is updated.
In step S4, the average trace of each Hessian matrix is calculated first, and the blocks are then reordered and grouped in sequence by average trace size; the precisions in the optional parameter precision table S are assigned to the groups by size, with a relatively large precision assigned to a group with a relatively large average trace.
In step S6, the original floating-point parameter w is normalized to [0, 1] to obtain a parameter ŵ. The published normalization formula appears only as an image; a standard DoReFa-style form consistent with the description is

ŵ = tanh(w) / (2 · max|tanh(w)|) + 1/2

The normalized parameter ŵ is quantized with the formula

q(ŵ) = round((2^k − 1) · ŵ) / (2^k − 1)

wherein k is the quantization precision. The quantized value q(ŵ) is remapped to the interval [-1, +1] with the formula

w_q = 2 · q(ŵ) − 1
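The normalize-quantize-remap steps described above can be sketched as follows. The tanh-based normalization is an assumption (the publication gives that formula only as an image); the quantization grid and remapping follow the description directly:

```python
import numpy as np

def quantize_weights(w, k):
    """k-bit weight quantization matching the three steps above:
    normalize to [0, 1], quantize to a uniform grid with 2^k - 1 steps,
    remap to [-1, +1].  The tanh-based normalization is an assumed
    DoReFa-style form, not confirmed by the publication."""
    t = np.tanh(w)
    w_hat = t / (2.0 * np.max(np.abs(t))) + 0.5        # -> [0, 1]
    q = np.round((2 ** k - 1) * w_hat) / (2 ** k - 1)  # k-bit grid
    return 2.0 * q - 1.0                               # -> [-1, +1]

w = np.array([-1.5, -0.2, 0.0, 0.3, 1.5])
w2 = quantize_weights(w, 2)   # only 4 representable values in [-1, 1]
w8 = quantize_weights(w, 8)   # much finer grid, small quantization error
```

Note that the extreme weights map exactly to -1 and +1 at every bit width, so the same underlying weights can serve sub-models of different precisions.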
Each training iteration samples the sub-model MIN with the lowest average parameter precision together with n−1 random mixed precision sub-models from all mixed precision sub-models, i.e. n mixed precision sub-models are sampled in total per iteration.
In step S8, two optional deployment evaluation indexes, based respectively on the memory footprint and on the computation amount of the mixed precision sub-model parameters at deployment, are set to dynamically adjust the optimal classification precision;
the method specifically comprises the following steps:
constructing an index evaluation interval according to the parameter memory footprint or computation amount of the mixed precision sub-model of each state, and dividing the index evaluation interval evenly into m sub-intervals;
if the optimal classification precision of the current subinterval is superior to that of the previous subinterval, the optimal classification precision of the current subinterval is unchanged;
if the optimal classification precision of the current subinterval is inferior to that of the previous subinterval, the optimal classification precision of the current subinterval is set as that of the previous subinterval;
and if the current subinterval has no state, setting the optimal classification precision of the current subinterval as the optimal classification precision of the previous subinterval.
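The sub-interval rules above amount to a monotone "best accuracy so far" envelope over cost intervals. A sketch follows; the function name and the `states` data are illustrative, not from the patent:

```python
def best_accuracy_per_interval(states, m):
    """states: (cost, accuracy) pairs, one per mixed-precision
    sub-model, where cost is parameter memory or computation amount.
    Splits [min cost, max cost] into m equal sub-intervals and applies
    the carry-forward rules above: an interval keeps its own best
    accuracy unless the previous interval's value is higher or the
    interval is empty, in which case the previous value is used."""
    costs = [c for c, _ in states]
    lo, hi = min(costs), max(costs)
    width = (hi - lo) / m
    best, prev = [], float("-inf")
    for i in range(m):
        a, b = lo + i * width, lo + (i + 1) * width
        members = [acc for c, acc in states
                   if a <= c < b or (i == m - 1 and c == hi)]
        cur = max(members) if members else prev
        cur = max(cur, prev)  # never worse than the previous interval
        best.append(cur)
        prev = cur
    return best

# four hypothetical sub-model states; the third interval is empty
states = [(1.0, 0.60), (2.0, 0.72), (3.5, 0.70), (4.0, 0.75)]
env = best_accuracy_per_interval(states, 4)
# env == [0.60, 0.72, 0.72, 0.75]
```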
The dynamic hybrid precision model construction system provided by the invention is used for constructing a hybrid precision model, and selecting a proper hybrid precision sub-model according to an actual deployment requirement and a hybrid precision state conversion table to form a hybrid bit deployment state table for actual deployment.
An advantage of the dynamic hybrid precision model construction method and system is that a random matrix z_i is regenerated in each iteration and the adaptive quantization function is improved through multiple iterative computations; the improved adaptive quantization function allows the mixed precision model to be deployed adaptively at lower bit widths while also improving precision at higher bit widths.
The invention expands the sub-model with single bit into the sub-model with mixed precision, utilizes the trace of the Hessian matrix to represent the sensitivity of different modules of the model to quantization, and constructs the state conversion table of mixed precision, so that each state is a mixed precision model.
Compared with a traditional adaptive quantization model, the finally trained mixed precision model provides more conversion states and better precision performance without additional training time or additional memory footprint during training.
By combining the improved adaptive quantization function and the design of the mixed precision model, the conversion state of the adaptive conversion model is increased, the high accuracy of each mixed precision sub-model is kept, and the application of the adaptive quantization model to lightweight equipment is promoted.
Drawings
FIG. 1 is a schematic flow chart of a dynamic hybrid precision model construction method according to the present invention.
FIG. 2 is a schematic flow chart of the dynamic hybrid precision model construction method in calculating the approximate value of the Hessian matrix trace.
FIG. 3 is a schematic diagram of a training process of the dynamic hybrid precision model construction method in iteration.
FIG. 4 is a schematic flow chart of the dynamic hybrid precision model construction method of the present invention during deployment.
Detailed Description
As shown in fig. 1 to 4, the method for constructing the dynamic hybrid precision model of the present invention specifically includes:
and carrying out preprocessing such as zero padding, random cutting, random overturning and the like on the original data, and then carrying out 8-bit quantization to obtain input data.
Cross entropy is used as a loss function and SGD as an optimizer.
To train the full-precision model, the parameter matrix w_i is first scaled to the interval [-1, +1]. The published scaling formulas appear only as images; a standard tanh-based scaling consistent with the later quantization step is

w̃_i = tanh(w_i)
w̃_i ← w̃_i / max|w̃_i|

Forward propagation then uses the scaled parameters, and the parameter w_i is updated during backward propagation. In this embodiment, weights are used as the parameters, and the parameter matrix is a weight matrix.
Part of the training data is selected, and an approximate value of the trace of the Hessian matrix of the parameters of each block of the full-precision model is calculated, with the following sub-steps:
(1) randomly sample 2000 training data, preprocess them, and input them into the full-precision model;
(2) for the parameter matrix W_i of each block, differentiate by the chain rule to obtain the corresponding gradient g_i;
(3) for the parameter matrix of each block, randomly generate a normally distributed random matrix z_i of the same dimensions;
(4) with H_i denoting the Hessian matrix of the parameters, first calculate the scalar g_i^T z_i, then calculate the Hessian-vector product H_i z_i = ∂(g_i^T z_i) / ∂W_i;
(5) iterate (1) to (4) several times, regenerating a new random matrix z_i in each iteration, and accumulate the per-iteration result z_i^T (H_i z_i).
With T iterations, the trace of the Hessian matrix H_i is approximated as

Tr(H_i) ≈ (1/T) · Σ_{t=1}^{T} z_i^T H_i z_i

and the trace of the Hessian matrix is represented by this approximation.
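The identity used in sub-step (4), H_i z_i = ∂(g_i^T z_i)/∂W_i, can be checked on a toy quadratic loss where the Hessian is known. The finite-difference check below is purely illustrative (A, b, w, z are arbitrary values, not from the patent):

```python
import numpy as np

# Toy loss f(w) = 0.5 * w^T A w + b^T w, so g(w) = A w + b and the
# Hessian is H = A.  Sub-step (4) obtains H z as the derivative of the
# scalar g^T z with respect to w; the central finite difference below
# verifies that identity.
A = np.array([[3.0, 0.5],
              [0.5, 2.0]])
b = np.array([1.0, -1.0])
g = lambda w: A @ w + b     # gradient of the toy loss

w = np.array([0.3, -0.7])
z = np.array([1.0, 2.0])
eps = 1e-5
hvp_fd = np.array([(g(w + eps * e) @ z - g(w - eps * e) @ z) / (2 * eps)
                   for e in np.eye(2)])   # d(g^T z)/dw component-wise
hvp_exact = A @ z                          # H z for the quadratic loss
```

In a deep network the same product is obtained with one extra backward pass through g_i^T z_i, so the Hessian is never formed explicitly.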
Given an optional parameter precision table S, such as {2, 3, 4}, the average traces of the parameter Hessian matrix of each block are sorted from large to small and divided in order into groups, with the groups having larger average traces assigned greater parameter precision. The higher precisions in the optional parameter precision table S are thus allocated to the groups with larger traces, and the mixed precision state conversion table is constructed.
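The grouping-and-assignment step can be sketched as follows; the block names, the ceil-division group sizing, and the function name are illustrative assumptions:

```python
def assign_precisions(avg_traces, S):
    """avg_traces: {block name: average Hessian trace};
    S: the optional parameter precision table, e.g. [2, 3, 4].
    Blocks are sorted by trace, descending, and split in order into
    len(S) groups; groups with larger traces get higher precision,
    matching the assignment rule in the text."""
    ordered = sorted(avg_traces, key=avg_traces.get, reverse=True)
    precisions = sorted(S, reverse=True)
    size = -(-len(ordered) // len(precisions))  # ceil division
    table = {}
    for g, prec in enumerate(precisions):
        for name in ordered[g * size:(g + 1) * size]:
            table[name] = prec
    return table

traces = {"block1": 9.1, "block2": 0.4, "block3": 3.2, "block4": 1.0}
table = assign_precisions(traces, [2, 3, 4])
# table == {"block1": 4, "block3": 4, "block4": 3, "block2": 3}
```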
The traditional adaptive quantization function is modified, and the quantization operations of the adaptive quantization model are performed with the modified function. Specifically, the method comprises the following sub-steps:
For each convolutional layer, the weight parameters are normalized to [0, 1]. Let the original floating-point parameter be w and the normalized parameter be ŵ; the published normalization formula appears only as an image, and a standard DoReFa-style form consistent with the description is:

ŵ = tanh(w) / (2 · max|tanh(w)|) + 1/2

Let k be the quantization precision; the normalized parameter ŵ is quantized with the formula:

q(ŵ) = round((2^k − 1) · ŵ) / (2^k − 1)

The quantized value is remapped to the interval [-1, +1] with the formula:

w_q = 2 · q(ŵ) − 1

For the different mixed precision sub-models, different blocks apply this quantization function with their corresponding preset parameter and computation precisions.
The adaptive quantization model is initialized from the full-precision model, the BN layers and ReLU activation functions are replaced with Switch BN layers and Switch Clipping Level layers, and training is carried out.
During training, several mixed precision sub-models are randomly sampled in each iteration for training. Assuming n mixed precision sub-models are sampled per iteration, the sub-model with the lowest average parameter precision among all mixed precision sub-models is denoted MIN; each iteration samples MIN together with n−1 random sub-models for training. The loss functions computed by the n sub-networks are summed, and back propagation then updates the weight parameters.
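The per-iteration sampling rule (always include MIN, plus n−1 random states) can be sketched as follows; the state data and function name are illustrative assumptions:

```python
import random

def sample_submodels(states, n, rng=None):
    """states: mixed-precision states, each a per-block precision map.
    Always include the state with the lowest average precision (MIN),
    plus n - 1 states drawn at random, as described above."""
    rng = rng or random.Random(0)
    avg = lambda s: sum(s.values()) / len(s)
    lowest = min(states, key=avg)                     # the MIN sub-model
    rest = [rng.choice(states) for _ in range(n - 1)]
    return [lowest] + rest

states = [{"b1": 4, "b2": 4}, {"b1": 4, "b2": 2},
          {"b1": 2, "b2": 3}, {"b1": 2, "b2": 2}]
batch = sample_submodels(states, n=3)
# in one training step, the losses of these n sub-models are summed
# before a single backward pass
```

Always including MIN anchors the worst-case (lowest-precision) state every step, a sandwich-style rule that keeps the low-bit sub-models trained while random sampling covers the rest of the state space.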
The trained adaptive quantization model is a construction target of the invention, namely a mixed precision model, and proper mixed precision submodels are selected according to actual deployment requirements to form a mixed bit deployment state table for actual deployment.
The invention adopts two deployment evaluation indexes, based respectively on the parameter memory footprint and on the computation amount of the mixed precision sub-models. Specifically, the method comprises the following steps:
An index evaluation interval is constructed from the sub-model parameter footprint or computation amount of each state: the left boundary of the interval is the smallest parameter footprint or computation amount among the sub-models in the state table, and the right boundary is the largest. The interval is divided evenly into m sub-intervals, the optimal classification precision of each sub-interval is determined, and the evaluation criterion is the average of the optimal classification precisions of all sub-intervals.
Further, the optimal classification precision of each sub-interval is determined as follows:
if the optimal classification precision of the current interval is superior to that of the previous interval, the optimal classification precision of the current interval is the optimal classification precision of the current interval;
if the optimal classification precision of the current interval is inferior to that of the previous interval, the precision of the current interval is set as the optimal classification precision of the previous interval;
and if the current interval has no state, setting the current interval precision as the optimal classification precision of the previous interval.
The dynamic hybrid precision model construction system utilizes the method to construct a hybrid precision model, and selects proper hybrid precision sub-models to form a hybrid bit deployment state table for actual deployment according to actual deployment requirements and a hybrid precision state conversion table.
It will be apparent to those skilled in the art that various other changes and modifications may be made in the above-described embodiments and concepts and all such changes and modifications are intended to be within the scope of the appended claims.

Claims (10)

1. A dynamic hybrid precision model construction method is characterized by comprising the following steps:
s1, preprocessing the original data;
s2, training a full-precision model;
s3, giving an optional parameter precision table S;
s4, constructing a mixed precision state conversion table according to the traces of the Hessian matrix of the parameters of different blocks in the full-precision model and the optional parameter precision table S;
s5, initializing parameters of the full-precision model to an adaptive quantization model;
s6, training the self-adaptive quantization model, and performing quantization operation in a mode of randomly sampling a plurality of mixed precision sub-models in each iteration in the training process;
s7, obtaining a mixed precision model after the adaptive quantization model is trained;
and S8, selecting proper mixed precision submodels according to the actual deployment requirement and the mixed precision state conversion table to form a mixed bit deployment state table for actual deployment.
2. The dynamic hybrid precision model construction method according to claim 1, wherein in step S5 the BN layer of the adaptive quantization model is replaced by a Switch BN layer, and the ReLU activation function is replaced by a Switch Clipping Level layer.
3. The method of constructing a dynamic hybrid accuracy model of claim 1, wherein the traces of the hessian matrix are represented as approximations thereof.
4. The dynamic hybrid precision model construction method according to claim 3, wherein the approximate value is calculated as

Tr(H_i) ≈ (1/T) · Σ_{t=1}^{T} z_i^T H_i z_i

wherein Tr(H_i) is the approximation, T is the number of iterations, H_i is the Hessian matrix, and z_i is a random matrix regenerated for each iteration.
5. The dynamic hybrid precision model construction method of claim 4, wherein

H_i z_i = ∂(g_i^T z_i) / ∂W_i

wherein g_i is the gradient obtained from the loss function during training and W_i is the parameter matrix.
6. The method for constructing a dynamic hybrid precision model according to claim 5, wherein in step S2, the parameter matrix is scaled to the range of [ -1, +1] before training and then propagated forward, and the parameter matrix is updated when propagated backward.
7. The dynamic hybrid precision model construction method according to claim 1, wherein in step S4 the average trace of each Hessian matrix is calculated first, and the blocks are then reordered and grouped in sequence by average trace size; the precisions in the optional parameter precision table S are assigned to the groups by size, a relatively large precision being assigned to a group with a relatively large average trace.
8. The dynamic hybrid precision model construction method according to claim 1, wherein in step S6 the original floating-point parameter w is normalized to [0, 1] to obtain a parameter ŵ; the normalized parameter ŵ is quantized with the formula

q(ŵ) = round((2^k − 1) · ŵ) / (2^k − 1)

wherein k is the quantization precision; the quantized value q(ŵ) is remapped to the interval [-1, +1] with the formula

w_q = 2 · q(ŵ) − 1;

and each training iteration samples the sub-model MIN with the lowest average parameter precision together with n−1 random mixed precision sub-models from all mixed precision sub-models, i.e. a total of n mixed precision sub-models are sampled for the iteration.
9. The dynamic hybrid precision model construction method according to claim 1, wherein in step S8, two optional deployment evaluation indicators are set based on the memory occupation size and the calculation amount size of the hybrid precision submodel parameters during deployment to dynamically adjust the optimal classification precision;
the method specifically comprises the following steps:
constructing an index evaluation interval according to the parameter memory footprint or computation amount of the mixed precision sub-model of each state, and dividing the index evaluation interval evenly into m sub-intervals;
if the optimal classification precision of the current subinterval is superior to that of the previous subinterval, the optimal classification precision of the current subinterval is unchanged;
if the optimal classification precision of the current subinterval is inferior to that of the previous subinterval, the optimal classification precision of the current subinterval is set as that of the previous subinterval;
and if the current subinterval has no state, setting the optimal classification precision of the current subinterval as the optimal classification precision of the previous subinterval.
10. A dynamic hybrid precision model construction system is characterized in that a hybrid precision model is constructed by the method of any one of claims 1 to 9, and proper hybrid precision submodels are selected according to actual deployment requirements and a hybrid precision state conversion table to form a hybrid bit deployment state table for actual deployment.
CN202110491111.7A 2021-05-06 2021-05-06 Dynamic hybrid precision model construction method and system Pending CN113076663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110491111.7A CN113076663A (en) 2021-05-06 2021-05-06 Dynamic hybrid precision model construction method and system


Publications (1)

Publication Number Publication Date
CN113076663A (en) 2021-07-06

Family

ID=76616357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110491111.7A Pending CN113076663A (en) 2021-05-06 2021-05-06 Dynamic hybrid precision model construction method and system

Country Status (1)

Country Link
CN (1) CN113076663A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676797A (en) * 2022-05-27 2022-06-28 浙江大华技术股份有限公司 Model precision calculation method and device and computer readable storage medium
CN118035628A (en) * 2024-04-11 2024-05-14 清华大学 Matrix vector multiplication operator realization method and device supporting mixed bit quantization
CN118035628B (en) * 2024-04-11 2024-06-11 清华大学 Matrix vector multiplication operator realization method and device supporting mixed bit quantization


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210706