CN113076663A - Dynamic hybrid precision model construction method and system - Google Patents

Dynamic hybrid precision model construction method and system

Info

Publication number
CN113076663A
CN113076663A (application number CN202110491111.7A)
Authority
CN
China
Prior art keywords
precision
model
mixed
parameter
hybrid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110491111.7A
Other languages
Chinese (zh)
Inventor
郭锴凌
杨弈才
徐向民
邢晓芬
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110491111.7A priority Critical patent/CN113076663A/en
Publication of CN113076663A publication Critical patent/CN113076663A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a dynamic hybrid precision model construction method and system relating to deep neural network technology. Addressing the limited number of convertible states and the precision problems of the prior art, the method constructs a mixed precision state conversion table from the average traces of the Hessian matrices of the parameters of different blocks in a full-precision model and an optional parameter precision table S. During training, several mixed precision sub-models are sampled at random in each training iteration, and an improved quantization function performs the quantization operation, yielding a mixed precision model. A mixed bit deployment state table is then formed according to the actual deployment requirements for actual deployment. By using the average Hessian traces and random sampling, the search space and the computation required for training are reduced; at the same time, the improved quantization function transfers directly between different quantization bit widths with small quantization error, so the mixed precision model can be deployed adaptively at lower bit widths while precision at higher bit widths is also improved.

Description

Dynamic hybrid precision model construction method and system
Technical Field
The invention relates to deep neural network technology, and in particular to a dynamic hybrid precision model construction method and system.
Background
In recent years, more and more smart devices, such as mobile phones and VR glasses, deploy deep neural models. Deep neural models typically require a large memory footprint and expensive computation, while the memory and computing resources of smart devices are limited. In more complex practical scenarios, even on the same device, the requirements on the deep neural model may change with battery condition, hardware aging, and other factors. This makes deployment on smart devices more difficult and hinders lightweight application of neural network algorithms.
To address the excessive memory and computation requirements of deep neural networks, researchers have proposed quantization compression of neural network models. A quantized neural network model has a smaller memory footprint and computation amount, significantly reduces storage and computing overhead, and is better suited to deployment on resource-limited lightweight devices.
For the real-time variation of the practical application scenarios of smart devices, researchers have proposed designing and training one or more neural network models for different application requirements, either by manual design or by neural architecture search. However, both approaches consume excessive design and training cost and are impractical to popularize. A more efficient solution is an adaptive model: only one model is designed and trained, with multiple transition states, each corresponding to a sub-model. The adaptive model adaptively changes its own structural parameters according to changes in the actual usage scenario and converts into different sub-models.
An adaptive quantization model combines the adaptive model with the quantization method, so that the parameters and computation precision of the model can be changed adaptively. However, current adaptive quantization models have two shortcomings: on the one hand, they cannot be applied at low parameter and computation precision; on the other hand, the available transition states are limited, each being a single-bit-width sub-model.
To solve the mismatch of sub-model statistics in adaptive models, adaptive pruning models replace the traditional BN layer with a Switch BN (Batch Normalization) layer, which comprises several BN layers of the same dimension but different parameters, each corresponding to one channel number. Adaptive quantization models also employ Switch BN layers, where each BN layer corresponds to one bit precision. Learning a parameterized Clipping Level for the activation layers of a quantization model effectively improves performance, and adaptive quantization models likewise extend the Clipping Level into a Switch Clipping Level layer, in which each ReLU activation layer receives several different learnable parameters, each corresponding to one bit precision.
In recent years, researchers have found that the Hessian matrices of the parameters of a deep neural model can be used to analyze the sensitivity of different layers or blocks of the model to noise disturbance, and have proposed training mixed precision models using Hessian matrix information. Unlike single-precision quantization models, different layers or blocks of a mixed precision model have different parameter precisions. Research shows that mixed precision models achieve better performance and precision than single-precision quantization models.
However, current adaptive quantization models are difficult to apply in lower-bit scenarios, and their convertible states are limited, which greatly restricts their popularization and application. How to effectively increase the conversion states of the adaptive quantization model using Hessian matrix information while improving precision has become an urgent problem.
Disclosure of Invention
The invention aims to provide a dynamic hybrid precision model construction method and a dynamic hybrid precision model construction system, which are used for solving the problems in the prior art.
The invention relates to a dynamic hybrid precision model construction method, which comprises the following steps:
s1, preprocessing the original data;
s2, training a full-precision model;
s3, giving an optional parameter precision table S;
s4, constructing a mixed precision state conversion table according to the traces of the Hessian matrix of the parameters of different blocks in the full-precision model and the optional parameter precision table S;
s5, initializing the full-precision model to an adaptive quantization model;
s6, training the self-adaptive quantization model, and performing quantization operation in a mode of randomly sampling a plurality of mixed precision sub-models in each iteration in the training process;
s7, obtaining a mixed precision model after the adaptive quantization model is trained;
S8, selecting suitable mixed precision sub-models according to the actual deployment requirements and the mixed precision state conversion table to form a mixed bit deployment state table for actual deployment.
In step S5, the BN layer of the adaptive quantization model is replaced with the Switch BN layer, and the ReLU activation function is replaced with the Switch Clipping Level layer.
The traces of the Hessian matrices are represented by their approximate values. The approximate value is calculated as

Tr(H_i) ≈ (1/T) · Σ_{t=1}^{T} z_i^T H_i z_i

wherein Tr(H_i) is the approximation, T is the number of iterations, H_i is the Hessian matrix, and z_i is a random matrix regenerated for each iteration. The Hessian-vector product is obtained without forming H_i explicitly:

H_i z_i = ∂(g_i^T z_i) / ∂W_i

wherein g_i is the gradient obtained from the loss function during training and W_i is the parameter matrix.
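The formula above is the standard Hutchinson trace estimator. A minimal NumPy sketch follows, using a small explicit matrix in place of the per-block Hessian-vector products so the estimate can be checked against the exact trace; the matrix values and sample count are illustrative, not from the patent:

```python
import numpy as np

def hutchinson_trace(H, T=20000, seed=0):
    """Estimate Tr(H) as (1/T) * sum_t z^T H z with z ~ N(0, I).

    In the method above, H_i z_i would come from the Hessian-vector
    product d(g_i^T z_i)/dW_i rather than an explicit matrix; a small
    explicit H is used here so the result is checkable."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    acc = 0.0
    for _ in range(T):
        z = rng.standard_normal(n)  # a fresh z is drawn every iteration
        acc += z @ H @ z
    return acc / T

H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
est = hutchinson_trace(H)  # converges to Tr(H) = 9.0 as T grows
```

Since E[z^T H z] = Tr(H) for z with identity covariance, averaging over many random draws recovers the trace without ever materializing the Hessian.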
In step S2, before training, the parameter matrix is scaled to the interval [-1, +1] and then propagated forward; during backward propagation, the parameter matrix is updated.
In step S4, the average trace of each Hessian matrix is calculated first, and the blocks are then reordered and grouped in sequence by average trace size; the precisions in the optional parameter precision table S are assigned to the groups by size, with a relatively large precision assigned to a group with a relatively large average trace.
In step S6, the original floating-point parameter w is normalized to [0, 1] to obtain a parameter ŵ. The published normalization formula appears only as an image; a standard DoReFa-style form consistent with the description is

ŵ = tanh(w) / (2 · max|tanh(w)|) + 1/2

The normalized parameter ŵ is quantized with the formula

q(ŵ) = round((2^k − 1) · ŵ) / (2^k − 1)

wherein k is the quantization precision. The quantized value q(ŵ) is remapped to the interval [-1, +1] with the formula

w_q = 2 · q(ŵ) − 1
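The normalize-quantize-remap steps described above can be sketched as follows. The tanh-based normalization is an assumption (the publication gives that formula only as an image); the quantization grid and remapping follow the description directly:

```python
import numpy as np

def quantize_weights(w, k):
    """k-bit weight quantization matching the three steps above:
    normalize to [0, 1], quantize to a uniform grid with 2^k - 1 steps,
    remap to [-1, +1].  The tanh-based normalization is an assumed
    DoReFa-style form, not confirmed by the publication."""
    t = np.tanh(w)
    w_hat = t / (2.0 * np.max(np.abs(t))) + 0.5        # -> [0, 1]
    q = np.round((2 ** k - 1) * w_hat) / (2 ** k - 1)  # k-bit grid
    return 2.0 * q - 1.0                               # -> [-1, +1]

w = np.array([-1.5, -0.2, 0.0, 0.3, 1.5])
w2 = quantize_weights(w, 2)   # only 4 representable values in [-1, 1]
w8 = quantize_weights(w, 8)   # much finer grid, small quantization error
```

Note that the extreme weights map exactly to -1 and +1 at every bit width, so the same underlying weights can serve sub-models of different precisions.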
Each training iteration samples the sub-model MIN with the lowest average parameter precision together with n−1 random mixed precision sub-models from all mixed precision sub-models, i.e. n mixed precision sub-models are sampled in total per iteration.
In step S8, two optional deployment evaluation indexes, based respectively on the memory footprint and on the computation amount of the mixed precision sub-model parameters at deployment, are set to dynamically adjust the optimal classification precision;
the method specifically comprises the following steps:
constructing an index evaluation interval according to the parameter memory footprint or computation amount of the mixed precision sub-model of each state, and dividing the index evaluation interval evenly into m sub-intervals;
if the optimal classification precision of the current subinterval is superior to that of the previous subinterval, the optimal classification precision of the current subinterval is unchanged;
if the optimal classification precision of the current subinterval is inferior to that of the previous subinterval, the optimal classification precision of the current subinterval is set as that of the previous subinterval;
and if the current subinterval has no state, setting the optimal classification precision of the current subinterval as the optimal classification precision of the previous subinterval.
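The sub-interval rules above amount to a monotone "best accuracy so far" envelope over cost intervals. A sketch follows; the function name and the `states` data are illustrative, not from the patent:

```python
def best_accuracy_per_interval(states, m):
    """states: (cost, accuracy) pairs, one per mixed-precision
    sub-model, where cost is parameter memory or computation amount.
    Splits [min cost, max cost] into m equal sub-intervals and applies
    the carry-forward rules above: an interval keeps its own best
    accuracy unless the previous interval's value is higher or the
    interval is empty, in which case the previous value is used."""
    costs = [c for c, _ in states]
    lo, hi = min(costs), max(costs)
    width = (hi - lo) / m
    best, prev = [], float("-inf")
    for i in range(m):
        a, b = lo + i * width, lo + (i + 1) * width
        members = [acc for c, acc in states
                   if a <= c < b or (i == m - 1 and c == hi)]
        cur = max(members) if members else prev
        cur = max(cur, prev)  # never worse than the previous interval
        best.append(cur)
        prev = cur
    return best

# four hypothetical sub-model states; the third interval is empty
states = [(1.0, 0.60), (2.0, 0.72), (3.5, 0.70), (4.0, 0.75)]
env = best_accuracy_per_interval(states, 4)
# env == [0.60, 0.72, 0.72, 0.75]
```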
The dynamic hybrid precision model construction system provided by the invention is used for constructing a hybrid precision model, and selecting a proper hybrid precision sub-model according to an actual deployment requirement and a hybrid precision state conversion table to form a hybrid bit deployment state table for actual deployment.
An advantage of the dynamic hybrid precision model construction method and system is that a random matrix z_i is regenerated in each iteration and the adaptive quantization function is improved through multiple iterative computations; the improved adaptive quantization function allows the mixed precision model to be deployed adaptively at lower bit widths while also improving precision at higher bit widths.
The invention expands the sub-model with single bit into the sub-model with mixed precision, utilizes the trace of the Hessian matrix to represent the sensitivity of different modules of the model to quantization, and constructs the state conversion table of mixed precision, so that each state is a mixed precision model.
Compared with a traditional adaptive quantization model, the finally trained mixed precision model provides more conversion states and better precision performance without additional training time or additional memory footprint during training.
By combining the improved adaptive quantization function and the design of the mixed precision model, the conversion state of the adaptive conversion model is increased, the high accuracy of each mixed precision sub-model is kept, and the application of the adaptive quantization model to lightweight equipment is promoted.
Drawings
FIG. 1 is a schematic flow chart of a dynamic hybrid precision model construction method according to the present invention.
FIG. 2 is a schematic flow chart of the dynamic hybrid precision model construction method in calculating the approximate value of the Hessian matrix trace.
FIG. 3 is a schematic diagram of a training process of the dynamic hybrid precision model construction method in iteration.
FIG. 4 is a schematic flow chart of the dynamic hybrid precision model construction method of the present invention during deployment.
Detailed Description
As shown in fig. 1 to 4, the method for constructing the dynamic hybrid precision model of the present invention specifically includes:
and carrying out preprocessing such as zero padding, random cutting, random overturning and the like on the original data, and then carrying out 8-bit quantization to obtain input data.
Cross entropy is used as a loss function and SGD as an optimizer.
To train the full-precision model, the parameter matrix w_i is first scaled to the interval [-1, +1]. The published scaling formulas appear only as images; a standard tanh-based scaling consistent with the later quantization step is

w̃_i = tanh(w_i)
w̃_i ← w̃_i / max|w̃_i|

Forward propagation then uses the scaled parameters, and the parameter w_i is updated during backward propagation. In this embodiment, weights are used as the parameters, and the parameter matrix is a weight matrix.
Part of the training data is selected, and an approximate value of the trace of the Hessian matrix of the parameters of each block of the full-precision model is calculated, with the following sub-steps:
(1) randomly sample 2000 training data, preprocess them, and input them into the full-precision model;
(2) for the parameter matrix W_i of each block, differentiate by the chain rule to obtain the corresponding gradient g_i;
(3) for the parameter matrix of each block, randomly generate a normally distributed random matrix z_i of the same dimensions;
(4) with H_i denoting the Hessian matrix of the parameters, first calculate the scalar g_i^T z_i, then calculate the Hessian-vector product H_i z_i = ∂(g_i^T z_i) / ∂W_i;
(5) iterate (1) to (4) several times, regenerating a new random matrix z_i in each iteration, and accumulate the per-iteration result z_i^T (H_i z_i).
With T iterations, the trace of the Hessian matrix H_i is approximated as

Tr(H_i) ≈ (1/T) · Σ_{t=1}^{T} z_i^T H_i z_i

and the trace of the Hessian matrix is represented by this approximation.
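The identity used in sub-step (4), H_i z_i = ∂(g_i^T z_i)/∂W_i, can be checked on a toy quadratic loss where the Hessian is known. The finite-difference check below is purely illustrative (A, b, w, z are arbitrary values, not from the patent):

```python
import numpy as np

# Toy loss f(w) = 0.5 * w^T A w + b^T w, so g(w) = A w + b and the
# Hessian is H = A.  Sub-step (4) obtains H z as the derivative of the
# scalar g^T z with respect to w; the central finite difference below
# verifies that identity.
A = np.array([[3.0, 0.5],
              [0.5, 2.0]])
b = np.array([1.0, -1.0])
g = lambda w: A @ w + b     # gradient of the toy loss

w = np.array([0.3, -0.7])
z = np.array([1.0, 2.0])
eps = 1e-5
hvp_fd = np.array([(g(w + eps * e) @ z - g(w - eps * e) @ z) / (2 * eps)
                   for e in np.eye(2)])   # d(g^T z)/dw component-wise
hvp_exact = A @ z                          # H z for the quadratic loss
```

In a deep network the same product is obtained with one extra backward pass through g_i^T z_i, so the Hessian is never formed explicitly.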
Given an optional parameter precision table S, such as {2, 3, 4}, the average traces of the parameter Hessian matrix of each block are sorted from large to small and divided in order into groups, with the groups having larger average traces assigned greater parameter precision. The higher precisions in the optional parameter precision table S are thus allocated to the groups with larger traces, and the mixed precision state conversion table is constructed.
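The grouping-and-assignment step can be sketched as follows; the block names, the ceil-division group sizing, and the function name are illustrative assumptions:

```python
def assign_precisions(avg_traces, S):
    """avg_traces: {block name: average Hessian trace};
    S: the optional parameter precision table, e.g. [2, 3, 4].
    Blocks are sorted by trace, descending, and split in order into
    len(S) groups; groups with larger traces get higher precision,
    matching the assignment rule in the text."""
    ordered = sorted(avg_traces, key=avg_traces.get, reverse=True)
    precisions = sorted(S, reverse=True)
    size = -(-len(ordered) // len(precisions))  # ceil division
    table = {}
    for g, prec in enumerate(precisions):
        for name in ordered[g * size:(g + 1) * size]:
            table[name] = prec
    return table

traces = {"block1": 9.1, "block2": 0.4, "block3": 3.2, "block4": 1.0}
table = assign_precisions(traces, [2, 3, 4])
# table == {"block1": 4, "block3": 4, "block4": 3, "block2": 3}
```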
The traditional adaptive quantization function is modified, and the quantization operations of the adaptive quantization model are performed with the modified function. Specifically, the method comprises the following sub-steps:
For each convolutional layer, the weight parameters are normalized to [0, 1]. Let the original floating-point parameter be w and the normalized parameter be ŵ; the published normalization formula appears only as an image, and a standard DoReFa-style form consistent with the description is:

ŵ = tanh(w) / (2 · max|tanh(w)|) + 1/2

Let k be the quantization precision; the normalized parameter ŵ is quantized with the formula:

q(ŵ) = round((2^k − 1) · ŵ) / (2^k − 1)

The quantized value is remapped to the interval [-1, +1] with the formula:

w_q = 2 · q(ŵ) − 1

For the different mixed precision sub-models, different blocks apply this quantization function with their corresponding preset parameter and computation precisions.
The adaptive quantization model is initialized from the full-precision model, the BN layers and ReLU activation functions are replaced with Switch BN layers and Switch Clipping Level layers, and training is carried out.
During training, several mixed precision sub-models are randomly sampled in each iteration for training. Assuming n mixed precision sub-models are sampled per iteration, the sub-model with the lowest average parameter precision among all mixed precision sub-models is denoted MIN; each iteration samples MIN together with n−1 random sub-models for training. The loss functions computed by the n sub-networks are summed, and back propagation then updates the weight parameters.
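The per-iteration sampling rule (always include MIN, plus n−1 random states) can be sketched as follows; the state data and function name are illustrative assumptions:

```python
import random

def sample_submodels(states, n, rng=None):
    """states: mixed-precision states, each a per-block precision map.
    Always include the state with the lowest average precision (MIN),
    plus n - 1 states drawn at random, as described above."""
    rng = rng or random.Random(0)
    avg = lambda s: sum(s.values()) / len(s)
    lowest = min(states, key=avg)                     # the MIN sub-model
    rest = [rng.choice(states) for _ in range(n - 1)]
    return [lowest] + rest

states = [{"b1": 4, "b2": 4}, {"b1": 4, "b2": 2},
          {"b1": 2, "b2": 3}, {"b1": 2, "b2": 2}]
batch = sample_submodels(states, n=3)
# in one training step, the losses of these n sub-models are summed
# before a single backward pass
```

Always including MIN anchors the worst-case (lowest-precision) state every step, a sandwich-style rule that keeps the low-bit sub-models trained while random sampling covers the rest of the state space.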
The trained adaptive quantization model is a construction target of the invention, namely a mixed precision model, and proper mixed precision submodels are selected according to actual deployment requirements to form a mixed bit deployment state table for actual deployment.
The invention adopts two deployment evaluation indexes, based respectively on the parameter memory footprint and on the computation amount of the mixed precision sub-models. Specifically, the method comprises the following steps:
An index evaluation interval is constructed from the sub-model parameter footprint or computation amount of each state: the left boundary of the interval is the smallest parameter footprint or computation amount among the sub-models in the state table, and the right boundary is the largest. The interval is divided evenly into m sub-intervals, the optimal classification precision of each sub-interval is determined, and the evaluation criterion is the average of the optimal classification precisions of all sub-intervals.
Further, the optimal classification precision of each sub-interval is determined as follows:
if the optimal classification precision of the current interval is superior to that of the previous interval, the optimal classification precision of the current interval is the optimal classification precision of the current interval;
if the optimal classification precision of the current interval is inferior to that of the previous interval, the precision of the current interval is set as the optimal classification precision of the previous interval;
and if the current interval has no state, setting the current interval precision as the optimal classification precision of the previous interval.
The dynamic hybrid precision model construction system utilizes the method to construct a hybrid precision model, and selects proper hybrid precision sub-models to form a hybrid bit deployment state table for actual deployment according to actual deployment requirements and a hybrid precision state conversion table.
It will be apparent to those skilled in the art that various other changes and modifications may be made in the above-described embodiments and concepts and all such changes and modifications are intended to be within the scope of the appended claims.

Claims (10)

1. A dynamic hybrid precision model construction method is characterized by comprising the following steps:
s1, preprocessing the original data;
s2, training a full-precision model;
s3, giving an optional parameter precision table S;
s4, constructing a mixed precision state conversion table according to the traces of the Hessian matrix of the parameters of different blocks in the full-precision model and the optional parameter precision table S;
s5, initializing parameters of the full-precision model to an adaptive quantization model;
s6, training the self-adaptive quantization model, and performing quantization operation in a mode of randomly sampling a plurality of mixed precision sub-models in each iteration in the training process;
s7, obtaining a mixed precision model after the adaptive quantization model is trained;
and S8, selecting proper mixed precision submodels according to the actual deployment requirement and the mixed precision state conversion table to form a mixed bit deployment state table for actual deployment.
2. The dynamic hybrid precision model construction method according to claim 1, wherein in step S5 the BN layer of the adaptive quantization model is replaced by a Switch BN layer, and the ReLU activation function is replaced by a Switch Clipping Level layer.
3. The method of constructing a dynamic hybrid accuracy model of claim 1, wherein the traces of the hessian matrix are represented as approximations thereof.
4. The dynamic hybrid precision model construction method according to claim 3, wherein the approximate value is calculated as

Tr(H_i) ≈ (1/T) · Σ_{t=1}^{T} z_i^T H_i z_i

wherein Tr(H_i) is the approximation, T is the number of iterations, H_i is the Hessian matrix, and z_i is a random matrix regenerated for each iteration.
5. The dynamic hybrid precision model construction method of claim 4, wherein

H_i z_i = ∂(g_i^T z_i) / ∂W_i

wherein g_i is the gradient obtained from the loss function during training and W_i is the parameter matrix.
6. The method for constructing a dynamic hybrid precision model according to claim 5, wherein in step S2, the parameter matrix is scaled to the range of [ -1, +1] before training and then propagated forward, and the parameter matrix is updated when propagated backward.
7. The dynamic hybrid precision model construction method according to claim 1, wherein in step S4 the average trace of each Hessian matrix is calculated first, and the blocks are then reordered and grouped in sequence by average trace size; the precisions in the optional parameter precision table S are assigned to the groups by size, a relatively large precision being assigned to a group with a relatively large average trace.
8. The dynamic hybrid precision model construction method according to claim 1, wherein in step S6 the original floating-point parameter w is normalized to [0, 1] to obtain a parameter ŵ; the normalized parameter ŵ is quantized with the formula

q(ŵ) = round((2^k − 1) · ŵ) / (2^k − 1)

wherein k is the quantization precision; the quantized value q(ŵ) is remapped to the interval [-1, +1] with the formula

w_q = 2 · q(ŵ) − 1;

and each training iteration samples the sub-model MIN with the lowest average parameter precision together with n−1 random mixed precision sub-models from all mixed precision sub-models, i.e. a total of n mixed precision sub-models are sampled for the iteration.
9. The dynamic hybrid precision model construction method according to claim 1, wherein in step S8, two optional deployment evaluation indicators are set based on the memory occupation size and the calculation amount size of the hybrid precision submodel parameters during deployment to dynamically adjust the optimal classification precision;
the method specifically comprises the following steps:
constructing an index evaluation interval according to the parameter memory footprint or computation amount of the mixed precision sub-model of each state, and dividing the index evaluation interval evenly into m sub-intervals;
if the optimal classification precision of the current subinterval is superior to that of the previous subinterval, the optimal classification precision of the current subinterval is unchanged;
if the optimal classification precision of the current subinterval is inferior to that of the previous subinterval, the optimal classification precision of the current subinterval is set as that of the previous subinterval;
and if the current subinterval has no state, setting the optimal classification precision of the current subinterval as the optimal classification precision of the previous subinterval.
10. A dynamic hybrid precision model construction system is characterized in that a hybrid precision model is constructed by the method of any one of claims 1 to 9, and proper hybrid precision submodels are selected according to actual deployment requirements and a hybrid precision state conversion table to form a hybrid bit deployment state table for actual deployment.
CN202110491111.7A 2021-05-06 2021-05-06 Dynamic hybrid precision model construction method and system Pending CN113076663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110491111.7A CN113076663A (en) 2021-05-06 2021-05-06 Dynamic hybrid precision model construction method and system


Publications (1)

Publication Number Publication Date
CN113076663A (en) 2021-07-06

Family

ID=76616357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110491111.7A Pending CN113076663A (en) 2021-05-06 2021-05-06 Dynamic hybrid precision model construction method and system

Country Status (1)

Country Link
CN (1) CN113076663A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676797A (en) * 2022-05-27 2022-06-28 浙江大华技术股份有限公司 Model precision calculation method and device and computer readable storage medium
CN118035628A (en) * 2024-04-11 2024-05-14 清华大学 Matrix vector multiplication operator realization method and device supporting mixed bit quantization
CN118035628B (en) * 2024-04-11 2024-06-11 清华大学 Matrix vector multiplication operator realization method and device supporting mixed bit quantization


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210706