CN111652366A - Combined neural network model compression method based on channel pruning and quantitative training - Google Patents

Combined neural network model compression method based on channel pruning and quantitative training

Info

Publication number
CN111652366A
Authority
CN
China
Prior art keywords
layer
pruning
model
quantization
training
Prior art date
Legal status
Pending
Application number
CN202010388100.1A
Other languages
Chinese (zh)
Inventor
徐磊
何林
苏华友
刘小龙
罗荣
张海涛
李君宝
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202010388100.1A
Publication of CN111652366A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a combined neural network model compression method based on channel pruning and quantization training. Step 1: sparse training of the model; step 2: pruning the trained model; step 3: fine-tuning the model; step 4: after pruning is finished, quantizing the model and constructing a conventional floating-point computation graph; step 5: inserting pseudo-quantization modules at the positions in the computation graph corresponding to the convolution calculations, namely two pseudo-quantization modules at the convolution weights and at the activation values, quantizing the weights and activations to 8-bit integers; step 6: dynamically quantizing and training the model until convergence; step 7: quantized inference; step 8: finally obtaining the pruned and quantized model. While preserving the model accuracy, the invention greatly reduces the time and space consumption of the model by means of the two techniques of pruning and quantization.

Description

Combined neural network model compression method based on channel pruning and quantitative training
Technical Field
The invention belongs to the technical field of data processing, and in particular relates to a combined neural network model compression method based on channel pruning and quantization training.
Background
Existing neural network pruning algorithms can be divided into three main steps: sparse training, cutting off the channels with little influence, and fine-tuning on a data set. Existing pruning algorithms often evaluate the importance of a channel by computing the average of the convolution filter parameters. However, this evaluation only considers the influence of the convolution operation on the feature map and ignores the influence of the BN layer, so a network pruned in this way suffers a considerable loss in performance. In terms of quantization, the existing approach is mainly static quantization performed after model training is completed. The quantization parameters obtained in this way carry certain errors, and there is no way to adjust them on a data set, so the accuracy of the quantized model suffers a certain loss.
Disclosure of Invention
In order to solve the problem that neural network models are difficult to deploy on general computing equipment because of their large number of parameters and large amount of computation, the invention designs a combined neural network model compression method based on channel pruning and quantization training, which greatly reduces the time and space consumption of the model through the two techniques of pruning and quantization while maintaining the model accuracy.
The invention is realized by the following technical scheme:
a combined neural network model compression method based on channel pruning and quantitative training comprises the following steps:
step 1: sparse training: during training, an L1 norm penalty is applied to the parameters of the BN layers that follow the convolutional layers to be sparsified, so that these parameters acquire structured sparsity in preparation for the subsequent channel pruning;
step 2: pruning the trained model: according to the correspondence between convolutional layers and BN layers in the model, the channels corresponding to small γ parameters in the BN layers are pruned, proceeding layer by layer from shallow to deep to form a new channel-pruned model;
step 3: fine-tuning the model: the pruned model is trained further on the data set with the learning rate appropriately reduced to a small fraction of its previous value, until the model accuracy no longer improves, which ends the channel pruning;
step 4: after pruning is finished, the model is quantized: a conventional floating-point computation graph is constructed;
step 5: pseudo-quantization modules are inserted at the positions in the computation graph corresponding to the convolution calculations, namely two pseudo-quantization modules at the convolution weights and at the activation values, quantizing the weights and activations to 8-bit integers;
step 6: the model is dynamically quantized and trained until convergence; during quantization training, both the convolutional layer weights and the activation values need to be quantized;
step 7: quantized inference: the quantization parameters of the convolutional layer weights and activation values, namely the scaling coefficient S and the zero point Z, are saved, completing the quantization training;
step 8: finally, the pruned and quantized model is obtained.
Further, the sparse training of step 1 specifically comprises:
step 1.1: constructing the original convolutional neural network model, traversing each layer of the model, finding the BN layer that follows each convolutional layer, and adding each such BN layer to a BN layer list;
step 1.2: setting the training hyper-parameters of the original convolutional neural network model, wherein the sparsification coefficient λ is between 0.0001 and 0.01;
step 1.3: after the hyper-parameters are set, performing sparse training;
carrying out forward propagation, and computing the gradient of each layer's parameters by backward propagation; before the gradients are applied, an L1 norm penalty is imposed on the γ parameters of each BN layer in the BN layer list;
collecting the absolute values of all γ parameters of the BN layers during training, sorting them, and listing the γ value at each quantile;
judging the sparsification level from these values, where the smaller the parameter values, the higher the sparsification level;
the training process is continued until neither the accuracy index nor the sparsification level increases any more;
after training stops, saving the trained model and its structure, and computing the number of parameters and the computational cost of the model.
Further, in step 1.3, an L1 norm penalty is imposed on the γ parameter in each BN layer in the BN layer list, as shown below,
L' = L + λ·Ω(w),  Ω(w) = Σ|γ|    (1)

In the above formula, Ω(w) represents the L1 norm of the BN layer γ parameters; it is multiplied by the sparsification coefficient λ and added to the original objective function L to form the new objective function L';
the calculation process of the BN layer is shown as the following formula,
z_out = γ·(z_in - μ)/√(σ² + ε) + β    (2)

where z_in represents the input tensor of the layer, μ and σ² are the per-channel mean and variance of the tensor, ε is a small value that ensures the stability of the numerical calculation, and γ and β are the two trainable parameters of the BN layer, representing the scaling and the offset of the layer respectively; the γ parameter is the target parameter of the L1 norm penalty.
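By way of illustration, the penalty of step 1.3 can be sketched in PyTorch as follows; the helper names and the specific λ value are assumptions made for this example and are not prescribed by the invention.

```python
import torch
import torch.nn as nn

def collect_bn_layers(model: nn.Module):
    """Gather BN layers for the BN layer list. For simplicity this sketch takes
    every BatchNorm2d module; a full implementation would keep only BN layers
    that directly follow a convolutional layer."""
    return [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

def add_l1_penalty_to_bn(bn_layers, lam=0.01):
    """After loss.backward(), add the sub-gradient of lam * |gamma|_1 to the
    gradient of each BN scaling parameter gamma, i.e. apply the penalty before
    the gradients are used by the optimizer."""
    for bn in bn_layers:
        if bn.weight.grad is not None:
            bn.weight.grad.add_(lam * torch.sign(bn.weight.data))

# Usage inside one training step (sketch):
#   loss = criterion(model(x), y)
#   loss.backward()
#   add_l1_penalty_to_bn(bn_layers, lam=0.01)   # lambda chosen in [0.0001, 0.01]
#   optimizer.step()
```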
Further, the model pruning of step 2 specifically comprises:
step 2.1: traversing the model from front to back and finding the BN layer corresponding to each convolutional layer; if a convolutional layer has no corresponding BN layer, it is not pruned; the output layer of the network cannot be pruned either, because its output channels are constrained by the target task; the parts to be pruned are marked and the pruning information is summarized into a table; a part with a short-circuit (shortcut) connection can be regarded as having multiple inputs;
step 2.2: globally sorting the γ parameters of all BN layers and computing the pruning threshold of the γ parameters from the pruning ratio; computing the minimum of the per-layer maxima of the γ parameters and taking this value as the upper limit of the pruning threshold, since exceeding it would cause an entire layer to be cut off, as illustrated in the sketch following step 2.3.3 below;
step 2.3: traversing the pruning information table of step 2.1 from front to back and classifying the entries into the following three cases:
the combination of an unconstrained convolutional layer plus a BN layer;
the combination of a constrained convolutional layer plus a BN layer;
a residual block structure with a short-circuit connection;
step 2.4: redefining the network model according to the number of channels remaining after pruning each layer, and saving the parameters of the new pruned model.
Further, the step 2.3 is specifically,
step 2.3.1: for the combination of an unconstrained convolutional layer plus a BN layer: the convolution filters corresponding to the input-channel pruning mask and to the output pruning mask are pruned; the output pruning mask is composed of the indices of the convolution filters whose γ parameters in the BN layer of the current convolutional layer are smaller than the threshold, and pruning is realized by recombining the parameters of the convolutional layer and of the BN layer; if the current convolutional layer is a depthwise separable convolution, i.e. the number of groups equals the number of input channels, the output pruning mask is the same as the input pruning mask, and the number of groups after pruning equals the number of remaining convolution filters;
step 2.3.2: for the combination of a constrained convolutional layer plus a BN layer: the output channels are not pruned; only the convolution filters corresponding to the input-channel pruning mask are pruned, again by recombining the parameters of the convolutional layer and of the BN layer;
step 2.3.3: for a residual block structure with a short-circuit connection: the number of output channels of the last layer in the residual block must equal the number of input channels of the residual block, so during pruning the output pruning mask of the last layer of the residual block is set equal to the output pruning mask of the layer preceding the residual block, which guarantees a well-formed model structure after pruning.
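An illustrative PyTorch-style sketch of the global threshold of step 2.2 and the channel masks of step 2.3 is given below; the function names and the 40% pruning ratio are assumptions of the example only.

```python
import torch
import torch.nn as nn

def compute_prune_threshold(bn_layers, prune_ratio=0.4):
    """Globally sort |gamma| over all BN layers and take the value at the
    requested percentile as the pruning threshold; cap it by the smallest
    per-layer maximum so that no layer is pruned away completely
    (prune_ratio is assumed to be < 1)."""
    all_gammas = torch.cat([bn.weight.data.abs().flatten() for bn in bn_layers])
    sorted_gammas, _ = torch.sort(all_gammas)
    threshold = sorted_gammas[int(len(sorted_gammas) * prune_ratio)]
    upper_limit = min(bn.weight.data.abs().max() for bn in bn_layers)
    return torch.min(threshold, upper_limit)

def channel_masks(bn_layers, threshold):
    """Output pruning mask per layer: keep the channels whose |gamma| is not
    below the threshold; the kept indices define the remaining filters."""
    return [bn.weight.data.abs() >= threshold for bn in bn_layers]
```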
Further, step 6, dynamically quantizing and training the model until convergence, specifically comprises:
step 6.1: in quantization training, the input is still an unquantized floating-point number; the convolutional layer parameters pass through a pseudo-quantization module before taking part in the floating-point computation, the intermediate convolution itself is carried out entirely in floating point, and the activation values produced by the activation function are then passed through another pseudo-quantization module;
step 6.2: because the weight distribution of a convolutional layer is concentrated, fixed and independent of the model input, the convolutional layer parameters are quantized layer by layer; the activation values are influenced by the model input and can fluctuate over a large range, so a channel-by-channel quantization method is adopted for them; the quantization is computed by the following formulas,
clamp(r; a, b) := min(max(r, a), b)    (3)

s(a, b, n) := (b - a)/(n - 1)    (4)

q(r; a, b, n) := [(clamp(r; a, b) - a)/s(a, b, n)]·s(a, b, n) + a    (5)

where r represents the floating-point number being quantized; [a, b] is the quantization range; n is the number of quantization levels, with n = 2^8 = 256 for 8-bit integer quantization; [·] denotes rounding to the nearest integer; and q(r; a, b, n) is the result after quantization;
in the above formulas, the quantization result can be computed from a floating-point value once the quantization range [a, b] is determined; during training, because the input and the model parameters change constantly, the distribution of the quantities to be quantized must be observed in order to determine the quantization range;
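A minimal sketch of the pseudo-quantization computation of formulas (3) to (5) is given below, assuming the step size of formula (4) is (b - a)/(n - 1) as reconstructed above; it quantizes and immediately dequantizes, so training continues in floating point.

```python
import torch

def fake_quantize(r: torch.Tensor, a: float, b: float, n: int = 256) -> torch.Tensor:
    """Simulated (fake) quantization of formulas (3)-(5): clamp to [a, b],
    round to one of n levels, then map back to floating point so the rest of
    the computation can stay in float during training."""
    s = (b - a) / (n - 1)                 # quantization step, formula (4)
    r_clamped = torch.clamp(r, a, b)      # formula (3)
    q = torch.round((r_clamped - a) / s)  # nearest integer level
    return q * s + a                      # dequantized result, formula (5)
```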
step 6.3: for the quantization range of the convolutional layer parameters, for each convolutional layer parameter tensor w, take a := min w and b := max w; this quantization range converges as the convolutional layer parameters converge;
step 6.4: for the quantization range of the activation value, the quantization range needs to be calculated independently for each channel;
so that the quantization range of the activation values reflects their distribution over the entire data set, an exponential moving average of the observed quantization range is computed, as follows,

S_t = α×Y_t + (1 - α)×S_{t-1}    (6)

where α is the moving-average coefficient, with a value between 0 and 1 taken close to 1 so that the average reflects the long-term statistics; Y_t is the value observed at the current step, and S_t is the exponential moving average at time t;
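The per-channel moving-average range of formula (6) can be sketched as follows; the value α = 0.99 and the assumption of NCHW activation tensors are illustrative only.

```python
import torch

class EmaRange:
    """Per-channel exponential moving average of the observed activation range,
    following formula (6)."""
    def __init__(self, alpha: float = 0.99):
        self.alpha = alpha
        self.min_val = None
        self.max_val = None

    def update(self, x: torch.Tensor):
        # x: activation tensor of shape (batch, channels, H, W)
        cur_min = x.amin(dim=(0, 2, 3))
        cur_max = x.amax(dim=(0, 2, 3))
        if self.min_val is None:
            self.min_val, self.max_val = cur_min, cur_max
        else:
            self.min_val = self.alpha * cur_min + (1 - self.alpha) * self.min_val
            self.max_val = self.alpha * cur_max + (1 - self.alpha) * self.max_val
```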
step 6.5: the BN layer operation is fused into the convolution during training; the parameters of the fused convolutional layer are computed as follows,

w_fold := γ·w/√(EMA(σ_B²) + ε)    (7)

where γ is the γ parameter of the BN layer, EMA(σ_B²) is the exponential moving average of the per-batch variance of the convolutional layer outputs, ε is a small constant, and w and w_fold are the convolutional layer weights before and after fusion, respectively.
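A minimal sketch of the weight fusion of formula (7) follows; the bias and β folding are analogous and omitted, and the function name is an assumption of this example.

```python
import torch

def fold_bn_into_conv(conv_weight: torch.Tensor,
                      bn_gamma: torch.Tensor,
                      bn_running_var: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    """Fold the BN scaling into the convolution weights, formula (7):
    w_fold = gamma * w / sqrt(EMA(var) + eps).
    conv_weight has shape (out_channels, in_channels, kH, kW); the BN
    parameters are per output channel."""
    scale = bn_gamma / torch.sqrt(bn_running_var + eps)
    return conv_weight * scale.reshape(-1, 1, 1, 1)
```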
Further, the quantized inference of step 7 specifically comprises:
quantizing the fused bias parameters to 32-bit integers, with their scaling coefficient S taken as the product of the convolutional layer weight scaling coefficient and the input scaling coefficient, and their zero point Z set to 0;
the quantized inference needs to solve the problem of carrying out the convolution matrix operations using only integers, as follows;
first, the scaling coefficient S and the zero point Z of each layer are calculated from the obtained quantization range:

S = s(a, b, n)
Z = [q(0.0; a, b, n)]    (8)

the functions in the two equations above were defined earlier; a quantized value can be dequantized back to a floating-point number using the scaling coefficient S and the zero point Z, according to the dequantization formula

r = S(q - Z)    (9)
consider the multiplication of two N×N real matrices r_1 and r_2 with result r_3; for α ∈ {1, 2, 3} and 1 ≤ i, j ≤ N, let r_α^(i,j) denote the entry in the i-th row and j-th column of r_α, let (S_α, Z_α) denote the quantization parameters of matrix r_α, and let q_α^(i,j) denote the corresponding quantized entries.
The dequantization formula can then be written entry-wise as:

r_α^(i,j) = S_α·(q_α^(i,j) - Z_α)    (10)

From the matrix multiplication one obtains:

S_3·(q_3^(i,k) - Z_3) = Σ_{j=1}^{N} S_1·(q_1^(i,j) - Z_1)·S_2·(q_2^(j,k) - Z_2)    (11)

The above formula can be rewritten as:

q_3^(i,k) = Z_3 + M·Σ_{j=1}^{N} (q_1^(i,j) - Z_1)·(q_2^(j,k) - Z_2)    (12)

where

M := S_1·S_2/S_3    (13)

M is the only non-integer quantity in the formula, but in practice its value lies between 0 and 1, so representing M with a 32-bit fixed-point number meets the precision requirement; q_3 is the quantized result of the computation.
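The integer-only evaluation of formulas (12) and (13) can be sketched as follows; representing M as a normalized fixed-point multiplier plus a shift is one common realization and is an assumption of this example rather than a prescription of the invention.

```python
import numpy as np

def quantize_multiplier(M: float):
    """Represent the real multiplier M in (0, 1) as a 31-bit integer M0 and a
    right shift, so that M is approximately M0 * 2^(-31 - shift)."""
    assert 0.0 < M < 1.0
    shift = 0
    while M < 0.5:
        M *= 2.0
        shift += 1
    return int(round(M * (1 << 31))), shift

def quantized_matmul(q1, Z1, q2, Z2, S1, S2, S3, Z3):
    """Integer-only evaluation of formula (12):
    q3 = Z3 + M * sum_j (q1 - Z1)(q2 - Z2), with M = S1*S2/S3 (formula (13))
    applied as a fixed-point multiply followed by a shift (truncating rounding).
    q1 and q2 are assumed to be uint8 arrays, Z1/Z2/Z3 integer zero points."""
    M0, shift = quantize_multiplier(S1 * S2 / S3)
    acc = (q1.astype(np.int32) - Z1) @ (q2.astype(np.int32) - Z2)   # int32 accumulator
    scaled = (acc.astype(np.int64) * M0) >> (31 + shift)            # apply M without floats
    return np.clip(Z3 + scaled, 0, 255).astype(np.uint8)
```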
The invention has the beneficial effects that:
1. The invention introduces no extra variables during pruning; it directly constrains the BN layer parameters and thereby makes better use of the scaling effect of the BN γ parameters, achieving a better pruning effect than existing pruning methods and better acceleration on common hardware.
2. Unlike prior methods that train first and quantize afterwards, the quantization training method directly simulates the quantization process during training, overcoming the defect of post-training quantization that the quantization parameters cannot be adjusted. The quantization method of the invention has a smaller precision loss and is suitable for various common convolution models.
3. The inference scheme designed by the invention completely avoids floating-point operations and achieves better acceleration on certain hardware.
4. The two model compression methods are orthogonal to each other, can be used independently or jointly, and can achieve a better model compression effect.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flow chart of the pruning algorithm of the present invention.
FIG. 3 is a schematic diagram of the pruning algorithm of the present invention.
FIG. 4 is a schematic diagram of pruning in a multi-layer structure according to the pruning algorithm of the present invention.
FIG. 5 is a schematic diagram of pruning in a residual structure by the pruning algorithm of the present invention.
FIG. 6 is a flow chart of the quantization algorithm of the present invention.
FIG. 7 is a schematic diagram of the quantized inference computation of the present invention.
FIG. 8 is a diagram of the computation of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
A combined neural network model compression method based on channel pruning and quantization training comprises the following steps. Channel pruning reduces the number of neural network channels; quantization training replaces floating-point operations with integer operations.
Step 1: the sparse training model applies L1 norm punishment to the BN layer parameters after the convolutional layer needing to be sparse in the training process, so that the parameters have the characteristic of structured sparsity and are prepared for next channel cutting;
step 2: training model pruning, pruning channels corresponding to the convolutional layers with small gamma parameters in the BN layer according to the corresponding relation between the convolutional layers and the BN layer in the model in the pruning process, and pruning each layer shallowly and deeply to form a new model after channel pruning;
and step 3: fine adjustment of the model, continuing training the model after pruning on the data set, and properly reducing the learning rate to the previous one
Figure BDA0002484838070000061
Training until the model precision is not improved any more, and ending channel pruning;
and 4, step 4: quantizing the model after pruning is finished, and constructing a conventional floating point number calculation graph; 32-bit floating point numbers are adopted in general training;
and 5: inserting pseudo quantization modules at corresponding positions of convolution calculation in a calculation diagram, and inserting two pseudo quantization modules at convolution weight positions and activation value positions in order to simulate quantization effects in actual quantization estimation, so as to quantize the weight and activation values into 8-bit integer;
step 6: dynamically quantizing the training model until convergence, wherein in quantization training, the weight and the activation value of the convolutional layer need to be quantized;
and 7: quantizing reasoning, namely saving the quantization parameters of the convolutional layer weight and the activation value, scaling the coefficient S and the zero point Z, and finishing quantization training;
and 8: finally, a model after pruning and quantification is obtained.
Further, the sparse training of step 1 specifically comprises:
step 1.1: constructing the original convolutional neural network model, traversing each layer of the model, finding the BN layer that follows each convolutional layer, and adding each such BN layer to a BN layer list;
step 1.2: setting the training hyper-parameters of the original convolutional neural network model, wherein the sparsification coefficient λ is between 0.0001 and 0.01, generally 0.01, and the remaining training hyper-parameters are the same as in training without sparsification;
step 1.3: after the hyper-parameters are set, performing sparse training;
carrying out forward propagation, and computing the gradient of each layer's parameters by backward propagation; before the gradients are applied, an L1 norm penalty is imposed on the γ parameters of each BN layer in the BN layer list;
collecting the absolute values of all γ parameters of the BN layers during training, sorting them, and listing the γ value at each quantile;
judging the sparsification level from these values, where the smaller the parameter values, the higher the sparsification level;
the training process is continued until neither the accuracy index nor the sparsification level increases any more;
after training stops, saving the trained model and its structure, and computing the number of parameters and the computational cost of the model.
Further, in step 1.3, an L1 norm penalty is imposed on the γ parameter in each BN layer in the BN layer list, as shown below,
L' = L + λ·Ω(w),  Ω(w) = Σ|γ|    (1)

In the above formula, Ω(w) represents the L1 norm of the BN layer γ parameters; it is multiplied by the sparsification coefficient λ and added to the original objective function L to form the new objective function L';
the calculation process of the BN layer is shown as the following formula,
z_out = γ·(z_in - μ)/√(σ² + ε) + β    (2)

where z_in represents the input tensor of the layer, μ and σ² are the per-channel mean and variance of the tensor, ε is a small value that ensures the stability of the numerical calculation, and γ and β are the two trainable parameters of the BN layer, representing the scaling and the offset of the layer respectively; the γ parameter is the target parameter of the L1 norm penalty.
Further, the model pruning of step 2 specifically comprises:
step 2.1: traversing the model from front to back and finding the BN layer corresponding to each convolutional layer; if a convolutional layer has no corresponding BN layer, it is not pruned; the output layer of the network cannot be pruned either, because its output channels are constrained by the target task; the parts to be pruned are marked and the pruning information is summarized into a table; a part with a short-circuit (shortcut) connection can be regarded as having multiple inputs;
step 2.2: globally sorting the γ parameters of all BN layers and computing the pruning threshold of the γ parameters from the pruning ratio; computing the minimum of the per-layer maxima of the γ parameters and taking this value as the upper limit of the pruning threshold, since exceeding it would cause an entire layer to be cut off;
step 2.3: traversing the pruning information table of step 2.1 from front to back and classifying the entries into the following three cases:
the combination of an unconstrained convolutional layer plus a BN layer;
the combination of a constrained convolutional layer plus a BN layer;
a residual block structure with a short-circuit connection;
step 2.4: redefining the network model according to the number of channels remaining after pruning each layer, and saving the parameters of the new pruned model.
Further, the step 2.3 is specifically,
step 2.3.1: for the combination of an unconstrained convolutional layer plus a BN layer: the convolution filters corresponding to the input-channel pruning mask (i.e. the pruning result of the output channels of the previous layer) and to the output pruning mask are pruned; the output pruning mask is composed of the indices of the convolution filters whose γ parameters in the BN layer of the current convolutional layer are smaller than the threshold, and pruning is realized by recombining the parameters of the convolutional layer and of the BN layer; if the current convolutional layer is a depthwise separable convolution, i.e. the number of groups equals the number of input channels, the output pruning mask is the same as the input pruning mask, and the number of groups after pruning equals the number of remaining convolution filters;
step 2.3.2: for the combination of a constrained convolutional layer plus a BN layer: the output channels are not pruned; only the convolution filters corresponding to the input-channel pruning mask (i.e. the pruning result of the output channels of the previous layer) are pruned, again by recombining the parameters of the convolutional layer and of the BN layer;
step 2.3.3: for a residual block structure with a short-circuit connection: the number of output channels of the last layer in the residual block must equal the number of input channels of the residual block, so during pruning the output pruning mask of the last layer of the residual block is set equal to the output pruning mask of the layer preceding the residual block, which guarantees a well-formed model structure after pruning.
Further, step 6, dynamically quantizing and training the model until convergence, specifically comprises:
step 6.1: in quantization training, the input is still an unquantized floating-point number; the convolutional layer parameters pass through a pseudo-quantization module before taking part in the floating-point computation, the intermediate convolution itself is carried out entirely in floating point, and the activation values produced by the activation function are then passed through another pseudo-quantization module;
step 6.2: because the weight distribution of a convolutional layer is concentrated, fixed and independent of the model input, the convolutional layer parameters are quantized layer by layer; the activation values are influenced by the model input and can fluctuate over a large range, so a channel-by-channel quantization method is adopted for them; the quantization is computed by the following formulas,
clamp(r; a, b) := min(max(r, a), b)    (3)

s(a, b, n) := (b - a)/(n - 1)    (4)

q(r; a, b, n) := [(clamp(r; a, b) - a)/s(a, b, n)]·s(a, b, n) + a    (5)

where r represents the floating-point number being quantized; [a, b] is the quantization range; n is the number of quantization levels, with n = 2^8 = 256 for 8-bit integer quantization; [·] denotes rounding to the nearest integer; and q(r; a, b, n) is the result after quantization;
in the above formulas, the quantization result can be computed from a floating-point value once the quantization range [a, b] is determined; during training, because the input and the model parameters change constantly, the distribution of the quantities to be quantized must be observed in order to determine the quantization range;
step 6.3: for the quantization range of the convolutional layer parameters, for each convolutional layer parameter tensor w, take a := min w and b := max w; this quantization range converges as the convolutional layer parameters converge;
step 6.4: for the quantization range of the activation value, the quantization range needs to be calculated independently for each channel;
because the activation values are unstable at the beginning of training, they are not quantized at that stage; generally, after about one quarter of the training process, the activation values are observed channel by channel and their quantization range is computed in the same way as above; so that the quantization range of the activation values reflects their distribution over the entire data set, an exponential moving average of the observed quantization range is computed, as follows,

S_t = α×Y_t + (1 - α)×S_{t-1}    (6)

where α is the moving-average coefficient, with a value between 0 and 1 taken close to 1 so that the average reflects the long-term statistics; Y_t is the value observed at the current step, and S_t is the exponential moving average at time t;
step 6.5: for networks containing BN layers, BN is a stand-alone operation in ordinary training, whereas in a model optimized for quantization the BN operation is usually merged into the convolutional layer; in order to simulate the effect of this difference, the BN layer operation is fused into the convolution during training; the parameters of the fused convolutional layer are computed as follows,

w_fold := γ·w/√(EMA(σ_B²) + ε)    (7)

where γ is the γ parameter of the BN layer, EMA(σ_B²) is the exponential moving average of the per-batch variance of the convolutional layer outputs, ε is a small constant, and w and w_fold are the convolutional layer weights before and after fusion, respectively.
In the quantization training process, the quantization ranges and the BN layer parameters need to be frozen at an appropriate time, so that the network can learn its weights under static quantization parameters and BN statistics and thus better simulate the real inference process. The two are usually frozen in turn at 10% to 20% of the whole training process.
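The freezing schedule described above can be sketched as follows; the module attribute names and the exact freezing fractions are assumptions of this example.

```python
import torch

def maybe_freeze(model, step, total_steps,
                 bn_freeze_fraction=0.1, range_freeze_fraction=0.2):
    """Freeze BN statistics and then the quantization ranges once the given
    fractions of training are reached, so the network finishes learning its
    weights under static quantization parameters."""
    if step >= int(total_steps * bn_freeze_fraction):
        for m in model.modules():
            if isinstance(m, torch.nn.BatchNorm2d):
                m.eval()                      # stop updating running mean/variance
    if step >= int(total_steps * range_freeze_fraction):
        for m in model.modules():
            if hasattr(m, "ema_range"):       # pseudo-quantization modules (assumed attribute)
                m.ema_range.frozen = True     # stop updating the observed [a, b] range
```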
Further, the quantized inference of step 7 specifically comprises:
quantizing the fused bias parameters to 32-bit integers, with their scaling coefficient S taken as the product of the convolutional layer weight scaling coefficient and the input scaling coefficient, and their zero point Z set to 0;
in the quantized inference, the invention realizes fully integer inference, i.e. no floating-point operations are used during model inference; the quantized inference needs to solve the problem of carrying out the convolution matrix operations using only integers, as follows;
first, the scaling coefficient S and the zero point Z of each layer are calculated from the obtained quantization range:

S = s(a, b, n)
Z = [q(0.0; a, b, n)]    (8)

the functions in the two equations above were defined earlier; a quantized value can be dequantized back to a floating-point number using the scaling coefficient S and the zero point Z, according to the dequantization formula

r = S(q - Z)    (9)
consider the multiplication of two N×N real matrices r_1 and r_2 with result r_3; for α ∈ {1, 2, 3} and 1 ≤ i, j ≤ N, let r_α^(i,j) denote the entry in the i-th row and j-th column of r_α, let (S_α, Z_α) denote the quantization parameters of matrix r_α, and let q_α^(i,j) denote the corresponding quantized entries.
The dequantization formula can then be written entry-wise as:

r_α^(i,j) = S_α·(q_α^(i,j) - Z_α)    (10)

From the matrix multiplication one obtains:

S_3·(q_3^(i,k) - Z_3) = Σ_{j=1}^{N} S_1·(q_1^(i,j) - Z_1)·S_2·(q_2^(j,k) - Z_2)    (11)

The above formula can be rewritten as:

q_3^(i,k) = Z_3 + M·Σ_{j=1}^{N} (q_1^(i,j) - Z_1)·(q_2^(j,k) - Z_2)    (12)

where

M := S_1·S_2/S_3    (13)

M is the only non-integer quantity in the formula, but in practice its value lies between 0 and 1, so representing M with a 32-bit fixed-point number meets the precision requirement; q_3 is the quantized result of the computation.
Example 2
An improved YOLOv3 network was compressed using the pruning algorithm of the present invention. The improved YOLOv3 architecture employs MobileNetv2 as the feature extractor and replaces the ordinary convolutions with depthwise separable convolutions to reduce the amount of computation. The improved YOLOv3 network achieves a test-set mAP of 78.46% on the VOC data set. For a 512 × 512 input image, the computational cost is 4.15 GMACs and the model has 6.775M parameters.
The model is trained on the VOC training set for 80 epochs using standard data augmentation methods including random cropping, perspective transformation and horizontal flipping, with mixup augmentation additionally applied. The Adam optimizer and a cosine annealing learning rate schedule are used, with an initial learning rate of 4e-3 and a batch size of 16. The subsequent sparsification training and fine-tuning both use the same hyper-parameter settings.
In the sparse training, the sparsification coefficient is set to 0.01 and the model is trained from scratch on the VOC data set for 80 epochs, reaching a test-set mAP of 75.65%. After pruning 40% of the channels and 20 epochs of fine-tuning, the model finally reaches a test-set mAP of 75.44%, a precision drop of 3.0% compared with the unpruned model. The computational cost drops to 1.74 GMACs and the number of parameters to 2.31M, reductions of 58.1% and 65.9% respectively compared with the unpruned model.
The pruned model is then quantized using the quantization training algorithm of the invention. Int8 quantization is adopted, the pruned model is quantization-trained on the VOC data set, and the same hyper-parameter settings are used. The BN layer parameters are frozen after 10 epochs and the quantization parameters after 15 epochs. The final quantized model achieves 76.74% mAP on the test set, 1.7% lower than the original model.
The speed of the models is tested on a platform with an E5-2630 v4 CPU. The tests are performed on the VOC test set and the results are shown in the table below.
Table 1: Pruning and quantization model speed test results
The table shows that the combined pruning and quantization method of the present invention greatly accelerates even a small model such as MobileNetv2, with only a small loss of precision.

Claims (7)

1. A combined neural network model compression method based on channel pruning and quantitative training is characterized by comprising the following steps:
step 1: sparse training: during training, an L1 norm penalty is applied to the parameters of the BN layers that follow the convolutional layers to be sparsified, so that these parameters acquire structured sparsity in preparation for the subsequent channel pruning;
step 2: pruning the trained model: according to the correspondence between convolutional layers and BN layers in the model, the channels corresponding to small γ parameters in the BN layers are pruned, proceeding layer by layer from shallow to deep to form a new channel-pruned model;
step 3: fine-tuning the model: the pruned model is trained further on the data set with the learning rate appropriately reduced to a small fraction of its previous value, until the model accuracy no longer improves, which ends the channel pruning;
step 4: after pruning is finished, the model is quantized: a conventional floating-point computation graph is constructed;
step 5: pseudo-quantization modules are inserted at the positions in the computation graph corresponding to the convolution calculations, namely two pseudo-quantization modules at the convolution weights and at the activation values, quantizing the weights and activations to 8-bit integers;
step 6: the model is dynamically quantized and trained until convergence; during quantization training, both the convolutional layer weights and the activation values need to be quantized;
step 7: quantized inference: the quantization parameters of the convolutional layer weights and activation values, namely the scaling coefficient S and the zero point Z, are saved, completing the quantization training;
step 8: finally, the pruned and quantized model is obtained.
2. The compression method according to claim 1, wherein the sparse training of step 1 specifically comprises:
step 1.1: constructing the original convolutional neural network model, traversing each layer of the model, finding the BN layer that follows each convolutional layer, and adding each such BN layer to a BN layer list;
step 1.2: setting the training hyper-parameters of the original convolutional neural network model, wherein the sparsification coefficient λ is between 0.0001 and 0.01;
step 1.3: after the hyper-parameters are set, performing sparse training;
carrying out forward propagation, and computing the gradient of each layer's parameters by backward propagation; before the gradients are applied, an L1 norm penalty is imposed on the γ parameters of each BN layer in the BN layer list;
collecting the absolute values of all γ parameters of the BN layers during training, sorting them, and listing the γ value at each quantile;
judging the sparsification level from these values, where the smaller the parameter values, the higher the sparsification level;
the training process is continued until neither the accuracy index nor the sparsification level increases any more;
after training stops, saving the trained model and its structure, and computing the number of parameters and the computational cost of the model.
3. The compression method according to claim 2, wherein an L1 norm penalty is imposed on the gamma parameter in each BN layer in the BN layer list in step 1.3, as shown in the following formula,
L' = L + λ·Ω(w),  Ω(w) = Σ|γ|    (1)

In the above formula, Ω(w) represents the L1 norm of the BN layer γ parameters; it is multiplied by the sparsification coefficient λ and added to the original objective function L to form the new objective function L';
the calculation process of the BN layer is shown as the following formula,
z_out = γ·(z_in - μ)/√(σ² + ε) + β    (2)

where z_in represents the input tensor of the layer, μ and σ² are the per-channel mean and variance of the tensor, ε is a small value that ensures the stability of the numerical calculation, and γ and β are the two trainable parameters of the BN layer, representing the scaling and the offset of the layer respectively; the γ parameter is the target parameter of the L1 norm penalty.
4. The compression method according to claim 1, wherein the model pruning of step 2 specifically comprises:
Step 2.1: traversing the model from front to back, finding out the corresponding BN layer behind each convolutional layer, if no corresponding BN layer exists, pruning the convolutional layer, for a network output layer, because an output channel is limited by a target task, the output channel cannot be pruned, a pruning part needs to be marked, the pruning information is summarized into a table, and for a part with short-circuit connection, the part can be regarded as a plurality of inputs;
step 2.2: and globally sorting the gamma parameters in all BN layers, calculating the pruning threshold of the gamma parameters according to the pruning ratio, calculating the minimum value of the maximum values of the gamma parameters of all BN layers, and taking the value as the upper limit of the pruning threshold. Exceeding this threshold will result in a layer being completely cut;
step 2.3: traversing the pruning information table of step 2.1 from front to back and classifying the entries into the following three cases:
the combination of an unconstrained convolutional layer plus a BN layer;
the combination of a constrained convolutional layer plus a BN layer;
a residual block structure with a short-circuit connection;
step 2.4: redefining the network model according to the number of channels remaining after pruning each layer, and saving the parameters of the new pruned model.
5. The compression method according to claim 4, characterized in that said step 2.3 is, in particular,
step 2.3.1: for the combination of an unconstrained convolutional layer plus a BN layer: the convolution filters corresponding to the input-channel pruning mask and to the output pruning mask are pruned; the output pruning mask is composed of the indices of the convolution filters whose γ parameters in the BN layer of the current convolutional layer are smaller than the threshold, and pruning is realized by recombining the parameters of the convolutional layer and of the BN layer; if the current convolutional layer is a depthwise separable convolution, i.e. the number of groups equals the number of input channels, the output pruning mask is the same as the input pruning mask, and the number of groups after pruning equals the number of remaining convolution filters;
step 2.3.2: for the combination of a constrained convolutional layer plus a BN layer: the output channels are not pruned; only the convolution filters corresponding to the input-channel pruning mask are pruned, again by recombining the parameters of the convolutional layer and of the BN layer;
step 2.3.3: for a residual block structure with a short-circuit connection: the number of output channels of the last layer in the residual block must equal the number of input channels of the residual block, so during pruning the output pruning mask of the last layer of the residual block is set equal to the output pruning mask of the layer preceding the residual block, which guarantees a well-formed model structure after pruning.
6. The compression method according to claim 1, wherein step 6, dynamically quantizing and training the model until convergence, specifically comprises:
step 6.1: in quantization training, the input is still an unquantized floating-point number; the convolutional layer parameters pass through a pseudo-quantization module before taking part in the floating-point computation, the intermediate convolution itself is carried out entirely in floating point, and the activation values produced by the activation function are then passed through another pseudo-quantization module;
step 6.2: because the weight distribution of a convolutional layer is concentrated, fixed and independent of the model input, the convolutional layer parameters are quantized layer by layer; the activation values are influenced by the model input and can fluctuate over a large range, so a channel-by-channel quantization method is adopted for them; the quantization is computed by the following formulas,
clamp(r; a, b) := min(max(r, a), b)    (3)

s(a, b, n) := (b - a)/(n - 1)    (4)

q(r; a, b, n) := [(clamp(r; a, b) - a)/s(a, b, n)]·s(a, b, n) + a    (5)

where r represents the floating-point number being quantized; [a, b] is the quantization range; n is the number of quantization levels, with n = 2^8 = 256 for 8-bit integer quantization; [·] denotes rounding to the nearest integer; and q(r; a, b, n) is the result after quantization;
in the above formulas, the quantization result can be computed from a floating-point value once the quantization range [a, b] is determined; during training, because the input and the model parameters change constantly, the distribution of the quantities to be quantized must be observed in order to determine the quantization range;
step 6.3: for the quantization range of the convolutional layer parameters, for each convolutional layer parameter tensor w, take a := min w and b := max w; this quantization range converges as the convolutional layer parameters converge;
step 6.4: for the quantization range of the activation value, the quantization range needs to be calculated independently for each channel;
so that the quantization range of the activation values reflects their distribution over the entire data set, an exponential moving average of the observed quantization range is computed, as follows,

S_t = α×Y_t + (1 - α)×S_{t-1}    (6)

where α is the moving-average coefficient, with a value between 0 and 1 taken close to 1 so that the average reflects the long-term statistics; Y_t is the value observed at the current step, and S_t is the exponential moving average at time t;
step 6.5: the BN layer operation is fused into the convolution during training; the parameters of the fused convolutional layer are computed as follows,

w_fold := γ·w/√(EMA(σ_B²) + ε)    (7)

where γ is the γ parameter of the BN layer, EMA(σ_B²) is the exponential moving average of the per-batch variance of the convolutional layer outputs, ε is a small constant, and w and w_fold are the convolutional layer weights before and after fusion, respectively.
7. The compression method according to claim 1, wherein the quantized inference of step 7 specifically comprises:
quantizing the fused bias parameters to 32-bit integers, with their scaling coefficient S taken as the product of the convolutional layer weight scaling coefficient and the input scaling coefficient, and their zero point Z set to 0;
the quantized inference needs to solve the problem of carrying out the convolution matrix operations using only integers, as follows;
first, the scaling coefficient S and the zero point Z of each layer are calculated from the obtained quantization range:

S = s(a, b, n)
Z = [q(0.0; a, b, n)]    (8)

the functions in the two equations above were defined earlier; a quantized value can be dequantized back to a floating-point number using the scaling coefficient S and the zero point Z, according to the dequantization formula

r = S(q - Z)    (9)
consider the multiplication of two N×N real matrices r_1 and r_2 with result r_3; for α ∈ {1, 2, 3} and 1 ≤ i, j ≤ N, let r_α^(i,j) denote the entry in the i-th row and j-th column of r_α, let (S_α, Z_α) denote the quantization parameters of matrix r_α, and let q_α^(i,j) denote the corresponding quantized entries.
The dequantization formula can then be written entry-wise as:

r_α^(i,j) = S_α·(q_α^(i,j) - Z_α)    (10)

From the matrix multiplication one obtains:

S_3·(q_3^(i,k) - Z_3) = Σ_{j=1}^{N} S_1·(q_1^(i,j) - Z_1)·S_2·(q_2^(j,k) - Z_2)    (11)

The above formula can be rewritten as:

q_3^(i,k) = Z_3 + M·Σ_{j=1}^{N} (q_1^(i,j) - Z_1)·(q_2^(j,k) - Z_2)    (12)

where

M := S_1·S_2/S_3    (13)

M is the only non-integer quantity in the formula, but in practice its value lies between 0 and 1, so representing M with a 32-bit fixed-point number meets the precision requirement; q_3 is the quantized result of the computation.
CN202010388100.1A 2020-05-09 2020-05-09 Combined neural network model compression method based on channel pruning and quantitative training Pending CN111652366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010388100.1A CN111652366A (en) 2020-05-09 2020-05-09 Combined neural network model compression method based on channel pruning and quantitative training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010388100.1A CN111652366A (en) 2020-05-09 2020-05-09 Combined neural network model compression method based on channel pruning and quantitative training

Publications (1)

Publication Number Publication Date
CN111652366A true CN111652366A (en) 2020-09-11

Family

ID=72343243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010388100.1A Pending CN111652366A (en) 2020-05-09 2020-05-09 Combined neural network model compression method based on channel pruning and quantitative training

Country Status (1)

Country Link
CN (1) CN111652366A (en)

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860472A (en) * 2020-09-24 2020-10-30 成都索贝数码科技股份有限公司 Television station caption detection method, system, computer equipment and storage medium
CN111932690A (en) * 2020-09-17 2020-11-13 北京主线科技有限公司 Pruning method and device based on 3D point cloud neural network model
CN112101487A (en) * 2020-11-17 2020-12-18 深圳感臻科技有限公司 Compression method and device for fine-grained recognition model
CN112132219A (en) * 2020-09-24 2020-12-25 天津锋物科技有限公司 General deployment scheme of deep learning detection model based on mobile terminal
CN112149724A (en) * 2020-09-14 2020-12-29 浙江大学 Electroencephalogram data feature extraction method based on intra-class compactness
CN112149829A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Method, device, equipment and storage medium for determining network model pruning strategy
CN112183725A (en) * 2020-09-27 2021-01-05 安徽寒武纪信息科技有限公司 Method of providing neural network, computing device, and computer-readable storage medium
CN112396179A (en) * 2020-11-20 2021-02-23 浙江工业大学 Flexible deep learning network model compression method based on channel gradient pruning
CN112488070A (en) * 2020-12-21 2021-03-12 上海交通大学 Neural network compression method for remote sensing image target detection
CN112488291A (en) * 2020-11-03 2021-03-12 珠海亿智电子科技有限公司 Neural network 8-bit quantization compression method
CN112581423A (en) * 2020-09-29 2021-03-30 宁波大学 Neural network-based rapid detection method for automobile surface defects
CN112598020A (en) * 2020-11-24 2021-04-02 深兰人工智能(深圳)有限公司 Target identification method and system
CN112613610A (en) * 2020-12-25 2021-04-06 国网江苏省电力有限公司信息通信分公司 Deep neural network compression method based on joint dynamic pruning
CN112784839A (en) * 2021-02-03 2021-05-11 华南理工大学 Scene character detection model lightweight method based on mobile terminal, electronic equipment and storage medium
CN112800268A (en) * 2021-03-02 2021-05-14 安庆师范大学 Quantification and approximate nearest neighbor searching method for image visual characteristics
CN112836819A (en) * 2021-01-26 2021-05-25 北京奇艺世纪科技有限公司 Neural network model generation method and device
CN112836751A (en) * 2021-02-03 2021-05-25 歌尔股份有限公司 Target detection method and device
CN112884144A (en) * 2021-02-01 2021-06-01 上海商汤智能科技有限公司 Network quantization method and device, electronic equipment and storage medium
CN113011581A (en) * 2021-02-23 2021-06-22 北京三快在线科技有限公司 Neural network model compression method and device, electronic equipment and readable storage medium
CN113159297A (en) * 2021-04-29 2021-07-23 上海阵量智能科技有限公司 Neural network compression method and device, computer equipment and storage medium
CN113160062A (en) * 2021-05-25 2021-07-23 烟台艾睿光电科技有限公司 Infrared image target detection method, device, equipment and storage medium
CN113269312A (en) * 2021-06-03 2021-08-17 华南理工大学 Model compression method and system combining quantization and pruning search
CN113408723A (en) * 2021-05-19 2021-09-17 北京理工大学 Convolutional neural network pruning and quantization synchronous compression method for remote sensing application
CN113554147A (en) * 2021-04-27 2021-10-26 北京小米移动软件有限公司 Sample feature processing method and device, electronic equipment and storage medium
CN113570055A (en) * 2021-06-04 2021-10-29 合肥工业大学 Convolutional neural network compression method based on pre-quantization and scaling coefficient pruning
CN113627595A (en) * 2021-08-06 2021-11-09 温州大学 Probability-based MobileNet V1 network channel pruning method
CN113705791A (en) * 2021-08-31 2021-11-26 上海阵量智能科技有限公司 Neural network inference quantification method and device, electronic equipment and storage medium
CN113850385A (en) * 2021-10-12 2021-12-28 北京航空航天大学 Coarse and fine granularity combined neural network pruning method
CN114386588A (en) * 2022-03-23 2022-04-22 杭州雄迈集成电路技术股份有限公司 Neural network quantification method and device, and neural network reasoning method and system
WO2022088063A1 (en) * 2020-10-30 2022-05-05 华为技术有限公司 Method and apparatus for quantizing neural network model, and method and apparatus for processing data
WO2022095675A1 (en) * 2020-11-04 2022-05-12 安徽寒武纪信息科技有限公司 Neural network sparsification apparatus and method and related product
CN114565076A (en) * 2022-01-18 2022-05-31 中国人民解放军国防科技大学 Adaptive incremental streaming quantile estimation method and device
CN114626527A (en) * 2022-03-25 2022-06-14 中国电子产业工程有限公司 Neural network pruning method and device based on sparse constraint retraining
CN115170917A (en) * 2022-06-20 2022-10-11 美的集团(上海)有限公司 Image processing method, electronic device, and storage medium
CN115496207A (en) * 2022-11-08 2022-12-20 荣耀终端有限公司 Neural network model compression method, device and system
WO2022262660A1 (en) * 2021-06-15 2022-12-22 华南理工大学 Pruning and quantization compression method and system for super-resolution network, and medium
CN115797477A (en) * 2023-01-30 2023-03-14 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Pruning type image compression sensing method and system for light weight deployment
CN116069743A (en) * 2023-03-06 2023-05-05 齐鲁工业大学(山东省科学院) Fluid data compression method based on time sequence characteristics
CN116167413A (en) * 2023-04-20 2023-05-26 国网山东省电力公司济南供电公司 Method and system for quantized pruning joint optimization of deep convolutional neural network
CN116405127A (en) * 2023-06-09 2023-07-07 北京理工大学 Compression method and device of underwater acoustic communication preamble signal detection model
CN116468101A (en) * 2023-03-21 2023-07-21 美的集团(上海)有限公司 Model pruning method, device, electronic equipment and readable storage medium
CN116611495A (en) * 2023-06-19 2023-08-18 北京百度网讯科技有限公司 Compression method, training method, processing method and device of deep learning model
CN116894189A (en) * 2023-09-11 2023-10-17 中移(苏州)软件技术有限公司 Model training method, device, equipment and readable storage medium
CN117497194B (en) * 2023-12-28 2024-03-01 苏州元脑智能科技有限公司 Biological information processing method and device, electronic equipment and storage medium

CN112800268A (en) * 2021-03-02 2021-05-14 安庆师范大学 Quantification and approximate nearest neighbor searching method for image visual characteristics
CN113554147A (en) * 2021-04-27 2021-10-26 北京小米移动软件有限公司 Sample feature processing method and device, electronic equipment and storage medium
CN113159297B (en) * 2021-04-29 2024-01-09 上海阵量智能科技有限公司 Neural network compression method, device, computer equipment and storage medium
CN113159297A (en) * 2021-04-29 2021-07-23 上海阵量智能科技有限公司 Neural network compression method and device, computer equipment and storage medium
CN113408723A (en) * 2021-05-19 2021-09-17 北京理工大学 Convolutional neural network pruning and quantization synchronous compression method for remote sensing application
CN113160062A (en) * 2021-05-25 2021-07-23 烟台艾睿光电科技有限公司 Infrared image target detection method, device, equipment and storage medium
CN113269312B (en) * 2021-06-03 2021-11-09 华南理工大学 Model compression method and system combining quantization and pruning search
CN113269312A (en) * 2021-06-03 2021-08-17 华南理工大学 Model compression method and system combining quantization and pruning search
CN113570055A (en) * 2021-06-04 2021-10-29 合肥工业大学 Convolutional neural network compression method based on pre-quantization and scaling coefficient pruning
WO2022262660A1 (en) * 2021-06-15 2022-12-22 华南理工大学 Pruning and quantization compression method and system for super-resolution network, and medium
CN113627595B (en) * 2021-08-06 2023-07-25 温州大学 Probability-based MobileNet V1 network channel pruning method
CN113627595A (en) * 2021-08-06 2021-11-09 温州大学 Probability-based MobileNet V1 network channel pruning method
CN113705791B (en) * 2021-08-31 2023-12-19 上海阵量智能科技有限公司 Neural network reasoning quantification method and device, electronic equipment and storage medium
CN113705791A (en) * 2021-08-31 2021-11-26 上海阵量智能科技有限公司 Neural network inference quantification method and device, electronic equipment and storage medium
CN113850385A (en) * 2021-10-12 2021-12-28 北京航空航天大学 Coarse and fine granularity combined neural network pruning method
CN114565076A (en) * 2022-01-18 2022-05-31 中国人民解放军国防科技大学 Adaptive incremental streaming quantile estimation method and device
CN114386588A (en) * 2022-03-23 2022-04-22 杭州雄迈集成电路技术股份有限公司 Neural network quantification method and device, and neural network reasoning method and system
CN114626527B (en) * 2022-03-25 2024-02-09 中国电子产业工程有限公司 Neural network pruning method and device based on sparse constraint retraining
CN114626527A (en) * 2022-03-25 2022-06-14 中国电子产业工程有限公司 Neural network pruning method and device based on sparse constraint retraining
CN115170917A (en) * 2022-06-20 2022-10-11 美的集团(上海)有限公司 Image processing method, electronic device, and storage medium
CN115170917B (en) * 2022-06-20 2023-11-07 美的集团(上海)有限公司 Image processing method, electronic device and storage medium
CN115496207B (en) * 2022-11-08 2023-09-26 荣耀终端有限公司 Neural network model compression method, device and system
CN115496207A (en) * 2022-11-08 2022-12-20 荣耀终端有限公司 Neural network model compression method, device and system
CN115797477A (en) * 2023-01-30 2023-03-14 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Pruning type image compression sensing method and system for light weight deployment
CN116069743A (en) * 2023-03-06 2023-05-05 齐鲁工业大学(山东省科学院) Fluid data compression method based on time sequence characteristics
CN116468101A (en) * 2023-03-21 2023-07-21 美的集团(上海)有限公司 Model pruning method, device, electronic equipment and readable storage medium
CN116167413A (en) * 2023-04-20 2023-05-26 国网山东省电力公司济南供电公司 Method and system for quantized pruning joint optimization of deep convolutional neural network
CN116405127B (en) * 2023-06-09 2023-09-12 北京理工大学 Compression method and device of underwater acoustic communication preamble signal detection model
CN116405127A (en) * 2023-06-09 2023-07-07 北京理工大学 Compression method and device of underwater acoustic communication preamble signal detection model
CN116611495A (en) * 2023-06-19 2023-08-18 北京百度网讯科技有限公司 Compression method, training method, processing method and device of deep learning model
CN116611495B (en) * 2023-06-19 2024-03-01 北京百度网讯科技有限公司 Compression method, training method, processing method and device of deep learning model
CN116894189A (en) * 2023-09-11 2023-10-17 中移(苏州)软件技术有限公司 Model training method, device, equipment and readable storage medium
CN116894189B (en) * 2023-09-11 2024-01-05 中移(苏州)软件技术有限公司 Model training method, device, equipment and readable storage medium
CN117497194B (en) * 2023-12-28 2024-03-01 苏州元脑智能科技有限公司 Biological information processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111652366A (en) Combined neural network model compression method based on channel pruning and quantitative training
CN110135580B (en) Convolution network full integer quantization method and application method thereof
CN111489364B (en) Medical image segmentation method based on lightweight full convolution neural network
CN110874631A (en) Convolutional neural network pruning method based on feature map sparsification
CN112052951B (en) Pruning neural network method, system, equipment and readable storage medium
CN114118402A (en) Self-adaptive pruning model compression algorithm based on grouping attention mechanism
CN111985523A (en) Knowledge distillation training-based 2-exponential power deep neural network quantification method
CN112329922A (en) Neural network model compression method and system based on mass spectrum data set
CN113222138A (en) Convolutional neural network compression method combining layer pruning and channel pruning
CN109615068A (en) Method and apparatus for quantizing feature vectors in a model
CN110111266B (en) Approximate information transfer algorithm improvement method based on deep learning denoising
CN114139683A (en) Neural network accelerator model quantization method
CN111695624A (en) Data enhancement strategy updating method, device, equipment and storage medium
CN114118406A (en) Quantitative compression method of convolutional neural network
CN114187261A (en) Non-reference stereo image quality evaluation method based on multi-dimensional attention mechanism
CN112651500B (en) Method for generating quantization model and terminal
CN114140641A (en) Image classification-oriented multi-parameter self-adaptive heterogeneous parallel computing method
CN115170902B (en) Training method of image processing model
CN113160081A (en) Depth face image restoration method based on perception deblurring
CN116757255A (en) Method for improving weight reduction of MobileNetV2 distracted driving behavior detection model
CN113554104B (en) Image classification method based on deep learning model
CN114372565B (en) Target detection network compression method for edge equipment
CN116309171A (en) Method and device for enhancing monitoring image of power transmission line
CN113947203A (en) YOLOV3 model pruning method for intelligent vehicle-mounted platform
CN113033804B (en) Convolution neural network compression method for remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination