CN111652366A - Combined neural network model compression method based on channel pruning and quantization training - Google Patents
- Publication number
- CN111652366A (application CN202010388100.1A)
- Authority
- CN
- China
- Prior art keywords
- layer
- pruning
- model
- quantization
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a combined neural network model compression method based on channel pruning and quantization training. Step 1: sparsity training of the model; Step 2: pruning the trained model; Step 3: fine-tuning the model; Step 4: after pruning is finished, quantizing the model, starting from a conventional floating-point computation graph; Step 5: inserting pseudo-quantization modules at the positions in the computation graph where convolutions are computed, one at the convolution weights and one at the activation values, to quantize the weights and activation values to 8-bit integers; Step 6: quantization-aware training of the model until convergence; Step 7: quantized inference; Step 8: finally obtaining the pruned and quantized model. While maintaining model accuracy, the invention greatly reduces the time and space consumption of the model through the two techniques of pruning and quantization.
Description
Technical Field
The invention belongs to the technical field of data processing, and in particular relates to a combined neural network model compression method based on channel pruning and quantization training.
Background
Existing neural network pruning algorithms can mainly be divided into 3 steps: sparsity training, cutting off channels with little influence, and fine-tuning on the data set. When assessing the importance of a channel, existing pruning algorithms often compute the average of the convolution filter parameters. However, this evaluation only considers the influence of the convolution operation on the feature map and ignores the influence of the BN layer on the feature map, so a network pruned in this way suffers a considerable loss in performance. On the quantization side, the existing approach is mainly static quantization after model training is complete. Its quantization parameters carry a certain error, and there is no method for adjusting them on the data set, so the accuracy of the quantized model suffers a certain loss.
Disclosure of Invention
In order to solve the problem that neural network models are difficult to deploy on general computing equipment due to their large parameter counts and heavy computation, the invention designs a combined neural network model compression method based on channel pruning and quantization training, which greatly reduces the time and space consumption of the model through the two techniques of pruning and quantization while maintaining model accuracy.
The invention is realized by the following technical scheme:
A combined neural network model compression method based on channel pruning and quantization training comprises the following steps:
Step 1: sparsity training. During training, an L1-norm penalty is applied to the parameters of the BN layers that follow the convolutional layers to be sparsified, so that these parameters acquire a structured-sparsity characteristic in preparation for the subsequent channel cutting;
Step 2: pruning the trained model. During pruning, according to the correspondence between convolutional layers and BN layers in the model, the channels corresponding to small γ parameters in the BN layers are pruned, layer by layer from shallow to deep, forming a new channel-pruned model;
Step 3: fine-tuning the model. Training of the pruned model continues on the data set with an appropriately reduced learning rate, until the model accuracy no longer improves; channel pruning then ends;
Step 4: quantizing the model after pruning is finished, and constructing a conventional floating-point computation graph;
Step 5: inserting pseudo-quantization modules at the positions in the computation graph where convolutions are computed, one at the convolution weights and one at the activation values, so as to quantize the weights and activation values to 8-bit integers;
Step 6: quantization-aware training of the model until convergence; during quantization training, both the weights and the activation values of the convolutional layers need to be quantized;
Step 7: quantized inference, namely saving the quantization parameters of the convolutional layer weights and activation values, i.e. the scaling coefficient S and the zero point Z; quantization training is then finished;
Step 8: finally, the pruned and quantized model is obtained.
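To make the flow of steps 1 to 8 concrete, the following minimal Python outline sketches the pipeline. All names used here (build_model, sparse_train, prune_channels, finetune, insert_fake_quant, qat_train, save_quant_params_and_export) are hypothetical stand-ins for the procedures detailed below, not functions defined by the invention.

```python
# Hypothetical end-to-end outline of the compression pipeline (steps 1-8).
model = build_model()                      # original float32 network
sparse_train(model, lam=0.01)              # step 1: L1 penalty on BN gamma
model = prune_channels(model, ratio=0.4)   # step 2: cut low-gamma channels
finetune(model)                            # step 3: recover accuracy
insert_fake_quant(model)                   # steps 4-5: pseudo-quantization at
                                           # weights and activations (8-bit)
qat_train(model)                           # step 6: quantization-aware training
save_quant_params_and_export(model)        # steps 7-8: save S, Z; final model
```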
Further, the step 1 sparsity training is specifically:
Step 1.1: construct the original convolutional neural network model, traverse each layer of the model, find the BN layer after each convolutional layer, and add each such BN layer to a BN-layer list;
Step 1.2: set the training hyper-parameters for the original convolutional neural network model, where the sparsification coefficient λ lies between 0.0001 and 0.01;
Step 1.3: after setting the training hyper-parameters, perform sparsity training:
carry out forward propagation, and compute the gradient information of each layer's parameters by backward propagation; before the gradients are applied, apply an L1-norm penalty to the γ parameter of each BN layer in the BN-layer list;
collect the absolute values of all γ parameters of the BN layers during training, sort them, and list the γ-parameter magnitude at each quantile;
judge the sparsification level from these values: the smaller the parameter values, the higher the sparsification level;
continue the training process until neither the accuracy index nor the sparsification level increases any further;
after training stops, save the trained model and model structure, and at the same time compute the parameter count and computation cost of the model.
Further, in step 1.3 an L1-norm penalty is imposed on the γ parameter of each BN layer in the BN-layer list, as shown in the following formula,
L' = L + λ · Ω(γ), where Ω(γ) = Σ |γ|    (1)
in which Ω(γ) denotes the L1 norm of the BN-layer γ parameters; the L1 norm, multiplied by the sparsification coefficient λ, is added to the original objective function L to form the new objective function L';
the calculation performed by the BN layer is shown in the following formula,
ẑ = (z_in − μ) / √(σ² + ε),  z_out = γ · ẑ + β    (2)
where z_in denotes the input tensor of the layer, μ and σ² are the per-channel mean and variance of the tensor, ε is a small value that ensures numerical stability, and γ and β are the two trainable parameters of the BN layer, representing the scaling and the offset of the layer respectively; the γ parameter is the target of the L1-norm penalty.
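As a concrete illustration, the penalty of equation (1) can be applied as a subgradient added to the BN γ gradients before each optimizer step. The following is a minimal PyTorch sketch, assuming `bn_layers` is the BN-layer list built in step 1.1 and `lam` is the sparsification coefficient λ; it is an illustrative sketch, not the invention's reference implementation.

```python
import torch

def add_l1_penalty_to_bn(bn_layers, lam=0.01):
    # The subgradient of lam * sum(|gamma|) is lam * sign(gamma);
    # in PyTorch, bn.weight holds the gamma parameter of a BatchNorm layer.
    for bn in bn_layers:
        bn.weight.grad.add_(lam * torch.sign(bn.weight.data))

# Usage inside the training loop (step 1.3):
#   loss.backward()
#   add_l1_penalty_to_bn(bn_layers, lam=0.01)
#   optimizer.step()
```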
Further, the step 2 pruning of the trained model is specifically:
Step 2.1: traverse the model from front to back and find the BN layer corresponding to each convolutional layer; a convolutional layer with no corresponding BN layer is skipped during pruning. For a network output layer, the output channels are constrained by the target task and cannot be pruned, so such parts must be marked. The pruning information is summarized into a table; a part with a shortcut connection can be regarded as having multiple inputs;
Step 2.2: globally sort the γ parameters of all BN layers and compute the pruning threshold of the γ parameters according to the pruning ratio; also compute the minimum of the per-layer maxima of the γ parameters over all BN layers and take this value as the upper limit of the pruning threshold, since exceeding it would cause some layer to be cut entirely;
Step 2.3: the entries of the pruning-information table traversed from front to back in step 2.1 fall into the following three types:
the combination of an unconstrained convolutional layer plus a BN layer;
the combination of a constrained convolutional layer plus a BN layer;
a residual block structure with a shortcut connection;
Step 2.4: redefine the network model according to the number of channels remaining after each layer is pruned, and save the parameters of the new pruned model.
Further, step 2.3 is specifically:
Step 2.3.1: for the combination of an unconstrained convolutional layer plus a BN layer, prune the convolution filters indicated by the input-channel pruning mask and by the output pruning mask. The output pruning mask is composed of the indices of the convolution filters whose γ parameter in the BN layer of the current convolutional layer falls below the threshold; pruning is realized by recombining the parameters of the convolutional layer and the parameters of the BN layer. If the current convolutional layer is a depthwise separable convolution, i.e. the number of groups equals the number of input channels, the output pruning mask is the same as the input pruning mask, and the number of groups after pruning equals the number of remaining convolution filters;
Step 2.3.2: for the combination of a constrained convolutional layer plus a BN layer, the output channels are not pruned; only the convolution filters indicated by the input-channel pruning mask are pruned, and pruning is realized by recombining the parameters of the convolutional layer and the parameters of the BN layer;
Step 2.3.3: for a residual block structure with a shortcut connection, the number of output channels of the last layer in the residual block must equal the number of input channels of the residual block; during pruning, the output pruning mask of the last layer of the residual block must therefore equal the output pruning mask of the layer preceding the residual block, which guarantees that the pruned model structure remains consistent.
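A minimal sketch of the threshold and mask computation of steps 2.2-2.3 follows, assuming PyTorch and a `bn_layers` list of the prunable BN layers; `ratio` is the global pruning ratio, and the small epsilon subtracted from the cap is an illustrative detail that guarantees at least one surviving channel per layer.

```python
import torch

def build_pruning_masks(bn_layers, ratio=0.4):
    # Step 2.2: globally sort |gamma| over all prunable BN layers and pick
    # the threshold at the requested pruning ratio.
    gammas = torch.cat([bn.weight.data.abs() for bn in bn_layers])
    threshold = torch.sort(gammas).values[int(len(gammas) * ratio)]
    # Upper limit: the smallest per-layer maximum of |gamma|, so that no
    # layer is cut entirely.
    cap = torch.stack([bn.weight.data.abs().max() for bn in bn_layers]).min()
    threshold = torch.minimum(threshold, cap - 1e-8)
    # Step 2.3: one boolean mask per layer; True means keep the channel.
    return [bn.weight.data.abs() > threshold for bn in bn_layers]
```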
Further, the step 6 quantization-aware training of the model until convergence is specifically:
Step 6.1: in quantization training the inputs are still unquantized floating-point numbers; the convolutional layer parameters pass through a pseudo-quantization module and then participate in the floating-point computation, the intermediate convolution computations are all floating-point operations, and the activation values produced by the activation function are quantized by a pseudo-quantization module;
Step 6.2: because the weight distribution of a convolutional layer is concentrated, is fixed once training ends, and does not change with the model input, the convolutional layer parameters are quantized layer-wise to integers; the activation values are influenced by the model input and can fluctuate over a wide range, so a channel-by-channel quantization method is adopted for them. The quantization is computed by the following formulas,
clamp(r; a, b) := min(max(r, a), b)    (3)
s(a, b, n) := (b − a) / (n − 1)    (4)
q(r; a, b, n) := round((clamp(r; a, b) − a) / s(a, b, n)) · s(a, b, n) + a    (5)
where r denotes the floating-point number being quantized; [a, b] denotes the quantization range; n is the number of quantization levels, with n = 2^8 = 256 in 8-bit integer quantization; round(·) denotes rounding to the nearest integer; and q(r; a, b, n) denotes the result after quantization;
in the above formulas, the quantization result can be computed from the floating-point value once the quantization range [a, b] is determined; during training, because the inputs and the model parameters change constantly, the distribution range of the quantities being quantized must be observed in order to determine the quantization range;
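Below is a small PyTorch sketch of the pseudo-quantization operation of equations (3)-(5), written under the assumption that they follow the standard affine scheme; `fake_quantize` is an illustrative name, not a function defined by the invention.

```python
import torch

def fake_quantize(r, a, b, n=256):
    # Equation (4): quantization step size s(a, b, n).
    s = (b - a) / (n - 1)
    # Equation (3): clamp r into the quantization range [a, b].
    clamped = torch.clamp(r, min=a, max=b)
    # Equation (5): snap the clamped value onto the n-level grid.
    return torch.round((clamped - a) / s) * s + a
```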
Step 6.3: for the quantization range of the convolutional layer parameters, take a := min w and b := max w for each convolutional layer parameter tensor w; the quantization range then converges as the convolutional layer parameters converge;
Step 6.4: for the quantization range of the activation values, the quantization range must be computed independently for each channel;
so that the quantization range of the activation values reflects their distribution over the entire data set, an exponential moving average of the single-batch quantization range must be computed; the exponential-moving-average formula is as follows,
S_t = α × Y_t + (1 − α) × S_{t−1}    (6)
where α is the moving-average coefficient, with a value between 0 and 1 chosen near 1 so as to reflect the long-term numerical average; Y_t is the value observed this time; and S_t is the exponential moving average at time t;
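As a small illustration, equation (6) amounts to a one-line update per observed range endpoint; the helper name and the None-initialization convention are assumptions of this sketch.

```python
def ema_update(s_prev, y_t, alpha):
    # Equation (6): S_t = alpha * Y_t + (1 - alpha) * S_{t-1};
    # the first observation initializes the moving average.
    return y_t if s_prev is None else alpha * y_t + (1 - alpha) * s_prev
```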
Step 6.5: the BN-layer operation is fused during training; the formula for the fused convolutional layer parameters is as follows,
w_fold = γ · w / √(EMA(σ_B²) + ε)    (7)
where γ is the γ parameter of the BN layer; EMA(σ_B²) denotes the exponential moving average of the variance of the convolutional layer outputs over the batches; ε is a small constant; and w and w_fold are the parameters before and after fusion, respectively.
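A minimal sketch of the weight folding of equation (7), assuming PyTorch tensors with the layout noted in the comments; it covers only the weight scaling, not the corresponding bias fusion.

```python
import torch

def fold_bn_into_conv(w, gamma, ema_var, eps=1e-5):
    # Equation (7): w_fold = gamma * w / sqrt(EMA(var) + eps), applied
    # per output channel. Shapes: w is (out_ch, in_ch, kH, kW);
    # gamma and ema_var are (out_ch,).
    scale = gamma / torch.sqrt(ema_var + eps)
    return w * scale.reshape(-1, 1, 1, 1)
```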
Further, the step 7 quantized inference is specifically:
quantize the fused offset (bias) parameters to 32-bit integers, take their scaling coefficient S as the product of the convolutional-layer weight scaling coefficient and the input scaling coefficient, and set their zero point Z to 0;
quantized inference must solve the problem of carrying out the convolution matrix operations with integers, via the following formulas;
First, the scaling coefficient S and the zero point Z of each layer are computed from the obtained quantization range:
S = s(a, b, n)
Z = round(q(0.0; a, b, n))    (8)
The functions in the above two equations were defined earlier. A quantized value can be dequantized back to a floating-point number using the scaling coefficient S and the zero point Z; the dequantization formula is as follows,
r = S(q − Z)    (9)
Consider two N × N real matrices r_1 and r_2 whose product is r_3. For α ∈ {1, 2, 3} and 1 ≤ i, j ≤ N, let r_α^(i,j) denote the entry in row i and column j of r_α, let (S_α, Z_α) denote the quantization parameters of each matrix, and let q_α^(i,j) denote each matrix after quantization;
the dequantization formula can then be expressed as:
r_α^(i,j) = S_α (q_α^(i,j) − Z_α)    (10)
From the matrix multiplication one obtains:
S_3 (q_3^(i,k) − Z_3) = Σ_j S_1 (q_1^(i,j) − Z_1) · S_2 (q_2^(j,k) − Z_2)    (11)
The above formula can be rewritten as:
q_3^(i,k) = Z_3 + M · Σ_j (q_1^(i,j) − Z_1)(q_2^(j,k) − Z_2), where M := S_1 S_2 / S_3    (12)
M is the only non-integer in the formula, but under normal conditions its value lies between 0 and 1, and representing M with a 32-bit fixed-point number meets the precision requirement; q_3 is the quantized computation result.
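The following sketch illustrates equation (12) with NumPy integer arithmetic; the choice of a 31-bit shift for the fixed-point representation of M, and the uint8 output clipping, are assumptions of this illustration.

```python
import numpy as np

def quantized_matmul(q1, q2, Z1, Z2, Z3, M):
    # Equation (12): integer accumulation of (q1 - Z1) @ (q2 - Z2).
    acc = (q1.astype(np.int32) - Z1) @ (q2.astype(np.int32) - Z2)
    # 0 < M < 1, so M fits a 32-bit fixed-point multiplier: M ~ M0 / 2^31.
    M0 = int(round(M * (1 << 31)))
    q3 = Z3 + ((acc.astype(np.int64) * M0) >> 31)
    # Clip back to the 8-bit quantized domain.
    return np.clip(q3, 0, 255).astype(np.uint8)
```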
The beneficial effects of the invention are:
1. The invention introduces no extra variables during pruning; it constrains the BN-layer parameters directly and can therefore better exploit the scaling effect of the BN-layer γ parameters. Its pruning effect is better than that of existing pruning methods, and it accelerates well on common hardware.
2. Unlike existing methods that train first and quantize afterwards, the quantization-training method directly simulates the quantization process during training, overcoming the drawback of post-training quantization that the quantization parameters cannot be adjusted. The quantization method of the invention incurs a smaller precision loss and is suitable for a variety of common convolutional models.
3. The computation method designed by the invention completely avoids floating-point operations and accelerates well on certain hardware.
4. The two model compression methods are orthogonal to each other; they can be used independently or jointly, and used jointly they achieve a better model compression effect.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flow chart of the pruning algorithm of the present invention.
FIG. 3 is a schematic diagram of the pruning algorithm of the present invention.
FIG. 4 is a schematic diagram of pruning in a multi-layer structure according to the pruning algorithm of the present invention.
FIG. 5 is a schematic diagram of pruning in a residual structure by the pruning algorithm of the present invention.
FIG. 6 is a flow chart of the quantization algorithm of the present invention.
FIG. 7 is a diagram of the quantized-inference computation of the present invention.
FIG. 8 is a diagram of the computation of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
A combined neural network model compression method based on channel pruning and quantization training comprises the following steps: channel pruning reduces the number of channels in the neural network, and quantization training replaces floating-point operations with integer operations.
Step 1: the sparse training model applies L1 norm punishment to the BN layer parameters after the convolutional layer needing to be sparse in the training process, so that the parameters have the characteristic of structured sparsity and are prepared for next channel cutting;
step 2: training model pruning, pruning channels corresponding to the convolutional layers with small gamma parameters in the BN layer according to the corresponding relation between the convolutional layers and the BN layer in the model in the pruning process, and pruning each layer shallowly and deeply to form a new model after channel pruning;
and step 3: fine adjustment of the model, continuing training the model after pruning on the data set, and properly reducing the learning rate to the previous oneTraining until the model precision is not improved any more, and ending channel pruning;
and 4, step 4: quantizing the model after pruning is finished, and constructing a conventional floating point number calculation graph; 32-bit floating point numbers are adopted in general training;
and 5: inserting pseudo quantization modules at corresponding positions of convolution calculation in a calculation diagram, and inserting two pseudo quantization modules at convolution weight positions and activation value positions in order to simulate quantization effects in actual quantization estimation, so as to quantize the weight and activation values into 8-bit integer;
step 6: dynamically quantizing the training model until convergence, wherein in quantization training, the weight and the activation value of the convolutional layer need to be quantized;
and 7: quantizing reasoning, namely saving the quantization parameters of the convolutional layer weight and the activation value, scaling the coefficient S and the zero point Z, and finishing quantization training;
and 8: finally, a model after pruning and quantification is obtained.
Further, the step 1 sparsity training is specifically:
Step 1.1: construct the original convolutional neural network model, traverse each layer of the model, find the BN layer after each convolutional layer, and add each such BN layer to a BN-layer list;
Step 1.2: set the training hyper-parameters for the original convolutional neural network model, where the sparsification coefficient λ lies between 0.0001 and 0.01 (generally 0.01) and the remaining training hyper-parameters are the same as in training without sparsification;
Step 1.3: after setting the training hyper-parameters, perform sparsity training:
carry out forward propagation, and compute the gradient information of each layer's parameters by backward propagation; before the gradients are applied, apply an L1-norm penalty to the γ parameter of each BN layer in the BN-layer list;
collect the absolute values of all γ parameters of the BN layers during training, sort them, and list the γ-parameter magnitude at each quantile;
judge the sparsification level from these values: the smaller the parameter values, the higher the sparsification level;
continue the training process until neither the accuracy index nor the sparsification level increases any further;
after training stops, save the trained model and model structure, and at the same time compute the parameter count and computation cost of the model.
Further, in step 1.3 an L1-norm penalty is imposed on the γ parameter of each BN layer in the BN-layer list, as shown in the following formula,
L' = L + λ · Ω(γ), where Ω(γ) = Σ |γ|    (1)
in which Ω(γ) denotes the L1 norm of the BN-layer γ parameters; the L1 norm, multiplied by the sparsification coefficient λ, is added to the original objective function L to form the new objective function L';
the calculation performed by the BN layer is shown in the following formula,
ẑ = (z_in − μ) / √(σ² + ε),  z_out = γ · ẑ + β    (2)
where z_in denotes the input tensor of the layer, μ and σ² are the per-channel mean and variance of the tensor, ε is a small value that ensures numerical stability, and γ and β are the two trainable parameters of the BN layer, representing the scaling and the offset of the layer respectively; the γ parameter is the target of the L1-norm penalty.
Further, the step 2 pruning of the trained model is specifically:
Step 2.1: traverse the model from front to back and find the BN layer corresponding to each convolutional layer; a convolutional layer with no corresponding BN layer is skipped during pruning. For a network output layer, the output channels are constrained by the target task and cannot be pruned, so such parts must be marked. The pruning information is summarized into a table; a part with a shortcut connection can be regarded as having multiple inputs;
Step 2.2: globally sort the γ parameters of all BN layers and compute the pruning threshold of the γ parameters according to the pruning ratio; also compute the minimum of the per-layer maxima of the γ parameters over all BN layers and take this value as the upper limit of the pruning threshold, since exceeding it would cause some layer to be cut entirely;
Step 2.3: the entries of the pruning-information table traversed from front to back in step 2.1 fall into the following three types,
the combination of an unconstrained convolutional layer plus a BN layer;
the combination of a constrained convolutional layer plus a BN layer;
a residual block structure with a shortcut connection;
Step 2.4: redefine the network model according to the number of channels remaining after each layer is pruned, and save the parameters of the new pruned model.
Further, step 2.3 is specifically:
Step 2.3.1: for the combination of an unconstrained convolutional layer plus a BN layer, prune the convolution filters indicated by the input-channel pruning mask (which corresponds to the pruning result of the previous layer's output channels) and by the output pruning mask. The output pruning mask is composed of the indices of the convolution filters whose γ parameter in the BN layer of the current convolutional layer falls below the threshold; pruning is realized by recombining the parameters of the convolutional layer and the parameters of the BN layer. If the current convolutional layer is a depthwise separable convolution, i.e. the number of groups equals the number of input channels, the output pruning mask is the same as the input pruning mask, and the number of groups after pruning equals the number of remaining convolution filters;
Step 2.3.2: for the combination of a constrained convolutional layer plus a BN layer, the output channels are not pruned; only the convolution filters indicated by the input-channel pruning mask (which corresponds to the pruning result of the previous layer's output channels) are pruned, and pruning is realized by recombining the parameters of the convolutional layer and the parameters of the BN layer;
Step 2.3.3: for a residual block structure with a shortcut connection, the number of output channels of the last layer in the residual block must equal the number of input channels of the residual block; during pruning, the output pruning mask of the last layer of the residual block must therefore equal the output pruning mask of the layer preceding the residual block, which guarantees that the pruned model structure remains consistent.
Further, the step 6 quantization-aware training of the model until convergence is specifically:
Step 6.1: in quantization training the inputs are still unquantized floating-point numbers; the convolutional layer parameters pass through a pseudo-quantization module and then participate in the floating-point computation, the intermediate convolution computations are all floating-point operations, and the activation values produced by the activation function are quantized by a pseudo-quantization module;
Step 6.2: because the weight distribution of a convolutional layer is concentrated, is fixed once training ends, and does not change with the model input, the convolutional layer parameters are quantized layer-wise to integers; the activation values are influenced by the model input and can fluctuate over a wide range, so a channel-by-channel quantization method is adopted for them. The quantization is computed by the following formulas,
clamp(r; a, b) := min(max(r, a), b)    (3)
s(a, b, n) := (b − a) / (n − 1)    (4)
q(r; a, b, n) := round((clamp(r; a, b) − a) / s(a, b, n)) · s(a, b, n) + a    (5)
where r denotes the floating-point number being quantized; [a, b] denotes the quantization range; n is the number of quantization levels, with n = 2^8 = 256 in 8-bit integer quantization; round(·) denotes rounding to the nearest integer; and q(r; a, b, n) denotes the result after quantization;
in the above formulas, the quantization result can be computed from the floating-point value once the quantization range [a, b] is determined; during training, because the inputs and the model parameters change constantly, the distribution range of the quantities being quantized must be observed in order to determine the quantization range;
Step 6.3: for the quantization range of the convolutional layer parameters, take a := min w and b := max w for each convolutional layer parameter tensor w; the quantization range then converges as the convolutional layer parameters converge;
Step 6.4: for the quantization range of the activation values, the quantization range must be computed independently for each channel;
because the activation values of the model are unstable at the beginning of training, they are not quantized at that stage; generally, after one quarter of the training process, the activation values are observed channel by channel and the quantization range is computed in the same way as above. So that the quantization range of the activation values reflects their distribution over the entire data set, an exponential moving average of the single-batch quantization range must be computed; the exponential-moving-average formula is as follows,
S_t = α × Y_t + (1 − α) × S_{t−1}    (6)
where α is the moving-average coefficient, with a value between 0 and 1 chosen near 1 so as to reflect the long-term numerical average; Y_t is the value observed this time; and S_t is the exponential moving average at time t;
Step 6.5: for networks containing BN layers, BN is a standalone operation in ordinary training, whereas in a quantization-optimized model the BN operation is usually merged into the convolutional layer. In order to simulate the influence of this difference, the BN-layer operation is fused during training; the formula for the fused convolutional layer parameters is as follows,
w_fold = γ · w / √(EMA(σ_B²) + ε)    (7)
where γ is the γ parameter of the BN layer; EMA(σ_B²) denotes the exponential moving average of the variance of the convolutional layer outputs over the batches; ε is a small constant; and w and w_fold are the parameters before and after fusion, respectively.
During quantization training, the quantization ranges and the BN-layer parameters must be frozen at the appropriate time. The aim is to let the network learn its weights under static quantization parameters and BN-layer parameters, so as to better simulate network inference under real conditions. The two are usually frozen in sequence at 10% to 20% of the whole training process; a sketch of such a schedule follows.
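A minimal sketch of this freezing schedule, assuming a custom pseudo-quantization module exposing a `frozen` flag that stops range updates; the flag name, the helper, and the exact percentages are assumptions of this illustration (Example 2 below freezes the BN parameters before the quantization parameters).

```python
def apply_freezing(step, total_steps, fake_quants, bn_layers):
    # Freeze the BN statistics first, e.g. at ~10% of training.
    if step >= int(0.10 * total_steps):
        for bn in bn_layers:
            bn.eval()                 # use the fixed running mean/variance
    # Then freeze the quantization ranges [a, b], e.g. at ~20%.
    if step >= int(0.20 * total_steps):
        for fq in fake_quants:
            fq.frozen = True          # stop updating the observed range
```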
Further, the step 7 quantized inference is specifically:
quantize the fused offset (bias) parameters to 32-bit integers, take their scaling coefficient S as the product of the convolutional-layer weight scaling coefficient and the input scaling coefficient, and set their zero point Z to 0;
in quantized inference the invention realizes full-integer inference, i.e. no floating-point operations are used during model inference; quantized inference must solve the problem of carrying out the convolution matrix operations with integers, via the following formulas;
First, the scaling coefficient S and the zero point Z of each layer are computed from the obtained quantization range:
S = s(a, b, n)
Z = round(q(0.0; a, b, n))    (8)
The functions in the above two equations are defined before, and the quantized values can be dequantized to floating point numbers according to the scaling factor S and the zero point Z, and the dequantization equation is as follows,
r=S(q-Z) (9)
consider two real matrices r of two N × N1,r2Multiplication with the result r3α∈ {1,2,3}, 1 ≦ i, j ≦ N, having rα (i,j)Is represented by rαIn ith row and jth column of (e), each matrix quantization parameter is represented by (S)α,Zα) Each matrix after quantization is represented as qα (i,j);
The inverse quantization formula can be expressed as:
from the matrix multiplication, one can obtain:
the above formula can be rewritten as:
Example 2
The improved YOLOv3 network was compressed using the pruning algorithm of the invention. The improved YOLOv3 architecture employs MobileNetV2 as the feature extractor and replaces the subsequent ordinary convolutions with depthwise separable convolutions to reduce computation. The improved YOLOv3 network achieves 78.46% test-set mAP on the VOC data set; on a 512 × 512 input image, the computation cost is 4.15 GMACs and the model parameter count is 6.775M.
This result was obtained by training on the VOC training set for 80 epochs with standard data augmentation, including random cropping, perspective transformation and horizontal flipping, plus the mixup augmentation method. The Adam optimization algorithm and a cosine-annealing learning-rate schedule were used, with an initial learning rate of 4e-3 and a batch size of 16. The sparsity training and fine-tuning below use the same hyper-parameter settings.
In sparsity training, the sparsification coefficient was set to 0.01 and the model was trained from scratch for 80 epochs on the VOC data set, reaching 75.65% test-set mAP. After pruning 40% of the channels and 20 epochs of fine-tuning, the model finally reached 75.44% test-set mAP, a 3.0% drop in precision relative to the unpruned model. The computation cost fell to 1.74 GMACs and the parameter count to 2.31M, reductions of 58.1% and 65.9% respectively relative to the unpruned model.
The pruned model was then quantized using the quantization-training algorithm of the invention. Int8 quantization was adopted; the pruned model was trained with quantization on the VOC data set, using the same hyper-parameter settings. The BN-layer parameters were frozen after 10 epochs and the quantization parameters after 15 epochs. The final quantized model achieved 76.74% mAP on the test set, 1.7% lower than the original model.
A speed test of the models was carried out on a test platform with an E5-2630 v4 CPU. The tests were performed on the test set of the VOC data set, and the results are shown in the table below.
TABLE 1 Speed test results of the pruned and quantized models
From the above table it can be seen that the combined pruning-and-quantization method of the invention can greatly accelerate a small model such as MobileNetV2 with only a small loss of precision.
Claims (7)
1. A combined neural network model compression method based on channel pruning and quantization training, characterized by comprising the following steps:
Step 1: sparsity training. During training, an L1-norm penalty is applied to the parameters of the BN layers that follow the convolutional layers to be sparsified, so that these parameters acquire a structured-sparsity characteristic in preparation for the subsequent channel cutting;
Step 2: pruning the trained model. During pruning, according to the correspondence between convolutional layers and BN layers in the model, the channels corresponding to small γ parameters in the BN layers are pruned, layer by layer from shallow to deep, forming a new channel-pruned model;
Step 3: fine-tuning the model. Training of the pruned model continues on the data set with an appropriately reduced learning rate, until the model accuracy no longer improves; channel pruning then ends;
Step 4: quantizing the model after pruning is finished, and constructing a conventional floating-point computation graph;
Step 5: inserting pseudo-quantization modules at the positions in the computation graph where convolutions are computed, one at the convolution weights and one at the activation values, so as to quantize the weights and activation values to 8-bit integers;
Step 6: quantization-aware training of the model until convergence; during quantization training, both the weights and the activation values of the convolutional layers need to be quantized;
Step 7: quantized inference, namely saving the quantization parameters of the convolutional layer weights and activation values, i.e. the scaling coefficient S and the zero point Z; quantization training is then finished;
Step 8: finally, the pruned and quantized model is obtained.
2. The compression method according to claim 1, wherein the step 1 sparsity training is specifically:
Step 1.1: construct the original convolutional neural network model, traverse each layer of the model, find the BN layer after each convolutional layer, and add each such BN layer to a BN-layer list;
Step 1.2: set the training hyper-parameters for the original convolutional neural network model, where the sparsification coefficient λ lies between 0.0001 and 0.01;
Step 1.3: after setting the training hyper-parameters, perform sparsity training:
carry out forward propagation, and compute the gradient information of each layer's parameters by backward propagation; before the gradients are applied, apply an L1-norm penalty to the γ parameter of each BN layer in the BN-layer list;
collect the absolute values of all γ parameters of the BN layers during training, sort them, and list the γ-parameter magnitude at each quantile;
judge the sparsification level from these values: the smaller the parameter values, the higher the sparsification level;
continue the training process until neither the accuracy index nor the sparsification level increases any further;
after training stops, save the trained model and model structure, and at the same time compute the parameter count and computation cost of the model.
3. The compression method according to claim 2, wherein in step 1.3 an L1-norm penalty is imposed on the γ parameter of each BN layer in the BN-layer list, as shown in the following formula,
L' = L + λ · Ω(γ), where Ω(γ) = Σ |γ|    (1)
in which Ω(γ) denotes the L1 norm of the BN-layer γ parameters; the L1 norm, multiplied by the sparsification coefficient λ, is added to the original objective function L to form the new objective function L';
the calculation performed by the BN layer is shown in the following formula,
ẑ = (z_in − μ) / √(σ² + ε),  z_out = γ · ẑ + β    (2)
where z_in denotes the input tensor of the layer, μ and σ² are the per-channel mean and variance of the tensor, ε is a small value that ensures numerical stability, and γ and β are the two trainable parameters of the BN layer, representing the scaling and the offset of the layer respectively; the γ parameter is the target of the L1-norm penalty.
4. The compression method according to claim 1, wherein the step 2 pruning of the trained model is specifically:
Step 2.1: traverse the model from front to back and find the BN layer corresponding to each convolutional layer; a convolutional layer with no corresponding BN layer is skipped during pruning. For a network output layer, the output channels are constrained by the target task and cannot be pruned, so such parts must be marked. The pruning information is summarized into a table; a part with a shortcut connection can be regarded as having multiple inputs;
Step 2.2: globally sort the γ parameters of all BN layers and compute the pruning threshold of the γ parameters according to the pruning ratio; also compute the minimum of the per-layer maxima of the γ parameters over all BN layers and take this value as the upper limit of the pruning threshold, since exceeding it would cause some layer to be cut entirely;
Step 2.3: the entries of the pruning-information table traversed from front to back in step 2.1 fall into the following three types,
the combination of an unconstrained convolutional layer plus a BN layer;
the combination of a constrained convolutional layer plus a BN layer;
a residual block structure with a shortcut connection;
Step 2.4: redefine the network model according to the number of channels remaining after each layer is pruned, and save the parameters of the new pruned model.
5. The compression method according to claim 4, wherein step 2.3 is specifically:
Step 2.3.1: for the combination of an unconstrained convolutional layer plus a BN layer, prune the convolution filters indicated by the input-channel pruning mask and by the output pruning mask. The output pruning mask is composed of the indices of the convolution filters whose γ parameter in the BN layer of the current convolutional layer falls below the threshold; pruning is realized by recombining the parameters of the convolutional layer and the parameters of the BN layer. If the current convolutional layer is a depthwise separable convolution, i.e. the number of groups equals the number of input channels, the output pruning mask is the same as the input pruning mask, and the number of groups after pruning equals the number of remaining convolution filters;
Step 2.3.2: for the combination of a constrained convolutional layer plus a BN layer, the output channels are not pruned; only the convolution filters indicated by the input-channel pruning mask are pruned, and pruning is realized by recombining the parameters of the convolutional layer and the parameters of the BN layer;
Step 2.3.3: for a residual block structure with a shortcut connection, the number of output channels of the last layer in the residual block must equal the number of input channels of the residual block; during pruning, the output pruning mask of the last layer of the residual block must therefore equal the output pruning mask of the layer preceding the residual block, which guarantees that the pruned model structure remains consistent.
6. The compression method according to claim 1, wherein the step 6 quantization-aware training of the model until convergence is specifically:
Step 6.1: in quantization training the inputs are still unquantized floating-point numbers; the convolutional layer parameters pass through a pseudo-quantization module and then participate in the floating-point computation, the intermediate convolution computations are all floating-point operations, and the activation values produced by the activation function are quantized by a pseudo-quantization module;
Step 6.2: because the weight distribution of a convolutional layer is concentrated, is fixed once training ends, and does not change with the model input, the convolutional layer parameters are quantized layer-wise to integers; the activation values are influenced by the model input and can fluctuate over a wide range, so a channel-by-channel quantization method is adopted for them. The quantization is computed by the following formulas,
clamp(r; a, b) := min(max(r, a), b)    (3)
s(a, b, n) := (b − a) / (n − 1)    (4)
q(r; a, b, n) := round((clamp(r; a, b) − a) / s(a, b, n)) · s(a, b, n) + a    (5)
where r denotes the floating-point number being quantized; [a, b] denotes the quantization range; n is the number of quantization levels, with n = 2^8 = 256 in 8-bit integer quantization; round(·) denotes rounding to the nearest integer; and q(r; a, b, n) denotes the result after quantization;
in the above formulas, the quantization result can be computed from the floating-point value once the quantization range [a, b] is determined; during training, because the inputs and the model parameters change constantly, the distribution range of the quantities being quantized must be observed in order to determine the quantization range;
Step 6.3: for the quantization range of the convolutional layer parameters, take a := min w and b := max w for each convolutional layer parameter tensor w; the quantization range then converges as the convolutional layer parameters converge;
Step 6.4: for the quantization range of the activation values, the quantization range must be computed independently for each channel;
so that the quantization range of the activation values reflects their distribution over the entire data set, an exponential moving average of the single-batch quantization range must be computed; the exponential-moving-average formula is as follows,
S_t = α × Y_t + (1 − α) × S_{t−1}    (6)
where α is the moving-average coefficient, with a value between 0 and 1 chosen near 1 so as to reflect the long-term numerical average; Y_t is the value observed this time; and S_t is the exponential moving average at time t;
Step 6.5: the BN-layer operation is fused during training; the formula for the fused convolutional layer parameters is as follows,
w_fold = γ · w / √(EMA(σ_B²) + ε)    (7)
where γ is the γ parameter of the BN layer; EMA(σ_B²) denotes the exponential moving average of the variance of the convolutional layer outputs over the batches; ε is a small constant; and w and w_fold are the parameters before and after fusion, respectively.
7. The compression method according to claim 1, wherein the step 7 quantized inference is specifically:
quantize the fused offset (bias) parameters to 32-bit integers, take their scaling coefficient S as the product of the convolutional-layer weight scaling coefficient and the input scaling coefficient, and set their zero point Z to 0;
quantized inference must solve the problem of carrying out the convolution matrix operations with integers, via the following formulas;
First, the scaling coefficient S and the zero point Z of each layer are computed from the obtained quantization range:
S = s(a, b, n)
Z = round(q(0.0; a, b, n))    (8)
The functions in the above two equations were defined earlier. A quantized value can be dequantized back to a floating-point number using the scaling coefficient S and the zero point Z; the dequantization formula is as follows,
r = S(q − Z)    (9)
Consider two N × N real matrices r_1 and r_2 whose product is r_3. For α ∈ {1, 2, 3} and 1 ≤ i, j ≤ N, let r_α^(i,j) denote the entry in row i and column j of r_α, let (S_α, Z_α) denote the quantization parameters of each matrix, and let q_α^(i,j) denote each matrix after quantization;
the dequantization formula can then be expressed as:
r_α^(i,j) = S_α (q_α^(i,j) − Z_α)    (10)
From the matrix multiplication one obtains:
S_3 (q_3^(i,k) − Z_3) = Σ_j S_1 (q_1^(i,j) − Z_1) · S_2 (q_2^(j,k) − Z_2)    (11)
The above formula can be rewritten as:
q_3^(i,k) = Z_3 + M · Σ_j (q_1^(i,j) − Z_1)(q_2^(j,k) − Z_2), where M := S_1 S_2 / S_3    (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010388100.1A CN111652366A (en) | 2020-05-09 | 2020-05-09 | Combined neural network model compression method based on channel pruning and quantization training
Publications (1)
Publication Number | Publication Date |
---|---|
CN111652366A true CN111652366A (en) | 2020-09-11 |
Family
ID=72343243
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010388100.1A Pending CN111652366A (en) | 2020-05-09 | 2020-05-09 | Combined neural network model compression method based on channel pruning and quantization training
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111652366A (en) |
CN113850385A (en) * | 2021-10-12 | 2021-12-28 | 北京航空航天大学 | Coarse and fine granularity combined neural network pruning method |
CN114565076A (en) * | 2022-01-18 | 2022-05-31 | 中国人民解放军国防科技大学 | Adaptive incremental streaming quantile estimation method and device |
CN114386588A (en) * | 2022-03-23 | 2022-04-22 | 杭州雄迈集成电路技术股份有限公司 | Neural network quantification method and device, and neural network reasoning method and system |
CN114626527B (en) * | 2022-03-25 | 2024-02-09 | 中国电子产业工程有限公司 | Neural network pruning method and device based on sparse constraint retraining |
CN114626527A (en) * | 2022-03-25 | 2022-06-14 | 中国电子产业工程有限公司 | Neural network pruning method and device based on sparse constraint retraining |
CN115170917A (en) * | 2022-06-20 | 2022-10-11 | 美的集团(上海)有限公司 | Image processing method, electronic device, and storage medium |
CN115170917B (en) * | 2022-06-20 | 2023-11-07 | 美的集团(上海)有限公司 | Image processing method, electronic device and storage medium |
CN115496207B (en) * | 2022-11-08 | 2023-09-26 | 荣耀终端有限公司 | Neural network model compression method, device and system |
CN115496207A (en) * | 2022-11-08 | 2022-12-20 | 荣耀终端有限公司 | Neural network model compression method, device and system |
CN115797477A (en) * | 2023-01-30 | 2023-03-14 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Pruning-based image compressed sensing method and system for lightweight deployment |
CN116069743A (en) * | 2023-03-06 | 2023-05-05 | 齐鲁工业大学(山东省科学院) | Fluid data compression method based on time sequence characteristics |
CN116468101A (en) * | 2023-03-21 | 2023-07-21 | 美的集团(上海)有限公司 | Model pruning method, device, electronic equipment and readable storage medium |
CN116167413A (en) * | 2023-04-20 | 2023-05-26 | 国网山东省电力公司济南供电公司 | Method and system for joint quantization and pruning optimization of deep convolutional neural networks |
CN116405127B (en) * | 2023-06-09 | 2023-09-12 | 北京理工大学 | Compression method and device of underwater acoustic communication preamble signal detection model |
CN116405127A (en) * | 2023-06-09 | 2023-07-07 | 北京理工大学 | Compression method and device of underwater acoustic communication preamble signal detection model |
CN116611495A (en) * | 2023-06-19 | 2023-08-18 | 北京百度网讯科技有限公司 | Compression method, training method, processing method and device of deep learning model |
CN116611495B (en) * | 2023-06-19 | 2024-03-01 | 北京百度网讯科技有限公司 | Compression method, training method, processing method and device of deep learning model |
CN116894189A (en) * | 2023-09-11 | 2023-10-17 | 中移(苏州)软件技术有限公司 | Model training method, device, equipment and readable storage medium |
CN116894189B (en) * | 2023-09-11 | 2024-01-05 | 中移(苏州)软件技术有限公司 | Model training method, device, equipment and readable storage medium |
CN117497194B (en) * | 2023-12-28 | 2024-03-01 | 苏州元脑智能科技有限公司 | Biological information processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111652366A (en) | Combined neural network model compression method based on channel pruning and quantitative training | |
CN110135580B (en) | Convolution network full integer quantization method and application method thereof | |
CN111489364B (en) | Medical image segmentation method based on lightweight full convolution neural network | |
CN110874631A (en) | Convolutional neural network pruning method based on feature map sparsification | |
CN112052951B (en) | Pruning neural network method, system, equipment and readable storage medium | |
CN114118402A (en) | Self-adaptive pruning model compression algorithm based on grouping attention mechanism | |
CN111985523A (en) | Power-of-two quantization method for deep neural networks based on knowledge distillation training | |
CN112329922A (en) | Neural network model compression method and system based on mass spectrum data set | |
CN113222138A (en) | Convolutional neural network compression method combining layer pruning and channel pruning | |
CN109615068A (en) | Method and apparatus for quantizing feature vectors in a model | |
CN110111266B (en) | Improvement method for approximate message passing algorithm based on deep learning denoising | |
CN114139683A (en) | Neural network accelerator model quantization method | |
CN111695624A (en) | Data enhancement strategy updating method, device, equipment and storage medium | |
CN114118406A (en) | Quantitative compression method of convolutional neural network | |
CN114187261A (en) | Non-reference stereo image quality evaluation method based on multi-dimensional attention mechanism | |
CN112651500B (en) | Method for generating quantization model and terminal | |
CN114140641A (en) | Image classification-oriented multi-parameter self-adaptive heterogeneous parallel computing method | |
CN115170902B (en) | Training method of image processing model | |
CN113160081A (en) | Deep face image restoration method based on perceptual deblurring | |
CN116757255A (en) | Method for improving weight reduction of MobileNetV2 distracted driving behavior detection model | |
CN113554104B (en) | Image classification method based on deep learning model | |
CN114372565B (en) | Target detection network compression method for edge equipment | |
CN116309171A (en) | Method and device for enhancing monitoring image of power transmission line | |
CN113947203A (en) | YOLOV3 model pruning method for intelligent vehicle-mounted platform | |
CN113033804B (en) | Convolution neural network compression method for remote sensing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||