CN114970853A - Cross-range quantization convolutional neural network compression method - Google Patents


Info

Publication number
CN114970853A
CN114970853A (application CN202210260332.8A)
Authority
CN
China
Prior art keywords: quantization, neural network, convolutional neural, range, weight
Prior art date
Legal status
Pending
Application number
CN202210260332.8A
Other languages
Chinese (zh)
Inventor
邢晓芬
杨弈才
郭锴凌
徐向民
Current Assignee
South China University of Technology SCUT
Zhongshan Institute of Modern Industrial Technology of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhongshan Institute of Modern Industrial Technology of South China University of Technology
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT, Zhongshan Institute of Modern Industrial Technology of South China University of Technology filed Critical South China University of Technology SCUT
Priority to CN202210260332.8A priority Critical patent/CN114970853A/en
Publication of CN114970853A publication Critical patent/CN114970853A/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a cross-range quantization convolutional neural network compression method, comprising the following steps: determining the quantization bits of the quantized convolutional neural network, the quantization range under those bits, and a step-based quantization function; training a full-precision convolutional neural network and using it to initialize the quantized convolutional neural network and the quantization step sizes; in the forward propagation stage, quantizing the weight parameters and activation values of the convolutional neural network, applying conventional quantization to values within the quantization threshold range and, for values outside the range, subtracting the quantization threshold before applying conventional quantization; in the back propagation stage, using a gradient approximation to make the non-differentiable quantization function differentiable. On the basis of conventional quantization, the invention applies a different quantization mode to values outside the quantization range, achieving compression and acceleration of the convolutional neural network while preserving image recognition accuracy.

Description

Cross-range quantization convolutional neural network compression method
Technical Field
The invention belongs to the field of image recognition, and relates to a cross-range quantized convolutional neural network compression method.
Background
In recent years, deep convolutional neural networks have developed rapidly and achieved excellent performance in tasks such as image recognition and object detection, and more and more intelligent devices deploy them. However, their deployment and application are limited by the large amounts of memory and computing resources they occupy. Researchers have therefore begun to study convolutional neural network compression, which lightens a bloated deep convolutional neural network while preserving its performance.
At present, four convolutional neural network compression methods are mainly used for image recognition tasks: low-rank decomposition, pruning, knowledge distillation, and quantization.
(1) Low rank decomposition
In general, a weight matrix contains much redundant information and can be treated as not being of full rank. It can therefore be decomposed into several smaller matrices with fewer parameters, lower rank, and simpler form, and the original matrix reconstructed through operations such as outer products of these small matrices, thereby reducing memory and accelerating computation. Low-rank decomposition methods generally aim to minimize the reconstruction error while preserving the performance of the convolutional neural network.
(2) Pruning
Pruning typically compresses a pre-trained convolutional neural network. The bigger and deeper a network is, the more likely it contains redundant and ineffective parameters; pruning removes them so they no longer participate in inference, thereby compressing the network. Pruning is generally divided into unstructured and structured pruning. Unstructured pruning sets small weights to zero; although model performance is largely preserved after pruning, the compression is difficult to realize in hardware. Structured pruning removes whole modules such as convolution kernels, which is friendly to hardware deployment, and is therefore the main research direction in the pruning field.
(3) Knowledge distillation
Knowledge distillation transfers the 'knowledge' of a deeper, more complex, or better-performing large convolutional neural network to a relatively simple small one to improve the latter's performance; the small network can then replace the large one in the actual deep learning task, saving memory and accelerating computation. Distillation generally takes two forms: making the final output of the small network imitate the final output of the large network, or making intermediate outputs of the small network imitate intermediate outputs of the large network. The two forms can be used independently or together.
(4) Quantization
Quantization generally refers to representing the weight parameters, and often also the activation values, of the convolutional neural network with lower-bit values, which greatly reduces memory and speeds up computation. By bit-width, quantization divides into binary quantization and multi-bit quantization: binary quantization has very high compression and computational efficiency but causes a substantial performance loss, while multi-bit quantization incurs less performance loss while still ensuring good compression and computational efficiency.
Current multi-bit quantization methods generally use learnable quantization functions. Some treat the floating-point quantization range of each convolutional or fully-connected layer as a learnable parameter: DSQ (Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, and Junjie Yan. Differentiable soft quantization: Bridging full-precision and low-bit neural networks. In ICCV, 2019) explicitly makes the left and right thresholds of each convolutional layer's floating-point range learnable, while LSQ (Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S. Modha. Learned step size quantization. In ICLR, 2020) uses the quantization step as a learnable parameter, indirectly making the range of floating-point numbers involved in quantization learnable.
Quantization methods such as LSQ and DSQ truncate values that fall outside the quantization range; these values are few, but often important. Truncating them to the quantization threshold causes a certain loss of information, and the truncation operation itself is not differentiable; both effects hinder normal training of the low-bit quantized convolutional neural network and cause a certain loss of accuracy.
The lower the quantization bit-width, the more severely the classification accuracy of the corresponding quantized convolutional neural network degrades. By applying cross-range quantization to the values of a low-bit convolutional neural network that exceed the quantization range, the actual amount of computation increases, but since these values are few, classification accuracy can be effectively improved over the conventional quantization mode at the cost of only a small amount of extra computation.
The four methods above achieve convolutional neural network compression from different angles, but quantization usually offers the larger compression ratio and computational speedup. Most existing quantization methods adopt a truncating quantization function: values beyond the quantization threshold range are uniformly truncated to the threshold. Although these larger values make up a small proportion of all parameters, they are often important, and truncating them causes an information loss that affects network performance; moreover, the truncation operation is not differentiable, which further limits the training and updating of the quantized network's parameters. If the information carried by values outside the quantization threshold range can be preserved, the performance of the quantized convolutional neural network can be effectively improved.
Disclosure of Invention
Addressing the defect that conventional quantization methods uniformly apply a simple truncation to values outside the quantization threshold, the invention provides a cross-range quantization convolutional neural network compression method. In image recognition tasks, it adds only a small amount of computation on top of a conventional quantization method and achieves higher classification accuracy than that method.
The invention is realized by at least one of the following technical schemes.
A convolution neural network compression method of cross-range quantization comprises the following steps:
preprocessing an original image to obtain a preprocessed image;
carrying out cross-range quantization and training on the weight and the activation value of the convolutional neural network to construct a low bit quantization convolutional neural network;
and performing image recognition on the preprocessed image by using the quantized convolutional neural network.
Further, the weight quantization process of the convolutional neural network comprises:
in the initialization stage of the quantization convolutional neural network, initializing the weight parameters of the quantization convolutional neural network by using the weight parameters of the full-precision convolutional neural network, simultaneously calculating the statistical information of the weight and the activation value of the full-precision convolutional neural network, and initializing the quantization step length of the weight and the activation value of the quantization convolutional neural network by using the statistical information and the set quantization bits;
secondly, in a forward propagation stage of the training process, performing cross-range quantization on the weight and the activation value;
and thirdly, in the back propagation stage of the training process, deriving and updating the quantized convolutional neural network parameters according to the cross entropy loss function.
Further, the initialization of the quantization step s_W of each layer's weight W of the quantized convolutional neural network is computed jointly from the set quantization bits and the distribution of the full-precision convolutional neural network's weight parameters.
Further, samples are classified with the full-precision model while the distribution of each layer's activation values is recorded; the quantization step s_A of each layer's activation A of the quantized convolutional neural network is computed jointly from the set quantization bits and the distribution of the full-precision network's activation values.
Further, for the set quantization bits, the quantization range thresholds Q_W and Q_A of each layer's weight W and activation value A of the quantized convolutional neural network are fixed values.
Further, the quantization function includes a Round operation: for an input floating-point number x, x is rounded to the nearest integer, converting the original floating-point number into a low-bit integer.
Further, for the original floating point number input, dividing the original floating point number input by a quantization step length for scaling, and if the scaled value is within the range of a quantization threshold value, obtaining quantized output by using Round operation; if the scaled value exceeds the range of the quantization threshold, the quantization threshold is subtracted first, then the value after the quantization threshold is subtracted is subjected to Round operation to obtain quantization output, and if the quantization output at the moment still exceeds the quantization threshold, the quantization threshold is cut off; when convolution calculation is performed, the value outside the quantization threshold range is represented as the quantization threshold plus the quantization value minus the quantization threshold.
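The cross-range rule just described can be sketched as a scalar function (an illustrative implementation, not the patent's reference code; the function name is mine, and note that Python's built-in round uses half-to-even rounding, whereas the patent's Round is ordinary rounding):

```python
def cross_range_quantize(x, q_neg, q_pos):
    """Quantize a value x (already divided by the quantization step).

    In-range values are simply rounded. A value beyond a threshold has
    that threshold subtracted, the remainder is rounded, and the result
    is clipped so the output never exceeds twice the threshold.
    """
    if q_neg <= x <= q_pos:
        return round(x)                 # conventional quantization
    if x > q_pos:
        # cross-range on the positive side: q_pos + Round(x - q_pos), clipped
        extra = min(round(x - q_pos), q_pos)
        return q_pos + extra
    # cross-range on the negative side, symmetric handling
    extra = max(round(x - q_neg), q_neg)
    return q_neg + extra
```

For a two-bit weight range [-2, 1], a scaled value of 2.7 maps to 1 + Round(1.7) = 2, whereas conventional truncation would have produced 1; the extra information beyond the threshold is thus partially preserved.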
Further, the Round operation in the quantization function is not differentiable; in the back propagation stage of the training process the quantization function is made differentiable through a gradient approximation.
Further, in the back propagation stage, the gradient of the quantization step parameter is scaled down to ensure that training of the quantized convolutional neural network converges.
Further, the reduction coefficient of the weight quantization step gradient of each convolution layer is related to the set quantization bits and the layer weight parameter number; the reduction coefficient of the activation value quantization step gradient of each convolution layer is related to the set quantization bits and the number of activation value parameters.
Compared with the prior art, the invention has the following beneficial effects:
the method can compress and accelerate the conventional convolutional neural network, realizes the light weight of the convolutional neural network, and promotes the application of an image recognition algorithm to light-weight equipment.
Drawings
FIG. 1 is a diagram illustrating an implementation of a cross-range quantization convolutional neural network compression method and system thereof according to the present invention;
FIG. 2 is a diagram illustrating an implementation of cross-range weight two-bit quantization according to the present invention;
FIG. 3 illustrates a two-bit quantization process for cross-range activation values according to the present invention;
FIG. 4 is a flow chart of image recognition according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
The principle of the invention is as follows: quantizing the weights and activation values of a convolutional neural network greatly reduces its memory footprint and computation. Most existing low-bit quantization methods truncate values that exceed the quantization threshold, i.e., a value beyond the threshold Q is truncated to Q; this loses the information carried by those values, and the truncation operation is not differentiable, which further limits training updates of the network. These two effects show that the truncation of conventional quantization hurts the performance of the quantized convolutional neural network. The invention provides a cross-range quantization compression method and system that preserve the information of values exceeding the quantization range threshold Q; in theory, compared with the original quantization method, a larger performance improvement can be obtained by adding only a small amount of computation.
Example 1
As shown in fig. 1 and 4, a convolutional neural network compression method with cross-range quantization includes the following steps:
S1, preprocessing the original image with operations such as zero padding, random cropping, random flipping, and normalization to obtain a preprocessed image.
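A minimal numpy sketch of this preprocessing step follows. The pad and crop sizes and the per-image normalization are illustrative (CIFAR-style defaults); the patent does not fix exact values, and the function name is mine:

```python
import numpy as np

def preprocess(img, pad=4, crop=32, rng=None):
    """Zero-pad, randomly crop, randomly flip, and normalize one HWC image."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = img.shape
    # zero padding on all sides
    padded = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=np.float32)
    padded[pad:pad + h, pad:pad + w] = img
    # random crop back to the target size
    top = int(rng.integers(0, padded.shape[0] - crop + 1))
    left = int(rng.integers(0, padded.shape[1] - crop + 1))
    out = padded[top:top + crop, left:left + crop]
    # random horizontal flip
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # scale to [0, 1] and normalize per image
    out = out / 255.0
    return (out - out.mean()) / (out.std() + 1e-8)
```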
S2, quantizing and training the weight and the activation value of the convolutional neural network, and constructing a low-bit quantized convolutional neural network;
specifically, the method comprises the following steps:
firstly, training a full-precision convolutional neural network as a pre-training model (including but not limited to common convolutional neural networks such as ResNet and MobileNet) by using the preprocessed image;
and secondly, setting quantization bits, and calculating the quantization threshold of the weight and the activation value.
The quantization bit-width is set to b bits, and the basic structure of the convolutional neural network is: convolution layer → batch normalization layer → ReLU activation function. Because the ReLU function truncates values below zero to zero, the quantization thresholds for the weights and the activation values differ. Let Q_W^l and Q_W^r be the left and right boundaries of the weight quantization range; then

Q_W^l = -2^(b-1),  Q_W^r = 2^(b-1) - 1.

Let Q_A^l and Q_A^r be the left and right boundaries of the activation quantization range; then

Q_A^l = 0,  Q_A^r = 2^b - 1.
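The threshold computation for a given bit-width can be sketched as follows, using the conventional signed range for weights and the non-negative range for post-ReLU activations (the function name is illustrative):

```python
def quant_thresholds(b):
    """Quantization range boundaries for b-bit quantization.

    Signed weights use a symmetric low-bit integer range; activations
    after ReLU are non-negative, so their range starts at zero.
    """
    qw_l, qw_r = -(2 ** (b - 1)), 2 ** (b - 1) - 1   # weight range
    qa_l, qa_r = 0, 2 ** b - 1                        # activation range
    return (qw_l, qw_r), (qa_l, qa_r)
```

For two-bit quantization this gives a weight range of [-2, 1] and an activation range of [0, 3].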
Thirdly, initializing the weight parameters of the quantized convolutional neural network with the weight parameters of the full-precision convolutional neural network;
fourthly, calculating the statistical information of the weight and the activation value of the full-precision convolutional neural network, and then initializing the quantization step length of the weight and the activation value of the quantization convolutional neural network by using the statistical information and preset quantization bits.
Let the weight of the current convolutional layer be a matrix W ∈ R^(K×Cd²), i.e. K rows and Cd² columns, where C is the number of input channels, K is the number of convolution kernels, and d is the convolution kernel size. The quantization step s_W of the current layer's weight is initialized as:

s_W = 2‖W‖₁ / (KCd² · √(Q_W^r)),

where ‖W‖₁ is the L1 norm of the weight W, so ‖W‖₁/(KCd²) is the mean absolute weight.
A certain number of samples (e.g., 128) are classified with the full-precision convolutional neural network and the activation values of each layer are recorded during inference; the quantization step s_A of each layer's activation A of the quantized convolutional neural network is determined by the preset quantization bits and the mean of the recorded activations' L1 norm. The activation quantization step of the current layer is thus initialized from the statistics of the pre-trained full-precision network's activations and the activation quantization threshold.
Let the input activation of the current layer be a matrix A ∈ R^(B×CHW), i.e. B rows and CHW columns, where B is the number of input samples (e.g., 128), C is the number of input channels, and H and W are the height and width of the input feature map. The quantization step s_A of the current layer's activation is initialized as:

s_A = 2‖A‖₁ / (BCHW · √(Q_A^r)),

where ‖A‖₁ is the L1 norm of the activation A.
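Both initializations share the same form: twice the mean absolute value divided by the square root of the positive range boundary (the LSQ-style convention). A sketch, with an illustrative function name:

```python
import numpy as np

def init_step(x, q_r):
    """Step-size initialization: s = 2 * mean(|x|) / sqrt(q_r).

    x is the full-precision weight or activation tensor, q_r the positive
    boundary of its quantization range.
    """
    return 2.0 * np.abs(x).mean() / np.sqrt(q_r)
```

Calling it with the full-precision weight tensor and Q_W^r yields s_W; calling it with recorded activations and Q_A^r yields s_A.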
Sixthly, in the forward propagation stage, the weight W is uniformly quantized based on a learnable step size, where w_i denotes the i-th element of W, i.e. a floating-point number. The floating-point number w_i is divided by the weight quantization step s_W and converted to an integer with the Round function; a value exceeding the quantization threshold undergoes cross-range quantization, i.e. the threshold is first subtracted, the remainder is rounded, and if the result still exceeds the threshold it is truncated to the threshold. Let w̄_i be the quantized integer and ŵ_i the final quantized output; as shown in FIG. 2, the specific formulas are:

w̄_i = Round(w_i/s_W),                                 if Q_W^l ≤ w_i/s_W ≤ Q_W^r
w̄_i = Q_W^r + min(Round(w_i/s_W − Q_W^r), Q_W^r),     if w_i/s_W > Q_W^r
w̄_i = Q_W^l + max(Round(w_i/s_W − Q_W^l), Q_W^l),     if w_i/s_W < Q_W^l

ŵ_i = w̄_i · s_W.
Seventhly, in the forward propagation stage, the activation value A is uniformly quantized based on a learnable step size, where a_i denotes the i-th element of A. The floating-point number a_i is divided by the activation quantization step s_A and converted to an integer with the Round operation; a value exceeding the quantization threshold undergoes cross-range quantization in the same way as the weights. Let ā_i be the quantized integer and â_i the final quantized output; as shown in FIG. 3, the specific formulas are:

ā_i = Round(a_i/s_A),                                 if Q_A^l ≤ a_i/s_A ≤ Q_A^r
ā_i = Q_A^r + min(Round(a_i/s_A − Q_A^r), Q_A^r),     if a_i/s_A > Q_A^r

â_i = ā_i · s_A.
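The forward quantize-dequantize step can be vectorized over a whole tensor. A numpy sketch (my own illustration; np.rint rounds half to even, whereas the patent's Round is ordinary rounding):

```python
import numpy as np

def fake_quantize(x, s, q_l, q_r):
    """Scale by the step s, round with the cross-range rule, and rescale."""
    y = np.asarray(x, dtype=float) / s
    q = np.rint(y)                                   # conventional quantization
    # cross-range handling above the upper threshold
    q = np.where(y > q_r, q_r + np.minimum(np.rint(y - q_r), q_r), q)
    # cross-range handling below the lower threshold (only relevant for weights)
    q = np.where(y < q_l, q_l + np.maximum(np.rint(y - q_l), q_l), q)
    return q * s
```

With two-bit activation thresholds (0, 3) and step 1.0, an input of 7.2 quantizes to 3 + min(Round(4.2), 3) = 6 rather than being truncated to 3.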
Eighthly, in the back propagation stage, the gradient of the quantization step parameter is scaled down to ensure that training of the quantized convolutional neural network converges. The Round operation in the quantization function is not differentiable; its gradient is approximated based on the straight-through estimator (STE). Let the input be x; the specific formula is:

∂Round(x)/∂x ≈ 1.
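The straight-through estimator amounts to rounding in the forward pass while treating Round as the identity in the backward pass. A framework-free sketch (function names are illustrative):

```python
def ste_round_forward(x):
    # forward pass: ordinary rounding of the scaled input
    # (Python's round uses half-to-even rounding)
    return round(x)

def ste_round_backward(grad_output):
    # backward pass: d Round(x)/dx is approximated as 1,
    # so the upstream gradient passes through unchanged
    return grad_output * 1.0
```

In a framework such as PyTorch the same effect is typically obtained with a custom autograd function whose backward returns the incoming gradient unchanged.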
Ninthly, in the differentiation process, because all weights of a convolutional layer share one quantization step and the number of weight parameters is several orders of magnitude larger than the number of quantization steps, the chain rule makes the gradient magnitude of the quantization step several orders of magnitude larger than that of a single weight parameter w_i; the same holds for the activation values. Such an extreme imbalance in gradient magnitudes is detrimental to the training and convergence of the quantized convolutional neural network, so the gradient of the quantization step must be scaled down. The scaling coefficient of the quantization step gradient is related to the preset quantization bits and the number of weight parameters or activation values in the layer.
For a convolutional layer's weight matrix W ∈ R^(K×Cd²), where C is the number of input channels, K is the number of convolution kernels, and d is the convolution kernel size, the corresponding quantization step gradient scaling coefficient g_W is:

g_W = 1 / √(KCd² · Q_W^r).
For a convolutional layer's input activation matrix A ∈ R^(B×CHW), where B is the number of input samples (e.g., 128), C is the number of input channels, and H and W are the feature map dimensions, the corresponding quantization step gradient scaling coefficient g_A is independent of the number of input samples:

g_A = 1 / √(CHW · Q_A^r).
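The two scaling coefficients can be computed directly from the layer shape and bit-width; a sketch with illustrative parameter names:

```python
import math

def grad_scale_coeffs(b, k, c, d, h, w):
    """Step-gradient scaling coefficients: 1/sqrt(num_params * positive threshold).

    b: quantization bits; k: convolution kernels; c: input channels;
    d: kernel size; h, w: feature map height and width.
    """
    qw_r = 2 ** (b - 1) - 1          # positive weight threshold
    qa_r = 2 ** b - 1                # positive activation threshold
    g_w = 1.0 / math.sqrt(k * c * d * d * qw_r)
    g_a = 1.0 / math.sqrt(c * h * w * qa_r)   # independent of batch size
    return g_w, g_a
```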
Tenthly, the quantized convolutional neural network is updated using cross entropy as the loss function and stochastic gradient descent as the optimizer.
As a preferred embodiment, the parameters of the first convolutional layer, the last fully-connected layer, and the batch normalization layers of the quantized convolutional neural network are not quantized; the original input to the quantized convolutional neural network is also not quantized.
As a preferred embodiment, the quantization function includes a Round operation: for an input floating-point number x, x is rounded to the nearest integer. If x is within the range of the quantization threshold Q, conventional quantization is applied to x, giving quantized output Round(x); if x exceeds the range of Q, the threshold Q is first subtracted and the remainder is conventionally quantized, giving quantized output Q + Round(x − Q).
And S3, performing image recognition on the preprocessed image by using the quantized convolutional neural network.
The following experiments demonstrate the method of the present invention on the convolutional neural networks ResNet20 and ResNet32.
Example 2
This example was conducted with the convolutional neural network ResNet20 on the public dataset CIFAR-10, applying two-bit and three-bit quantization to ResNet20. The experimental comparison results are shown in Table 1; LSQ is from reference 1 (Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S. Modha. Learned step size quantization. International Conference on Learning Representations (ICLR), 2020).
In the table, all methods use the same training setup. All quantized convolutional neural networks are initialized from the same full-precision ResNet20 and trained with a stochastic gradient descent optimizer with momentum 0.9, an initial learning rate lr = 0.01, 400 training epochs, a cosine learning rate schedule that finally decays to zero, weight decay weight_decay = 1e-4, and a sample batch size of 128.
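The cosine schedule in this setup decays the learning rate from 0.01 to zero over the training run; a sketch of the standard form (the function name is illustrative):

```python
import math

def cosine_lr(step, total_steps, lr0=0.01):
    """Cosine learning-rate schedule decaying from lr0 to zero."""
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * step / total_steps))
```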
TABLE 1 comparison results with LSQ two bit quantization
TABLE 2 comparison results with LSQ three bit quantization
From Tables 1 and 2, the cross-range quantization method of the present invention effectively improves the accuracy of the ResNet20 convolutional neural network, with an improvement of more than 1% under two-bit quantization. Cross-range quantizing the activation values alone already improves quantization accuracy effectively, and cross-range quantizing both the weights and the activation values improves it more markedly. As the rightmost columns of Tables 1 and 2 show, the small amount of computation added by cross-range quantization yields a larger accuracy gain than directly increasing the quantization bits.
Example 3
This example conducts an experiment with the convolutional neural network ResNet32 on the public dataset CIFAR-100, applying two-bit quantization to ResNet32. The results are shown in Table 3. The parameter settings are the same as in Example 2.
TABLE 3 comparison results with LSQ two bit quantization
From Table 3, the cross-range quantization method of the present invention improves the accuracy of the ResNet32 convolutional neural network by 1.54% under two-bit quantization; the effect is significant.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A cross-range quantization convolutional neural network compression method, characterized by comprising the following steps:
preprocessing an original image to obtain a preprocessed image;
carrying out cross-range quantization and training on the weights and activation values of the convolutional neural network to construct a low-bit quantized convolutional neural network;
and performing image recognition on the preprocessed image by using the quantized convolutional neural network.
2. The convolutional neural network compression method for quantization across the range of claim 1, wherein the weight quantization process of the convolutional neural network comprises:
firstly, in the initialization stage of the quantized convolutional neural network, initializing the weight parameters of the quantized convolutional neural network with the weight parameters of the full-precision convolutional neural network, computing statistics of the weights and activation values of the full-precision convolutional neural network, and initializing the quantization step sizes of the weights and activation values of the quantized convolutional neural network from these statistics and the set number of quantization bits;
secondly, performing cross-range quantization on the weights and activation values in the forward propagation stage of the training process;
and thirdly, in the back propagation stage of the training process, differentiating the cross-entropy loss function and updating the parameters of the quantized convolutional neural network.
3. The cross-range quantization convolutional neural network compression method according to claim 2, wherein the initialization of the quantization step size s_W of the weight W of each layer of the quantized convolutional neural network is jointly computed from the set number of quantization bits and the distribution of the weight parameters of the full-precision convolutional neural network.
4. The cross-range quantization convolutional neural network compression method according to claim 2, wherein samples are classified and identified with the full-precision model and the distribution information of each layer's activation values is recorded; the quantization step size s_A of the activation values A of each layer of the quantized convolutional neural network is jointly computed from the set number of quantization bits and the distribution of the activation values of the full-precision convolutional neural network.
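Claims 3 and 4 state only that the step sizes are computed jointly from the bit width and the parameter distribution, without giving a formula. A minimal sketch, assuming the widely used LSQ-style rule s = 2·mean(|v|)/√Q_P (the function name and the concrete formula are assumptions, not taken from the patent):

```python
import math

def init_step_size(values, num_bits):
    """LSQ-style quantization step-size initialization (sketch).

    Computes the step from the set bit width and the distribution
    of full-precision values, here via their mean absolute value.
    """
    q_pos = 2 ** (num_bits - 1) - 1          # positive quantization threshold
    mean_abs = sum(abs(v) for v in values) / len(values)
    return 2.0 * mean_abs / math.sqrt(q_pos)
```

The same routine would be run once per layer, over the weights for s_W and over recorded activation statistics for s_A.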
5. The cross-range quantization convolutional neural network compression method according to claim 2, wherein, for the set number of quantization bits, the quantization range thresholds Q_W and Q_A of the weight W and the activation value A of each layer of the quantized convolutional neural network are both fixed values.
6. The cross-range quantization convolutional neural network compression method according to claim 2, wherein the quantization function comprises a Round operation, that is, an input floating-point number x is rounded to the corresponding integer value; the Round operation converts the original floating-point number into a low-bit integer.
7. The cross-range quantization convolutional neural network compression method according to claim 6, wherein an original floating-point input is scaled by dividing it by the quantization step size; if the scaled value is within the quantization threshold range, a Round operation is applied to obtain the quantized output; if the scaled value exceeds the quantization threshold range, the quantization threshold is first subtracted, the Round operation is then applied to the remainder to obtain the quantized output, and if this output still exceeds the quantization threshold, it is clipped to the quantization threshold; during convolution computation, a value outside the quantization threshold range is represented as the quantization threshold plus the quantized value of the input minus the quantization threshold.
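The quantization rule of claim 7 can be sketched for a single scalar as follows (an illustrative sketch, assuming symmetric treatment of negative values; the original operates on whole weight and activation tensors):

```python
def cross_range_quantize(x, step, q_thr):
    # scale the floating-point input by the quantization step
    y = x / step
    sign = 1.0 if y >= 0 else -1.0
    m = abs(y)
    if m <= q_thr:
        # within the quantization threshold range: plain rounding
        return sign * round(m)
    # cross-range branch: subtract the threshold, round the
    # remainder, and clip the remainder back to the threshold
    r = min(round(m - q_thr), q_thr)
    # the out-of-range value is represented as the threshold
    # plus the quantized remainder
    return sign * (q_thr + r)
```

With step = 1.0 and q_thr = 2, an input of 2.6 lands in the cross-range branch and is represented as 2 + round(0.6) = 3, so the representable range is effectively doubled at the cost of one extra subtraction and comparison.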
8. The cross-range quantization convolutional neural network compression method according to claim 6, wherein the Round operation in the quantization function is non-differentiable, and a gradient approximation is applied to the quantization function in the back propagation stage of the training process to make it differentiable.
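A minimal sketch of the gradient approximation in claim 8, assuming the standard straight-through estimator (the clipping bound 2·q_thr for the extended cross-range interval is an assumption):

```python
def round_ste_grad(x, step, q_thr, grad_out):
    # straight-through estimator: treat Round as the identity in
    # the backward pass, so the incoming gradient passes through
    # unchanged wherever the scaled input was not clipped
    y = abs(x / step)
    # zero gradient where the value was clipped (the cross-range
    # scheme extends the representable range to 2 * q_thr)
    return grad_out if y <= 2 * q_thr else 0.0
```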
9. The cross-range quantization convolutional neural network compression method according to claim 6, wherein, in the back propagation stage, the gradient of the quantization step-size parameter is scaled down to ensure convergence of the training of the quantized convolutional neural network.
10. The cross-range quantization convolutional neural network compression method according to claim 8, wherein the reduction coefficient of the weight quantization step-size gradient of each convolutional layer is related to the set number of quantization bits and the number of weight parameters, and the reduction coefficient of the activation-value quantization step-size gradient of each convolutional layer is related to the set number of quantization bits and the number of activation-value parameters.
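Claims 9 and 10 state only that the reduction coefficient depends on the bit width and the parameter count. A minimal sketch assuming the LSQ convention g = 1/√(N·Q_P) (the function name and the concrete formula are assumptions):

```python
import math

def step_grad_scale(num_params, num_bits):
    # reduction coefficient for the step-size gradient: larger
    # layers and wider quantization ranges get a smaller
    # step-size learning signal, stabilizing training
    q_pos = 2 ** (num_bits - 1) - 1   # positive quantization threshold
    return 1.0 / math.sqrt(num_params * q_pos)
```

The coefficient would be computed separately per convolutional layer, once with the layer's weight count for s_W and once with its activation count for s_A.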
CN202210260332.8A 2022-03-16 2022-03-16 Cross-range quantization convolutional neural network compression method Pending CN114970853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210260332.8A CN114970853A (en) 2022-03-16 2022-03-16 Cross-range quantization convolutional neural network compression method


Publications (1)

Publication Number Publication Date
CN114970853A true CN114970853A (en) 2022-08-30

Family

ID=82975543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210260332.8A Pending CN114970853A (en) 2022-03-16 2022-03-16 Cross-range quantization convolutional neural network compression method

Country Status (1)

Country Link
CN (1) CN114970853A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116451770A (en) * 2023-05-19 2023-07-18 北京百度网讯科技有限公司 Compression method, training method, processing method and device of neural network model
CN116451770B (en) * 2023-05-19 2024-03-01 北京百度网讯科技有限公司 Compression method, training method, processing method and device of neural network model
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training
CN117095271A (en) * 2023-10-20 2023-11-21 第六镜视觉科技(西安)有限公司 Target identification method, device, electronic equipment and storage medium
CN117095271B (en) * 2023-10-20 2023-12-29 第六镜视觉科技(西安)有限公司 Target identification method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
He et al. Asymptotic soft filter pruning for deep convolutional neural networks
CN110378468B (en) Neural network accelerator based on structured pruning and low bit quantization
Zhuang et al. Discrimination-aware channel pruning for deep neural networks
CN111247537B (en) Method and system for effectively storing sparse neural network and sparse convolutional neural network
CN110880038B (en) System for accelerating convolution calculation based on FPGA and convolution neural network
Sung et al. Resiliency of deep neural networks under quantization
CN114970853A (en) Cross-range quantization convolutional neural network compression method
CN108510067B (en) Convolutional neural network quantification method based on engineering realization
CN110222821B (en) Weight distribution-based convolutional neural network low bit width quantization method
CN108491926B (en) Low-bit efficient depth convolution neural network hardware accelerated design method, module and system based on logarithmic quantization
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
CN107516129A (en) The depth Web compression method decomposed based on the adaptive Tucker of dimension
CN108304928A (en) Compression method based on the deep neural network for improving cluster
CN111147862B (en) End-to-end image compression method based on target coding
CN109635935A (en) Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould
CN110020721B (en) Target detection deep learning network optimization method based on parameter compression
CN112329910A (en) Deep convolutional neural network compression method for structure pruning combined quantization
CN113269312B (en) Model compression method and system combining quantization and pruning search
CN112884149B (en) Random sensitivity ST-SM-based deep neural network pruning method and system
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
Jiang et al. A low-latency LSTM accelerator using balanced sparsity based on FPGA
Ma et al. A survey of sparse-learning methods for deep neural networks
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
CN116976428A (en) Model training method, device, equipment and storage medium
Hossain et al. Computational Complexity Reduction Techniques for Deep Neural Networks: A Survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination